20 December 2018

Scheme Programming Language

With the beginning of this year I "fell" into Scheme. I blame SoCraTes BE 2015 for that. (SoCraTes is a group of international unconferences focusing on Software Craft and Testing. They are about sustainable creation of useful software in a responsible way and usually run as self-organised Open Space.) During that SoCraTes I happened to enter a session where a small group worked on some Scheme code. The Scheme programming language is one of the two main dialects of Lisp and I recognise Lisp when I see it. The session was so much fun that it continued for the whole of the second day. We used available slots in the schedule of the open space, and in cases there were none, we continued coding Scheme in the hallway, the kitchen and other places. (Now such things only happen on SoCraTes and I encourage you to attend one if possible.)

SICP Exhibit, MIT Museum Includes a battered copy of SICPThis SoCraTes experience broke the ice with Scheme, i.e. it gave me enough exposure to a new language and enabled me to continue on my own. For the next two years I kept playing with Scheme, experimenting with code katas, e.g. the Bank-OCR Kata, porting well known exercises like Parrot, Gilded Rose and Game Of Life and I even solved some Project Euler problems using Scheme.

SICP
Earlier this year I started reading SICP. SICP stands for Structure and Interpretation of Computer Programs, a computer science text originally used in introductory courses taught at the MIT in the eighties. It had been on my reading list since 2010, when Uncle Bob recommended it in one of his Clean Coder videos. SICP had been recommended again and again and I was very happy to find the time and energy to read it.

SICP is one of the greatest books - if not the greatest book - I have ever read. It is fast paced and written in the style of all scientific books - too dense and hard to follow. Reading it is hard work. I used to like such books when I was studying applied mathematics many years ago - actually I was eating such books for breakfast ;-) The high density gives me new information fast. As I like to read books from cover to cover, I do not mind that the information is not accessible directly. Yes it is challenging to read. And it is very insightful. I appreciate most the skill of the authors to define abstractions. I am stunned again and again how they name their composed methods.

The Scheme Programming Language
SICP is strongly related to Scheme as it is one of the bibles of the Lisp/Scheme world. SICP uses Scheme for its code samples and exercises as Gerald Jay Sussman, one author of SICP, is also one of the creators of Scheme. Scheme is a version of Lisp and as such a divine language per se. It is a great language and unlike Lisp it follows a minimalist design philosophy. I really like its simplicity: There are no reserved words, not much syntax, just atoms, pairs and lists.
(define atom?
  (lambda (x)
    (and (not (pair? x))
         (not (null? x)))))
Atoms are numbers, strings, booleans, characters and symbols. Symbols i.e. names usually represent other atoms or functions. Maybe you do not like all the extra parenthesis, but it is compact and uniform. Because of the uniformity of the S-expression, it is easy to create parsers, embedded languages and even full featured Schemes, e.g. in Python, Haskell or any language. (To be fair, the last link contains implementations of Lisp not Scheme in 73 languages.)

History
As I said before Scheme is based on Lisp. The Lisp programming language was created 60 years ago by John McCarthy. And Lisp is very special, it seems to transcend the utilitarian criteria used to judge other languages. In 1975 Scheme was created by Gerald Jay Sussman and Guy Lewis Steele. Watch this great talk by Guy Steele about issues associated with designing a programming language.

Ancient HistoryStandards
One thing which confused me a lot when I started playing with Scheme were its versions. The Scheme language is standardised by IEEE and the de facto standard is called the Revised n Report on the Algorithmic Language Scheme (RnRS). The most widely implemented standard is R5RS from 1998. I use that mainly because the first Scheme implementation I downloaded was an R5RS compliant one. R5RS files use the .scm filename extension. The newer R6RS standard, ratified in 2007, is controversial because it departs from the minimalist philosophy. It introduces modularity which is breaking everything. Being used to the strong compatibility of Java or at least the major compatibility of Python I did not expect versions to be incompatible at all. R6RS files use the .ss, .sls and .sps filename extensions. The current standard is R7RS from 2013, a smaller version, defining a subset of the large R6RS version retaining the minimalism or earlier versions.

(Almost) Complete List of Special Forms and Functions
I did not read any tutorials or books on the Scheme language - I was exploring the language on my own. Still I wanted to know which library functions I could use. Similarly I used to list all classes available in each Java release. I searched for an exhaustive list of Scheme functions. There are many Schemes and I did not want to depend on any implementation specific extensions. In the end I decided to scrape the standards.

Scheme makes a difference between forms and functions. Function calls evaluate all arguments before control is passed to the function body. If some expressions should not be evaluated - as in if or switch (which is cond in Scheme) - a special form is needed. Special forms evaluate their arguments lazily. For more information see Why is cond a special form in Scheme.

R5RS Forms and Functions
I scraped the list of special forms and built-in functions from the R5RS HTML documents. The list is incomplete since I had to rely on formatting of the document. It misses define and the abbreviations like ' and @, but looks pretty good (to me) otherwise. Browsing the 193 forms and functions gives an idea of built in data types, i.e.
boolean
char
complex (number)
exact (number)
inexact (number)
list
number
pair
procedure
rational (number)
string
symbol
vector
as well as possible conversions between them available in any R5RS compliant Scheme.
char->integer
exact->inexact
inexact->exact
integer->char
list->string
list->vector
number->string
string->list
string->number
string->symbol
symbol->string
vector->list
The Leeds LibraryR6RS Forms and Functions
As mentioned earlier, R6RS is much larger than R5RS. It defines 630 forms and functions, most of them in the (new) libraries. The standard separates built-in forms and functions from the ones defined in libraries. (This is still very small, considering that the Java 8 core library contains 6000 public classes.) My list of forms and functions contains all, built-in and library alike. It looks complete, but I am not sure, I did not work with R6RS. From a quick glance R6RS adds bitwise and floating point operations as well as different types of byte vectors, enum-set and hashtable. When looking at my list now, I see that I should have scraped the names of the library modules as well. (I added that to my task list for 2019 ;-)

Scheme Requests for Implementation
Next to the RnRS standards, there are the SRFIs, a collection of concrete proposals and reference implementations. People implementing Scheme chose to implement SRFIs or not. Most Schemes support some of then, see Arthur A. Gleckler's report of SRFI support by Scheme implementations in 2018. Some Schemes have package managers which allow to download and instal packages. Usually some of those are SRFI implementations. I guess I also need to scrape these.

Conclusion
Scheme (Lisp) is so much fun, especially when you do not have to deliver anything. Most people I meet got in touch with Lisp during university but never followed up. Some are even afraid of it. Since 2016 I run Scheme coding sessions at every unconference or SoCraTes event I attend. I invite people to mob with me and I do all the typing. We always have a good time, and - after some warm up with Scheme - people really like it. They all enjoy the opportunity to dive into Lisp again. Will you?

11 November 2018

Widespread Architectural Changes using IDEs

This is the second part of my list of tools to automate Widespread Architectural Changes. Now Widespread Architectural Change is a category of architectural refactoring which do not change the overall structure but need to be applied in many places consistently across a whole code base. For example, consider changing or unifying coding conventions or fixing violations. The main challenge of these changes is the high number of occurrences.

The goal of this article is to introduce you to ways to automate such changes. I have been using some of these techniques because as Code Cop I value code consistency, still this is a raw list. I have only used a few of the mentioned tools and need to explore many in more detail.

In the first part I listed some basic options, e.g. how to support manual changes with fast navigation, search and replace across the whole file system, use scripted search and replace using regular expressions, Macros and finally JetBrains' Structural Search and Replace. This time I focus on options available for modern IDEs. Tools like Eclipse or IntelliJ use an internal representation of the code, the Abstract Syntax Tree (AST). Structural Search and Replace works the AST and therefore belongs into this group as well.

Rescripter (Eclipse)
Modern IDEs support many Refactoring steps - why not use the existing tools to make our job easier? One of such tools is Rescripter, built for making large-scale changes that you can describe easily but are laborious to do by hand. Rescripter is an Eclipse plugin which runs JavaScript in the context of the Eclipse JDT. David Green added some helper objects to make searching, parsing and modifying Java code easier and wrapped a thin JavaScript layer around it. Rescripter scripts are working the AST.

Here are two examples from Rescripter's documentation: Finding a method named getName inside the class Person:
var type = Find.typeByName("Person");
var method = Find.methodByName(type, "getName").getElementName();
and adding a method getJobTitle to that type:
var edit = new SourceChange(type.getCompilationUnit());
edit.addEdit(ChangeType.addMethod(type, "\n\tpublic String getJobTitle() {\n" +
                                        "\t\treturn this.jobTitle;\n" +
                                        "\t}"));
edit.apply();
Rescripter is an old project and has not been updated since 2011. While it has some demo code and documentation, it is a bit raw. I guess it is not compatible with newer versions of Eclipse. Still - like Structural Search and Replace - it is very powerful. If you have a large code base and a repetitive change that can be expressed in terms of the AST, the high effort to get into the tool and create a script will pay off. (A useful plugin to help you working the AST is the AST View. The AST View is part of the JDT but not installed out of the box. It is visualising the AST of a Java file open in the editor.)

JackpotJackpot (NetBeans)
Another, very interesting tool in this area is the Jackpot DSL for NetBeans used in the IDE actions Inspect and Transform aka Inspect and Refactor. Jackpot is a NetBeans IDE module for modelling, querying and transforming Java source files. In the Jackpot context, a program transformation is a script or Java class which queries sources in Java projects for patterns, changes them, and then writes these changes back to the original source files. [...] Jackpot transformations tend to be applied globally, such as at a project level or across several projects. Win! Unfortunately there is very little information about it. It took me hours just to find a little bit of information about it:

Dustin Marx describes how to create NetBeans 7.1 custom hints. He agrees that the most difficult aspect of using NetBeans's custom hints is finding documentation on how to use them. The best sources currently available appear to be the NetBeans 7.1 Release Notes, several Wielenga posts (Custom Declarative Hints in NetBeans IDE 7.1, Oh No Vector!, Oh No @Override!), and Jan Lahoda's jackpot30 Rules Language (covers the rules language syntax used by the custom inspections/hints). The Refactoring with Inspect and Transform in the NetBeans IDE Java Editor tutorial also includes a section on managing custom hints. All listed documentation is from 2011/2012. The most recent one I found is a short Jackpot demo from JavaOne Brazil 2016.

NetBeans hints are warnings that have a quick fix. For example the hint "Can Use Diamond" finds places where the diamond operator of Java 7 can be used instead of explicit type parameters. When the offered action is taken, the code is migrated. In the Inspect and Transform dialogue, the inspections can be managed. Custom hints are stored in .hint files. For example, Dustin Marx' hint to remove extraneous calls to System.currentTimeMillis() from new java.util.Date constructor is written as
<!description="Unnecessary use of System.currentTimeMillis on new Date">

new java.util.Date(java.lang.System.currentTimeMillis())
=> new java.util.Date()
;;
The Java Declarative Hints Format allows matching on variables, modifiers and statements. Fixes can be applied conditionally too.

I do not know if Jackpot hints can be applied across a whole code base at once. As they are inspections, I expect them to be displayed for the whole project - making them fast navigation markers at least. Anyway this is very exciting. It is so exciting that some people wanted to port it to Eclipse (but they never did).

Using Specific Refactoring Plugins
There are a few specific plugins that combine various Refactoring. Eclipse's Source Clean Up Action deserves a honorary mention: It fixes basic warnings on a whole code base, but is very limited. An interesting plugin for Eclipse is Autorefactor which aims to fix language/API usage to deliver smaller, more maintainable and more expressive code bases. Spartan Refactoring is another Eclipse plugin that performs automatic refactoring of Java source code, making it shorter, more idiomatic and more readable.

All these plugins change a predefined, limited set of code patterns, mainly focusing on technical debt. Maybe someone implemented a refactoring plugin for part of the widespread change you need to perform. Search the market places first.

Using Refactoring APIs of IDEs
A possible way is to write your own refactoring plugin. Maybe start with code of a refactoring plugin listed above. Both Eclipse and IntelliJ IDEA offer APIs to manipulating Java code, i.e. the JDT and PSI APIs. I have not done that because it seems too much effort for one time migrations and widespread changes. And reusing the available refactoring tools might raise some problems like waiting for user input which is problematic.

Using (APIs of) Refactoring Browsers
The Refactoring Browser was the first tool that automated Refactoring for Smalltalk. It set the standard for all modern Refactoring tools. Today we have Refactoring Browsers for many languages, e.g. CScout - the C Refactoring Browser, Ruby Refactoring Browser for Emacs or PHP Refactoring Browser which is controlled via the command-line and has plugins for Vim and Emacs. Stand-alone Refactoring tools should be easier to use and script than full blown IDEs, especially if they were designed for different plugins. The idea would be to create code which controls Refactoring Browsers to apply certain changes - again and again.

RopePython Refactoring Libraries
While searching for Refactoring Browsers I just learnt that there are at least three stand-alone Python Refactoring libraries: Rope comes with Vim and Emacs plugins and is also used in VS Code. The second is Bicycle Repair Man. Further there is Bowler which supposedly is better suited for use from the command-line and encourages scripting. All these libraries are rich in features and used by Vim users. (Yeah, Vim is still going strong.)

Rope (Python)
For example, Rope can be used as a library, which allows custom refactoring steps. Even more, it offers something similar to Jackpot's hints, Restructurings. For example, we split a method f(self, p1, p2) of a class mod.A into f1(self, p1) and f2(self, p2). The following Restructuring updates all call sites:
pattern: ${inst}.f(${p1}, ${p2})
goal:
 ${inst}.f1(${p1})
 ${inst}.f2(${p2})

args:
 inst: type=mod.A
The code to perform the Restructuring using Rope as a library is
from rope.base.project import Project
from rope.refactor import restructure

project = Project('.')

pattern = '${inst}.f(${p1}, ${p2})'
goal = '...'
args = '...'

restructuring = restructure.Restructure(project, pattern, goal, args)

project.do(restructuring.get_changes())
I never used that but it looks cool. The promise of scripted refactoring makes me excited. Another item to add to my to-research list. ;-) And there are more tools on my list, which will have to wait for part 3.

6 November 2018

What is Ethical Coding and how to get involved

Three years ago I started thinking about work with meaning. Since then I had great discussions with peer Software Professionals about our social responsibility and ethics in software development. Today I am happy to present a guest post written by Kayleigh Alexandra, a content writer for Micro Startups, a site dedicated to spreading the word about startups. I like seeing a focus on ethical approaches in startups, i.e. Social entrepreneurship. Kayleigh Alexandra shares her ideas about ethical coding in this introductory article. For her latest startup news and inspiring stories, visit the Micro Startups blog or follow along on Twitter.

Ball Round Binary (Image credit: Pixabay)Technology provides amazing possibilities, but its power also makes it a threat. Almost everything we do relies on complex interactions of computer systems, with those systems running on code from countless distinct sources - you cannot trust one chunk of code to meet the same standard as another.

Unknown quality is not the only concern we can have about code, however. A function devised to achieve one thing can be extended and repurposed to do numerous other things - does not the developer bear some responsibility in how it is used? Those who share that sentiment argue for ethical coding, targeting a future of much broader social awareness. But what exactly does ethical coding involve, what are some relevant cases to consider, and how can you get involved in it? Let's take a look.

What does ethical coding involve?
Ethical coding involves acknowledging and acting upon the social responsibilities of a developer - factoring it alongside corporate responsibility - seeking to adhere to a set of meaningful values. It has becoming more prominent in recent years due to rising social awareness in general.

Because of the layer of abstraction between low-level source code and the consequences of the projects people work on, it is easy for everyone in a development company to absolve themselves of any culpability. Ethical coding requires the executive level to take ownership of the entire company output, committing to holding every last employee to a revised standard.

A developer with an ethical approach will think carefully about what her software will be used (or has been used) to achieve, and adjust course accordingly. They will also have strong thoughts about the general responsibilities of the software development industry, making them eager to advocate for higher standards and more transparency.

Since ethical stances vary wildly, though, there can be no precise definition of ethical coding - the best we can do is to use a broad ethical framework that loosely mirrors the societal standard for other fields such as medicine: minimising harm, being truthful, and making life better. See the Programming Ethical Guidelines by ACM and IEEE.

The rise of ethical software development
As noted, social awareness of the many issues that face the world continues to rise, driven by the 24/7 connectivity of the Internet and the move into adulthood of an ethically-conscious generation. For better or worse, social media users now act as moral guardians of sorts. Because technology in general plays a huge role in industry and the opaque nature of many coding projects is at odds with a desire for transparency, the tech world has attracted a lot of flak. This has led to various projects that have sought to demonstrate that technology can be a force for good, such as the following:
  • The Hummingbird Clock. In 2016, as part of the Liverpool Biennial, this timepiece was installed as a commentary on government corruption and to serve as a tool for the public. Set up as three binoculars looking upon the Town Hall, it actually records the hum of the electrical grid and streams it online - since the government has long used that hum for surveillance, this was to make it available to anyone. As a result, anyone in legal difficulty can now use the hum to verify the time and date of a particular recording.

  • The Empathy Deck. Social media can be brutal at times, deeply unpleasant and unempathetic, because being online and having some degree of anonymity drives people to embrace their darkest impulses. The Empathy Deck account was set up to respond to tweets with thought-provoking cards assembled from pieces of an artist's body of work, commenting on the perils of automation while providing some relevant musings and snippets of poetry to brighten someone's day.

  • Cody Rocky. To many, the world of technology still feels quite dry and alienating, which can limit the types of people who choose to pursue it as a career. Cody Rocky is a programmable robot designed to appeal to children, lending a degree of accessibility to the field and making learning both productive and entertaining. It also crosses slightly into the nascent field of digital companionship.
In light of some high-profile fiascos making people much more concerned about the dangers of technology (such as the Cambridge Analytica scandal, or the death caused by a self-driven car), the pressure on developers to implement ethics policies has increased significantly. There is also the thorny issue of so-called hacktivism - is hacking done for a good cause any less objectionable? Who gets to decide where to draw the line?

Though industry-wide ethical frameworks have been suggested before, there is so much more impetus on companies to be proactive now. In the coming years, I think it is fairly likely that it will become standard for software developers - well, any companies in the digital industry - to lay out and commit to their ethical codes.

How to get involved in ethical coding
If you are a developer and you want to get involved in ethical coding, then you can. It is actually fairly simple, with the following two major steps:
  • Code ethically. This is straightforward enough. Whenever you create applications, be keenly aware of any and all ethical issues that may arise as a result. For example credit everyone whose code you adapt, be efficient to avoid needlessly taxing a system, maintain professional integrity, and refrain from working on any project that you expect to be used irresponsibly or immorally. You cannot know every potential issue, but you can do your best.

  • Contribute to ethical projects. These can be your own, or existing projects that need help. This is not about your code, but about the goals of the project: for instance, you could volunteer some time for a healthcare-related project, or help out a charity with its operations. It does not cost much to start your own business these days, so it is easy for someone with good intentions to launch a startup, but it is much harder to make it a contender. Your support could be the key to a business getting big enough to do some real good.
Beyond that, the specifics are entirely up to you. It is really a matter of getting involved with online communities, finding people who share your values, and slowly discovering how you can best use your time to engage in ethical coding.

Ethical coding is only going to grow as a concern while massively-influential fields such AI or the IoT get larger. If you develop software and are strongly ethical, give a lot of thought to how you can better reconcile your beliefs with your work. There is no reason why you cannot have professional and personal contentment at the same time.

31 October 2018

More on Architectural Refactoring

Last month I showed my categorisation of Architectural Refactoring to some friends in the Software Crafting community. Explaining my ideas and getting feedback helped me to sharpen the definitions further. In my original article I defined four groups of Architectural Refactoring. Here is what we discussed about each one of them:

1. Substitute Architectural Decision
In this category I collected Refactoring techniques which replace whole parts of an architecture, e.g.
  • Replacing one relational database system with another
  • Replacing the relational database with a NoSQL one
  • Replacing UI technologies, e.g. moving from JSF to Vaadin
Adrian Bolboaca further split these categories into groups and used the following three names:

Kinds of Architectural Refactoring (C) SoftDevGang 20181.1. Tool Change
A Tool Change affects the implementation of a "tool", e.g. the database system. The first change in the previous list - replacing one relational database system with another - is a Tool Change. Upgrading libraries or migrating between similar frameworks is a Tool Change too. Initially I had listed them as Widespread Architectural Change. While upgrading or migrating libraries might require similar changes throughout the whole project, they are motivated by the Architectural Decision to use a certain version of a library.

1.2. Technology Change
A Technology Change modifies a used technology. Items two and three of the previous list are Technology Changes. In my experience, most Architectural Refactoring is of this type.

1.3. (Other) NFR Caused Change
Changing a tool or technology is never done for its own sake. We might upgrade a framework for the improved performance of its new version, or switch to another data store because it is more reliable or its license is more suited for the business. NFRs (Non Functional Requirements) are requirements that specify how a system is supposed to be in contrast to functional requirements which define what a system is supposed to do. NFRs usually contain long lists of attributes describing a system like performance, reliability and also security, traceability and globalisation. Examples of such changes are changing aspects like logging or security or applying internationalization. Again I had initially listed them as Widespread Architectural Change.

2. Refactor Architectural Structure
The second category deals with architectural building blocks like components, layers or packages. There are two causes for changing these:

RSAR Package Tangle found at an unnamed client2.1. Cleaning up Legacy Code
When we start a new project, its architecture is clear. When the system becomes more complex, effects known as Software Architecture Erosion or Architectural Drifts happen. These effects can lead to systems where the realisation of the system diverges from the intended architecture with resulting negative impacts on quality attributes associated with the intended architecture. Uncle Bob uses other words: The architecture starts to rot like a piece of bad meat.

For example look at the package tangle on the right: Several teams were working on a shared code base under an aggressive schedule and produced an architecture where each package was depending on all others, resulting in the legendary Big Ball of Mud architecture. To clean up this mess components with clear interfaces had to be introduced.

2.2. Evolving an Architecture
Today we rather embrace evolving requirements than stick to a big up-front plan. Techniques like Evolutionary Design accommodate the need of changing the design while systems are growing. And also their architectures need to evolve, e.g. we need to extract a new component or aspect which became duplicated during addition of structure to support new features. With the rising popularity of Microservices and their focus on incremental change in the architecture, Evolutionary Architecture has become a reality.

3. Widespread Architectural Change
I am mostly interested in these changes because I want to automate them. I described some basic tools and some more powerful ones. Most examples of changes I used were motivated in the change of Architectural Decisions. Even coding conventions can be seen as part of the NFR Maintainability. There is definitely an overlap, as some architectural Refactoring and also a few structural ones, will be widespread. Adi drew me a Venn diagram to explain that overlap:

Overlap of Categories of Architectural Refactoring, by Adrian Bolboaca
4. Other Architectural Changes
I am still uncertain if there are other kinds of architectural changes. I miss examples. If you have some, please come forward!

19 October 2018

NATstyle Custom Reports

This summer I wrote about NATstyle, a utility to define and check the coding standard in your NATURAL program. Now NATURAL is not a language you will encounter frequently (hopefully not ;-) but it is an excellent show case for neither being intimidated by strange environments nor accepting idiosyncratic vendor views. First I showed how NATstyle can be used outside of the NATURAL IDE - a prerequisite for setting up Continuous Integration, then I experimented with ways to define my own custom analysis rules. Now is the time to talk about reporting.

Custom Stylesheets
NATstyle creates an XML report. Its structure is
<NATstyle>
  <file type='...' location='...'>
    <error severity='...' message='...' />
    ...
  </file>
  ...
</NATstyle>
NATstyle comes with a stylesheet NATstyleSimple.xsl. The XSLT transformation is not part of the NATstyle execution. Using custom stylesheets allows you to convert the XML into arbitrary reports.

Suitable Target Structure
When working on weird environments, I try to use standard tools as much as possible. Emulating common output formats is the way to go. For example in the case of test results, the JUnit XML format is used in similar scenarios, e.g. Erlang, Cobol and others.

What would be a good structure for reporting static code analysis findings? PMD is a common tool in the Java space and it creates XML reports which are understood by tools like Jenkins, making integration much easier. In the report, violations are grouped by files:
<pmd version="..." timestamp="...">
  <file name="SampleClass.java">
    <violation beginline="2" endline="16"
               begincolumn="8" endcolumn="1"
               rule="TooManyFields" ruleset="Code Size"
               externalInfoUrl="..." priority="3">
      Too many fields
    </violation>
  </file>
</pmd>
XML Parser
Changing the NATstyle result to PMD's format is possible using XSLT alone, but I favour XML parsers for more complex transformations. I established the following process to convert the results: As I used Ant to execute NATstyle on Jenkins, I created an Ant task NatStyleResultToPmdTask.java. (Ant declaration for that task is in ant/ant_NatStyleResultToPmd.xml.) The Ant task sets the input and output XML file names and invokes the NatStyleToPmd.java conversion which uses a SAX parser to parse the NATstyle result and feeds the violations into net.​sourceforge.​pmd.​Report#​addRuleViolation. Then core PMD classes create the final XML report. (The project containing the Ant task and complete, working setup is here.)

Conclusion
In legacy environments it is often possible to "bridge" to standard tools and make use of the rich tooling we have. We can apply modern practises like automated code reviews or Continuous Integration. Another example is bringing TDD and automated testing to the mainframe platform. It is 2018, we have all these great tools, let's make use of them!

8 October 2018

Interview Johannes Link

I am happy to announce my next guest in the series of interviews on ethics in Software Development and meaningful work: Johannes Link is a well known XP practitioner, trainer and frequent speaker from Germany.

Johannes, please introduce yourself.
I have been developing software in a professional context for more than 20 years and that is still one of my passions. My understanding of how to do programming was largely shaped by Extreme Programming. Especially its focus on quality and close collaboration are an intrinsic part of how I look on everything software-related.

ethicsI know that you are concerned with ethics in Software Development. Why do you care?
I think that everyone's personal views on ethics, morale, purpose of life, "good and bad" should also be reflected in their professional activities. Thus it is obvious to me that I cannot support activities in my job that I would reject when discussing politics with my friends. It came rather as a surprise to me when I learned that other people intentionally differentiate between their personal opinions and their professional activities.

What other topics are you concerned about?
Besides my intention to not harm society I would love to do something meaningful or even "to give back" to the world. I guess it came with age and with having children: Just implementing the next e-commerce shop or the next workflow-system for travel expenses won't excite me any more - regardless of the technology in use. That is why I am always looking for "purpose" in the projects I work on; it might be a personal purpose - e.g. building a new tool to make my life as a developer easier - or a more general purpose like teaching kids the fundamentals of programming.

There is also a "meta"-topic: Raising the awareness in our industry that what we do and how we do it shapes the future life of all. That is why we have to start thinking about responsibility in all our projects, and that contains aspects like security, usability, accessibility and risks.

Outside the topics discussed so far, what do you consider the biggest challenges (for humanity) of our times?
Preserving a world that is worth living in. It is not only the climate that is at stake but also democracy and other traits of humaneness that came with the Age of Enlightenment.

When I talk to people, many express concern about meat mass production or pollution, but almost nobody really acts on it. What could we do to engage in these topics?
I am probably as lazy as most when it comes to getting into action. I donate money and I sign petitions, but does that really help? Maybe it is even corrupting the purpose because it makes us feel better and takes some of the uneasiness away that could otherwise push us to real action.

That said, I do not think we can (and should) all fight the same battles. Choose one or a few topics in which your interest or your anger is the biggest and tackle those. As for me the rise of nationalism and xenophobia inside many countries that seemed to be very stable democracies until recently is finally driving me to speak up. Just watching is no longer enough.

Whistle-BlowerAs a software developer, what options do we have to act and do "the right things"?
You always have the possibility to question certain aspects of your work. For example, when you are supposed to collect vast amounts of data that could potentially be used to harm people, you can raise the related ethical and legal issues with your employer. In the end you must be willing to suffer the consequences and not all of us have the luxury (the savings) to do that.

What looks like technical decisions to us might have a real consequence for others: Choose a certain browser as target platform and you exclude some users from using the product. Switch from on-premise deployment to a cloud solution and you make some operation staff redundant. That is why I think that we cannot hide behind the "It is just technology" argument.

Whatever we do we should always ask ourselves:
  • Is there a conceivable way that the system to which I contribute will intentionally or unintentionally harm others? If so, how can it be changed to (at least) reduce the probability?
  • Am I competent enough to take the technical decisions that I take? This is especially relevant when it comes to security because even as senior developer it is hard to keep up-to-date in that field.
  • Is the quality we build high enough to not put the system's purpose at risk? This decision is so context-specific that we often get it wrong when switching domains, e.g. from web store to medical device.
I run into quality conflicts with my customers from time to time. Sometimes, as a last resort, I might refuse to go with a Product Owner's prioritisation because I consider fixing a certain technical debt or design flaw crucial for either the project's sanity or the team's capability to change the system in the future. Saying that I will not continue with the current prioritization can get me the accusation of blackmailing whereas I consider it to be an additional piece of information for the business side. And yes, in those cases I will stand with my word if we won't be able to find a common resolution; so far that happened only once.

How do you think about selecting industry, customer and project based on your values?
Well, we have to earn our daily pasta. When arguing these topics with others, I had to concede that the decision if a certain industry, company or project is in alignment with my ethical views is sometimes very tricky or even undecidable. It can be difficult to see if my contract is contributing to a good or a bad purpose. Sometimes it is obvious. In those cases you have the choice to quit a contract and to tell others why you did it.

There are a few industries that are no-go areas for me. In most other cases it is the concrete project that makes me decide. It is easier for me as an independent developer than it is for an employee, though.

Since the amount of information you get in advance of joining a project is somewhat limited I tend to find out more details through my network or go with my gut feeling. It has not happened yet that I wanted to leave a project for ethical reasons afterwards.

Boeing B-52 dropping bombsLet's be more specific: Would you work for an animal factory? Would you work for a company producing equipment for an animal factory?
Animal Factory: No. Company producing equipment for an animal factory: I might not even know. If I knew, it would probably depend on the type of equipment since a lot of equipment is very unspecific, e.g. an electric engine can be used for all kinds of things.

Do you have problems with any industries?
There is currently only one area that I would reject on principle: Military and weapons. I consider intelligence organisations to be part of military so add that to the list. And there are a bunch of companies whose offers I would reject without thinking twice.

I struggle with the distinction of "working for" and "buying from". Buying also means supporting, but I am certainly less consequential there.

Did you ever reject a project, that would bring you money based on your values?
Yes. I was recently contacted by an agent offering me a very interesting position both location-wise and considering the technologies in use. When I was told the name of the customer however, I had to reject, since the company's business model is known for harming people and mostly serving the rich. When I told the agent that I won't take the position "out of principal considerations" she was astonished and I explained my motives. She seemed to be really surprised that anyone would do that.

On the other hand, what would be industries, customers and projects that you would love to work on?
The medical field has a lot of potential to improve the lives of many. But the current equilibrium of forces usually takes care that the established industry players win and the patients lose. So that is tricky and I have been working in medicine-related projects a few times. I would love to do paid work for NGOs that work along my personal political views. I have not seen a single project in that area, though.

Thank you Johannes for sharing your views on this important topic.

2 October 2018

Developer Melange Episode 3

Yesterday the third episode of the Developer Melange podcast was published. Developer Melange is a monthly podcast which brings you regular discussions about software engineering topics. All of them, in one way or another, related to building great software products. It is recorded by my friends and fellow software crafters Paul Rohorzka, David Leitner and Christian Haas and they release around an hour of high quality discussion every month.

Hello MicI am supporting them since the very beginning. In fact I was playing with the idea of recording a monthly podcast already for some time. After my summer project I finally found the time to visit them during recording: We discussed the essence of Behaviour Driven Development, starting with Dan North's introductory article from 2006. It seems that we could not agree what the essence or core of BDD might be. In the second part we tried to answer the question if it is better to be a specialist or generalist in the field of software delivery. Of course the time was too short for such deep topics. ;-)

Get it here
Listen to this episode here. All previous (and future) episodes can be found on the Developer Melange home page and Developer Melange SoundCloud page. If you need an RSS feed for an old-fashioned podcast app, use Get RSS feeds from SoundCloud URLs, which is this RSS.

Feedback
The team and I are curious what you think about Developer Melange. You can reach us on Twitter or leave comments on SoundCloud. Tell us what you think! Any suggestions, e.g. topics you would like to hear about, and also criticism are highly welcomed. Stay tuned!

8 September 2018

IDE Support for Scripts

Last month I wrote about dealing with modified files during build in Jenkins. The solution uses Groovy scripts and the Jenkins Groovy plugin which allows execution of scripts in Jenkins' system context. These System Scripts have access to Jenkins' internal model like build status, changed files and so on. The model includes several classes which need to be navigated.

SupportIn many situations little additional scripts make our lives as developers easier: For example the Groovy System Script to customise Jenkins when building a NATURAL code base mentioned above, or a Groovy Script Transformation to perform repeated code changes in a large Java project using WalkMod, or a custom build during a Maven build, using any language supported by the Apache Bean Scripting Framework, e.g. Groovy, Ruby, Python or others. Now if the scripts are getting complicated IDE support would be nice.

Getting support for the language in the IDE
Depending on the language, most IDEs need installation of a plugin or extension. I will briefly describe the steps to set up Groovy support into Eclipse because it is more elaborate and the organisation I initially wrote this documentation used Eclipse. Getting plugins for IntelliJ IDEA is straight forward.
  • First obtain the version of Eclipse you are using. Customised distributions like the Spring Tool Suite, IBM/Rational products or NaturalONE follow different version schemes than Eclipse. Navigate to the Help menu, item About, button Installation Details, tab Features, line Eclipse Platform to see the real version. For example, NaturalONE 8.3.2 is based on Eclipse 4.4.
  • The Groovy plugin for Eclipse is GrEclipse. For each major version of Eclipse there is a matching version of GrEclipse. The section Releases lists the Update Site for each release. Find the suitable version. (If you use a newer Eclipse, maybe you have to use a snapshot version instead of a release version.) Copy the URL of the Update Site.
  • The section How to Install explains in detail how to continue and install GrEclipse.
After that Eclipse can talk Groovy.

Enabling support for the language in the project
Next we configure the project for the additional language. In Eclipse this is often possible through the (right click) context menu of the project, menu item Configure. I usually do not bother and edit Eclipse's .project file directly to add the needed natures and build commands. Adding Groovy to a non-Java project needs adding to .project,
<projectDescription>
  <!-- existing configuration -->
  <buildSpec>
    <!-- existing configuration -->
    <buildCommand>
      <name>org.eclipse.jdt.core.javabuilder</name>
      <arguments>
      </arguments>
    </buildCommand>
  </buildSpec>
  <natures>
    <!-- existing configuration -->
    <nature>org.eclipse.jdt.groovy.core.groovyNature</nature>
    <nature>org.eclipse.jdt.core.javanature</nature>
  </natures>
</projectDescription>
and .classpath
<classpath>
  <!-- existing configuration -->
  <classpathentry exported="true" kind="con" path="GROOVY_SUPPORT"/>
  <classpathentry exported="true" kind="con" path="GROOVY_DSL_SUPPORT"/>
</classpath>
Similar changes are needed for Ruby (using org.eclipse.​dltk.core.​scriptbuilder and org.eclipse.​dltk.ruby.​core.nature) and Python (org.python.​pydev.PyDevBuilder and org.python.​pydev.pythonNature). If I am not sure, I create a new project for the target language and merge the .projects manually. Sometimes other files like .classpath, .buildpath or .pydevproject have to be copied, too.

Enabling support for libraries
Adding a nature to .project gives general support like syntax highlighting and refactoring for the respective language but there is no completion of library classes. Especially when using new libraries, like Jenkins Core, using code completion helps exploring the new API because you get a list of all possible methods to call. The trick is to add the needed dependency to the project in a way that it is not accessible from or packaged together with the production code.

An Example: Adding Jenkins System Script support to a NaturalONE project
To add dependencies to the NaturalONE project earlier, I converted the Java/Groovy project to Maven (by setting another nature and builder) and added the needed classes as fake dependencies to the pom.xml.
<dependencies>
  <!-- other dependencies -->
  <dependency>
    <groupId>org.jenkins-ci.main</groupId>
    <artifactId>jenkins-core</artifactId>
    <version>2.58</version>
    <scope>test</scope> <!-- (1) -->
  </dependency>
  <dependency>
    <groupId>org.jenkins-ci.plugins</groupId>
    <artifactId>subversion</artifactId>
    <version>2.7.2</version>
    <scope>test</scope>
  </dependency>
</dependencies>
Groovy has optional typing and after adding the types to the definitions and arguments, Eclipse will pick up the type and we get code completion on available fields and methods:
import hudson.model.Build
def Build build = Thread.currentThread()?.executable
build._ // <-- code completion, yeah!
This made developing and understanding the script to find modified files much easier. The Jenkins dependencies in the pom.xml are never used, as the System Script runs inside Jenkins where the classes are provided. The dependencies are declared to "trick" Eclipse into analysing them and providing type information. The scope of the dependencies is set to test (see line (1) in code snippet above) so the dependencies are not packaged and cannot be called from the production code. This zipped repository contains the NATURAL project together with all the Eclipse configuration files.

Another Example: Adding WalkMod Transformation support to a Maven project
Another situation where I wish for script support in Eclipse is when writing WalkMod Script Transformations. WalkMod is an open source Java tool that can be used to apply code conventions automatically. Read this tutorial by Raquel Pau to see how WalkMod works. WalkMod allows for Groovy scripts to define code transformations which manipulate the AST of Java classes. Navigating the AST is difficult in the beginning.

After adding the Groovy nature as shown in the previous example, the relevant dependencies to get code completion for the AST are
<dependencies>
  <dependency>
    <!-- API -->
    <groupId>org.walkmod</groupId>
    <artifactId>walkmod-core</artifactId>
    <version>3.0.4</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <!-- AST -->
    <groupId>org.walkmod</groupId>
    <artifactId>javalang</artifactId>
    <version>4.8.8</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>com.squareup</groupId>
    <artifactId>javapoet</artifactId>
    <version>1.10.0</version>
    <scope>test</scope>
  </dependency>
</dependencies>
In case of Walkmod transformations, the interesting class is (org.walkmod.​javalang.ast.​CompilationUnit) and it is already in scope as variable node. To add type information we need to alias it:
import org.walkmod.javalang.ast.CompilationUnit
def CompilationUnit cu = node
cu._ // <-- code completion on available fields and methods
Conclusion
Supporting scripts in Groovy, Ruby or Python make our lives easier. With some additional extensions or plugins and some tweaks in the project configuration our IDEs will support the scripting language. Why should we accept code editing like in the early 1990s?

2 September 2018

Work Harder

With this article I want to prompt you to work hard(er). What does working hard mean? When I looked for different translations of German Strengt Euch an, I found several ones, which are all suitable: Work hard. Keep it tight. Push yourselves. Put some heart into it. Put some muscle in it. Put your backs to it. Make your best effort. Just play it like you fucking mean it. Obviously hard work needs your strength. It also needs your heart - motivation and dedication. It might test your limits, both physical and psychical ones.

Domesday BooksMy favourite example of hard work was writing books in the Dark Ages. Monks were copying books by hand, adding drawings and decorations as they went. An extreme example is the Codex Gigas, which was handwritten by a single, anonymous monk. Because the scribe was a monk he may only have been able to work for about three hours a day, and this means that the manuscript including decoration probably took at least 20 years to finish, and could even have taken 30. Now that is a whole lifetime for collecting and handwriting a single book. Imagine the dedication.

Maybe less extreme is the life of master craftspeople. Several years ago I wrote about a master craftsman hand-crafting rakes. He could retire any time, still chose to improve his methods and work hard all day.

Benefits of hard work
Most of you will agree, that hard work pays off. Here is a little TED talk by Richard St. John, who explains why it is so: Hard work is the real secret to success. I vividly remember a professor of Mathematical Analysis during my studies: At the beginning of a lecture he wrote a theorem on the blackboard, followed by an easy proof. The proof looked good and we students understood what was going on. Then he said "Was man leicht kriegt ist nichts wert" ("What you get easily is worth nothing"), pointed out a mistake in the proof and erased it. He used the remaining of the hour to sketch a valid proof which was complicated and much harder to follow.

PainHard work has more value
Theodore Roosevelt even said that "Nothing in the world is worth having or worth doing unless it means effort, pain, difficulty..." Maybe he was exaggerating, but a little pain never hurts. (Pun intended ;-) So is the motto in weightlifting training: No Pain, No Gain. Psychologist Katarina Veselko mentioned the topic in one of her talks. During the following conversation she explained: It not so much that we do not value things that are easy to get, it is more about the fact that when we need to invest a lot of energy, time, money, ... into achieving something, the result will be more valuable for us because we have invested a lot into it. It is some kind of Cognitive dissonance. She also believes that most things worth having are not easy to get; most things that are valuable to us are not achieved in a way that is always fun, fast and easy, but also includes obstacles and challenges.

Instant Gratification
The term Instant Gratification is often used to label the satisfactions gained by more impulsive behaviours: choosing now over tomorrow. And today it is harder to delay gratification than it used to be. We have been trained by technology and social media to expect results fast, without much effort. For example, there are more than 2 million hits on Google how to earn 5k extra, many of then offering "work from home, 30 minutes daily" up to "basically doing nothing". The same is true for losing weight without any diet and so on. If you want to read more on Instant Gratification and its problems, I recommend Courtney Ackerman's Definition with Examples.

Conclusion
It seems that working hard has become unpopular. Why work towards a long term goal when many things are handed over on a plate, or downloaded at the click of a button, or ours in twenty-four hours for just £9.99 extra? I do not think that life works like that. Why do I come up with this in my blog? I am writing this because I think we Craftspeople need to work harder. I am not talking about working more hours. In fact working overtime is against the idea of Craftsmanship because we need our free time to reflect, exchange and learn.

I am talking about pushing ourselves more: Following up on that refactoring you did not finish last week. Refactoring mercilessly, always pushing to keep the whole code base clean and consistent, even during architectural changes. I am talking about not accepting lesser standards because of low level, different or even obsolete technologies or environments. I am talking about overcoming pointless bureaucracy or managers who have no idea about code quality.

Defiance Cafe
I know it is hard. Years of arguments with colleagues, fighting superiors, struggling to grow and working broken processes have taken their toll. Sometimes I feel old. But I have to defy them. Let us continue pushing, trying to work harder and make an effort!

29 August 2018

Widespread Architectural Change Part 1

Last month I introduced Four Categories of Architectural Refactoring:
  • Substitute Architectural Decision
  • Refactor Architectural Structure
  • Widespread Architectural Change
  • Other Architectural Changes
Today I want to start discussing options to perform Widespread Architectural Changes. This category contains all kind of small changes repeated throughout the whole code base. The number of these changes is high, being hundreds or sometimes even thousands of occurrences that have to be changed in a similar way. Examples of such modifications are:
  • Migrating or upgrading (similar) APIs, frameworks or libraries
  • Changing or unifying coding conventions
  • Fixing compiler warnings and removing technical debt
  • Consistently applying or changing aspects like logging or security
  • Applying internationalization
  • Migrate between languages, e.g. SQL vs. HQL
Ways to do Widespread Architectural Changes (C) by SoftDevGang 2016All of them do not change the overall structure, they just work on the implementation. Often substituting architectural decisions or altering architectural structure contain similar changes.

Options
I have been using some techniques since 2004 because as Code Cop I value code consistency. Two years ago I had the opportunity to discuss the topic with Alexandru Bolboaca and other experienced developers during a small unconference and we came up with even more options, as shown in the picture on the right. The goal of this and the next articles is to introduce you to these options, the power of each one and the cost of using it. A word of warning: This is a raw list. I have used some but not all of them and I have yet to explore many options in more detail.

Supporting Manual Changes With Fast Navigation
The main challenge of widespread changes is the high number of occurrences. If the change itself is small, finding all occurrences and navigating to them is a major effort. Support for fast navigation would make things easier. The most basic form of this is to modify or delete something and see all resulting compile errors. Of course this only works with static languages. In Eclipse this works very well because Eclipse is compiling the code all the time and you get red markers just in time. Often it is possible to create a list of all lines that need to be changed. In the past have created custom rules of static analysis tools like PMD to find all places I needed to change. Such a list can be used to open the file and jump to the proper line, one source file after another.

As example, here is a Ruby script that converts pmd.xml which is contains the PMD violations from the Apache Maven PMD Plugin to Java stack traces suitable for Eclipse.
#! ruby
require 'rexml/document'

xml_doc = REXML::Document.new(File.new('./target/site/pmd.xml'))
xml_doc.elements.each('pmd/file/violation') do |violation|
  if violation.attributes['class']
    class_name = violation.attributes['package'] + '.' +
                 violation.attributes['class']
    short_name = violation.attributes['class'].sub(/\$.*$/, '')
    line_number = violation.attributes['beginline']
    rule = violation.attributes['rule']

    puts "#{class_name}.m(#{short_name}.java:#{line_number}) #{rule}"
  end
end
The script uses an XML parser to extract class names and line numbers from the violation report. After pasting the output into Eclipse's Stacktrace Console, you can click each line one by one and Eclipse opens each file in the editor with the cursor in the proper line. This is pretty neat. Another way is to script Vim to open each file and navigate to the proper line using vi +<line number> <file name>. Many editors support similar navigation with the pattern <file name>:<line number>.

Enabling fast navigation is a big help. The power this option is high (that is 4 out of 5 on my personal, totally subjective scale) and the effort to find the needed lines and script them might be medium.

Search and Replace (Across File System)
The most straight forward way to change similar code is by Search and Replace. Many tools offer to search and replace across the whole project, allowing to change many files at once. Even converting basic scenarios, which sometimes cover up to 80%, helps a lot. Unfortunately basic search is very limited. I rate its power low and the effort to use it is also low.

Scripted Search and Replace using Regular Expressions
Now Regular Expressions are much more powerful than basic search. Many editors allow Regular Expressions in Search and Replace. I recommend creating a little script. The traditional approach would be Bash with sed and awk but I have used Ruby and Python (or even Perl) to automate lots of different changes. While the script adds extra work to traverse all the source directories and files, the extra flexibility is needed for conditional logic, e.g. adding an import to a new class if it was not imported before. Also in a script Regular Expressions can be nested, i.e. analysing the match of an expression further in a second step. This helps to keep the expressions simple.

For example I used a script to migrate Java's clone() methods from version 1.4 to 5. Java 5 offers covariant return types which can be used for clone(), removing the cast from client code. The following Ruby snippet is called with the name of the class and the full Java source as string:
shortName = shortClassNameFromClassName(className)
if source =~ / Object clone\(\)/
  # use covariant return type for method signature
  source = $` + " #{shortName} clone()" + $'
end
if source =~ /return super\.clone\(\);/
  # add cast to make code compile again
  source = $` + "return (#{shortName}) super.clone();" + $'
end
In the code base where I applied this widespread change, it fixed 90% of all occurrences as clone methods did not do anything else. It also created some broken code which I reverted. I always review automated changes, even large numbers, as jumping from diff to diff is pretty fast with modern tooling. Scripts using Regular Expressions helped me a lot in the past and I rate their power to high. Creating them is some effort, e.g. a medium amount of work.

Macros and Scripts
Many tools like Vim, Emacs, Visual Studio, Notepad++ and IntelliJ IDEA allow creation or recording Keyboard and or mouse macros or Application scripts. With them, frequently used or repetitive sequences of keystrokes and mouse movements can be automated, and that is exactly what we want to do. When using Macros the approach is the opposite as for navigation markers: We find the place of the needed change manually and let the Macro do its magic.

Most people I have talked to know Macros and used them earlier (e.g. 20 years ago) but not recently in modern IDEs. Alex has used Vim Scripts to automate repetitive coding tasks. I have not used it and relating to his experience. Here is a Vim script function which extracts a variable, taken from Gary Bernhardt's dotfiles:
function! ExtractVariable()
  let name = input("Variable name: ")
  if name == ''
    return
  endif

  " Enter visual mode
  normal! gv
  " Replace selected text with the variable name
  exec "normal c" . name
  " Define the variable on the line above
  exec "normal! O" . name . " = "
  " Paste the original selected text to be the variable value
  normal! $p
endfunction
I guess large macros will not be very readable and hard to change but they will be able to do everything what Vim can do, which is everything. ;-) They are powerful and easy to create - as soon as you know Vim script of course.

Structural Search and Replace
IntelliJ IDEA offers Structural Search and Replace which performs search and replace across the whole project, taking advantage of IntelliJ IDEA's awareness of the syntax and code structure of the supported languages.. In other words it is Search and Replace on the Abstract Syntax Tree (AST). Additionally it is possible to apply semantic conditions to the search, for example locate the symbols that are read or written to. It is available for Java and C# (Resharper) and probably in other JetBrains products as well. I have not used it. People who use it tell me that it is useful and easy to use - or maybe not that easy to use. Some people say it is too complicated and they can not make it work.

Online help says that you can apply constraints described as Groovy scripts and make use of IntelliJ IDEA PSI (Program Structure Interface) API for the used programming language. As PSI is the IntelliJ version of the AST, this approach is very powerful but you need to work the PSI/AST which is (by its nature) complicated. This requires a higher effort.

To be continued
And there are many more options to be explored, e.g. scripting code changes inside IDEs or using Refactoring APIs as well as advanced tools outside of IDEs. I will continue my list in the next part. Stay tuned.

15 August 2018

Creating your own NATstyle rules

Last month I showed how to use NATstyle from the command line. NATstyle is the utility to define and check the coding standard of your NATURAL program. Today I want to explain how to customize and create your own rules beyond what is explained in the manual. I used NaturalONE 8.3.5.0.242 CE (November 2016). Likely there are more options and rules available in newer versions of NaturalONE and NATstyle.

Basic Configuration
NATstyle comes packaged inside NaturalONE, the Eclipse-based IDE for NATURAL. As expected NATstyle can be configured in Eclipse preferences. The configuration is saved as NATstyle.xml which is used when you run NATstyle from the right click popup menu. We will need to modify NATstyle.xml later, so let's have a look at it:
<?xml version="1.0" encoding="utf-8"?>
<naturalStyleCheck version="1.0"
                   xmlns="http://softwareag.com/natstyle/rules"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://softwareag.com/natstyle/rules checks.xsd">
  <checks type="source">
    <check class="CheckLineLength" name="Line length" severity="warning">
      <property name="max" value="72" />
      <property name="exclude" value="D3" />
    </check>
    <!-- more checks of type source -->
  </checks>
  <!-- more checks of other types -->
</naturalStyleCheck>
(In the source, the default configuration file is res/​checks.xml together with its XML schema res/​checks.xsd.)

Existing Rules
The existing rules are described in NaturalONE help topic Overview of NATstyle Rules, Error Messages and Solutions. Version 8.3 has 42 rules. These are only a few compared to PMD or SonarQube, which has more than 1000 rules available for Java. Here are some examples what NATstyle can do:
  • Source Checks: e.g. limit line length, find tab characters, find empty lines, limit the number of source lines and check a regular expressions for single source lines or whole source file.
  • Source Header Checks: e.g. force header or check file naming convention.
  • Parser Checks: e.g. find unused local variables, warn if local variable shadows view, find TODO comments, calculate Cyclomatic and NPath complexity, force NATdoc (documentation) tags and check function, subroutine and class names against regular expressions.
  • Error (Message File) Checks: e.g. check error messages file name.
  • Resource (File) Checks: e.g. check resource file name.
  • Library (Folder) Checks: e.g. library folder conventions, find special folders, force group folders and warn on missing NATdoc library documentation.
Same rule multiple times configured differently
Some rules like Source/Regular expression for single source lines only allow a single regular expression to be configured. Using alternation, e.g. a|b|c, in the expression is a way to overcome that, but the expression gets complicated quickly. Another way is to duplicate the <check> element in the NATstyle.xml configuration. Assume we do not only forbid PRINT statements, we also do not allow reduction to zero. (These rules do not make any sense, they are just here to explain the idea.) The relevant part of NATstyle.xml looks like
<checks type="source">
  <check class="CheckRegExLine"
         name="Regular expression for single source lines" ... >
    <property name="regex" value="PRINT '.*" />
  </check>
  <check class="CheckRegExLine"
         name="Regular expression for single source lines" ... >
    <property name="regex" value="REDUCE .* TO 0" />
  </check>
</checks>
While it is impossible to configure these rules in the NaturalONE preferences, it might be possible to run NATstyle with these modified settings. I did not verify that. I execute NATstyle from the command line passing in the configuration file name using the -c flag. (In the source, see the full configuration res/​sameRuleTwice.xml and the script to to run the rules bin/​natstyle_sameRuleTwice.bat from the command line.)

rock piles of different sizeDefining your own rules
There is no documented way to create new rules for NATstyle. All rules' classes are defined inside the NATstyle plugin. The configuration XML contains a class attribute, which is a short name, e.g. CheckRegExLine. Its implementation is located in the package com.​softwareag.​naturalone.​natural.​natstyle.​check.​src.​source where source is the group of the rules defined in the type attribute of the <checks> element. I experimented a lot and did not find a way to load rules from other packages than com.​softwareag.​naturalone.​natural.​natstyle. All rules must be defined inside this name space, which is possible.

Source Rules
While I cannot see the actual code of NATstyle rules, Java classes expose their public methods and parent class. I did see the names of the rule classes in the configuration and guessed and experimented with the API a lot. My experience with other static analysis tools, e.g. PMD and Pylint and the good method names of NATstyle code helped me doing so. A basic Source rule looks like that:
package com.softwareag.naturalone.natural.natstyle.check.src.source; // 1.

import com.softwareag.naturalone.natural.natstyle.NATstyleCheckerSourceImpl;
// other imports ...

public class FindFooSourceRule
  extends NATstyleCheckerSourceImpl { // 2.

  private Matcher name;

  @Override
  public void initParameterList() {
    name = Pattern.compile("FOO").matcher(""); // 3.
  }

  @Override
  public String run() { // 4.
    StringBuffer xmlOutput = new StringBuffer();

    String[] lines = this.getSourcelines(); // 5.
    for (int line = 0; line < lines.length; i++) {
      name.reset(lines[line]);
      if (name.find()) {
        setError(xmlOutput, line, "Message"); // 6.
      }
    }

    return xmlOutput.toString(); // 7.
  }
}
The marked lines are important:
  1. Because it is a Source rule, it must be in exactly this package - see the paragraph above.
  2. Source rules extend NATstyleCheckerSourceImpl which provides the lines of the NATURAL source file - see line 6. It has more methods, which have reasonable names, use the code completion.
  3. You initialise parameters in initParameterList. I did not figure out how to make the rules configurable from the XML configuration, which will probably happen in here, too.
  4. The run method is executed for each NATURAL file.
  5. NATstyleCheckerSourceImpl provides the lines of the file in getSourcelines. You can iterate the lines and check them.
  6. If there is a problem, call setError. Now setError is a bit weird, because it writes an XML element for the violation report XML (e.g. NATstyleResult.xml) into a StringBuffer.
  7. In the end the return the XML String of all found violations.
Finally the rule is configured with
<checks type="source">
  <check class="FindFooSourceRule"
         name="Find FOO"
         severity="warning" />
</checks>
(In the example repository, there is a working Source rule src/​com/​softwareag/​naturalone/​natural/​natstyle/​check/​src/​source/​FindInv02.java together with its configuration res/​customSource.xml.)

Parser Rules
Now it is getting more interesting. There are 18 rules of this type, which is a good start, but we need moar! Parser rules look similar to Source rules:
package com.softwareag.naturalone.natural.natstyle.check.src.parser; // 1.

import com.softwareag.naturalone.natural.natstyle.NATstyleCheckerParserImpl;
// other imports ...

public class SomeParserRule
  extends NATstyleCheckerParserImpl { // 2.

  @Override
  public void initParameterList() {
  }

  @Override
  public String run() {
    StringBuffer xmlOutput = new StringBuffer();

    // create visitor
    getNaturalParser().getNaturalASTRoot().accept(visitor); // 3.
    // collect errors from visitor into xmlOutput

    return xmlOutput.toString();
  }
}
where
  1. Like Source rules, Parser rules must be defined under the package ...natstyle.​check.​src.​parser.
  2. Parser rules extend NATstyleCheckerParserImpl.
  3. The NATURAL parser traverses the AST of the NATURAL code. Similar to other tools, NATstyle uses a visitor, the INaturalASTVisitor. The visitor is called for each node in the AST tree. This is similar to PMD.
Using the Parser
Tree, MuthillThe visitor must implement INaturalASTVisitor in package com.​softwareag.​naturalone.​natural.​parser.​ast.​internal. This interface defines 48 visit methods for the different sub types of INaturalASTNode, e.g. array indices, comments, operands, system function references like LOOP or TRIM, and so on. Still there are never enough node types as the AST does not convey much information about the code, most statements end up as INaturalASTTokenNode. For example the NATURAL lines
* print with leading blanks
PRINT 3X 'Hello'
which are a line comment and a print statement, result in the AST snippet
+ TOKEN: * print with leading blanks
+ TOKEN: PRINT
+ TOKEN: 3X
+ OPERAND
  + SIMPLE_CONSTANT_REFERENCE
    + TOKEN: 'Hello'
Now PRINT is a statement and could be recognised as one and 'Hello' is a string. This makes defining custom rules possible but pretty hard. To help me understand the AST I created a visitor which dumps the tree as XML file, similar to PMD's designer: src/​com/​softwareag/​naturalone/​natural/​natstyle/​check/​src/​parser/​DumpAstAsXml.java.

Conclusion
With this information you should be able to get started defining your own NATstyle rules. There is always so much more we could and should check automatically.

13 August 2018

This is the last time

It is August, it is hot and it is time to do some alternate work:

Bare Brickwork
Quick Fix
Looking at the picture made me think. Every industry seems to have its quick fixes: Engineers use duct tape and WD-40. In software development, we take short cuts by violating existing structure, e.g. adding another conditional resulting in convoluted logic. And in the construction industry it clearly is the usage of polyurethane or spray foam. As every tool it has its use, and like in the picture it is especially useful for temporary work. Unfortunately most professional construction workers I commissioned wanted to use it for everything - like duct tape: A hole in the wall? Why use brick and mortar when it is faster to put some foam into it. A gap in the boards? Why care to work accurately when it is easier to close the gap with foam afterwards. Not enough steam brake to cover the ceiling? Let's fill the remaining area with foam. You get the idea. While some people like the speed such quick fixes provide, e.g. Joel Spolsky praises the Duct Tape Programmer, I do not agree. Anyway, this project must come to an end and I sincerely hope it is the last time. ;-)