Code Cop: architecture

Showing posts with label architecture. Show all posts

7 January 2020

Visualising Architecture: GraphML Charting Module Dependencies

When working with existing code, analysing module dependencies helps me understand the bigger picture. For well defined modules like Eclipse OSGi or Maven there are tools to show all modules and their inter dependencies, e.g. the Eclipse Dependency Visualization tool or the Maven Module Dependency Graph Creator. For legacy technologies like COBOL and NATURAL - and probably some more platforms - we struggle due to the lack of such tools. Then the best approach is to extract the necessary information and put it into a modern, well known format. I wrote about such an approach in the past, e.g. loading references from a CSV into Neo4j for easy navigation and visualization. Today I want to sketch another idea I have used since many years: Using GraphML to chart module dependencies. For example, the following graph shows the (filtered) modules of a NATURAL application together with their dependencies in an organic layout.

Some modules of a NATURAL application in an organic layout, 2017

Some modules of a NATURAL application in an organic layout, 2017

Process
GraphML is an XML-based file format for graphs. [...] It uses an XML-based syntax and supports the entire range of possible graph structure constellations including directed, undirected, mixed graphs, hyper graphs, and application-specific attributes. That should be more than enough to visualize any module dependencies. In fact I just need nodes and edges. My process to use GraphML is always the same:

Start with defining or extracting the modules and their dependencies.
Create a little script which loads the extracted data and creates a raw GraphML XML file.
Use an existing graph editor to tweak and layout the diagram.
Export the the diagram as PDF or as image.

Get the raw data
Extracting the module information is highly programming language specific. For Java I used my JavaClass class file parser. For NATURAL I used a CSV of all references which was provided by my client. Usually regular expressions work well to parse import, using or include statements. For example to analyse plain C, scan for #include statements (as done in Schmidrules for Application Architecture. Unfortunately there is not much documentation available about Schmidrules, which I will fix in the future.)

Create the XML
The next task is to create the XML. I use a little Ruby script to serializes a graph of nodes into GraphML. The used Node class needs a name and a list of dependencies, which are Nodes again. save stores the graph as GraphML.

require 'rexml/document'

class GraphmlSerializer
  include REXML

  # Save the _graph_ to _filename_ .
  def save(filename, graph)
    File.open(filename + '.graphml', 'w') do |f|
      doc = graph_to_xml(graph)
      doc.write(out_string = '<?xml version="1.0" encoding="UTF-8" standalone="no"?>')
      f.print out_string
    end
  end

  # Return an XML document of the GraphML serialized _graph_ .
  def graph_to_xml(graph)
    doc = create_xml_doc
    container = add_graph_element(doc)
    graph.to_a.each { |node| add_node_as_xml(container, node) }
    doc
  end

  private

  def create_xml_doc
    REXML::Document.new
  end

  def add_graph_element(doc)
    root = doc.add_element('graphml',
      'xmlns' => 'http://graphml.graphdrawing.org/xmlns',
      'xmlns:y' => 'http://www.yworks.com/xml/graphml',
      'xmlns:xsi' => 'http://www.w3.org/2001/XMLSchema-instance',
      'xsi:schemaLocation' => 'http://graphml.graphdrawing.org/xmlns http://www.yworks.com/xml/schema/graphml/1.1/ygraphml.xsd')
    root.add_element('key', 'id' => 'n1',
                     'for' => 'node',
                     'yfiles.type' => 'nodegraphics')
    root.add_element('key', 'id' => 'e1',
                     'for' => 'edge',
                     'yfiles.type' => 'edgegraphics')

    root.add_element('graph', 'edgedefault' => 'directed')
  end

  public

  # Add the _node_ as XML to the _container_ .
  def add_node_as_xml(container, node)
    add_node_element(container, node)

    node.dependencies.each do |dep|
      add_edge_element(container, node, dep)
    end
  end

  private

  def add_node_element(container, node)
    elem = container.add_element('node', 'id' => node.name)
    shape_node = elem.add_element('data', 'key' => 'n1').
      add_element('y:ShapeNode')
    shape_node.
      add_element('y:NodeLabel').
      add_text(node.to_s)
  end

  def add_edge_element(container, node, dep)
    edge = container.add_element('edge')
    edge.add_attribute('id', node.name + '.' + dep.name)
    edge.add_attribute('source', node.name)
    edge.add_attribute('target', dep.name)
  end

end

Layout the diagram
I do not try to layout the graph when creating it. Existing tools do a much better job. I use the yEd Graph Editor. yEd is a Java application. Download and uncompress the zipped archive. To get the final diagram

I load the GraphML file into the editor. (If the graph is huge, yEd needs more memory. yEd is just an executable Jar - Java Archive - it can get more heap on startup using java -XX:+UseG1GC -Xmx800m -jar ./yed-3.16.2.1/yed.jar.)
Then I select all nodes and apply menu commands Tools/Fit Node to Label. This is because the size of the nodes does not match the size of the node's names.
Finally I apply the menu commands Layout/Hierarchical or maybe Layout/Organic/Smart. In the end it needs some trial and error to find the proper layout.

Conclusion
This approach is very flexible. The nodes of the graph can be anything, a "module" can be a file, a folder or a bunch of folders. In the example shown above, the nodes where NATURAL modules (aka files), i.e. functions, subroutines, programs or includes. In another example shipped with JavaClass the nodes are components, i.e. source folders similar to Maven multi module projects. The diagram below shows the components of a product family of large Eclipse RCP applications with their dependencies in a hierarchical layout. Pretty crazy, isn't it. ;-)

Components of an Eclipse RCP application in hierarchical layout, 2012

Components of an Eclipse RCP application in hierarchical layout, 2012

30 August 2019

Visualising Architecture: Neo4j vs. Module Dependencies

Last year I wrote about some work I did for a client using NATURAL (an application development and deployment environment using a proprietary language), for example using NATstyle, adding custom rules and creating custom reports. Today I will share some things I used to visualise the architecture. Usually I want to get the bigger picture of the architecture before I change it.

Dependencies are killing us
Let's start with some context: This is Banking with some serious legacy: Groups of "solutions" are bundled together as "domains". Each solution contains 5.000 to 10.000 modules (files), which are either top level applications (executable modules) or subroutines (internal modules). Some modules call code of other solutions. There are some system "libraries" which bundle commonly used modules similar to solutions. A recent cross check lists more than 160.000 calls crossing solution boundaries. Nobody knows which modules outside of one's own solution are calling in and changes are difficult because all APIs are potentially public. As usual - dependencies are killing us.

Graph Database
To get an idea what was going on, I wanted to visualize the dependencies. If the data could be converted and imported into standard tools, things would be easier. But there were way too many data points. I needed a database, a graph database, which should be able to deal with hundreds of thousand of nodes, i.e. the modules, and their edges, i.e. the directed dependencies (call or include).

Extract, Transform, Load (ETL)
While ETL is a concept from data warehousing, we exactly needed to "copy data from one or more sources into a destination system which represented the data differently from the source(s) or in a different context than the source(s)." The first step was to extract the relevant information, i.e. the call site modules, destination modules together with more architectural information like "solution" and "domain". I got this data as CSV from a system administrator. The data needed to be transformed into a format which could be easily loaded. Use your favourite scripting language or some sed&awk-fu. I used a little Ruby script,

ZipFile.new("CrossReference.zip").
  read('CrossReference.csv').
  split(/\n/).
  map { |line| line.chomp }.
  map { |csv_line| csv_line.split(/;\s*/, 9) }.
  map { |values| values[0..7] }. # drop irrelevant columns
  map { |values| values.join(',') }. # use default field terminator ,
  each { |line| puts line }

to uncompress the file, drop irrelevant data and replace the column separator.

Loading into Neo4j
Neo4j is a well known Graph Platform with a large community. I had never used it and this was the perfect excuse to start playing with it ;-) It took me around three hours to understand the basics and load the data into a prototype. It was easier than I thought. I followed Neo4j's Tutorial on Importing Relational Data. With some warning: I had no idea how to use Neo4j. Likely I used it wrongly and this is not a good example.

CREATE CONSTRAINT ON (m:Module) ASSERT m.name IS UNIQUE;
CREATE INDEX ON :Module(solution);
CREATE INDEX ON :Module(domain);

// left column
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///references.csv" AS row
MERGE (ms:Module {name:row.source_module})
ON CREATE SET ms.domain = row.source_domain, ms.solution = row.source_solution

// right column
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///references.csv" AS row
MERGE (mt:Module {name:row.target_module})
ON CREATE SET mt.domain = row.target_domain, mt.solution = row.target_solution

// relation
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///references.csv" AS row
MATCH (ms:Module {name:row.source_module})
MATCH (mt:Module {name:row.target_module})
MERGE (ms)-[r:CALLS]->(mt)
ON CREATE SET r.count = toInt(1);

This was my Cypher script. Cypher is Neo4j's declarative query language used for querying and updating of the graph. I loaded the list of references and created all source modules and then all target modules. Then I loaded the references again adding the relation between source and target modules. Sure, loading the huge CSV three times was wasteful, but it did the job. Remember, I had no idea what I was doing ;-)

Querying and Visualising
Cypher is a query language. For example, which modules had most cross solution dependencies? The query MATCH (mf:Module)-[c:CALLS]->() RETURN mf.name, count(distinct c) as outgoing ORDER BY outgoing DESC LIMIT 25 returned

YYI19N00 36
YXI19N00 36
YRWBAN01 34
XGHLEP10 34
YWI19N00 32
XCNBMK40 31

and so on. (By the way, I just loved the names. Modules in NATURAL can only be named using eight characters. Such fun, isn't it.) Now it got interesting. Usually visualisation is a main issue, with Neo4j it was a no-brainer. The Neo4J Browser comes out of the box, runs Cypher queries and displays the results neatly. For example, here are the 36 (external) dependencies of module YYI19N00:

Called by module YYI19N00

As I said, the whole thing was a prototype. It got me started. For an in-depth analysis I would need to traverse the graph interactively or scripted like a Jupyter notebook. In addition, there are several visual tools for Neo4J to make sense of and see how the data is connected - exactly what I would want to know about the dependencies.

11 November 2018

Widespread Architectural Changes using IDEs

This is the second part of my list of tools to automate Widespread Architectural Changes. Now Widespread Architectural Change is a category of architectural refactoring which do not change the overall structure but need to be applied in many places consistently across a whole code base. For example, consider changing or unifying coding conventions or fixing violations. The main challenge of these changes is the high number of occurrences.

The goal of this article is to introduce you to ways to automate such changes. I have been using some of these techniques because as Code Cop I value code consistency, still this is a raw list. I have only used a few of the mentioned tools and need to explore many in more detail.

In the first part I listed some basic options, e.g. how to support manual changes with fast navigation, search and replace across the whole file system, use scripted search and replace using regular expressions, Macros and finally JetBrains' Structural Search and Replace. This time I focus on options available for modern IDEs. Tools like Eclipse or IntelliJ use an internal representation of the code, the Abstract Syntax Tree (AST). Structural Search and Replace works the AST and therefore belongs into this group as well.

Rescripter (Eclipse)
Modern IDEs support many Refactoring steps - why not use the existing tools to make our job easier? One of such tools is Rescripter, built for making large-scale changes that you can describe easily but are laborious to do by hand. Rescripter is an Eclipse plugin which runs JavaScript in the context of the Eclipse JDT. David Green added some helper objects to make searching, parsing and modifying Java code easier and wrapped a thin JavaScript layer around it. Rescripter scripts are working the AST.

Here are two examples from Rescripter's documentation: Finding a method named getName inside the class Person:

var type = Find.typeByName("Person");
var method = Find.methodByName(type, "getName").getElementName();

and adding a method getJobTitle to that type:

var edit = new SourceChange(type.getCompilationUnit());
edit.addEdit(ChangeType.addMethod(type, "\n\tpublic String getJobTitle() {\n" +
                                        "\t\treturn this.jobTitle;\n" +
                                        "\t}"));
edit.apply();

Rescripter is an old project and has not been updated since 2011. While it has some demo code and documentation, it is a bit raw. I guess it is not compatible with newer versions of Eclipse. Still - like Structural Search and Replace - it is very powerful. If you have a large code base and a repetitive change that can be expressed in terms of the AST, the high effort to get into the tool and create a script will pay off. (A useful plugin to help you working the AST is the AST View. The AST View is part of the JDT but not installed out of the box. It is visualising the AST of a Java file open in the editor.)

Jackpot (NetBeans)
Another, very interesting tool in this area is the Jackpot DSL for NetBeans used in the IDE actions Inspect and Transform aka Inspect and Refactor. Jackpot is a NetBeans IDE module for modelling, querying and transforming Java source files. In the Jackpot context, a program transformation is a script or Java class which queries sources in Java projects for patterns, changes them, and then writes these changes back to the original source files. [...] Jackpot transformations tend to be applied globally, such as at a project level or across several projects. Win! Unfortunately there is very little information about it. It took me hours just to find a little bit of information about it:

Dustin Marx describes how to create NetBeans 7.1 custom hints. He agrees that the most difficult aspect of using NetBeans's custom hints is finding documentation on how to use them. The best sources currently available appear to be the NetBeans 7.1 Release Notes, several Wielenga posts (Custom Declarative Hints in NetBeans IDE 7.1, Oh No Vector!, Oh No @Override!), and Jan Lahoda's jackpot30 Rules Language (covers the rules language syntax used by the custom inspections/hints). The Refactoring with Inspect and Transform in the NetBeans IDE Java Editor tutorial also includes a section on managing custom hints. All listed documentation is from 2011/2012. The most recent one I found is a short Jackpot demo from JavaOne Brazil 2016.

NetBeans hints are warnings that have a quick fix. For example the hint "Can Use Diamond" finds places where the diamond operator of Java 7 can be used instead of explicit type parameters. When the offered action is taken, the code is migrated. In the Inspect and Transform dialogue, the inspections can be managed. Custom hints are stored in .hint files. For example, Dustin Marx' hint to remove extraneous calls to System.currentTimeMillis() from new java.util.Date constructor is written as

<!description="Unnecessary use of System.currentTimeMillis on new Date">

new java.util.Date(java.lang.System.currentTimeMillis())
=> new java.util.Date()
;;

The Java Declarative Hints Format allows matching on variables, modifiers and statements. Fixes can be applied conditionally too.

I do not know if Jackpot hints can be applied across a whole code base at once. As they are inspections, I expect them to be displayed for the whole project - making them fast navigation markers at least. Anyway this is very exciting. It is so exciting that some people wanted to port it to Eclipse (but they never did).

Using Specific Refactoring Plugins
There are a few specific plugins that combine various Refactoring. Eclipse's Source Clean Up Action deserves a honorary mention: It fixes basic warnings on a whole code base, but is very limited. An interesting plugin for Eclipse is Autorefactor which aims to fix language/API usage to deliver smaller, more maintainable and more expressive code bases. Spartan Refactoring is another Eclipse plugin that performs automatic refactoring of Java source code, making it shorter, more idiomatic and more readable.

All these plugins change a predefined, limited set of code patterns, mainly focusing on technical debt. Maybe someone implemented a refactoring plugin for part of the widespread change you need to perform. Search the market places first.

Using Refactoring APIs of IDEs
A possible way is to write your own refactoring plugin. Maybe start with code of a refactoring plugin listed above. Both Eclipse and IntelliJ IDEA offer APIs to manipulating Java code, i.e. the JDT and PSI APIs. I have not done that because it seems too much effort for one time migrations and widespread changes. And reusing the available refactoring tools might raise some problems like waiting for user input which is problematic.

Using (APIs of) Refactoring Browsers
The Refactoring Browser was the first tool that automated Refactoring for Smalltalk. It set the standard for all modern Refactoring tools. Today we have Refactoring Browsers for many languages, e.g. CScout - the C Refactoring Browser, Ruby Refactoring Browser for Emacs or PHP Refactoring Browser which is controlled via the command-line and has plugins for Vim and Emacs. Stand-alone Refactoring tools should be easier to use and script than full blown IDEs, especially if they were designed for different plugins. The idea would be to create code which controls Refactoring Browsers to apply certain changes - again and again.

Python Refactoring Libraries
While searching for Refactoring Browsers I just learnt that there are at least three stand-alone Python Refactoring libraries: Rope comes with Vim and Emacs plugins and is also used in VS Code. The second is Bicycle Repair Man. Further there is Bowler which supposedly is better suited for use from the command-line and encourages scripting. All these libraries are rich in features and used by Vim users. (Yeah, Vim is still going strong.)

Rope (Python)
For example, Rope can be used as a library, which allows custom refactoring steps. Even more, it offers something similar to Jackpot's hints, Restructurings. For example, we split a method f(self, p1, p2) of a class mod.A into f1(self, p1) and f2(self, p2). The following Restructuring updates all call sites:

pattern: ${inst}.f(${p1}, ${p2})
goal:
 ${inst}.f1(${p1})
 ${inst}.f2(${p2})

args:
 inst: type=mod.A

The code to perform the Restructuring using Rope as a library is

from rope.base.project import Project
from rope.refactor import restructure

project = Project('.')

pattern = '${inst}.f(${p1}, ${p2})'
goal = '...'
args = '...'

restructuring = restructure.Restructure(project, pattern, goal, args)

project.do(restructuring.get_changes())

I never used that but it looks cool. The promise of scripted refactoring makes me excited. Another item to add to my to-research list. ;-) And there are more tools on my list, which will have to wait for part 3.

31 October 2018

More on Architectural Refactoring

Last month I showed my categorisation of Architectural Refactoring to some friends in the Software Crafting community. Explaining my ideas and getting feedback helped me to sharpen the definitions further. In my original article I defined four groups of Architectural Refactoring. Here is what we discussed about each one of them:

1. Substitute Architectural Decision
In this category I collected Refactoring techniques which replace whole parts of an architecture, e.g.

Replacing one relational database system with another
Replacing the relational database with a NoSQL one
Replacing UI technologies, e.g. moving from JSF to Vaadin

Adrian Bolboaca further split these categories into groups and used the following three names:

Kinds of Architectural Refactoring (C) SoftDevGang 2018

Kinds of Architectural Refactoring (C) SoftDevGang 2018

1.1. Tool Change
A Tool Change affects the implementation of a "tool", e.g. the database system. The first change in the previous list - replacing one relational database system with another - is a Tool Change. Upgrading libraries or migrating between similar frameworks is a Tool Change too. Initially I had listed them as Widespread Architectural Change. While upgrading or migrating libraries might require similar changes throughout the whole project, they are motivated by the Architectural Decision to use a certain version of a library.

1.2. Technology Change
A Technology Change modifies a used technology. Items two and three of the previous list are Technology Changes. In my experience, most Architectural Refactoring is of this type.

1.3. (Other) NFR Caused Change
Changing a tool or technology is never done for its own sake. We might upgrade a framework for the improved performance of its new version, or switch to another data store because it is more reliable or its license is more suited for the business. NFRs (Non Functional Requirements) are requirements that specify how a system is supposed to be in contrast to functional requirements which define what a system is supposed to do. NFRs usually contain long lists of attributes describing a system like performance, reliability and also security, traceability and globalisation. Examples of such changes are changing aspects like logging or security or applying internationalization. Again I had initially listed them as Widespread Architectural Change.

2. Refactor Architectural Structure
The second category deals with architectural building blocks like components, layers or packages. There are two causes for changing these:

RSAR Package Tangle found at an unnamed client

2.1. Cleaning up Legacy Code
When we start a new project, its architecture is clear. When the system becomes more complex, effects known as Software Architecture Erosion or Architectural Drifts happen. These effects can lead to systems where the realisation of the system diverges from the intended architecture with resulting negative impacts on quality attributes associated with the intended architecture. Uncle Bob uses other words: The architecture starts to rot like a piece of bad meat.

For example look at the package tangle on the right: Several teams were working on a shared code base under an aggressive schedule and produced an architecture where each package was depending on all others, resulting in the legendary Big Ball of Mud architecture. To clean up this mess components with clear interfaces had to be introduced.

2.2. Evolving an Architecture
Today we rather embrace evolving requirements than stick to a big up-front plan. Techniques like Evolutionary Design accommodate the need of changing the design while systems are growing. And also their architectures need to evolve, e.g. we need to extract a new component or aspect which became duplicated during addition of structure to support new features. With the rising popularity of Microservices and their focus on incremental change in the architecture, Evolutionary Architecture has become a reality.

3. Widespread Architectural Change
I am mostly interested in these changes because I want to automate them. I described some basic tools and some more powerful ones. Most examples of changes I used were motivated in the change of Architectural Decisions. Even coding conventions can be seen as part of the NFR Maintainability. There is definitely an overlap, as some architectural Refactoring and also a few structural ones, will be widespread. Adi drew me a Venn diagram to explain that overlap:

Overlap of Categories of Architectural Refactoring, by Adrian Bolboaca

Overlap of Categories of Architectural Refactoring, by Adrian Bolboaca

4. Other Architectural Changes
I am still uncertain if there are other kinds of architectural changes. I miss examples. If you have some, please come forward!

29 August 2018

Widespread Architectural Change Part 1

Last month I introduced Four Categories of Architectural Refactoring:

Substitute Architectural Decision
Refactor Architectural Structure
Widespread Architectural Change
Other Architectural Changes

Today I want to start discussing options to perform Widespread Architectural Changes. This category contains all kind of small changes repeated throughout the whole code base. The number of these changes is high, being hundreds or sometimes even thousands of occurrences that have to be changed in a similar way. Examples of such modifications are:

Migrating or upgrading (similar) APIs, frameworks or libraries
Changing or unifying coding conventions
Fixing compiler warnings and removing technical debt
Consistently applying or changing aspects like logging or security
Applying internationalization
Migrate between languages, e.g. SQL vs. HQL

Ways to do Widespread Architectural Changes (C) by SoftDevGang 2016

All of them do not change the overall structure, they just work on the implementation. Often substituting architectural decisions or altering architectural structure contain similar changes.

Options
I have been using some techniques since 2004 because as Code Cop I value code consistency. Two years ago I had the opportunity to discuss the topic with Alexandru Bolboaca and other experienced developers during a small unconference and we came up with even more options, as shown in the picture on the right. The goal of this and the next articles is to introduce you to these options, the power of each one and the cost of using it. A word of warning: This is a raw list. I have used some but not all of them and I have yet to explore many options in more detail.

Supporting Manual Changes With Fast Navigation
The main challenge of widespread changes is the high number of occurrences. If the change itself is small, finding all occurrences and navigating to them is a major effort. Support for fast navigation would make things easier. The most basic form of this is to modify or delete something and see all resulting compile errors. Of course this only works with static languages. In Eclipse this works very well because Eclipse is compiling the code all the time and you get red markers just in time. Often it is possible to create a list of all lines that need to be changed. In the past have created custom rules of static analysis tools like PMD to find all places I needed to change. Such a list can be used to open the file and jump to the proper line, one source file after another.

As example, here is a Ruby script that converts pmd.xml which is contains the PMD violations from the Apache Maven PMD Plugin to Java stack traces suitable for Eclipse.

#! ruby
require 'rexml/document'

xml_doc = REXML::Document.new(File.new('./target/site/pmd.xml'))
xml_doc.elements.each('pmd/file/violation') do |violation|
  if violation.attributes['class']
    class_name = violation.attributes['package'] + '.' +
                 violation.attributes['class']
    short_name = violation.attributes['class'].sub(/\$.*$/, '')
    line_number = violation.attributes['beginline']
    rule = violation.attributes['rule']

    puts "#{class_name}.m(#{short_name}.java:#{line_number}) #{rule}"
  end
end

The script uses an XML parser to extract class names and line numbers from the violation report. After pasting the output into Eclipse's Stacktrace Console, you can click each line one by one and Eclipse opens each file in the editor with the cursor in the proper line. This is pretty neat. Another way is to script Vim to open each file and navigate to the proper line using vi +<line number> <file name>. Many editors support similar navigation with the pattern <file name>:<line number>.

Enabling fast navigation is a big help. The power this option is high (that is 4 out of 5 on my personal, totally subjective scale) and the effort to find the needed lines and script them might be medium.

Search and Replace (Across File System)
The most straight forward way to change similar code is by Search and Replace. Many tools offer to search and replace across the whole project, allowing to change many files at once. Even converting basic scenarios, which sometimes cover up to 80%, helps a lot. Unfortunately basic search is very limited. I rate its power low and the effort to use it is also low.

Scripted Search and Replace using Regular Expressions
Now Regular Expressions are much more powerful than basic search. Many editors allow Regular Expressions in Search and Replace. I recommend creating a little script. The traditional approach would be Bash with sed and awk but I have used Ruby and Python (or even Perl) to automate lots of different changes. While the script adds extra work to traverse all the source directories and files, the extra flexibility is needed for conditional logic, e.g. adding an import to a new class if it was not imported before. Also in a script Regular Expressions can be nested, i.e. analysing the match of an expression further in a second step. This helps to keep the expressions simple.

For example I used a script to migrate Java's clone() methods from version 1.4 to 5. Java 5 offers covariant return types which can be used for clone(), removing the cast from client code. The following Ruby snippet is called with the name of the class and the full Java source as string:

shortName = shortClassNameFromClassName(className)
if source =~ / Object clone\(\)/
  # use covariant return type for method signature
  source = $` + " #{shortName} clone()" + $'
end
if source =~ /return super\.clone\(\);/
  # add cast to make code compile again
  source = $` + "return (#{shortName}) super.clone();" + $'
end

In the code base where I applied this widespread change, it fixed 90% of all occurrences as clone methods did not do anything else. It also created some broken code which I reverted. I always review automated changes, even large numbers, as jumping from diff to diff is pretty fast with modern tooling. Scripts using Regular Expressions helped me a lot in the past and I rate their power to high. Creating them is some effort, e.g. a medium amount of work.

Macros and Scripts
Many tools like Vim, Emacs, Visual Studio, Notepad++ and IntelliJ IDEA allow creation or recording Keyboard and or mouse macros or Application scripts. With them, frequently used or repetitive sequences of keystrokes and mouse movements can be automated, and that is exactly what we want to do. When using Macros the approach is the opposite as for navigation markers: We find the place of the needed change manually and let the Macro do its magic.

Most people I have talked to know Macros and used them earlier (e.g. 20 years ago) but not recently in modern IDEs. Alex has used Vim Scripts to automate repetitive coding tasks. I have not used it and relating to his experience. Here is a Vim script function which extracts a variable, taken from Gary Bernhardt's dotfiles:

function! ExtractVariable()
  let name = input("Variable name: ")
  if name == ''
    return
  endif

  " Enter visual mode
  normal! gv
  " Replace selected text with the variable name
  exec "normal c" . name
  " Define the variable on the line above
  exec "normal! O" . name . " = "
  " Paste the original selected text to be the variable value
  normal! $p
endfunction

I guess large macros will not be very readable and hard to change but they will be able to do everything what Vim can do, which is everything. ;-) They are powerful and easy to create - as soon as you know Vim script of course.

Structural Search and Replace
IntelliJ IDEA offers Structural Search and Replace which performs search and replace across the whole project, taking advantage of IntelliJ IDEA's awareness of the syntax and code structure of the supported languages.. In other words it is Search and Replace on the Abstract Syntax Tree (AST). Additionally it is possible to apply semantic conditions to the search, for example locate the symbols that are read or written to. It is available for Java and C# (Resharper) and probably in other JetBrains products as well. I have not used it. People who use it tell me that it is useful and easy to use - or maybe not that easy to use. Some people say it is too complicated and they can not make it work.

Online help says that you can apply constraints described as Groovy scripts and make use of IntelliJ IDEA PSI (Program Structure Interface) API for the used programming language. As PSI is the IntelliJ version of the AST, this approach is very powerful but you need to work the PSI/AST which is (by its nature) complicated. This requires a higher effort.

To be continued
And there are many more options to be explored, e.g. scripting code changes inside IDEs or using Refactoring APIs as well as advanced tools outside of IDEs. I will continue my list in the next part. Stay tuned.

13 July 2018

Categories of Architectural Refactoring

At the end of one of my refactoring workshops, we started discussing how to refactor
larger structures, sometimes even changing the architecture of a system. During that discussion we came up with four different categories of larger refactoring which use different techniques and serve different goals:

Replace whole parts
Change the structure of components
Similar change all over the place
Other changes

We struggled to find good names, so let me explain the different things a bit. When I talk about architecture, I mean Software Architecture of a single application, i.e. 1. the different software components and 2. their (inter-) dependencies making up the application. Components are sub-systems and usually contain one or more namespaces or packages. While we can discuss forever what exactly software architecture is and is not, I like the definition of IEEE which in addition to elements (1.) and relationships (2.) adds principles of its design and evolution. I describe these principles as 3. design guidelines which include smaller recommended usage patterns and coding conventions.

Martin Fowler and Neal Ford say that architecture is the stuff that's hard to change later and replacing parts of the architecture involves a lot of work: Imagine (the highly theoretical example of) changing the programming language a system is written in or (a more common example of) moving from a monolith to a Microservice based architecture. These two examples already fall into different categories.

Replacing whole parts of the architecture
Massive changes like changing the UI technology or targeting another platform are possible but a lot of code has to be rewritten. Substitute Algorithm is a classic Fowler refactoring, and so is Substitute Architectural Decision. (This is my third try to name these category of changes.) Architectural Decisions are made to support non-functional requirements like performance, security, connectivity etc and when revisiting them later - with more domain knowledge - better alternatives might come up. Real examples of changing Architectural Decisions include:

Replacing one relational database system with another
Replacing your relational database with a NoSQL one
Replacing UI technologies, e.g. moving from JSF to Vaadin
Changing API technologies, e.g. moving from SOAP to REST
Switching from a fat client to a client-server architecture

Depending on used frameworks and abstractions, some of these changes will be easy, others will not. Even when the necessary change is isolated, it likely touches more than one architectural component. For example moving to another data store might need changing all data access objects in all components which access the data store. The traditional approach is to change everything at once ("Big Leap") but it is possible to refactor the architecture in a safe way, using small steps. In Responsive Design (QCon 2009) Kent Beck describes several strategies to perform such changes:

Leap: Make the new design and use it immediately. (Only if the change is not too big.)
Parallel Change: Make the new design and run both new and old simultaneously.
Stepping Stone: Create a platform to bring the desired new design within reach.
Simplification: Only implement part of the design now and add complexity bit-by-bit later.
Hack it: Just do it and pay price later. (This is maybe not a real strategy ;-)

Changing the structure of architectural components
The next category is similar to Substitute Architectural Decision but is focused on the architectural building blocks - the components. Some examples are:

Introduce or extract new component, module or package
Remove or invert dependencies between components, e.g. remove cyclic dependencies
Introduce layering, add a new layer, clean up layering violations
Weaken potential trouble spots in the architecture, e.g. Butterfly, Breakable, Hub and Tangle

This category of changes is the "classic" Architectural Refactoring because that is how people call it. For example - in accordance to Martin Fowler's definition - Dave Adsit defines Architectural Refactoring as intentionally changing the structure of a system without altering its features. Similarly Sean Barow explains Architectural Refactoring as the deliberate process to remove architectural smells while not changing the scope or functionality of the code. So it is always structural and deals with components (1.), their relationships (2.) and their inner structure. Let's call it Refactor Architectural Structure to distinguish it from other architectural refactoring.

Sean Barow uses the term Code Refactoring for the refactoring defined by Fowler. Code refactoring focuses on software entities like classes, methods, fields and variables. Architectural refactoring involves code refactoring, which leads to confusion between the two. Working with dependencies needs extracting or moving classes, methods or fields. We can work each class or dependency at once, which supports an incremental approach.

Performing similar changes throughout the whole project
This is another category and it feels like Shotgun Surgery: A usually small change has to be repeated all over the place. While Shotgun Surgery is a code smell for feature logic, it is unavoidable if not required for consistent structure and usage defined by the architecture. For example let's assume we want to replace EasyMock with Mockito. Both libraries serve the same purpose and are used in a similar way, but their APIs are different. (Facing such a problem indicates that we are coupled to much to a used library - maybe we should have wrapped it, maybe not. But that discussion is outside the scope of this article.) Converting a single usage of EasyMock to Mockito is straight forward: We change the method invocations and execute the calls earlier or later in the test. The only problem is that we have hundreds of such cases. (Update 31st October 2018: Some people have pointed out that build tools and testing frameworks are not considered part of the architecture. In that case a migration between mocking libraries is misleading and I should have used another example. Still the idea is the same.) More examples of similar "Shotgun" changes are:

Upgrading a major version of a library with breaking API changes
Migrating between similar frameworks or libraries
Changing or unifying coding conventions and fixing violations
Consistently applying or changing aspects like logging or security
Applying internationalization

These changes do not change the overall structure, they just work on the implementation. They are similar to Refactor Architectural Structure, just deal with design guidelines, coding conventions and smaller usage patterns - item 3 from the definition of Software Architecture in the beginning. They are code refactoring applied in many places consistently across a whole code base.

These category is important, because some steps of both Substitute Decision and Refactor Structure need similar changes. The main challenge of these changes is the high number of occurrences. Changing conventions and aspects is a piece by piece work and can be done manually. Nevertheless some automation tool or script would help a lot. Even converting only basic scenarios, which often cover up to 80%, is a big win. Unfortunately some changes, e.g. migrating libraries, need to be done all at once. In such cases a Stepping Stone approach has worked well for me in the past: First wrapping the library piece by piece - incrementally - and then changing the wrapper and all invocations at once. Creating the wrapper is a lot of work. Again, automated migration removes the need for a wrapper. There are some options available, which I will cover in a future article.

I have not found a good name for this category. I used to call it Large Scale Refactoring, but I am not happy with that name. It is a similar or identical code refactoring applied to many places widespread throughout the code. It is a Widespread Architectural Change.

Other Architectural Changes
During our initial discussion, we also considered other refactoring. To be on the safe side, always allow a category of unknown things ;-). At the moment I have no real examples for them. These are smaller changes neither related to architectural decisions or structure nor are they widespread. I believe we find no examples because these are plain code changes and we do not consider them architectural even if they are related to the architecture of the applications in a way.

Summary
Next to the code we also can and should refactor the architecture of our applications. Architectural changes are expensive, still sooner or later things need to change if the application is in service just long enough. There are at least four categories of architectural refactoring:

Substitute Architectural Decision
Refactor Architectural Structure
Widespread Architectural Change
Other Architectural Changes

After discussing the topic with some friends, I wrote a followup with clarifications. I also used some tools to automate Widespread Architectural Changes.

2 October 2014

Visualising Architecture: Maven Dependency Graph

Last year, during my CodeCopTour, I pair programmed with Alexander Dinauer. During a break he mentioned that he just created a Maven Module Dependency Graph Creator. How awesome is that. The Graph Creator is a Gradle project that allows you to visually analyse the dependencies of a multi-module Maven project. First it sounds a bit weird - a Gradle project to analyse Maven modules. It generates a graph in Graphviz DOT language and an html report using Viz.js. You can use the dot tool which comes with Graphviz to transform the resulting graph file into JPG, SVG etc.

In a regular project there is not much use for the graph creator tool, but in super large projects there definitely is. I like to use it when I run code reviews for my clients, which involve codebases around 1 million lines of code or more. Such projects can contain up to two hundred Maven modules. Maven prohibits cyclic dependencies, still projects of such size often show crazy internal structure. Here is a (down scaled) example from one of my clients:

Maven dependencies in large project

Looks pretty wild, doesn't it? The graph is the first step in my analysis. I need to load it into a graph editor and cluster modules into their architectural components. Then outliers and architectural anomalies are more explicit.

15 November 2011

Visualising Architecture: Eclipse Plug-in Dependencies

In my job at the Blue Company I am working on a product family of large Eclipse RCP applications. Each application consists of a number of OSGi plug-ins which are the main building block of Eclipse RCP. There are dedicated as well as shared plug-ins. I am new to the project and struggle to understand which applications depend on which plug-ins. I would like to see a component diagram, showing the dependencies between modules, i.e. on OSGi level.

There is an Eclipse's Dependency Visualization tool inside the Eclipse Incubator project: Plug-in org.eclipse.pde.visualization.dependency_1.0.0.20110531 which displays dependencies of plug-ins. This is exactly what I need. Here is the visualization of one of the products:

(The yellow and green bounding boxes were added manually and indicate parts of the architecture.) The tool's main use is to identify unresolved plug-ins, a common problem in large applications. While you find unresolved plug-ins by running the application, the tool helps even before the application is launched. For more information see this article on developerWorks about the use of the Dependency Visualization. The latest version of the plug-in, including its download for Eclipse 3.5.x, is available at the testdrivenguy's blog.

28 December 2010

Architecture Rules

Unfortunately checking Java code for compliance with a given software architecture is not done on every project, as far as I know only a few experts do it. This is not good because "if it's not checked, it's not there". Free tool support for enforcing architectural rules has been lacking since long but recently things have changed. Well, maybe not recently but recently I thought about it :-)

Macker
There have always been a few basic tools to check the references of classes. One of these tools is Macker. It's quite old and not actively maintained any more, but I've been using it for years and it worked great for me. It's small and simple and yet immensely powerful. I love it and use it whenever possible. Everybody should use it. And don't turn it off!

A Shameless Plug
Macker and some other tools that can be used to enforce architecture rules are described in the fourth part of my 'Code Cop' series, published in the German magazine iX last summer: Automatisierte Architektur-Reviews (Automated Architecture Reviews) (iX 6/2010).

Architecture Reviews with Ant
The article describes several free tools (PMD, Macker and Classycle) and how to use them with Ant to verify different aspects of an architecture. Each of them has its advantages and disadvantages and when used together they are useful. The examples of the Ant integration and the PMD/Macker/Classycle architecture rules as given in the article should help you getting started with your own checks.

Macker and Maven
As I said above, Macker comes with proper Ant support, but Maven integration has been lacking. Last year I wanted to use Macker on an Maven project but was disappointed by the immaturity of the MackerMavenPlugin available on Codehaus. So I had to enhance it a bit. I submitted some patches which got accepted but still the MOJO wouldn't get promoted out of its sandbox state. Fortunately I keep an unofficial release (0.9) in my own Maven repository. So finally proper Maven integration of Macker is available.

Other Tools
I guess there are other tools available to be used with Maven. For example there is Architecture Rules with its maven-architecture-rules-plugin which uses JDepend under the hood. It looks promising but I haven't used it in production yet. And I hear that Sonar 2.4 is able to check architecture rules as well but I didn't try it till now.

(List of all my publications with abstracts.)

4 September 2007

I could disable Macker-check

Do you know Macker by Paul Cantrell? It's a build-time architectural rule checking utility for Java developers. Quite old, but we have been using it for years and it still works great for us.

As I am the code cop here, I tend to torture colleagues with forbidden references ;-) So one made this little modification to a comic from xkcd by Randall Munroe. I came across it recently while organising some project folders... So here it is:

I could disable Macker-check instead