## 12 August 2011

### Word Wrap Kata Variants

It's time for some exercise - time for a code kata. I like the simple ones because they don't take much time and still provide a certain amount of training. After having done the Prime Factors Kata more than 20 times using Java, Ruby, C#, Turbo Pascal, BASIC and even Forth, I feel like trying a new one for a change. I choose the Word Wrap Kata, also by Uncle Bob. Unlike Prime Factors, Word Wrap seems to be less popular, there are only a few experiences with it published, e.g. Word Wrap using Python.

Recursive
The kata's task is to write a function that, like a word processor, breaks the line by replacing the last space in a line with a newline. The most straight forward solution to this is IMHO the recursive one. I only need to consider the first blank or forced break and call myself with the remaining text.
```public String wrap(String line, int maxLineLen) {
if (line.length() <= maxLineLen) {
return line;
}
int indexOfBlank = line.lastIndexOf(BLANK, maxLineLen);
int split;
int offset;
if (indexOfBlank > -1) {
split = indexOfBlank;
offset = 1;
} else {
split = maxLineLen;
offset = 0;
}
return line.substring(0, split) + NEWLINE +
wrap(line.substring(split + offset), maxLineLen);
}```
Usually kata is not about the final solution, but about the process to get there. Still I want to compare different solutions, so let's analyse this one. If the `line` needs to be split n times (into n+1 shorter lines) then this solution creates 3*n `String` objects and additional n `StringBuilder`s for string concatenation. ... 4*n objects are created.

Tail Recursive
To be tail recursive a function's last statement must be the recursive call.
```public String wrap(String line, int maxLineLen) {
StringBuilder accumulator = new StringBuilder();
wrap(line, maxLineLen, accumulator);
return accumulator.toString();
}
private void wrap(String remainingLine, int maxLineLen, StringBuilder accumulator) {
if (remainingLine.length() <= maxLineLen) {
accumulator.append(remainingLine);
return;
}
int indexOfBlank = remainingLine.lastIndexOf(BLANK, maxLineLen);
int split;
int offset;
if (indexOfBlank > -1) {
split = indexOfBlank;
offset = 1;
} else {
split = maxLineLen;
offset = 0;
}
accumulator.append(remainingLine.substring(0, split));
accumulator.append(NEWLINE);
wrap(remainingLine.substring(split + offset), maxLineLen, accumulator);
}```
This solution creates the new `String` in the very end. It needs 2 `String`s per new line and only one `StringBuilder` and a final `String` to return it. ... 2*n+2 objects are created.

Loop
A tail recursive function can be rewritten to reuse the stack frame transforming it into a plain loop. As Java does not support that optimisation, I do it by hand.
```public String wrap(String line, int maxLineLen) {
StringBuilder accumulator = new StringBuilder();
String remainingLine = line;
while (remainingLine.length() > maxLineLen) {
int indexOfBlank = remainingLine.lastIndexOf(BLANK, maxLineLen);
int split;
int offset;
if (indexOfBlank > -1) {
split = indexOfBlank;
offset = 1;
} else {
split = maxLineLen;
offset = 0;
}
accumulator.append(remainingLine.substring(0, split));
accumulator.append(NEWLINE);
remainingLine = remainingLine.substring(split + offset);
}
accumulator.append(remainingLine);
return accumulator.toString();
}```
The loopy :-) solution creates the same number of objects as the tail recursive one, but has reduced call overhead. ... Still 2*n+2 objects are created.

Optimised Loop
Let's optimise away the splitting of the remaining line because it gets split again in the next call, so all these `String`s are only temporarily used.
```public String wrap(String line, int maxLineLen) {
StringBuilder accumulator = new StringBuilder();
int pos = 0;
while (pos + maxLineLen < line.length()) {
int indexOfBlank = line.lastIndexOf(BLANK, pos + maxLineLen);
int split;
int offset;
if (indexOfBlank > pos - 1) {
split = indexOfBlank;
offset = 1;
} else {
split = pos + maxLineLen;
offset = 0;
}
accumulator.append(line.substring(pos, split));
accumulator.append(NEWLINE);
pos = split + offset;
}
accumulator.append(line.substring(pos));
return accumulator.toString();
}```
Now only one `String` is created per new line, one for the last remaining part and one after the final concatenation. ... n+3 objects are created.

Using a Buffer
If I could get rid of half of the `String` splitting, why not drop the other half too?
```public String wrap(String line, int maxLineLen) {
StringBuilder accumulator = new StringBuilder();
accumulator.append(line);
int pos = 0;
int inserted = 0;
while (pos + maxLineLen < line.length()) {
int indexOfBlank = line.lastIndexOf(BLANK, pos + maxLineLen);
if (indexOfBlank > pos - 1) {
accumulator.setCharAt(inserted + indexOfBlank, NEWLINE);
pos = indexOfBlank + 1;
} else {
accumulator.insert(inserted + pos + maxLineLen, NEWLINE);
pos = pos + maxLineLen;
inserted++;
}
}
return accumulator.toString();
}```
Only one `StringBuilder` and one `String` are created. ... 2 objects are created. This is definitely the most garbage collector friendly solution because it creates only one temporary object, the `StringBuilder`.

Copying Characters
If there are blanks in `line` then all characters get copied once in `append()` and all solutions behave similar. (The method `substring()` does not copy characters, it just creates a new `String` with different pointers in the character array of the original `String`.) But if there are no blanks in `line` then the last solution copies all characters after the split point around one or more times. Copying large numbers of characters might be slower than allocating an object, especially as the JVM/Hotspot is optimised for short lived, small objects.

Resizing the StringBuilder
For all solutions using an explicit `StringBuilder` another optimisation is to avoid automatic resizing. The size of `accumulator` must be large enough to contain all additional newlines. If a line has a blank then it's replaced, so no new character is added. Only when a line contains no blank, then a newline is inserted. This can happen up to `lineLen / maxLineLen` times. Flooring (rounding down in integer division) is the right thing because after the last part, e.g. `remainingLine`, nothing is added.
```...
StringBuilder accumulator =
new StringBuilder(calcMaxSize(line.length(), maxLineLen));
...
private int calcMaxSize(int lineLen, int maxLineLen) {
int maxCharsAdded = lineLen / maxLineLen;
return lineLen + maxCharsAdded;
}```
Note that all these optimisations are highly theoretical. In a typical web or database application it doesn't matter at all if a few temporary objects are created or not. Remember that "Early optimisation is the root of all evil" (Donald Knuth). I would stick with the first solution because it's the shortest and easy to understand.

Bonus Round
```public String wrap(String line, int maxLineLen) {
return line.replaceAll("([^ ]{" + maxLineLen + "})" + // 1
"(?=[^ ])" +                                       // 2
"|" +                                              // 3
"(.{1," + maxLineLen + "})" +                      // 4
" ",                                               // 5
"\$1\$2" + NEWLINE);
}```
This solution is even shorter, just one line using a Regular Expression. It replaces the split points with a newline or adds one. The pattern matches areas of `line` which (1) contain exact `maxLineLen` characters that are not blanks, (2) which must be followed by a character that's not a blank (and which is not consumed) (3) or it matches areas of (4) one up to `maxLineLen` characters (5) which are followed by a blank. The match is replaced with the first (1) and second (3) groups together with a newline. The single blank (5) is not a member of any group and is dropped.

(Get the source at Bitbucket.)

Chuck Krutsinger said...
This comment has been removed by the author.
Peter Kofler said...

Chuck,
thank you for the comment on the Bonus Round solution. Yes there is a problem if maxLineLen is greater than the length of line, then no newline should be inserted.

I had a look and fixed the regex, which makes it even more incomprehensible ;-) Between the lines // 3 and // 4 I added

"(?=.{" + maxLineLen + "}.)" +

which by negative look-ahead makes sure that // 4 and // 5 only match if there are more than maxLineLen characters in the line. Thank you for pointing it out. I update the source.