It's time for some exercise - time for a code kata. I like the simple ones because they don't take much time and still provide a certain amount of training. After having done the
Prime Factors Kata more than 20 times using Java, Ruby, C#,
Turbo Pascal,
BASIC and even Forth, I feel like trying a new one for a change. I choose the
Word Wrap Kata, also by Uncle Bob. Unlike Prime Factors, Word Wrap seems to be less popular, there are only a few experiences with it published, e.g.
Word Wrap using Python.
Recursive
The kata's task is to write a function that, like a word processor, breaks the line by replacing the last space in a line with a newline. The most straight forward solution to this is IMHO the recursive one. I only need to consider the first blank or forced break and call myself with the remaining text.
public String wrap(String line, int maxLineLen) {
if (line.length() <= maxLineLen) {
return line;
}
int indexOfBlank = line.lastIndexOf(BLANK, maxLineLen);
int split;
int offset;
if (indexOfBlank > -1) {
split = indexOfBlank;
offset = 1;
} else {
split = maxLineLen;
offset = 0;
}
return line.substring(0, split) + NEWLINE +
wrap(line.substring(split + offset), maxLineLen);
}
Usually kata is not about the final solution, but about the process to get there. Still I want to compare different solutions, so let's analyse this one. If the
line
needs to be split
n times (into
n+1 shorter lines) then this solution creates
3*n String
objects and additional
n StringBuilder
s for string concatenation. ...
4*n objects are created.
Tail Recursive
To be
tail recursive a function's last statement must be the recursive call.
public String wrap(String line, int maxLineLen) {
StringBuilder accumulator = new StringBuilder();
wrap(line, maxLineLen, accumulator);
return accumulator.toString();
}
private void wrap(String remainingLine, int maxLineLen, StringBuilder accumulator) {
if (remainingLine.length() <= maxLineLen) {
accumulator.append(remainingLine);
return;
}
int indexOfBlank = remainingLine.lastIndexOf(BLANK, maxLineLen);
int split;
int offset;
if (indexOfBlank > -1) {
split = indexOfBlank;
offset = 1;
} else {
split = maxLineLen;
offset = 0;
}
accumulator.append(remainingLine.substring(0, split));
accumulator.append(NEWLINE);
wrap(remainingLine.substring(split + offset), maxLineLen, accumulator);
}
This solution creates the new
String
in the very end. It needs 2
String
s per new line and only one
StringBuilder
and a final
String
to return it. ...
2*n+2 objects are created.
Loop
A tail recursive function can be rewritten to reuse the stack frame transforming it into a plain loop. As Java does not support that optimisation, I do it by hand.
public String wrap(String line, int maxLineLen) {
StringBuilder accumulator = new StringBuilder();
String remainingLine = line;
while (remainingLine.length() > maxLineLen) {
int indexOfBlank = remainingLine.lastIndexOf(BLANK, maxLineLen);
int split;
int offset;
if (indexOfBlank > -1) {
split = indexOfBlank;
offset = 1;
} else {
split = maxLineLen;
offset = 0;
}
accumulator.append(remainingLine.substring(0, split));
accumulator.append(NEWLINE);
remainingLine = remainingLine.substring(split + offset);
}
accumulator.append(remainingLine);
return accumulator.toString();
}
The loopy :-) solution creates the same number of objects as the tail recursive one, but has reduced call overhead. ... Still
2*n+2 objects are created.
Optimised Loop
Let's optimise away the splitting of the remaining line because it gets split again in the next call, so all these
String
s are only temporarily used.
public String wrap(String line, int maxLineLen) {
StringBuilder accumulator = new StringBuilder();
int pos = 0;
while (pos + maxLineLen < line.length()) {
int indexOfBlank = line.lastIndexOf(BLANK, pos + maxLineLen);
int split;
int offset;
if (indexOfBlank > pos - 1) {
split = indexOfBlank;
offset = 1;
} else {
split = pos + maxLineLen;
offset = 0;
}
accumulator.append(line.substring(pos, split));
accumulator.append(NEWLINE);
pos = split + offset;
}
accumulator.append(line.substring(pos));
return accumulator.toString();
}
Now only one
String
is created per new line, one for the last remaining part and one after the final concatenation. ...
n+3 objects are created.
Using a Buffer
If I could get rid of half of the
String
splitting, why not drop the other half too?
public String wrap(String line, int maxLineLen) {
StringBuilder accumulator = new StringBuilder();
accumulator.append(line);
int pos = 0;
int inserted = 0;
while (pos + maxLineLen < line.length()) {
int indexOfBlank = line.lastIndexOf(BLANK, pos + maxLineLen);
if (indexOfBlank > pos - 1) {
accumulator.setCharAt(inserted + indexOfBlank, NEWLINE);
pos = indexOfBlank + 1;
} else {
accumulator.insert(inserted + pos + maxLineLen, NEWLINE);
pos = pos + maxLineLen;
inserted++;
}
}
return accumulator.toString();
}
Only one
StringBuilder
and one
String
are created. ...
2 objects are created. This is definitely the most garbage collector friendly solution because it creates only one temporary object, the
StringBuilder
.
Copying Characters
If there are blanks in
line
then all characters get copied once in
append()
and all solutions behave similar. (The method
substring()
does not copy characters, it just creates a new
String
with different pointers in the character array of the original
String
.) But if there are no blanks in
line
then the last solution copies all characters after the split point around one or more times. Copying large numbers of characters might be slower than allocating an object, especially as the JVM/Hotspot is optimised for short lived, small objects.
Resizing the StringBuilder
For all solutions using an explicit
StringBuilder
another optimisation is to avoid automatic resizing. The size of
accumulator
must be large enough to contain all additional newlines. If a line has a blank then it's replaced, so no new character is added. Only when a line contains no blank, then a newline is inserted. This can happen up to
lineLen / maxLineLen
times. Flooring (rounding down in integer division) is the right thing because after the last part, e.g.
remainingLine
, nothing is added.
...
StringBuilder accumulator =
new StringBuilder(calcMaxSize(line.length(), maxLineLen));
...
private int calcMaxSize(int lineLen, int maxLineLen) {
int maxCharsAdded = lineLen / maxLineLen;
return lineLen + maxCharsAdded;
}
Note that all these optimisations are highly theoretical. In a typical web or database application it doesn't matter at all if a few temporary objects are created or not. Remember that "Early optimisation is the root of all evil" (Donald Knuth). I would stick with the first solution because it's the shortest and easy to understand.
Bonus Roundpublic String wrap(String line, int maxLineLen) {
return line.replaceAll("([^ ]{" + maxLineLen + "})" + // 1
"(?=[^ ])" + // 2
"|" + // 3
"(.{1," + maxLineLen + "})" + // 4
" ", // 5
"$1$2" + NEWLINE);
}
This solution is even shorter, just one line using a Regular Expression. It replaces the split points with a newline or adds one. The pattern matches areas of
line
which (1) contain exact
maxLineLen
characters that are not blanks, (2) which must be followed by a character that's not a blank (and which is not consumed) (3) or it matches areas of (4) one up to
maxLineLen
characters (5) which are followed by a blank. The match is replaced with the first (1) and second (3) groups together with a newline. The single blank (5) is not a member of any group and is dropped.
(
Get the source.)