22 October 2011

Freeze Custom Ruby Strings When Used as Keys in Hash

Last week I spent quite some time chasing a single issue in my JavaClass Ruby gem. It really annoyed me, and I could not find anything useful, even with Google. I had to dig deep. Here is what happened: I began with some kind of rich string, quite similar to the following class:
class RichString < String
  def initialize(string)
    super(string)
    @data = string[0..0] # some manipulation here
  end
  def data
    @data
  end
end

word = RichString.new('word')
puts word # => word
puts word.data # => w
Nothing special here, and it worked as expected.

Then I happened to use instances of RichString as keys in a hash. Why shouldn't I? They were still normal Strings, and their extra data should simply be ignored when they were used as hash keys.
map = {}
map[word] = :anything

word_key = map.keys[0]
puts word_key # => word
puts word_key.data # => nil
The last line warned me "instance variable @data not initialized". Oops, my little @data had gone missing, as the nil in the last line shows. At first I did not know what was causing the problem. I was baffled, as all tests were green and had good coverage. I spent some time digging and rewriting a lot of functionality until I found that Hash#keys() caused the trouble when my RichStrings were used as hash keys.
puts word == word_key   # => true
puts word.object_id == word_key.object_id # => false
Aha, Hash had changed the keys. It's reasonable to prevent keys from changing, so a String passed as a key is duplicated and frozen (RTFM always helps ;-). But how did Hash do that? It did not call dup() on the RichString. As Hash is implemented natively, I ended up in the C source file hash.c.
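Before digging into the C code, the documented behaviour can be reproduced from Ruby. This is a minimal sketch that sticks to plain Strings, because the C code quoted in this post appears to come from Ruby 1.8, and how other Ruby versions treat String subclasses such as RichString may differ. An unfrozen String key is stored as a frozen copy, while an already frozen String is stored as-is.

```ruby
# Plain Strings only: how String subclasses are treated may differ between Ruby versions.
key = String.new('word')            # an unfrozen String
map = {}
map[key] = :anything

stored = map.keys[0]
puts stored == key                  # => true  (same contents)
puts stored.equal?(key)             # => false (Hash stored its own copy)
puts stored.frozen?                 # => true  (and that copy is frozen)
puts key.frozen?                    # => false (the original stays unfrozen)

frozen_key = String.new('other').freeze
map[frozen_key] = :something
reused = map.keys.find { |k| k == 'other' }
puts reused.equal?(frozen_key)      # => true  (a frozen key is stored as-is)
```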
/*
 * call-seq:
 *   hsh[key] = value        => value
 *   hsh.store(key, value)   => value
 */

VALUE
rb_hash_aset(hash, key, val)
    VALUE hash, key, val;
{
    rb_hash_modify(hash);
    if (TYPE(key) != T_STRING || st_lookup(RHASH(hash)->tbl, key, 0)) {
        st_insert(RHASH(hash)->tbl, key, val);
    }
    else {
        st_add_direct(RHASH(hash)->tbl, rb_str_new4(key), val);
    }
    return val;
}
So when the key is a String and not already included in the hash, rb_str_new4 is called. (I just love descriptive names ;-) Furthermore, string.c revealed some fiddling with the original key.
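The st_lookup() check means that only a new String key gets copied: assigning to a key that is already present just replaces the value and keeps the key object stored earlier. A small sketch, again using plain Strings on the assumption that subclass handling may vary between Ruby versions:

```ruby
# Plain Strings again: only the first assignment of a new key stores a copy.
first = String.new('key')
second = String.new('key')          # same contents, different object

map = {}
map[first] = 1                      # new key: Hash stores a frozen copy of first
map[second] = 2                     # existing key: only the value is replaced

kept = map.keys[0]
puts map.size                       # => 1
puts map['key']                     # => 2
puts kept.frozen?                   # => true
puts kept.equal?(second)            # => false (the key from the first assignment is kept)
```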
VALUE
rb_str_new4(orig)
    VALUE orig;
{
    VALUE klass, str;

    if (OBJ_FROZEN(orig)) return orig;
    klass = rb_obj_class(orig);
    if (FL_TEST(orig, ELTS_SHARED) &&
        (str = RSTRING(orig)->aux.shared) &&
        klass == RBASIC(str)->klass) {
        long ofs;
        ofs = RSTRING(str)->len - RSTRING(orig)->len;
        if ((ofs > 0) || (!OBJ_TAINTED(str) && OBJ_TAINTED(orig))) {
            str = str_new3(klass, str);
            RSTRING(str)->ptr += ofs;
            RSTRING(str)->len -= ofs;
        }
    }
    else if (FL_TEST(orig, STR_ASSOC)) {
        str = str_new(klass, RSTRING(orig)->ptr, RSTRING(orig)->len);
    }
    else {
        str = str_new4(klass, orig);
    }
    OBJ_INFECT(str, orig);
    OBJ_FREEZE(str);
    return str;
}
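I didn't quite understand everything going on in rb_str_new4(), but reading a few lines was enough: if the original string is frozen, it is used directly. Otherwise a frozen string of the same class (klass = rb_obj_class(orig)) is returned without the original's instance variables, which is why word_key was still a RichString that responded to data but had lost its @data. I verified the frozen case: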
map = {}
map[word.freeze] = :anything

word_key = map.keys[0]
puts word_key # => word
puts word_key.data # => w
Excellent, finally my @data showed up as expected. Fixing the problem added some complexity in dealing with frozen values, but it worked.
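A minimal sketch of the fix, assuming RichString computes all of its data in initialize. The helper store_with_frozen_key is hypothetical and not part of JavaClass. Freezing the key before storing it means Hash keeps that very object, instance variables included. The catch, and part of the extra complexity, is that the caller's object becomes frozen as well, so any later in-place change raises an error.

```ruby
class RichString < String
  def initialize(string)
    super(string)
    @data = string[0..0] # derive everything here, before the object gets frozen
  end

  def data
    @data
  end
end

# Hypothetical helper: freeze the key before storing it, so that the Hash
# keeps this very object (with its instance variables) instead of a copy.
def store_with_frozen_key(map, key, value)
  map[key.freeze] = value
end

word = RichString.new('word')
map = {}
store_with_frozen_key(map, word, :anything)

word_key = map.keys[0]
puts word_key.equal?(word)          # => true
puts word_key.data                  # => w
puts word.frozen?                   # => true (the caller's object is frozen, too)

begin
  word << 's'                       # in-place changes are no longer allowed
rescue => e
  puts "cannot modify a frozen key: #{e.class}"
end
```

If freezing the caller's object is not acceptable, storing key.dup.freeze instead should also keep @data, since dup copies instance variables, at the cost of an extra object per key.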

Freeze your custom Ruby strings when you use them as keys in a hash (and want to retrieve them with Hash#keys())

19 October 2011

Awesome Bookmarks

Since the beginning of code-cop.org I have put strong emphasis on my personal branding. So far I have created various t-shirts, business cards and buttons. I use these buttons to award conference speakers who delivered good presentations and to thank contributors who helped during Hackergarten. But now my mother-in-law has outdone them all. See my new hand-embroidered bookmarks:

(Photo: awesome code-cop.org bookmarks)
These are some awesome masterpieces! Thank you Lidia.