```ruby
class RichString < String
  def initialize(string)
    super(string)
    @data = string[0..0] # some manipulation here
  end

  def data
    @data
  end
end

word = RichString.new('word')
puts word      # => word
puts word.data # => w
```

That was not special and worked as expected.
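Nothing about `@data` affects how a `RichString` compares or hashes. It inherits `String#==`, `#eql?` and `#hash`, so it behaves like a plain string with the same characters, which is what makes it look harmless as a hash key. The following is a minimal check that repeats the `RichString` class so the snippet runs on its own. The variable names `rich_keyed` and `plain_keyed` are only illustrative.

```ruby
class RichString < String
  def initialize(string)
    super(string)
    @data = string[0..0] # some manipulation here
  end

  def data
    @data
  end
end

word = RichString.new('word')

# Equality and hashing only look at the characters, never at @data.
puts word == 'word'           # => true
puts word.eql?('word')        # => true
puts word.hash == 'word'.hash # => true

# Hence a plain String finds a RichString key, and vice versa.
rich_keyed  = { word => :rich }
plain_keyed = { 'word' => :plain }
puts rich_keyed['word']       # => rich
puts plain_keyed[word]        # => plain
```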
Then I happened to use instances of `RichString` as keys in a hash. Why shouldn't I? They were still normal `String`s, and their data should simply be ignored when used in the hash.

```ruby
map = {}
map[word] = :anything
word_key = map.keys[0]
puts word_key      # => word
puts word_key.data # => nil
```

The last line warned me "instance variable `@data` not initialized". Oops, my little `@data` had gone missing, as the `nil` on the last line shows. At first I did not know what was causing the problem. I was baffled, as all tests were green and had good coverage. I spent some time digging and rewriting a lot of functionality until I found that `Hash#keys()` caused the trouble when my `RichString`s were used as hash keys.

```ruby
puts word == word_key                     # => true
puts word.object_id == word_key.object_id # => false
```

Aha,
`Hash` changed the keys. It's reasonable to prevent keys from changing while they sit in a hash, so a `String` passed as a key will be duplicated and frozen. (RTFM always helps ;-) But how did it do that? It did not call `dup()` on the `RichString`. As `Hash` is implemented natively, I ended up in the C source `hash.c`:

```c
/*
 *  call-seq:
 *     hsh[key] = value        => value
 *     hsh.store(key, value)   => value
 */

VALUE
rb_hash_aset(hash, key, val)
    VALUE hash, key, val;
{
    rb_hash_modify(hash);
    if (TYPE(key) != T_STRING || st_lookup(RHASH(hash)->tbl, key, 0)) {
        st_insert(RHASH(hash)->tbl, key, val);
    }
    else {
        st_add_direct(RHASH(hash)->tbl, rb_str_new4(key), val);
    }
    return val;
}
```

So when the
`key` is a `String` that is not already in the hash, `rb_str_new4` is called. (I just love descriptive names ;-) Furthermore, `string.c` revealed some fiddling with the original key:

```c
VALUE
rb_str_new4(orig)
    VALUE orig;
{
    VALUE klass, str;

    if (OBJ_FROZEN(orig)) return orig;
    klass = rb_obj_class(orig);
    if (FL_TEST(orig, ELTS_SHARED) && (str = RSTRING(orig)->aux.shared) && klass == RBASIC(str)->klass) {
        long ofs;
        ofs = RSTRING(str)->len - RSTRING(orig)->len;
        if ((ofs > 0) || (!OBJ_TAINTED(str) && OBJ_TAINTED(orig))) {
            str = str_new3(klass, str);
            RSTRING(str)->ptr += ofs;
            RSTRING(str)->len -= ofs;
        }
    }
    else if (FL_TEST(orig, STR_ASSOC)) {
        str = str_new(klass, RSTRING(orig)->ptr, RSTRING(orig)->len);
    }
    else {
        str = str_new4(klass, orig);
    }
    OBJ_INFECT(str, orig);
    OBJ_FREEZE(str);
    return str;
}
```

I didn't quite understand everything that was going on in
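`rb_str_new4()`, but reading a few lines was sufficient: if the original string is frozen, it is used directly. Otherwise the hash gets a frozen string of the same class (`rb_obj_class(orig)`) that shares or copies the characters but never goes through `dup()`, so `@data` is left behind. That also explains why `word_key.data` returned `nil` instead of raising a `NoMethodError`. I verified the frozen case:

```ruby
map = {}
map[word.freeze] = :anything
word_key = map.keys[0]
puts word_key      # => word
puts word_key.data # => w
```

Excellent, finally my `@data` showed up as expected. Fixing the problem added some complexity for dealing with frozen values, but it worked.

Freeze your custom Ruby strings when you use them as keys in a hash (and want to retrieve them with `Hash#keys()`).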
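The two C functions quoted above can be condensed into a small Ruby model, which may help when reasoning about which keys survive intact. This is a sketch under stated assumptions. `str_new4_model` and `store_key_like_ruby18` are hypothetical names, not Ruby APIs. The model follows the K&R-style `hash.c`/`string.c` excerpts (which appear to come from the Ruby 1.8 sources), ignores buffer sharing and taint handling, and keeps its results in an ordinary `Hash`. Newer interpreters may treat `String` subclass keys differently, so check the real `Hash` on your own Ruby before relying on either behavior.

```ruby
# Hypothetical Ruby model (not a real Ruby API) of how the Ruby 1.8 C code
# quoted above treats String keys in Hash#[]=. Buffer sharing and taint
# handling in rb_str_new4 are deliberately left out.

class RichString < String
  def initialize(string)
    super(string)
    @data = string[0..0] # some manipulation here
  end

  def data
    @data
  end
end

# Models rb_str_new4: a frozen string is returned unchanged; otherwise a frozen
# string of the same class is created from the characters alone. Neither dup
# nor initialize runs, so instance variables such as @data are not carried over.
def str_new4_model(orig)
  return orig if orig.frozen?
  copy = orig.class.allocate # same class, like rb_obj_class(orig)
  copy.replace(orig)         # copies the characters only
  copy.freeze
end

# Models rb_hash_aset: a non-String key, or a key already present, goes
# straight into the table (for an existing key only the value changes);
# a new String key is stored via str_new4_model.
def store_key_like_ruby18(table, key, value)
  if !key.is_a?(String) || table.key?(key)
    table[key] = value
  else
    table[str_new4_model(key)] = value
  end
  value
end

unfrozen_table = {}
store_key_like_ruby18(unfrozen_table, RichString.new('word'), :anything)
p unfrozen_table.keys[0]      # => "word"
p unfrozen_table.keys[0].data # => nil

frozen_table = {}
store_key_like_ruby18(frozen_table, RichString.new('word').freeze, :anything)
p frozen_table.keys[0].data   # => "w"
```

The second call mirrors the fix above: because the key is frozen before it is stored, the model hands back the very same object, `@data` included.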