Monday, 29 June 2009

Performance benchmark of CRC32, MD5 and SHA1 implementations in Ruby

Not long ago, I was experimenting with memcached from Ruby and I suddenly reached the maximum length of the key used to write to memcached, which is by default 250 chars.
However in my use case, I need to concatenate several requests params to create a unique key for my cached data.
So I fought of using some sort of hashing, unique to start with. Immediately thought of MD5. Found out it is available in the digest module so I used it.
Finally when I looked again at that code, I wondered if the MD5 was not actually too slow and counter productive in a caching mechanism. I could use SHA1 or even CRC32 as the collision risk is small and was not an issue in my instance.
So I ran the following ruby benchmark, to compare all implementations.

require 'benchmark'
require 'digest/md5'
require 'digest/sha1'
require 'zlib'

def b(n,s) do |x|"base:") { for i in 1..n; s; end }"crc32:") { for i in 1..n; Zlib.crc32(s); end }"md5:") { for i in 1..n; Digest::MD5.hexdigest(s); end }"sha1:") { for i in 1..n; Digest::SHA1.hexdigest(s); end }

And got the output for 100,000 iterations

irb(main):016:0> b(100000,
user system total real
base: 0.020000 0.000000 0.020000 ( 0.018913)
crc32: 0.090000 0.010000 0.100000 ( 0.096223)
md5: 0.350000 0.000000 0.350000 ( 0.420743)
sha1: 0.390000 0.010000 0.400000 ( 0.421582)

Interestingly, SHA1 and MD5 implementations are similarly slow in comparison the CRC32.
CRC32 is roughly 4 times faster than SHA1 or MD5 but 5 times slower than no hashing.
Of course, the CRC32 algorithm has a risk of collision but if used with another attribute like a timestamp, it creates a key fairly unique and suitable for caching.