Nice :)
Just a little remark. You should mark rotl32 and fmix as inline to speed it up.
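For reference, here is roughly what the inline helpers could look like. This is a sketch: I'm assuming the article's rotl32 and fmix follow the reference MurmurHash3 semantics, and the parameter names (and the name fmix32) are my own. Keep in mind that inline is only a hint, so the compiler may still emit a regular call.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$INLINE ON}
{$OVERFLOWCHECKS OFF} // the multiplications rely on 32-bit wrap-around
{$RANGECHECKS OFF}

// Hypothetical signature; the article's rotl32 may differ.
// Rotates X left by R bits (0 < R < 32). "inline" asks the compiler to
// expand the body at each call site instead of emitting a CALL.
function rotl32(X: UInt32; R: Byte): UInt32; inline;
begin
  Result := (X shl R) or (X shr (32 - R));
end;

// MurmurHash3 32-bit finalizer (called fmix32 in the reference C++ code).
function fmix32(H: UInt32): UInt32; inline;
begin
  H := H xor (H shr 16);
  H := H * $85EBCA6B;
  H := H xor (H shr 13);
  H := H * $C2B2AE35;
  Result := H xor (H shr 16);
end;
```

The {$IFDEF FPC} line only matters when compiling with Free Pascal; Delphi skips it.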
Here's a modified version of the port originally written by Lionel Delafosse; I've only fixed it up to support 64-bit.
```pascal
// Originally converted by Lionel Delafosse
{$OVERFLOWCHECKS OFF} // the multiplications below rely on 32-bit wrap-around
{$RANGECHECKS OFF}
function MurmurHash3_32(AKey: PByte; ALength, ASeed: UInt32): UInt32;
const
  c1 = $CC9E2D51;
  c2 = $1B873593;
var
  i: Integer;
  h1, k1: UInt32;
begin
  h1 := ASeed;

  // body: process the input in 4-byte blocks
  i := ALength div SizeOf(UInt32);
  while i <> 0 do
  begin
    k1 := PCardinal(AKey)^ * c1;
    k1 := ((k1 shl 15) or (k1 shr 17)) * c2;  // rotl32(k1, 15) * c2
    h1 := h1 xor k1;
    h1 := (h1 shl 13) or (h1 shr 19);         // rotl32(h1, 13)
    h1 := ((h1 shl 2) + h1) + $E6546B64;      // h1 * 5 + $E6546B64
    Inc(AKey, SizeOf(UInt32));
    Dec(i);
  end;

  // tail: the remaining 1..3 bytes, read little-endian
  if (ALength and 1) <> 0 then
  begin
    if (ALength and 2) <> 0 then
      k1 := ((UInt32(PByte(AKey + 2)^) shl 16) xor PWord(AKey)^) * c1  // 3 bytes
    else
      k1 := UInt32(PByte(AKey)^) * c1;                                 // 1 byte
    h1 := h1 xor (((k1 shl 15) or (k1 shr 17)) * c2);  // rotl32(k1, 15), as in the body
  end
  else if (ALength and 2) <> 0 then
  begin
    k1 := UInt32(PWord(AKey)^) * c1;                                   // 2 bytes
    h1 := h1 xor (((k1 shl 15) or (k1 shr 17)) * c2);  // rotl32(k1, 15), as in the body
  end;

  // finalization mix - force all bits of hash block to avalanche within 0.25% bias
  h1 := h1 xor ALength;
  h1 := (h1 xor (h1 shr 16)) * $85EBCA6B;
  h1 := (h1 xor (h1 shr 13)) * $C2B2AE35;
  Result := h1 xor (h1 shr 16);
end;
```
This should be a good starting point for an assembly version. MurmurHash3 is said to be as fast as MurmurHash2, but that isn't the case with the assembly the compiler generates from this code.
Maybe someone could take a look at the assembly generated when the C++ version is compiled with full optimizations enabled.
A final note: when compiling for 64-bit, the MurmurHash3_64 implementation should be the better choice.
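In the reference C++ code, the x64 variant consumes 16 bytes per iteration using 64-bit multiplies and rotations, which a 64-bit target executes natively. Below is a sketch of its two helpers in the same inline style; the names and constants follow the reference ROTL64 and fmix64, and whether the article's MurmurHash3_64 is built on exactly these helpers is an assumption on my part.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$INLINE ON}
{$OVERFLOWCHECKS OFF} // the multiplications rely on 64-bit wrap-around
{$RANGECHECKS OFF}

// Hypothetical Delphi port of the reference ROTL64: rotate X left by R bits (0 < R < 64).
function rotl64(X: UInt64; R: Byte): UInt64; inline;
begin
  Result := (X shl R) or (X shr (64 - R));
end;

// Hypothetical Delphi port of the reference fmix64: xor-shift by 33 and
// multiply by two 64-bit constants so every input bit affects every output bit.
function fmix64(K: UInt64): UInt64; inline;
begin
  K := K xor (K shr 33);
  K := K * UInt64($FF51AFD7ED558CCD);
  K := K xor (K shr 33);
  K := K * UInt64($C4CEB9FE1A85EC53);
  Result := K xor (K shr 33);
end;
```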
-Atle