I definitely wouldn't condone such a thing.
I know how nasty that would be.
Plus if I think about it, I'd say that the rehashing that would need to take
place would consume more CPU cycles than a yet-to-be-proven gain from increasing
the bucket size.
Agreed. To expand on this for the benefit of others. The hash table is
just that. A hash. Each bucket in the hash can have multiple entries.
The implementation is such that each bucket is a linked list of entries.
Exactly. And around 2000 we tuned the hash generation function to have the best
balanced distribution over all buckets with a magic prime number which IIRC was
pretty exactly the golden ratio if you divided 2^32 through that number.
ratz@zar:~ > echo "(2^32)/2654435761" | bc -l
1.61803399392945414737
ratz@zar:~ >
We took the wisdom from Chuck Lever's paper "Linux Kernel Hash Table Behavior:
Analysis and Improvements" [1]. So once you have an evenly distributed hash
table the search for an entry is almost best case for all entries.
Thus there is no limit on the number of entries the hash table can hold
(other than the amount of memory available). The only advantage of
increasing the hash table size is to reduce, statistically speaking, the
number of entries in each bucket. And thus the amount of hash table
Which gains you almost nothing. The search is still ... left as an exercise to
all CS students here :)
traversal. To be honest I think the gain is probably minimal even in
extreme circumstances.
Especially since computing the hash value entry uses most of the times almost
the same amount of time as parsing the linked list.
Anecdotally, a colleague of mine did a test on making the linked lists
reordering. So that the most recently used entry was moved to the front.
Interesting approach. However I guess that the cache line probably already had
this entry. It would be interesting to see the amount of TLB flushes done by the
kernel depending on the amount of traffic and hash table size.
He then pushed a little bit traffic through the LVS box (>700Mbit/s).
We didn't really see any improvement. Which would make me think that
hash table performance isn't a particular bottleneck in the current
code.
Neither do I.
[1] http://www.citi.umich.edu/techreports/reports/citi-tr-00-1.pdf
Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
|