Hi!
I have been testing the virtual server software and have tried to
maximize the load as much as possible. I have used the tunneling
version and have tried to open as many HTTP connections as possible
to 3 real servers. It appeared the load balancer (router) quickly
became the bottleneck and started to limit the virtual server's
performance. 'top' showed the system load rising to about 40 or
more, and the computer became totally unresponsive.
When we stopped the test, it took the server several minutes to get
back to normal.
I have tried to reduce the MASQUERADE_EXPIRE... (FIN) timeouts
and that made a lot of difference. The router was able to handle
a much larger load, but that doesn't seem to solve the problem
completely.
It appears the problem is the number of active TCP connections
(masquerading entries) in the hash table. We have tried to
enlarge the hash table size, but that didn't solve the problem.
I have checked the code, but I didn't do any profiling. The only
idea I had in mind, when looking at the code, was what happens
when there are a LOT of masquerading entries. It appears the
problem lies in the timeout detection code.
An entry needs to be removed from the table, when a timeout occurs.
Therefore when an entry is inserted into the table, a system timer
is created and added to the system timer list by using the 'add_timer'
function. That means that when there are 100000 entries in the hash
table, there are also 100000 timers in the system timer list.
Each time a new IP packet arrives, one timer needs to be updated
(actually removed and reinserted into the linked list). The insert
operation quickly becomes quite expensive (time-consuming) because
the list needs to remain sorted and a proper insertion position needs
to be found.
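
To illustrate the cost I mean, here is a minimal user-space sketch.
It is only a model of a sorted timer list as described above, not the
kernel's own 'add_timer'. The names sim_timer, sim_add_timer,
sim_del_timer and sim_mod_timer are made up for this example, and it
ignores locking and jiffies wraparound.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a timer entry (not the kernel struct). */
struct sim_timer {
    unsigned long expires;          /* absolute expiry time in ticks */
    struct sim_timer *prev, *next;
};

/* Circular doubly linked list with a sentinel head,
 * kept sorted by ascending 'expires'. */
static void sim_list_init(struct sim_timer *head)
{
    head->expires = 0;
    head->prev = head->next = head;
}

/* Insert 't' at its sorted position.
 * Returns the number of entries walked to find that position. */
static unsigned long sim_add_timer(struct sim_timer *head,
                                   struct sim_timer *t)
{
    unsigned long steps = 0;
    struct sim_timer *pos = head->next;

    /* Linear scan: skip every entry that expires no later than 't'. */
    while (pos != head && pos->expires <= t->expires) {
        pos = pos->next;
        steps++;
    }

    /* Link 't' in just before 'pos'. */
    t->next = pos;
    t->prev = pos->prev;
    pos->prev->next = t;
    pos->prev = t;
    return steps;
}

/* Unlink 't' from the list; constant time. */
static void sim_del_timer(struct sim_timer *t)
{
    assert(t->prev != NULL && t->next != NULL);
    t->prev->next = t->next;
    t->next->prev = t->prev;
    t->prev = t->next = NULL;
}

/* What happens per packet: remove the timer, then reinsert it with a
 * new expiry. Returns the number of entries walked on reinsertion. */
static unsigned long sim_mod_timer(struct sim_timer *head,
                                   struct sim_timer *t,
                                   unsigned long expires)
{
    sim_del_timer(t);
    t->expires = expires;
    return sim_add_timer(head, t);
}
```

In this model a refreshed timer usually gets the latest expiry, so
sim_mod_timer has to walk past nearly every other entry. The work per
packet therefore grows with the number of active connections.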
Therefore the more active connections there are, the less responsive
the server becomes.
I think the solution to the problem would be to write custom 'add_timer'
and 'del_timer' that would be a bit smarter than the original version.
The new code would maintain its own doubly linked list of timer_list
entries with several additional pointers to the list. One pointer would
point to the position where the MASQUERADE_EXPIRE_TCP timers should be
inserted, one to the MASQUERADE_EXPIRE_UDP, one for FIN and so on.
Therefore the insertion would become a constant-time task and it would
speed things up.
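
A rough sketch of how this could look, again in user space. Note that
this is a variation on the idea above: instead of one shared list with
several insertion pointers, it keeps one queue per timeout class. All
entries in a class use the same timeout, so a new or refreshed entry
always expires last in its class and can simply be appended at the
tail. The names (sim_entry, sim_queues, sim_queue_add and so on) and
the timeout values are placeholders for illustration, not kernel
identifiers or kernel defaults. It assumes 'now' never goes backwards
and ignores locking and jiffies wraparound.

```c
#include <assert.h>
#include <stddef.h>

/* One class per distinct timeout value. */
enum sim_class { SIM_TCP, SIM_UDP, SIM_FIN, SIM_NCLASS };

/* Placeholder timeouts in ticks (assumed values). */
static const unsigned long sim_timeout[SIM_NCLASS] = { 900, 300, 120 };

struct sim_entry {
    unsigned long expires;
    enum sim_class cls;
    struct sim_entry *prev, *next;
};

/* One circular list with a sentinel head per class. */
struct sim_queues {
    struct sim_entry head[SIM_NCLASS];
};

static void sim_queues_init(struct sim_queues *q)
{
    for (int i = 0; i < SIM_NCLASS; i++)
        q->head[i].prev = q->head[i].next = &q->head[i];
}

/* Constant-time insert: append at the tail of the class queue. */
static void sim_queue_add(struct sim_queues *q, struct sim_entry *e,
                          enum sim_class cls, unsigned long now)
{
    struct sim_entry *head = &q->head[cls];

    e->cls = cls;
    e->expires = now + sim_timeout[cls];
    /* The queue stays sorted as long as 'now' never decreases. */
    assert(head->prev == head || head->prev->expires <= e->expires);
    e->prev = head->prev;
    e->next = head;
    head->prev->next = e;
    head->prev = e;
}

/* Constant-time unlink. */
static void sim_queue_del(struct sim_entry *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->prev = e->next = NULL;
}

/* Per-packet refresh: unlink and append again, both constant time. */
static void sim_queue_touch(struct sim_queues *q, struct sim_entry *e,
                            unsigned long now)
{
    sim_queue_del(e);
    sim_queue_add(q, e, e->cls, now);
}

/* Unlink every entry that has expired by 'now'; only the front of
 * each queue is inspected. Returns the number of entries removed. */
static unsigned sim_queue_expire(struct sim_queues *q, unsigned long now)
{
    unsigned removed = 0;

    for (int i = 0; i < SIM_NCLASS; i++) {
        struct sim_entry *head = &q->head[i];

        while (head->next != head && head->next->expires <= now) {
            sim_queue_del(head->next);
            removed++;
        }
    }
    return removed;
}
```

The trade-off is that expiry has to look at the front of each queue
instead of a single list head, but the number of classes is small and
fixed, so that cost does not depend on the number of connections.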
I haven't done any coding yet, but I will probably do it by the end of
the week.
Cheers,
Peter