Hello,
Which count? ipvsadm -L -n or ab? What exactly do both tell you at the
point of saturation? What kind of NICs do you use? Could you check their
speed setting with either mii-tool or ethtool, please?
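e.g. something like this (assuming the interface is eth0, adjust to whatever yours is called):

  ethtool eth0

mii-tool gives you a similar one-line summary on older drivers.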
Supports auto-negotiation: Yes
Speed: 100Mb/s
Duplex: Full
Ok, so this is fine.
The count was from ab; however, I checked during the same tests and it
was 500 per real server.
So it's an RS limitation. Maybe I didn't read your email carefully enough, but
what is the average time to fetch _one_ page and how _big_ is it in bytes? Also,
what is the load on the RS during the test?
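For a rough number, a single ab run straight against one RS would do, e.g.
(the URL is just a placeholder for whatever page you are actually fetching):

  ab -n 100 -c 1 http://192.168.55.1/yourpage.html

The "Document Length" and "Time per request" lines are the two figures I'm after.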
And I was getting this from ipvsadm:
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.55.5:http rr
  -> 192.168.55.1:http            Route   1      500        14699
  -> 192.168.55.3:http            Route   1      500        15404
Eek, your RS are ill: sockets are not being closed there anymore. I'm now very
much interested in the page you're trying to fetch.
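In the meantime you can see it for yourself on an RS while the test runs, e.g.:

  netstat -tan | awk '{print $6}' | sort | uniq -c | sort -rn

which counts your sockets per TCP state; a huge pile of TIME_WAIT or FIN_WAIT
entries next to the established ones would confirm it.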
I also tried to patch httperf as per their suggestion, but when I did, it
just sat there and ate CPU time.
;).
Should the patch work or should I be doing something else?
I put a smiley there because I was never able to get meaningful output from
httperf myself either, most probably because of my lack of understanding of how
it works.
test ----> LVS ----> RS
test --------------> RS
Yeah, that is what I have been doing; sorry for not explaining it
clearly.
Thanks for the confirmation; this and the inactive counters above indicate to me
that your RS application does not close its sockets properly. We'll have to do
some more testing then, once I get more information about the page and its size.
What is also funny is that you hit a limit at exactly 500. Bugs or limitations
normally don't tend to show up at such a round number ;).
How would I be able to check whether this is the case, and how would I go
about solving it?
You could run testlvs [1] but I can derive some numbers as soon as I know the
page size and the RTT for one GET.
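Just to show what I mean by deriving numbers: with a page of size S and an
average time T for one GET, N concurrent connections give you at most N/T
requests per second, and the 100 Mb/s wire caps you at roughly 12.5 MB/s
divided by S; whichever is lower is your theoretical ceiling. For a purely
hypothetical 10 kB page and T = 50 ms, that would be 500/0.05 = 10000 req/s
from the concurrency side but only about 1250 req/s from the wire, so the
wire would be the limit. That's why I keep asking for the page size and the
time of one GET.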
Well, it was able to take more connections than Apache, and I tried configuring
it to accept as many connections as it could. Do you have any better suggestions
for software I should be using on the client side, or even another protocol?
I know; we sometimes use thttpd for static content too because it can handle
more connections than Apache, but it should be able to take a lot more than that.
I wonder if you set a connection limit somewhere, something along the lines of
thttpd's throttling feature. Also check your LINGER_TIME and LISTEN_BACKLOG settings.
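As far as I remember those are compile-time defines in the thttpd sources, so
something like this shows you what your binary was built with (the directory
name is just a guess for wherever your source tree lives):

  grep -rn -e LINGER_TIME -e LISTEN_BACKLOG thttpd-*/

and any throttling would be in whatever file you pass to thttpd with -t.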
Well, it is all on its own switch, so I doubt that is the issue.
Yes.
I know it should be able to handle more but it appears there is
something wrong with my tests.
Or the app.
However, I do get this on both of the app servers:
TCP: time wait bucket table overflow
Too many connections in TW or FW2 state and too little memory to keep the sockets around.
Very interesting!! Still, with your ~15k TW state entries and the assumption of
128 MB of RAM it would not make too much sense: that works out to roughly 8500
bytes per socket, and a socket in TIME_WAIT needs nowhere near that much memory.
I think after the next email I'll have some tunables for you :)
We'll fiddle with some /proc/sys/net/ipv4/ entries.
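To give you a taste, this is the sort of thing I have in mind, assuming your
kernel exposes these entries (the values are purely illustrative placeholders;
we'll pick real ones once I know your RAM and kernel):

  # allow more sockets to sit in TIME_WAIT before the table overflows
  echo 180000 > /proc/sys/net/ipv4/tcp_max_tw_buckets
  # reap orphaned FIN_WAIT2 sockets sooner
  echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout

Don't touch anything yet, though; let's first see the page and the numbers.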
I tried Google and Usenet; however, I could not find anything useful.
Thanks again for your help, take care - RL
[1] http://www.ssi.bg/~ja/testlvs-0.1.tar.gz
Have a nice day,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc