Since no one else is on right now, I'll offer this from my own experience.
I had a high number of inactive connections when apache was set up not to
use keepalive at all. After activating keepalive in apache (LVS was
already persistent) the number of inactive connections went way down.
So in my case at least, each connection was set up, used for a
single GET for a gif, button, jpeg, js script, or other page component,
and then closed by the server, only for the client to open another for
the next gif, etc.
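To compare the connection counts before and after a change like enabling
keepalive, something along these lines could total the ActiveConn and
InActConn columns. It is only a sketch: the function name ipvs_totals is
made up for illustration, and it assumes the column layout of the
"ipvsadm -L -n" listing quoted at the end of this message.

```shell
#!/bin/sh
# Sketch: total the connection counters from "ipvsadm -L -n" output.
# Assumption: real-server rows start with "->" and end with the
# ActiveConn and InActConn columns, as in the listing quoted below.

# ipvs_totals  (ipvsadm listing on stdin)
# Prints "servers active inactive" summed over all real-server rows.
ipvs_totals() {
    awk '
        # The header row also starts with "->" but ends in a word,
        # so require a numeric last field.
        $1 == "->" && $NF ~ /^[0-9]+$/ {
            n++
            active   += $(NF - 1)
            inactive += $NF
        }
        END { print n + 0, active + 0, inactive + 0 }
    '
}
```

Usage would be something like "ipvsadm -L -n | ipvs_totals", run once
before and once after the change.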
You might be able to use something like multilog to watch a bunch of the
logs at the same time to get an idea if the traffic looks like real
people (get page 1, get page 1 images, get page 2, get page 2 images) or
if it is random hammering from a dos attack.
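One rough way to put a number on that distinction is to compare, per IP,
the total hits with the number of distinct URLs requested: a browser
fetching pages and their images tends to touch many URLs, while
hammering tends to repeat a few. This is a heuristic sketch, not
something from my original script. The name url_spread is made up, and
it assumes the client IP is field 2 and the request path is field 8 of
each log line; adjust both for your log format.

```shell
#!/bin/sh
# Sketch: per-IP hit count versus number of distinct URLs requested.
# Assumptions: client IP is field 2 and the request path is field 8
# (a common-log-style line with one extra leading field).

# url_spread  (log lines on stdin)
# Prints "hits distinct_urls ip", busiest IPs first.
url_spread() {
    awk '
        {
            hits[$2]++
            # Count each (ip, url) pair only the first time it is seen.
            if (!(($2, $8) in seen)) {
                seen[$2, $8] = 1
                urls[$2]++
            }
        }
        END { for (ip in hits) print hits[ip], urls[ip], ip }
    ' | sort -k1,1nr -k3,3
}
```

An IP with thousands of hits but only one or two distinct URLs is worth
a closer look; a high count spread over many URLs looks more like a
proxy or a busy real user.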
I wrote a small shell script that pulled the recent log entries, counted
the hits per IP address for certain requests and then created an iptables
rule on the director (or some machine in front of the director) to
tarpit requests from that IP. This worked in my situation because we
knew that certain URLs were only hit a small number of times during a
legit use session (a login page shouldn't be hit 957 times in an
hour by the same external IP). This could help stem the tide of
requests if you are actually encountering a (d)dos. I ran it every 12
minutes or so. If you are getting ddos'd, the tarpit function of iptables
(http://www.securityfocus.com/infocus/1723) or the standalone tarpit can
be a great help. Also, Felix and his company seem to have helped some
large companies deal with high traffic ddos attacks: http://www.fefe.de/
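Since the original script is lost, here is a minimal sketch of the
counting-and-tarpit idea, not the script itself. The function name
tarpit_rules, the INPUT chain and the port 80 match are assumptions for
illustration. The client IP is taken from field 2, as in the scripts
further down, and the TARPIT target is not part of stock iptables, so it
has to be installed separately. The function only prints the rules; it
does not run them.

```shell
#!/bin/sh
# Sketch: print (do not run) iptables TARPIT rules for IPs that request
# a given URL more than a threshold number of times.
# Assumptions: client IP is field 2 of each log line; the TARPIT target
# is installed; INPUT chain and port 80 are placeholders.

# tarpit_rules PATTERN THRESHOLD  (log lines on stdin)
tarpit_rules() {
    # Keep only requests containing PATTERN (fixed string), then count
    # hits per IP and emit one rule per IP over THRESHOLD.
    grep -F -- "$1" |
    awk -v max="$2" '
        { hits[$2]++ }
        END {
            for (ip in hits)
                if (hits[ip] > max)
                    printf "iptables -A INPUT -s %s -p tcp --dport 80 -j TARPIT\n", ip
        }
    ' | sort
}
```

For example, "tail -n 5000 access_log | tarpit_rules /login 100" would
list a rule for every IP with more than 100 login hits in the last 5000
lines. Read the output before feeding it to a shell, so you don't tarpit
a proxy or your own monitoring host.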
BTW, you might be interested in http://www.backhand.org/mod_log_spread/
for centralized and redundant logging. That way you can run different
kinds of real time analysis with no extra load on the webservers or the
normal logging hosts by just having an additional machine join/subscribe
to the multicast spread group with the log data.
Rob
OK, I can't find my script, but this was the start of it. It is hardly a
shell script (but someone may find it useful):
Add a "grep blah" command just before the awk '{print $2}' if you want
just certain requests or other filtering.
multidaychk.sh
#!/bin/sh
# look for multiday patterns
# $1 is how many days back to search
# $2 is how many high usage IPs to list
# (the client IP is taken from field 2 of each log line)
ls -1tr /usr/local/apache2/logs/access_log.200*0 | tail -n "${1}" \
  | xargs -n 1 cat | awk '{print $2}' | sort | uniq -c | sort -nr \
  | head -n "${2}"
byhrchk.sh
#!/bin/sh
# looks for IPs hitting during a certain hr of the day
# $1 is how many days back to search
# $2 is how many high usage IPs to list
# $3 is which hour of the day (two digits, e.g. 09)
# note: the year 2005 is hard-coded in the match string below
ls -1tr /usr/local/apache2/logs/access_log.200*0 | tail -n "${1}" \
  | xargs -n 1 cat | grep -F "2005:${3}" | awk '{print $2}' \
  | sort | uniq -c | sort -nr | head -n "${2}"
recentchk.sh
#!/bin/sh
# This just checks the latest X lines from the newest log file
# $1 is how many lines from the file
# $2 is how many high usage IPs to list
ls -1tr /usr/local/apache2/logs/access_log.200*0 | tail -n 1 \
  | xargs -n 1 tail -n "${1}" | awk '{print $2}' \
  | sort | uniq -c | sort -nr | head -n "${2}"
HTH
nigel@xxxxxxxxxxx wrote:
Hi,
Now the bad news. This weekend the web service we run came under
increased load --- about an extra 10,000,000 queries per day ---- and we now
have a busted cluster. Here is what IPVS looks like:
IP Virtual Server version 1.0.10 (size=65536)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port   Forward Weight ActiveConn InActConn
TCP  66.98.x.y:80 rr
  -> 66.98.x.y:80         Tunnel  1      37         337
  -> 67.15.x.y:80         Tunnel  1      14         382
  -> 66.98.x.y:80         Tunnel  1      6          131
  -> 207.44.x.y:80        Tunnel  1      21         325
  -> 66.98.x.y:80         Tunnel  1      57         422
  -> 207.44.x.y:80        Tunnel  1      12         354
  -> 69.57.x.y:80         Tunnel  1      33         355
  -> 67.15.x.y:80         Tunnel  1      71         274
  -> 67.15.x.y:80         Tunnel  1      12         378
  -> 207.44.x.y:80        Tunnel  1      5          345
  -> 66.98.x.y:80         Tunnel  1      59         301
  -> 67.15.x.y:80         Tunnel  1      2          347
  -> 67.15.x.y:80         Tunnel  1      19         375
  -> 69.57.x.y:80         Tunnel  1      10         132
  -> 69.57.x.y:80         Tunnel  1      3          128
  -> 67.15.x.y:80         Tunnel  1      15         361
  -> 69.57.x.y:80         Tunnel  1      8          128
  -> 67.15.x.y:80         Tunnel  1      229        303
  -> 67.15.x.y:80         Tunnel  1      16         372
  -> 67.15.x.y:80         Tunnel  1      125        317
  -> 67.15.x.y:80         Tunnel  1      12         367
  -> 207.44.x.y:80        Tunnel  1      13         333
  -> 207.44.x.y:80        Tunnel  0      144        5
  -> 66.98.x.y:80         Tunnel  1      10         404
  -> 207.44.x.y:80        Tunnel  0      0          0
  -> 207.44.x.y:80        Tunnel  1      132        277
At this point the service works but is too slow. But in the next 60 seconds
the InActConn count grows to over 2000 per real server and the whole thing
locks up.
* What precisely do the InActConn figures show?
Is this symptomatic of simply an overloaded cluster, or could it be a DoS
problem?
Any insights or similar experiences would be much appreciated.
Kind regards,
Nigel