Hello,
Either a massive bug in the ServerIron Firmware or a configuration
glitch on your side. Care to post the relevant part of the configuration?
In the ServerIron, each of the 6 real servers looks like this :
server real server01.domain x.x.x.41
port default disable
weight 10 0
port http
port http keepalive
port http url "GET /alarm/"
And this automatically gets /alarm/index.php as per configuration on
your lighttpd server?
port http status_code 200 299
!
And the virtual server :
server virtual virtual.domain x.x.x.225
port default dsr
port http sticky
port http dsr
bind http server01.domain http server02.domain http server03.domain http server04.domain http
bind http server05.domain http server06.domain http
!
I don't exactly remember the FSM on the ServerIron hardware and
unfortunately these days one does not get access to their documentation
anymore without a KP id :(. However, your configuration looks pretty
straightforward and should definitely work. I'm just not sure whether
the ServerIron OS distinguishes between no HTTP response at all and an
HTTP response with an unexpected status code.
What happens if you modify your PHP health check status script to
actually set code 500 for all HTTP requests? Do any of the RS get set
up, either with the ServerIron or the LVS?
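To make the "no response" versus "unexpected response" distinction concrete, here is a minimal Python sketch of a health probe that reports the two cases separately. It is not how the ServerIron or keepalived implement their checks; the function names, the verdict strings and the default 200-299 range (taken from the status_code line in the ServerIron configuration) are assumptions for illustration only.

```python
import http.client


def classify(status, ok_range=(200, 299)):
    """Turn a probe result into a verdict.

    status is the HTTP status code, or None when no usable HTTP
    response was received (refused, timed out, garbled reply).
    """
    if status is None:
        return "no_response"
    lo, hi = ok_range
    return "up" if lo <= status <= hi else "unexpected_status"


def probe(host, port=80, path="/alarm/", timeout=5.0):
    """Send one GET and return the status code, or None on failure."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Connection refused, read timeout, or a malformed reply.
        return None
    finally:
        conn.close()
```

Calling classify(probe(host)) against each real server while the check script returns 500 shows whether the server answers with the wrong code or does not answer at all.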
The similar configuration with LVS (using keepalived) :
I'm not too familiar with the inner workings of keepalived, so maybe
Alexandre should take a look at this as well.
virtual_server x.x.x.229 80 {
delay_loop 6
This seems pretty short, considering you have 6 RS to check.
lb_algo rr
lb_kind DR
persistence_timeout 30
protocol TCP
real_server x.x.x.41 80 {
weight 10
HTTP_GET {
url {
path /alarm/
status_code 200
}
connect_timeout 5
nb_get_retry 2
delay_before_retry 5
}
}
! etc. for all other 5
}
How exactly do you get your RS to dynamically switch from HTTP response
code 200 to 500? Have you checked the HTTP response header using a CLI
tool like curl, lynx or wget?
Various ways. I'm using lighttpd with PHP as FastCGI, so by checking
a /alarm/index.php script :
- I get a 500 from lighttpd if the PHP backend is overloaded or dead
And right now I've extended this PHP script to keep sending 500s in
more situations, in order to avoid "flip-flopping" :
- I get a 500 from the script if the main db connection is down
- I get a 500 from the script if the server's 1min avg load is > 20
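The decision rules listed above can be summarised in a few lines. The following is only a Python sketch of that logic, not the actual /alarm/index.php script; the function names and the db_ok flag are hypothetical, and the limit of 20 is the 1-minute load average mentioned in the list.

```python
import os

LOAD_LIMIT = 20.0  # 1-minute load average limit taken from the list above


def health_status(db_ok, load_1min, load_limit=LOAD_LIMIT):
    """Return the HTTP status code the health check page should send."""
    if not db_ok:
        return 500  # main db connection is down
    if load_1min > load_limit:
        return 500  # server is considered overloaded
    return 200


def current_load_1min():
    """Read the 1-minute load average (Unix only)."""
    return os.getloadavg()[0]
```

The first case in the list (a dead or overloaded PHP backend) is not covered here, since that 500 comes from lighttpd itself and not from the script.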
So what happens if you shut down all your DBs and restart keepalived?
What does the ipvsadm -Ln output look like?
I've checked with "curl -I" and get the status I expect in every case.
Ok.
I would like to have tried some kind of "keep the real server disabled
for n seconds when it's detected as down" in order to keep the check
from flip-flopping like this, but there is no such setting in
keepalived AFAICS.
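For reference, the "keep the real server disabled for n seconds" idea amounts to a hold-down timer wrapped around the check result. The Python sketch below only illustrates that idea; it is not a keepalived feature, and the class and parameter names are made up.

```python
class HoldDown:
    """Keep a server marked down for hold_seconds after its last failed check."""

    def __init__(self, hold_seconds):
        self.hold_seconds = hold_seconds
        self.down_until = None  # time before which the server stays down

    def update(self, check_ok, now):
        """Feed one check result; return True if the server may take traffic."""
        if not check_ok:
            # Every failure (re)starts the hold-down period.
            self.down_until = now + self.hold_seconds
            return False
        if self.down_until is not None and now < self.down_until:
            # Check passes again, but the hold-down has not expired yet.
            return False
        self.down_until = None
        return True
```

With hold_seconds=30, a server that fails at t=10 and passes again at t=20 stays out of the pool until t=40, which suppresses quick up/down transitions.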
Would it be possible, and good enough for you, to use the threshold
limitation feature by setting an upper and a lower threshold for the
number of active + inactive connections?
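Roughly, the idea behind such a pair of thresholds is hysteresis: a server stops receiving new connections once it goes above the upper limit and only gets them again after dropping below the lower one. The Python sketch below illustrates just that idea under this assumption; it is not the IPVS code, and the class and method names are hypothetical.

```python
class ThresholdGate:
    """Hysteresis on the connection count of one real server."""

    def __init__(self, upper, lower):
        if lower >= upper:
            raise ValueError("lower threshold must be below upper threshold")
        self.upper = upper
        self.lower = lower
        self.overloaded = False

    def accepts_new(self, active, inactive):
        """Return True if the server should be given new connections."""
        total = active + inactive
        if total > self.upper:
            self.overloaded = True
        elif total < self.lower:
            self.overloaded = False
        # Between the two thresholds the previous state is kept.
        return not self.overloaded
```

Because the state only changes outside the band between the two values, a server hovering around a single limit does not toggle on every check.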
I've got a bit more information after running LVS for the past weeks
(without sending any real traffic to the virtual server IP address,
though, I use the ServerIron's virtual IP address currently). I keep
getting read timeouts from keepalived, so at a higher level it seems
that there already is an issue. The ServerIron reports no similar
timeouts against the same servers, which are running fine.
Health check read timeouts?
Anyhow, this is something I definitely need to fix before digging any
more about the LVS issue I reported initially.
Fair enough. Good luck,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc