Re: Page cannot be displayed

To: mark.maiden@xxxxxxxxxxxxxx, <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: Page cannot be displayed
From: Roberto Nibali <ratz@xxxxxxxxxxxx>
Date: Thu, 29 Mar 2007 15:52:50 +0200
We recently set up an Ultra Monkey load balancer with 2 real servers
and 95% of the time it seems to be working perfectly, but every now
and then our customers are getting "Page cannot be displayed" errors.

What's the average/peak request rate and size?

It happens at different stages on our websites and we can't seem to
reproduce the problem here. Our customers are very large Fortune 500
companies, so we assume that their networking etc. is top of the line,
and since it is occurring with multiple customers, we assume the problem
lies in our architecture. Our LB environment is as follows:

Ultra Monkey box:
CentOS 4.4
ldirectord.cf:

# Global Directives
checktimeout=5
checkinterval=5
#fallback=127.0.0.1:80
autoreload=yes
#logfile="/var/log/ldirectord.log"
logfile="local0"

Can you correlate any log messages from ldirectord with the 5% of failed page displays? Since you seem to have a very high persistence timeout and no indication of expire_nodest_conn, it's not easy to pinpoint the problem. What kind of application is running behind the services? Does the application logic span both services within the lifetime of a persistent session? Does the fallback work?
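One way to do that correlation is sketched here. This is a minimal Python sketch, assuming that syslog writes the local0 messages in the classic BSD format (`Mar 29 15:50:01 host ldirectord[pid]: ...`, no year) and that you have collected the times at which customers reported errors. The function names and the two-minute window are hypothetical choices, not part of ldirectord, and the sample log lines are made up.

```python
from datetime import datetime, timedelta

# ASSUMPTION: classic BSD-style syslog lines such as
#   "Mar 29 15:50:01 director ldirectord[4242]: some message"
# These omit the year, so the caller has to supply it.
TS_WIDTH = 15  # "Mar 29 15:50:01" (days 1-9 are space-padded: "Mar  1")


def parse_syslog_time(line, year):
    """Return the timestamp at the start of a syslog line, or None."""
    try:
        # Prepending the year avoids strptime's year-less default (1900).
        return datetime.strptime(f"{year} {line[:TS_WIDTH]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def events_near(log_lines, error_times, year, window=timedelta(minutes=2)):
    """Map each reported error time to the log lines within +/- window of it."""
    stamped = []
    for line in log_lines:
        ts = parse_syslog_time(line, year)
        if ts is not None:
            stamped.append((ts, line))
    return {
        err: [line for ts, line in stamped if abs(ts - err) <= window]
        for err in error_times
    }


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    log = [
        "Mar 29 15:40:12 director ldirectord[4242]: example event A",
        "Mar 29 15:51:30 director ldirectord[4242]: example event B",
    ]
    reported = [datetime(2007, 3, 29, 15, 52, 0)]
    for err, lines in events_near(log, reported, 2007).items():
        print(err, lines)
```

If the reported failures consistently line up with ldirectord taking a real server in or out, that points at the health-check/persistence interaction rather than at the customers' networks.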

# Controls IP packet forwarding
net.ipv4.ip_forward = 1
# Controls source route verification
net.ipv4.conf.default.rp_filter = 1
# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0
# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0

Besides the strange comment, enabling this (kernel.sysrq = 1) can be helpful at times.
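What /etc/sysctl.conf says and what the running kernel uses can differ, so it can be worth reading the live values back from /proc/sys (the same thing `sysctl -n <key>` prints). A minimal Python sketch, assuming the usual mapping of a dotted key to a path under /proc/sys; the helper names are hypothetical, and the net.ipv4.vs.* keys only exist once the ip_vs module is loaded.

```python
from pathlib import Path

# Keys discussed in this thread. The net.ipv4.vs.* entries only exist
# once the ip_vs kernel module is loaded.
KEYS = [
    "net.ipv4.ip_forward",
    "net.ipv4.conf.default.rp_filter",
    "kernel.sysrq",
    "net.ipv4.vs.expire_nodest_conn",
]


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file under /proc/sys.

    Keys whose components themselves contain dots (e.g. VLAN interface
    names) are not handled by this simple split.
    """
    return Path(root, *key.split("."))


def read_sysctl(key, root="/proc/sys"):
    """Return the running value as a string, or None if missing/unreadable."""
    try:
        return sysctl_path(key, root).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    for key in KEYS:
        value = read_sysctl(key)
        print(f"{key} = {value if value is not None else '(not present)'}")
```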

1: I assume that because we are using masq (NAT), we don't need to
worry about the noarp problem that affects DR and TUN?

Correct.

2: Is there any IP tuning we should do on the Ultra Monkey box, since
it is acting not only as the load balancer but also as a router?

Only if you experience performance problems. So let me ask in return: have you previously seen any indication of such problems in your log files (including the kernel log: dmesg -s 100000)?

3: Has anybody else seen this intermittent "Page cannot be displayed"
error with UM?

Sure, but there are tons of ways for this to happen. I can envision the following: ldirectord takes one of the RS (real servers) out of service, but because of the high persistence template timeout of the service, the missing expire_nodest_conn setting, and probably other issues, client requests are still forwarded to the non-functional RS. That will definitely cause such a message to be displayed in the client's browser.
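The failure mode described above can be illustrated with a toy model of persistence. This is a simplified Python sketch, not IPVS code: `ToyDirector` and its `expire_on_failure` flag are hypothetical. The flag stands in loosely for expiry settings such as net.ipv4.vs.expire_nodest_conn, and the model lumps connection entries and persistence templates together, uses plain round-robin, and restarts the timer on every reuse.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDirector:
    """Toy model of a persistent virtual service. NOT real IPVS code.

    expire_on_failure is a hypothetical switch that loosely mimics the
    effect of net.ipv4.vs.expire_nodest_conn: when True, an entry that
    points at a failed real server is discarded and the client is
    rescheduled; when False, the entry is honoured until it times out.
    """

    real_servers: list
    persistence_timeout: float
    expire_on_failure: bool = False
    alive: set = field(default_factory=set)
    templates: dict = field(default_factory=dict)  # client -> (rs, expiry)
    next_index: int = 0  # round-robin position (simplification)

    def __post_init__(self):
        self.alive = set(self.real_servers)

    def fail(self, rs):
        """A health check (ldirectord's role) takes rs out of service."""
        self.alive.discard(rs)

    def route(self, client, now):
        """Return the real server for this client's request at time now."""
        entry = self.templates.get(client)
        if entry is not None:
            rs, expiry = entry
            still_valid = now < expiry
            usable = rs in self.alive or not self.expire_on_failure
            if still_valid and usable:
                # Persistence wins, even if rs is dead. Simplification:
                # every reuse restarts the timer.
                self.templates[client] = (rs, now + self.persistence_timeout)
                return rs
        candidates = [rs for rs in self.real_servers if rs in self.alive]
        if not candidates:
            return None
        rs = candidates[self.next_index % len(candidates)]
        self.next_index += 1
        self.templates[client] = (rs, now + self.persistence_timeout)
        return rs


if __name__ == "__main__":
    for expire in (False, True):
        d = ToyDirector(["rs1", "rs2"], persistence_timeout=900,
                        expire_on_failure=expire)
        first = d.route("client", now=0)
        d.fail(first)
        print(f"expire_on_failure={expire}: {first} -> {d.route('client', now=10)}")
```

With expire_on_failure=False the client keeps landing on the dead rs1 until its entry times out, which is exactly the window in which a browser would show "Page cannot be displayed"; with True it is rescheduled to rs2 on its next request.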

For your own amusement, I've taken the liberty of citing Microsoft's KB241344 article:

http://support.microsoft.com/kb/241344/EN-US/

This is perhaps a wonderful example of why Microsoft is so much more successful than others: no mention of tcpdump/windump to their users, and of course it's always the user's fault :).

Regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc