On 11/8/06, Graeme Fowler <graeme@xxxxxxxxxxx> wrote:
Antonio Forster wrote:
> We had loaded ip_conntrack_ftp in both situations, with the modular and
> the static kernel. Thanks for the comment anyway!
Are you also load-balancing *inbound* FTP sessions in this LVS?
Not at all. The only FTP sessions are initiated from the servers in the
LVS cluster to environments outside the cluster.
Humour me for a moment. On the face of it, from here, it seems highly
likely that the N:1 SNAT rule for outbound initiated connections is
incorrect - not that I'm accusing you of anything here, I am trying to
simplify the conditions.
The SNAT rules are the following:
iptables -t nat -I POSTROUTING -o eth0 -s inst11 -j SNAT --to-source VIP1
iptables -t nat -I POSTROUTING -o eth0 -s inst12 -j SNAT --to-source VIP1
iptables -t nat -I POSTROUTING -o eth0 -s inst13 -j SNAT --to-source VIP1
iptables -t nat -I POSTROUTING -o eth0 -s inst14 -j SNAT --to-source VIP1
iptables -t nat -I POSTROUTING -o eth0 -s inst21 -j SNAT --to-source VIP2
iptables -t nat -I POSTROUTING -o eth0 -s inst22 -j SNAT --to-source VIP2
iptables -t nat -I POSTROUTING -o eth0 -s inst23 -j SNAT --to-source VIP2
iptables -t nat -I POSTROUTING -o eth0 -s inst24 -j SNAT --to-source VIP2
iptables -t nat -I POSTROUTING -o eth0 -s inst31 -j SNAT --to-source VIP3
iptables -t nat -I POSTROUTING -o eth0 -s inst32 -j SNAT --to-source VIP3
iptables -t nat -I POSTROUTING -o eth0 -s inst33 -j SNAT --to-source VIP3
iptables -t nat -I POSTROUTING -o eth0 -s inst34 -j SNAT --to-source VIP3
iptables -t nat -I POSTROUTING -o eth0 -s inst41 -j SNAT --to-source VIP4
iptables -t nat -I POSTROUTING -o eth0 -s inst42 -j SNAT --to-source VIP4
iptables -t nat -I POSTROUTING -o eth0 -s inst43 -j SNAT --to-source VIP4
iptables -t nat -I POSTROUTING -o eth0 -s inst44 -j SNAT --to-source VIP4
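For reference, the sixteen rules above follow a regular pattern (instNM maps to VIPN), so they can be generated with a loop. This is only a sketch: the function name gen_snat_rules is made up for illustration, inst11..inst44 and VIP1..VIP4 are the placeholders used above, and the rules are printed rather than executed so the output can be reviewed first.

```shell
#!/bin/sh
# Print (do not execute) one SNAT rule per instance.
# Group N (inst N1..N4) is source-NATed to VIPN, as in the list above.
gen_snat_rules() {
    for group in 1 2 3 4; do
        for member in 1 2 3 4; do
            echo "iptables -t nat -I POSTROUTING -o eth0 -s inst${group}${member} -j SNAT --to-source VIP${group}"
        done
    done
}

gen_snat_rules
```

Piping the output to sh as root would apply the rules. Since each rule matches a different source, the order in which -I inserts them should not matter.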
Can you do a sequence of tests? Below, the word "active" indicates that
*either*:
A: The "active" server has all services up, the others are down, the LVS
remains configured on the director for all four; or
B: The "active" server is the *only* server configured for LVS service
on the director.
1. Attempt an FTP connection from server1 (each time) with server1,
server2, server3, server4 active in the LVS on their own (four tests).
2. Do the same sequence but with the FTP connection coming from server2,
server3, server4 in turn (with the other servers active in turn as in 1).
3. Test from server(1,2,3,4) with pairs of servers active.
4. Test from server(1,2,3,4) with triplets of servers active.
5. Finally, test from server(1,2,3,4) with all servers active.
This approach, although a bit long-winded, should at least throw some light
on the problem - bear in mind that we can only see what you're telling
us, so any additional info will help!
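The test sequence above amounts to trying every source server against every non-empty set of active servers (4 singles, 6 pairs, 4 triplets, 1 full set, each from 4 sources, so 60 runs). A rough sketch that prints that matrix as a checklist follows; the function name gen_test_matrix and the output format are made up for illustration, and the combinations come out in bitmask order rather than in the order of steps 1 to 5.

```shell
#!/bin/sh
# Print every (source server, set of active servers) combination.
# Each value of mask from 1 to 15 selects one non-empty subset of
# server1..server4: bit n-1 set means serverN is active.
gen_test_matrix() {
    for src in 1 2 3 4; do
        mask=1
        while [ "$mask" -le 15 ]; do
            active=""
            for n in 1 2 3 4; do
                bit=$(( 1 << (n - 1) ))
                if [ $(( mask & bit )) -ne 0 ]; then
                    # Append with a comma separator once the list is non-empty.
                    active="${active}${active:+,}server${n}"
                fi
            done
            echo "from=server${src} active=${active}"
            mask=$(( mask + 1 ))
        done
    done
}

gen_test_matrix
```

Each printed line is one FTP attempt to make and record as pass or fail.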
We have conducted all the tests you mentioned, and we found out that
if more than one instance is up and the LVS health checkers are
monitoring them and seeing they are up, the outbound FTP fails.
The strange part is:
- During the test, there was one virtual server group with only one
active instance, and that one had about 20 sessions. When I activated
another instance on the same virtual server, the FTP worked fine until
the number of connections on the second instance reached the same
number of connections the first instance had. At that time, the FTP
stopped working again.
With this behavior I thought the problem was a result of the load
balancing itself. Since the scheduler in use is wlc, until LVS had to
start balancing again between the two instances, it was working. When
considering this, I decided to change the keepalived configs to
include persistence for the sessions, and after that, it seems to be
working in all situations.
Does that make sense? I'll go on with further testing anyway.
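For anyone wanting to try the same change by hand: in keepalived, persistence is set with the persistence_timeout directive inside the virtual_server block, and the ipvsadm counterpart is the -p option. The sketch below only prints such an ipvsadm command. The function name persistence_cmd is made up, and the port 80 and the 300 second timeout are assumptions for illustration, not values from this setup.

```shell
#!/bin/sh
# Print (do not execute) an ipvsadm command that edits an existing TCP
# virtual service to keep the wlc scheduler and add persistence.
# -E edits a virtual service, -t names a TCP service, -s sets the
# scheduler, -p enables persistence with a timeout in seconds.
persistence_cmd() {
    vip=$1
    port=$2
    timeout=$3
    echo "ipvsadm -E -t ${vip}:${port} -s wlc -p ${timeout}"
}

# Port and timeout here are assumed example values.
persistence_cmd VIP1 80 300
```

With persistence, connections from the same client address keep going to the same real server until the timeout expires, instead of being rescheduled by wlc on each new connection.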
Thanks and best regards,
Antonio