The problem was indeed the TCP session timeout setting, which defaults
to 900 seconds in CentOS 5. Connections made between the realserver
crashing and pulse removing the node from the LVS config stayed in the
ESTABLISHED state, since the director never saw them being closed. As a
result, these dead connections remained in the ipvs table for the whole
15 minutes, making the realserver useless until they finally timed
out. I'm surprised nobody has had this problem before.
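
As a rough illustration of the effect described above, here is a toy
model in Python. It is not the ipvs code. It assumes a plain
least-connection style scheduler and, for simplicity, that all dead
entries were created at the same instant, so they expire together
rather than draining slowly. All names in it (RealServer, pick,
simulate) and the connection counts are made up for the example.

```python
# Toy model, not the ipvs implementation: a least-connection director
# whose connection entries only disappear after a fixed timeout.
from dataclasses import dataclass, field


@dataclass
class RealServer:
    name: str
    # Expiry times (seconds) of entries the director still counts as active.
    expiries: list = field(default_factory=list)

    def active(self, now):
        """Number of entries that have not yet timed out at time `now`."""
        return sum(1 for t in self.expiries if t > now)


def pick(servers, now):
    """Least-connection choice: the server with the fewest active entries."""
    return min(servers, key=lambda s: s.active(now))


def simulate(tcp_timeout, stale=50, healthy_load=20, horizon=1000, step=10):
    """Return the first time (seconds) the recovered server is picked again,
    or None if that never happens within `horizon` seconds."""
    # Dead entries left over from the crash, all created at t=0 (assumption).
    recovered = RealServer("rs1", [tcp_timeout] * stale)
    # A healthy server with a steady number of live connections.
    healthy = RealServer("rs2", [horizon + 1] * healthy_load)
    for now in range(0, horizon + 1, step):
        if pick([recovered, healthy], now) is recovered:
            return now
    return None
```

In this model simulate(900) returns 900 and simulate(120) returns 120:
the recovered server gets nothing until its stale entries expire, so a
shorter TCP timeout shortens the outage by the same amount.
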
Janar Kartau
Joseph Mack NA3T wrote:
> On Thu, 10 Apr 2008, Janar Kartau wrote:
>
>
>> But lately one of the realservers crashed during the day,
>> and when it came back it was automatically added back to
>> the LVS, but no new requests were sent to it.
>> Ipvsadm showed it had a lot of ActiveConn's and zero
>> InActConn's.
>>
>
> I'm surprised that we haven't heard about this as a problem
> before. A realserver crashing must happen often enough that
> someone else has already seen this.
>
>
>> These numbers remained the same for 10 or more minutes and
>> then ActiveConn started decreasing slowly.
>>
>
> I thought the timeouts were about 2 mins. Would changing them
> to 2 mins help (it's one of the options to ipvsadm)?
>
>
>> Once the ActiveConn was lower than the other realservers
>> had, new requests started to reach the server and
>> InActConn increased from 0. I could reproduce this later
>> when I took a realserver down myself, and noticed that the
>> more connections there were during and after the crash,
>> the bigger the static count of ActiveConn's for the
>> crashed server once it came back. Neither an LVS restart nor
>> "ipvsadm --zero" helped.
>>
>
> ipvs keeps its state tables so it doesn't mess with any
> ESTABLISHED connections.
>
> Joe
>