On Thu, 16 Mar 2000, Ratz wrote:
> I would really appreciate it if we could put together, with your help, a
> flowchart of the whole TCP connection. Let me start [thanks to Joe for
> the picture in the LVS-HOWTO :) ]. LVS-DR, sched=rr, weight S#=1,
> HTTP GET request!
>
>                      _______
>                     |       |
>                     |   C   | CIP
>                     |_______|
>                         |
>                         |
>                      ___|___
>                     |       |
>                     |   R   |
>                     |_______|
>                         |
>                         |
>                         |       __________
>                         |  DIP |          |
>                         |------|    LB    |
>                         |  VIP |__________|
>                         |
>                         |
>                         |
>       -------------------------------------
>       |                 |                 |
>       |                 |                 |
>   RIP1, VIP         RIP2, VIP         RIP3, VIP
>  ____________      ____________      ____________
> |            |    |            |    |            |
> |     S1     |    |     S2     |    |     S3     |
> |____________|    |____________|    |____________|
>
>
> C=Client, R=Router, S#=Realserver #, LB=Loadbalancer, ac=active
> connections, ic=inactive connections
>
>
> C (R) LB S1 TCP_STATE(LB)
> ac ic
> 1+2) CIP -----------SYN------------> VIP ----SYN----> RIP1
> SYN_RECV 1 0
> 3) CIP <-------------------SYN/ACK----------------- RIP1
> 4+5) CIP -----------ACK------------> VIP ----ACK----> RIP1
> ESTABLISH 1 0
>
> OK, let's start sending real data
>
> 6) CIP -----------ACK------------> VIP ----ACK----> RIP1
> ESTABLISH 1 0
> ...
>
> So now we are finished and want to close the connection. First
> problem: IMHO the loadbalancer is not able to distinguish between an
> active close on the server side and an active close on the client side.
> This leads to two final close scenarios (without SACK):
>
> active close on server side
> ===========================
>
> 1) CIP <---------------------FIN------------------- RIP1
> ESTABLISH 1 0
> 2+3) CIP ---------ACK--------------> VIP ----ACK----> RIP1
> ESTABLISH 1 0
> 4+5) CIP ---------FIN--------------> VIP ----FIN----> RIP1
> CLOSE_WAIT/CLOSED? 0 1 ?
> 6) CIP <---------------------ACK------------------- RIP1
> CLOSE_WAIT/CLOSED? 0 1 ?
>
> How does the LB know when it has to switch from CLOSE_WAIT to CLOSED? Or
> does it just switch to CLOSED?
>
Since the LVS/DR box sees only the client-to-server half of the connection,
it catches the client's FIN packet and moves the connection into the FINWAIT
state, whose default timeout is currently 2 minutes.

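To make the one-directional view concrete, here is a minimal Python sketch of
a director-side connection table that is driven only by client packets. It is
an illustration, not the IPVS code: the class and method names, the
900-second timeout for non-FINWAIT states, and the rule that every packet
restarts the timer are all assumptions. Only the 2-minute FINWAIT timeout
comes from the text above.

```python
FINWAIT_TIMEOUT = 120  # seconds; the 2-minute default mentioned above
OTHER_TIMEOUT = 900    # placeholder assumption for every other state


class DirectorTable:
    """Hypothetical LVS-DR style table: it only ever sees client packets."""

    def __init__(self):
        self.conns = {}  # (client_ip, client_port) -> [state, deadline]

    def on_client_packet(self, key, flags, now):
        """Update state from the TCP flags of one client packet."""
        entry = self.conns.get(key)
        if entry is None:
            if "SYN" not in flags:
                return None  # unknown connection and not a new one
            entry = self.conns[key] = ["SYN_RECV", now]
        elif "FIN" in flags:
            # The client's FIN is the only close signal the director sees,
            # whichever side started the close.
            entry[0] = "FINWAIT"
        elif "ACK" in flags and entry[0] == "SYN_RECV":
            entry[0] = "ESTABLISHED"
        # Assumption: every client packet restarts the state's timer.
        timeout = FINWAIT_TIMEOUT if entry[0] == "FINWAIT" else OTHER_TIMEOUT
        entry[1] = now + timeout
        return entry[0]

    def expire(self, now):
        """Drop entries whose timer has run out."""
        expired = [k for k, (_, deadline) in self.conns.items() if deadline <= now]
        for key in expired:
            del self.conns[key]
```

In this sketch the entry is never moved to CLOSED by a packet. It sits in
FINWAIT until the timer removes it, and the client's ACK of a server FIN
looks like any other ACK.
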
> active close on client side
> ===========================
>
> 1+2) CIP ---------FIN--------------> VIP ----FIN----> RIP1
> CLOSE_WAIT? 0 1 ?
> 3) CIP <---------------------ACK------------------- RIP1
> CLOSE_WAIT? 0 1 ?
> 4) CIP <---------------------FIN------------------- RIP1
> CLOSE_WAIT? 0 1 ?
> 5+6) CIP ---------ACK--------------> VIP ----ACK----> RIP1
> CLOSE_WAIT/CLOSED? 0 1 ?
>
LVS/DR handles this situation in the same way as above.

> I hope someone can help me with my confusion and that we can put this
> chart into the HOWTO, so everybody can understand how the loadbalancer
> really works. What's missing? The whole IP_VS_MASQ_TABLE in the
> IP layer (according to Wensong), SYN cookies, SYN drop. I'd really like
> to draw the whole functional chart, but since I'm not sure I'm not mixing
> things up, I won't add more for now.
>
>
> > >
> > > BTW: What are the plans for transferring the ip_vs_masq_table from one
> > > kernel to another one in case of a failover of the loadbalancer? Is
> > > there already some idea or whatever?
> > >
> >
> > I just thought of an idea for transferring the state table; it might be
> > good. We run a SendingState and a ReceivingState kernel_thread (daemons
> > inside the kernel like kflushd and kswapd) on the primary IPVS and the
> > backup respectively. Every time the primary handles packets, it puts the
> > change of state in a sending queue. The SendingState kernel_thread wakes
> > up every HZ or HZ/2 ticks (every second or half second), sends the
> > changes in the queue to the ReceivingState kernel_thread through UDP
> > packets, and finally clears the queue. The ReceivingState receives the
> > packets and updates its own state table.
> >
> > Since all is inside the kernel, it should be efficient, because the
> > switching overhead between the kernel and the user space (both for the UDP
> > communications and the read & write of those entries) can be avoided.
> >
> > Any comments?
>
> Sounds fair :)
> No, sorry, as I'm not yet a kernel hacker, I unfortunately can't
> contribute any good ideas to this subject. But I'm extremely interested
> in any kind of information I can get!
>
No need to be sorry. If you have time, you can still do some investigation
or coding too.

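As a starting point for such an investigation, the data flow of the
state-transfer idea quoted above can be sketched in user-space Python. This
is only an illustration: the proposal is for kernel threads, and the JSON
encoding, the class methods, and the "ip:port" connection ids are assumptions
made for the sketch. Only the names SendingState and ReceivingState, the
queue, the periodic flush, and the use of UDP come from the proposal.

```python
import json


class SendingState:
    """Primary side: queue state changes and hand them out in batches."""

    def __init__(self):
        self.queue = []

    def record(self, conn_id, state):
        # Called whenever packet handling changes a connection's state.
        self.queue.append([conn_id, state])

    def drain(self):
        # Serialize the pending changes and clear the queue.
        # Returns None when nothing changed since the last call.
        if not self.queue:
            return None
        payload = json.dumps(self.queue).encode()
        self.queue.clear()
        return payload

    def flush(self, sock, backup_addr):
        # Meant to run from a periodic timer; sock is a UDP (SOCK_DGRAM)
        # socket. A real version would split batches larger than a datagram.
        payload = self.drain()
        if payload is not None:
            sock.sendto(payload, backup_addr)


class ReceivingState:
    """Backup side: replay received changes into a local state table."""

    def __init__(self):
        self.table = {}

    def apply(self, payload):
        # Changes are applied in order, so the last state per connection wins.
        for conn_id, state in json.loads(payload.decode()):
            self.table[conn_id] = state
```

On the backup, a loop such as `data, _ = sock.recvfrom(65535)` followed by
`receiver.apply(data)` would complete the picture. UDP gives no delivery
guarantee, so a lost batch leaves the backup table stale until later changes
arrive.
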
Cheers,
Wensong