Re: FreeS/WAN Cluster - our approach

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: FreeS/WAN Cluster - our approach
From: Roberto Nibali <ratz@xxxxxx>
Date: Thu, 14 Feb 2002 17:15:45 +0100
Hello Henrik,

> thanks for all the comments on LVSing IPSec.

No worries, you're invited to help coding if you can ;)
 
> I see that we'll have a lot of fun with it, if we really choose the
> cluster solution.

In what timeframe does this stuff need to work? And would you maybe
consider giving Julian and me a shell account on your setup, because
I doubt we can set this up at home?
 
> We did a little work since my first post and worked out a possible
> solution, which I want to present:
> (sorry for the long posting, but it becomes rather complex)

Long postings with high informational value are ok.
 
> Target:
> -------
> Cluster of IPSec Nodes for Load balancing (redundancy will be added later).

Oh, I'm not so sure if we can do state table synchronisation with
ESP/AH hashed table entries, unless we find out how the timeout and
such work. Maybe it's enough to synchronise the SPD pool.
 
> Assumptions:
> ------------
> - high traffic rates (200MBit+ in total)
> - many tunnels (1000+)

You must have a lot of customers ... or a broken application.

> - we somehow know* about all the existing tunnels and the subnets behind
> them

What do you mean by that?

> * in fact we have this information in our 'Network Management System'
> BEFORE a tunnel is set up.

Why before, and what good does an NMS do if it doesn't reflect the current
real picture?

>        +--------------+   +--------------+
>        |  IPSec term. |   |  IPSec term. |
>        +--------------+   +--------------+
>                |                |
>        +--------------+   +--------------+
>        | many Subnets |   | many Subnets |
>        +--------------+   +--------------+
> 
> We want to make it possible to have a secure connection from the 'many
> Subnets' to the 'target subnets'.

Do you intend to run IPsec in tunneling mode? I would assume so for
security and compatibility reasons.
 
> Main Problems:
> --------------
> - packets of the same IPSec tunnel MUST terminate in the same box

Once we get the routing of ESP/AH packets in LVS this is a piece of
cake with persistency.
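
A minimal sketch of what persistency would buy here, in Python for illustration only. It assumes the director can key on the outer source address of the ESP/AH packet (the remote IPsec term.). The class name `PersistenceTable`, the node names, the addresses and the timeout are hypothetical; this is not LVS code.

```python
import itertools
import time


class PersistenceTable:
    """Toy model of persistency: one peer address always maps to one node."""

    def __init__(self, nodes, timeout=300.0, clock=time.monotonic):
        self._rr = itertools.cycle(nodes)   # round robin, used on first contact
        self._timeout = timeout             # hypothetical template lifetime (s)
        self._clock = clock
        self._templates = {}                # peer address -> (node, expiry)

    def lookup(self, peer_addr):
        now = self._clock()
        entry = self._templates.get(peer_addr)
        if entry is None or entry[1] <= now:
            node = next(self._rr)           # schedule only on a miss or expiry
        else:
            node = entry[0]
        # Every packet refreshes the template for this peer.
        self._templates[peer_addr] = (node, now + self._timeout)
        return node


table = PersistenceTable(["node_1", "node_2", "node_3"])
first = table.lookup("192.0.2.10")
# All later packets from the same peer land on the same node.
assert all(table.lookup("192.0.2.10") == first for _ in range(5))
```

The scheduler is consulted only when no live template exists, so the choice of scheduler (round robin here) affects only where a new tunnel lands, not where its later packets go.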

> - packets from 'target subnets' destined for 'many Subnets' through an
> IPSec tunnel MUST go to the correct node (the one that terminates/begins
> the tunnel).

My understanding of IPsec says this is an implication of the problem
above. Well, since we do LVS-DR, we don't rewrite the IP header and
thus still have the saddr information. If your routing is ok on the
RS this is a solved issue.
 
> Our Approach for the solution:
> ------------------------------
> (at first I look at the problem from bottom to top of the drawing)
> We want to distribute the IPSec tunnels to the nodes. Each 'IPSec term'
> has:
> - an IP-Address (=tunnel starting address)
> - a subnet behind it (in the future there may be more of them, but for
> now one will do the job)

Every IPsec term. needs to be able to address all subnets or it defeats
the purpose of load balancing!
 
> Director 1:
> send packets for subnet of ipsec_term_003 to node_1
> send packets for subnet of ipsec_term_009 to node_1
> send packets for subnet of ipsec_term_002 to node_1
> 
> send packets for subnet of ipsec_term_005 to node_2
> send packets for subnet of ipsec_term_001 to node_2
> send packets for subnet of ipsec_term_010 to node_2
> 
> send packets for subnet of ipsec_term_008 to node_3
> send packets for subnet of ipsec_term_004 to node_3
> send packets for subnet of ipsec_term_007 to node_3
> 
> Director 2:
> send packets for tunnel starting address of ipsec_term_003 to node_1
> send packets for tunnel starting address of ipsec_term_009 to node_1
> send packets for tunnel starting address of ipsec_term_002 to node_1
> 
> send packets for tunnel starting address of ipsec_term_005 to node_2
> send packets for tunnel starting address of ipsec_term_001 to node_2
> send packets for tunnel starting address of ipsec_term_010 to node_2
> 
> send packets for tunnel starting address of ipsec_term_008 to node_3
> send packets for tunnel starting address of ipsec_term_004 to node_3
> send packets for tunnel starting address of ipsec_term_007 to node_3

Well, the problem with this is that in IPsec tunneling mode you simply
don't have the information about the to-be-routed subnet in the
unencrypted part of the IP packet. Read my email exchange with Julian. In
tunneling mode you only have the daddr, which is equal to the VIP, and
only the IPsec term. knows, after deciphering, where the packet needs to
be routed to.
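
To make this concrete, here is a small Python sketch of what a director can read from an ESP tunnel-mode packet without the keys: the outer IPv4 addresses plus the SPI and sequence number from the ESP header. It covers ESP only. The helper names `build_sample` and `visible_fields` and all addresses are made up for illustration, and the sample packet carries a zero checksum and dummy ciphertext.

```python
import socket
import struct

ESP_PROTOCOL = 50  # IP protocol number assigned to ESP


def build_sample(saddr, daddr, spi, seq, ciphertext):
    """Build an outer IPv4 header (checksum left at 0) plus an ESP header."""
    total_len = 20 + 8 + len(ciphertext)
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, total_len,          # version/IHL, TOS, total length
        0, 0,                        # identification, flags/fragment offset
        64, ESP_PROTOCOL, 0,         # TTL, protocol, checksum (not computed)
        socket.inet_aton(saddr), socket.inet_aton(daddr),
    )
    return ip_header + struct.pack("!II", spi, seq) + ciphertext


def visible_fields(packet):
    """Return the fields readable without decrypting the payload."""
    ihl = (packet[0] & 0x0F) * 4     # outer IPv4 header length in bytes
    if packet[9] != ESP_PROTOCOL:
        raise ValueError("not an ESP packet")
    spi, seq = struct.unpack_from("!II", packet, ihl)
    return {
        "saddr": socket.inet_ntoa(packet[12:16]),   # remote IPsec term.
        "daddr": socket.inet_ntoa(packet[16:20]),   # the VIP
        "spi": spi,
        "seq": seq,
    }


# The inner header (real source and destination subnet) sits inside the
# ciphertext, so it is not among the fields below.
sample = build_sample("192.0.2.10", "198.51.100.1", 0x1234, 7, bytes(32))
print(visible_fields(sample))
```

Of these fields, only saddr and the SPI distinguish one tunnel from another, which is why any balancing decision on the director has to be based on them rather than on the inner subnet.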
 
> The next idea is to store this information in a hash in the directors.

This is called the connection affinity template, which the schedulers
use to load balance traffic effectively.
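
The quoted Director 1 / Director 2 lists could be sketched as two lookup tables built from a single assignment decision, so both directors always agree on the node. This is a Python illustration under assumptions: the `TERMINATORS` inventory (for example an export from the network management system), the addresses, and the function names are all hypothetical.

```python
import ipaddress

# Hypothetical inventory:
# terminator name -> (tunnel starting address, subnet behind it)
TERMINATORS = {
    "ipsec_term_001": ("192.0.2.1", "10.1.0.0/24"),
    "ipsec_term_002": ("192.0.2.2", "10.2.0.0/24"),
    "ipsec_term_003": ("192.0.2.3", "10.3.0.0/24"),
}
NODES = ["node_1", "node_2", "node_3"]


def build_tables(terminators, nodes):
    """Assign each terminator to a node once and derive both tables."""
    by_peer = {}      # Director 2: tunnel starting address -> node
    by_subnet = []    # Director 1: (subnet behind the terminator, node)
    for i, name in enumerate(sorted(terminators)):
        addr, subnet = terminators[name]
        node = nodes[i % len(nodes)]            # round robin assignment
        by_peer[ipaddress.ip_address(addr)] = node
        by_subnet.append((ipaddress.ip_network(subnet), node))
    return by_peer, by_subnet


def node_for_destination(by_subnet, daddr):
    """Director 1 side: find the node that owns the tunnel for daddr."""
    addr = ipaddress.ip_address(daddr)
    for net, node in by_subnet:
        if addr in net:
            return node
    return None                                 # no tunnel known for daddr


by_peer, by_subnet = build_tables(TERMINATORS, NODES)
print(by_peer[ipaddress.ip_address("192.0.2.2")])       # node_2
print(node_for_destination(by_subnet, "10.2.0.55"))     # node_2
```

The linear scan over subnets is only for readability; with 1000+ tunnels a prefix-based structure would be the obvious replacement.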
 
> Whether the assignment of the 'ipsec terms' to the nodes is done in
> Director 1, Director 2 or another machine isn't clear to me, but it has
> to be done in one point (either DR1 or DR2 or somewhere else). Round
> robin should work for the beginning, maybe we can do some tuning here,
> when the system works.

Yes, as mentioned in my last email to Julian, I have a new algorithm
in mind that needs neither artificial nor TCP header information to be
able to load balance.

> Back to these mysterious hashes: if we make them static, we waste the
> opportunity for the 'node scheduling' - so we must get to something
> semi-dynamic :-)

You lost me here.

> We think of a big hash, that stores the IP addresses and the node for
> that IP-address. So if you have an IP, a quick search for the
> corresponding node is possible.

What do you refer to as node?
 
> A look at the sh and dh algorithms was quite helpful for me. But there
> the hash is too small, and it's static.

What do you mean by static hash?
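
One possible reading of "static", sketched in Python for discussion: a fixed-size bucket table that is filled once from the server list, so the node for a peer follows from its address alone and a single peer cannot be reassigned. This is a simplified model and not the sh/dh kernel code; the 256-bucket size, the multiplicative hash constant and the function names are assumptions.

```python
import ipaddress

TABLE_SIZE = 256  # assumed bucket count, must be a power of two here


def build_buckets(nodes, size=TABLE_SIZE):
    """Fill the buckets once by cycling through the node list."""
    return [nodes[i % len(nodes)] for i in range(size)]


def bucket_index(addr, size=TABLE_SIZE):
    """Map an IPv4 address to a bucket with a multiplicative hash."""
    value = int(ipaddress.IPv4Address(addr))
    return (value * 2654435761) & (size - 1)


def lookup(buckets, addr):
    return buckets[bucket_index(addr, len(buckets))]


buckets = build_buckets(["node_1", "node_2", "node_3"])
# The same address always hits the same bucket, hence the same node.
assert lookup(buckets, "192.0.2.10") == lookup(buckets, "192.0.2.10")
```

With only 256 buckets, many of the 1000+ tunnel peers share a bucket, and moving one of them to another node moves every peer in that bucket with it.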

> In fact I have no clue if it's possible to have such a hash which will
> probably be >50MB and I have no clue how to modify such a hash from
> 'outside', i.e.  another server.

:) Might I ask you for your definition of a hash? You cannot store
50 Mbyte in the kernel or I must have missed some OS concepts.
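
A back-of-the-envelope check of the table size, as a Python sketch. The 16 bytes per entry is purely an assumption (roughly an IPv4 address, a node id and a chain pointer); the real footprint depends on the data structure.

```python
ENTRY_BYTES = 16  # assumed size of one "IP address -> node" entry


def table_bytes(entries, bytes_per_entry=ENTRY_BYTES):
    """Memory needed for a flat table with the given number of entries."""
    return entries * bytes_per_entry


def entries_that_fit(total_bytes, bytes_per_entry=ENTRY_BYTES):
    """Number of entries that fit into a given amount of memory."""
    return total_bytes // bytes_per_entry


print(table_bytes(1000))                      # 16000 bytes for 1000 tunnels
print(entries_that_fit(50 * 1024 * 1024))     # 3276800 entries in 50 MB
```

Under that assumption, one entry per tunnel for 1000+ tunnels stays in the kilobyte range, while 50 MB would correspond to millions of entries, so the estimate only makes sense if something much finer than tunnels is being stored.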
 
> That's our 'concept' by now.
> Does it sound stupid? or realistic?

Some of it sounds great and feasible, the rest I might not understand
and as such sounds strange.

Keep on thinking and proposing, we will find a solution.

Best regards,
Roberto Nibali, ratz

-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc