Re: [PATCH] ipvs: skip ipvs snat processing when packet dst is not vip

To: Julian Anastasov <ja@xxxxxx>
Subject: Re: [PATCH] ipvs: skip ipvs snat processing when packet dst is not vip
Cc: pablo@xxxxxxxxxxxxx, netdev@xxxxxxxxxxxxxxx, lvs-devel@xxxxxxxxxxxxxxx
From: Duan Jiong <djduanjiong@xxxxxxxxx>
Date: Wed, 21 May 2025 10:01:23 +0800

On Tue, May 20, 2025 at 9:28 PM Julian Anastasov <ja@xxxxxx> wrote:
>
>
>         Hello,
>
> On Tue, 20 May 2025, Duan Jiong wrote:
>
> > 1.  setup environment
> >
> > [root@centos9s vagrant]# cat setup.sh
> > #!/bin/bash
> >
> > ip netns add server
> > ip link add svrh type veth peer name svr
> > ip link set svr netns server
> > ip link set svrh up
> > ip link set dev svrh address ee:ee:ee:ee:ee:ee
> > ip netns exec server ip link set svr up
> > ip netns exec server ip addr add 192.168.99.4/32 dev svr
> > ip netns exec server ip route add 169.254.1.1 dev svr scope link
> > ip netns exec server ip route add default via 169.254.1.1 dev svr
> > ip netns exec server ip neigh add 169.254.1.1 lladdr ee:ee:ee:ee:ee:ee dev svr nud permanent
> > ip route add 192.168.99.4/32 dev svrh
> >
> > ip netns add client
> > ip link add clih type veth peer name cli
> > ip link set cli netns client
> > ip link set clih up
> > ip link set dev clih address ee:ee:ee:ee:ee:ee
> > ip netns exec client ip link set cli up
> > ip netns exec client ip addr add 192.168.99.5/32 dev cli
> > ip netns exec client ip route add 169.254.1.1 dev cli scope link
> > ip netns exec client ip route add default via 169.254.1.1 dev cli
> > ip netns exec client ip neigh add 169.254.1.1 lladdr ee:ee:ee:ee:ee:ee dev cli nud permanent
> > ip route add 192.168.99.5/32 dev clih
> >
> > ip addr add 192.168.99.6/32 dev lo
> > ipvsadm -A -t 192.168.99.6:8080 -s rr
> > ipvsadm -a -t 192.168.99.6:8080 -r 192.168.99.4:8080 -m
> >
> > echo 1 > /proc/sys/net/ipv4/ip_forward
> > echo 1 >  /proc/sys/net/ipv4/vs/conntrack
> > iptables -t nat -A POSTROUTING -p TCP -j MASQUERADE
> >
> > 2. start server
> > ip netns exec server python -m http.server 8080
> >
> > 3. curl vip
> > ip netns exec client curl --local-port 15280 http://192.168.99.6:8080
> >
> > 4. curl rs
> > ip netns exec client curl --local-port 15280 http://192.168.99.4:8080
> >
> > Here are the ct rules for executing curl and the tcpdump capture.
> >
> > [root@centos9s vagrant]# tcpdump -s0 -nn -i clih
> > dropped privs to tcpdump
> > tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
> > listening on clih, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> > 01:50:14.328558 IP6 fe80::fc0e:fff:fef8:7c05 > ff02::2: ICMP6, router
> > solicitation, length 16
>
>         Client correctly connects to VIP:
>
> > 01:50:28.430769 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [S],
> > seq 614710449, win 64240, options [mss 1460,sackOK,TS val 2654895687
> > ecr 0,nop,wscale 7], length 0
> > 01:50:28.431026 ARP, Request who-has 192.168.99.5 tell 192.168.99.6, length 28
> > 01:50:28.431034 ARP, Reply 192.168.99.5 is-at fe:0e:0f:f8:7c:05, length 28
> > 01:50:28.431035 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.],
> > seq 3593264529, ack 614710450, win 65160, options [mss 1460,sackOK,TS
> > val 4198589191 ecr 2654895687,nop,wscale 7], length 0
> > 01:50:28.431048 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [.],
> > ack 1, win 502, options [nop,nop,TS val 2654895687 ecr 4198589191],
> > length 0
> > 01:50:28.431683 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [P.],
> > seq 1:82, ack 1, win 502, options [nop,nop,TS val 2654895688 ecr
> > 4198589191], length 81: HTTP: GET / HTTP/1.1
> > 01:50:28.431709 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [.],
> > ack 82, win 509, options [nop,nop,TS val 4198589192 ecr 2654895688],
> > length 0
> > 01:50:28.434072 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [P.],
> > seq 1:157, ack 82, win 509, options [nop,nop,TS val 4198589194 ecr
> > 2654895688], length 156: HTTP: HTTP/1.0 200 OK
> > 01:50:28.434083 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [.],
> > ack 157, win 501, options [nop,nop,TS val 2654895690 ecr 4198589194],
> > length 0
> > 01:50:28.434166 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [P.],
> > seq 157:1195, ack 82, win 509, options [nop,nop,TS val 4198589194 ecr
> > 2654895690], length 1038: HTTP
> > 01:50:28.434171 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [.],
> > ack 1195, win 501, options [nop,nop,TS val 2654895690 ecr 4198589194],
> > length 0
> > 01:50:28.434221 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [F.],
> > seq 1195, ack 82, win 509, options [nop,nop,TS val 4198589194 ecr
> > 2654895690], length 0
> > 01:50:28.434669 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [F.],
> > seq 82, ack 1196, win 501, options [nop,nop,TS val 2654895691 ecr
> > 4198589194], length 0
> > 01:50:28.434712 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [.],
> > ack 83, win 509, options [nop,nop,TS val 4198589195 ecr 2654895691],
> > length 0
>
>         But the following packet is different from your
> initial posting. Why client connects directly to the real server?

When there is a problem accessing the VIP, the first thing users are likely
to do is check whether the back-end service is working, so they connect to
the real server directly.

> Is it allowed to have two conntracks with equal reply tuple
> 192.168.99.4:8080 -> 192.168.99.6:15280 and should we support
> such kind of setups?

No, I don't think this needs to be supported. The tuples in the reply
direction should be different; the problem here is just that ipvs
mistakenly did SNAT.
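
To make the condition concrete, here is a rough userspace sketch in Python
(not kernel code) of the check under discussion: apply IPVS SNAT to a reply
only when the original destination of its conntrack entry is the VIP. The
names FlowTuple, IpvsConn and should_snat are made up for illustration, and
the addresses are the ones from the test setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowTuple:
    """One direction of a flow (hypothetical helper type)."""
    src: str
    sport: int
    dst: str
    dport: int


@dataclass(frozen=True)
class IpvsConn:
    """Simplified stand-in for an IPVS connection entry (hypothetical)."""
    caddr: str
    cport: int
    vaddr: str
    vport: int
    daddr: str
    dport: int


def should_snat(reply_pkt: FlowTuple, ct_original: FlowTuple, cp: IpvsConn) -> bool:
    """Return True only if the reply belongs to a flow that targeted the VIP.

    reply_pkt   -- the packet coming back from the real server
    ct_original -- original-direction tuple of the packet's conntrack entry
    cp          -- the IPVS connection that the reply packet matched
    """
    from_real_server = (reply_pkt.src, reply_pkt.sport) == (cp.daddr, cp.dport)
    to_client = (reply_pkt.dst, reply_pkt.dport) == (cp.caddr, cp.cport)
    orig_dst_is_vip = (ct_original.dst, ct_original.dport) == (cp.vaddr, cp.vport)
    return from_real_server and to_client and orig_dst_is_vip


# IPVS connection left over from "curl vip".
cp = IpvsConn("192.168.99.5", 15280, "192.168.99.6", 8080, "192.168.99.4", 8080)
# SYN-ACK from the real server, as seen by the IPVS output hook.
reply = FlowTuple("192.168.99.4", 8080, "192.168.99.5", 15280)

via_vip = FlowTuple("192.168.99.5", 15280, "192.168.99.6", 8080)
direct = FlowTuple("192.168.99.5", 15280, "192.168.99.4", 8080)

print(should_snat(reply, via_vip, cp))  # flow went through the VIP
print(should_snat(reply, direct, cp))   # client connected to the RS directly
```

In this model the second case is the one from "curl rs": the reply matches
the old IPVS connection by address and port, but its conntrack entry was
created for a connection to 192.168.99.4, so SNAT to the VIP is skipped.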

>
>         May be you'll need a function in ip_vs_nfct.c that ensures
> the packet is in reply direction and its original dest is the
> vaddr as you already check. You will need an alternative
> function in ip_vs.h when CONFIG_IP_VS_NFCT is not defined.
> See ip_vs_conntrack_enabled() for reference. You can not directly
> use nf_ functions in ip_vs_core.c
>
> > 01:50:33.158284 IP 192.168.99.5.15280 > 192.168.99.4.8080: Flags [S],
> > seq 886133763, win 64240, options [mss 1460,sackOK,TS val 2236082988
> > ecr 0,nop,wscale 7], length 0
> > 01:50:33.158429 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.],
> > seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS
> > val 4198593919 ecr 2236082988,nop,wscale 7], length 0
> > 01:50:33.158496 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R],
> > seq 886133764, win 0, length 0
> > 01:50:34.168530 IP 192.168.99.5.15280 > 192.168.99.4.8080: Flags [S],
> > seq 886133763, win 64240, options [mss 1460,sackOK,TS val 2236083999
> > ecr 0,nop,wscale 7], length 0
> > 01:50:34.168722 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.],
> > seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS
> > val 4198594929 ecr 2236082988,nop,wscale 7], length 0
> > 01:50:34.168754 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.],
> > seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS
> > val 4198594929 ecr 2236082988,nop,wscale 7], length 0
> > 01:50:34.168751 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R],
> > seq 886133764, win 0, length 0
> > 01:50:34.168769 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R],
> > seq 886133764, win 0, length 0
> > 01:50:36.216624 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.],
> > seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS
> > val 4198596977 ecr 2236082988,nop,wscale 7], length 0
> > 01:50:36.216626 IP 192.168.99.5.15280 > 192.168.99.4.8080: Flags [S],
> > seq 886133763, win 64240, options [mss 1460,sackOK,TS val 2236086047
> > ecr 0,nop,wscale 7], length 0
> > 01:50:36.216678 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R],
> > seq 886133764, win 0, length 0
> > 01:50:36.216690 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.],
> > seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS
> > val 4198596977 ecr 2236082988,nop,wscale 7], length 0
> > 01:50:36.216693 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R],
> > seq 886133764, win 0, length 0
> > ^C
> > 28 packets captured
> > 28 packets received by filter
> > 0 packets dropped by kernel
> > [root@centos9s vagrant]# cat^C
> > [root@centos9s vagrant]# cat /proc/net/nf_conntrack | grep 15280
> > ipv4     2 tcp      6 7 CLOSE src=192.168.99.5 dst=192.168.99.6
> > sport=15280 dport=8080 src=192.168.99.4 dst=192.168.99.6 sport=8080
> > dport=15280 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0
> > zone=0 use=2
> > ipv4     2 tcp      6 53 SYN_RECV src=192.168.99.5 dst=192.168.99.4
> > sport=15280 dport=8080 src=192.168.99.4 dst=192.168.99.6 sport=8080
> > dport=1279 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2
>
>         dport=1279 ? Not 15280 ? Is it from your test?

Yes, it is from my test. It is because I added the iptables rule earlier.
Without that rule the source port stays at 15280, and the SYN packet is
then dropped in the __nf_conntrack_confirm function.
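
A toy model of the two cases, in Python. It assumes only that an entry cannot
be confirmed while its original or reply tuple is already in the table; the
real __nf_conntrack_confirm does more than this. ConntrackTable and
reply_with_free_port are hypothetical names, and port 1279 is simply taken
from the conntrack output above, not computed.

```python
class ConntrackTable:
    """Minimal stand-in for the conntrack hash: a set of tuples in use."""

    def __init__(self):
        self._tuples = set()

    def in_use(self, tup):
        return tup in self._tuples

    def confirm(self, original, reply):
        """Insert both directions; refuse if either tuple is already taken."""
        if self.in_use(original) or self.in_use(reply):
            return False  # models the SYN being dropped at confirm time
        self._tuples.update((original, reply))
        return True


def reply_with_free_port(table, reply, candidates):
    """Model of source-port reallocation by NAT.

    Try the original port first, then each candidate, and return the first
    reply tuple that is not in use (or None if all are taken).
    """
    src, sport, dst, dport = reply
    for port in (dport, *candidates):
        candidate = (src, sport, dst, port)
        if not table.in_use(candidate):
            return candidate
    return None


orig_vip = ("192.168.99.5", 15280, "192.168.99.6", 8080)  # curl vip
orig_rs = ("192.168.99.5", 15280, "192.168.99.4", 8080)   # curl rs

# Case 1: no MASQUERADE rule, both flows expect the same reply tuple.
t1 = ConntrackTable()
same_reply = ("192.168.99.4", 8080, "192.168.99.5", 15280)
print(t1.confirm(orig_vip, same_reply))
print(t1.confirm(orig_rs, same_reply))

# Case 2: with MASQUERADE, the second flow gets a different source port.
t2 = ConntrackTable()
nat_reply = ("192.168.99.4", 8080, "192.168.99.6", 15280)
print(t2.confirm(orig_vip, nat_reply))
moved = reply_with_free_port(t2, nat_reply, [1279])
print(moved, t2.confirm(orig_rs, moved))
```

In case 1 the second confirm fails, which corresponds to the dropped SYN. In
case 2 the second entry is accepted with dport=1279 in the reply direction,
matching the second conntrack line in the output above.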

>
> Regards
>
> --
> Julian Anastasov <ja@xxxxxx>
>

