Re: [PATCH] ipvs: skip ipvs snat processing when packet dst is not vip

To: Julian Anastasov <ja@xxxxxx>
Subject: Re: [PATCH] ipvs: skip ipvs snat processing when packet dst is not vip
Cc: pablo@xxxxxxxxxxxxx, netdev@xxxxxxxxxxxxxxx, lvs-devel@xxxxxxxxxxxxxxx
From: Duan Jiong <djduanjiong@xxxxxxxxx>
Date: Tue, 20 May 2025 09:52:19 +0800
On Tue, May 20, 2025 at 4:11 AM Julian Anastasov <ja@xxxxxx> wrote:
>
>
>         Hello,
>
>         Adding lvs-devel@ to CC...
>
> On Mon, 19 May 2025, Duan Jiong wrote:
>
> > Now suppose there are two net namespaces, one is the server and
> > its ip is 192.168.99.4, the other is the client and its ip
> > is 192.168.99.5, and the other is configured with ipvs vip
> > 192.168.99.6 in the host net namespace, configuring ipvs with
> > the backend 192.168.99.5.
> >
> > Also configure
> > iptables -t nat -A POSTROUTING -p TCP -j MASQUERADE
> > to avoid packet loss when accessing with the specified
> > source port.
>
>         May be I don't quite understand why the MASQUERADE
> rule is used...

Without the NAT rule, __nf_conntrack_confirm() drops the packets because
of a conntrack tuple conflict.

My reproduction steps are below, after the quoted patch.

>
> >
> > First we use curl --local-port 15280 to specify the source port
> > to access the vip, after the request is completed again use
> > curl --local-port 15280 to specify the source port to access
> > 192.168.99.5, this time the request will always be stuck in
> > the main.
> >
> > The packet sent by the client arrives at the server without
> > any problem, but ipvs will process the packet back from the
> > server with the wrong snat for vip, and at this time, since
> > the client will directly rst after receiving the packet, the
> > client will be stuck until the vip ct rule on the host
> > times out.
> >
> > Signed-off-by: Duan Jiong <djduanjiong@xxxxxxxxx>
> > ---
> >  net/netfilter/ipvs/ip_vs_core.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/net/netfilter/ipvs/ip_vs_core.c 
> > b/net/netfilter/ipvs/ip_vs_core.c
> > index c7a8a08b7308..98abe4085a11 100644
> > --- a/net/netfilter/ipvs/ip_vs_core.c
> > +++ b/net/netfilter/ipvs/ip_vs_core.c
> > @@ -1260,6 +1260,8 @@ handle_response(int af, struct sk_buff *skb, struct 
> > ip_vs_proto_data *pd,
> >               unsigned int hooknum)
> >  {
> >       struct ip_vs_protocol *pp = pd->pp;
> > +     enum ip_conntrack_info ctinfo;
> > +     struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
> >
> >       if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ)
> >               goto after_nat;
> > @@ -1270,6 +1272,12 @@ handle_response(int af, struct sk_buff *skb, struct 
> > ip_vs_proto_data *pd,
> >               goto drop;
> >
> >       /* mangle the packet */
> > +     if (ct != NULL &&
> > +         hooknum == NF_INET_FORWARD &&
> > +         !ip_vs_addr_equal(af,
> > +                 &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u3,
> > +                 &cp->vaddr))
> > +             return NF_ACCEPT;
>
>         Such check will prevent SNAT for active FTP connections
> because their original direction is from real server to client.
> In which case ip_vs_addr_equal will see difference? When
> Netfilter creates new connection for packet from real server?
> It does not look good IPVS connection to be DNAT-ed but not
> SNAT-ed.
>
>         May be you can explain better what IPs/ports are present in
> the transferred packets.
>
> >       if (pp->snat_handler &&
> >           !SNAT_CALL(pp->snat_handler, skb, pp, cp, iph))
> >               goto drop;
> > --
> > 2.32.1 (Apple Git-133)
>
> Regards
>
> --
> Julian Anastasov <ja@xxxxxx>
>


1.  setup environment

[root@centos9s vagrant]# cat setup.sh
#!/bin/bash

ip netns add server
ip link add svrh type veth peer name svr
ip link set svr netns server
ip link set svrh up
ip link set dev svrh address ee:ee:ee:ee:ee:ee
ip netns exec server ip link set svr up
ip netns exec server ip addr add 192.168.99.4/32 dev svr
ip netns exec server ip route add 169.254.1.1 dev svr scope link
ip netns exec server ip route add default via 169.254.1.1 dev svr
ip netns exec server ip neigh add 169.254.1.1 lladdr ee:ee:ee:ee:ee:ee dev svr nud permanent
ip route add 192.168.99.4/32 dev svrh

ip netns add client
ip link add clih type veth peer name cli
ip link set cli netns client
ip link set clih up
ip link set dev clih address ee:ee:ee:ee:ee:ee
ip netns exec client ip link set cli up
ip netns exec client ip addr add 192.168.99.5/32 dev cli
ip netns exec client ip route add 169.254.1.1 dev cli scope link
ip netns exec client ip route add default via 169.254.1.1 dev cli
ip netns exec client ip neigh add 169.254.1.1 lladdr ee:ee:ee:ee:ee:ee dev cli nud permanent
ip route add 192.168.99.5/32 dev clih

ip addr add 192.168.99.6/32 dev lo
ipvsadm -A -t 192.168.99.6:8080 -s rr
ipvsadm -a -t 192.168.99.6:8080 -r 192.168.99.4:8080 -m

echo 1 > /proc/sys/net/ipv4/ip_forward
echo 1 >  /proc/sys/net/ipv4/vs/conntrack
iptables -t nat -A POSTROUTING -p TCP -j MASQUERADE

2. start server
ip netns exec server python -m http.server 8080

3. curl vip
ip netns exec client curl --local-port 15280 http://192.168.99.6:8080

4. curl rs
ip netns exec client curl --local-port 15280 http://192.168.99.4:8080
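To re-run the steps from a clean state, the setup can be undone roughly as follows. This is only a sketch and not part of the original reproduction: the run and teardown function names and the DRY_RUN variable are made up for illustration. Stop the python server first so the server namespace can actually go away. The two sysctls are left alone because their previous values are not known.

```shell
#!/bin/sh
# Hypothetical cleanup sketch for setup.sh. Set DRY_RUN=1 to print the
# commands instead of executing them (executing them needs root).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

teardown() {
    # Remove the virtual service (this also removes its real server).
    run ipvsadm -D -t 192.168.99.6:8080
    run ip addr del 192.168.99.6/32 dev lo
    run iptables -t nat -D POSTROUTING -p TCP -j MASQUERADE
    # Deleting a namespace destroys the veth end inside it, and with it
    # the host-side peer (svrh, clih) and the /32 routes on that peer.
    run ip netns del server
    run ip netns del client
}
```

Calling "DRY_RUN=1; teardown" prints the five commands for review before running them for real.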

Here are the tcpdump capture on clih and the conntrack entries after
running both curl commands.

[root@centos9s vagrant]# tcpdump -s0 -nn -i clih
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on clih, link-type EN10MB (Ethernet), snapshot length 262144 bytes
01:50:14.328558 IP6 fe80::fc0e:fff:fef8:7c05 > ff02::2: ICMP6, router solicitation, length 16
01:50:28.430769 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [S], seq 614710449, win 64240, options [mss 1460,sackOK,TS val 2654895687 ecr 0,nop,wscale 7], length 0
01:50:28.431026 ARP, Request who-has 192.168.99.5 tell 192.168.99.6, length 28
01:50:28.431034 ARP, Reply 192.168.99.5 is-at fe:0e:0f:f8:7c:05, length 28
01:50:28.431035 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.], seq 3593264529, ack 614710450, win 65160, options [mss 1460,sackOK,TS val 4198589191 ecr 2654895687,nop,wscale 7], length 0
01:50:28.431048 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [.], ack 1, win 502, options [nop,nop,TS val 2654895687 ecr 4198589191], length 0
01:50:28.431683 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [P.], seq 1:82, ack 1, win 502, options [nop,nop,TS val 2654895688 ecr 4198589191], length 81: HTTP: GET / HTTP/1.1
01:50:28.431709 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [.], ack 82, win 509, options [nop,nop,TS val 4198589192 ecr 2654895688], length 0
01:50:28.434072 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [P.], seq 1:157, ack 82, win 509, options [nop,nop,TS val 4198589194 ecr 2654895688], length 156: HTTP: HTTP/1.0 200 OK
01:50:28.434083 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [.], ack 157, win 501, options [nop,nop,TS val 2654895690 ecr 4198589194], length 0
01:50:28.434166 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [P.], seq 157:1195, ack 82, win 509, options [nop,nop,TS val 4198589194 ecr 2654895690], length 1038: HTTP
01:50:28.434171 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [.], ack 1195, win 501, options [nop,nop,TS val 2654895690 ecr 4198589194], length 0
01:50:28.434221 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [F.], seq 1195, ack 82, win 509, options [nop,nop,TS val 4198589194 ecr 2654895690], length 0
01:50:28.434669 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [F.], seq 82, ack 1196, win 501, options [nop,nop,TS val 2654895691 ecr 4198589194], length 0
01:50:28.434712 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [.], ack 83, win 509, options [nop,nop,TS val 4198589195 ecr 2654895691], length 0
01:50:33.158284 IP 192.168.99.5.15280 > 192.168.99.4.8080: Flags [S], seq 886133763, win 64240, options [mss 1460,sackOK,TS val 2236082988 ecr 0,nop,wscale 7], length 0
01:50:33.158429 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.], seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS val 4198593919 ecr 2236082988,nop,wscale 7], length 0
01:50:33.158496 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R], seq 886133764, win 0, length 0
01:50:34.168530 IP 192.168.99.5.15280 > 192.168.99.4.8080: Flags [S], seq 886133763, win 64240, options [mss 1460,sackOK,TS val 2236083999 ecr 0,nop,wscale 7], length 0
01:50:34.168722 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.], seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS val 4198594929 ecr 2236082988,nop,wscale 7], length 0
01:50:34.168754 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.], seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS val 4198594929 ecr 2236082988,nop,wscale 7], length 0
01:50:34.168751 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R], seq 886133764, win 0, length 0
01:50:34.168769 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R], seq 886133764, win 0, length 0
01:50:36.216624 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.], seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS val 4198596977 ecr 2236082988,nop,wscale 7], length 0
01:50:36.216626 IP 192.168.99.5.15280 > 192.168.99.4.8080: Flags [S], seq 886133763, win 64240, options [mss 1460,sackOK,TS val 2236086047 ecr 0,nop,wscale 7], length 0
01:50:36.216678 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R], seq 886133764, win 0, length 0
01:50:36.216690 IP 192.168.99.6.8080 > 192.168.99.5.15280: Flags [S.], seq 2329127612, ack 886133764, win 65160, options [mss 1460,sackOK,TS val 4198596977 ecr 2236082988,nop,wscale 7], length 0
01:50:36.216693 IP 192.168.99.5.15280 > 192.168.99.6.8080: Flags [R], seq 886133764, win 0, length 0
^C
28 packets captured
28 packets received by filter
0 packets dropped by kernel
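
The interesting part of the capture is the second connection: the SYN goes to 192.168.99.4 but the SYN-ACK comes back from 192.168.99.6, so the client resets it. A small filter can pick such pairs out of a longer capture. This is only a sketch: the function name synack_mismatch is made up, it expects tcpdump -nn text output with one packet per line, and it ignores sequence number wraparound.

```shell
#!/bin/sh
# Hypothetical helper: report SYN-ACKs whose source differs from the
# destination of the SYN they acknowledge.
synack_mismatch() {
    awk '
    $2 == "IP" && $6 == "Flags" && $7 == "[S]," {
        # Remember where each SYN was sent, keyed by the ack number
        # that a matching SYN-ACK must carry (ISN + 1).
        dst = $5
        sub(/:$/, "", dst)
        key = sprintf("%.0f", $9 + 1)
        syn_dst[key] = dst
    }
    $2 == "IP" && $6 == "Flags" && $7 == "[S.]," {
        ack = $11
        sub(/,$/, "", ack)
        if ((ack in syn_dst) && syn_dst[ack] != $3)
            printf "SYN sent to %s, SYN-ACK came from %s\n", syn_dst[ack], $3
    }'
}
```

It reads the capture text on stdin, for example "tcpdump -s0 -nn -i clih | synack_mismatch", and prints one line per mismatching SYN-ACK, retransmissions included.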
[root@centos9s vagrant]# cat^C
[root@centos9s vagrant]# cat /proc/net/nf_conntrack | grep 15280
ipv4     2 tcp      6 7 CLOSE src=192.168.99.5 dst=192.168.99.6 sport=15280 dport=8080 src=192.168.99.4 dst=192.168.99.6 sport=8080 dport=15280 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2
ipv4     2 tcp      6 53 SYN_RECV src=192.168.99.5 dst=192.168.99.4 sport=15280 dport=8080 src=192.168.99.4 dst=192.168.99.6 sport=8080 dport=1279 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2
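
To make the two entries easier to compare, the original and reply tuples can be checked against each other: a reply source that differs from the original destination means the destination was rewritten, and a reply destination that differs from the original source means the source was rewritten. The sketch below does only that comparison. The function name ct_nat_summary is made up, and it assumes the /proc/net/nf_conntrack format shown above with one entry per line.

```shell
#!/bin/sh
# Hypothetical helper: summarize which side of each conntrack entry was
# rewritten, by comparing the original tuple with the reply tuple.
ct_nat_summary() {
    awk '
    {
        n = 0
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^(src|dst|sport|dport)=/) {
                split($i, kv, "=")
                # The first four matches form the original tuple,
                # the next four form the reply tuple.
                dir = (n < 4) ? "o" : "r"
                t[dir kv[1]] = kv[2]
                n++
            }
        }
        # Skip entries without two full address/port tuples (e.g. ICMP).
        if (n < 8) next
        nat = ""
        if (t["rsrc"] != t["odst"] || t["rsport"] != t["odport"]) nat = nat " DNAT"
        if (t["rdst"] != t["osrc"] || t["rdport"] != t["osport"]) nat = nat " SNAT"
        if (nat == "") nat = " none"
        printf "%s:%s -> %s:%s nat:%s\n", t["osrc"], t["osport"], t["odst"], t["odport"], nat
    }'
}
```

Run as "ct_nat_summary < /proc/net/nf_conntrack". For the two entries above, read this way, the connection to the vip shows both DNAT and SNAT, while the direct connection to 192.168.99.4 shows SNAT only, with the reply expected at 192.168.99.6 port 1279.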

