lvs-devel
|
To: | Julian Anastasov <ja@xxxxxx> |
---|---|
Subject: | Re: ipvsadm: One-packet scheduling with UDP service is unstable |
Cc: | lvs-devel@xxxxxxxxxxxxxxx |
From: | Drunkard Zhang <gongfan193@xxxxxxxxx> |
Date: | Mon, 26 Aug 2013 10:07:35 +0800 |
2013/8/24 Julian Anastasov <ja@xxxxxx>:
>
> Hello,
>
> On Sat, 24 Aug 2013, Drunkard Zhang wrote:
>
>> I'm running x86_64 kernel. I compared kernel config of my two servers,
>> a big difference between them is CONFIG_PREEMPT. While CONFIG_PREEMPT
>> is disabled, trying plenty times of "ipvsadm -C && ipvsadm -R <
>> rules-with-ops" will finally succeed, but with CONFIG_PREEMPT enabled
>
> There is no "./" in above ipvsadm commands,
> I hope you put everything in scripts to make sure
> the new ipvsadm binary is used.
>
>> it's too hard to get --ops work. I will test again on my "good" server
>> another day to prove my guessing.
>
> My tests are on 32-bit UP, may be that is why I can
> not reproduce it.
>
>> Is there any good debug method for this? Tuning
>> /proc/sys/net/ipv4/vs/debug_level didn't gave me much.
>
> echo 20 > /proc/sys/net/ipv4/vs/debug_level
>
> should show something but don't do it for
> 60K packets/sec
>
>> I use keepalived to manage the ipvs configuration, but as vrrp
>> heartbeat going on and no realserver up/down, it won't interact with
>> ipvs, right? So I can temporarily modify ipvs rule via ipvsadm after
>> keepalived started, and the modified rules didn't changed as time fly,
>> so do the --ops setting.
>
> Yes, just make sure ops is present after the tests,
> in case some daemon removes the flag.
>
>> > More things to check:
>> >
>> > - if traffic stops check if some real server is hijacking the
>> > traffic from director due to ARP problem in the real server.
>> > Or explain how exactly OPS stops to work, do you see other
>> > traffic for the VIP coming to director during such problem?
>> >
>> No possibility, I configured VIP on lo of realserver.
>> for IP in $VIP; do
>>     ip addr add $IP/32 dev $VIP_NIC brd $IP
>> done
>
> Setting these flags on "lo" is useless but
> "all" values should do the job, so ARP problem is
> solved.
>
>> sysctl -q -w net.ipv4.conf.lo.arp_ignore=1
>> sysctl -q -w net.ipv4.conf.lo.arp_announce=2
>> sysctl -q -w net.ipv4.conf.all.arp_ignore=1
>> sysctl -q -w net.ipv4.conf.all.arp_announce=2
>>
>> > - Build ipvsadm with 'make HAVE_NL=0' to check if Conns=0 problem
>> > in --stats output is netlink related. This builds ipvsadm without
>> > netlink support but use this binary only to see stats, not
>> > for configuration.
>> >
>> > - show output from 'cat /proc/net/ip_vs_stats_percpu' to see
>> > the kernel's stats and rates. Note that these stats are not
>> > zeroed while stats in /proc/net/ip_vs_stats are zeroed.
>>
>> Always changing.
>
> Even when OPS does not work?
>
>> vs3 ~ # cat /proc/net/ip_vs_stats_percpu
>>        Total Incoming Outgoing         Incoming         Outgoing
>> CPU    Conns  Packets  Packets            Bytes            Bytes
>>   0 8F11751F 70455AB5        0      10AA672610D                0
>>   1 1A780554 1A780554        0        E2AB71BCA                0
>>   2        0        0        0                0                0
>>   3   BF0E0B   BF0E0B        0         4B7E409C                0
>>   4 244BAF54 244BAF54        0       2224071265                0
>>   5 2360B25C 2360B25B        0       1715A45DB3                0
>>   6        0        0        0                0                0
>>   7   E88FEF   E88FEF        0         6ECC3067                0
>>   8 1E2477AE 1E2477AE        0       12726CDE2E                0
>>   9 10BD4D97 10BD4D97        0        A35650024                0
>>   A  BE81916  BE81914        0        6D9FD6CEF                0
>>   B 4474D837 4474D836        0       3FCEC43B56                0
>>   C        0        0        0                0                0
>>   D        0        0        0                0                0
>>   E        0        0        0                0                0
>>   F        0        0        0                0                0
>>   ~ 721BAF1B 534F94AD        0      1B61556B50B                0
>>
>>      Conns/s   Pkts/s   Pkts/s          Bytes/s          Bytes/s
>>        1120F    1120F        0           C1FEB1                0
>
> So, to summarize for the both cases when OPS
> works and when OPS does not work:
>
> - you check after every rule restoring that the ops is
> present in kernel rules: cat /proc/net/ip_vs

Sure, ops is always there.

> - in both cases traffic is received on director (no ARP
> problem): tcpdump -lnnn -i $INPUT_DEVICE -c 10 $VIP

Also sure.

> - cat /proc/net/ip_vs_stats_percpu in both cases shows
> that Conns for CPU "~" (Totals) are increasing and "Conns/s"
> rate is above 0. Help me to understand the Conns=0 and CPS=0
> values in ipvsadm, they are showing 0 in both cases,
> right?

Unfortunately, Conns is stuck at a fixed number and Conns/s is zero.

vs3 ~ # cat /proc/net/ip_vs_stats_percpu
       Total Incoming Outgoing         Incoming         Outgoing
CPU    Conns  Packets  Packets            Bytes            Bytes
  0       12  3C3C98F        0        2CB498261                0
  1        0   54324B        0         4BEFFBB9                0
  2        0     50C2        0           1F37E8                0
  3        0        0        0                0                0
  4        0        0        0                0                0
  5        0    1BC7A        0          1635AF3                0
  6        0   31A7BE        0         1CDC43C7                0
  7        0   2B4E76        0         1BF498BE                0
  8        0    1D418        0           B86A6E                0
  9        0   5B49E8        0         54FD74D5                0
  A        0   75147D        0         410A95C0                0
  B        0        0        0                0                0
  C        0    BD570        0          49118C5                0
  D        0        0        0                0                0
  E        0   211948        0         138B63C5                0
  F        0   626075        0         402A470B                0
  ~       12  5D7664E        0        43FC611F8                0

     Conns/s   Pkts/s   Pkts/s          Bytes/s          Bytes/s
           0     EF93        0           AE2DD9                0

> - where do you see that OPS is not working? In
> ipvsadm -ln --stats/--rate ? Or packets do not
> reach real servers? Do you see that rates or stats
> for the real servers stop in ipvsadm output?

I'm sure OPS is not working, both from ipvsadm -ln --stats/--rate and
from iftop -i eth0 -f "udp port 514" on the real server. There is no
ingress traffic at all when InPPS/InBPS from --rate is 0, even though
OPS is set.

> May be we can enable debug for short time when
> OPS is not working:
>
> # Start debug for 10ms
> echo 20 > /proc/sys/net/ipv4/vs/debug_level
> usleep 10000
> # Stop debug
> echo 0 > /proc/sys/net/ipv4/vs/debug_level
>
> You can show me such debug. The main thing to
> understand is where in IPVS the traffic is lost, the
> debug will be helpful, it should be no more than one
> page per packet. I need debug for one packet, something
> that you see is repeated in logs. May be due to the
> destination trash mechanism something is not set properly
> after the ipvsadm -C && ipvsadm -R sequence.

Adding sleep 1 does not help: `ipvsadm -C && sleep 1 && ipvsadm -R <
rules-with-ops` behaves the same. Debug log is attached.

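The counters in the ip_vs_stats_percpu dumps above are hexadecimal, so comparing two samples by eye is error-prone. The following is a minimal sketch for pulling the total Conns value (the "~" row) out as a decimal number. It assumes the column layout shown in this message and a POSIX shell with awk; the function name `ipvs_total_conns` is made up for illustration and is not part of ipvsadm.

```shell
# Print the total Conns counter (row "~") of an ip_vs_stats_percpu dump
# as a decimal number.
# $1: file to read (optional); defaults to the live /proc file.
ipvs_total_conns() {
    # Field 2 of the row whose first field is "~" is the total Conns, in hex.
    _hex=$(awk '$1 == "~" { print $2; exit }' "${1:-/proc/net/ip_vs_stats_percpu}")
    # No totals row found: report failure instead of printing garbage.
    [ -n "$_hex" ] || return 1
    # printf accepts a 0x-prefixed constant and prints it in decimal.
    printf '%d\n' "0x$_hex"
}

# Example: sample twice, one second apart, and compare.
#   a=$(ipvs_total_conns); sleep 1; b=$(ipvs_total_conns)
#   [ "$a" = "$b" ] && echo "Conns did not move in 1s"
```

If the two samples are equal while packets are still arriving on the director, that matches the failing case described here (Conns fixed, Conns/s at 0).
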
bad-20130826-init.gz is produced by:

./ipvsadm -C
# Clear previous log
> /var/log/kern.log
sleep 3
# Start debug
echo 20 > /proc/sys/net/ipv4/vs/debug_level
./ipvsadm -R < /etc/keepalived/rules-with-ops
usleep 10000
# Stop debug
echo 0 > /proc/sys/net/ipv4/vs/debug_level

bad-20130826-running.gz and good-20130825-running.gz are produced by:

# Start debug for 10ms
echo 20 > /proc/sys/net/ipv4/vs/debug_level
usleep 10000
# Stop debug
echo 0 > /proc/sys/net/ipv4/vs/debug_level

good-20130825-running.gz was captured while OPS was working.

I noticed that after a long time of running with ops configured but not
working, around 24 hours, restoring the rules again sometimes works.
But on a freshly booted kernel it is just too hard to get ops working.

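The short capture windows above leave debug_level at 20 if the script is interrupted between the two echo commands, which is costly at the packet rates mentioned in this thread. Here is a small sketch that wraps the same steps in a function and resets the level on Ctrl-C as well. The function name `ipvs_debug_window` and its optional third argument are hypothetical, and `sleep` with a fractional argument is assumed to be available (GNU coreutils and busybox accept it; plain POSIX sleep does not), standing in for `usleep 10000`.

```shell
# Raise the IPVS debug level for a short window, then set it back to 0.
# $1: debug level (e.g. 20)
# $2: window length in seconds (e.g. 0.01 for 10 ms)
# $3: sysctl file (optional); defaults to the path used in this thread.
ipvs_debug_window() {
    _ctl=${3:-/proc/sys/net/ipv4/vs/debug_level}
    # If interrupted during the window, still switch debugging off.
    # Note: this replaces any INT/TERM traps the caller had set.
    trap 'echo 0 > "$_ctl"; exit 1' INT TERM
    echo "$1" > "$_ctl" || { trap - INT TERM; return 1; }
    sleep "$2"
    echo 0 > "$_ctl"
    trap - INT TERM
}

# Example (as root), equivalent to the 10 ms capture above:
#   ipvs_debug_window 20 0.01
```
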
bad-20130826-init.gz
bad-20130826-running.gz
good-20130825-running.gz |