Re: Destination unreachable

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: Destination unreachable
From: Roberto Nibali <ratz@xxxxxxxxxxxx>
Date: Fri, 12 Sep 2003 19:02:50 +0200
Hello,

Below is my original config file for keepalived. At the moment I cannot
use both real servers, because one of them now has the IP 80.240.228.100
and is in production. Currently I run all my tests with one server
and this config:

OK, so your test setup actually is:

LVS-NAT with the sh scheduler: VIP 80.240.228.101, service port 80, persistent, balanced over RIP 172.31.2.171 port 80. You run your connection tests from outside the cluster, and I hope your routing is OK :).

TCP  80.240.228.101:80 sh persistent 300
  -> 172.31.2.171:80              Masq    1      1          2

Hmm, looks ok to me.

I made this config with ipvsadm himself and not with keepalived, because
i would disable the vrrp. And at the moment, it seems, that the error
will occur 10 times under befor.

I'm sorry, I do not understand "... it seems, that the error will occur 10 times under befor". What do you mean?

Are there config errors for the vrrpd in my config file?

That I can't tell for sure; Alex could tell you, but at a quick glance it looked pretty good.

Here is an excerpt from my tcpdump where the error occurs:

23:10:37.020778 212.152.215.168.1588 > 80.240.228.101.http: . ack 37781
win 8576 (DF)
23:10:37.021024 80.240.228.101.http > 212.152.215.168.1588: .
45821:46357(536) ack 2304 win 9648 (DF)
23:10:37.076160 80.240.228.101.http > 62.47.21.159.1904: P 1:536(535)
ack 586 win 6435 (DF)
23:10:37.135271 212.152.215.168.1587 > 80.240.228.101.http: . ack 45180
win 8576 (DF)

This belongs to an old working connection.

23:10:37.259744 212.119.130.205.1252 > 80.240.228.101.http: S
222598881:222598881(0) win 16384 <mss 1460,nop,nop,sackOK> (DF)
23:10:37.259770 80.240.228.101 > 212.119.130.205: icmp: 80.240.228.101
tcp port http unreachable [tos 0xc0]

Could you increase the verbosity of vs_debug and capture the log statements during such an event, please? Obviously your RS' service is down or someone is standing on the network cable :).

23:10:37.303779 62.47.21.159.1904 > 80.240.228.101.http: . ack 536 win
16529 (DF)
23:10:37.402666 212.152.215.168.1587 > 80.240.228.101.http: . ack 46252
win 8576 (DF)

Old connection with existing template.

23:10:37.492264 80.240.228.28 > 224.0.0.18: VRRPv2-advertise 28: vrid=17
prio=150 intvl=1 [tos 0xc0]

I thought you had disabled keepalived?

23:10:37.670230 212.152.215.168.1587 > 80.240.228.101.http: . ack 47324
win 8576 (DF)
23:10:37.925209 212.152.215.168.1587 > 80.240.228.101.http: . ack 47860
win 8576 (DF)
23:10:38.040296 212.152.215.168.1587 > 80.240.228.101.http: . ack 48932
win 8576 (DF)
23:10:38.133066 212.152.215.168.1587 > 80.240.228.101.http: P
2096:2378(282) ack 49011 win 8497 (DF)
23:10:38.133288 80.240.228.101.http > 212.152.215.168.1587: . ack 2378
win 8576 (DF)
23:10:38.135259 80.240.228.101.http > 212.152.215.168.1587: .
49011:49547(536) ack 2378 win 8576 (DF)
23:10:38.135307 80.240.228.101.http > 212.152.215.168.1587: .
49547:50083(536) ack 2378 win 8576 (DF)
23:10:38.135355 80.240.228.101.http > 212.152.215.168.1587: .
50083:50619(536) ack 2378 win 8576 (DF)
23:10:38.135405 80.240.228.101.http > 212.152.215.168.1587: .
50619:51155(536) ack 2378 win 8576 (DF)
23:10:38.135455 80.240.228.101.http > 212.152.215.168.1587: .
51155:51691(536) ack 2378 win 8576 (DF)
23:10:38.135500 80.240.228.101.http > 212.152.215.168.1587: P
51691:52193(502) ack 2378 win 8576 (DF)
23:10:38.315307 212.152.215.168.1588 > 80.240.228.101.http: . ack 38317
win 8576 (DF)
23:10:38.315606 80.240.228.101.http > 212.152.215.168.1588: .
46357:46893(536) ack 2304 win 9648 (DF)
23:10:38.460267 212.152.215.168.1588 > 80.240.228.101.http: . ack 39389
win 8576 (DF)
23:10:38.460534 80.240.228.101.http > 212.152.215.168.1588: P
46893:47429(536) ack 2304 win 9648 (DF)
23:10:38.460583 80.240.228.101.http > 212.152.215.168.1588: .
47429:47965(536) ack 2304 win 9648 (DF)

Two old established connections with a template and using SACK; this is most certainly fine.

23:10:38.492214 80.240.228.28 > 224.0.0.18: VRRPv2-advertise 28: vrid=17
prio=150 intvl=1 [tos 0xc0]
23:10:38.715332 212.152.215.168.1588 > 80.240.228.101.http: . ack 39925
win 8576 (DF)
23:10:38.715573 80.240.228.101.http > 212.152.215.168.1588: .
47965:48501(536) ack 2304 win 9648 (DF)
23:10:38.757088 802.1d config c06f.00:0a:8a:7a:11:c0.8003 root
806f.00:0a:f4:a2:ef:40 pathcost 3023 age 2 max 20 hello 2 fdelay 15

:) Your switch talks a bit too much; maybe you should restrict your tcpdump to TCP only.
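
One way to build such a filter, as a sketch only: the interface name `eth0` is an assumption, and the expression keeps ICMP alongside TCP (slightly more than "TCP only") so that the port-unreachable replies stay visible while the 802.1d and VRRP chatter is dropped.

```python
import shlex


def tcpdump_command(vip, iface="eth0"):
    """Build a tcpdump argv limited to TCP and ICMP traffic of one VIP.

    -n disables name resolution, -i selects the interface; the last
    argument is the capture filter expression.
    """
    capture_filter = f"host {vip} and (tcp or icmp)"
    return ["tcpdump", "-n", "-i", iface, capture_filter]


# Print a copy-and-paste friendly command line
print(shlex.join(tcpdump_command("80.240.228.101")))
```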

23:10:38.840007 212.152.215.168.1588 > 80.240.228.101.http: . ack 40997
win 8576 (DF)

No problem here.

So from what I see, there is only one client that doesn't get a connection, because according to the director the service seems to be down. This is either because the RIP has been removed (dynamically, maybe) from the VIP or (which would be extremely nasty) because you have a weird timing problem with regard to the TCP state table.
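
To check the first possibility, one could poll the IPVS table while the test runs and note when the RIP disappears. A minimal sketch, assuming the text comes from `ipvsadm -L -n` in the two-line-per-service layout shown earlier; the function names are hypothetical, and firewall-mark (FWM) services are ignored.

```python
def parse_ipvsadm(text):
    """Map (protocol, 'vip:port') to a list of real-server entries."""
    services = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] in ("TCP", "UDP") and len(fields) >= 2:
            # Virtual service line, e.g. "TCP 80.240.228.101:80 sh persistent 300"
            current = (fields[0], fields[1])
            services[current] = []
        elif (fields[0] == "->" and current is not None
              and len(fields) >= 6 and fields[3].isdigit()):
            # Real server line: address, forwarding method, weight, counters
            services[current].append({
                "address": fields[1],
                "forward": fields[2],
                "weight": int(fields[3]),
                "active": int(fields[4]),
                "inactive": int(fields[5]),
            })
    return services


def has_real_server(services, proto, vip, rip):
    """True if rip ('ip:port') is currently listed under the service."""
    return any(rs["address"] == rip for rs in services.get((proto, vip), []))
```

Calling `has_real_server(parse_ipvsadm(output), "TCP", "80.240.228.101:80", "172.31.2.171:80")` once a second and logging the result with a timestamp would show whether the real server drops out around the time of the ICMP reply.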

Are there kernel parameters which I can/must modify?

I hope not. I need a snapshot of the debug output of LVS when this error occurs. WARNING: increasing vs_debug will flood your logs quite a bit, and it might be difficult to find the exact entry, so I suggest you do your tests with only one VIP for now.
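
Since the extra logging is only wanted for the duration of a test, here is a minimal sketch that raises the level temporarily and restores the old value afterwards. It assumes the `debug_level` entry under /proc/sys/net/ipv4/vs exists (as in the listing below) and that the script runs as root; the helper name is made up, and the level to use is left to the caller.

```python
from contextlib import contextmanager
from pathlib import Path

VS_DIR = Path("/proc/sys/net/ipv4/vs")


@contextmanager
def raised_debug_level(level, vs_dir=VS_DIR):
    """Set debug_level for the duration of a with-block.

    Yields the previous level and writes it back on exit, even if
    the block raises.
    """
    path = Path(vs_dir) / "debug_level"
    previous = path.read_text().strip()
    path.write_text(f"{level}\n")
    try:
        yield int(previous)
    finally:
        path.write_text(previous + "\n")
```

Usage would look like `with raised_debug_level(6): run_test()`, where both the value 6 and `run_test` are placeholders for whatever level and test procedure are chosen.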

Currently I am looking for another NIC, such as a 3Com, because I am not sure
whether the Intel cards work correctly.

The Intel NICs should be working fine with 2.4.21 (that's what you used, IIRC) as long as you use the eepro100 driver provided by Intel.

Here are my values under /proc/sys/net/ipv4/vs:
*** am_droprate ***
10
*** amemthresh ***
1024
*** cache_bypass ***
0
*** debug_level ***
1

Do you spot anything unusual in your kernlog/messages file?

virtual_server 80.240.228.101 80 {
  delay_loop 3
  lb_algo sh

Out of curiosity (not that it should really help), could you also try the rr scheduler, please?

  lb_kind NAT
  persistence_timeout 300
  protocol TCP

  real_server 172.31.2.181 80 {

Hmmm, so you use a different RIP when you start your tests with keepalived as opposed to setting the service up with ipvsadm directly?

    weight 1
    HTTP_GET {
      url {
        path /cgi-bin/check_web.pl
        digest a2299f35097ad6794a9983b39e182f15
      }
      connect_port 80
      connect_timeout 60
      nb_get_retry 2
      delay_before_retry 10
    }
  }
  real_server 172.31.2.182 80 {

ditto?

    weight 1
    HTTP_GET {
      url {
        path /cgi-bin/check_web.pl
        digest a2299f35097ad6794a9983b39e182f15
      }
      connect_port 80
      connect_timeout 60
      nb_get_retry 2
      delay_before_retry 10
    }
  }
}

Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc
