Hi Horms, hi Joe
First, thank you for your pointers!
> As Joe mentioned in a subsequent email, being able to move LVS from one
> chain to another is something that we are interested in. In particular
> I am of the belief that the FORWARD chain would be a much more logical
> home than LOCAL_IN as in some ways it would allow LVS to act more like a
> router than a proxy (not that it is a proxy, but it kind of behaves like
> one in some ways because of its home on LOCAL_IN).
I chose PRE_ROUTING because packets which LVS decides have to go to
Local need to pass through PRE_ROUTING again, so that they can be
DNATed to the local proxy.
+	NF_HOOK_THRESH(PF_INET, NF_IP_PRE_ROUTING, skb, skb->dev,
+		       NULL, ip_rercv, NF_IP_PRI_MANGLE);
+	return NF_STOLEN;
within the xmit function. This time I really will attach the patch, so
everyone can see what I am talking about :)
Or does it make no difference if I retransmit them to PRE_ROUTING
from within FORWARD? Maybe the routing decision would then have to be
dropped?
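
To illustrate what the threshold argument of NF_HOOK_THRESH() does when a packet is re-injected, here is a rough userspace sketch. It is only a model, not kernel code: the struct, the chain contents and the function name are made up for illustration. The only thing taken from the kernel is the rule, as far as I understand it, that hooks whose priority is lower than the threshold are skipped, and the usual IPv4 priority values (conntrack -200, mangle -150, destination NAT -100).

```c
#include <assert.h>
#include <stddef.h>

/* Priority values as used for the IPv4 hooks in 2.6 kernels. */
enum {
	PRI_CONNTRACK = -200,
	PRI_MANGLE    = -150,
	PRI_NAT_DST   = -100
};

/* Hypothetical stand-in for a registered hook; not a kernel structure. */
struct model_hook {
	const char *name;
	int priority;
};

/* Assumed PRE_ROUTING chain, sorted by ascending priority. */
static const struct model_hook prerouting[] = {
	{ "conntrack", PRI_CONNTRACK },
	{ "mangle",    PRI_MANGLE },
	{ "nat_dst",   PRI_NAT_DST },
};

#define N_HOOKS (sizeof(prerouting) / sizeof(prerouting[0]))

/*
 * Model of a thresholded chain traversal: every hook whose priority is
 * below thresh is skipped, all others run in order. The names of the
 * hooks that would run are stored in out[]; the count is returned.
 */
static size_t model_hook_thresh(int thresh, const char **out, size_t out_len)
{
	size_t n = 0;

	for (size_t i = 0; i < N_HOOKS; i++) {
		if (prerouting[i].priority < thresh)
			continue;	/* below the threshold: not called */
		assert(n < out_len);
		out[n++] = prerouting[i].name;
	}
	return n;
}
```

With PRI_MANGLE as threshold the model runs mangle and nat_dst but not conntrack, which would match the intention of DNATing the re-injected packet without passing it through connection tracking a second time.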
> As I recall, I did try moving the code to the FORWARD chain a long time
> ago. I believe that the change was very similar to the LOCAL_IN to
> PRE_ROUTING snippet that you have below. I'm not sure that I ever posted
> the change, as I never tested it thoroughly. So perhaps it too broke
> occasionally. In any case, this was a long time ago, and the kernel
Hmm, for me the move from LOCAL_IN to PRE_ROUTING is quite stable, as
far as I can tell. I had a kernel with only that patch running for a
while. Of course, it can then forward only to slaves, not to Local.
So I think the problem is the retransmit code.
> As for debugging your problem. Providing the oops message - if any -
> might help. Hopefully there is a stack trace in there and that should
> start to point to where the problem is.
The problem is that after the kernel panic I cannot do anything. The
SysRq keys do not work, and neither does the serial console. So I am not
able to save the trace in time, or even see it before the monitor
I have not found any procedure to reproduce it at a given time. It
happens only occasionally, after some hours of load.
One time I had the chance to take a picture of a stack trace. Here are
<0>Kernel panic - not syncing: Fatal exception in interrupt
Well, although I can't understand why I never see ip_rercv, I nevertheless
think it must be the retransmit code, because of the multiple
At the moment I am running a test kernel in which I replaced the
NF_HOOK_THRESH() call within ip_vs_loop_xmit() with a plain call to
ip_rcv(), just to see whether the checks in ip_rcv() before the final
NF_HOOK() are needed when I retransmit.
But to be honest I do not really know exactly what I am doing, so
maybe that change is ridiculous too.
That kernel has now been running for 15 hours. I am waiting for the next
crash :)
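
For reference, this is roughly the kind of header validation meant by "the checks within ip_rcv". It is a userspace sketch under my own assumptions, not the kernel source: the function names are invented, and it models only the version, header length, checksum and total length tests. Other things ip_rcv() does, such as dropping packets addressed to other hosts or handling shared skbs, are left out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement checksum over the IPv4 header, including the
 * checksum field itself. Returns 0 for an intact header.
 */
static uint16_t model_ip_checksum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)(hdr[i] << 8 | hdr[i + 1]);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Hypothetical model of the sanity checks: returns 1 if the packet of
 * len bytes starts with a plausible IPv4 header, 0 otherwise.
 */
static int model_ip_header_ok(const uint8_t *pkt, size_t len)
{
	if (len < 20)
		return 0;	/* shorter than a minimal header */

	unsigned int version = pkt[0] >> 4;
	unsigned int ihl = pkt[0] & 0x0f;

	if (version != 4 || ihl < 5)
		return 0;	/* not IPv4, or header length too small */

	size_t hlen = (size_t)ihl * 4;

	if (len < hlen)
		return 0;	/* options would run past the buffer */
	if (model_ip_checksum(pkt, hlen) != 0)
		return 0;	/* header checksum mismatch */

	size_t tot_len = (size_t)pkt[2] << 8 | pkt[3];

	if (tot_len < hlen || len < tot_len)
		return 0;	/* inconsistent or truncated packet */
	return 1;
}
```

If a packet was modified before being re-injected (for example a rewritten address without a checksum update), a check of this kind would reject it, whereas a direct NF_HOOK_THRESH() call would pass it on unchecked.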
Before, I added
around the part where I reset the initialized variable of the conntrack
item. I am unsure whether this is really needed, but it had no effect. The
kernel still crashed.
> If your kernel is compiled with IP_VS_DEBUG then you can enable
Well, I do this. It was very helpful for seeing where the packets go, but
it logs nothing suspicious before the crash.
> Also, if you are doing development work, I do recommend considering
> using a more up to date kernel. Perhaps the latest rc kernel, currently
> 2.6.23-rc6. I'm not suggesting that you necessarily drop this into
> production. But for development work, it is much easier to work with
> the kernel guys if you are on the same page as them.
Sounds reasonable, I will do that :)
:: e n d i a n
:: open source - open minds
:: peter warasin
:: http://www.endian.com :: peter at endian.com