Re: activeconns * weight overflowing 32-bit int

To: Julian Anastasov <ja@xxxxxx>
Subject: Re: activeconns * weight overflowing 32-bit int
Cc: lvs-devel@xxxxxxxxxxxxxxx, Changli Gao <xiaosuo@xxxxxxxxx>
From: Simon Kirby <sim@xxxxxxxxxx>
Date: Mon, 5 Aug 2013 19:41:50 -0700
Hello!

On Fri, May 24, 2013 at 11:11:35AM +0300, Julian Anastasov wrote:

> On Thu, 23 May 2013, Simon Kirby wrote:
> 
> > >   Last time I also checked the assembler output for x86-32
> > > and it looked good. I used 'make net/netfilter/ipvs/ip_vs_wlc.s'
> > > to generate asm output, note the 's' extension.
> > > 
> > >   Maybe the problem in your ipvs.c file comes from the
> > > fact that __u64 is unsigned long long on 32-bit platforms.
> > > I changed the code to use 'unsigned long' as follows:
> > > 
> > > #if 0
> > > typedef unsigned long long __u64;
> > > #else
> > > typedef unsigned long __u64;
> > > #endif
> > > 
> > >   and the x86-32 platform works correctly.
> > > x86-64 works for both cases: 'unsigned long long' and
> > > 'unsigned long', but x86-32 generates many mul operations
> > > for the 'unsigned long long' case, which is not used in
> > > the kernel. That is why I didn't notice the problem.
> > > 
> > >   So, I think we should be safe by adding
> > > (__u64) without any (__u32); in the next days I'll check
> > > the asm output again. Maybe (__u32) before weight ensures
> > > a 32x32 multiply, but it should not be needed.
> > 
> > Hmm, I was comparing atomic_t being s32 versus u32, not u64 being u64. :)
> > Anyway, the .s results are much easier to read, and (closer to) reality!
> > I did a comparison with (__u64)loh * atomic_read(&dest->weight) versus
> > (__u64)loh * (__u32)atomic_read(&dest->weight) on both arches and uploaded
> > them to http://0x.ca/sim/ref/3.9-ipvs/. It's not a huge difference, but I
> > prefer the shorter/faster version. ;)
> 
>       I now see why your patch shows a difference compared
> to my tests a month ago. This change is the culprit:
> 
> -       int loh, doh;
> +       unsigned int loh, doh;
> 
>       It effectively changes the operation from:
> 
> (__u64/__s64) int * int
> 
>       into
> 
> (__u64) unsigned int * int
> 
>       that is why you fix it by using __u32:
> 
> (__u64) unsigned int * unsigned int
> 
>       so that both operands have the same 4-byte signedness.
> 
>       I think we should keep loh and doh as int; both of
> the following solutions should generate a 32x32 multiply:
> 
> 1. same as my first email:
> 
> int loh, doh;
> 
> (__u64/__s64) loh * atomic_read(&dest->weight)
> 
>       In this case I see only one difference between
> __u64 and __s64:
> 
> -       jb      .L41    #,
> -       ja      .L79    #,
> +       jl      .L41    #,
> +       jg      .L79    #,
> 
> 2. Your patch:
> 
> unsigned int loh, doh;
> 
> (__u64) loh * (__u32) atomic_read(&dest->weight)
> or
> (__s64) loh * (__u32) atomic_read(&dest->weight)

Did you mean "(__s64) loh * (__s32) atomic_read(&dest->weight)" here?

If not, the results for me on GCC 4.7.2 were what I posted at
http://0x.ca/sim/ref/3.9-ipvs/.
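
For anyone who wants to reproduce that comparison without building the
whole tree, it boils down to something like this standalone sketch (my
own reduction with local typedefs and made-up function names, not the
actual ip_vs_wlc.c code). Compile with "gcc -O2 -m32 -S cast-test.c"
and diff the two functions in the .s output:

/* cast-test.c: atomic_read() is modelled as a plain int value,
 * which is all it amounts to for code generation here. */
typedef unsigned long long __u64;
typedef unsigned int __u32;

/* variant A: the int weight is promoted by the usual arithmetic
 * conversions -- mixed 32-bit signedness, so i386 needs extra
 * multiply and fix-up instructions */
int variant_a(unsigned int loh, unsigned int doh, int w1, int w2)
{
	return (__u64)loh * w1 > (__u64)doh * w2;
}

/* variant B: both operands forced to u32 before widening, so the
 * compiler emits a single mull per product */
int variant_b(unsigned int loh, unsigned int doh, int w1, int w2)
{
	return (__u64)loh * (__u32)w1 > (__u64)doh * (__u32)w2;
}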

>       Both solutions generate code that differs only
> in imul vs. mul. On the internet I see that imul is
> preferred over and faster than mul. That is why I prefer
> solution 1; it has fewer casts.
> 
>       So, I think you can change your patch as follows:
> 
> 1. Use int for loh, doh. Note that some schedulers
> use 'unsigned int' and should be patched for this
> definition: NQ, SED, WLC
> 
> 2. Use (__u64) prefix only, no (__u32) before atomic_read:
> LBLC, LBLCR, NQ, SED, WLC
> 
>       (__u64) loh * atomic_read(&dest->weight) ...
>       (__u64) doh * ...
> 
> 3. Explain in the commit message that we find
> result64=int32*int32 faster than result64=uint32*uint32
> and far better than a 64x64 multiply, which is
> a bit slower on older CPUs.
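
(For concreteness, I read that recipe as roughly the following change to
ip_vs_wlc.c, with NQ, SED, LBLC and LBLCR getting the same treatment --
an untested sketch from memory, not the actual patch:)

-	unsigned int loh, doh;
+	int loh, doh;
 ...
-		if (loh * atomic_read(&dest->weight) >
-		    doh * atomic_read(&least->weight)) {
+		if ((__u64)loh * atomic_read(&dest->weight) >
+		    (__u64)doh * atomic_read(&least->weight)) {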

I found that u64*u32 was faster than u64*s32, but I didn't check s64*s32.
I checked now and found that u64*u32 and s64*s32 produce the same number
of instructions on i386, with just different signedness tests. Both
actually use imul, since it's the comparison afterward that cares about
signedness.
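
For reference, your solution 1 reduces to something like this
(standalone sketch again, names are mine); with "gcc -O2 -m32 -S" each
product compiles to a single one-operand imull, and only the
conditional jumps after the compare differ between the __s64 and __u64
versions:

typedef long long __s64;

/* solution 1: int operands, a cast on one side only; the other
 * operand is sign-extended by the usual arithmetic conversions,
 * so the compiler can use one widening imull per product */
int cmp_weights(int loh, int doh, int w1, int w2)
{
	return (__s64)loh * w1 > (__s64)doh * w2;
}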

But what actually makes sense? Do negative weights ever make sense?

Simon-
--
To unsubscribe from this list: send the line "unsubscribe lvs-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
