On Thu, May 23, 2013 at 11:44:22PM +0300, Julian Anastasov wrote:
> Hello,
>
> On Thu, 23 May 2013, Simon Kirby wrote:
>
> > Hello!
> >
> > Yes, see: http://0x.ca/sim/ref/3.9-ipvs/
> >
> > The case of just (__u64) on i386 looks not very good, but making the
> > weight also unsigned (__u32) seems to improve things. I set up a test
>
> Hm, only sizeof(long int) should differ between
> 32-bit and 64-bit platforms, atomic_t is always int, int
> is returned from atomic_read and int is always 4 bytes.
It's always 4 bytes, but gcc does more stuff with u64 * s32 than it does
with u64 * s32. atomic_t seems to be int, so s32 on either arch. :)
For the u64 * u32 case, it is using the single operand form of imul which
puts a 64-bit result in eax:edx, and then does an expanded unsigned
comparison, as expected. The u64 * s32 case uses a mix of 2 imul and 2
mul and some other things.
> > harness (ipvs.c) and disassembled i386 and amd64 compiler outputs for
> > both. The only reason I haven't submitted it yet is that I haven't yet
> > confirmed that it fixes our problem in production, though it did seem to
> > work in testing. Will follow-up shortly.
>
> Last time I also checked the assembler output from x86-32
> and it looked good. I used 'make net/netfilter/ipvs/ip_vs_wlc.s'
> to generate asm output, note the 's' extension.
>
> May be problem in your ipvs.c file comes from the
> fact that __u64 is unsigned long long for 32-bit platforms.
> I changed the code to use 'unsigned long' as follows:
>
> #if 0
> typedef unsigned long long __u64;
> #else
> typedef unsigned long __u64;
> #endif
>
> and the x86-32 platform works correctly.
> x86-64 works for both cases: 'unsigned long long' and
> 'unsigned long' but x86-32 generates many mul operations
> for the 'unsigned long long' case which is not used in
> the kernel. That is why I didn't noticed such problem.
>
> So, I think we should be safe by adding
> (__u64) without any (__u32), next days I'll check again
> the asm output. May be (__u32) before weight ensures
> 32x32 multiply but it should not be needed.
Hmm, I was comparing atomic_t being s32 versus u32, not u64 being u64. :)
Anyway, the .s results are much easier to read, and (closer to) reality!
I did a comparison with (__u64)loh * atomic_read(dest->weight) versus
(__u64)loh * (__u32)atomic_read(dest->weight) on both arches and uploaded
them to http://0x.ca/sim/ref/3.9-ipvs/. It's not a huge difference, but I
prefer the shorter/faster version. ;)
Simon-
--
To unsubscribe from this list: send the line "unsubscribe lvs-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
|