Hello,
On Mon, 5 Aug 2013, Simon Kirby wrote:
> > 1. same as my first email:
> >
> > int loh, doh;
> >
> > (__u64/__s64) loh * atomic_read(&dest->weight)
> >
> > In this case I see only one difference between
> > __u64 and __s64:
> >
> > - jb .L41 #,
> > - ja .L79 #,
> > + jl .L41 #,
> > + jg .L79 #,
> >
> > 2. Your patch:
> >
> > unsigned int loh, doh;
> >
> > (__u64) loh * (__u32) atomic_read(&dest->weight)
> > or
> > (__s64) loh * (__u32) atomic_read(&dest->weight)
>
> Did you mean here "(__s64) loh * (__s32) atomic_read(&dest->weight)"?
>
> If not, the results for me on GCC 4.7.2 were what I posted at
> http://0x.ca/sim/ref/3.9-ipvs/.
Reading your reply, I see that the key issue you are
missing here is that the type of loh/doh matters: it should
match the return type of atomic_read(), including its signedness.
As atomic_t holds an int, we prefer loh and doh to be int.
The __u64/__s64 cast before loh/doh does not matter at all.
If the two operands do not have the same signedness, the
32x32->64 multiply optimization is not applied.
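For illustration, here is a minimal user-space sketch of that
point (this is not IPVS code; int64_t/uint64_t stand in for the
kernel's __s64/__u64, a plain int stands in for the value returned
by atomic_read, and the function names are made up):

#include <stdint.h>

/* Both 32-bit operands are signed int, same as atomic_t's value,
 * so the compiler can use a single 32x32->64 widening imul on
 * i386. Whether the 64-bit cast is signed or unsigned only changes
 * the later comparisons (jl/jg vs jb/ja), not the multiply itself. */
int64_t weight_same_signedness(int loh, int weight)
{
	return (int64_t)loh * weight;
}

/* Operands with different signedness: each one must be widened to
 * 64 bits before multiplying, so the single-instruction shortcut
 * is not applied and more multiply/add instructions are needed. */
int64_t weight_mixed_signedness(unsigned int loh, int weight)
{
	return (int64_t)loh * weight;
}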
> > Both solutions generate code that differs only
> > in imul vs. mul. On the internet I see that imul is
> > preferred/faster than mul. That is why I prefer solution 1;
> > it has fewer casts.
> >
> > So, I think you can change your patch as follows:
> >
> > 1. Use int for loh, doh. Note that some schedulers
> > use 'unsigned int' and should be patched to use this
> > definition: NQ, SED, WLC
> >
> > 2. Use (__u64) prefix only, no (__u32) before atomic_read:
> > LBLC, LBLCR, NQ, SED, WLC
> >
> > (__u64) loh * atomic_read(&dest->weight) ...
> > (__u64) doh * ...
> >
> > 3. Explain in the commit message that we find
> > result64 = int32 * int32 faster than result64 = uint32 * uint32,
> > and far better than using a 64*64 multiply, which is
> > a bit slower on older CPUs.
>
> I found that u64*u32 was faster than u64*s32, but I didn't check s64*s32.
It is faster because your loh/doh are u32, so a single mul is
generated, while for u64*s32 we have u64 = u32 * s32, which
obviously cannot be implemented with a single imul/mul. Our goal
is a single imul.
> I checked now and found that u64*u32 and s64*s32 produce the same number
> of output instructions on i386, with just different signedness tests.
> Both actually use imul, since it is only the comparison afterwards that
> cares about signedness.
IIRC, (u64) u32 * u32 uses mul if you compile
with optimizations (-O).
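	To check this, one can compile a pair of helpers like the
ones below with something like gcc -O2 -m32 -S and look for mull
vs imull in the assembly (again a user-space sketch with made-up
names, not the scheduler code itself):

#include <stdint.h>

/* (u64) u32 * u32: both operands unsigned, so the widening
 * multiply becomes a single unsigned mul when optimizing. */
uint64_t widen_unsigned(unsigned int a, unsigned int b)
{
	return (uint64_t)a * b;
}

/* (s64) s32 * s32: both operands signed, so a single imul is
 * used, which is the form preferred above. */
int64_t widen_signed(int a, int b)
{
	return (int64_t)a * b;
}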
> But what actually makes sense? Do negative weights ever make sense?
The __u64/__s64 cast does not matter because both
operands hold positive values. As a result, this solution
looks better:

int loh, doh;

(__s64) loh * atomic_read(&dest->weight)

because:
- both operands are 'int', so no extra cast is needed before atomic_read
- 'int', not 'unsigned int', because imul is better than mul
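	To make it concrete, here is a hypothetical user-space
sketch modeled on the WLC-style comparison (atomic_t and
atomic_read are mocked as a plain int load, and the struct and
function names are made up; the real scheduler code differs):

#include <stdint.h>
#include <stdio.h>

typedef struct { int counter; } mock_atomic_t;

static inline int mock_atomic_read(const mock_atomic_t *v)
{
	return v->counter;
}

struct mock_dest {
	mock_atomic_t weight;
};

/* Preferred form: loh/doh are int, matching the int returned by
 * the atomic read, and only a single 64-bit cast widens the
 * product, so each side can be computed with one imul on i386. */
static int prefer_dest(int loh, int doh,
		       const struct mock_dest *dest,
		       const struct mock_dest *least)
{
	return (int64_t)loh * mock_atomic_read(&dest->weight) >
	       (int64_t)doh * mock_atomic_read(&least->weight);
}

int main(void)
{
	struct mock_dest a = { { 3 } }, b = { { 2 } };

	printf("%d\n", prefer_dest(10, 20, &a, &b));
	return 0;
}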
Regards
--
Julian Anastasov <ja@xxxxxx>