On Mon, 23 Feb 2015, Alex Gartrell wrote:
> We have a custom scheduler that is failing some atomic allocations
You can use GFP_KERNEL from add_dest method.
> sometimes, and it was made pretty hard to discover by the fact that we more
> or less drop the return code on the floor. This is especially nasty, as
> the ipvsadm -ln invocation shows the scheduler to be as we wish it were
> instead of as it is.
> This is an RFC because I have no idea what the best way to proceed with the
> upd_dest invocation is. I could clone all of the state and simply restore
> it, but I figured one of you might have a better idea.
Can a new create_dest(svc, bool for_update) method
solve such problem? Such modifications are serialized, so
create_dest can attach new structure to sched_data just before
the add_dest/upd_dest call. IPVS does not clone dests on
update but scheduler may need this for RCU update.
create_dest should be called early in __ip_vs_update_dest.
Note that kfree(dest) is not enough, see ip_vs_dest_free(),
more fields may be allocated before __ip_vs_update_dest is
dest_p is not needed, this patch is ok
In the __ip_vs_update_dest change you are missing
that ip_vs_rs_hash was called. In fact, I think we
should call __ip_vs_unlink_dest(svc, dest, 0) and
__ip_vs_del_dest(svc->net, dest, false)
from ip_vs_add_dest() when both __ip_vs_update_dest(, 1)
and ip_vs_new_dest() fail. This should work because
add_dest call is the last thing we do on ADD, so
the DEL operation but without calling del_dest should
be ok. Such change invalidates Patch 1.
Why above solution is preferred:
At first look, scheduler that can return error from add_dest
should not provide this dest to RCU readers via its schedule
method and kfree(dest) looks ok.
The problem is with the ip_vs_has_real_service() call
and the ip_vs_rs_hash() call we do before add_dest,
dest still can be accessed by RCU readers.
For this reason we can not just kfree(dest).
Not needed if patch 3 is changed
As for del_dest result, failure should not happen,
we should be able to flush dests on module cleanup. I think, there
should not be any allocations in this case.
Julian Anastasov <ja@xxxxxx>
To unsubscribe from this list: send the line "unsubscribe lvs-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html