LVS
lvs-devel
Google
 
Web LinuxVirtualServer.org

Re: Re: [RFC PATCH 1/1] IPVS netns shutdown/startup dead-lock

To: "Julian Anastasov" <ja@xxxxxx>
Subject: Re: Re: [RFC PATCH 1/1] IPVS netns shutdown/startup dead-lock
Cc: "Hans Schillstrom" <hans.schillstrom@xxxxxxxxxxxx>, horms@xxxxxxxxxxxx, wensong@xxxxxxxxxxxx, lvs-devel@xxxxxxxxxxxxxxx, netdev@xxxxxxxxxxxxxxx, netfilter-devel@xxxxxxxxxxxxxxx
From: "Hans Schillstrom" <hans@xxxxxxxxxxxxxxx>
Date: Tue, 14 Jun 2011 00:09:26 +0200 (CEST)
>---- Original Message ----
>From: Julian Anastasov <ja@xxxxxx>
>To: "Hans Schillstrom" <hans.schillstrom@xxxxxxxxxxxx>
>Cc: horms@xxxxxxxxxxxx, wensong@xxxxxxxxxxxx, lvs-devel@xxxxxxxxxxxxxxx, 
>netdev@xxxxxxxxxxxxxxx, netfilter-devel@xxxxxxxxxxxxxxx, hans@xxxxxxxxxxxxxxx
>Sent: Mon, Jun 13, 2011, 23:23 PM
>Subject: Re: [RFC PATCH 1/1] IPVS netns shutdown/startup dead-lock
>
>Hello,
>
>On Mon, 13 Jun 2011, Hans Schillstrom wrote:
>
>> ip_vs_mutext is used by both netns shutdown code and startup
>> and both implicit uses sk_lock-AF_INET mutex.
>> 
>> cleanup CPU-1         startup CPU-2
>> ip_vs_dst_event()     ip_vs_genl_set_cmd()
>>  sk_lock-AF_INET     __ip_vs_mutex
>>                      sk_lock-AF_INET
>> __ip_vs_mutex
>> * DEAD LOCK *
>
>       So, sk_lock-AF_INET is locked before calling
>ip_vs_dst_event ? Do you have a backtrace for this case?

Yes plenty this one is with lockdep

Chain exists of:
  rtnl_mutex --> __ip_vs_mutex --> sk_lock-AF_INET

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(sk_lock-AF_INET);
                               lock(__ip_vs_mutex);
                               lock(sk_lock-AF_INET);
  lock(rtnl_mutex);

 *** DEADLOCK ***

3 locks held by ipvsadm/993:
 #0:  (genl_mutex){+.+.+.}, at: [<ffffffff812edc52>] genl_lock+0x17/0x19
 #1:  (__ip_vs_mutex){+.+.+.}, at: [<ffffffff81307dcb>] 
ip_vs_genl_set_cmd+0xe1/0x3a3
 #2:  (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff8130ffc1>] 
start_sync_thread+0x3ec/0x5ff


>I assume your patch is tested to fix the problem ?

Yes.

>
>> This can be solved by have the ip_vs_mutex per netns
>> or avid locking when starting/stoping sync-threads.
>> i.e. just add a starting/stoping flag.
>
>       sk_release_kernel is called in thread
>context, so ip_vs_mutex is not involved there. We
>have a problem only with start_sync_thread, right?

Yes the socket operations

>
>> ip_vs_mutex per name-space seems to be a more future proof solution.
>
>       Global mutex protects some global lists such as
>virtual services. If your patch works, better way to fix this problem
>is to use some new mutex. May be we can move the IPVS_CMD_NEW_DAEMON,
>IPVS_CMD_DEL_DAEMON and IP_VS_SO_GET_DAEMON code before the
>__ip_vs_mutex locking. This mutex should be used for start_sync_thread,
>stop_sync_thread, ip_vs_genl_dump_daemons and IP_VS_SO_GET_DAEMON.
>For example, ip_vs_sync_mutex.

I think we should avoid global mutexes as a rule of tumb, 
because it's realy hard to keep track of all possible cases 
that can occur when multiple netns is alive and/or goes up and down.

There might be more suprises while a netns exits (in terms of locks)...
my gut feeling is, avoid global locks as long as possible.

>
>       Note that __ip_vs_sync_cleanup is missing a
>__ip_vs_mutex lock. We have to use the new mutex there.

OK

>
>> Which one should be used ?
>
>       For now __ip_vs_mutex should be global ...

I do agree, but in the long term I vote for mutex per netns.
 
Thanks a lot Julian

Regards
Hans

--
To unsubscribe from this list: send the line "unsubscribe lvs-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

<Prev in Thread] Current Thread [Next in Thread>