
Threaded Health Checkers design

To: jacob.rief@xxxxxxxxxxxx
Subject: Threaded Health Checkers design
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Cc: keepalived-devel@xxxxxxxxxxxxxxxxxxxxx
From: Alexandre Cassen <alexandre.cassen@xxxxxxxxxx>
Date: Sun, 29 Feb 2004 00:08:03 +0100 (CET)
Hi Jacob,

I have given some thought to the THC design. First of all, this is a
nice feature, so thanks for investigating it and dedicating time to
this task.

While sketching this out on paper, I realized that we can extend the
THC design to cover both Keepalived features, i.e. health checking as
well as VRRP! :) That would be nice.

1. Assessment: We need a quicker way to develop new checkers, so there is
   a need for a plugin framework. This is where THC comes in. On the other
   hand, we might want to drive the VRRP FSM according to third-party
   checkers, so that a VRRP instance forces a state transition based on a
   specific checker's result. This is useful for fault-tolerant
   applications such as databases. With this extension, VRRP will be able
   to provide application takeover at the speed of a VRRP FSM takeover
   (less than 3s).

   The current checker design is only available for the LVS part and uses
   an asynchronous design based on central I/O multiplexing. We can call
   this design realtime checking. To extend Keepalived to support THC, we
   will use POSIX threads (pthreads) and launch one thread per checker
   (cf. Jacob's doc for the advantages).
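   To make the one-thread-per-checker model concrete, here is a minimal C
   sketch. Everything in it (th_checker_t, th_checker_run(),
   th_checkers_start(), th_checker_result(), the max_rounds bound and the
   probe_fixed() stand-in probe) is hypothetical, not existing Keepalived
   code. It only shows each checker looping in its own pthread and
   publishing its boolean result under a mutex.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical per-checker descriptor (not Keepalived code). */
typedef struct th_checker {
    const char *name;            /* checker instance name */
    bool (*probe)(void *arg);    /* one (possibly blocking) service test */
    void *probe_arg;             /* opaque data passed to probe */
    unsigned delay_loop;         /* seconds between two probes */
    unsigned max_rounds;         /* 0 = loop forever; >0 bounds the loop */
    pthread_mutex_t lock;        /* protects 'result' */
    bool result;                 /* last boolean result */
    pthread_t thread;
} th_checker_t;

/* Stand-in probe for illustration: returns the bool pointed to by arg. */
static bool probe_fixed(void *arg)
{
    return *(const bool *)arg;
}

/* Thread body: each checker polls its service on its own, so a slow or
 * blocking probe cannot stall the other checkers. */
static void *th_checker_run(void *data)
{
    th_checker_t *c = data;
    for (unsigned done = 0; c->max_rounds == 0 || done < c->max_rounds; done++) {
        bool ok = c->probe(c->probe_arg);   /* may block: we own this thread */
        pthread_mutex_lock(&c->lock);
        c->result = ok;
        pthread_mutex_unlock(&c->lock);
        if (c->delay_loop)
            sleep(c->delay_loop);
    }
    return NULL;
}

/* Launch one thread per checker. Returns 0 on success or the pthread
 * error code (threads already started are left running). */
static int th_checkers_start(th_checker_t *checkers, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        pthread_mutex_init(&checkers[i].lock, NULL);
        checkers[i].result = false;
        int err = pthread_create(&checkers[i].thread, NULL,
                                 th_checker_run, &checkers[i]);
        if (err != 0)
            return err;
    }
    return 0;
}

/* Thread-safe read of the latest result, for the LVS or VRRP side. */
static bool th_checker_result(th_checker_t *c)
{
    pthread_mutex_lock(&c->lock);
    bool ok = c->result;
    pthread_mutex_unlock(&c->lock);
    return ok;
}
```

   A blocking probe (e.g. a synchronous HTTP GET) only stalls its own
   thread. That is the main difference from the realtime I/O MUX design,
   where every check has to be written as non-blocking callbacks.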

2. Proposal: To support both extensions, we need to specify the keywords
   and how we want to implement them. First, we want to keep things as
   simple as possible by hiding some complexity inside the software.

    . We consider a 'plugin' to be a Unix shared library. All plugins
      are located in the same directory, which is browsed during daemon
      bootstrap. We need to keep this generic and global, which is why
      I propose to declare it in the global_defs block:

      global_defs {
          notification_email {
              <EMAIL ADDRESS>
              <EMAIL ADDRESS>
              ...
          }
          notification_email_from <EMAIL ADDRESS>
          smtp_server <IP ADDRESS>
          smtp_connect_timeout <INTEGER>
          router_id <STRING>
          plugin_checker_location <STRING>
      }

      NB: We replace lvs_id with router_id, which sounds more generic.

    . During daemon bootstrap, the 'plugin_checker_location' directory
      will be browsed and a callback function will be called for each
      shared library in order to import related information into the
      Keepalived subsystem (keywords, ...). Each checker will export a
      keyword function that will be called during daemon initialization.
      That way we will have a generic way to implement checkers, and
      everything related to a checker will be located in the same place
      (the checker code). To permit this design, Keepalived's check_api
      code needs to be extended to support launching plugin keyword
      callbacks. This design will hide internal Keepalived complexity
      from checker plugin hackers.
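      The bootstrap step above could look roughly like this C sketch,
      which uses the standard POSIX opendir()/readdir() and
      dlopen()/dlsym() calls. The exported symbol name
      "checker_install_keywords", its void(void) signature and the
      helper names are assumptions for illustration only; the real
      check_api callback interface is still to be defined.

```c
#include <dirent.h>
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical symbol every checker plugin would export. */
#define PLUGIN_KEYWORD_SYMBOL "checker_install_keywords"

typedef void (*install_keywords_fn)(void);

/* True if 'name' ends in ".so". */
static int is_shared_lib(const char *name)
{
    size_t len = strlen(name);
    return len > 3 && strcmp(name + len - 3, ".so") == 0;
}

/* Browse 'location', dlopen() each shared library and call its keyword
 * function. Returns the number of plugins registered, or -1 if the
 * directory cannot be opened. */
static int load_checker_plugins(const char *location)
{
    DIR *dir = opendir(location);
    if (!dir)
        return -1;

    int loaded = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (!is_shared_lib(ent->d_name))
            continue;

        char path[4096];
        snprintf(path, sizeof path, "%s/%s", location, ent->d_name);

        void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
        if (!handle) {
            fprintf(stderr, "plugin %s: %s\n", path, dlerror());
            continue;
        }
        /* POSIX-recommended way to store a dlsym() result in a
         * function pointer. */
        install_keywords_fn install;
        *(void **)&install = dlsym(handle, PLUGIN_KEYWORD_SYMBOL);
        if (!install) {
            fprintf(stderr, "plugin %s: no %s symbol\n",
                    path, PLUGIN_KEYWORD_SYMBOL);
            dlclose(handle);
            continue;
        }
        install();   /* plugin registers its keywords with the parser */
        loaded++;    /* handle kept open: the plugin code stays in use */
    }
    closedir(dir);
    return loaded;
}
```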

    . In order to make checkers generic for both the LVS and VRRP parts,
      we need to move the checker keywords within the parsing keyword
      tree, putting them at the same level as the virtual_server and
      vrrp_instance keywords. Putting all checkers at the same keyword
      level will introduce a namespace collision between realtime
      checkers and threaded checkers (known as THC). So we can propose
      a fixed keyword prefix, like:

      - RT_HTTP_GET, RT_SSL_GET for checkers implemented using the realtime
        I/O MUX.

      - TH_HTTP_GET, TH_SSL_GET for checkers based on pthread.
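      With a fixed prefix, the parser can dispatch on the first three
      characters and keep the rest as a shared base name. A small C
      sketch; checker_kind_t, checker_kind() and checker_base_name() are
      hypothetical names, not existing Keepalived functions.

```c
#include <string.h>

typedef enum {
    CHECKER_UNKNOWN = 0,
    CHECKER_REALTIME,   /* RT_*: driven by the central I/O multiplexer */
    CHECKER_THREADED    /* TH_*: one pthread per checker (THC) */
} checker_kind_t;

/* Classify a top-level checker keyword by its fixed prefix. */
static checker_kind_t checker_kind(const char *keyword)
{
    if (strncmp(keyword, "RT_", 3) == 0)
        return CHECKER_REALTIME;
    if (strncmp(keyword, "TH_", 3) == 0)
        return CHECKER_THREADED;
    return CHECKER_UNKNOWN;
}

/* Strip a known prefix so RT_HTTP_GET and TH_HTTP_GET share "HTTP_GET". */
static const char *checker_base_name(const char *keyword)
{
    return checker_kind(keyword) == CHECKER_UNKNOWN ? keyword : keyword + 3;
}
```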


3. Checker specs: A checker tests service activity and returns a boolean
   result after checking the service. So a checker is tied to a service,
   which for the moment means an @IP & @PORT pair. Since we want checkers
   to be used in the VRRP part too, we need to specify these inside the
   checker itself. So we will restrict virtual_server & vrrp_instance to
   their minimal roles, IPVS-related and VRRP-related respectively.
   Everything checker-related will be located inside the checker
   configuration block. For example, if we specify TH_HTTP_GET:

   TH_HTTP_GET <STRING> {
       connect_ip   <IP-ADDRESS>        # IP address to connect to
       connect_port <PORT>              # TCP port to connect to
       delay_loop   <INTEGER>           # delay timer for service polling

       bindto             <IP ADDRESS>  # IP address to bind to
       connect_timeout    <INTEGER>     # Connection timeout
       nb_get_retry       <INTEGER>     # number of GET retries
       delay_before_retry <INTEGER>     # delay before retry

       url {                            # A set of URLs to test
           path        <STRING>         # Path
           digest      <STRING>         # Digest computed with genhash
           status_code <INTEGER>        # status code returned in the HTTP header
           regexpect   <STRING>         # regular expression match
       }
       url {
           path        <STRING>         # Path
           digest      <STRING>         # Digest computed with genhash
           status_code <INTEGER>        # status code returned in the HTTP header
           regexpect   <STRING>         # regular expression match
       }
       ...
   }
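   One way the retry keywords and the url blocks could combine into the
   checker's single boolean result is sketched below in C. It assumes
   nb_get_retry counts total attempts, delay_before_retry is in seconds,
   and every url block must pass; url_probe_fn, check_url(),
   th_http_get_result() and the flaky_probe() stand-in are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical per-URL probe: one GET on 'path', true if the reply
 * matches the expected digest / status_code / regexpect. */
typedef bool (*url_probe_fn)(const char *path, void *ctx);

/* Try up to nb_get_retry times (at least once), sleeping
 * delay_before_retry seconds between failed attempts. */
static bool check_url(url_probe_fn probe, const char *path, void *ctx,
                      unsigned nb_get_retry, unsigned delay_before_retry)
{
    unsigned attempts = nb_get_retry ? nb_get_retry : 1;
    for (unsigned i = 0; i < attempts; i++) {
        if (probe(path, ctx))
            return true;
        if (i + 1 < attempts && delay_before_retry)
            sleep(delay_before_retry);
    }
    return false;
}

/* The checker's boolean result: every url block must pass. */
static bool th_http_get_result(url_probe_fn probe, const char *const *paths,
                               size_t n_paths, void *ctx,
                               unsigned nb_get_retry, unsigned delay_before_retry)
{
    for (size_t i = 0; i < n_paths; i++)
        if (!check_url(probe, paths[i], ctx, nb_get_retry, delay_before_retry))
            return false;
    return true;
}

/* Stand-in probe for illustration: fails while *(int *)ctx > 0,
 * decrementing it on each call, then succeeds. */
static bool flaky_probe(const char *path, void *ctx)
{
    (void)path;
    int *failures_left = ctx;
    if (*failures_left > 0) {
        (*failures_left)--;
        return false;
    }
    return true;
}
```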


4. real_server specs: Each real_server will refer to one or more checkers.
   The config block can be:

   real_server <IP ADDRESS> <PORT> {    # RS declaration
       weight <INTEGER>                 # weight to use (default: 1)
       inhibit_on_failure               # Set weight to 0 on healthchecker failure

       notify_up <STRING>|<QUOTED-STRING>    # Script to launch when the
                                             #  healthchecker considers the
                                             #  service up.

       notify_down <STRING>|<QUOTED-STRING>  # Script to launch when the
                                             #  healthchecker considers the
                                             #  service down.

       checkers <LIST-OF-CHECKERS-STRING>    # Set of checkers monitoring
                                             #  the service
   }

   => real_server presence in the IPVS pool will be driven by the checkers'
      status: if all are OK, the service is present; otherwise it is
      removed or inhibited.
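   The rule above is a simple AND over the listed checkers. A C sketch
   of the decision follows; real_server_t, rs_action_t and rs_action()
   are hypothetical names, and applying the result to IPVS (and running
   notify_up/notify_down on transitions) is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* What to do with a real_server in the IPVS pool. */
typedef enum {
    RS_PRESENT,     /* keep (or re-add) with its configured weight */
    RS_INHIBITED,   /* keep in the pool, but with weight 0 */
    RS_REMOVED      /* remove from the pool */
} rs_action_t;

/* Hypothetical real_server view: only the fields this decision needs. */
typedef struct {
    int  weight;               /* 'weight' keyword (default: 1) */
    bool inhibit_on_failure;   /* 'inhibit_on_failure' keyword present */
} real_server_t;

/* Combine the results of every checker listed in 'checkers'.
 * Writes the weight to program into IPVS into *ipvs_weight. */
static rs_action_t rs_action(const real_server_t *rs, const bool *checker_ok,
                             size_t n_checkers, int *ipvs_weight)
{
    bool all_ok = true;
    for (size_t i = 0; i < n_checkers && all_ok; i++)
        all_ok = checker_ok[i];

    if (all_ok) {
        *ipvs_weight = rs->weight;
        return RS_PRESENT;
    }
    *ipvs_weight = 0;
    return rs->inhibit_on_failure ? RS_INHIBITED : RS_REMOVED;
}
```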


5. vrrp_instance specs: We will not repeat the whole vrrp_instance
   configuration block here. Just consider that we will add a new
   'checkers' keyword, as in the real_server configuration block. Say:

   vrrp_instance <STRING> {
       ...
       checkers <LIST-OF-CHECKERS-STRING>    # Set of checkers monitoring
                                             #  the service
       ...
   }

   => The checkers' status will be part of the VRRP FSM as a protocol
      state transition key. If all are OK, the VRRP protocol will continue
      as a normal FSM; otherwise the vrrp_instance will be forced into
      FAULT state until the checkers report OK.
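   The checkers acting as a transition key can be sketched as a filter
   applied on top of the state the normal VRRP FSM would choose. This C
   sketch is hypothetical: it assumes FAULT is only entered because of
   checkers (the real FSM also enters it on interface failure) and that
   an instance leaving FAULT resumes as BACKUP, from where the normal
   election may promote it again.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified VRRP states (the real FSM has more transitions). */
typedef enum {
    VRRP_STATE_INIT,
    VRRP_STATE_BACKUP,
    VRRP_STATE_MASTER,
    VRRP_STATE_FAULT
} vrrp_state_t;

static bool all_checkers_ok(const bool *checker_ok, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!checker_ok[i])
            return false;
    return true;
}

/* - any checker down: force FAULT (a MASTER thereby gives up its role);
 * - all checkers OK while in FAULT: leave FAULT via BACKUP (assumption);
 * - otherwise keep the state chosen by the normal VRRP FSM. */
static vrrp_state_t vrrp_apply_checkers(vrrp_state_t current,
                                        const bool *checker_ok, size_t n)
{
    if (!all_checkers_ok(checker_ok, n))
        return VRRP_STATE_FAULT;
    if (current == VRRP_STATE_FAULT)
        return VRRP_STATE_BACKUP;
    return current;
}
```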


AFAIK, this is the most generic way to deal with this integration. If you
have any comments, feel free to share your opinion.

Best regards,
Alexandre