Hi Jacob,
I have given some thought to the THC design. First of all, this
is a nice feature, so thanks for investigating and dedicating time to
this task.
While thinking it over on paper, I realized that we can extend this
THC design to be global to both Keepalived features, I mean for the
healthchecking as well as the VRRP part ! :) that would be nice.
1. Cons: We need a quicker way to develop new checkers, so there is a need
for a plugin framework. THC comes in here. OTOH, we might want to drive
the VRRP FSM according to third-party checkers; that way a VRRP instance
will force state transitions according to a specific checker result.
This is useful for fault-tolerant applications like databases. Using VRRP
with this extension we will be able to bring application takeover
at the speed of VRRP FSM takeover (less than 3s).
The current checker design is only available for the LVS part and uses
an asynchronous design based on central I/O multiplexing. We can call
this design realtime checking. To extend the Keepalived design to
support THC we will use Unix pthreads and launch one thread per
checker (cf. Jacob's doc for advantages).
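To make the one-thread-per-checker idea concrete, here is a minimal sketch
of what I have in mind (struct checker, checker_thread, run_checkers and
dummy_check are hypothetical names for illustration, not existing Keepalived
code):

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical checker context: one instance per configured checker. */
struct checker {
    const char *name;
    int (*run_check)(struct checker *); /* returns 1 = service OK, 0 = KO */
    int status;                         /* last known result */
};

/* Thread body: each checker polls its service independently. A real
 * THC checker would loop with its delay_loop timer; here we run a
 * single iteration so the sketch terminates. */
static void *checker_thread(void *arg)
{
    struct checker *c = arg;
    c->status = c->run_check(c);
    return NULL;
}

/* Launch one pthread per configured checker, then wait for them. */
static void run_checkers(struct checker *checkers, size_t n)
{
    pthread_t tids[16]; /* enough for the sketch */
    for (size_t i = 0; i < n; i++)
        pthread_create(&tids[i], NULL, checker_thread, &checkers[i]);
    for (size_t i = 0; i < n; i++)
        pthread_join(tids[i], NULL);
}

/* Dummy probe standing in for a real TH_HTTP_GET check. */
static int dummy_check(struct checker *c)
{
    (void)c;
    return 1; /* always report the service as up */
}
```

Each checker being its own thread means a slow or blocking check never
stalls the others, which is the main advantage over the central I/O MUX
for this kind of plugin.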
2. Pros: To support both extensions we need to specify the keywords and
the way we want to implement this. First, we want to keep things as
simple as possible, hiding some complexity inside the software.
. We consider a 'plugin' to be a Unix shared lib. All plugins
are located in the same place, and this location is browsed during
daemon bootstrap. We need to keep it generic and global, which is
why I propose to declare it in the global_defs block :
global_defs {
    notification_email {
        <EMAIL ADDRESS>
        <EMAIL ADDRESS>
        ...
    }
    notification_email_from <EMAIL ADDRESS>
    smtp_server <IP ADDRESS>
    smtp_connect_timeout <INTEGER>
    router_id <STRING>
    plugin_checker_location <STRING>
}
NB: Note we replace lvs_id by router_id, which sounds more generic.
. During daemon bootstrap, the 'plugin_checker_location' will be
browsed and a callback function in each shared lib will be called
in order to import the related infos into the Keepalived subsystem
(keywords, ...). Each checker will export a keyword function that
will be called during daemon initialization. That way we will have
a generic way to implement checkers, and everything checker-related
will be located in the same place (the checker code). To permit
this design the keepalived check_api code needs to be extended
to support launching the plugin keyword callbacks. This design will
hide internal Keepalived complexity from checker plugin hackers.
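A rough sketch of what this bootstrap browsing could look like, using
dlopen()/dlsym(). The "install_keywords" entry-point name and the helper
functions are assumptions on my side, just to illustrate the mechanism:

```c
#include <dirent.h>
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical entry point every checker plugin would export. */
typedef void (*install_keywords_fn)(void);

/* Return 1 if the file name looks like a shared lib (".so" suffix). */
static int is_shared_lib(const char *name)
{
    size_t len = strlen(name);
    return len > 3 && strcmp(name + len - 3, ".so") == 0;
}

/* Browse plugin_checker_location at bootstrap, dlopen() each shared
 * lib and call its keyword-registration callback so the plugin can
 * import its keywords into the parser tree. */
static void load_checker_plugins(const char *location)
{
    DIR *dir = opendir(location);
    struct dirent *entry;
    char path[512];

    if (!dir)
        return;

    while ((entry = readdir(dir)) != NULL) {
        if (!is_shared_lib(entry->d_name))
            continue;
        snprintf(path, sizeof(path), "%s/%s", location, entry->d_name);
        void *handle = dlopen(path, RTLD_NOW);
        if (!handle) {
            fprintf(stderr, "skipping %s: %s\n", path, dlerror());
            continue;
        }
        install_keywords_fn install =
            (install_keywords_fn)dlsym(handle, "install_keywords");
        if (install)
            install();
    }
    closedir(dir);
}
```

The nice property is that adding a checker then becomes: drop a .so into
plugin_checker_location and restart the daemon, no core recompile needed.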
. In order to make checkers generic for the LVS and VRRP parts we need
to move the checker keywords inside the parsing keyword tree, at the
same level as the virtual_server and vrrp_instance keywords. Putting
all checkers at the same keyword level will introduce namespace
collisions between realtime-design checkers and threaded checkers
(known as THC). So we can propose a fixed keyword prefix like:
  - RT_HTTP_GET, RT_SSL_GET for checkers implemented using the realtime
    I/O MUX.
  - TH_HTTP_GET, TH_SSL_GET for checkers based on pthreads.
3. Checkers specs: A checker tests service activity and returns a boolean
result after checking the service. So a checker is related to a service,
for the moment an @IP & @PORT. Since we want this to be used in the VRRP
part too, we need to specify them inside the checker itself. So, we will
restrict virtual_server & vrrp_instance to their minimal roles, IPVS-related
and VRRP-related. Everything checker-related will be located inside the
checker configuration block. For example, if we specify TH_HTTP_GET :
TH_HTTP_GET <STRING> {
    connect_ip <IP ADDRESS>        # IP address to connect to
    connect_port <PORT>            # TCP port to connect to
    delay_loop <INTEGER>           # delay timer for service polling
    bindto <IP ADDRESS>            # IP address to bind to
    connect_timeout <INTEGER>      # connection timeout
    nb_get_retry <INTEGER>         # number of GET retries
    delay_before_retry <INTEGER>   # delay before retry
    url {                          # A set of urls to test
        path <STRING>              # Path
        digest <STRING>            # Digest computed with genhash
        status_code <INTEGER>      # status code returned in the HTTP header
        regexpect <STRING>         # regular expression match
    }
    url {
        path <STRING>
        digest <STRING>
        status_code <INTEGER>
        regexpect <STRING>
    }
    ...
}
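To clarify the retry semantics I have in mind for such a checker, a sketch
of how a TH_HTTP_GET thread could use nb_get_retry and delay_before_retry
inside one polling round (struct and helper names are hypothetical; do_get()
stands in for the real connect + GET + digest/regexp check):

```c
#include <unistd.h>

/* Subset of the TH_HTTP_GET block above, as a hypothetical struct. */
struct http_get_checker {
    int delay_loop;         /* seconds between polling rounds   */
    int nb_get_retry;       /* GET attempts before declaring KO */
    int delay_before_retry; /* seconds between two attempts     */
};

/* One polling round: retry the GET up to nb_get_retry times before
 * declaring the service KO. Between rounds, the checker thread would
 * sleep delay_loop seconds. */
static int poll_once(const struct http_get_checker *c,
                     int (*do_get)(void *ctx), void *ctx)
{
    for (int attempt = 0; attempt < c->nb_get_retry; attempt++) {
        if (do_get(ctx))
            return 1; /* service OK */
        if (attempt + 1 < c->nb_get_retry)
            sleep(c->delay_before_retry);
    }
    return 0; /* all retries failed: service KO */
}

/* Example stub for testing: fails twice, then succeeds. */
static int flaky_get(void *ctx)
{
    int *calls = ctx;
    return ++(*calls) >= 3;
}
```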
4. real_server specs: Each real_server will reference one or more checkers.
The config block could be :
real_server <IP ADDRESS> <PORT> {        # RS declaration
    weight <INTEGER>                     # weight to use (default: 1)
    inhibit_on_failure                   # Set weight to 0 on
                                         # healthchecker failure
    notify_up <STRING>|<QUOTED-STRING>   # Script to launch when the
                                         # healthchecker considers the
                                         # service as up
    notify_down <STRING>|<QUOTED-STRING> # Script to launch when the
                                         # healthchecker considers the
                                         # service as down
    checkers <LIST-OF-CHECKERS-STRING>   # Set of checkers monitoring
                                         # the service
}
=> real_server presence in the IPVS pool will be driven by the checkers'
status: if all are OK the service is present, otherwise it is removed or
inhibited.
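The aggregation rule ("all OK keeps the RS in the pool") could be as simple
as the following sketch (names are mine, for illustration only):

```c
#include <stddef.h>

/* Status reported by each checker referenced by the real_server. */
enum checker_status { CHECKER_KO = 0, CHECKER_OK = 1 };

/* The real_server stays in the IPVS pool only if every checker in
 * its 'checkers' list reports OK; a single failure removes it or,
 * with inhibit_on_failure, sets its weight to 0. */
static int real_server_alive(const enum checker_status *st, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (st[i] != CHECKER_OK)
            return 0;
    return 1;
}
```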
5. vrrp_instance specs: We will not repeat the vrrp_instance configuration
block here. Just consider we will add a new 'checkers' keyword, like in the
real_server configuration block. Say :
vrrp_instance <STRING> {
    ...
    checkers <LIST-OF-CHECKERS-STRING>   # Set of checkers monitoring
                                         # the service
    ...
}
=> the checkers' status will be part of the VRRP FSM as a protocol state
transition key: if all are OK then the VRRP protocol will continue with
its normal FSM, otherwise the vrrp_instance will be forced into FAULT
state until the checkers report OK.
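As a sketch of the transition rule I imagine (simplified states; re-entering
the FSM through BACKUP after FAULT clears is an assumption on my side, so the
instance gets re-elected normally):

```c
/* Simplified VRRP instance states, plus Keepalived's FAULT state. */
enum vrrp_state { VRRP_INIT, VRRP_BACKUP, VRRP_MASTER, VRRP_FAULT };

/* Apply the checkers' verdict to the instance state: any failing
 * checker forces FAULT; once all checkers report OK again, a faulted
 * instance re-enters the FSM through BACKUP so election runs as usual. */
static enum vrrp_state apply_checkers(enum vrrp_state cur, int all_ok)
{
    if (!all_ok)
        return VRRP_FAULT;
    if (cur == VRRP_FAULT)
        return VRRP_BACKUP;
    return cur; /* checkers OK: normal FSM continues */
}
```

This is what gives us the application takeover mentioned in point 1: a
database checker going KO on the MASTER forces FAULT, and the peer takes
over at VRRP speed.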
This is the most generic way AFAIK to deal with this integration. If you
have any comments, feel free to share your opinion.
Best regards,
Alexandre