> the project homepage is: http://keepalived.sourceforge.net
Some thoughts on this topic:
1. If the checks are moved from the director to the real servers you
can reduce the CPU load on the director. If this does not sound like
a problem for small clusters, consider a setup with many virtual services
and many real servers: the director will waste CPU cycles
just on checks. Not fatal for LVS (the network always gets CPU cycles)
but it is for the checks.
Sure, but in my mind people who run a big LVS infrastructure can run the
whole solution on a director with an appropriate CPU.
Big director solutions are cheap today. But it is true that this can
weaken network performance (multiple tests like SSL checks can eat CPU
that way too...).
So we can imagine a solution where the director is a cluster of
two servers:
1. One server handling the VS management using the ipvs kernel module.
2. A second server performing the keepalived check triggers. This server
will communicate via a socket with the first one to add/remove
real servers from the pool.
=> In this solution we need to implement a communication component that
listens on the ipvs director.
=> We can also imagine that when the ipvs director breaks down, a daemon
like heartbeat moves the ipvs director functionality onto the keepalived
server.
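The add/remove channel between the check server and the ipvs director could be as simple as a line-based socket protocol. A minimal sketch in Python, with an invented message format ("ADD/DEL" plus real server address, port and weight) and an in-memory stand-in for the ipvs table — not an actual keepalived wire format:

```python
# Hypothetical line protocol between the check server and the ipvs director.
# Messages: "ADD <rip> <port> <weight>" or "DEL <rip> <port>"

def parse_message(line):
    """Parse one protocol line into a command dict, or raise ValueError."""
    parts = line.strip().split()
    if not parts:
        raise ValueError("empty message")
    cmd = parts[0].upper()
    if cmd == "ADD" and len(parts) == 4:
        return {"cmd": "ADD", "rip": parts[1], "port": int(parts[2]),
                "weight": int(parts[3])}
    if cmd == "DEL" and len(parts) == 3:
        return {"cmd": "DEL", "rip": parts[1], "port": int(parts[2])}
    raise ValueError("malformed message: %r" % line)

class DirectorPool:
    """In-memory stand-in for the ipvs real-server table."""
    def __init__(self):
        self.real_servers = {}          # (rip, port) -> weight

    def apply(self, msg):
        key = (msg["rip"], msg["port"])
        if msg["cmd"] == "ADD":
            # the real director would call the ipvs ioctls / ipvsadm here
            self.real_servers[key] = msg["weight"]
        else:
            self.real_servers.pop(key, None)
```

On the director side this parser would sit behind an accept() loop; keeping the protocol to plain text lines makes it easy to debug with telnet.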
I am using Arrowpoint load balancers at work (CS50), and they perform
trigger checks like this on each load balancer. For administrators, I
think it is a good design to locate the keepalived functionality there.
If the CPU is not strong enough, we can also create, using LVS, a
virtual server backed by a cluster of keepalived servers. This can be a
good design too, I think.
2. Don't copy the same work from the hardware solutions: most of them
can't run agents in the real servers, and they implement checks in the
director to set the weight based on expressions over these parameters:
one expression for FTP, another for HTTP, etc.
What you describe here is the way BMC BEST/1 or PATROL or other
monitoring platforms work. For me, adding an agent on each server
multiplies the administration tasks and introduces security
vulnerabilities (I am probably mistaken... :) ).
If we do not want to depend on the platform the real server service runs
on, we need to centralize the check triggers on the load balancer or a
single check point. A monitoring environment based on a
collector/monitoring-console pair is extremely OS dependent. In a very
first release of keepalived I used monitoring agents based on a simple
protocol frame to communicate with a centralized monitoring tool. But my
environment is really heterogeneous (Oracle OAS, IIS, Netscape, Apache in
the same real server pool), so to factor out and limit the OS-dependent
development I implemented a design centralized on a single point, using a
network scanning technique to perform the checks.
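The centralized "network scanning" check described above boils down to connecting to each RIP:PORT from the single check point. A minimal sketch, assuming a plain TCP connect is enough to declare a service alive at L4:

```python
import socket

def tcp_check(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds (L4 alive)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

An L7 check would send a real request (e.g. an HTTP GET) over the same connection before deciding; the point is that nothing runs on the real servers themselves, so the check is OS independent.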
3. Of course, there are other ways to set the weights - when they
are evaluated in the director. This can include decisions based on
response times (from the L7/4/3 checks), etc. Not sure how well they
work; I've never implemented such risky tricks.
Yes !!! :) Response time and the ability to check application performance
is a great and VERY interesting functionality that we can add to such a
daemon. We can use a dynamic structure registering statistics about each
server's response time... if the response time degrades or changes, we
can adapt the cluster and make it fully dynamic with respect to
application performance. We can then define a "weighted performance"
variable like the LVS weight. We can also use the fair-queueing
functionality present in the advanced routing framework to adjust IP
streams using kernel calls to the QoS framework... really a good thing
to do here :)
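One way the "weighted performance" variable above could work: smooth each server's response time with an exponential moving average and shrink an LVS-style weight as the average grows. A sketch only; the smoothing factor and reference response time are arbitrary assumptions:

```python
class PerfWeight:
    """Track a smoothed response time and derive a scheduling weight."""

    def __init__(self, base_weight=100, alpha=0.3):
        self.base_weight = base_weight
        self.alpha = alpha          # smoothing factor for the moving average
        self.avg_rtt = None         # smoothed response time, in seconds

    def record(self, rtt):
        """Feed one measured response time into the moving average."""
        if self.avg_rtt is None:
            self.avg_rtt = rtt
        else:
            self.avg_rtt = self.alpha * rtt + (1 - self.alpha) * self.avg_rtt

    def weight(self, ref_rtt=0.1):
        """Weight shrinks as the average response time grows past ref_rtt."""
        if self.avg_rtt is None:
            return self.base_weight
        w = int(self.base_weight * ref_rtt / max(self.avg_rtt, ref_rtt))
        return max(w, 1)            # never drop to zero: keep the server usable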
4. User defined checks: talk with the real service and analyze these
A macro language definition... a small language to define checks, using
hardcoded primitives (tcpcheck, httpget, ...), and to define actions on
the results...
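Such a macro language could start as little more than a table of primitives plus a parser for one-line check definitions. A minimal sketch with an invented syntax ("check <name> <primitive> <args...>") and stub primitives, purely illustrative:

```python
# Invented one-line syntax: "check <name> <primitive> <arg> ..."
# Primitives here are stubs; the real ones would be tcpcheck, httpget, etc.

PRIMITIVES = {
    "tcpcheck": lambda host, port: ("tcp", host, int(port)),
    "httpget":  lambda url: ("http", url),
}

def parse_check(line):
    """Compile one check definition against the primitive table."""
    tokens = line.split()
    if len(tokens) < 3 or tokens[0] != "check":
        raise ValueError("bad check definition: %r" % line)
    name, primitive, args = tokens[1], tokens[2], tokens[3:]
    if primitive not in PRIMITIVES:
        raise ValueError("unknown primitive: %s" % primitive)
    return name, PRIMITIVES[primitive](*args)
```

Actions on results (remove the real server, lower its weight, alert...) would be a second table keyed the same way, so the whole check language stays declarative.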
5. NAT is not the only method used. The DR and TUN methods don't allow
the director's checks to properly check the real services: the real
service listens on the same VIP and it is hard to generate packets
in the director with daddr=VIP that will avoid the routing and
reach the real server; they never leave the director. What this means:
we can't check exactly VIP:VPORT on the real service, maybe only
RIP:VPORT? This problem does not exist when the checks are performed
from the real server; for example, the L4 check can be a simple bind()
to VIP:VPORT. Port busy means the L4 check succeeds. No problem
performing L7 checks. Sometimes httpd listens on many virtual domains
with a bind to 0.0.0.0. Why do we need to perform checks for all these
VIPs when we can simply check one of them? Many, many optimizations...
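The bind() trick described above is easy to express: from the real server itself, try to bind the checked address; if the bind fails with EADDRINUSE, something is already listening there, so the L4 check passes. A sketch, assuming the check runs locally on the real server:

```python
import errno
import socket

def l4_bind_check(ip, port):
    """Return True if ip:port is already held by a listening service."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((ip, port))
    except OSError as e:
        # EADDRINUSE: the real service already holds VIP:VPORT -> check passes
        return e.errno == errno.EADDRINUSE
    else:
        # We got the port ourselves, so nothing is listening there
        return False
    finally:
        s.close()
```

This sidesteps the DR/TUN routing problem entirely, since no packet with daddr=VIP ever has to leave the director.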
6. Some utilities or ioctls can be included in the game: ip,
ipchains, or the ioctls they use. This allows complex virtual services
to be created and supports the fwmark-based virtual services.
Yes, it is in my focus: adding wrappers for multiple kernel
functionalities... for fwmark, QoS, ...
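For reference, a fwmark-based virtual service is built by marking packets and then keying the ipvs service on the mark instead of VIP:VPORT. A sketch using iptables/ipvsadm command syntax (addresses and the mark value are illustrative; a keepalived wrapper would drive the same ioctls directly):

```sh
# Mark web traffic for VIP 10.0.0.1 with fwmark 1
iptables -t mangle -A PREROUTING -d 10.0.0.1 -p tcp --dport 80 -j MARK --set-mark 1

# Create a virtual service keyed on the fwmark instead of VIP:VPORT
ipvsadm -A -f 1 -s wlc

# Attach real servers to the fwmark-based service (NAT forwarding here)
ipvsadm -a -f 1 -r 192.168.1.10:80 -m
ipvsadm -a -f 1 -r 192.168.1.11:80 -m
```

Grouping several ports or VIPs under one mark is what makes the complex virtual services mentioned above possible.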
7. Redundancy: keepalive probes to other directors, failover times,
This can be a very very long discussion :)
Of course yes !!! :))) I think many interesting things... Do not give a
point 8, otherwise I will not stop coding !!!! :))