Re: [lvs-users] LVS + Database

To: " users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [lvs-users] LVS + Database
From: Seth Call <sethcall@xxxxxxxxx>
Date: Thu, 14 Jun 2012 12:04:14 -0500
Great response Anders.  And I second that: PG is configured by default
extremely conservative.

There is a PG healthcheck added to HAProxy as a patch in 1.4 (easy to add
yourself), or you can use 1.5beta.;a=commit;h=38b4156a691557f4eda30445f0ef8ce61f280dfc

If you look at checks.c 's diff, you can 'see' what they are doing (you
have to understand a little more context of checks.c to really grok it, I
think.  I can tell it's somehow parsing data coming back from postgres but
I can't tell where or how.  I'd guess when HAProxy does a TCP connect to
postgres, you can add a check like this to actually look at the response
data to see if it looks good.)

Anyway, maybe haproxy is suitable for you or not, regardless, maybe the
code in this patch would help you script something better.

Or, you could see how you like this script:

On Thu, Jun 14, 2012 at 11:08 AM, Anders Henke <anders.henke@xxxxxxxx>wrote:

> On June 14th 2012, YesGood wrote:
> > And, but it's possible to build a load balancing cluster of database with
> > LVS?
> > where all clients connects to database servers in write and read mode.
> > or
> >  there are other best tools for this target?
> It's possible, when the database servers do support such a pattern.
> In terms of databases, you're looking for terms like "synchronous
> multi-master replication". As far as I know, PG doesn't support
> multi-master
> replication out of the box, but there are a few third-party tools,
> who claim to add such support to PG.
> looks like a good starting point.
> Maybe you'd like to take a look at
> and
> as well (both for load balancing as well as HA).
> The next question is wether the overall design is actually an improvement
> to a simple active/passive HA cluster.
> > I want to build a cluster with postgreSQL, and then I would use the port
> > 5432 in the keepalived.conf
> > For the script in the MISC_CHECK, there are some rules?
> > And I don't know nagios, but with the nagios, what condition I should
> check?
> Basically Nagios consists of a daemon, a web interface and a set of
> so-called "plugins". The later ones are developed independently from
> Nagios and are simple command line utilities available via
> For example this one:
> ---cut
> $ /usr/lib/nagios/plugins/check_pgsql --help
> check_pgsql v1.4.15 (nagios-plugins 1.4.15)
> Copyright (c) 1999-2007 Nagios Plugin Development Team
>        <nagiosplug-devel@xxxxxxxxxxxxxxxxxxxxx>
> Test whether a PostgreSQL Database is accepting connections.
> Usage:
> check_pgsql [-H <host>] [-P <port>] [-c <critical time>] [-w <warning
> time>]
>  [-t <timeout>] [-d <database>] [-l <logname>] [-p <password>]
> [...]
>  This plugin tests a PostgreSQL DBMS to determine whether it is active and
>  accepting queries. In its current operation, it simply connects to the
>  specified database, and then disconnects. If no database is specified, it
>  connects to the template1 database, which is present in every functioning
>  PostgreSQL DBMS.
>  The plugin will connect to a local postmaster if no host is specified.  To
>  connect to a remote host, be sure that the remote postmaster accepts
>  connections (start the postmaster with the -i option).
> [...]
> ---cut
> would give a good start for such a "check_misc" skript.
> > For the replicas in the database servers pool, I thinking work with the
> > DRBD+ocfs2.
> >From what you're writing, I do assume the following:
> -you'll be installing PG on two nodes "behind" a LVS load balancer.
> -DRBD is used to give all nodes a shared block storage.
> -A shared OCFS2 filesystem will be run on top of the DRBD volumes.
> In order to speed up database queries, a lot of information is being
> cached in local memory (RAM) of each node. So if one node actually wrote
> a block onto the ocfs2 filesystem in a not multi-master-aware DBMS, the
> other nodes won't know about this changed information: the changing DBMS
> isn't aware of any other DBMS to notify about this. If the other nodes were
> asked for the same information, they might answer with an locally cached,
> but outdated information.
> So if your application is going to read from one PG node and write
> requests will go to a different PG node, you may invalidate your data.
> That's just a very simple example. In reality, your nodes will not
> only write "wrong" records to their shared database, they may also
> damage indexes, overwrite consistent data with inconsistent data or
> simply shred your data in ways your DBMS didn't ever think about and
> can't repair.
> This issue may be skipped, if PG is configured not to use any caching at
> all
> and services every read request from disk (I don't know if that's
> possible).
> However, this way you're also skipping what makes PG fast: its cache in
> memory.
> Retrieving data from memory is a lot faster than retrieving data from disk.
> Depending on the exact configuration, writes may also be temporarily
> delayed
> in RAM of each node, but that's another story and may shred even more data
> much faster in this setup.
> A multi-master-aware DBMS would lock affected database records per write
> (much like the lock manager for OCFS2 does) and ensure that any changed
> data has also been written onto all other nodes. Network latency usually
> becomes an issue here, that's why many of those systems do rely on
> non-Ethernet-networks for their synchronization work.
> Then the next dimension: even if this clustering wouldn't damage your data,
> it most likely won't improve your performance as well. At least actual
> performance
> for writes are likely to suffice.
> -DRBD does service read requests from local storage (unless that local
>  storage has failed), so read requests are usually fast and not of an
>  issue. Writing can be an issue, as write requests have to be performed by
>  both nodes in the DRBD cluster before the write requests may be marked
>  as "complete". So your database is not only writing for the local disk
>  to complete the "write" transaction, it's also waiting for the network
>  to transmit the changed block, the other node to acknowledge and write
>  that block to its local disk.
>  At least, you're adding this network latency to every write transaction.
>  DRBD's permits some tuning (protocol a-c), but this does add
> inconsistency,
>  so for a shared filesystem, you're stuck with synchronous mode.
> -OCFS2 requires a lock manager; this lock manager makes sure only one
>  node at a time is able to write to a file on a shared filesystem.
>  So if you're running two non-multi-master-aware database systems on
>  the same shared file system, only one of them is actually able to write
>  to the shared transaction log. The other one will either throw an error
>  message (->no additional performance) or wait until the lock is available
>  (->no additional performance).
> -OCFS2 works using block sizes from 512 Byte to 4kb. DRBD works on top of a
>  4kb internal block size. So in probably the worst scenario, you're using
>  OCFS2 with 512 Byte block size and for every OCFS2 block write, DRBD
>  will attempt to re-sync 4kb of actual data.
>  If DRBD and OCFS2 aren't properly aligned to each other, this may eat
>  up any remaining performance pretty fast.
> > And use heartbeat for the health-checking monitoring in the pool of
> > database servers.
> > It's a feasible scenario?
> Heartbeat is a complete cluster manager and not a monitoring tool.
> It's usually used for automatically managing what needs to be done when
> one node fails or a failed node later shows up again.
> If you're considering DRBD-replication (which is limited to two servers),
> I'd rather recommend to setup a "typical" active/passive HA setup.
> One node is active, while the other node is only replicating any changes
> from the master. In order to improve performance, I'd rather recommend
> tuning the PG configuration. I'm no PG expert, so I can't give much advice
> on this; however, I'm aware that many linux distributions do initially
> setup PG in a very conservative, extremely slow default configuration
> with much room for improvement in terms of performance.
> Anders
> --
> 1&1 Internet AG              Expert Systems Architect (IT Operations)
> Brauerstrasse 50             v://49.721.91374.0
> D-76135 Karlsruhe            f://49.721.91374.225
> Amtsgericht Montabaur HRB 6484
> Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich,
> Robert Hoffmann, Andreas Hofmann, Markus Huhn, Hans-Henning Kettler,
> Dr. Oliver Mauss, Jan Oetjen
> Aufsichtsratsvorsitzender: Michael Scheeren
> _______________________________________________
> Please read the documentation before posting - it's available at:
> mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
> Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
> or go to
Please read the documentation before posting - it's available at: mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
or go to

<Prev in Thread] Current Thread [Next in Thread>