Re: [lvs-users] LVS + Database

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [lvs-users] LVS + Database
From: Anders Henke <anders.henke@xxxxxxxx>
Date: Thu, 14 Jun 2012 18:08:53 +0200

On June 14th, 2012, YesGood wrote:
> But is it possible to build a load-balancing cluster of database servers
> with LVS, where all clients connect to the database servers for both
> reads and writes?
> Or are there better tools for this?

It's possible, provided the database servers support such a pattern.

In terms of databases, you're looking for terms like "synchronous
multi-master replication". As far as I know, PG doesn't support multi-master
replication out of the box, but there are a few third-party tools
that claim to add such support to PG.

http://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling

looks like a good starting point.

Maybe you'd like to take a look at

http://www.postgresql.org/docs/current/interactive/high-availability.html
and
http://www.postgresql.org/docs/current/interactive/different-replication-solutions.html

as well (both for load balancing and for HA).

The next question is whether the overall design is actually an improvement
over a simple active/passive HA cluster.

> I want to build a cluster with PostgreSQL, and then I would use port 5432
> in keepalived.conf.
> For the script in the MISC_CHECK, are there some rules?
> And I don't know Nagios, but with Nagios, what condition should I check?

Basically, Nagios consists of a daemon, a web interface and a set of
so-called "plugins". The latter are developed independently of
Nagios and are simple command-line utilities available via
http://nagiosplugins.org/.

For example this one:

---cut
$ /usr/lib/nagios/plugins/check_pgsql --help
check_pgsql v1.4.15 (nagios-plugins 1.4.15)
Copyright (c) 1999-2007 Nagios Plugin Development Team
        <nagiosplug-devel@xxxxxxxxxxxxxxxxxxxxx>

Test whether a PostgreSQL Database is accepting connections.

Usage:
check_pgsql [-H <host>] [-P <port>] [-c <critical time>] [-w <warning time>]
 [-t <timeout>] [-d <database>] [-l <logname>] [-p <password>]
[...]
 This plugin tests a PostgreSQL DBMS to determine whether it is active and
 accepting queries. In its current operation, it simply connects to the
 specified database, and then disconnects. If no database is specified, it
 connects to the template1 database, which is present in every functioning
 PostgreSQL DBMS.

 The plugin will connect to a local postmaster if no host is specified.  To
 connect to a remote host, be sure that the remote postmaster accepts TCP/IP
 connections (start the postmaster with the -i option).
[...]
---cut

would be a good starting point for such a MISC_CHECK script.
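
A rough sketch of how check_pgsql could be wired into keepalived.conf
(the virtual/real server IPs below are placeholders, and the timeouts
are only examples; adjust them to your setup):

---cut
virtual_server 192.168.0.100 5432 {
    delay_loop 10
    lb_algo wlc
    lb_kind DR
    protocol TCP

    real_server 192.168.0.11 5432 {
        weight 1
        MISC_CHECK {
            # exit code 0 keeps the real server in the pool,
            # any non-zero exit code removes it
            misc_path "/usr/lib/nagios/plugins/check_pgsql -H 192.168.0.11 -t 5"
            misc_timeout 10
        }
    }
}
---cut

check_pgsql exits 0 on OK and non-zero on warning/critical, which matches
what MISC_CHECK expects, so you may not even need a wrapper script around it.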

> For the replicas in the database server pool, I'm thinking of working
> with DRBD+OCFS2.

From what you're writing, I assume the following:
-you'll be installing PG on two nodes "behind" an LVS load balancer.
-DRBD is used to give all nodes a shared block storage.
-A shared OCFS2 filesystem will be run on top of the DRBD volumes.

In order to speed up database queries, a lot of information is cached
in the local memory (RAM) of each node. So if one node writes a block
onto the OCFS2 filesystem in a DBMS that is not multi-master-aware, the
other nodes won't know about the changed information: the writing DBMS
isn't aware of any other DBMS it should notify. If the other nodes are
asked for the same information, they may answer with locally cached,
but outdated, data.

So if your application reads from one PG node while its write requests
go to a different PG node, you may end up with invalid data.

That's just a very simple example. In reality, your nodes will not
only write "wrong" records to their shared database; they may also
damage indexes, overwrite consistent data with inconsistent data, or
simply shred your data in ways your DBMS never anticipated and
can't repair.

This issue could be avoided if PG were configured not to use any caching at
all and to service every read request from disk (I don't know whether that's
possible). However, that way you're also giving up what makes PG fast: its
in-memory cache. Retrieving data from memory is a lot faster than retrieving
data from disk.


Depending on the exact configuration, writes may also be temporarily delayed
in the RAM of each node, but that's another story and may shred even more
data much faster in this setup.

A multi-master-aware DBMS would lock the affected database records per write
(much like the lock manager for OCFS2 does) and ensure that any changed
data has also been written to all other nodes. Network latency usually
becomes an issue here; that's why many of those systems rely on
non-Ethernet networks for their synchronization traffic.

Then there's the next dimension: even if this clustering didn't damage your
data, it most likely wouldn't improve your performance either. Write
performance, in particular, is likely to suffer.

-DRBD services read requests from local storage (unless that local
 storage has failed), so reads are usually fast and not an issue.
 Writing can be an issue, as write requests have to be performed by
 both nodes in the DRBD cluster before the write may be marked as
 "complete". So your database is not only waiting for the local disk
 to complete the "write" transaction, it's also waiting for the network
 to transmit the changed block and for the other node to acknowledge and
 write that block to its local disk.
 At the very least, you're adding this network latency to every write
 transaction. DRBD permits some tuning (protocols A-C), but the faster
 protocols add inconsistency, so for a shared filesystem you're stuck
 with the synchronous protocol C (see the sketch after this list).

-OCFS2 requires a lock manager; this lock manager makes sure only one
 node at a time is able to write to a file on a shared filesystem.

 So if you're running two non-multi-master-aware database systems on
 the same shared file system, only one of them is actually able to write
 to the shared transaction log. The other one will either throw an error
 message (->no additional performance) or wait until the lock is available
 (->no additional performance).

-OCFS2 supports block sizes from 512 bytes to 4 KB. DRBD works with a
 4 KB internal block size. So in probably the worst scenario, you're
 using OCFS2 with a 512-byte block size, and for every OCFS2 block
 write, DRBD will attempt to re-sync 4 KB of actual data.
 If DRBD and OCFS2 aren't properly aligned to each other, this may eat
 up any remaining performance pretty fast (the sketch after this list
 shows one way to keep the block sizes aligned).
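
To illustrate both points (synchronous DRBD and matching block sizes), here
is a minimal sketch of what this could look like; the resource name, host
names, disks and addresses are made up, and this is not a complete or tested
configuration:

---cut
# /etc/drbd.d/r0.res -- host names, disks and addresses are placeholders
resource r0 {
    protocol C;                  # synchronous replication; required for
                                 # a shared filesystem
    net {
        allow-two-primaries;     # both nodes mount the OCFS2 volume
    }
    on node1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.1:7789;
        meta-disk internal;
    }
    on node2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7789;
        meta-disk internal;
    }
}
---cut

---cut
# create the OCFS2 filesystem with a 4 KB block size to match DRBD's
# internal 4 KB granularity; two node slots for the two DRBD nodes
mkfs.ocfs2 -b 4K -C 4K -N 2 -L pgdata /dev/drbd0
---cut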

> And use Heartbeat for the health-check monitoring in the pool of
> database servers.
> Is this a feasible scenario?

Heartbeat is a complete cluster manager, not a monitoring tool.
It's usually used to automatically manage what needs to be done when
one node fails or a failed node later comes back.

If you're considering DRBD replication (which is limited to two servers),
I'd rather recommend setting up a "typical" active/passive HA setup.

One node is active, while the other node only replicates any changes
from the master. In order to improve performance, I'd rather recommend
tuning the PG configuration. I'm no PG expert, so I can't give much advice
on this; however, I'm aware that many Linux distributions initially ship
PG with a very conservative, extremely slow default configuration that
leaves much room for improvement in terms of performance.
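
Just to illustrate the kind of knobs involved (these are standard
postgresql.conf parameters, but the values below are placeholders rather
than recommendations; size them for your own hardware and workload):

---cut
# postgresql.conf -- example values only
shared_buffers = 2GB            # distribution defaults are often tiny
effective_cache_size = 6GB      # rough estimate of memory available for caching
work_mem = 32MB                 # per-sort / per-hash-table memory
checkpoint_segments = 32        # fewer, larger checkpoints
---cut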


Anders
-- 
1&1 Internet AG              Expert Systems Architect (IT Operations)
Brauerstrasse 50             v://49.721.91374.0
D-76135 Karlsruhe            f://49.721.91374.225

Amtsgericht Montabaur HRB 6484
Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, 
Robert Hoffmann, Andreas Hofmann, Markus Huhn, Hans-Henning Kettler,
Dr. Oliver Mauss, Jan Oetjen
Aufsichtsratsvorsitzender: Michael Scheeren


_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
or go to http://lists.graemef.net/mailman/listinfo/lvs-users
