Re: lvs-rrd

To: " users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: lvs-rrd
From: "Salvatore D. Tepedino" <sal@xxxxxxxxxxxx>
Date: Tue, 20 Jan 2004 14:25:05 -0500
On Tue, 2004-01-20 at 07:38, Joseph Mack wrote:
> "Salvatore D. Tepedino" wrote:

> OK. What's the relation between lrrd, nagios and rrd? (or how do they 
> work)

Uh... no idea. Let's see... Nagios is a monitoring tool (from what I've
seen in the past minute. Never heard of it before). I guess it
integrates rrd in some way. 

Lrrd (It, apparently, "has been renamed to Munin") looks like a Perl
script that makes it easier to remotely collect information for
insertion into the rrd files. 

Both of these are tools that use RRD, and are not part of the rrd
project at all, from what I can tell. Much like my project uses rrd
(only they've apparently spent more time getting things working nice and
making prettier web pages than mine ;)

> > I learned by example. 
> I haven't worked out whether I need SNMP or not. Apparently it's optional
> (from one of the other postings here), but the docs aren't clear.

Right. SNMP is just a way to collect the information. It's completely
optional. I'm using simple bash scripts to collect the data and jam it
into the rrd file. Then the graphing is a separate action.

> > If you want, I can package up the scripts I use for my system
> > monitoring. 
> that would be great. Even better would be some intro text explaining
> how it works so we can change the setup if we want.

I'll send that off to you, but in the meantime, I'll post an example up
here.

First, rrd file creation:

/usr/local/rrdtool/bin/rrdtool create bandwidth.rrd -s 300 \
DS:IN:COUNTER:600:0:U \
DS:OUT:COUNTER:600:0:U \
RRA:AVERAGE:0.5:1:600 \
RRA:AVERAGE:0.5:6:700 \
RRA:AVERAGE:0.5:24:775 \
RRA:AVERAGE:0.5:288:797 \
RRA:MAX:0.5:1:600 \
RRA:MAX:0.5:6:700 \
RRA:MAX:0.5:24:775 \
RRA:MAX:0.5:288:797 \
RRA:MIN:0.5:1:600 \
RRA:MIN:0.5:6:700 \
RRA:MIN:0.5:24:775 \
RRA:MIN:0.5:288:797

Basically, this says: create a file 'bandwidth.rrd' that will collect
data once per 300 seconds (5 minutes) which includes two data sources
(the DS lines) named IN and OUT. The IN and OUT data sources are of type
'counter' (meaning they'll store a count of total bandwidth and will be
displayed in a Unit/second way). The 600 means that if there's a gap of
more than 600 seconds between updates, (should be one per 300 seconds)
consider it bad data and discard it. 0 is the minimum number expected,
and U (Unknown) is the highest number expected. These can be adjusted so
you can exclude obviously bad data, such as if your temperature
monitoring script tried to insert that the temp was 300.
The RRA lines collect longer term data points (the average, min and max)
as RRD discards data when it gets old enough and will just keep what you
specify. What the specific numbers mean for the RRAs, I forget. The docs
do that well, but this set of numbers works well for at least a year's
worth of data (I've not collected anywhere near that much, so I can't
say what happens then).
Basically, I copied most of the numbers and changed the DS names. Unless
you need something that needs data points more often than once per 5
minutes, don't bother learning what all this means (except for the DS
lines).
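
For anyone curious what the RRA numbers work out to: the fields are RRA:CF:xff:steps:rows, so each archive covers step * steps * rows seconds. Here is a rough sketch of that arithmetic for the AVERAGE archives above. The helper name rra_span is made up for illustration, and the result is rounded down to whole days.

```shell
#!/bin/bash
# Time covered by one RRA = step * steps_per_row * rows (in seconds).
step=300   # seconds per primary data point, from '-s 300'

rra_span() {
    # $1 = steps per row, $2 = rows; prints the span in whole days
    echo $(( step * $1 * $2 / 86400 ))
}

for rra in 1:600 6:700 24:775 288:797; do
    steps=${rra%%:*}
    rows=${rra##*:}
    echo "steps=$steps rows=$rows: $(( step * steps / 60 )) min per row, about $(rra_span "$steps" "$rows") days"
done
```

The last archive keeps one row per day (288 * 300 seconds) for 797 rows, which fits the "at least a year" remark above.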

Next. Data collection:
eval `grep eth0 /proc/net/dev|cut -f2 -d:|awk '{printf "IN=%-11d\nOUT=%-11d\n", $1, $9}'`

# If either value came back empty, store U (unknown) instead
for i in IN OUT;do
        if [ -z "${!i}" ]; then
                eval $i=U
        fi
done

/usr/local/rrdtool/bin/rrdtool update \
/home/websites/ N:$IN:$OUT
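
To see what the collection pipeline pulls out, here is a small sketch that runs the same cut/awk steps against a made-up line in /proc/net/dev layout instead of the live file. The sample numbers and the function name parse_counters are invented for illustration; field 1 after the colon is received bytes and field 9 is transmitted bytes.

```shell
#!/bin/bash
# Sample line in /proc/net/dev layout; the numbers are invented.
sample='  eth0:1500000 1200 0 0 0 0 0 0 900000 800 0 0 0 0 0 0'

parse_counters() {
    # Drop the "eth0:" prefix, then print receive bytes (field 1)
    # and transmit bytes (field 9) as shell variable assignments.
    cut -f2 -d: | awk '{printf "IN=%d\nOUT=%d\n", $1, $9}'
}

eval "$(printf '%s\n' "$sample" | parse_counters)"

# This is the value string that would be handed to 'rrdtool update'.
echo "N:$IN:$OUT"
```

Cutting on the colon first matters because the interface name and the first counter are not always separated by a space.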

Simple bash script that grabs the raw byte numbers from proc, sets two
variables with the data, and the update line jams it into the rrd file.
It's important to note that the order of the variables in the update
line is the same as in the creation script. Otherwise you'll screw up your
data.
The N in N:$var:$var just means "Now". You can add historical or future
data if you want, but I don't believe you can add data before your last
data point, so anything other than "N" is not really useful for normal
use.
Also a good note: Don't worry too much about collecting data in exact 5
minute increments. If rrd gets data at odd intervals it knows enough to
take the time into account. So you can update once per second if you
want, and still get good data. You can even update slower than every 5
minutes and it will adjust the numbers accordingly (i.e. if you update at
5:30, it will know to take 30 seconds worth of data and add that to the
next data point.)
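
The reason odd intervals don't hurt a 'counter' data source is that what ends up stored is a rate: the change in the counter divided by the seconds between readings. A minimal sketch of that idea, with invented numbers and a made-up helper named rate (rrdtool itself also deals with counter wrap and with fitting the result into fixed 300-second slots, which this ignores):

```shell
#!/bin/bash
# rate = (new_count - old_count) / seconds_between_readings
rate() {
    # $1 = old counter, $2 = new counter, $3 = elapsed seconds
    echo $(( ($2 - $1) / $3 ))
}

rate 1000000 1300000 300   # update arrived on schedule
rate 1000000 1330000 330   # update arrived 30 seconds late
```

Both calls print the same bytes-per-second figure, because the late reading carries 30 seconds more traffic spread over 30 seconds more time.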

Then, the fun part: Graphing

/usr/local/rrdtool/bin/rrdtool graph \
/home/websites/ -z \
-i --no-minor -s -1d -h 200 -w 600 -t "Bandwidth Usage - Day" -v \
"Bytes per second" \
        CDEF:NIN=IN,-1,* \
        CDEF:background=IN,POP,LTIME,7200,%,3600,LE,INF,UNKN,IF \
        CDEF:backgroundN=background,-1,* \
        AREA:background#F3F3F3 \
        AREA:backgroundN#F3F3F3 \
        COMMENT:"In" \
        AREA:NIN#009900:" " \
        GPRINT:IN:MAX:"Max %3.2lf %SB/s" \
        GPRINT:IN:AVERAGE:"Ave %3.2lf %SB/s" \
        GPRINT:IN:MIN:"Min %3.2lf %SB/s" \
        GPRINT:IN:LAST:"Current %3.2lf %SB/s\r" \
        COMMENT:"Out" \
        AREA:OUT#0000FF:" "  \
        GPRINT:OUT:MAX:"Max %3.2lf %SB/s" \
        GPRINT:OUT:AVERAGE:"Ave %3.2lf %SB/s" \
        GPRINT:OUT:MIN:"Min %3.2lf %SB/s" \
        GPRINT:OUT:LAST:"Current %3.2lf %SB/s\r" \
        VRULE:$DAY#FF0000 \
        -c CANVAS#E5E5E5 COMMENT:\\n COMMENT:"$graph_date\r" \
        COMMENT:"up $graph_uptime\r"

Believe me, it looks more complicated than it is. 
The first part is just options passed to RRD.
First 'graph' to, well, graph. Then specify the image file to be
created. -z is 'lazy' (make a new graph at most once per 5 minutes.
Reduces load if a lot of people are viewing the page, as the graphs are
only created on demand. So if no one views them for a day, they won't be
generated at all that day). -i is to create interlaced gifs (just a
'while loading' visual difference. Not important). --no-minor is a
visual preference to remove minor graph lines. I think it makes the
graphs look cleaner. -s -1d means 's'tart the graph 1 day ago (ending
now, by default). -h and -w are how big, pixel wise, to make the graph.
-t is for the title at the top of the graph and -v is the vertical
legend on the left side of the graph. 

Next, we DEFine a data point named "IN" from the data source named "IN"
from the rrd file (I named them the same for simplicity) and there's
also a DEFine for OUT lower, although I usually put them together at the
top now (this was an older setup). They only have to be defined before
they're used, so where they are DEFined is not too critical. 
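
For reference, a DEF argument has the form DEF:name=rrdfile:ds-name:CF. The sketch below only builds and prints the two arguments so the shape is visible. It assumes the rrd file is the bandwidth.rrd from the create step and that the AVERAGE consolidation function is wanted, neither of which is confirmed by the graph command above.

```shell
#!/bin/bash
# Assumption: same file as in the create step; adjust the path as needed.
rrd=bandwidth.rrd

defs=()
for ds in IN OUT; do
    # Graph name and data source name are kept identical for simplicity.
    defs+=("DEF:$ds=$rrd:$ds:AVERAGE")
done

printf '%s\n' "${defs[@]}"
```

The printed strings are what would be passed to 'rrdtool graph' ahead of any CDEF, AREA or GPRINT that uses IN or OUT.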

CDEF is a means by which you can manipulate the data points. It uses
reverse polish notation equations. So in the first case I'm saying to
define a data point 'NIN'=IN*-1. So it makes IN negative for the graph.
Ignore the 'background' lines. They just make a pretty background for
the graph. 
COMMENT just sticks a comment below the graph.
AREA creates a filled area from 0 to the current data point. In this
case 'NIN', with a color of 009900 (html color value). The :" " means to
label this on the graph with a blank space. Done for cosmetic reasons (I
want a box with the color, but I already labeled it with the comment
line. If I leave out the :" " I won't get a box with color. This can be
useful in other areas). 
GPRINT is much like a printf line. It adds comments below the graph.
In the first case, I'm telling it to print the MAX data point for IN
(over the time period covered, 1 day) with the label 'Max' and format
%3.2; I forget what lf means. %S means to scale the output (5kB, 10MB,
etc) and the /s is just text (per second).
The VRULE creates a line at the end of the day (the variable is set
earlier in the script) and then CANVAS colors the background and there's
some more comments stuck on. 
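
If the reverse polish notation in the CDEF lines is unfamiliar, this toy evaluator may help. It is only a sketch: it handles single integers with + - * /, while real CDEFs work on whole data series and have many more operators. The function name rpn is made up for illustration.

```shell
#!/bin/bash
# Evaluate a comma-separated RPN expression of integers.
rpn() {
    local -a tokens stack=()
    local tok a b n=0
    IFS=, read -ra tokens <<< "$1"
    for tok in "${tokens[@]}"; do
        case $tok in
            +|-|'*'|/)
                # Operator: pop two values, push the result.
                b=${stack[n-1]}; a=${stack[n-2]}
                n=$((n - 2))
                stack[n]=$(( a $tok b ))
                n=$((n + 1))
                ;;
            *)
                # Operand: push it onto the stack.
                stack[n]=$tok
                n=$((n + 1))
                ;;
        esac
    done
    echo "${stack[n-1]}"
}

IN=1500
rpn "$IN,-1,*"   # same shape as CDEF:NIN=IN,-1,*
```

Operands are pushed as they are read and each operator consumes the two values before it, so "IN,-1,*" reads as IN times -1.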

Simple, right?
It's overwhelming at first, but it's not that bad once you've done a
few. Really.

Salvatore D. Tepedino <sal@xxxxxxxxxxxx>
