Derek Glidden wrote:
> Karl wrote:
> >
> > Derek Glidden wrote:
> > >
> > > Is it crazy that we should see _used_ (not just "allocated")
> > > file-handles on a config like this grow into the thousands and nearing
> > > the ten-thousands mark? Or does something somewhere, possibly one of
> > > the RedHat-supplied packages, have a leak that I should try to track
> > > down?
There could be a leak.
Could you tell me which version of the kit you are using (e.g. 0.4.16-7)?
You can get it by running
rpm -q piranha
There are updates available for the Red Hat kit. You'll find them at
ftp://people.redhat.com/kbarrett/HA
for the current stable release, and at
ftp://people.redhat.com/kbarrett/HA/experimental
for the current "I'm pretty damned sure it works" kit.
>
> > >
> >
> > you might want to run lsof and see what process(es) are maintaining
> > large numbers of open files. we had a problem similar to this that was
> > caused by some errant logging in our code.
>
> Here's the problem:
>
> cat /proc/sys/fs/file-nr
> 16793 16555 262144
>
> lsof | wc -l
> 394
>
That's a lot of files.
> So there's 16161 handles that the kernel is reporting as actively used
> that don't show up from lsof. From what I understand, the numbers from
> the proc filesystem are "file handles allocated, file handles in use,
> and file-max" respectively. I may be wrong about these numbers, but
> after the first crash, we increased file-max from default of 4096 to
> 16384, whereupon it ran for three whole days before crashing again with
> "file-max" and "inode-max limit reached" errors, at which point we
> increased both values to 262144 as it currently stands, so I'm betting
> that we really are using 16K filehandles as it shows. The numbers also
> seem to steadily increase over the life of the LVS, at least up to a
> point, although I haven't been able to keep a close enough eye on the
> box over a long enough period of time to see if it's based on number of
> connections, number of ipvs/ipchains rulesets, lifetime of the box, etc.
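For reference, the arithmetic behind the 16161 figure can be sketched as a small shell function. This is only a rough sketch: it assumes, as described above, that the second field of file-nr is the number of handles in use, and the function name `unaccounted` is made up for illustration. Note that lsof prints a header line and also lists entries such as cwd, txt and mem that are not open file handles, so the comparison is approximate at best.

```shell
#!/bin/sh
# unaccounted FILE_NR_LINE LSOF_COUNT
# Prints (handles in use according to file-nr) minus (lines counted from lsof).
# Assumption: field 2 of file-nr is "handles in use", as described in this thread.
unaccounted() {
    # awk splits on any whitespace, so tabs or spaces in file-nr both work.
    in_use=$(printf '%s\n' "$1" | awk '{ print $2 }')
    echo $(( in_use - $2 ))
}

# With the figures quoted above:
unaccounted "16793 16555 262144" 394    # prints 16161

# On a live box (subtract 1 for the lsof header line):
# unaccounted "$(cat /proc/sys/fs/file-nr)" "$(( $(lsof | wc -l) - 1 ))"
```

If the gap stays this large after discounting the lsof caveats, that would suggest the handles are not held by any user process lsof can see.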
I'm rebuilding the test rack I use in the lab. I'll leave something nasty
running over it for the next few days to see if I can get a similar problem.
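To watch how the numbers grow over the life of the box, something along these lines could log file-nr at intervals. It is a minimal sketch, not part of the piranha kit: the function name `sample_line`, the log path and the one-minute interval are all arbitrary choices for illustration, and the field labels assume the "allocated, in use, file-max" reading given above.

```shell
#!/bin/sh
# sample_line TIMESTAMP FILE_NR_LINE
# Formats one log record: timestamp, then the three file-nr fields with labels.
# Assumption: the fields are allocated, in use, file-max, in that order.
sample_line() {
    printf '%s %s\n' "$1" "$2" |
        awk '{ print $1, "allocated=" $2, "in_use=" $3, "max=" $4 }'
}

# Example with the figures from this thread (the timestamp is made up):
sample_line 2000-01-01T00:00:00 "16793 16555 262144"

# On the box itself, one record a minute, appended to a hypothetical log file:
# while :; do
#     sample_line "$(date +%Y-%m-%dT%H:%M:%S)" "$(cat /proc/sys/fs/file-nr)" >> /tmp/file-nr.log
#     sleep 60
# done
```

Lining the log up against connection counts and ruleset changes should show which of them, if any, the growth follows.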
Phil
=--=