Hi Joe,
The conventional LVS wisdom is that it's not a good
idea to build an LVS e-commerce website in which https
is persistent for long periods.
This is also a general wisdom of practical software engineering. Maybe
some AC or PWC guys are subscribed to this list? Listen carefully then!
The initial idea was that a long timeout allows
the customer to have a cup of coffee or surf to
other websites while thinking about their on-line
purchase.
Unless he has to buy the cup of coffee first at thinkgeek.
The problem with this approach is that the amount
of memory use is expected to be large and the director
will run out of memory. We've been telling people
Yes, if the timeout is very high, people keep clicking on the site
(so the template never expires), the director is low on memory and a
lot of people are interested in your site.
to rewrite their application so that state is maintained
on the realservers allowing the customer to take an
indefinite time to complete their purchase.
Well, it depends on what you want to offer. If it's an online shop like
amazon.com you certainly want to store the generated cookie (or whatever
it is) on a central DB cluster that every RS can connect to and ask
for the ID if it doesn't already have it.
Currently 1G of memory costs about an hour of programmer's time
(+ benefits, + office rental/heating/airconditioning/equipment
:) I don't know about the expenses in the states but you can certainly
buy a lot of RAM over here for an hour of a programmer's time.
+ support staff). Since memory is cheap compared to the cost
of rewriting your application, I was wondering if brute
force might just be acceptable.
It's a completely different layer. It's about software engineering and
not about saving money. Yes, you can probably kill the problem
temporarily by adding more memory, but a broken application framework
remains a broken application framework.
Plus, normally when you build an e-commerce site, you have a customer
that has outsourced this task to your company. So you do a C-requirement
and a feasibility study to provide the customer with a proper cost
estimate. Now you build the application, and it is built in a broken
way, so you need to either fix it or, in our case, add more RAM. The
big problem here is:
o you might have a strict SLA that doesn't permit this
o you change the C-requirements and thus you need a new test phase
o the customer gets upset because she spent big bucks on you
It's lack of engineering and a typical situation of plain incompetence:
When you earnestly believe you can compensate for a lack of skill by
doubling your efforts, there's no end to what you can't do.
But all this also depends on the situation. I don't think we can give
people a generalised view of how things have to be done. One might argue
that people come to this project because of monetary constraints and
they sure do not care about the application if the problem is solved by
putting more RAM into the director.
I, for example, would rather spend a few bucks on good hardware and a lot
of RAM for the RS, because they need to carry the execution weight of the
application. The director is just a more or less intelligent router.
I can't find any estimates of the numbers involved in the HOWTO
although similar situations have been discussed on the mailing
list eg
We actually have but I can't remember where. It was back in 2000 or so ;).
http://marc.theaimsgroup.com/?l=linux-virtual-server&m=99200010425473&w=2
there the calculation was done to see how long a director would
hold up under a DoS. The answer was about 100secs for 128M memory
and 100Mbps link to the attacker doing a SYN flood.
Yes.
I'm not running one of these web
sites and I don't know the real numbers here. Is amazon.com
or ebay connected by 100Mbps to the outside world?
They might be. AFAICR google.com is/was running LVS and they certainly
have this connection. Some of our customers do have such fat pipes too :).
What can you do with 1G of memory on the director?
It depends.
Each connection requires 128 bytes. 1G/128 is 8M customers
online at any one time. Assuming everyone buys something
this is 1500 purchases/sec. You'd need the population of
a large town just to handle shipping stuff at this rate.
First of all, you can't use 1G out of 1G for the LVS. Maybe 750MB or
800MB but not 1GB. And then you have a normal TCP timeout of 2 minutes
per template. Now yes, you could actually have 6-8M potential customers,
but the problem with your thinking is that you assume that as soon as
the template is created it is destroyed again within a second. But this
isn't the case. It will remain for at least 2 minutes (Julian, correct me
if this value is wrong, because I have neither the code nor a box to
check). So you get 6M customers and the LVS is dead until the first one
decides to move away from the site, and even then it still needs 2
minutes to free that RAM (loosely speaking). Now if the guy decides to
come back within those 2 minutes, the template timer will simply be
updated and still no one can reach the site.
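
As a rough check of the numbers above, here is a minimal Python sketch of
how many entries fit into the memory that is actually available to LVS.
It assumes the 128 bytes per entry quoted in this thread and treats 750MB
to 800MB as usable out of 1GB; the function and constant names are made
up for illustration and are not taken from the LVS code.

```python
# Assumption: 128 bytes per connection/template entry, as quoted in this thread.
ENTRY_BYTES = 128
MIB = 1024 * 1024


def max_entries(usable_bytes: int, entry_bytes: int = ENTRY_BYTES) -> int:
    """Return how many entries fit into the given amount of memory."""
    return usable_bytes // entry_bytes


if __name__ == "__main__":
    # 750 and 800 MiB are the "usable" guesses from the mail, 1024 MiB is the full 1G.
    for usable_mib in (750, 800, 1024):
        print(f"{usable_mib:5d} MiB -> {max_entries(usable_mib * MIB):,} entries")
```

With 750MB usable this gives a bit over 6M entries, which is where the
6M figure comes from; only the full 1GB would give 8M.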
I doubt if any website at peak load has
8M simultaneous customers.
It's not about 6M for a peak (6M is only for the first 6M) but about
6M/120s for best effort, which is 50000, and this is for a low timeout of
2 minutes. That means that after the initial fill of the template space
in RAM you max out at 50000 conns/s with 1GB, because the old templates
are not released while the timer is still active: we assume that the
customer wants to come back within a certain amount of time
(persistence).
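
The steady-state argument can be sketched in a few lines of Python. This
is only a back-of-the-envelope model: it assumes the template table is
already full, that a slot is freed only when its timer expires, and that
roughly 6M slots fit into the usable memory. The names and the extra
timeout values in the loop are illustrative.

```python
def sustained_new_clients_per_sec(template_slots: int, timeout_s: int) -> float:
    """Rate of new clients a full template table can admit.

    Once the table is full, a slot only frees up when its persistence
    timer expires, so at most template_slots / timeout_s new clients
    can be admitted per second.
    """
    return template_slots / timeout_s


if __name__ == "__main__":
    slots = 6_000_000  # assumption: about 6M templates fit into the usable RAM
    for timeout_s in (120, 300, 600):
        rate = sustained_new_clients_per_sec(slots, timeout_s)
        print(f"timeout {timeout_s:4d}s -> {rate:8.0f} new clients/s")
```

The longer the persistence timeout, the lower the admissible rate of new
clients for the same amount of memory.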
50000 conns/s seems like a high number, and in fact it is, but now
consider someone setting the timeout to something insane like 15 minutes
and you get 6M/900 = 6666 conns/s, which is not a lot. Assume a
connection request of, say, 200 bytes, and you get:
ratz@laphish:~ > echo "6666*200/1024/1024*8" | bc -l
10.17150878906250000000
ratz@laphish:~ >
That's only 10Mbit/s of requests. I'm pretty sure that amazon has a
higher request rate.
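
For anyone who prefers to play with the numbers, the same estimate as the
bc one-liner can be written as a small Python function. The 200 bytes per
request is the guess from above, not a measured value, and the function
name is made up for this sketch.

```python
def request_mbit_per_sec(conns_per_sec: float, request_bytes: int) -> float:
    """Bandwidth of incoming requests in Mbit/s (1 Mbit = 1024 * 1024 bit here,
    to match the bc calculation)."""
    bits_per_sec = conns_per_sec * request_bytes * 8
    return bits_per_sec / (1024 * 1024)


if __name__ == "__main__":
    # 6666 conns/s is the 15 minute timeout case; 200 bytes/request is assumed.
    print(f"{request_mbit_per_sec(6666, 200):.2f} Mbit/s")
```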
However you only have 64k ports on each realserver to
connect with customers, allowing only 64k
customers/realserver. How much memory do you need on
the director to handle a fully connected realserver?
64k x 128 = 8M
Julian already gave the answer.
Let's say there are 8 realservers. How much memory
is needed on the director?
8 x 8M = 64M
this is not a lot of memory. So the problem isn't
memory but realserver ports AFAIK
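
The arithmetic of this estimate can be written down as a short Python
sketch. It takes the premise of 64k connections per realserver and 128
bytes per connection as given; whether the ports really are the limit is
exactly what is disputed in this thread. The names are illustrative only.

```python
PORTS_PER_REALSERVER = 64 * 1024  # premise of the estimate: 64k ports
ENTRY_BYTES = 128                 # bytes per connection, as quoted in the thread


def director_memory_bytes(realservers: int) -> int:
    """Director memory needed if every realserver has all of its ports in use."""
    return realservers * PORTS_PER_REALSERVER * ENTRY_BYTES


if __name__ == "__main__":
    for n in (1, 8):
        mib = director_memory_bytes(n) // (1024 * 1024)
        print(f"{n} realserver(s) -> {mib} MiB")
```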
What is the minimum throughput of customers assuming
they all take 4000 sec (66 mins) to make their
purchase?
8 x 64k/4000 = 64 purchases/sec
You're still going to need to hire a few people to pack and
ship all this stuff. If people only take 6 mins
for their purchase, you'll be shipping 640 packages/sec.
This is all wrong, because it is based on the assumption that the ports
are the restriction.
Assuming you make $10/purchase at 64 purchases/sec, that's
$2.5G/yr.
Please use Gauss for such an estimation ;)
So with 64M of memory, 8 realservers, 4000sec persistence
timeout, and a margin of $10/purchase I can make a profit
of $2.5G/yr.
No offense to you Joe (since you're not an American anyway), but I think
if business income were calculated in that manner I would start
to understand the economic problems of the USA. :)
It seems memory is not the problem here, but realserver
ports (or being able to ship all the items you sell).
No.
Let's look at another use of persistence - for squids
(despite the arrival of the -DH scheduler, some people
prefer persistence for squids).
Ok.
Here you aren't limited by shipping and handling of purchases.
Instead you are just shipping packets to the various target
httpd servers on the internet. You are still limited to
64k clients/realserver. Assume you make persistence = 256secs
(any client who is idle for that time is not interested
in performance). This means that the throughput/realserver is
256hits/sec. This isn't great. I don't know what throughput
to expect out of a squid, but I suspect it's a lot more.
This is not what we've been saying on the mailing list.
Have I missed something?
Yes, but Julian told you already.
Hope this helps and best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc