Some random thoughts:
Looking through a copy of the paper via
http://www.cs.ucsb.edu/~acha/courses/99/290i/papers.html
it looks like the database content is spread across the boxes in
proportion to each server's capacity. From the hints given, you could
build something similar by doing this:
* Take a database, say 1TB in size.
* Randomly distribute this content over N servers (each perhaps with a
  different capacity), once, giving each server a share proportional to
  its capacity - see the sketch just after this list.
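For illustration, here's a minimal sketch of that distribution step in
Python. The server names and the 4:2:1 capacities are made up; in a real
deployment the weights would come from measured machine capacity.

    import random

    # Hypothetical boxes with capacities in arbitrary units (4:2:1).
    SERVERS = {"srv-a": 4, "srv-b": 2, "srv-c": 1}

    def distribute(doc_ids):
        """Assign each document to one server, weighted by capacity."""
        names = list(SERVERS)
        weights = [SERVERS[n] for n in names]
        shards = {n: [] for n in names}
        for doc in doc_ids:
            shards[random.choices(names, weights=weights)[0]].append(doc)
        return shards

    shards = distribute(range(100000))
    for name, docs in shards.items():
        print(name, len(docs))   # counts land roughly in a 4:2:1 ratio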
When doing a search, use wlc (weighted least-connection) based balancing
to pick a single server, then perform the search there. A crude stand-in
for that pick is sketched below.
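This sketch applies the wlc rule - lowest ratio of active connections to
weight wins - to invented connection counts; the real LVS wlc scheduler
tracks live connection counts in the kernel.

    # Capacities double as wlc weights; connection counts are invented.
    def pick_server(active_conns, weights):
        """wlc rule: lowest active-connections/weight ratio wins."""
        return min(weights, key=lambda s: active_conns.get(s, 0) / weights[s])

    print(pick_server({"srv-a": 10, "srv-b": 3, "srv-c": 2},
                      {"srv-a": 4, "srv-b": 2, "srv-c": 1}))   # -> srv-b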
This will return X results. You know that server holds a certain fraction
of the total capacity - i.e. 1/Yth of it - so the number of matching pages
across the whole database should be approximately X * Y. (Which would
explain why some search engines report "of approximately N results", N
being that scaled-up estimate.)
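A quick worked version of that scaling, reusing the made-up capacities
from the sketches above:

    SERVERS = {"srv-a": 4, "srv-b": 2, "srv-c": 1}
    TOTAL = sum(SERVERS.values())

    def estimate_total(hits, server):
        """Scale one shard's hit count by the inverse of its share."""
        y = TOTAL / SERVERS[server]   # this server holds ~1/Y of the data
        return hits * y

    # srv-b holds 2/7 of the data; 12,000 local hits suggest ~42,000 overall.
    print(int(estimate_total(12000, "srv-b")))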
[ This approach to searching makes a lot of sense given that the system
appears to be Harvest-derived originally. (The comments they make about
treating machines as storage for non-vital info fit in very similarly to
our model of cached info.) ]
If that fails, the request can be passed on to the next server...
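Perhaps something like this, where query_fn is a hypothetical callable
standing in for however you would actually talk to a realserver:

    def search_with_failover(query, servers_in_order, query_fn):
        """Try each server in turn until one answers the query."""
        for server in servers_in_order:
            try:
                return server, query_fn(server, query)
            except (ConnectionError, TimeoutError):
                continue   # dead or overloaded box - try the next one
        raise RuntimeError("no server could answer: %r" % query)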
*If* Inktomi have a patent on this idea (which follows naturally from the
architecture description), replicating their architecture without paying
them a fee could be dodgy. And given that Inktomi power quite a few of the
major search engines, I wouldn't be surprised if they do have one.
If they don't, then replicating the architecture using LVS looks simple
on the surface, but probably with many gotchas!
Michael.
--
National & Local Web Cache Support R: G117
Manchester Computing T: 0161 275 7195
University of Manchester F: 0161 275 6040
Manchester UK M13 9PL M: Michael.Sparks@xxxxxxxxxxxxxxx
On Tue, 1 Feb 2000, Ron King wrote:
> Hello,
> I haven't found much on Inktomi (Hotbot's) original architecture, but
> I did find 'Cluster-Based Scalable Network Services'. In this paper
> they say: 'Hotbot workers statically partition the search-engine
> database for load
> balancing'. 'Each worker handles a subset of the database proportional
> to its CPU power..'
> I want to use a cluster of 32-bit Linux systems to handle a large
> database of several hundred million web pages. I think this would
> require some sort of cluster front end, because all data queries would
> have to be sent to all of the nodes, and the results from the nodes
> combined before being presented to the user.
> Is LVS suitable for something like this? If not, can anyone recommend
> an existing open source system that can do this?
>
> Regards,
>
> Ron King
>
>
----------------------------------------------------------------------
LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
To unsubscribe, e-mail: lvs-users-unsubscribe@xxxxxxxxxxxxxxxxxxxxxx
For additional commands, e-mail: lvs-users-help@xxxxxxxxxxxxxxxxxxxxxx