sadarax has asked for the wisdom of the Perl Monks concerning the following question:

I am looking for something to help share the outgoing bandwidth of a server. This is not about sharing the server's processing load, just the file-uploads (the files the server sends out to visitors). If a system like this is already available for free use, please tell me.

The Design: There is the main server (with the website), and volunteers offer their own computers and bandwidth to act as mirrors of the main website. End users ask the server for a file. The server tells the volunteer computers to fulfill the request. The user receives their file and never knows the difference. Note: the volunteer computers would not have the same URL or IP addresses as the main website.

Diagram of the general process

Is it possible to make a system like this? In particular, is it possible to do it so that the website's end users never notice the difference, while the server successfully shares the file-uploads with the volunteers?

If it is possible, I would like some resources and information about getting started with writing this system.

Using BitTorrent is not an option for this situation, both in principle and in practice. If you want to know why, read below.

I have written some code for a very basic server and client that talk to each other. Things I will probably need to know:

Of course, any other information you good Monks feel is useful for me to know about servers would also be appreciated.

Some other programmers suggested a system like this might be the most feasible: when the user goes to www.website.com, the server returns an 'empty' page with a script running on it that pulls the data from the volunteer computers and renders those files on the page for the user as normal. These are not dynamically generated pages; the page just accepts the files sent from the volunteer computers (similar to hotlink systems).

Diagram of the frame-script general process
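To make that idea concrete, here is a rough sketch of the kind of thing I picture (the volunteer host names are made up, and a real system would only pick from volunteers that are currently online):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use CGI qw(header);

    # Made-up volunteer hosts; a real system would track which ones are online.
    my @volunteers = ('http://client1.example.org', 'http://client2.example.org');
    my $host = $volunteers[ rand @volunteers ];   # pick one volunteer for this visitor

    # The main server sends only this tiny shell page; the heavy content
    # inside the frame is fetched directly from the volunteer's machine.
    print header('text/html');
    print qq{<html><body>\n};
    print qq{<iframe src="$host/mirror/index.html" width="100%" height="100%" frameborder="0"></iframe>\n};
    print qq{</body></html>\n};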

My idea is, in principle, very similar to BitTorrent, but BitTorrent itself is not at all suitable for use with a normal website.

Reasons why BitTorrent is not a practical idea:

1) Torrents usually need to be completely downloaded, or at least have a large chunk completed, before any of the data is readable. This would make browsing a website a long and arduous task, and it would make casual browsing completely impossible. Worse, it would require several times more bandwidth to complete the torrent. In the end the user would be given many files they might not be interested in, at a significant cost to the website in bandwidth, ultimately making the upload-bandwidth problem much worse.

2) Torrent files are very static, and the only way to update them is by creating another torrent file. This would not do for a website that updates a little bit each day. It would require the user to download and run a new torrent file each time.

3) To use a torrent, the user must use a torrent program, not a web browser. That alone means more work for them just to view the site casually, which is not good for popularity. My system would be integrated relatively seamlessly on the server side, and the end user would not need to behave any differently.

Thanks

Re: Bandwidth upload sharing
by Corion (Patriarch) on Oct 13, 2007 at 14:00 UTC

    I assume that the issue of trust, or of other machines being compromised, is already solved for you.

    I think you have two approaches, which basically amount to the same thing. One is the (i)frame-based approach you already have; the alternative is to use HTTP redirects to the final (client) URLs, possibly combined with a (very) dynamic DNS so you keep some control over caching and the clients:

    • The main server is both the HTTP and the DNS server for your domain (sadarax.com, for example)
    • All clients get a dynamic DNS entry and register themselves by GETting a URL crafted specially for them (a registration sketch follows this list). Say client1.sadarax.com and client2.sadarax.com, while your main server is www.sadarax.com.
    • All clients have a complete mirror of the website (provided by rsync) or the central server knows which clients have which resources available.
    • The main server serves all "main" HTML pages but other, heavy URLs on the pages, like images, point to client1.sadarax.com or client2.sadarax.com, either randomly or whoever has the image available.
    • Optionally, the clients could also completely host the site and proxy-cache all requests using Squid or a custom Perl webserver. That way you would trickle down only the requested parts of the website.
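    For the registration step in the second point, a tiny CGI on the central server would be enough. This is only a sketch under invented assumptions (the shared token, the file it appends to); a real setup would use per-client credentials and update the DNS zone instead of merely logging the address:

        #!/usr/bin/perl
        # Sketch: a volunteer announces itself by GETting a specially crafted URL,
        # e.g. /cgi-bin/register.pl?token=SECRET, and the server records its IP.
        use strict;
        use warnings;
        use CGI;

        my $q     = CGI->new;
        my $token = $q->param('token') || '';
        my $ip    = $q->remote_addr;

        if ( $token eq 'SECRET' ) {    # placeholder; use per-client tokens in practice
            open my $fh, '>>', '/var/run/volunteers.txt'
                or die "cannot record volunteer: $!";
            print {$fh} scalar(localtime), " $ip\n";    # a real system would update DNS here
            close $fh;
            print $q->header('text/plain'), "registered $ip\n";
        }
        else {
            print $q->header( -status => '403 Forbidden' ), "go away\n";
        }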

    With that setup, you can distribute the website but still maintain central control over all clients by DNS, because all addresses resolve through your central DNS server; should a client drop out or go rogue, you can simply update the DNS entry for that client to point to another IP address or back to your central server.
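    And the "heavy URLs point to the clients" part can be as dumb as a rewrite pass over the outgoing HTML. A minimal sketch, assuming the images live under /images/ and that any registered client mirrors them:

        use strict;
        use warnings;

        # Hypothetical pool of volunteer hosts registered via dynamic DNS.
        my @mirrors = ('client1.sadarax.com', 'client2.sadarax.com');

        # Rewrite heavy resources (images here) so the browser fetches them
        # from a randomly chosen mirror instead of from the main server.
        sub spread_images {
            my ($html) = @_;
            $html =~ s{src="/images/}
                      { 'src="http://' . $mirrors[ rand @mirrors ] . '/images/' }ge;
            return $html;
        }

        print spread_images('<img src="/images/banner.png">'), "\n";
        # e.g. <img src="http://client2.sadarax.com/images/banner.png">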

    Also see Fast Flux Networks for something quite similar to what you want, even though there, most of the content again comes from a central "mothership" server.

Re: Bandwidth upload sharing
by clinton (Priest) on Oct 13, 2007 at 15:24 UTC
    You could have a look at MogileFS, which uses WebDAV to upload the files to whatever server is available, then later syncs the servers so that everything should be stored everywhere. When serving the files, it chooses servers based on response, and so should distribute the load efficiently.

    It would be transparent to the user, and relies only on ordinary HTTP hosts.
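    For reference, a minimal sketch of how the Perl client is used (the tracker hosts, domain and class names here are invented; adapt them to your installation):

        use strict;
        use warnings;
        use MogileFS::Client;

        # Invented tracker hosts and domain; adjust to your installation.
        my $mogc = MogileFS::Client->new(
            domain => 'sadarax.com',
            hosts  => [ 'tracker1:7001', 'tracker2:7001' ],
        );

        # Store a file under a key, replicated according to the 'mirrored' class.
        $mogc->store_file( 'images/logo.png', 'mirrored', '/var/www/images/logo.png' )
            or die "store failed: " . $mogc->errstr;

        # Later, ask the trackers which replicas currently hold the file.
        my @urls = $mogc->get_paths('images/logo.png');
        print "$_\n" for @urls;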

    Clint

Re: Bandwidth upload sharing
by NetWallah (Canon) on Oct 13, 2007 at 05:24 UTC
    Sounds like you are looking for BitTorrent.

    Azureus - Java BitTorrent Client in Ubuntu is one of the most popular implementations.

         "As you get older three things happen. The first is your memory goes, and I can't remember the other two... " - Sir Norman Wisdom

      Unfortunately BitTorrent is not a feasible option. I have edited the original post to include the reasons why BitTorrent is not a possible solution to this problem.
Re: Bandwidth upload sharing
by NetWallah (Canon) on Oct 14, 2007 at 04:53 UTC
    Another possibility, inspired by Corion++'s excellent suggestions, is based on my interpretation of your statements:
    • HTML content can be served by the main server.
    • It is the file download to new clients that needs to be distributed.
    • Registered Volunteer computers will provide appropriate, secure mechanisms for the new clients to download.

    Under these conditions, the server should periodically verify the availability and load of each volunteer computer, then modify the served download page(s) for each request so that the download link points to the next available volunteer computer (round robin). If all are busy, advise the client and place it in a queue, with a page that refreshes periodically with updated status.
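    A rough sketch of that check-and-round-robin logic, assuming each volunteer exposes a cheap /ping URL (the host names and URLs are placeholders):

        use strict;
        use warnings;
        use LWP::UserAgent;

        # Placeholder volunteer hosts; in practice this list would come from registration.
        my @volunteers = ('http://client1.sadarax.com', 'http://client2.sadarax.com');
        my $ua = LWP::UserAgent->new( timeout => 5 );

        # Keep only the volunteers that answer a cheap health-check request.
        my @available = grep { $ua->head("$_/ping")->is_success } @volunteers;

        # Round-robin over whatever is still up.
        my $counter = 0;
        sub next_volunteer {
            return unless @available;
            return $available[ $counter++ % @available ];
        }

        if ( my $mirror = next_volunteer() ) {
            # Point the served download link (or a redirect) at this volunteer.
            print "Location: $mirror/downloads/file.zip\n\n";
        }
        else {
            # Everybody is busy or offline: serve a self-refreshing "please wait" page.
            print "Content-type: text/html\n\n";
            print '<meta http-equiv="refresh" content="30">All mirrors are busy, please try again shortly.';
        }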

    With the power of Perl, this should not be too difficult to implement.

    Cheers!

         "As you get older three things happen. The first is your memory goes, and I can't remember the other two... " - Sir Norman Wisdom

Re: Bandwidth upload sharing
by sadarax (Sexton) on Oct 14, 2007 at 10:30 UTC
    Thanks for the information everyone. I'm going to investigate all of these options. I do have something to add.

    If possible, I would like to modify the existing website as little as possible. Changing the links to content on thousands of archived pages would require a lot of work.

    Namely, my hope was to run the sharing-server and then start the normal Apache website within it. HTTP requests come to the share-server first; it forwards them to the Apache server inside, Apache issues a response, and the share-server has one of the node-sharing clients fulfill the request.

    Are any of the options particularly well suited for this?

      Let me try to paraphrase, and restate your requirements, then analyze them.

      • You have your "normal apache web server" - let us refer to this as the "web server"
      • Then you have the "sharing-server", which, by coincidence or design, may reside on the same hardware. Let us call this the "reverse web proxy", since that seems to be its function (web requests hit it first).
      • In addition, you have a bunch of "volunteer" "node-sharing-clients". To me, this should be considered equivalent to a "web farm".

      If we agree on the terminology, this is a fairly standard setup, except for the fact that the web farm consists of volunteer machines that may appear and disappear. The purpose of the "web server" is to provide the "master" set of files.
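      If you want that "reverse web proxy" to be your own Perl process rather than an off-the-shelf proxy, a bare-bones sketch of the idea could look like this (the ports, paths and host names are made up, only GET is handled, and a production front end would need far more care):

          #!/usr/bin/perl
          use strict;
          use warnings;
          use HTTP::Daemon;
          use HTTP::Request;
          use LWP::UserAgent;

          # Made-up layout: Apache listens on localhost:8080, this front end on 8000,
          # and anything under /files/ is heavy content that the web farm should serve.
          my @farm = ( 'http://client1.sadarax.com', 'http://client2.sadarax.com' );
          my $ua   = LWP::UserAgent->new;
          my $d    = HTTP::Daemon->new( LocalPort => 8000, ReuseAddr => 1 )
              or die "cannot listen: $!";
          my $n = 0;

          while ( my $conn = $d->accept ) {
              while ( my $req = $conn->get_request ) {
                  my $path = $req->uri->path;
                  if ( $path =~ m{^/files/} ) {
                      # Heavy download: hand the browser off to the next volunteer.
                      $conn->send_redirect( $farm[ $n++ % @farm ] . $path );
                  }
                  else {
                      # Everything else: fetch from the local Apache and relay the response.
                      my $backend = HTTP::Request->new( GET => "http://localhost:8080$path" );
                      $conn->send_response( $ua->request($backend) );
                  }
              }
              $conn->close;
          }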

      The only additional suggestion I have is that you consider BitTorrent as a mechanism for re-distributing the web server content to the web farm.

           "As you get older three things happen. The first is your memory goes, and I can't remember the other two... " - Sir Norman Wisdom