wil has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I've got the code below working to pick a file from a directory at random and print it to the browser. This works great, but as it is used as a front page for a busy website, I want to look at ways to:

a) speed it up
b) use less system resources, i.e. memory

The files are all named index.foo.html where foo can be absolutely anything, and the list is ever changing, so listing the file names in an array or a hash is not an option.
my @files = glob '/home/test/_html/index/index.*.html';
open FILE, '<' . $files[rand @files] or die $!;
print while <FILE>;
close FILE;

I am basically just looking at ways of making this more efficent, so I'm posting here to see what kind of responses I get and see if anyone has any ideas on how to make this more sleek :-)

Thanks!

- wil

Replies are listed 'Best First'.
Re: Pick random file and print
by japhy (Canon) on May 07, 2002 at 12:49 UTC
    Here's a commonly used trick, adapted from "get a random line from a file":
    my ($n, $file);
    opendir DIR, $path or die "can't opendir $path: $!";
    while (defined (my $f = readdir DIR)) {
        next if -d "$path/$f";
        $file = $f if rand(++$n) < 1;
    }
    closedir DIR;
    $file = "$path/$file";
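    To see that this "reservoir" trick really is uniform, here's a quick standalone simulation (the pick_random helper is just for the demo, not from the node above): the n-th candidate replaces the current pick with probability 1/n, so each of N candidates ends up chosen with probability 1/N.

```perl
use strict;
use warnings;

# Reservoir selection: the n-th item replaces the current pick
# with probability 1/n, giving each of N items a 1/N chance overall.
sub pick_random {
    my ($n, $pick);
    for my $item (@_) {
        $pick = $item if rand(++$n) < 1;
    }
    return $pick;
}

# Tally 30,000 picks from three candidates; counts come out ~10,000 each.
my %counts;
$counts{ pick_random(qw(a b c)) }++ for 1 .. 30_000;
printf "%s: %d\n", $_, $counts{$_} for sort keys %counts;
```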

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

      Thank you for the replies fellow monks!

      I haven't been able to find much on PM about glob v. readdir but what I did find (I don't use glob, I use readdir) suggests that I should be using readdir, and japhy, your code uses readdir so I assume this is the way forward. I guess the next step is to benchmark.

      Thanks for your help!

      - wil
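      Since the thread leaves the glob-vs-readdir question open, here is one way such a benchmark could be sketched with the core Benchmark module (the temp directory, file count, and iteration count are illustrative, not from the thread):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempdir);

# Build a throwaway directory with a handful of files so both
# listing strategies have something to chew on.
my $dir = tempdir(CLEANUP => 1);
for my $i (1 .. 20) {
    open my $fh, '>', "$dir/index.$i.html" or die $!;
    close $fh;
}

sub via_glob {
    return glob "$dir/index.*.html";          # full paths
}

sub via_readdir {
    opendir my $dh, $dir or die "can't opendir $dir: $!";
    my @files = grep { /^index\..+\.html$/ } readdir $dh;   # bare names
    closedir $dh;
    return @files;
}

# Compare iterations per second of the two approaches.
cmpthese(2000, {
    glob    => sub { my @f = via_glob() },
    readdir => sub { my @f = via_readdir() },
});
```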
Re: Pick random file and print
by Dog and Pony (Priest) on May 07, 2002 at 12:41 UTC
    That all depends on what is acceptable I guess. If you want to avoid a lot of IO being done by your script, you could always redirect the browser to the chosen random file. See the method redirect in CGI.pm. That will, of course, change your url to this new place, and of course these files must be available via the web browser.

    If you want to mask the url, there are still ways to do this. Produce a one-frame frameset that links the chosen page in, for instance.

    I really should look this up instead of just tossing it out, but I think it might be possible to stat the directory that these files are in and see if it has changed - in effect, see if any files have been added/deleted. This would allow you to cache your list of index files until further notice. Depending on your setup however, this may or may not be a possible/effective way - meaning, where do you store this list?
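    A minimal sketch of that stat idea, assuming some persistent environment (mod_perl or similar) where lexicals survive between requests - in a plain CGI the cache dies with the process. The names (%cache, random_index_file) are illustrative:

```perl
use strict;
use warnings;

# Per-directory cache: { mtime => ..., files => [...] }.
# A directory's mtime changes whenever a file is added or removed,
# so we only re-read the listing when that happens.
my %cache;

sub random_index_file {
    my ($dir) = @_;
    my $mtime = (stat $dir)[9];
    my $c = $cache{$dir};
    if (!$c or $c->{mtime} != $mtime) {
        opendir my $dh, $dir or die "can't opendir $dir: $!";
        my @files = grep { /^index\..+\.html$/ } readdir $dh;
        closedir $dh;
        $c = $cache{$dir} = { mtime => $mtime, files => \@files };
    }
    my $files = $c->{files};
    return @$files ? "$dir/" . $files->[rand @$files] : undef;
}
```

    Note the caveat: editing an existing file updates the file's mtime, not the directory's, so this only detects adds and deletes - which matches the ever-changing-list scenario here.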

    If you really want to read the directory and then push the contents to the browser, then I don't see what improvements might be done at all... then again, I'm no optimizing expert I guess. :) Maybe someone will tell us if glob or readdir is the faster/more effective though.


    You have moved into a dark place.
    It is pitch black. You are likely to be eaten by a grue.
      I should have clarified this a bit better. The files reside in a directory beneath the www directory, and I think the overhead of loading CGI.pm is too great (?).

      I wonder if throwing the HTML data into an SQL table and grabbing the info from there would speed things up. I mean; is this faster than accessing and printing a file? I doubt it, as this is what Perl fundamentally does best, but I really am not sure.

      Thank you for your thoughts, though - much appreciated!

      - wil
        If you're loading CGI.pm for each request, then you must not be using mod_perl, which would give you a performance boost just by avoiding having to load perl repeatedly.

        Once you're there, you can also start caching data from one request to the next by using apache's fastcgi module or by turning your cgi into an apache module itself. If you need to keep the list of available files up-to-the-second, you can store a database connection for future requests. If it can lag a bit, then just keep the list of files around - each server process (assuming apache) will, by default, die after handling 50 requests, at which point your cached data will disappear and be reloaded by the new server spawned to replace it.

        Going to SQL without mod_perl/fastcgi, though, would just make things worse, due to the overhead of opening a new database connection for each request.

        You wouldn't really have to load CGI.pm, I guess. You could go with something lighter (look here for instance), or you could have a look at what CGI.pm is doing when you do print $q->redirect($your_url);. You should be able to emit the same with something like:
        print "Status: 302 Moved\n";
        print "Location: $your_url\n\n";
        Instead of the usual "Content-type: text/html". The url can be relative as well, if that makes it easier. :) If you have cookies and other stuff this gets more complex, of course.

        Not that it matters much if you can't access the documents via the web server (was that what you meant with "beneath"?) and have no intention of moving them. :) If so, the frameset idea also goes bye-bye.

        As for the SQL table - probably not if you are not using mod_perl or something similar which has persistence. Doing SQL queries is very fast, but there is lots of overhead when opening a new connection to the database, so you probably lose anyways.

        Not much in the way of ideas, I guess - I still think it is about as good as you can do, unless you change approach, that was all. I guess this is gonna bite me hard when someone proves me *really* wrong. :)


        You have moved into a dark place.
        It is pitch black. You are likely to be eaten by a grue.
Re: Pick random file and print
by perlplexer (Hermit) on May 07, 2002 at 12:37 UTC
    Well, how many files do you have in that directory? hundreds? thousands?
    The following may not be any faster but will certainly use less memory if you have a large number of files:
    my ($dir, $file, $rnd) = ('/home/test/_html/index/', undef, undef);
    opendir DIR, $dir or die $!;
    while (1) {
        $file = readdir DIR;
        last unless defined $file;
        next unless $file =~ /^index\.[^.]+\.html/;
        $rnd = $file;
        last if rand(10) > 5;
    }
    closedir DIR;
    die "No files found\n" unless defined $rnd;
    open IN, "<$dir$rnd" or die $!;
    print while (<IN>);
    close IN;
    If you want to make it faster, consider switching to mod_perl. There isn't much else you can do here.

    --perlplexer
      It should be said that in this snippet the alternatives are not equally probable (unless there are only two of them, of course). rand(10) is greater than 5 in approximately every other case. (Approximately, because rand(10) never yields 10.)
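      A quick simulation backs this up (the pick helper mirrors perlplexer's loop; the single-letter file names are stand-ins): with the rand(10) > 5 test, the k-th file is chosen with probability roughly 1/2^k, and the last file collects the remainder - so the first file is picked about half the time.

```perl
use strict;
use warnings;

# Mirror of perlplexer's selection loop: each candidate overwrites
# $rnd, and the loop bails out early about half the time.
sub pick {
    my $rnd;
    for my $file (@_) {
        $rnd = $file;
        last if rand(10) > 5;
    }
    return $rnd;
}

# Tally 10,000 picks over five stand-in files. Expect roughly
# 5000 / 2500 / 1250 / 625 / 625 rather than 2000 each.
my %counts;
$counts{ pick('a' .. 'e') }++ for 1 .. 10_000;
printf "%s: %d\n", $_, $counts{$_} || 0 for 'a' .. 'e';
```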