in reply to Speeding up large file processing

Here's the new code, which runs a lot faster. The script reads a category descriptions file of 4000 lines and retrieves the description for each category that is to be displayed on a page.

open(my $fh, '<', $catdesc) or die "Cannot open $catdesc: $!";
my @desc = <$fh>;
close($fh);
chomp @desc;
%category_descriptions = ();    # was "= {}", which stores a hash ref as a key

## Create a hash with the category names to display.
foreach my $entry (@subdirectories) {
    my ($date_a, $directory_name) = split(/\t/, $entry);
    if (defined $directory_name && $directory_name ne '') {
        $category_descriptions{"$FORM{'direct'}/$directory_name"} = 1;
    }
}

## Set the description for each category
foreach my $line (@desc) {
    my ($catname, $catdescription) = split(/\t/, $line);
    next unless defined $catdescription;    # skip lines without a tab
    $catdescription =~ s/^\s+//;    # trim leading blanks...
    $catdescription =~ s/\s+$//;    # trim trailing blanks...
    next if (!$catdescription);     # skip line if no description
    # Only fill in categories that are displayed and still hold the placeholder 1.
    if (exists $category_descriptions{$catname}
        && $category_descriptions{$catname} eq '1') {
        $category_descriptions{$catname} = "<br><$font>$catdescription</font><br>";
    }
}

I'd really appreciate knowing whether there's an even faster way of doing this. Migrating it all to MySQL would be great, but the system would need so many modifications that it would be just as easy to rewrite the whole thing.

Thanks,
Ralph

Re^2: Speeding up large file processing
by BrowserUk (Patriarch) on Jul 15, 2005 at 14:26 UTC

    How often does the information you are extracting from the directory structure change?

    Seems to me that instead of rebuilding the html representing the structure every time the cgi script is called, you should be maintaining a pre-built file that contains the html.

    When a request arrives to display the data, you just read that pre-formatted file and present it.

    When changes are made to the directory structure, you run a server process that re-creates the html file.
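
    A minimal sketch of that split, assuming the pre-built HTML lives in a single file. The helper names write_prebuilt and read_prebuilt are hypothetical, not part of the original script. The writer goes through a temporary file and rename, so a request should never see a half-written file on a filesystem where rename is atomic.

```perl
use strict;
use warnings;

# Hypothetical helper for the rebuild side: write the HTML to a temporary
# file first, then rename it over the real one in a single step.
sub write_prebuilt {
    my ($html_file, $html) = @_;
    my $tmp = "$html_file.tmp.$$";
    open(my $out, '>', $tmp) or die "Cannot write $tmp: $!";
    print {$out} $html;
    close($out) or die "Cannot close $tmp: $!";
    rename($tmp, $html_file) or die "Cannot rename $tmp to $html_file: $!";
    return;
}

# Hypothetical helper for the CGI side: slurp the pre-built file and
# return it, with no parsing or hash building per request.
sub read_prebuilt {
    my ($html_file) = @_;
    open(my $in, '<', $html_file) or die "Cannot read $html_file: $!";
    local $/;    # slurp mode: read the whole file in one go
    my $html = <$in>;
    close($in);
    return $html;
}
```

    The CGI script would then only call read_prebuilt and print the result; the loops that build the descriptions would move into whatever process calls write_prebuilt.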


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
      Hi, thanks for your reply!
      Yeah, I thought of doing something like that to speed it up even further. However, the information changes at unpredictable times, since I'm not the one updating the content. For now, this solution runs fast enough for the current load. Regards, Ralph.

        In that case, I'd think seriously about setting up a daemon process to monitor the directory structure and update the html when it detects changes.

        Under Win32 I'd use Win32::ChangeNotify. There is a similar facility (FAM, the File Alteration Monitor?) under Linux, I believe.
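
        Where no notification facility is to hand, plain polling is a cruder fallback for the same idea. The sketch below is that swapped-in technique, not Win32::ChangeNotify: it compares a signature of the directory tree (paths plus modification times) between passes. The names tree_signature and watch_tree, the polling interval, and the $rebuild callback are all assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Find ();

# Build a signature of everything under $root: one "path<TAB>mtime"
# line per entry, sorted so the result is stable between runs.
sub tree_signature {
    my ($root) = @_;
    my @entries;
    File::Find::find(
        {
            no_chdir => 1,
            wanted   => sub {
                my $mtime = (lstat $File::Find::name)[9];
                push @entries,
                    "$File::Find::name\t" . (defined $mtime ? $mtime : '');
            },
        },
        $root
    );
    return join "\n", sort @entries;
}

# Hypothetical daemon loop: every $interval seconds, recompute the
# signature and call $rebuild (a code ref that regenerates the HTML)
# only when something under $root has changed. Never returns.
sub watch_tree {
    my ($root, $interval, $rebuild) = @_;
    my $last = tree_signature($root);
    while (1) {
        sleep $interval;
        my $now = tree_signature($root);
        next if $now eq $last;
        $last = $now;
        $rebuild->();
    }
}
```

        Polling walks the whole tree on every pass, so the interval is a trade-off between how stale the HTML may get and how much disk activity the daemon causes.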

