Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I have a slight problem. I have a script which runs a contest site. At present there are three text files which store email addresses, with about 20,000 addresses in each file.
In the admin area of this script I have to print out these lists in a scrollable text box, which is fine. My only problem is that opening three files of this size takes ages - at the moment I am opening each file and using a while loop to print out each line. Is there a quicker way to do this? (I'm aware of reading it all into memory and using a foreach loop, but apart from being very resource-demanding, this isn't much quicker :( )

Thanks,

John

Replies are listed 'Best First'.
Re: Displaying a Large File
by tachyon (Chancellor) on Aug 31, 2001 at 23:37 UTC

    Opening the file will take a few milliseconds. Reading the whole thing a few milliseconds more. This is not your problem. Nor is memory an issue - this is a small file. Read it line by line or slurp it all in. It does not matter.
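    To see why it doesn't matter, here is a minimal sketch of both approaches (the filename and sample data are stand-ins, not from the original script):

```perl
use strict;
use warnings;

# Stand-in data so the sketch runs anywhere
my $file = 'emails1.txt';
open my $make, '>', $file or die "Can't write $file: $!";
print $make "alice\@example.com\n", "bob\@example.com\n";
close $make;

# Line by line: constant memory, fine for CGI output
open my $fh, '<', $file or die "Can't open $file: $!";
print while <$fh>;
close $fh;

# Or slurp it all at once: a ~300kB list fits in memory easily
open $fh, '<', $file or die "Can't open $file: $!";
my @addresses = <$fh>;
close $fh;
print scalar(@addresses), " addresses read\n";

unlink $file;
```

    Either way the file I/O is over in milliseconds; the time goes elsewhere.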

    Your problem is bandwidth. Let's say 15 chars per address: 15 * 20,000 = 300,000 bytes. Sending 300,000 bytes (300kB) across the internet embedded in an HTML form (scrolling text box) is what takes the time. I assume this is what you are doing given the context.

    A 56k modem gets a max of 56,000 bits per second. Let's be optimistic and assume you actually get this. There are 8 bits in a byte, so a 56k modem is good for 56,000 / 8 = 7,000 bytes per second. So 300,000 / 7,000 is roughly 43 seconds of download. This is the minimum. In the real world double it or more, as 56k only happens in perfect conditions. If you want speed you need to change your logic and limit the download.
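    One way to limit the download is to show one page of addresses per request rather than the whole list. A minimal paging sketch (the data is synthetic, and in the real script the page number would come from a CGI parameter):

```perl
use strict;
use warnings;

# Synthetic stand-in for the address list
my @addresses = map { "user$_\@example.com" } 1 .. 1000;

my $page      = 2;      # 1-based page number (from a CGI param in practice)
my $page_size = 100;    # addresses to send per request

my $start = ($page - 1) * $page_size;
my $end   = $start + $page_size - 1;
$end = $#addresses if $end > $#addresses;

# Only this slice goes over the wire
print "$_\n" for @addresses[ $start .. $end ];
```

    100 addresses at 15 chars each is about 1.5kB per request, well under a second even on a modem.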

    There are some tricks you can use to get the partially downloaded data to render (display) faster but if you want zeus@zod.com you will be waiting until it all comes down.

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Displaying a Large File
by count0 (Friar) on Aug 31, 2001 at 23:10 UTC
    To get the quickest speed possible, you definitely don't want to load the files into memory. It's probably best to continue reading the file line by line.

    If you need to do any operations on the email addresses, such as changing, sorting, or deleting certain ones, you should certainly consider an RDBMS.
    Depending on your requirements your mileage will vary, but in similar situations I've found that using MySQL and DBI proved considerably more time-efficient than a flat file with tens of thousands (or more) of records. I'd be glad to dig up those benchmarks when I get home if anyone is interested, and post them on Monday (right now my only internet connection is at work).
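    A hedged sketch of the DBI approach. An in-memory SQLite database stands in here so the example runs without a server; for MySQL the DSN would be something like "DBI:mysql:database=contest" plus a user and password (the table and column names are assumptions):

```perl
use strict;
use warnings;
use DBI;

# SQLite in-memory DB as a stand-in for MySQL
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });

$dbh->do('CREATE TABLE emails (address TEXT PRIMARY KEY)');

my $sth = $dbh->prepare('INSERT INTO emails (address) VALUES (?)');
$sth->execute($_) for qw(bob@example.com alice@example.com);

# Sorting, searching, and deleting become one-liners:
my $rows = $dbh->selectcol_arrayref(
    'SELECT address FROM emails ORDER BY address');
print "$_\n" for @$rows;

$dbh->disconnect;
```

    With an index on the address column, lookups and deletes stay fast no matter how large the list grows.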
(crazyinsomniac) Re: Displaying a Large File
by crazyinsomniac (Prior) on Aug 31, 2001 at 23:05 UTC
    I'm sorry, but what is "a scrollable text box"?

    I'm a dummy, and usually dare not venture a guess, but are you talking html? tk? jellybelly?

    Let's see, 20,000 emails to a file, at 3 files, with an average email address length of 35 (just for kicks), + 1 character for the newline, that's roughly 60,000 * 36 = 2,160,000 bytes...

     
    ___crazyinsomniac_______________________________________
    Disclaimer: Don't blame. It came from inside the void

    perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"

Re: Displaying a Large File
by jlongino (Parson) on Aug 31, 2001 at 23:42 UTC
    It might help if we knew a little more about how these three files are created, used, etc. Are these static files created from some other source? Maybe two are static and the third is appended to? Do you perform any type of updates on these files? Can you change the I/O routines that generate the files? The answers to these questions are important when trying to tackle this type of problem.

    If the files are static, read them in once, store them in a hash or array, then write them out to a new file using Storable::nstore or something similar. Reading them back in with Storable::retrieve is very fast. If you would like, I could post some simple examples.
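    A simple example of that round trip (the data and filename here are stand-ins):

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

# Stand-in for the address list read from the text files
my @addresses = ('alice@example.com', 'bob@example.com');

# Write once, in network order, after any update to the list
nstore(\@addresses, 'emails.sto');

# Fast binary read on every subsequent view
my $list = retrieve('emails.sto');
print "$_\n" for @$list;

unlink 'emails.sto';
```

    The binary read skips all the per-line parsing, which is where the time goes with a plain text file.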

    If the files are dynamic, you might want to look into a full blown RDBMS as count0 suggests.

    @a=split??,'just lose the ego and get involved!';
    for(split??,'afqtw{|~'){print $a[ord($_)-97]}
Re: Displaying a Large File
by runrig (Abbot) on Aug 31, 2001 at 23:45 UTC
    If the file has just raw email addresses, and you have to mark it up before printing it (by adding <LI> tags or whatever), then you might save yourself a bit of time by marking up the whole file beforehand and using File::Copy instead of reading and printing the addresses line by line.
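    A sketch of that idea, assuming hypothetical filenames (the markup pass runs once ahead of time; the copy runs on each request):

```perl
use strict;
use warnings;
use File::Copy;

my ($raw, $marked) = ('emails1.txt', 'emails1.html');

# Stand-in data so the sketch runs
open my $make, '>', $raw or die "Can't write $raw: $!";
print $make "alice\@example.com\n", "bob\@example.com\n";
close $make;

# One-off markup pass, done ahead of time
open my $in,  '<', $raw    or die "Can't read $raw: $!";
open my $out, '>', $marked or die "Can't write $marked: $!";
print $out "<LI>$_" while <$in>;
close $_ for $in, $out;

# At request time: no per-line work at all
copy($marked, \*STDOUT) or die "Copy failed: $!";

unlink $raw, $marked;
```

    File::Copy's copy() accepts a filehandle as the destination, so the pre-rendered file goes straight to STDOUT in large buffered chunks.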

    Then again, as tachyon implies, it might not make a bit of difference.