perlcapt has asked for the wisdom of the Perl Monks concerning the following question:

The data is simple enough: one record per line. But some lines may not be valid records. I want to read only the first valid record and the last valid record, without checking every line in between for validity. (Some files have over a million lines.)

I have an idea how I might do it:

  • Read from the top until I get to the first valid record.
  • Seek to the end of the file minus some reasonable block of bytes (sized from the expected number of lines and bytes per line).
  • Read that block until I get to the last valid record.
  • If the block contains no valid records, back up one more block, and so on.
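The steps above can be sketched in core Perl as follows. This is only a minimal illustration, not a tested library: the names `first_valid` and `last_valid`, the callback argument `$valid`, and the default block size of 8192 bytes are all assumptions made for the example, and it assumes lines end in "\n". The one subtle point is that a block read from the end may start in the middle of a line, so the leading fragment is carried over and joined to the next block read.

```perl
use strict;
use warnings;

# Return the first line accepted by $valid (a code ref), reading from the top.
# Returns nothing if no line qualifies.
sub first_valid {
    my ($path, $valid) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        chomp $line;
        return $line if $valid->($line);
    }
    return;
}

# Return the last line accepted by $valid, reading fixed-size blocks
# backwards from the end of the file. $block is a hypothetical tuning knob.
sub last_valid {
    my ($path, $valid, $block) = @_;
    $block ||= 8192;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $pos  = -s $fh;   # start at end of file
    my $tail = '';       # leading fragment carried over from the later block
    while ($pos > 0) {
        my $len = $pos < $block ? $pos : $block;
        $pos -= $len;
        seek $fh, $pos, 0 or die "Cannot seek in $path: $!";
        defined read($fh, my $buf, $len) or die "Cannot read $path: $!";
        $buf .= $tail;
        my @lines = split /\n/, $buf, -1;
        # Unless we are at the start of the file, the first piece may be
        # only the end of a line, so hold it back for the next block.
        $tail = $pos > 0 ? shift @lines : '';
        for my $line (reverse @lines) {
            return $line if $valid->($line);
        }
    }
    return;
}
```

A call might look like `last_valid('data', sub { $_[0] =~ /valid/ })`, where the pattern stands in for whatever the real validity check is.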

    It doesn't appear hard, but you never know... and besides, I much prefer to use code that is already written (lazy boat bum). Does anyone know of a module that has this in it, or of working code? Or improvements to the algorithm?
    perlcapt
    Re: Quick version of "first record" to "last record" ?
    by ikegami (Patriarch) on Oct 15, 2004 at 19:32 UTC

      How about

      1. Open the file.
      2. Read until the first valid record is reached. That's your First Record.
      3. Reopen the file with File::ReadBackwards.
      4. Read until the first valid record is reached. That's your Last Record.

      I've never used the module, but someone who did once praised it here on Perl Monks. Also, the docs say it's "memory efficient", so it doesn't read the whole file into memory.
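The four steps above might look roughly like this. It is a sketch under assumptions rather than a definitive implementation: the sub name `first_and_last` and the callback `$valid` are made up for the example, and File::ReadBackwards must be installed from CPAN since it is not part of core Perl. Only the module's `new`, `readline`, and `close` methods are used.

```perl
use strict;
use warnings;
use File::ReadBackwards;   # CPAN module, not bundled with Perl

# Return (first, last) valid lines of $path, or an empty list if none.
# $valid is a code ref supplied by the caller (hypothetical interface).
sub first_and_last {
    my ($path, $valid) = @_;

    # Steps 1-2: read forward until the first valid record.
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $first;
    while (my $line = <$fh>) {
        if ($valid->($line)) { $first = $line; last }
    }
    close $fh;
    return unless defined $first;

    # Steps 3-4: reopen with File::ReadBackwards and read from the end
    # until the first valid record, which is the last one in the file.
    my $bw = File::ReadBackwards->new($path)
        or die "Cannot read $path backwards: $!";
    my $last;
    while (defined(my $line = $bw->readline)) {
        if ($valid->($line)) { $last = $line; last }
    }
    $bw->close;

    return ($first, $last);
}
```

Lines are returned with their trailing newline intact, so `chomp` them if needed. If a file holds exactly one valid record, both values are that same line.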

        Exactly what I'm looking for. CPAN has grown so big that it can take hours with the search engine to find specifically what you're looking for; in cases like this, more time than it would take to write a hack to solve the problem. We will always need human intelligence, experience, and memory -- hence Perl Monks.
    Re: Quick version of "first record" to "last record" ?
    by Anonymous Monk on Oct 16, 2004 at 17:29 UTC
      Here's how I would do it, since I don't have that module installed, and I'm lazy:
        perl -ne 'if (/valid/) { print; exit }' < data
        tac data | perl -ne 'if (/valid/) { print; exit }'
      
      "tac" is "cat" in reverse: it prints the lines of a file from last to first.
        I had never heard of 'tac' before. Way cool. It even exists on Cygwin! I'll be using the module, but I'm adding tac to my *nix vocabulary nonetheless. Thanks. -ben