kirkbrown has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

How would I read through multiple text files for the following text (between the #):

##################
RECORD: 4499
'Field: CSPAN'
'MSG# VA - 0001 - 02'
##################

Key things to note: The 'RECORD' value changes. The letters and numbers after 'MSG#' will vary. Also, I want to keep track of the last file number scanned, so I do not receive duplicates.

The final thing I want to do is put the resulting extracted text into a .csv file.

My Unix system does not have the latest modules, so I need to script everything from scratch.


Replies are listed 'Best First'.
Re: Read files and convert extracted text to .csvc
by cdarke (Prior) on May 22, 2010 at 14:06 UTC
    In addition to the advice from toolic, here is a further hint:
    local $/ = "##################\n";
    before you read the file. That will cause the read to transfer the entire block record, rather than a line at a time. It is documented in perlvar.
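A minimal sketch of that hint, assuming the separator line is exactly the row of # characters shown in the question and that each block sits between two such lines. The sample text and the @records array are illustrative only; an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Sample data standing in for one input file (illustrative contents).
my $data = <<'END';
##################
RECORD: 4499
'Field: CSPAN'
'MSG# VA - 0001 - 02'
##################
RECORD: 4500
'Field: CSPAN'
'MSG# VB - 0002 - 03'
##################
END

my @records;
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
{
    # Read up to and including each separator line instead of each newline.
    local $/ = "##################\n";
    while ( my $block = <$fh> ) {
        chomp $block;                  # chomp strips the current value of $/
        next unless $block =~ /\S/;    # skip the empty chunk before the first separator
        push @records, $block;
    }
}
close $fh;

print scalar(@records), " records read\n";
```

Using local inside a bare block restores the default line-at-a-time behaviour as soon as the block ends, so other reads in the script are unaffected.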
Re: Read files and convert extracted text to .csvc
by toolic (Bishop) on May 22, 2010 at 13:20 UTC
    Start by reading perlintro. It shows the basics of reading files and using regular expressions. Write your own code. If you have specific questions, post the code you have tried, along with expected vs. actual output.
    My Unix system does not have the latest modules, so I need to script everything from scratch.

    You may get some ideas from "Yes, even you can use CPAN", so that you don't have to (poorly) reinvent some wheel. Even if you find it difficult to download and install modules properly, it may be possible to copy and paste from the source code (with the proper acknowledgements).

Re: Read files and convert extracted text to .csvc
by graff (Chancellor) on May 22, 2010 at 14:43 UTC
    Are you saying that those lines of "######" are actually in the text files? If so, then the previous reply about setting the special "$/" variable will help you a lot. Or, if the actual text files use some other consistent pattern as the separator between 3-line records like the one you showed, just set $/ to that other pattern (look at the perlvar manual for more information about $/).

    If the files are just sequences of 3-line records with nothing else separating the sets, you'll need something like:

    my %record;
    my ( $id, $field );
    while (<>) {
        chomp;
        if ( /^RECORD:\s+(\d+)/ ) {
            $id = $1;               # start of a new record
        }
        elsif ( /Field:\s/ ) {
            $field = $_;
        }
        elsif ( /MSG#\s/ ) {        # last line of the record: store it
            $record{$id}{field} = $field;
            $record{$id}{msg}   = $_;
        }
    }
    # now go through the %record set, output to csv...

    As for putting stuff into a csv file, Text::CSV will be a big help, and as indicated above, you can install your own copy of the module in your home directory. But maybe your unix system already has it -- try "perldoc Text::CSV" as a shell command, to see if the module is there.

    (And most unix systems that support multiple users have at least one "sysadmin" person, who will often accept reasonable requests for installing non-core perl modules from CPAN. You do know how to contact your sysadmin, right?)
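If no CSV module turns out to be available, the output step can be approximated with core Perl alone. The following is only a sketch: csv_field and csv_line are hypothetical helper names, the column headers are made up, and the quoting covers just the common cases (commas, double quotes, embedded newlines). Text::CSV handles more edge cases and is the better choice when it can be installed.

```perl
use strict;
use warnings;

# Hypothetical helper: quote one value for CSV output.
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    if ( $value =~ /[",\r\n]/ ) {
        $value =~ s/"/""/g;        # double any embedded quotes
        $value = qq{"$value"};     # wrap the whole field in quotes
    }
    return $value;
}

# Hypothetical helper: build one CSV line from a list of values.
sub csv_line {
    return join( ',', map { csv_field($_) } @_ ) . "\n";
}

# Sample data in the shape of the %record hash used in this reply
# (the values are illustrative).
my %record = (
    4499 => { field => q{'Field: CSPAN'}, msg => q{'MSG# VA - 0001 - 02'} },
);

my $csv = csv_line(qw(record field msg));    # header row (assumed column names)
for my $id ( sort { $a <=> $b } keys %record ) {
    $csv .= csv_line( $id, $record{$id}{field}, $record{$id}{msg} );
}
print $csv;
```

To write a file instead of printing, open a handle with open my $out, '>', 'records.csv' (the file name is an assumption) and print $csv to it.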

      ... if the actual text files use some other consistent pattern as the separator between 3-line records like the one you showed, just set $/ to that other pattern ... [Emphases added.]

      It's important to remember that $/ does not detect record separators based on a pattern (i.e., a regular expression) match, but on an exact string match. See discussion in perlvar.
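A small sketch of the difference, assuming (purely for illustration) that the separator lines vary in length. Setting $/ to something that looks like a regex is treated as a literal string, so it never matches; when a true pattern is needed, one option is to read the whole text and use split.

```perl
use strict;
use warnings;

# Illustrative input: separator lines of differing lengths (an assumption).
my $text = "A\n####\nB\n########\nC\n";

# $/ is compared literally, so "#+\n" would have to appear verbatim in the
# input. It does not, and the whole text comes back as a single record.
my @literal;
{
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = "#+\n";
    @literal = <$fh>;
    close $fh;
}

# split takes a real regular expression, so separator lines of any length match.
my @by_regex = grep { /\S/ } split /^#+\n/m, $text;

print scalar(@literal), " record(s) with \$/, ", scalar(@by_regex), " with split\n";
```

Slurping is fine for files that fit comfortably in memory; for very large files a line-by-line loop that tests each line against the pattern avoids holding everything at once.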