Ronnie has asked for the wisdom of the Perl Monks concerning the following question:

Well Brethren, I've got a small question to ask. I've written a Perl script which serially reads a file, extracts all the error records and outputs them to another file. No problem there! Except... the input file was created elsewhere using micro-cobol (at a guess) and unfortunately there is some sloppy programming therein!

The problem with this file is that when the last record is written to a page, it is concatenated with the header details for the next page. In between these two records is a hard-coded page throw (^L). If the last record on a page is an error record, Houston, we have a problem! We end up with the error record and a header record in the output error file.

Does anyone have any suggestions on how I can circumvent this? I have recommended that the suppliers change their shoddy code but it's unlikely that they will co-operate. Help!

Cheers in anticipation, Ronnie Cruickshank (No vice!)

Replies are listed 'Best First'.
Re: Unprintable Characters
by BrowserUk (Patriarch) on Jun 23, 2004 at 15:44 UTC

    Would running the file through a simple filter to convert the "\cL"s to "\n" not solve the problem?

    perl -ple" s[\cL][\n]g" < file > modified

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
      Thanks for that. The c in "\cL" was the bit (no pun intended) that I didn't know about! Once you gave me that information all of my previous attempts to resolve this now work. Cheers, Ronnie
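If a separate filtering pass is not wanted, the same idea can be applied inside the extraction script itself: split each input line at the form feed and test every piece on its own. The following is only a rough sketch. The sub names are made up for illustration, and the /^ERR/ test is a stand-in, since the thread never says how an error record is recognised.

```perl
use strict;
use warnings;

# Split one input line at any form feed (^L, written "\cL") so that a
# record fused with the next page's header comes back as separate pieces.
sub split_on_form_feed {
    my ($line) = @_;
    chomp $line;
    return grep { length } split /\cL/, $line;
}

# ASSUMPTION: the real rule for spotting an error record is not given
# in the thread; /^ERR/ is only a placeholder.
sub is_error_record {
    my ($record) = @_;
    return $record =~ /^ERR/ ? 1 : 0;
}

# Return only the error records found in a list of raw input lines.
sub extract_errors {
    my @errors;
    for my $line (@_) {
        push @errors, grep { is_error_record($_) } split_on_form_feed($line);
    }
    return @errors;
}

# Made-up sample data: the second line has a header glued on after ^L.
my @sample = (
    "OK  record 1\n",
    "ERR record 2\cLPAGE 2 HEADER\n",
    "ERR record 3\n",
);
print "$_\n" for extract_errors(@sample);
```

Because the header is tested separately from the record it was glued to, it only reaches the error file if it happens to match the error test itself.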
Re: Unprintable Characters
by graff (Chancellor) on Jun 24, 2004 at 02:51 UTC
    BrowserUk's idea would probably suffice. Another way would be to set the input record separator variable ($/) to "\x0C" (the form-feed character, ^L), which would allow you to read one "page" at a time into $_, instead of just one text line at a time. Then you could split the page record into lines, if that's what you need:
    {
        local $/ = "\x0C";            # form-feed character (^L)
        while (<>) {                  # $_ contains one whole "page"
            @lines = split /\n/;      # split into lines if you need to
            ...
        }
    }
    # $/ is now back to its "normal" setting
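For anyone wanting to try the page-at-a-time approach, here is a slightly fuller sketch under a few assumptions: the sub name read_pages and the sample data are invented, and the input is an in-memory string rather than the real file. One detail worth noting is that with $/ set to the form feed, chomp strips the form feed from the end of each page rather than a newline.

```perl
use strict;
use warnings;

# Read a filehandle one page at a time and return a list of pages,
# each an array reference holding that page's lines.
sub read_pages {
    my ($fh) = @_;
    local $/ = "\x0C";            # form feed (^L) now ends each record
    my @pages;
    while (my $page = <$fh>) {
        chomp $page;              # removes the trailing $/, i.e. the form feed
        push @pages, [ split /\n/, $page ];
    }
    return @pages;                # $/ is restored when the sub returns
}

# Demo on an in-memory file; the contents are made up for illustration.
my $data = "HEADER 1\nrec a\nrec b\x0CHEADER 2\nrec c\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @pages = read_pages($fh);
close $fh;

for my $i (0 .. $#pages) {
    printf "page %d: first line = %s, last line = %s\n",
        $i + 1, $pages[$i][0], $pages[$i][-1];
}
```

Since the form feed sits between the last record of one page and the header of the next, the last record ends up as the final line of its page and the header as the first line of the following page, so the two are never joined.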