I came across a situation similar to this while working on a project. It is not a large project, but the program involved will be kicked off every 5 minutes. To my dismay, the "integrity" of the input files is not of the highest standard, but then they do originate from a governmental source.

The program opens several files and is to grab the text between header/footer tags (they're not HTML), including the header and footer themselves. I thought that using the range operator would be perfect, but that's when I got hit by the "integrity" factor. Apparently it is not uncommon for a footer to be left off when it is the last section in a file. So much for elegance.

My question is, is there a clever/elegant solution to this problem (i.e., slight modification to the range statement) or will it require a more brute force approach? I'm not asking that anyone rewrite the entire program to make it work, I can do that myself. It seems to me that this must be a common problem and that the use of the range operator as I've done is too fragile for any but the most controlled circumstances.

file2.txt:

    FILE2
    AAA
    text1
    text2
    text3
    EOAAA
    BBB
    text4
    EOBBB
    CCC
    text5
    text6
    text7
    text8
    EOCCC

file1.txt:

    FILE1
    SEGMENT1
    text1
    text2
    text3
    EOS1
    SEGMENT2
    text4
    EOS2
    SEGMENT3
    text5
    text6
    text7
    text8

file3.txt:

    FILE3
    P201
    text1
    text2
    text3
    EOP201
    P333
    text4
    EOP333
    P588
    text5
    text6
    text7
    text8
    EOP588

Program:

    use strict;

    my @jobs = (
        'file2.txt|AAA|EOAAA',
        'file1.txt|SEGMENT3|EOS3',
        'file3.txt|P333|EOP333'
    );

    for (@jobs) {
        my ($file, $beg, $end) = split /\|/;
        my $first_line = 1;
        my @lines = ();
        print "Opening file: '$file'\n",
              " beg: '$beg' end: '$end'\n";
        open (INFILE, "<$file") or die "Could not open '$file': $!\n";
        while (<INFILE>) {
            if (/$beg/ .. /$end/) {
                chomp;
                print " first: '$_'\n\n" if $first_line;
                $first_line = 0;
                push (@lines, $_);
            }
        }
        print "$_\n" for @lines;
        print "-" x 30, "\n\n";
        undef @lines;
    }

Output:

    Opening file: 'file2.txt'
     beg: 'AAA' end: 'EOAAA'
     first: 'AAA'

    AAA
    text1
    text2
    text3
    EOAAA
    ------------------------------

    Opening file: 'file1.txt'
     beg: 'SEGMENT3' end: 'EOS3'
     first: 'SEGMENT3'

    SEGMENT3
    text5
    text6
    text7
    text8
    ------------------------------

    Opening file: 'file3.txt'
     beg: 'P333' end: 'EOP333'
     first: 'FILE3'

    FILE3
    P201
    text1
    text2
    text3
    EOP201
    P333
    text4
    EOP333
    ------------------------------

I tried what seemed like reasonable approaches, but none of them worked, such as the following and variations thereof:

    if ((/$beg/ .. /$end/) or (/$beg/ .. eof())) {

Thanks much,

--Jim
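
One possible direction, offered as a sketch rather than a definitive answer. The file3.txt output above starts at FILE3, which suggests the range was still "on" from file1.txt: each `..` operator keeps its own true/false state, and that state is not reset when the next file is opened. Note also that `eof()` with empty parentheses refers to the files named on the command line (the `<>` pseudo-file), not to INFILE. The sketch below makes end-of-file on the current handle an alternative end condition, so the range always closes before the next file is read. The helper name `extract_section` is hypothetical, in-memory filehandles stand in for the real files, and the anchored patterns assume each tag sits on a line by itself.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original program): returns the lines
# from the header tag through the footer tag, or through the last line of
# the file when the footer is missing.
sub extract_section {
    my ($fh, $beg, $end) = @_;
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        # Assumption: tags occupy a whole line, so the patterns are anchored
        # and quoted; this keeps 'AAA' from also matching 'EOAAA'.
        # eof($fh) is true once the last line has been read, which closes
        # the range and resets its state before the next file.
        if ($line =~ /^\Q$beg\E$/ .. ($line =~ /^\Q$end\E$/ or eof($fh))) {
            push @lines, $line;
        }
    }
    return @lines;
}

# In-memory stand-ins for two of the sample files shown above.
my %files = (
    'file1.txt' => [qw(FILE1 SEGMENT1 text1 text2 text3 EOS1 SEGMENT2 text4
                       EOS2 SEGMENT3 text5 text6 text7 text8)],
    'file3.txt' => [qw(FILE3 P201 text1 text2 text3 EOP201 P333 text4 EOP333
                       P588 text5 text6 text7 text8 EOP588)],
);

for my $job ('file1.txt|SEGMENT3|EOS3', 'file3.txt|P333|EOP333') {
    my ($file, $beg, $end) = split /\|/, $job;
    my $text = join '', map { "$_\n" } @{ $files{$file} };
    open my $fh, '<', \$text or die "Could not open in-memory '$file': $!\n";
    my @lines = extract_section($fh, $beg, $end);
    close $fh;
    print "$file: @lines\n";
}
# prints:
# file1.txt: SEGMENT3 text5 text6 text7 text8
# file3.txt: P333 text4 EOP333
```

With real files, the same condition should work against a lexical handle opened with the three-argument `open`; the only change to the original test is the `or eof($fh)` on the right-hand side of the range.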


In reply to Between-text range operator problem by jlongino
