yoda54 has asked for the wisdom of the Perl Monks concerning the following question:

Greetings monks,

How does one break out of a segment of lines if it doesn't match a certain criterion?

For example, I've got a text file like this:

START

important data

important data

END

START

Nothing interesting here, skip the rest of this start and end.

STOP

START

important data - more cool stuff continue here

important data - ditto

END

As I'm pulling data in between the start and end blocks, if I find nothing I need for a particular start/end range then how can I just skip entirely and start on the next START segment?

Thanks for any help!!

while (<F>) {
    chomp;
    if (/^<START/ ... /^<\/END/) {
        if ($_ !~ m/cool data/gi) {
            # break this current START ... END block
        }
    }
}

Re: Breaking out of a matching block
by ikegami (Patriarch) on Feb 09, 2010 at 07:31 UTC
    Something like
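    my $abort = 0;
    while (<$fh>) {
        chomp;
        # Keep the range operator's return value: on the last line of a range it ends in "E0".
        if (my $in_block = ($_ eq 'START' .. $_ eq 'END')) {
            if ($abort) {
                # Stop skipping once the end of the current block is reached.
                $abort = 0 if substr($in_block, -2) eq 'E0';
                next;
            }
            ...
            if (!/cool data/gi) {
                $abort = 1;
                next;
            }
            ...
        }
    }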

    Of course, it would be much simpler, safer, clearer and more robust to use an XML/HTML parser, since you appear to be trying to parse XML/HTML.

Re: Breaking out of a matching block
by Marshall (Canon) on Feb 09, 2010 at 05:44 UTC
    First, I would suggest reading this: Flipin good, or a total flop?, an excellent node by GrandFather.

    As to what you hope to save, there isn't anything that just magically skips a bunch of input lines from the file. Each line has to be read sequentially from the input and this I/O operation is typically the resource intensive "expensive" part. The range operation has a "flip/flop" state that says whether you are within the range or not.
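    To make the "flip/flop" state concrete, here is a minimal sketch of what the range operator returns in scalar context. The list of lines is made up for illustration. Inside a range the operator returns a sequence number starting at 1, the last line of the range gets "E0" appended, and outside the range it returns the empty string.

```perl
use strict;
use warnings;

# Hypothetical input lines, just to show the operator's state.
my @lines = ('before', 'START', 'a', 'b', 'END', 'after');

my @seen;
for my $line (@lines) {
    # Scalar context turns '..' into the flip-flop operator.
    my $state = ($line eq 'START') .. ($line eq 'END');
    push @seen, $state eq '' ? 'out' : $state;
}

print join(' ', @seen), "\n";    # prints "out 1 2 3 4E0 out"
```

    The "E0" suffix is what lets code recognise the closing line of a block without testing the line's text a second time.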

    It's not clear to me how you decide what is "cool data" and what is not. Or, maybe more to the point, how do you know that all the "cool" data has been seen and no more is left in that block?

    It could be that you need a different "END" regex. Any valid regex can go there: maybe you need to "end" on either START or END tokens, or on some term that factors in "all cool data processed". That would cause the range operator to start looking for the next START token.

    Performance is unlikely to improve significantly, because the "expensive" part is the I/O; Perl regexes are very fast. In any case, instead of thinking "break", think about which ending regex would be more appropriate.
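    As a sketch of the "different ending condition" idea, the range below closes either at END or at the first line that fails the test, so the rest of an uninteresting block falls outside the range and is ignored until the next START. The sample input and the is_cool() helper are assumptions made for illustration, not part of the original question. The three-dot form is used so that the closing condition is not tested on the START line itself.

```perl
use strict;
use warnings;

# Hypothetical sample input, modelled on the example in the question.
my $input = <<'END_INPUT';
START
important data 1
important data 2
END
START
Nothing interesting here
more filler
END
START
important data 3
END
END_INPUT

# Hypothetical criterion: a line is "cool" if it mentions "important data".
sub is_cool { return $_[0] =~ /important data/i }

my @kept;
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
while (my $line = <$fh>) {
    chomp $line;
    # The range ends at END or at the first line that is not cool.
    my $seq = ($line =~ /^START$/) ... ($line =~ /^END$/ || !is_cool($line));
    next unless $seq;          # outside any range
    next if $seq == 1;         # the START line itself
    next if $seq =~ /E0$/;     # the line that closed the range
    push @kept, $line;
}
close $fh;

print "$_\n" for @kept;
```

    Every line is still read, as noted above; the only change is that the flip-flop goes back to waiting for START sooner.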

    Update: It could be that a more traditional parsing approach would work better than the regex range operator for this application.

    while (<$fh>) {
        process_block() if /^START/;
    }

    sub process_block {
        while (<$fh>) {
            return() if /^END/;
            return() if "data isn't cool anymore";   # pseudo code
            # ..process cool data..                  (pseudo code)
        }
    }
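    A runnable sketch of the nested-loop idea follows. It assumes a made-up input and a hypothetical is_cool() test, since the thread never says what "cool" means. Returning early from process_block() simply hands control back to the outer loop, which keeps reading and discarding lines until it sees the next START.

```perl
use strict;
use warnings;

# Hypothetical sample input, modelled on the example in the question.
my $input = <<'END_INPUT';
START
important data 1
important data 2
END
START
Nothing interesting here
more filler
END
START
important data 3
END
END_INPUT

# Hypothetical criterion: a line is "cool" if it mentions "important data".
sub is_cool { return $_[0] =~ /important data/i }

# Read lines of one block until END or until the data stops being cool.
sub process_block {
    my ($fh, $kept) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        return if $line =~ /^END/;
        return unless is_cool($line);   # give up on this block
        push @$kept, $line;             # "process" the cool data
    }
    return;
}

my @kept;
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
while (my $line = <$fh>) {
    process_block($fh, \@kept) if $line =~ /^START/;
}
close $fh;

print "$_\n" for @kept;
```

    One caveat with this shape: a skipped block must not contain a data line that itself starts with START, or the outer loop would treat it as a new block.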