in reply to Search all occurences of text delimited by START and END in a string

Hi natol44,

You can process your file simply line-by-line.

#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;

my $start = '<START>';
my $end   = '<END>';

foreach (<DATA>) {
    chomp;
    print "$1\n" if /$start(.+)$end/;
}

__DATA__
<START>TEXT1<END>
<various data>
<START>TEXT2<END>
<various data>
<START>TEXT3<END>

Or you may use HTML::Parser if your HTML files are not well formatted.

Replies are listed 'Best First'.
Re^2: Search all occurences of text delimited by START and END in a string
by kennethk (Abbot) on May 12, 2015 at 15:04 UTC
    While your answer is accurate to within the posted spec, for good form it's probably better to make it newline-tolerant and to encourage people who interpolate variables into regexes to escape metacharacters.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use diagnostics;

    my $start = '<START>';
    my $end   = '<END>';

    my $data = do {
        local $/;
        <DATA>;
    };

    while ($data =~ /\Q$start\E(.+?)\Q$end\E/sg) {
        print "$1\n";
    }

    __DATA__
    <START>TEXT1<END>
    <various data>
    <START>TEXT2<END>
    <various data>
    <START>TEXT3<END>

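To illustrate why the escaping matters, here is a minimal sketch that wraps the same regex in a helper. The sub name extract_between and the bracketed delimiters are made up for this example and do not come from the original question; without \Q...\E, '[START]' would be read as a character class rather than a literal string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return every chunk of text
# found between $start and $end, treating both delimiters literally.
sub extract_between {
    my ($text, $start, $end) = @_;
    # \Q...\E quotes regex metacharacters in the interpolated delimiters,
    # /s lets . match newlines, and /g in list context returns all captures.
    return $text =~ /\Q$start\E(.+?)\Q$end\E/sg;
}

# Assumed sample input: delimiters containing metacharacters,
# and one payload that spans a newline.
my $text  = "[START]a.b\nc[END] junk [START]x[END]";
my @found = extract_between($text, '[START]', '[END]');

print scalar(@found), " matches\n";
```

Called in list context, the helper returns the captured payloads in the order they appear in the string.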
    If they are concerned about holding the whole file in memory, there is a convenient choice for record separator:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use diagnostics;

    my $start = '<START>';
    my $end   = '<END>';

    local $/ = $end;
    while (<DATA>) {
        while (/\Q$start\E(.+?)\Q$end\E/sg) {
            print "$1\n";
        }
    }

    __DATA__
    <START>TEXT1<END>
    <various data>
    <START>TEXT2<END>
    <various data>
    <START>TEXT3<END>

    I've kept the regex as is: the last record (whatever follows the final <END>) will not be <END>-delimited, so a regex that did not require <END> would go wrong on an unmatched <START> there.
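As a rough sketch of that last point, the snippet below reads from an in-memory filehandle whose final record holds a dangling <START>. The sub name extract_streaming and the sample input are assumptions made for illustration, not part of the original post. Because each record contains at most one end delimiter, a single match per record is enough here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read $fh one $end-terminated
# record at a time and collect the text between $start and $end.
sub extract_streaming {
    my ($fh, $start, $end) = @_;
    my @found;
    local $/ = $end;    # each read stops right after the next $end
    while (my $record = <$fh>) {
        # Still require $end, so a trailing record without it is skipped.
        push @found, $1 if $record =~ /\Q$start\E(.+?)\Q$end\E/s;
    }
    return @found;
}

# Assumed sample input: two complete pairs, then an unmatched <START>.
my $input = "<START>TEXT1<END>\n<various data>\n<START>TEXT2<END>\n<START>dangling";
open my $fh, '<', \$input or die "Cannot open in-memory handle: $!";
my @found = extract_streaming($fh, '<START>', '<END>');
close $fh;

print "$_\n" for @found;
```

Only one record is held in memory at a time, and the dangling <START> in the final record produces no output.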

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.