http://qs1969.pair.com?node_id=495544

gasho has asked for the wisdom of the Perl Monks concerning the following question:

I have a file that is one long line. The pattern looks like:
somecharSTARTfvENDsomecharSTARTsvENDsomecharSTARTtvENDsomechar
I would like to extract all values between START and END, so my @Result should be @Result = qw(fv sv tv). My sub is extracting only the first and last match :(
sub getInfoFromLongLine {
    # Opening file for reading
    open(IFH,"$InputFile") || die "Can't open file: $InputFile\n";
    while($line=<IFH>) {
        if ($line=~/.*START(.*?)END/) {
            push(@Result,$1);
        }
        if ($line=~/.*?START(.*?)END/) {
            push(@Result,$1);
        }
    }
    return @Result;
}
Thanks in advance Gasho

Replies are listed 'Best First'.
Re: How to deal with long line
by philcrow (Priest) on Sep 27, 2005 at 21:04 UTC
    This is an ideal case for the extract_tagged function of Text::Balanced. From the perldoc, I think you should say something like:
    my $remainder = $original_string;
    my @answers;
    while ( defined $remainder ) {
        my $extracted;
        ( $extracted, $remainder ) = extract_tagged(
            $remainder, 'START', 'END', undef, { bad => ['START'] },
        );
        push @answers, $extracted;
    }

    Phil

Re: How to deal with long line
by GrandFather (Saint) on Sep 27, 2005 at 20:50 UTC

    If the file is of modest size, then read everything into a string and:

    my $str = <DATA>;
    my @strs = $str =~ /START(.*?)END/g;
    print join "\n", @strs;
    __DATA__
    somecharSTARTfvENDsomecharSTARTsvENDsomecharSTARTtvENDsomechar
    Update: Changed to read from a "file".
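
    As a small sketch of why the two patterns in the question return only the first and last values while the /g match returns all of them (the variable names below are illustrative, not from the original post):

```perl
use strict;
use warnings;

my $line = 'somecharSTARTfvENDsomecharSTARTsvENDsomecharSTARTtvENDsomechar';

# Greedy .* grabs the whole line and backtracks, so it settles on the LAST START.
my ($last) = $line =~ /.*START(.*?)END/;

# Non-greedy .*? stops as early as possible, so it settles on the FIRST START.
my ($first) = $line =~ /.*?START(.*?)END/;

# With /g in list context, the match returns the capture from every match.
my @all = $line =~ /START(.*?)END/g;

print "first=$first last=$last all=@all\n";   # first=fv last=tv all=fv sv tv
```

    Neither of the original patterns is repeated along the line, which is why the middle value never shows up.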

    Monarch has suggested to me that this is likely to get pretty slow with HUGE files, and I agree. For very large files I'd suggest using this technique.
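
    The linked technique is not reproduced here. One possible approach for very large single-line files, offered as an assumption rather than as the method referred to above, is to read fixed-size blocks and carry any unfinished tail over to the next block. The helper name extract_between and the block size are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: pull every START...END value out of a filehandle,
# reading $chunk_size characters at a time so the whole line is never in memory.
sub extract_between {
    my ($fh, $chunk_size) = @_;
    my $buffer = '';
    my @found;

    while (read($fh, my $chunk, $chunk_size)) {
        $buffer .= $chunk;

        # Remove each complete START...END pair from the front of the buffer.
        while ($buffer =~ s/\A.*?START(.*?)END//s) {
            push @found, $1;
        }

        if ($buffer =~ /START/) {
            # An unfinished value: keep everything from START onwards.
            $buffer =~ s/\A.*?(?=START)//s;
        }
        elsif (length($buffer) > 4) {
            # No START yet: keep only the last 4 characters, in case the
            # block boundary split the word START itself.
            substr($buffer, 0, length($buffer) - 4, '');
        }
    }
    return @found;
}

# Demo on an in-memory filehandle with a deliberately tiny block size.
my $data = 'somecharSTARTfvENDsomecharSTARTsvENDsomecharSTARTtvENDsomechar';
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!\n";
my @values = extract_between($fh, 5);
close $fh;

print "@values\n";
```

    Memory use stays bounded as long as each START is eventually followed by an END; a START with no matching END would make the buffer grow until the end of the file.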


    Perl is Huffman encoded by design.
      Thanks a lot, it worked fine. I updated it to read from a file:
      sub getInfoFromLongLine {
          # Opening file for reading
          open(IFH,"$InputFile") || die "Can't open file: $InputFile\n";
          my $str = <IFH>;
          my @wanted_substrings = $str =~ /<A>(.*?)<\/A>/g;
          return "@wanted_substrings";
      }
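
      Note that returning "@wanted_substrings" gives back a single space-joined string, not the list the question asked for. A minimal sketch that returns the list instead, assuming the START/END markers from the question and using a lexical filehandle with three-argument open (the name get_info_from_long_line and the temporary-file demo are illustrative):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical variant of the sub above; the file name is passed as an argument.
sub get_info_from_long_line {
    my ($input_file) = @_;
    open(my $fh, '<', $input_file)
        or die "Can't open file: $input_file: $!\n";
    my $str = <$fh>;                 # the file is a single long line
    close $fh;
    return () unless defined $str;   # empty file: nothing to extract
    my @wanted_substrings = $str =~ /START(.*?)END/g;
    return @wanted_substrings;       # a list, not a joined string
}

# Demo: write the sample line from the question to a temporary file.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} 'somecharSTARTfvENDsomecharSTARTsvENDsomecharSTARTtvENDsomechar';
close $tmp_fh;

my @result = get_info_from_long_line($tmp_name);
print join("\n", @result), "\n";
```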
Re: How to deal with long line
by chester (Hermit) on Sep 27, 2005 at 20:47 UTC
    Here's one way that reads in the string a little bit at a time, which is good if your data is very large. Mostly untested, but works for your data:

    use warnings;
    use strict;

    my @wanted_substrings;
    {
        local $/ = 'END';
        while(my $string = <DATA>) {
            push @wanted_substrings, ($string =~ /START(.*)END/);
        }
    }
    print "@wanted_substrings";
    __DATA__
    somecharSTARTfvENDsomecharSTARTsvENDsomecharSTARTtvENDsomechar