
Thanks. I wasn't entirely clear about the input format: each file has a number of those data elements in it (i.e. the example is one file), and I need to extract them by ID number. I have altered your code to this (where $id_flag is set once we come across the appropriate ID number):

sub FindPositions {
    my $string = $_[0];
    my $id = $_[1];
    my ($firstQ, $firstS);
    my ($lastQ, $lastS);
    my $id_flag;
    my $line;

    # pipe-ize the string
    my $string_pipe = new FileHandle("echo \'$string\' |") or die;

    while (!(defined($id_flag) && defined($firstQ) && defined($firstS))) {
        $line = <$string_pipe>;
        $id_flag = 1 if ($line =~ /<a name = $id>/);
        $firstQ = $1 if ($line =~ /^Query:\s+?(\d+)[\sgcat]*(\d+)/) && do { $lastQ = $2 };
        $firstS = $1 if ($line =~ /^Sbjct:\s+?(\d+)[\sgcat]*(\d+)/) && do { $lastS = $2 };
    }
    foreach $line (<$string_pipe>) {
        $lastQ = $2 if ($line =~ /^Query:\s+?(\d+)[\sgcat]*(\d+)/);
        $lastS = $2 if ($line =~ /^Sbjct:\s+?(\d+)[\sgcat]*(\d+)/);
    }
    return ($firstQ, $firstS, $lastQ, $lastS);
}

But I'm not sure how to make it grab the appropriate ending values. As is, it grabs the last ones in the whole file rather than the last ones for the requested ID.

Thanks

Re: Re: Re: Regex perplexity
by eweaverp (Scribe) on Jun 30, 2003 at 22:30 UTC
    Never mind; I'm dumb. I just add these lines:
    # slice out the appropriate part of the string
    ($string) = $string =~ /(><a name = $id>.*?<\/pre>)/s;

    to the beginning of the subroutine and remove the $id_flag weirdness and it works great. Thanks for the example code, and the multi-loop approach. Apparently there _are_ some things a regex can't do!

    Cheers all,

    Evan
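For anyone following along, the slice-then-scan idea above can be sketched without the `echo` pipe by splitting the sliced record on newlines. This is only a sketch under assumptions: the sample data, the `find_positions` name, and the exact line layout (`Query: start letters end`) are hypothetical, guessed from the patterns quoted in the post.

```perl
use strict;
use warnings;

# Hypothetical sample input: two records in one string. The real file
# layout is an assumption based on the patterns quoted in the post.
my $data = <<'END';
><a name = 111></a> first hit
<pre>
Query: 1   gcatgcat 8
Sbjct: 101 gcatgcat 108

Query: 9   gcat 12
Sbjct: 109 gcat 112
</pre>
><a name = 222></a> second hit
<pre>
Query: 5   ttgg 8
Sbjct: 55  ttgg 58
</pre>
END

# Hypothetical helper: returns (firstQ, firstS, lastQ, lastS) for one ID.
sub find_positions {
    my ($string, $id) = @_;

    # Keep only the record for this ID, up to its closing </pre>.
    my ($record) = $string =~ /(<a name = \Q$id\E>.*?<\/pre>)/s;
    return unless defined $record;

    my ($firstQ, $firstS, $lastQ, $lastS);
    for my $line (split /\n/, $record) {
        if ($line =~ /^Query:\s+(\d+)\s+[a-z]+\s+(\d+)/) {
            $firstQ = $1 unless defined $firstQ;   # only the first start sticks
            $lastQ  = $2;                          # every end overwrites the previous one
        }
        elsif ($line =~ /^Sbjct:\s+(\d+)\s+[a-z]+\s+(\d+)/) {
            $firstS = $1 unless defined $firstS;
            $lastS  = $2;
        }
    }
    return ($firstQ, $firstS, $lastQ, $lastS);
}

print join(',', find_positions($data, 111)), "\n";
```

Because the record is cut at the first `</pre>` after the matching anchor, the "last" values can no longer leak in from later records in the same file.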
      And... that made me realize that this:
      sub FindPositions {
          my $string = $_[0];
          my $id = $_[1];

          # slice out the appropriate part of the string
          ($string) = $string =~ /(><a name = $id>.*?<\/pre>)/s;

          my @positions = $string =~ m/<a name = $id>.*?Query: (\d+).*?Sbjct: +(\d+).*?<\/pre>/s;
          push (@positions, $string =~ m/<a name = $id>.*Query: \d+\s+[a-z]+ ( +\d+)\n.*\nSbjct: \d+\s+[a-z]+ (\d+)\n<\/pre>/s);
          return @positions;
      }
      works too. First and last. Hmm. Anyway...
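A possible variation on the same first-and-last idea: once the record has been sliced down to one ID, a single `/g` match in list context returns every captured start/end pair, so the first start is element 0 and the last end is element -1. This is a sketch rather than a drop-in replacement; the sample record and the `first_and_last` name are hypothetical, and the line layout is assumed from the patterns above.

```perl
use strict;
use warnings;

# Hypothetical record, already sliced down to a single ID.
my $record = <<'END';
<a name = 111></a> first hit
<pre>
Query: 1   gcatgcat 8
Sbjct: 101 gcatgcat 108

Query: 9   gcat 12
Sbjct: 109 gcat 112
</pre>
END

# Hypothetical helper: first start and last end for one label
# ("Query" or "Sbjct") within a single record.
sub first_and_last {
    my ($text, $label) = @_;

    # In list context, /g returns all captures flattened:
    # (start1, end1, start2, end2, ...). /m lets ^ match at each line start.
    my @nums = $text =~ /^\Q$label\E:\s+(\d+)\s+[a-z]+\s+(\d+)/mg;
    return unless @nums;
    return ($nums[0], $nums[-1]);
}

my ($firstQ, $lastQ) = first_and_last($record, 'Query');
my ($firstS, $lastS) = first_and_last($record, 'Sbjct');
print "$firstQ $firstS $lastQ $lastS\n";
```

This avoids having to anchor the second pattern to the line just before `</pre>`, at the cost of one extra pass per label.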