Re: Regex perplexity

Needs a little more than a regex, I think. Here's my take on it:

#!/usr/bin/perl -w
 
use strict;
my ($firstQ, $firstS);
my ($lastQ, $lastS);
 
while (!(defined($firstQ) && defined($firstS)))
{
        defined($_ = <>) or last;   # stop at end of input instead of looping forever
        $firstQ=$1 if(/^Query:\s+?(\d+)[\sgcat]*(\d+)/) && do{$lastQ=$2};
        $firstS=$1 if(/^Sbjct:\s+?(\d+)[\sgcat]*(\d+)/) && do{$lastS=$2};
}
while(<>)
{
        $lastQ=$2 if (/^Query:\s+?(\d+)[\sgcat]*(\d+)/);
        $lastS=$2 if (/^Sbjct:\s+?(\d+)[\sgcat]*(\d+)/);
}
print "First:\t$firstQ\t$firstS\n";
print "Last:\t$lastQ\t$lastS\n";


## Output of large dataset:
#############
First:  11      1
Last:   370     360

## Output of small dataset:
#############
First:  249     64265
Last:   271     64243

The code works whether the input holds a single data item or multiple data items spread across several lines. You could make the regexes in the second while loop more efficient by dropping the first pair of capturing parentheses, since only $2 is used there. I kept them identical to the ones in the first while loop so you can see the symmetry and what I was doing with the regex.
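If the duplicated regexes bother you, the same idea also fits in one loop. What follows is only a sketch of that alternative, not the script above: first_and_last is a made-up name, the sample lines are invented for illustration, and it assumes the same lowercase gcat alignment lines as the regexes above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): takes the alignment
# lines and returns (firstQ, firstS, lastQ, lastS) after a single pass.
sub first_and_last {
    my @lines = @_;
    my ($firstQ, $firstS, $lastQ, $lastS);
    for my $line (@lines) {
        if ($line =~ /^Query:\s+(\d+)[\sgcat]*(\d+)/) {
            $firstQ = $1 unless defined $firstQ;   # keep only the first start
            $lastQ  = $2;                          # overwrite until the last end
        }
        elsif ($line =~ /^Sbjct:\s+(\d+)[\sgcat]*(\d+)/) {
            $firstS = $1 unless defined $firstS;
            $lastS  = $2;
        }
    }
    return ($firstQ, $firstS, $lastQ, $lastS);
}

# Invented sample data, only to show the call.
my @sample = (
    "Query: 11  gcatgcatgc 20\n",
    "Sbjct: 1   gcatgcatgc 10\n",
    "\n",
    "Query: 21  ttagcatcga 30\n",
    "Sbjct: 11  ttagcatcga 20\n",
);
print join("\t", first_and_last(@sample)), "\n";
```

The trade-off is an extra defined() check on every matching line, in exchange for having each regex written only once.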

HTH


Re: Re: Regex perplexity by eweaverp (Scribe) on Jun 30, 2003 at 22:24 UTC
Thanks. I wasn't entirely clear about the input format: each file contains a number of those data elements (i.e. the example is one file), which I need to extract by ID number. I have altered your code to this (where $id_flag is set once we come across the appropriate ID number):

sub FindPositions {
        my $string = $_[0];
        my $id = $_[1];
        my ($firstQ, $firstS);
        my ($lastQ, $lastS);
        my $id_flag;
        my $line;

        # pipe-ize the string
        my $string_pipe = new FileHandle("echo \'$string\' \|") or die;

        while (!(defined($id_flag) && defined($firstQ) && defined($firstS))) {
                $line = <$string_pipe>;
                $id_flag = 1 if ($line =~ /<a name = $id>/);
                $firstQ = $1 if ($line =~ /^Query:\s+?(\d+)[\sgcat]*(\d+)/) && do{$lastQ=$2};
                $firstS = $1 if ($line =~ /^Sbjct:\s+?(\d+)[\sgcat]*(\d+)/) && do{$lastS=$2};
        }
        foreach $line (<$string_pipe>) {
                $lastQ = $2 if ($line =~ /^Query:\s+?(\d+)[\sgcat]*(\d+)/);
                $lastS = $2 if ($line =~ /^Sbjct:\s+?(\d+)[\sgcat]*(\d+)/);
        }
        return ($firstQ, $firstS, $lastQ, $lastS);
}

But I'm not sure how to make it grab the appropriate ending values. As is, it grabs the last ones in the file. Thanks
Re: Re: Re: Regex perplexity by eweaverp (Scribe) on Jun 30, 2003 at 22:30 UTC
Never mind; I'm dumb. I just add this line:

# slice out the appropriate part of the string
($string) = $string =~ /(><a name = $id>.*?<\/pre>)/s;

to the beginning of the subroutine and remove the $id_flag weirdness, and it works great. Thanks for the example code, and the multi-loop approach. Apparently there _are_ some things a regex can't do! Cheers all, Evan
Re: Re: Re: Re: Regex perplexity by eweaverp (Scribe) on Jun 30, 2003 at 22:50 UTC
And... that made me realize that this:

sub FindPositions {
        my $string = $_[0];
        my $id = $_[1];

        # slice out the appropriate part of the string
        ($string) = $string =~ /(><a name = $id>.*?<\/pre>)/s;

        my @positions = $string =~ m/<a name = $id>.*?Query: (\d+).*?Sbjct: +(\d+).*?<\/pre>/s;
        push (@positions, $string =~ m/<a name = $id>.*Query: \d+\s+[a-z]+ (\d+)\n.*\nSbjct: \d+\s+[a-z]+ (\d+)\n<\/pre>/s);
        return @positions;
}

works too. First and last. Hmm. Anyway...
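For anyone landing here later, the slice-then-scan idea from this sub-thread can be sketched roughly as below. This is a hedged illustration, not the code posted above: positions_for_id is a made-up name, and the sample report is invented. It assumes each record runs from `<a name = ID>` to the next `</pre>`, which is inferred from the regexes in these replies and not from the real data files. The \Q...\E around $id is an addition so that any regex metacharacters in an ID are matched literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (firstQ, firstS, lastQ, lastS) for one record,
# or an empty list when no record with that ID is found.
sub positions_for_id {
    my ($text, $id) = @_;

    # Slice out just this record; the non-greedy .*? stops at the first </pre>.
    my ($record) = $text =~ /(<a name = \Q$id\E>.*?<\/pre>)/s
        or return;

    my ($firstQ, $firstS, $lastQ, $lastS);
    for my $line (split /\n/, $record) {
        if ($line =~ /^Query:\s+(\d+)[\sgcat]*(\d+)/) {
            $firstQ = $1 unless defined $firstQ;
            $lastQ  = $2;
        }
        elsif ($line =~ /^Sbjct:\s+(\d+)[\sgcat]*(\d+)/) {
            $firstS = $1 unless defined $firstS;
            $lastS  = $2;
        }
    }
    return ($firstQ, $firstS, $lastQ, $lastS);
}

# Invented two-record report, only to show the call.
my $report = <<'END';
<a name = 101>
Query: 11 gcatgcat 18
Sbjct: 1 gcatgcat 8
</pre>
<a name = 202>
Query: 249 gcatgcat 256
Sbjct: 64265 gcatgcat 64258
Query: 257 ttagcatc 264
Sbjct: 64257 ttagcatc 64250
</pre>
END

print join("\t", positions_for_id($report, 202)), "\n";
```

Because the record is cut out first, the "last" values can no longer run on into later records, which was the problem with the piped version.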