in reply to regex with variable input
Collapse the three lines of declaration into one line:

    my ( $R_score, @protein, @R_score_protein );

Arrays are initialized as empty by default, so you don't need to set them to (). $R_score is a scalar, not an array, so by default it is undef, which acts like the empty string if you try to concatenate to it; but you should initialize it to "" if it might be read before it is assigned.

In any case, you don't need $R_score, since you only use it to transfer a value from $1 into @R_score_protein. Delete the variable and push $1 straight onto the array.
Does 'strict' not complain about that unquoted string? In any case, get that URL out of the code and into a configuration variable at the beginning of the file. Better yet, let the user override the URL at run time:

    use Getopt::Long;

    our $URL = "http://www.bork.embl-heidelberg.de/g2d/"
             . "list_hits_disease.pl?U57042:Inflammatory_bowel_disease_7";

    GetOptions( 'url=s', \$URL )
        || die( "Error reading command line args" );

    my $browser = LWP::UserAgent->new();
    my $resp    = $browser->get( $URL );
If you can process proteins and R scores separately:

    while ( $content_all =~ m{R\-score<\/A>\s=\s(\d\.\d+)\;}g ) {
        push( @R_score_protein, $1 );
    }

    while ( $content_all =~ m{\[(NP_\d+)\]}g ) {
        push @protein, $1;
    }

Better yet, you could move the regular expressions into 'local' variables or 'configuration' variables:

    my $protein_re = qr|\[(NP_\d+)\]|;
    my $score_re   = qr|R-score</A>\s=\s(\d\.\d+)|;

    while ( $content_all =~ m{$protein_re}g ) .....
    while ( $content_all =~ m{$score_re}g ) ....

Tidier, more readable, more maintainable. Keep the /g flag on the match itself: without it, each while loop would match the first occurrence over and over and never finish.
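As a quick check of the precompiled patterns, here is a minimal sketch that runs them against a made-up string. The sample text is hypothetical, invented only to exercise the two regexes; the real page content may be laid out differently.

```perl
use strict;
use warnings;

# Patterns as given in this post
my $protein_re = qr|\[(NP_\d+)\]|;
my $score_re   = qr|R-score</A>\s=\s(\d\.\d+)|;

# Hypothetical sample text, invented for illustration only
my $sample = '[NP_000001] [NP_000002] R-score</A> = 0.731;';

# In list context, m//g returns every capture in one go
my @proteins = $sample =~ /$protein_re/g;

# A single match in list context returns the captured group
my ($score) = $sample =~ /$score_re/;

print "proteins: @proteins\n";
print "score: $score\n";
```

The list-context form is handy when you only need all the captures at once; the while loop with /g is the better fit when each match needs its own processing.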
I suspect you want to associate the protein codes and the R scores, which the version above doesn't provide. You need to break the input into distinct lines, which I'll arrange to have happen magically in the break_into_lines() routine:

    # $record stores references to hashes,
    # which are accumulated in @records
    #
    my ( $record, @records );

    for my $line ( break_into_lines( $resp->content() ) ) {

        if ( $line =~ /CANDIDATE/ ) {
            #
            # Save previous record, if there is one
            #
            if ( defined $record ) {
                push( @records, $record );
            }
            $record = {};    # assign a new anon. hash.
        }

        # In the hash referenced by $record, use the 'protein' key
        # to access an array into which the protein code(s) are added.
        #
        while ( $line =~ /$protein_re/g ) {
            push( @{ $record->{'protein'} }, $1 );
        }

        if ( my ( $score ) = ( $line =~ /$score_re/ ) ) {
            $record->{'score'} = $score;
        }
    }

    # Save the final record, which no later CANDIDATE line flushes.
    push( @records, $record ) if defined $record;

At this point, @records contains a bunch of records, which can be printed neatly, or processed ...

    for my $record ( @records ) {
        my @proteins = @{ $record->{'protein'} || [] };
        print "Proteins: ", join( ", ", @proteins ), "\n" if @proteins;
        print "Score: ", $record->{'score'}, "\n" if defined $record->{'score'};
        print "-" x 72, "\n";
    }
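The break_into_lines() routine is left undefined above. Here is a minimal sketch of what it might look like, assuming that the records in the page are separated by newlines or <BR> tags. That separator is an assumption on my part; check the actual HTML and adjust the split pattern to match.

```perl
use strict;
use warnings;

# Hypothetical helper: this post calls break_into_lines() but never defines it.
# Assumption: pieces are separated by newlines and/or <BR> tags.
sub break_into_lines {
    my ($content) = @_;
    return () unless defined $content;

    # Split on runs of newlines or <BR> tags (case-insensitive),
    # then drop any pieces that contain only whitespace.
    return grep { /\S/ } split /(?:\r?\n|<br\s*\/?>)+/i, $content;
}

# Made-up input, for illustration only
my @lines = break_into_lines(
    "CANDIDATE 1<BR>[NP_000001]\nR-score</A> = 0.5;\n\n"
);
print scalar(@lines), " lines\n";
```

Keeping the splitting in its own routine means the record-building loop doesn't change if the page layout does; only the split pattern needs updating.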
--
TTTATCGGTCGTTATATAGATGTTTGCA
|
|---|