PerlMonks  

getting a sequence of numbers and letters

by shabird (Sexton)
on Mar 23, 2020 at 17:36 UTC

shabird has asked for the wisdom of the Perl Monks concerning the following question:

I have a file that contains the following line:

BOGUS PPI3_SYNY3 276 aa linear BCT 13-NOV-2019

I want to extract "BOGUS PPI3_SYNY3" and "276 aa" using regular expressions. My code is as follows:

$content =~ /BOGUS\s+([A-Z0-9_]+)/;
$bogus = $1;
$noOfAcids =~ /\s+([0-9\sa-z]$)/;
$acids = $1;
print("bogus: $bogus\nNumber of acids:$acids\n");

But unfortunately, both regexes give the same output:

OUTPUT

bogus: PPI3_SYNY3

Number of amino acids:PPI3_SYNY3

Re: getting a sequence of numbers and letters
by choroba (Cardinal) on Mar 23, 2020 at 17:48 UTC
    It should be doable using a single regex:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $line = "BOGUS PPI3_SYNY3 276 aa linear BCT 13-NOV-2019\n";

    # Capture both fields in list context; the if only runs when the match succeeds.
    if (my ($bogus, $acids) = $line =~ /BOGUS\s+([A-Z0-9_]+) ([0-9]+ [a-z]+)/) {
        print "bogus: $bogus\nNumber of acids: $acids\n";
    }

    Your code is unfortunately incomplete. How did you populate $noOfAcids? Also, the $ in the second regex can't match after "aa", because the string doesn't end there. Note that $1 et al. are only changed after a successful match, so if the second regex doesn't match, $1 keeps the value from the previous match.
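    A minimal sketch of the point about $1, assuming the line from the question as input. `stale_capture` is a hypothetical helper name used only for illustration: it runs the question's two patterns in order, and because the second one fails, $1 still holds the capture from the first.

```perl
use strict;
use warnings;

# Hypothetical helper: runs the question's two patterns against one line and
# reports whether the second match succeeded and what $1 holds afterwards.
sub stale_capture {
    my ($line) = @_;
    $line =~ /BOGUS\s+([A-Z0-9_]+)/;                # succeeds: $1 is 'PPI3_SYNY3'
    my $second_ok = $line =~ /\s+([0-9\sa-z]$)/;    # fails: the line does not end after "aa"
    return ( $second_ok ? 1 : 0, $1 );              # $1 was not reset by the failed match
}

my $line = "BOGUS PPI3_SYNY3 276 aa linear BCT 13-NOV-2019";
my ( $second_ok, $capture ) = stale_capture($line);
print "second match succeeded: $second_ok\n";    # prints 0
print "\$1 after it: $capture\n";                # prints PPI3_SYNY3

# Safer: only read $1 inside a check of the match result.
if ( $line =~ /\s+([0-9]+\s+[a-z]+)\s/ ) {
    print "Number of acids: $1\n";               # prints 276 aa
}
```

    Testing the match result (or capturing in list context, as in the code above) means a failed match can never hand you a leftover $1.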

Re: getting a sequence of numbers and letters
by johngg (Canon) on Mar 23, 2020 at 23:17 UTC

    It might be easier to split the line on white space and slice out the relevant items.

    johngg@shiraz:~/perl/Monks$ perl -Mstrict -Mwarnings -E '
    open my $inFH, q{<}, \ <<__EOD__ or die $!;
    BOGUS PPI3_SYNY3 276 aa linear BCT 13-NOV-2019
    __EOD__
    chomp( my $line = <$inFH> );
    close $inFH or die $!;
    my( $bogus, $acids ) = ( split m{\s+}, $line )[ 1, 2 ];
    say qq{Bogus : $bogus\nNumber of amino acids : $acids};'
    Bogus : PPI3_SYNY3
    Number of amino acids : 276
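    The one-liner above prints only 276, because slice index 2 is the count on its own. If the unit is wanted too, as in the question, one option is to slice index 3 as well and join the two. A sketch, assuming the field positions are fixed on every line; `bogus_and_acids` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: splits on whitespace like the one-liner, but slices
# three fields and joins the count with its unit to give "276 aa".
sub bogus_and_acids {
    my ($line) = @_;
    chomp $line;
    my ( $bogus, $count, $unit ) = ( split m{\s+}, $line )[ 1, 2, 3 ];
    return ( $bogus, "$count $unit" );
}

my ( $bogus, $acids ) = bogus_and_acids("BOGUS PPI3_SYNY3 276 aa linear BCT 13-NOV-2019\n");
print "Bogus : $bogus\nNumber of amino acids : $acids\n";
```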

    I hope this is helpful.

    Cheers,

    JohnGG

      I work with many space-separated files, and I agree that split is often the easiest way to go. In this situation I would recommend the special single-space string form of "split on whitespace", split ' ', because it ignores leading whitespace, as I demo at Re: But I want null values in my array. This is just one of the many "yeah, but"s and exceptions that I've discovered over time.
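      A small sketch of the difference described above, using the question's line with some leading spaces added (the leading spaces are an assumed input, not from the question):

```perl
use strict;
use warnings;

my $line = "   BOGUS PPI3_SYNY3 276 aa linear BCT 13-NOV-2019";

# split /\s+/ keeps an empty leading field when the line starts with whitespace.
my @regex_fields = split /\s+/, $line;

# split ' ' (a string holding a single space) discards leading whitespace first.
my @space_fields = split ' ', $line;

print "split /\\s+/ : field[1] = '$regex_fields[1]'\n";    # BOGUS
print "split ' '   : field[1] = '$space_fields[1]'\n";     # PPI3_SYNY3
```

      With split /\s+/ every index shifts by one when leading whitespace is present, so a fixed slice such as [1, 2] silently picks the wrong fields.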
Re: getting a sequence of numbers and letters
by hippo (Bishop) on Mar 23, 2020 at 21:39 UTC

    Your second regex never matches, which is why both values are the same. Here's a potential solution for the data as stated:

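    use strict;
    use warnings;
    use Test::More tests => 2;

    my $content = 'BOGUS PPI3_SYNY3 276 aa linear BCT 13-NOV-2019';

    # "BOGUS" plus the following non-whitespace token
    $content =~ /^(BOGUS\s+\S+)/;
    is $1, 'BOGUS PPI3_SYNY3';

    # digits, one whitespace character, and a word, anchored by " linear"
    $content =~ /(\d+\s\w+) linear/;
    is $1, '276 aa';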

      Thank you so much, hippo, it worked :)

Re: getting a sequence of numbers and letters
by BillKSmith (Monsignor) on Mar 23, 2020 at 22:59 UTC
    The following code reads the content of a (simulated) file, extracts the two values and tests them. A regex is defined for each field. They are combined into a single regex which describes the rest of the line and specifies that both values should be extracted. If the match is successful, both values are tested.
    use strict;
    use warnings;
    use Test::Simple tests => 3;

    # A reference to a string acts as an in-memory file for open.
    my $fname = \do { my $line = "BOGUS PPI3_SYNY3 276 aa linear BCT 13-NOV-2019\n"; };
    open my $FH, '<', $fname or die "Cannot open $fname for input";
    my $content = <$FH>;
    close $FH;

    # One regex per field, combined into a single regex for the whole record.
    my $get_bogus = qr/ [A-Z0-9_]+ /x;
    my $get_acid  = qr/ [0-9]+\s[a-z]+ /x;
    my ( $bogus, $acids ) = $content =~ / BOGUS \s+ ($get_bogus) \s+ ($get_acid) /x;

    if (ok( ( defined($bogus) and defined($acids) ), 'Match') ) {
        ok( $bogus eq "PPI3_SYNY3" , "extracted '$bogus'");
        ok( $acids eq '276 aa', "extracted '$acids'");
    }

    OUTPUT:

    1..3
    ok 1 - Match
    ok 2 - extracted 'PPI3_SYNY3'
    ok 3 - extracted '276 aa'
    Bill
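    If the real file has many lines, the same composed patterns can be applied line by line. A sketch, assuming one record per line and that non-matching lines should simply be skipped. It swaps the positional captures for named captures (%+, available since Perl 5.10), and `extract_all` is a hypothetical helper name; the in-memory string stands in for the real file.

```perl
use strict;
use warnings;

# Field patterns, composed the same way as in the reply above.
my $get_bogus = qr/ [A-Z0-9_]+ /x;
my $get_acid  = qr/ [0-9]+ \s [a-z]+ /x;

# Named captures, read back through %+ instead of $1/$2.
my $record_re = qr/ BOGUS \s+ (?<bogus> $get_bogus ) \s+ (?<acids> $get_acid ) /x;

# Hypothetical helper: read every line from a filehandle and collect matches.
sub extract_all {
    my ($fh) = @_;
    my @records;
    while ( my $line = <$fh> ) {
        next unless $line =~ $record_re;    # skip lines that do not match
        push @records, { bogus => $+{bogus}, acids => $+{acids} };
    }
    return @records;
}

# Simulated file: the question's line plus one line that does not match.
my $data = "BOGUS PPI3_SYNY3 276 aa linear BCT 13-NOV-2019\nsome other line\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @records = extract_all($fh);
close $fh;

for my $r (@records) {
    print "bogus: $r->{bogus}\nNumber of acids: $r->{acids}\n";
}
```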
Node Type: perlquestion [id://11114571]
Approved by Corion