getting a sequence of numbers and letters

shabird has asked for the wisdom of the Perl Monks concerning the following question:

I have a file which contains the following line:

BOGUS PPI3_SYNY3 276 aa linear BCT 13-NOV-2019

I want to extract "BOGUS PPI3_SYNY3" and "276 aa" using regular expressions. My code is as follows:

$content =~ /BOGUS\s+([A-Z0-9_]+)/; 
$bogus = $1;
$noOfAcids =~ /\s+([0-9\sa-z]$)/;
$acids = $1 ;
print("bogus: $bogus\nNumber of acids:$acids\n

But unfortunately both regexes give the same output.

OUTPUT

bogus: PPI3_SYNY3

Number of amino acids:PPI3_SYNY3


Replies are listed 'Best First'.
Re: getting a sequence of numbers and letters by choroba (Cardinal) on Mar 23, 2020 at 17:48 UTC
It should be doable using a single regex:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $line = "BOGUS PPI3_SYNY3 276 aa linear BCT 13-NOV-2019\n";
    if (my ($bogus, $acids) = $line =~ /BOGUS\s+([A-Z0-9_]+) ([0-9]+ [a-z]+)/) {
        print "bogus: $bogus\nNumber of acids: $acids\n"
    }

Your code is unfortunately incomplete. How did you populate $noOfAcids? Also, $ in the second regex can't match after "aa", as the string doesn't end there. Note that $1 et al. are only changed after a successful match, so if the second regex doesn't match, $1 keeps the value from the previous match.
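The point about $1 surviving a failed match can be seen in a minimal sketch. The sub names `extract_fields` and `stale_capture_demo` are hypothetical, and because the original post does not show how $noOfAcids was populated, the second regex is simply applied to the same line here (an assumption).

```perl
use strict;
use warnings;

# Hypothetical helper: in list context a match returns its captures,
# or an empty list when the pattern does not match.
sub extract_fields {
    my ($line) = @_;
    return $line =~ /BOGUS\s+([A-Z0-9_]+)\s+([0-9]+\s+[a-z]+)/;
}

# Hypothetical demo of the pitfall: the second match fails,
# so $1 still holds the capture from the first match.
sub stale_capture_demo {
    my ($text) = @_;
    $text =~ /BOGUS\s+([A-Z0-9_]+)/;    # succeeds and sets $1
    $text =~ /\s+([0-9\sa-z]$)/;        # fails; $1 is left untouched
    my $leftover = $1;
    return $leftover;
}

my $line = "BOGUS PPI3_SYNY3 276 aa linear BCT 13-NOV-2019\n";

# Test the result of the match instead of reading $1 unconditionally.
my ($bogus, $acids) = extract_fields($line);
if (defined $bogus) {
    print "bogus: $bogus\nNumber of acids: $acids\n";
}
else {
    print "no match\n";
}

print "stale \$1 after failed match: ", stale_capture_demo($line), "\n";
```

Checking the return value of the match (or assigning the captures in list context, as above) avoids silently reusing an old capture.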
Re: getting a sequence of numbers and letters by johngg (Canon) on Mar 23, 2020 at 23:17 UTC
It might be easier to split the line on white space and slice out the relevant items.

    johngg@shiraz:~/perl/Monks$ perl -Mstrict -Mwarnings -E '
    open my $inFH, q{<}, \ <<__EOD__ or die $!;
    BOGUS PPI3_SYNY3 276 aa linear BCT 13-NOV-2019
    __EOD__

    chomp( my $line = <$inFH> );
    close $inFH or die $!;

    my( $bogus, $acids ) = ( split m{\s+}, $line )[ 1, 2 ];
    say qq{Bogus : $bogus\nNumber of amino acids : $acids};'
    Bogus : PPI3_SYNY3
    Number of amino acids : 276

I hope this is helpful.

Cheers,
JohnGG
Re^2: getting a sequence of numbers and letters by Marshall (Canon) on Mar 24, 2020 at 18:21 UTC
I work with many space-separated files and I agree that split is often the easiest way to go. In this situation I would recommend the single-space string form of "split on whitespace" (split ' '), because it ignores leading whitespace, as I demo at Re: But I want null values in my array. This is just one of the many "yeah, buts" and exceptions that I've discovered over time.
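The difference between the two forms of split can be sketched as follows. The leading spaces in the sample line and the sub names `split_regex` and `split_space` are assumptions made for illustration; the thread's actual data has no leading whitespace.

```perl
use strict;
use warnings;

# Hypothetical wrappers around the two ways of splitting on whitespace.
sub split_regex { my ($s) = @_; return split /\s+/, $s; }   # regex form
sub split_space { my ($s) = @_; return split ' ',   $s; }   # special single-space form

# Assumed input: the same record, but with leading spaces.
my $line = "  BOGUS PPI3_SYNY3 276 aa linear BCT 13-NOV-2019";

my @by_regex = split_regex($line);   # leading whitespace yields an empty first field
my @by_space = split_space($line);   # leading whitespace is skipped

printf "regex form: %d fields, first = '%s'\n", scalar @by_regex, $by_regex[0];
printf "' ' form:   %d fields, first = '%s'\n", scalar @by_space, $by_space[0];

# With the ' ' form the field positions are stable, so a slice is safe.
my ($bogus, $count, $unit) = @by_space[ 1 .. 3 ];
print "bogus: $bogus\nNumber of acids: $count $unit\n";
```

With the regex form, every index in the slice shifts by one whenever a line happens to start with whitespace, which is the kind of exception the single-space form sidesteps.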
Re: getting a sequence of numbers and letters by hippo (Bishop) on Mar 23, 2020 at 21:39 UTC
Your second regex never matches, which is why both outputs are the same. As stated, here's a potential solution.

    use strict;
    use warnings;
    use Test::More tests => 2;

    my $content = 'BOGUS PPI3_SYNY3 276 aa linear BCT 13-NOV-2019';
    $content =~ /^(BOGUS\s+\S+)/;
    is $1, 'BOGUS PPI3_SYNY3';
    $content =~ /(\d+\s\w+) linear/;
    is $1, '276 aa';
Re^2: getting a sequence of numbers and letters by shabird (Sexton) on Mar 24, 2020 at 07:06 UTC
Thank you so much, hippo, it worked :)
Re: getting a sequence of numbers and letters by BillKSmith (Monsignor) on Mar 23, 2020 at 22:59 UTC
The following code reads the content of a (simulated) file, extracts the two values and tests them. A regex is defined for each field. They are combined into a single regex which describes the start of the line and specifies that both values should be extracted. If the match is successful, both values are tested.

    use strict;
    use warnings;
    use Test::Simple tests => 3;

    # Simulate a file by opening a reference to an in-memory string.
    my $fname = \do { my $line = "BOGUS PPI3_SYNY3 276 aa linear BCT 13-NOV-2019\n"; };
    open my $FH, '<', $fname or die "Cannot open $fname for input";
    my $content = <$FH>;
    close $FH;

    my $get_bogus = qr/ [A-Z0-9_]+ /x;
    my $get_acid  = qr/ [0-9]+\s[a-z]+ /x;
    my ( $bogus, $acids ) = $content =~ / BOGUS \s+ ($get_bogus) \s+ ($get_acid) /x;
    if (ok( ( defined($bogus) and defined($acids) ), 'Match') ) {
        ok( $bogus eq "PPI3_SYNY3" , "extracted '$bogus'");
        ok( $acids eq '276 aa', "extracted '$acids'");
    }

OUTPUT:

    1..3
    ok 1 - Match
    ok 2 - extracted 'PPI3_SYNY3'
    ok 3 - extracted '276 aa'

Bill