coltman has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I am new to Perl and hope you can help me with a regex question.

What I want is to find the largest integer in each string. The following are two examples of the strings that I am currently working on:

ASBSDEC 34 GADVVEEVEETTE 56 IOEOREAK GKJEOG EFEAF 1090 DAFFEE 376

ASB C 134 PPKOREAK EFEAF 290

So, I want the number 1090 for the first string and 290 for the second. As I have thousands of such strings, I don't know how many integers there are in each one (some strings could be extremely long, with more than 100 integers in them).

I wonder if there is any way to read the integers one by one: get first integer as $var, then go to the second integer and compare that with $var, if larger then replace $var, otherwise, go ahead with the third integer, repeat until the last integer in the string.

I tried something like /\d+?/g, but it does not seem to be working. :(

I would really appreciate it if you could help me out here.

BTW, to be a bit more complex, is there any way I can get the text before each integer using $'?
Re: Help with regex, how to get the largest integer in a string?
by liverpole (Monsignor) on Apr 19, 2007 at 00:27 UTC
    Hi coltman,

    You can iterate over a given string using the /g (global) modifier, as well as regex captures:

    use strict;
    use warnings;

    my $string = "ASBSDEC 34 GADVVEEVEETTE 56 IOEOREAK GKJEOG EFEAF 1090 DAFFEE 376";
    my $max;   # This is undefined until set

    while ($string =~ /(\d+)/g) {
        if (!defined($max) or $max < $1) {
            $max = $1;
        }
    }
    print "Max value is $max\n";

    Or you could make it into a subroutine (eg. max_val) like so:

    use strict;
    use warnings;

    my $string = "ASBSDEC 34 GADVVEEVEETTE 56 IOEOREAK GKJEOG EFEAF 1090 DAFFEE 376";
    my $max = max_val($string);
    print "Max value is $max\n";

    sub max_val {
        my $str = shift;
        my $max;
        # Match against the argument ($str), not the file-level $string
        while ($str =~ /(\d+)/g) {
            if (!defined($max) or $max < $1) {
                $max = $1;
            }
        }
        return $max;   # This will be undefined if no integer found
    }

    And, of course, you can read more about regular expressions with perlretut and perlre.

    Update:  Oh, and perlrequick.  I always forget that one.
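
    As for why /\d+?/g did not seem to work: a minimal sketch, assuming the intent was to pick up whole numbers. The ? after + makes the quantifier non-greedy, so \d+? is satisfied by a single digit and each match returns one digit rather than the full integer. The variable names below are only for illustration.

```perl
use strict;
use warnings;

my $string = "ASB C 134 PPKOREAK EFEAF 290";

# Non-greedy: \d+? stops after one digit, so every match is a single digit.
my @lazy = $string =~ /\d+?/g;

# Greedy: \d+ consumes the whole run of digits.
my @greedy = $string =~ /\d+/g;

print "lazy:   @lazy\n";     # lazy:   1 3 4 2 9 0
print "greedy: @greedy\n";   # greedy: 134 290
```

    Dropping the ? is enough to get whole integers back.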

    s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
Re: Help with regex, how to get the largest integer in a string?
by BrowserUk (Patriarch) on Apr 19, 2007 at 01:45 UTC

    Simple is usually fastest in perl.

    #! perl -slw
    use strict;
    use List::Util qw[ max ];

    my %maxs;
    $maxs{ $. } = max m[(\d+)]g while <DATA>;

    print "line: $_ max: $maxs{ $_ }" for sort keys %maxs;

    __DATA__
    ASBSDEC 34 GADVVEEVEETTE 56 IOEOREAK GKJEOG EFEAF 1090 DAFFEE 376
    ASB C 134 PPKOREAK EFEAF 290
    A 100 B 1000 C 2000 D 3000 E 4000 F 5000 G 6000 H 7000 I 8000 J 9000 K 10000 L 100000 M 200000 N 2

    Output:

    c:\test>junk4
    line: 1 max: 1090
    line: 2 max: 290
    line: 3 max: 200000

    Update: Using an array for output presentation, instead of a hash (as implied by ysth++ below):

    #! perl -slw
    use strict;
    use List::Util qw[ max ];

    my @maxs;
    $maxs[ $. ] = max m[(\d+)]g while <DATA>;

    print "line: $_ max: $maxs[ $_ ]" for 1 .. $#maxs;

    __DATA__
    ASBSDEC 34 GADVVEEVEETTE 56 IOEOREAK GKJEOG EFEAF 1090 DAFFEE 376
    ASB C 134 PPKOREAK EFEAF 290
    A 100 B 1000 C 2000 D 3000 E 4000 F 5000 G 6000 H 7000 I 8000 J 9000 K 10000 L 100000 M 200000 N 2

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      A hash?!
Re: Help with regex, how to get the largest integer in a string?
by thezip (Vicar) on Apr 19, 2007 at 00:44 UTC

    Since TMTOWTDI (and I'm experimenting with look-ahead assertions):

    #!/perl/bin/perl -w
    use strict;
    use Data::Dumper;
    use List::Util qw/ max /;

    my $s = "ASBSDEC 34 GADVVEEVEETTE 56 IOEOREAK GKJEOG EFEAF 1090 DAFFEE 376";

    # use a look-ahead assertion to split string into array of strings and numbers
    # the corresponding string occupies the index immediately preceding the number
    my @arr = split /(\D+)(?=\d+)/, $s;

    # grep only for the numbers in the list, selecting the largest one
    print max(grep { /^\d+$/ } @arr), "\n";

    __OUTPUT__
    1090

    Update: Corrected typo diligently found by liverpole++


    Where do you want *them* to go today?
      Just a question: What is the lookahead needed for?
      It seems to me that
      my @arr = split /(\D+)/, $s;
      works as well.
      Max
        Whenever split() will do, I avoid regex. ++

        Open source softwares? Share and enjoy. Make profit from them if you can. Yet, share and enjoy!

Re: Help with regex, how to get the largest integer in a string?
by GrandFather (Saint) on Apr 19, 2007 at 00:52 UTC

    If you really want to do it with a regex you can:

    use strict;
    use warnings;

    my $str = 'ASBSDEC 34 GADVVEEVEETTE 56 IOEOREAK GKJEOG EFEAF 1090 DAFFEE 376';
    my $result;
    my $biggest = 0;

    # The (?{ ... }) code block runs each time the pattern reaches it,
    # recording the word/number pair whenever the number beats the current best
    1 while ($str =~ /(\w+)\s+(\d+)(?{if($2 > $biggest) {$result=[$1, $2]; $biggest=$2}})/gx);

    print "$result->[0] $result->[1]";

    Prints:

    EFEAF 1090

    DWIM is Perl's answer to Gödel
Re: Help with regex, how to get the largest integer in a string?
by snoopy (Curate) on Apr 19, 2007 at 03:03 UTC
    BTW, to be a bit more complex, is there any way I can get the text before each integer using $'?

    Here's an attempt at the last part of your question..

    This solution uses split. By putting brackets around the regex, we can get it to return pairs of values: the integers and the preceding text.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use List::Util;

    sub get_text_and_ints {
        my ($str) = @_;   # split the argument rather than relying on the caller's $_
        my @pairs = split(/([+-]?\d+)/, $str);
        my @text = ();
        my @integers = ();
        while (@pairs) {
            push (@text, shift (@pairs));
            push (@integers, shift (@pairs)) if (@pairs);
        }
        return (\@text, \@integers);
    }

    #
    # Testing
    #
    foreach (<DATA>) {
        chomp;
        my ($text,$ints) = get_text_and_ints($_);
        my $max_int = List::Util::max (@$ints);
        print "/$_/ - has ".(defined $max_int? "maximum of $max_int":"no integers")."\n";
        for (my $i = 0; $i < @$ints; $i++) {
            print " - integer $ints->[$i] is preceded by /$text->[$i]/\n";
        }
    }

    __DATA__
    ASBSDEC 34 GADVVEEVEETTE 56 IOEOREAK GKJEOG EFEAF 1090 DAFFEE 3762
    ASB C 134 PPKOREAK EFEAF 290
    What Is The Sqrt Of -1?
    This Line Is Only Text
    101 Dalmations 42

Re: Help with regex, how to get the largest integer in a string?
by McDarren (Abbot) on Apr 19, 2007 at 13:36 UTC
    Here is a fairly simple solution that nobody seems to have suggested. It processes the data line by line, pulling all the integers from each line into an array using a simple regex with the g modifier, then reverse sorts the array, and finally prints the first element. The code is probably a bit more verbose than it needs to be, but that is deliberate as you mention that you are new to Perl.

    #!/usr/bin/perl -w
    use strict;

    # Read each line one by one
    while (my $line = <DATA>) {
        chomp($line);   # Dispense with trailing newline

        # Extract everything that looks like an integer into an array
        my @ints = $line =~ m/(\d+)/g;

        # Sort the array, highest to lowest
        my @sorted_ints = reverse sort { $a <=> $b } @ints;

        # Output the line, and the highest integer (1st element of the sorted array)
        print "DATA:$line\nHIGHEST INTEGER:$sorted_ints[0]\n";
    }

    __DATA__
    ASBSDEC 34 GADVVEEVEETTE 56 IOEOREAK GKJEOG EFEAF 1090 DAFFEE 376
    ASB C 134 PPKOREAK EFEAF 290
    BLAH 99 BLAH 123 FRED 27 BARNEY 427

    Output:

    DATA:ASBSDEC 34 GADVVEEVEETTE 56 IOEOREAK GKJEOG EFEAF 1090 DAFFEE 376
    HIGHEST INTEGER:1090
    DATA:ASB C 134 PPKOREAK EFEAF 290
    HIGHEST INTEGER:290
    DATA:BLAH 99 BLAH 123 FRED 27 BARNEY 427
    HIGHEST INTEGER:427

    Hope this helps,
    Darren :)
Re: Help with regex, how to get the largest integer in a string?
by OfficeLinebacker (Chaplain) on Apr 19, 2007 at 02:02 UTC
    General comment:

    In some cases, don't we want to use c as well as g so as to advance our pos() within the string regardless of if we match?
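
    A small sketch of what the c modifier changes, using a made-up string ($str and the other names are illustrative only). A failed match with /g alone resets pos() for that string, while /gc leaves it where it was, which matters when several \G-anchored patterns are tried in turn:

```perl
use strict;
use warnings;

my $str = "abc 123";

$str =~ /\G[a-z]+\s*/g;        # succeeds: consumes "abc " and leaves pos at 4
my $after_match = pos($str);

$str =~ /\G[a-z]+/gc;          # fails at offset 4, but /c keeps pos unchanged
my $kept = pos($str);

$str =~ /\G[a-z]+/g;           # fails again; without /c, pos is reset
my $reset = defined pos($str) ? pos($str) : 'undef';

print "after match: $after_match, with /c: $kept, without /c: $reset\n";
# after match: 4, with /c: 4, without /c: undef
```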

    Re: $':

    I have read that using the regexp variables in that family is really inefficient and likely best avoided when possible.
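
    For what it's worth, the text before a match lives in $` (prematch), whereas $' holds the text after it. One possible way to avoid both is to use the match offsets in @- and @+ together with substr. The following is only a sketch under that assumption; $prev_end and $text_before_max are names invented for the example:

```perl
use strict;
use warnings;

my $string = "ASBSDEC 34 GADVVEEVEETTE 56 IOEOREAK GKJEOG EFEAF 1090 DAFFEE 376";

my ($max, $text_before_max);
my $prev_end = 0;                 # offset just past the previous integer

while ($string =~ /(\d+)/g) {
    # $-[0] is the offset where the current match starts,
    # so this is the text between the previous integer and this one.
    my $before = substr($string, $prev_end, $-[0] - $prev_end);
    $prev_end = $+[0];            # offset just past the current match

    if (!defined $max or $1 > $max) {
        ($max, $text_before_max) = ($1, $before);
    }
}

print "max: $max, preceded by: '$text_before_max'\n";
# max: 1090, preceded by: ' IOEOREAK GKJEOG EFEAF '
```

    Here "text before" means everything since the previous integer; trimming or keeping only the last word is left to taste.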

    Code:

    Assuming a) that there are spaces separating all groups of consecutive digits or letters and b) each group consists only of either digits or letters and c) (crucial for my implementation of the 'more complex' problem), digit groups are never consecutive (meaning, you'll never see a pattern like QKLJB 9234 KJLH 324 9874 in the data). Alternately, if there are consecutive integers, you only care about the most recent letter group (even if it was five groups ago). Also, you only care about the last group of letters before each integer :)

    #!/usr/bin/perl
    # untested!
    use strict;
    use warnings;

    my $bs = '';        # your big string--how you get the data in there is up 2 u
    my ($lms, $lmn);    # last matched string, num
    my $max = 0;        # biggest

    while ($bs !~ m/\G\z/ms) {
        # /gc: a failed match must not reset pos(), or the next \G starts at 0
        if ($bs =~ m/\G([[:upper:]]+)\s*/gc) {    # [:alpha:], whatever
            $lms = $1;
        }
        elsif ($bs =~ m/\G(\d+)\s*/gc) {
            # up to the closing brace before else,
            # $lms corresponds to the text before this number
            $lmn = $1;
            if ($lmn > $max) {
                # at this point, $lms is the text before the biggest int so far
                $max = $lmn;
            }
        }
        else {
            die "unexpected data near character " . (pos($bs) || 0);
        }
    }

    I like computer programming because it's like Legos for the mind.
Re: Help with regex, how to get the largest integer in a string?
by QM (Parson) on Apr 19, 2007 at 18:10 UTC
    Quick attempt at simple answer...

    On each line, grab all whitespace-separated non-negative integers, sort them in descending order, and return the highest.

    print +(sort {$b <=> $a} grep /^\d+$/, split)[0] while <>;

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

Re: Help with regex, how to get the largest integer in a string?
by johngg (Canon) on Apr 19, 2007 at 23:15 UTC
    This uses a descending numerical sort and grabs the first element to get the largest integer. It also keeps the associated text that goes before the integer.

    my $rxPair = qr{(?x) ( [A-Z]+ \s+ \d+ ) };

    print
        map { qq{$_->[0]\n @{ $_->[1] }\n} }
        map {
            my $raLine = $_;
            my $raMax  = ( sort {$b->[1] <=> $a->[1] } @{ $raLine->[1] } )[0];
            [ $raLine->[0], $raMax ]
        }
        map { chomp; [ $_, [ map { [ split ] } m{$rxPair}g ] ] }
        <DATA>;

    __END__
    ASBSDEC 34 GADVVEEVEETTE 56 IOEOREAK GKJEOG EFEAF 1090 DAFFEE 376
    ASB C 134 PPKOREAK EFEAF 290
    BLAH 99 BLAH 123 FRED 27 BARNEY 427

    Here's the output

    ASBSDEC 34 GADVVEEVEETTE 56 IOEOREAK GKJEOG EFEAF 1090 DAFFEE 376
     EFEAF 1090
    ASB C 134 PPKOREAK EFEAF 290
     EFEAF 290
    BLAH 99 BLAH 123 FRED 27 BARNEY 427
     BARNEY 427

    A little late but never mind.

    Cheers,

    JohnGG