punch_card_don has asked for the wisdom of the Perl Monks concerning the following question:

Merry Monks,

I have a series of small text strings. Each contains a statement of a range of values, but in different forms:

1 - 5
3 to 24
6 thru 9
usually from 5 up to 12, but could be as much as 20 or more

I want to extract the numbers from the phrases so I can determine the max and min in the entire set of phrases. Making that determination is no problem; the question is how to extract the numbers.

I can't always count on there being spaces between the numbers and the other text. A phrase could be mis-typed:

7-8

Thanks

Forget that fear of gravity,
Get a little savagery in your life.


Replies are listed 'Best First'.
Re: extract numbers from unformatted text strings
by punch_card_don (Curate) on May 23, 2005 at 16:39 UTC
    Wow - quick replies, thanks. In the meantime, I found:
    @myarray = ($mystring =~ m/(\d+)/g);
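    A minimal sketch of how that list-context match could feed the overall min/max, using the sample phrases from the question. The helper name extract_numbers is hypothetical, and using List::Util is an assumption; the original post does not say how the min and max are computed.

```perl
use strict;
use warnings;
use List::Util qw( min max );

# Hypothetical helper: return every run of digits in a phrase,
# in order of appearance (a /g match in list context).
sub extract_numbers {
    my ($phrase) = @_;
    return $phrase =~ /(\d+)/g;
}

# Sample phrases taken from the question.
my @phrases = (
    '1 - 5',
    '3 to 24',
    '6 thru 9',
    'usually from 5 up to 12, but could be as much as 20 or more',
    '7-8',
);

# Pool the numbers from every phrase into one list.
my @all = map { extract_numbers($_) } @phrases;

# min() and max() return undef for an empty list, so guard against that.
if (@all) {
    printf "min: %d, max: %d\n", min(@all), max(@all);    # min: 1, max: 24
}
```

    Because the pattern only looks for digits, it does not care what separates the numbers, so '7-8' and '3 to 24' are handled the same way.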

    Forget that fear of gravity,
    Get a little savagery in your life.

Re: extract numbers from unformatted text strings
by Forsaken (Friar) on May 23, 2005 at 16:39 UTC
    Update: more careful reading yielded that you *can't* count on there being spaces so solution one goes down the trash...

    Bad solution (ignore): assuming for the moment that there will always be spaces after the first and before the next number, and that *no other* numerals will be in the string, you could match against /(\d+)\s.*\s(\d+)/, which will put the two numbers in $1 and $2 respectively.

    Another alternative would be to simply match against /(\d+)/g, which will extract all numbers from the string. It all depends on what the strings look like exactly and what kind of pitfalls you have to avoid.


    Remember rule one...
Re: extract numbers from unformatted text strings
by TedPride (Priest) on May 23, 2005 at 17:12 UTC
    You can't count on knowing what the separator is going to be. It could be a string containing pretty much anything other than digits. Therefore:
    use strict;
    use warnings;

    my $min = 1000000;
    my $max = 0;
    while (<DATA>) {
        m/(\d+)\D+(\d+)/;
        $min = $1 if $1 < $min;
        $max = $2 if $2 > $max;
    }
    print "$min-$max";

    __DATA__
    6-7
    1 - 5
    3 to 24
    6 thru 9
    2 through 11

      Nice, but don't forget the first rule of a capturing m// - test for success before using $1...

      while (<DATA>) {
          next unless m/(\d+)\D+(\d+)/;    # or warn && next, or die
          $min = $1 if $1 < $min;
          $max = $2 if $2 > $max;
      }

      Also, rather than imposing an artificial limit on min (and max), what about adding a definedness check?

      $min = $1 if ( ! defined( $min ) || ( $1 < $min ) );
      $max = $2 if ( ! defined( $max ) || ( $2 > $max ) );

      I'll grant that since this doesn't handle negative numbers (yet) anyway, max is likely fine the way it is.
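      Putting both suggestions together (test the match before using $1, and start from undef rather than an artificial limit), a sketch might look like the following. The helper name range_extremes and the list-based interface are assumptions for illustration, not part of the code in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (min, max) over all "low ... high" pairs,
# or an empty list if no line contained two numbers.
sub range_extremes {
    my @lines = @_;
    my ( $min, $max );    # both start undefined: no artificial limits
    for my $line (@lines) {
        # Skip lines that do not contain two numbers.
        next unless $line =~ /(\d+)\D+(\d+)/;
        my ( $lo, $hi ) = ( $1, $2 );
        $min = $lo if !defined $min || $lo < $min;
        $max = $hi if !defined $max || $hi > $max;
    }
    return defined $min ? ( $min, $max ) : ();
}

my ( $min, $max ) = range_extremes(
    '6-7', '1 - 5', '3 to 24', 'no numbers here', '6 thru 9', '2 through 11',
);
print "$min-$max\n";    # prints 1-24
```

      Note that this only looks at the first two numbers on each line, so a phrase such as "from 5 up to 12, but could be as much as 20" would contribute 5 and 12 but not 20.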

Re: extract numbers from unformatted text strings
by sh1tn (Priest) on May 23, 2005 at 16:36 UTC
    /\s*(\d+)\D+(\d+)/; print "min:$1\tmax:$2\n"


Re: extract numbers from unformatted text strings
by davidrw (Prior) on May 23, 2005 at 16:35 UTC
    You can always allow for optional spaces where needed by using \s* to match zero or more whitespace characters:
    $s =~ /(\d+)\s*(?:to|thru|up to|-)\s*(\d+)/;

    What does your regex/string parsing code look like right now?
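    As a rough illustration of the keyword-separator approach, here is a sketch that tries it on a few phrases. The helper name parse_range is hypothetical; adding "through" to the alternation, the /i flag, and \s+ inside "up to" are assumptions beyond the pattern given above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (low, high) when the phrase contains
# "N <separator> M" with a known separator, or an empty list otherwise.
sub parse_range {
    my ($s) = @_;
    return $s =~ /(\d+)\s*(?:through|thru|up\s+to|to|-)\s*(\d+)/i
        ? ( $1, $2 )
        : ();
}

for my $s ( '1 - 5', '3to24', '6 thru 9', 'from 5 up to 12', '7-8', '4 and 5' ) {
    my @r = parse_range($s);
    print "$s => ", ( @r ? "$r[0]..$r[1]" : 'no match' ), "\n";
}
```

    The trade-off compared with a bare \D+ separator is that an unexpected separator ("4 and 5" here) is rejected rather than silently accepted, which may or may not be what you want.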
Re: extract numbers from unformatted text strings
by tlm (Prior) on May 24, 2005 at 03:50 UTC
    use List::Util qw( min max );

    while ( <INPUT> ) {
        my @n = /(\d+)/g;
        # Parenthesize warn's arguments so that next runs only after the warning is issued.
        warn( "Bad line $_\n" ), next unless @n;
        my ( $min, $max ) = ( min( @n ), max( @n ) );
        do_something_with( $min, $max );
    }

    the lowliest monk