Baz has asked for the wisdom of the Perl Monks concerning the following question:

I have some text which contains telephone numbers in the following format:
Tel: 06X YYY YYYY
All I'm interested in is the extension part, i.e. the 06X part. I need to make a list of the frequency of each extension type (060 through 069) in the piece of text. The general regex for getting the X value of the extension is:
$largeStr =~ /Tel: 06(\d+)/;
Then $1 dictates the extension type. But I'm not sure how this works when $largeStr contains more than one telephone number. How can I find the frequency of the 10 extension types in $largeStr?

Replies are listed 'Best First'.
Re: Regex: plucking numbers from a large string
by RMGir (Prior) on May 01, 2002 at 18:15 UTC
    Offhand, I'd say this calls for //g.
    my @counts; # since there are only 10 possible values, all
                # digits, why use a hash?
    while ($largeStr =~ /Tel: 06(\d)/g) {
        $counts[$1]++;
    }
    I removed the "+" from after the \d, since you really only expect one digit for your extensions, right?

    After the loop is done, the @counts array should have the answers you're looking for.
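A minimal, self-contained sketch of the loop above. The sample text assigned to $largeStr is invented for illustration, and pre-filling @counts with zeros is an addition so that extensions which never occur still print as 0. In scalar context each //g match resumes where the previous one ended, so $1 holds a fresh digit on every pass.

```perl
use strict;
use warnings;

# Hypothetical sample text; the real $largeStr comes from the poster's data.
my $largeStr = "Tel: 061 555 1234\nAnn, Tel: 063 555 9876\nTel: 061 555 0000\n";

# One slot per extension 060..069, pre-filled so unseen ones report 0.
my @counts = (0) x 10;

# Scalar-context //g: every iteration finds the next match and sets $1.
while ($largeStr =~ /Tel: 06(\d)/g) {
    $counts[$1]++;
}

print "06$_: $counts[$_]\n" for 0 .. 9;
```

With this sample, index 1 ends up at 2 and index 3 at 1; every other slot stays at 0.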
    --
    Mike

    Edit: Wow, foreach doesn't work, but while does. Weird...

Re: Regex: plucking numbers from a large string
by broquaint (Abbot) on May 01, 2002 at 18:29 UTC
    You could always use the funky regex eval features

    my @exts;
    $largeStr =~ /Tel: 06(\d+)(?{push @exts, $1})/g;

    Or the ever handy \G zero-width assertion

    my @exts;
    push @exts, $1 while $largeStr =~ /\GTel: 06(\d+)/g;
    # or better yet
    my %exts;
    $exts{$1}++ while $largeStr =~ /\GTel: 06(\d+)/g;
    Then getting the frequency is just a matter of looping through the keys of %exts
    print qq[found "$_" $exts{$_} times],$/ for sort keys %exts;

    HTH

    _________
    broquaint

    update: removed first suggestion as it doesn't seem to work as I expected :-/

      Or the ever handy \G zero-width assertion

      Which is great if your data is "Tel: 061Tel: 062Tel: 063". With anything else between the numbers, you'd have to add something to match the stuff in between, and there's probably a better solution to this than using .*.
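To make the point about \G concrete, here is a small sketch comparing the anchored and unanchored patterns. The two sample strings and the helper names exts_anchored and exts_unanchored are made up for this illustration. A leading \G only lets a match start exactly where the previous one ended, so the anchored loop stops as soon as other text sits between two numbers.

```perl
use strict;
use warnings;

# Hypothetical inputs: numbers back to back, and numbers with text between.
my $adjacent  = "Tel: 061Tel: 062Tel: 063";
my $separated = "Tel: 061 555 1234\nTel: 062 555 9876\n";

# Anchored: each match must begin exactly where the previous one ended.
sub exts_anchored {
    my ($text) = @_;
    my @exts;
    push @exts, $1 while $text =~ /\GTel: 06(\d)/g;
    return @exts;
}

# Unanchored: list-context //g scans forward to every occurrence.
sub exts_unanchored {
    my ($text) = @_;
    my @exts = $text =~ /Tel: 06(\d)/g;
    return @exts;
}

print "anchored,   adjacent:  @{[ exts_anchored($adjacent) ]}\n";
print "anchored,   separated: @{[ exts_anchored($separated) ]}\n";
print "unanchored, separated: @{[ exts_unanchored($separated) ]}\n";
```

On the separated input the anchored version finds only the first number, while the unanchored one finds both.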

      You can use m//g in list context, and get a list of matches (or a list of captures if you use them):

      my @extensions = $large_string =~ /Tel: 06(\d+)/g;
      my %extension;
      $extension{$_}++ for @extensions;
      If you don't need the list, you can of course use the match itself as for's expression.

      - Yes, I reinvent wheels.
      - Spam: Visit eurotraQ.
      

        If you don't need the list, you can of course use the match itself as for's expression.

        I thought so, too :(

        $ perl -e'$x="1 2 3 4 5 5 5 5 5 5"; $counts[$1]++ for $x=~/(\d)/g; print "$_ $counts[$_]\n" foreach (0..$#counts)'
        0
        1
        2
        3
        4
        5 10
        How's that for a weird problem?

        Even stranger, if you s/for/while/:

        $ perl -e'$x="1 2 3 4 5 5 5 5 5 5"; $counts[$1]++ while $x=~/(\d)/g; print "$_ $counts[$_]\n" foreach (0..$#counts)'
        0
        1 1
        2 1
        3 1
        4 1
        5 6
        This is with 5.6.1.

        Ignore me; it makes sense that $1 would be the last value with a for loop. $_ works fine.

        $ perl -e'$x="1 2 3 4 5 5 5 5 5 5"; $counts[$_]++ for $x=~/(\d)/g; print "$_ $counts[$_]\n" foreach (0..$#counts)'
        0
        1 1
        2 1
        3 1
        4 1
        5 6
        With a while, you have to use $1; that's what confused me.
        $ perl -e'$x="1 2 3 4 5 5 5 5 5 5"; $counts[$1]++ while $x=~/(\d)/g; print "$_ $counts[$_]\n" foreach (0..$#counts)'
        0
        1 1
        2 1
        3 1
        4 1
        5 6
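The three one-liners can be put side by side in one script, as a sketch of why they differ. The array names @via_dollar1, @via_topic and @via_while are made up here. With for, the match runs once in list context and builds the whole list before the loop body executes, so $1 is stuck at the last capture; with while, the match runs once per iteration in scalar context.

```perl
use strict;
use warnings;

my $x = "1 2 3 4 5 5 5 5 5 5";

# for + $1: the list of ten captures is built first, then the body runs
# ten times with $1 still holding the last capture (5).
my @via_dollar1;
$via_dollar1[$1]++ for $x =~ /(\d)/g;

# for + $_: each capture is aliased to $_ in turn, so the counts are right.
my @via_topic;
$via_topic[$_]++ for $x =~ /(\d)/g;

# while + $1: one scalar-context match per iteration, so $1 is fresh.
my @via_while;
$via_while[$1]++ while $x =~ /(\d)/g;

# "|| 0" turns never-incremented (undef) slots into 0 for printing.
for my $i (0 .. 5) {
    printf "%d: for/\$1=%d  for/\$_=%d  while/\$1=%d\n",
        $i, $via_dollar1[$i] || 0, $via_topic[$i] || 0, $via_while[$i] || 0;
}
```

Only the first variant miscounts: all ten increments land on index 5.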

        --
        Mike
Re: Regex: plucking numbers from a large string
by abaxaba (Hermit) on May 01, 2002 at 18:27 UTC
    (@extensions) = $largeStr =~ /Tel:\s(06\d)/g;
    ÅßÅ×ÅßÅ
    "It is a very mixed blessing to be brought back from the dead." -- Kurt Vonnegut
Re: Regex: plucking numbers from a large string
by mephit (Scribe) on May 01, 2002 at 21:37 UTC
    my %hash;
    $hash{$_}++ foreach ($largeStr =~ /Tel: (06\d+)/g);
    That'll count the "extensions" for ya.