neilh has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I have a string of numbers that I need to do some work over. The string looks like
000000000000022102840002210284
This is three groups of 10 digits, each group will have leading zeroes to pad it to 10 digits. The first group will almost always be all zero, but not always.
I have tried the following line, but it matches the whole group:
$split[1]=~s/(\d\d\d\d\d\d\d\d\d\d)(\d\d\d\d\d\d\d\d\d\d\d)(\d\d\d\d\d\d\d\d\d\d\d)/$1/;
Can anyone suggest an alternative regexp to obtain the three groups of 10 digits?

Thanks
Neil

Re: Regexp to extract groups of numbers
by BrowserUk (Patriarch) on Feb 07, 2005 at 03:13 UTC

    You are using the wrong tool for breaking up fixed width fields.

    $s = '000000000000022102840002210284';
    print join ' ', unpack '(a10)*', $s;
    # prints: 0000000000 0002210284 0002210284

    If there is any doubt that the data will only contain digits, verify that with $s =~ m[^\d+$]; first.
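
    A minimal sketch of that check combined with the unpack, assuming you also want to reject input whose length is not a whole number of 10-digit groups (a stricter test than the one above). The helper name split_fields is hypothetical, not something from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the 10-character fields, or an empty list
# if the input is not made up entirely of complete 10-digit groups.
sub split_fields {
    my ($s) = @_;
    # Digits only, and the length must be a multiple of 10.
    return unless $s =~ /\A(?:\d{10})+\z/;
    return unpack '(a10)*', $s;
}

my @fields = split_fields('000000000000022102840002210284');
print join(' ', @fields), "\n";   # prints: 0000000000 0002210284 0002210284
```

    The (a10)* group syntax needs Perl 5.8 or later; on older versions, spell the template out as 'a10 a10 a10'.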


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
Re: Regexp to extract groups of numbers
by davido (Cardinal) on Feb 07, 2005 at 03:17 UTC

    Fixed width formatted data is easily handled by unpack. Here's an example where the string is treated as three ten-digit strings and divided thataway using unpack.

    my $string = q/000000000000022102840002210284/;
    my( @numbers ) = unpack "a10a10a10", $string;
    print "$_\n" foreach @numbers;

    As for using a regexp, here's a more concise one:

    my $string = q/000000000000022102840002210284/;
    my( @numbers ) = $string =~ m/(\d{10})(\d{10})(\d{10})/;
    print "$_\n" foreach @numbers;

    Enjoy!


    Dave

Re: Regexp to extract groups of numbers
by Zaxo (Archbishop) on Feb 07, 2005 at 03:20 UTC

    Your regex is fine, though it could be written more economically. The problem is that you don't assign the results of a match to a list or array:

    my @nums = $split[1] =~ /^(\d{10})(\d{10})(\d{10})$/;

    Your substitution replaces the whole match with $1, throwing away the second and third numbers.

    Since you have fixed-width fields and no separators, unpack seems natural:

    my @nums = unpack 'a10 a10 a10', $split[1];

    You know the format of the fields, so you really shouldn't have to worry about whether they are digits (assuming you don't have to detect bogus data).
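
    To make the difference concrete, here is a small sketch contrasting the substitution with a match in list context. The variable names $field and $copy are made up for illustration; $field stands in for $split[1].

```perl
use strict;
use warnings;

my $field = '000000000000022102840002210284';

# Substitution: replaces the entire match with $1, so groups two and
# three are lost. Done on a copy so $field survives for the next step.
(my $copy = $field) =~ s/^(\d{10})(\d{10})(\d{10})$/$1/;
print "$copy\n";    # prints: 0000000000

# Match in list context: returns all three captures and leaves
# $field untouched.
my @nums = $field =~ /^(\d{10})(\d{10})(\d{10})$/;
print "@nums\n";    # prints: 0000000000 0002210284 0002210284
```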

    After Compline,
    Zaxo

Re: Regexp to extract groups of numbers
by nedals (Deacon) on Feb 07, 2005 at 05:29 UTC
    A real easy regex.
    $_ = "000000000000022102840002210284";
    my @nums = (/\d{10}/g);
      I've got to ask: why the parentheses?
Re: Regexp to extract groups of numbers
by neilh (Pilgrim) on Feb 07, 2005 at 03:21 UTC
    My thanks to everyone for the advice on unpack. The perldoc is in front of me as I speak.

    Now back to my corner
    Neil

      Another comment...

      When deciding between unpack and a regular expression with capturing, there isn't exactly a hard-and-fast rule. But a good rule of thumb is this: if you're picking things out based on character position, use unpack, and if you're picking things out based on a sequence or pattern, use pattern matching (a regular expression). As it happens, regular expressions are flexible enough to handle both width-formatted data and pattern-formatted data, but sometimes it's just better to use an actual screwdriver rather than the funky one that's part of a 200-tool swiss army knife. ;)

      Oh, and split is a whole other option, most ideally suited for pattern-delimited data.
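
      A rough side-by-side of the three tools, as a sketch only. The second and third input strings are invented for illustration; only the first comes from this thread.

```perl
use strict;
use warnings;

# Position-based: fields are defined by where they sit in the string.
my @by_position = unpack 'a10 a10 a10', '000000000000022102840002210284';

# Pattern-based: fields are defined by what they look like.
# (Made-up input.)
my @by_pattern = 'id=42 qty=7' =~ /(\d+)/g;

# Delimiter-based: fields are separated by a known pattern.
# (Made-up input.)
my @by_delimiter = split /,/, '0000000000,0002210284,0002210284';

print "@by_position\n";    # prints: 0000000000 0002210284 0002210284
print "@by_pattern\n";     # prints: 42 7
print "@by_delimiter\n";   # prints: 0000000000 0002210284 0002210284
```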


      Dave

Re: Regexp to extract groups of numbers
by hubb0r (Pilgrim) on Feb 07, 2005 at 04:37 UTC
    Not the most elegant example, but I haven't seen an answer using substr yet:
    my $var = '000000000000022102840002210284';
    my $length = 10;
    my @nums;
    push(@nums, substr($var, 0, $length, '')) for (1 .. length($var) / $length);
    print join("\t",@nums);
    Edited to put the length in a variable so that it's easier to change in one place.
Re: Regexp to extract groups of numbers
by gube (Parson) on Feb 07, 2005 at 05:02 UTC

    Hi, it's easy with a regexp. Try:

    $str = '000000000000022102840002210284';
    ($a,$b,$c)=$str =~ m#(\d{10})(\d{10})(\d{10})#gsi;
    print "$a\t$b\t$c";

    or

    $s = '000000000000022102840002210284';
    ($a,$b,$c) = unpack '(a10)*', $s;
    print "$a\t$b\t$c";

    Regards,
    Gubendran.L
Re: Regexp to extract groups of numbers
by DrHyde (Prior) on Feb 07, 2005 at 10:43 UTC
    my $pie = 123456789;
    my @pies = ($pie =~ /(\d{3})/g);
    print join("\n", @pies);
    However, I would prefer to use the substr() function thus:
    $pie = 123456789;
    my $index = 0;
    while($index < length($pie)) {
        print substr($pie, $index, 3)."\n";
        $index += 3;
    }
    I choose substr() over unpack() because pack() and unpack() make my brane hurt.
      I would not use substr, but if I did, I wouldn't use a while loop as you did; I'd use a for:
      for (my($i, $s) = (0, 3); $i < length($pie); $i += $s) {
          print substr($pie, $i, $s), "\n";
      }

        Except that three arg for loops are just while loops in disguise. :-)

        Pretty much anyway.
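
        A sketch of what "in disguise" means here, assuming the equivalence described in perlsyn: the increment of a C-style for corresponds to a continue block on the while, so it still runs after a next. The enclosing bare block is only there to limit the scope of $i; the array names are made up.

```perl
use strict;
use warnings;

my $pie  = '123456789';
my $step = 3;

# C-style for loop.
my @from_for;
for (my $i = 0; $i < length($pie); $i += $step) {
    push @from_for, substr($pie, $i, $step);
}

# Roughly equivalent while loop: the increment lives in a continue
# block, so it would still run after a `next`, as in the for loop.
my @from_while;
{
    my $i = 0;
    while ($i < length($pie)) {
        push @from_while, substr($pie, $i, $step);
    }
    continue {
        $i += $step;
    }
}

print "@from_for\n";     # prints: 123 456 789
print "@from_while\n";   # prints: 123 456 789
```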

        ---
        demerphq

Re: Regexp to extract groups of numbers
by sh1tn (Priest) on Feb 08, 2005 at 00:40 UTC
    In hash structure:
    $_ = "000000000000022102840002210284";
    $nums{++$c} = $1 while /(\d{10})/g;
    print "$_ $nums{$_}$/" for sort keys %nums;
    __END__
    1 0000000000
    2 0002210284
    3 0002210284
Re: Regexp to extract groups of numbers
by Anonymous Monk on Feb 08, 2005 at 18:59 UTC
    #!/usr/bin/perl
    ###########################################
    use warnings;
    use strict;

    my $test = "000000000000022102840002210284" ;
    print "bad 30 digit number!\n\n" unless $test =~ /^\d{30}$/ ;
    $test =~ /^(\d{10})(\d{10})(\d{10})/ ;
    print "1 = $1\n2 = $2\n3 = $3\n";
      The 30-digit test you run is implied if you simply check the results of your capturing regex:
      #!/usr/bin/perl
      ###########################################
      use warnings;
      use strict;

      my $test = "000000000000022102840002210284";
      if ( $test =~ /^(\d{10})(\d{10})(\d{10})$/ ){
          print "1 = $1\n2 = $2\n3 = $3\n";
      } else {
          print STDERR "Bad 30-digit number!$/";
      }
      Of course, there are tons of ways to handle this problem. =)
      mhoward - at - hattmoward.org