Largins has asked for the wisdom of the Perl Monks concerning the following question:

I was wondering if there is a regex that will parse a string the way strtol does. I have the following string:

"[(58)(4)] Federal Census [(60)(3)] County/Parish, (File 1 of X)"

I used:   my @values = split(" ", $_);
to split on space, but if there is a '[' in the array element, I want to extract the numbers.
For example, from [(60)(3)] I want the 60 and then the 3, one at a time. I have never been really good with regular expressions. They're so easy to use, someone wrote a book about them.

Thanks in advance
Largins

Replies are listed 'Best First'.
Re: perl equiv regex for C strtol
by toolic (Bishop) on May 01, 2012 at 00:49 UTC
    for example from [[(60)(3)]] i want the 60 and then the 3, one at a time
    Here is one way:
    my $s = '[[(60)(3)]]';
    my @nums = $s =~ /\((\d+)\)/g;

    See also: perlre
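A minimal sketch of the same idea applied to the whole line from the question, assuming every bracketed group holds exactly two parenthesized numbers (the pattern and the names `@flat` and `@pairs` are illustrative, not from the thread):

```perl
use strict;
use warnings;

# The full line from the question.
my $line = '[(58)(4)] Federal Census [(60)(3)] County/Parish, (File 1 of X)';

# With /g in list context, all captures come back as one flat list.
my @flat = $line =~ /\[\((\d+)\)\((\d+)\)\]/g;

# Regroup the flat list into [first, second] pairs.
my @pairs;
while (my ($first, $second) = splice(@flat, 0, 2)) {
    push @pairs, [$first, $second];
}

print join(',', @$_), "\n" for @pairs;
```

Because the pattern requires a digit right after each opening parenthesis, the trailing "(File 1 of X)" is skipped.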

      Hello

      This works perfectly, thanks much. I do need to spend some time relearning regular expressions. Especially now that I have a Linux box up and running again.
      This one will save the evening
      Thanks again - I have never been disappointed with the response that I get from PerlMonks. L.

Re: perl equiv regex for C strtol
by GrandFather (Saint) on May 01, 2012 at 00:44 UTC

    If you just want the numbers then /(\d+)/g will do the trick. It may be that you have further constraints than that, in which case start by reading perlretut.
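One possible "further constraint" shows up on the full line from the question: a bare /(\d+)/g also picks up the 1 in "File 1 of X". A small sketch comparing the two patterns (variable names are illustrative):

```perl
use strict;
use warnings;

my $line = '[(58)(4)] Federal Census [(60)(3)] County/Parish, (File 1 of X)';

my @any_digits = $line =~ /(\d+)/g;       # every run of digits
my @in_parens  = $line =~ /\((\d+)\)/g;   # only digits wrapped in ( )

print "any digits: @any_digits\n";   # 58 4 60 3 1
print "in parens:  @in_parens\n";    # 58 4 60 3
```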

    True laziness is hard work

      Thanks for the advice.
      I have been writing code since 1968, actually the first few lines prior to that. I had a series back in the 70's in Kilobaud magazine, and spent 15 years in the Unix environment. I have used regular expressions, and even felt quite confident when it came to knowing how I was going to do something before I started. Unlike playing the piano, however, I have had to re-learn how to use them every time I stop for a few years.
      Now I am 65, and am lucky I remember which side of the bed to get out of.
      I will however learn again, I just need a little help this evening.
      As always, I have found it on PerlMonks!
      Thanks Again - L.

        Hi

        Regexp::English

        #!/usr/bin/perl --
        use Regexp::English;
        print Regexp::English
            ->new
            ->remember
            ->multiple
            ->digit, "\n";
        __END__
        (?^:((?:\d+)))

        Becomes

        my( @numbers ) = $str =~ m{ (\d+) }gx;
        /g means global match (match them all), x means ignore literal white space in pattern

        YAPE::Regex::Explain can explain a lot of these (though it could use some updating)

        use YAPE::Regex::Explain;
        print YAPE::Regex::Explain->new(
            qr{ (\d+) }x,
        )->explain;
        __END__
        The regular expression:

        (?x-ims: (\d+) )

        matches as follows:

        NODE                     EXPLANATION
        ----------------------------------------------------------------------
        (?x-ims:                 group, but do not capture (disregarding
                                 whitespace and comments) (case-sensitive)
                                 (with ^ and $ matching normally) (with . not
                                 matching \n):
        ----------------------------------------------------------------------
          (                        group and capture to \1:
        ----------------------------------------------------------------------
            \d+                      digits (0-9) (1 or more times (matching
                                     the most amount possible))
        ----------------------------------------------------------------------
          )                        end of \1
        ----------------------------------------------------------------------
        )                        end of grouping
        ----------------------------------------------------------------------

        See also Visual (perl/Tk) regex tweaking utility (it's like a living cheatsheet). See also perlrequick

        And see Regexp::Common

        $ perl -MRegexp::Common -e " print $RE{num}{int} "
        (?:(?:[+-]?)(?:[0123456789]+))

        And the magic of /x revealed

        #!/usr/bin/perl --
        use strict; use warnings;
        my $REint = qr{
        #----------------------------------------------------------------------
        (?:                      # group, but do not capture:
        #----------------------------------------------------------------------
          (?:                    # group, but do not capture:
        #----------------------------------------------------------------------
            [+-]?                # any character of: '+', '-' (optional
                                 # (matching the most amount possible))
        #----------------------------------------------------------------------
          )                      # end of grouping
        #----------------------------------------------------------------------
          (?:                    # group, but do not capture:
        #----------------------------------------------------------------------
            [0123456789]+        # any character of: '0', '1', '2', '3',
                                 # '4', '5', '6', '7', '8', '9' (1 or
                                 # more times (matching the most amount
                                 # possible))
        #----------------------------------------------------------------------
          )                      # end of grouping
        #----------------------------------------------------------------------
        )                        # end of grouping
        #----------------------------------------------------------------------
        }x; # end of qr
        print "$_\n" for "1 plus 1 is 2" =~ m/$REint/g
        __END__
        1
        1
        2

Re: perl equiv regex for C strtol
by Marshall (Canon) on May 01, 2012 at 02:14 UTC
    Well it is amazing, but check out Perl C Lib. Instead of strtol(s, &p, n) there is Strtol(s, &p, n). There are usually better regex ways to do this in Perl, but a lot of these C functions are available.

    If you care to explain more what you are doing with what appear to be census records, I am sure more suggestions about how to organize and access the data will be forthcoming.

    BTW: the Perl "mantra" for such situations is: "Use split when you know what you want to throw away. Use regex when you know what you want to keep". There are exceptions, but that is a pretty good general rule about when to use split vs regex.
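A rough illustration of that rule on the line from the question. The split pattern and the names `@labels` and `@nums` are assumptions made for this sketch, not something proposed in the thread:

```perl
use strict;
use warnings;

my $line = '[(58)(4)] Federal Census [(60)(3)] County/Parish, (File 1 of X)';

# split: describe what to throw away (the bracketed groups and the
# whitespace around them). The leading empty field is filtered out.
my @labels = grep { length } split /\s*\[[^\]]*\]\s*/, $line;

# regex: describe what to keep (digits wrapped in parentheses).
my @nums = $line =~ /\((\d+)\)/g;

print "label: $_\n" for @labels;
print "nums:  @nums\n";
```

Here split leaves the text between the bracket groups, while the match collects the numbers inside them.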

      I do believe the OP is wanting to program in perl, not c. perlclib - Internal replacements for standard C library functions
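For calling strtol from Perl code itself, the core POSIX module also exports a strtol. The thread does not mention it, so treat this as a side sketch; the sample string and variable names are made up for illustration:

```perl
use strict;
use warnings;
use POSIX qw(strtol);

my $text = '60)(3)]';

# In list context POSIX::strtol returns the parsed number and the
# count of characters it left unparsed (0 means everything parsed).
my ($num, $unparsed) = strtol($text, 10);

# Recover the unparsed tail, the rough analogue of C's end pointer.
my $rest = substr($text, length($text) - $unparsed);

print "number=$num rest=$rest\n";
```

For input like the census line, the regex answers above are still the shorter route, since strtol stops at the first non-numeric character.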

        I asked about the intent and the application. Let's see what happens.
Re: perl equiv regex for C strtol
by locked_user sundialsvc4 (Abbot) on May 01, 2012 at 02:06 UTC

    That’s a really good response (fav’d), because it focuses on what is wanted instead of what surrounds it (as, I alas must confess, my solution probably would have done).   The /g modifier works really well here, because with it you can match multiple times in the same string.

    From perldoc perlre, on the g and c modifiers: "Global matching, and keep the Current position after failed matching. Unlike i, m, s and x, these two flags affect the way the regex is used rather than the regex itself. See "Using regular expressions in Perl" in perlretut for further explanation of the g and c modifiers."
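A small sketch of what those two flags do in scalar context, using the line from the question. The variable names and the second sample string are illustrative only:

```perl
use strict;
use warnings;

my $line = '[(58)(4)] Federal Census [(60)(3)] County/Parish, (File 1 of X)';

# Scalar-context /g: each match resumes where the previous one ended.
# pos() reports that offset, a little like strtol's end pointer.
my @found;
while ($line =~ /\((\d+)\)/g) {
    push @found, [ $1, pos($line) ];
    print "got $1, next search starts at offset ", pos($line), "\n";
}

# /c: a failed match normally resets pos(); with /c it is kept.
my $s = 'abc123';
my $first  = $s =~ /[a-z]+/g;     # matches "abc", pos($s) is now 3
my $second = $s =~ /[a-z]+/gc;    # fails from offset 3, pos($s) stays 3
print "pos after failed /gc match: ", pos($s), "\n";
```

This gives the "one at a time" behaviour asked for in the question: each pass through the loop yields the next number.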