symgryph has asked for the wisdom of the Perl Monks concerning the following question:

I am having a devil of a time removing zero padding from the following type of output, generated by a brain-dead Excel program.

010.231.000.049,41145,010.231.000.049,1363,CDU01V43
010.231.000.050,20,010.116.223.024,2803,ZVC629

I tried various regexes, but they don't seem to hit all the zeroes when I process the file line by line. Here is what I used:

while (<STDIN>) {
    chomp;
    s/^0//;
    s/\.0([^\.])/.$1/g;
    print $_ . "\n";
}
"Two Wheels good, Four wheels bad."

Replies are listed 'Best First'.
Re: Remove zero padding from excel mangled Ip addresses
by druthb (Beadle) on Mar 20, 2012 at 22:39 UTC

    I'm not afraid of regexes, per se, but I frequently have to write code that someone who isn't as decent at them as I am must read and maintain. I don't doubt for a moment that this is do-able with regex, but if it takes me an hour to figure it out, it'll take my teammates two to sort out what it's doing.

    In cases like that, I'd use split and sprintf to tidy those numbers up:

    my (@octets) = split /\./, $input_string;
    my $output_string = sprintf "%d.%d.%d.%d", $octets[0], $octets[1], $octets[2], $octets[3];

    Crystal-clear, utterly unambiguous, and it works. It's just my style, and TIMTOWTDI.
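
    A minimal sketch of how that idea might be applied to a whole line of the sample data. The helper name unpad_ip is hypothetical, and it assumes the addresses sit in the first and third comma-separated fields, as they do in the question's sample:

```perl
use strict;
use warnings;

# Hypothetical helper: strip zero padding from one dotted-quad string
# by reformatting each octet as a plain decimal integer.
sub unpad_ip {
    my ($ip) = @_;
    my @octets = split /\./, $ip;
    return sprintf "%d.%d.%d.%d", @octets[0 .. 3];
}

# Sample line from the question; fields 0 and 2 are assumed to be the addresses.
my $line   = '010.231.000.049,41145,010.231.000.049,1363,CDU01V43';
my @fields = split /,/, $line;
$_ = unpad_ip($_) for @fields[0, 2];
print join(',', @fields), "\n";
```

    Because only the two address fields are touched, an identifier such as CDU01V43 in another column is left alone.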

    D Ruth Bavousett
Re: Remove zero padding from excel mangled Ip addresses
by GrandFather (Saint) on Mar 20, 2012 at 23:12 UTC

    Think about when you need to remove zeros, then write a regex that does that. Be warned, it's trickier than you think and needs regex knowledge somewhat beyond the basic level.

    The requirement comes down to this: remove all leading 0 digits, except where there is no following digit. Consider:

    use strict;
    use warnings;

    while (<DATA>) {
        s/(?<!\d)0+(?=\d)//g;
        print;
    }

    __DATA__
    010.231.000.049,41145,010.231.000.049,1363,CDU01V43
    010.231.000.050,20,010.116.223.024,2803,ZVC629

    Prints:

    10.231.0.49,41145,10.231.0.49,1363,CDU1V43
    10.231.0.50,20,10.116.223.24,2803,ZVC629

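    The (?<!...) and (?=...) bits are a negative lookbehind and a positive lookahead assertion. They check what sits next to the match without consuming it. See the perlre documentation for more info on what they do (look for "Look-Around Assertions").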
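
    To see what each assertion contributes, here is a small sketch that runs the same substitution over a few hand-picked strings. The sub name strip_zeros and the sample values are made up for illustration:

```perl
use strict;
use warnings;

# (?<!\d)  negative lookbehind: the zeros must not follow a digit
# 0+       the run of zeros to delete
# (?=\d)   positive lookahead: a digit must follow, so the last digit survives
sub strip_zeros {
    my ($text) = @_;
    $text =~ s/(?<!\d)0+(?=\d)//g;
    return $text;
}

for my $sample ('010', '000', '100', '20', 'CDU01V43') {
    printf "%-8s -> %s\n", $sample, strip_zeros($sample);
}
```

    The lookahead is what keeps "000" from vanishing entirely (it becomes "0"), and the lookbehind is what protects the zeros in "100" and "20".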

    True laziness is hard work

      Just a nitpick, cuz you did tell him to play with it, but notice that CDU01V43 got changed to CDU1V43... that is most likely not the expected behavior. The above regex removes any run of 0's that is not preceded by a digit and is followed by a digit, which means 0's inside words get removed too, as long as a letter precedes them and a digit follows.

      So you might want to split on commas and detect an IP address (dotted quad) prior to running the above regex (which works great on all IPs I tested).
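
      A rough sketch of that split-and-detect idea. The $dotted_quad pattern and the clean_line name are assumptions made for illustration; the pattern only checks the shape (four groups of one to three digits) and does not verify that each octet is in the 0-255 range:

```perl
use strict;
use warnings;

# Assumed pattern for a zero-padded dotted quad: four groups of 1-3 digits.
my $dotted_quad = qr/\A\d{1,3}(?:\.\d{1,3}){3}\z/;

sub clean_line {
    my ($line) = @_;
    my @fields = split /,/, $line, -1;    # -1 keeps trailing empty fields
    for my $field (@fields) {
        next unless $field =~ $dotted_quad;    # leave non-IP fields untouched
        $field =~ s/(?<!\d)0+(?=\d)//g;
    }
    return join ',', @fields;
}

print clean_line('010.231.000.049,41145,010.231.000.049,1363,CDU01V43'), "\n";
```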

        Or just change the \d in the lookbehind to a \w, so that zeros preceded by a letter are left alone: s/(?<!\w)0+(?=\d)//g;
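
        A quick side-by-side sketch of the two lookbehinds on the first sample line (the variable names are made up for illustration). Note that the \w version would still strip leading zeros from any purely numeric field, not only from IP addresses:

```perl
use strict;
use warnings;

my $line = '010.231.000.049,41145,010.231.000.049,1363,CDU01V43';

# Copy the line, then substitute on the copy.
(my $digit_guard = $line) =~ s/(?<!\d)0+(?=\d)//g;    # original \d lookbehind
(my $word_guard  = $line) =~ s/(?<!\w)0+(?=\d)//g;    # \w lookbehind

print "$digit_guard\n$word_guard\n";
```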

        True laziness is hard work
Re: Remove zero padding from excel mangled Ip addresses
by morgon (Priest) on Mar 20, 2012 at 20:32 UTC
    perl -pe 's/\b0*(?=\d)//g' <your input-file>
Re: Remove zero padding from excel mangled Ip addresses
by salva (Canon) on Mar 21, 2012 at 14:27 UTC
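    The regexp you want to use is s/\b0+(?=\d)//g:

    while (<>) {
        chomp;    # otherwise the last field keeps its newline and print adds a second one
        my @csv = split /,/;
        s/\b0+(?=\d)//g for @csv[0,2];    # only the two IP address columns
        print join(',', @csv), "\n";
    }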
      What would be the full code to paste into the Excel module to get this to work? I am new to regexes and would like to learn them. Usually if I see the code I can reverse engineer it and understand it. Thanks, JRich
Re: Remove zero padding from excel mangled Ip addresses
by aaron_baugher (Curate) on Mar 21, 2012 at 00:54 UTC

    Sounds like you want to remove leading zeroes, which appear to be defined as: one or more zeros that appear at the beginning of the line or following a dot or comma, and preceding a digit. This probably won't be the shortest or most elegant method, but it uses simple concepts and doesn't require any recent regex features:

    s/(^|[.,])0+(\d)/$1$2/g;
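
    A small sketch of that substitution wrapped in a sub and run against the second sample line. The name strip_leading_zeros is made up for illustration:

```perl
use strict;
use warnings;

sub strip_leading_zeros {
    my ($line) = @_;
    # $1 is the start of the line or a separator, $2 is the digit after the zeros;
    # putting both back drops only the zeros in between.
    $line =~ s/(^|[.,])0+(\d)/$1$2/g;
    return $line;
}

print strip_leading_zeros('010.231.000.050,20,010.116.223.024,2803,ZVC629'), "\n";
```

    Since the zeros must directly follow a dot, a comma, or the start of the line, an identifier such as CDU01V43 passes through unchanged.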

    Aaron B.
    My Woefully Neglected Blog, where I occasionally mention Perl.

      What "recent regex features" are you avoiding? If you mean the look around anchors, they have been there at least since Perl 5.8.8 (http://perldoc.perl.org/5.8.8/perlre.html) which was released before some Perl monks were born.

      True laziness is hard work

        I wasn't avoiding any in particular; just acknowledging that there are probably newer features which would make a simpler regex than my "capture the characters on each side to get rid of what's between them" method. But that's the method that comes to mind most easily for me, for whatever reason, so I thought a newbie might get something out of it as another way to do it.

        Aaron B.
        My Woefully Neglected Blog, where I occasionally mention Perl.