Remove zero padding from Excel-mangled IP addresses

symgryph has asked for the wisdom of the Perl Monks concerning the following question:

Re: Remove zero padding from excel mangled Ip addresses by druthb (Beadle) on Mar 20, 2012 at 22:39 UTC
I'm not afraid of regexes, per se, but I frequently have to write code that someone less practiced with them than I am must read and maintain. I don't doubt for a moment that this is doable with a regex, but if it takes me an hour to figure it out, it'll take my teammates two to sort out what it's doing. In cases like that, I'd use `split` and `sprintf` to tidy those numbers up:

```perl
# Split the address into its four octets; %d then drops the leading zeros.
my @octets = split /\./, $input_string;
my $output_string = sprintf '%d.%d.%d.%d', @octets;
```

Crystal-clear, utterly unambiguous, and it works. It's just my style, and TIMTOWTDI.

D Ruth Bavousett
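A minimal sketch of the split-and-`sprintf` idea wrapped in a subroutine, with a guard so that only strings shaped like four dot-separated groups of one to three digits are touched. The name `unpad_ip` is hypothetical, and the guard does not check that each octet is at most 255.

```perl
use strict;
use warnings;

# Hypothetical helper: strip zero padding from one dotted-quad string.
# Returns undef (scalar context) unless the input is exactly four
# dot-separated groups of 1-3 digits, so non-IP text never reaches sprintf.
# Note: it does not check that each octet is <= 255.
sub unpad_ip {
    my ($input_string) = @_;
    return unless $input_string =~ /\A\d{1,3}(?:\.\d{1,3}){3}\z/;
    my @octets = split /\./, $input_string;
    # %d numifies each octet as decimal, so "049" becomes 49 (not octal).
    return sprintf '%d.%d.%d.%d', @octets;
}

print unpad_ip('010.231.000.049'), "\n";    # prints 10.231.0.49
```

Rejecting non-matching input up front avoids `sprintf` quietly turning something like `CDU01V43` into `0` (with an "isn't numeric" warning).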
Re: Remove zero padding from excel mangled Ip addresses by GrandFather (Saint) on Mar 20, 2012 at 23:12 UTC
Think about when you need to remove zeros, then write a regex that does that. Be warned: it's trickier than you think and needs somewhat more than basic regex knowledge. The requirement comes down to: remove all leading 0 digits, except a 0 that has no digit following it. Consider:

```perl
use strict;
use warnings;

while (<DATA>) {
    # Drop runs of 0s that are not preceded by a digit but are followed by one.
    s/(?<!\d)0+(?=\d)//g;
    print;
}

__DATA__
010.231.000.049,41145,010.231.000.049,1363,CDU01V43
010.231.000.050,20,010.116.223.024,2803,ZVC629
```

Prints:

```
10.231.0.49,41145,10.231.0.49,1363,CDU1V43
10.231.0.50,20,10.116.223.24,2803,ZVC629
```

The `(?<!...)` and `(?=...)` bits are look-behind and look-ahead assertions. See the perlre documentation for more on what they do (look for "Look-Around Assertions").

True laziness is hard work
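To see what the look-behind and look-ahead actually do, this small sketch runs the same pattern over the first sample line and records each run of zeros it matches, with the offset where the match starts. It shows `000` losing only two zeros, because the look-ahead needs a digit after the run, and it also shows the 0 inside `CDU01V43` being matched.

```perl
use strict;
use warnings;

my $line = '010.231.000.049,41145,010.231.000.049,1363,CDU01V43';

# Collect "zeros@offset" for every match; $-[0] is where the match starts.
my @removed;
while ($line =~ /(?<!\d)(0+)(?=\d)/g) {
    push @removed, sprintf '%s@%d', $1, $-[0];
}
print join(' ', @removed), "\n";
# prints: 0@0 00@8 0@12 0@22 00@30 0@34 0@46
```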
Re^2: Remove zero padding from excel mangled Ip addresses by raybies (Chaplain) on Mar 21, 2012 at 16:50 UTC
Just a nitpick, cuz you did tell him to play with it, but notice that CDU01V43 got changed to CDU1V43... that is most likely not the expected behavior. The above regex removes any run of 0s that is not preceded by a digit and is followed by one, which means 0s inside words will be removed as long as a digit follows them. So you might want to split on commas and detect an IP address (dotted quad) before running the above regex (which works great on all the IPs I tested).
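The split-then-detect approach can be sketched like this, assuming plain comma-separated lines with no quoted fields that contain commas. Only fields that look like dotted quads get the original regex; everything else (such as `CDU01V43`) is left untouched. `clean_line` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: strip leading zeros only from dotted-quad fields.
# Assumes simple CSV (no quoted fields containing commas).
sub clean_line {
    my ($line) = @_;
    my @fields = split /,/, $line, -1;    # -1 keeps trailing empty fields
    for my $field (@fields) {             # $field aliases each array element
        next unless $field =~ /\A\d{1,3}(?:\.\d{1,3}){3}\z/;
        $field =~ s/(?<!\d)0+(?=\d)//g;   # the original regex, IP fields only
    }
    return join ',', @fields;
}

print clean_line('010.231.000.049,41145,010.231.000.049,1363,CDU01V43'), "\n";
# prints: 10.231.0.49,41145,10.231.0.49,1363,CDU01V43
```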
Re^3: Remove zero padding from excel mangled Ip addresses by GrandFather (Saint) on Mar 21, 2012 at 22:42 UTC
Or just change the `\d` in the look-behind to a `\w`: `s/(?<!\w)0+(?=\d)//g;`. That way a 0 preceded by a letter, as in CDU01V43, is left alone.

True laziness is hard work
Re^4: Remove zero padding from excel mangled Ip addresses by raybies (Chaplain) on Mar 22, 2012 at 14:44 UTC
Re: Remove zero padding from excel mangled Ip addresses by morgon (Priest) on Mar 20, 2012 at 20:32 UTC
`perl -pe 's/\b0*(?=\d)//g' <your input-file>`
Re: Remove zero padding from excel mangled Ip addresses by salva (Canon) on Mar 21, 2012 at 14:27 UTC
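The regexp you want to use is `s/\b0+(?=\d)//g`, applied only to the IP columns (fields 0 and 2):

```perl
while (<>) {
    chomp;                            # otherwise the last field keeps its newline
    my @csv = split /,/, $_, -1;      # -1 keeps trailing empty fields
    s/\b0+(?=\d)//g for @csv[0, 2];   # strip zero padding in the IP columns only
    print join(',', @csv), "\n";
}
```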
Re^2: Remove zero padding from excel mangled Ip addresses by Anonymous Monk on Apr 03, 2012 at 13:48 UTC
What would be the full code I'd paste into the Excel module to get this to work? I am new to regex and would like to learn it. Usually, if I see the code, I can reverse-engineer it and understand it. Thanks, JRich
Re: Remove zero padding from excel mangled Ip addresses by aaron_baugher (Curate) on Mar 21, 2012 at 00:54 UTC
Sounds like you want to remove leading zeroes, which appear to be defined as one or more zeros that appear at the beginning of the line or after a dot or comma, and before a digit. This probably won't be the shortest or most elegant method, but it uses simple concepts and doesn't require any recent regex features:

```perl
# Capture the separator (or start of line) and the first significant digit,
# then put them back without the zeros in between.
s/(^|[.,])0+(\d)/$1$2/g;
```

Aaron B.
My Woefully Neglected Blog, where I occasionally mention Perl.
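For comparison, here is a sketch that runs each substitution suggested in this thread on the same sample line (the first line of GrandFather's `__DATA__`). It relies on the non-destructive `/r` modifier, which needs Perl 5.14 or later; the labels are only for display.

```perl
use 5.014;    # for the non-destructive s///r modifier
use strict;
use warnings;

my $sample = '010.231.000.049,41145,010.231.000.049,1363,CDU01V43';

# Each entry: a display label and a sub returning a cleaned copy of its argument.
my @variants = (
    [ 'GrandFather (?<!\d)' => sub { $_[0] =~ s/(?<!\d)0+(?=\d)//gr } ],
    [ 'GrandFather (?<!\w)' => sub { $_[0] =~ s/(?<!\w)0+(?=\d)//gr } ],
    [ 'morgon \b0*'         => sub { $_[0] =~ s/\b0*(?=\d)//gr } ],
    [ 'salva \b0+'          => sub { $_[0] =~ s/\b0+(?=\d)//gr } ],
    [ 'aaron capture'       => sub { $_[0] =~ s/(^|[.,])0+(\d)/$1$2/gr } ],
);

my %result;
for my $v (@variants) {
    my ($label, $clean) = @$v;
    $result{$label} = $clean->($sample);
    printf "%-20s %s\n", $label, $result{$label};
}
```

On this input all five produce the same IP fields, and every variant except the `(?<!\d)` one leaves `CDU01V43` alone. A `\b` in front of a `0` means the preceding character is not a word character, which is why the `\b` forms behave like the `(?<!\w)` one here. The `\b0*` form also makes harmless zero-length matches before every digit that starts a word; `0+` avoids those.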
Re^2: Remove zero padding from excel mangled Ip addresses by GrandFather (Saint) on Mar 21, 2012 at 01:54 UTC
What "recent regex features" are you avoiding? If you mean the look-around assertions, they have been there since at least Perl 5.8.8 (http://perldoc.perl.org/5.8.8/perlre.html), which was released before some Perl monks were born.

True laziness is hard work
Re^3: Remove zero padding from excel mangled Ip addresses by aaron_baugher (Curate) on Mar 21, 2012 at 04:26 UTC
I wasn't avoiding any in particular; just acknowledging that there are probably newer features which would make a simpler regex than my "capture the characters on each side to get rid of what's between them" method. But that's the method that comes to mind most easily for me, for whatever reason, so I thought a newbie might get something out of it as another way to do it.

Aaron B.
My Woefully Neglected Blog, where I occasionally mention Perl.