paulbooker has asked for the wisdom of the Perl Monks concerning the following question:

As an exercise I am reducing the size of an svg file with a number of successful substitutions.

But removing trailing zeroes from decimal fractions is beating me.

s/(-?\d\.\d*)/$1 + 0/ge is giving the best result for me, but is also removing decimal points from urls in line 1 of the svg. So zeroes are nicely removed but the svg is broken in both cases.

Can anyone spot why? (Strawberry Perl on Win 7)

wdsmin.pl
#!/usr/bin/perl
use v5.22;
use strict;
use warnings;

open my $fh, '<', 'sample.svg' or die "Cannot read : $!\n";
open my $out_fh, '>', 'samplemin.svg' or die "Cannot write : $!\n";

while (<$fh>) {
    s/(-?\d\.\d*)/$1 + 0/ge;    # trailing zeroes on decimal fractions
    print $out_fh $_;
}
close $out_fh;
<>;
sample.svg
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="148.00mm" height="105.00mm" viewBox="0 0 56.1468 39.8339">
<line transform="translate(0.0000, 6.6544)" stroke-width="0.2000" stroke="#000" x1="20.0390" y1="-0.0000" x2="56.1077" y2="-0.0000"/>
</svg>

s/(-?\d\.\d*)/$1 + 0/ge removes the "." between w3 and org:
<svg xmlns="http://www.w3org/2000/svg" xmlns:xlink="http://www.w3org/1999/xlink"

s/(-?\d*\.\d*)/$1 + 0/ge also replaces the "." between www and w3 with 0:
<svg xmlns="http://www0w3org/2000/svg" xmlns:xlink="http://www0w3org/1999/xlink"
plus a "use warnings" message about "." not being numeric.

I could repair the svg header with another line of Perl, but I just want to understand what's going on here.

Re: regex for trailing zeroes
by Eily (Monsignor) on Feb 23, 2016 at 12:49 UTC

    \d* allows 0 or more digits. So "." has 0 digits on both sides, which is why it matches with your third regex. If you want one digit or more you can use + instead (see perlre): s/(-?\d+\.\d+)/$1 + 0/ge;
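
A minimal sketch of the point above: it lists what each pattern captures in the namespace URL from sample.svg. The helper name matches_in is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return every substring of $text that $pattern matches.
sub matches_in {
    my ($pattern, $text) = @_;
    my @found = $text =~ /($pattern)/g;
    return @found;
}

my $url = 'http://www.w3.org/2000/svg';

my @patterns = (
    qr/-?\d\.\d*/,     # one digit, a dot, zero or more digits
    qr/-?\d*\.\d*/,    # zero or more digits on both sides of the dot
    qr/-?\d+\.\d+/,    # at least one digit on both sides of the dot
);

for my $pattern (@patterns) {
    my @found = matches_in($pattern, $url);
    my $shown = @found ? join(' ', map { "'$_'" } @found) : '(no match)';
    print "$pattern => $shown\n";
}
```

The first pattern captures '3.', the second captures '.' and '3.', and the third captures nothing in the URL. Adding 0 turns '3.' into 3 and '.' into 0, which is how www.w3.org ends up as www0w3org.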

    To avoid converting just any number, you could use some of the context the number is in, for example only allowing the regex to match if it is not followed by a letter (\w also covers digits and the underscore): s/(-?\d+\.\d+)(?!\w)/$1 + 0/ge; (?!reg) is a negative look-ahead assertion: it looks at what comes next to check that it does not match, but backtracks as soon as the check is done, so what was just checked is not included in the match.
    s/(-?\d+\.\d+)(?=[)"])/$1 + 0/ge; does a similar thing the other way around: Perl only allows a match that is followed by ) or ". This is a positive look-ahead assertion (once again, Perl checks the rest of the string but does not include it in the match).
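
As a rough comparison of the three variants, the sketch below runs each one on a shortened line made up from attributes in sample.svg. The helper names are hypothetical, and each substitution is written with a capture group so that $1 is defined in the replacement.

```perl
use strict;
use warnings;

# Hypothetical helpers, one per variant; each works on a copy of the input.

# At least one digit on both sides of the dot.
sub strip_plain {
    my ($text) = @_;
    $text =~ s/(-?\d+\.\d+)/$1 + 0/ge;
    return $text;
}

# Negative look-ahead: the number must not be followed by a word character.
sub strip_not_before_word {
    my ($text) = @_;
    $text =~ s/(-?\d+\.\d+)(?!\w)/$1 + 0/ge;
    return $text;
}

# Positive look-ahead: the number must be followed by ) or ".
sub strip_before_close {
    my ($text) = @_;
    $text =~ s/(-?\d+\.\d+)(?=[)"])/$1 + 0/ge;
    return $text;
}

my $line = 'xmlns="http://www.w3.org/2000/svg" width="148.00mm" x1="20.0390"';
print strip_plain($line), "\n";
print strip_not_before_word($line), "\n";
print strip_before_close($line), "\n";
```

All three leave the URL intact and shorten x1 to 20.039. The two look-ahead variants are more conservative: they leave 148.00mm untouched because the number is followed by its unit, whereas the plain variant reduces it to 148mm.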

    Edit: s/only only/only allow/

      Thank you, Eily and choroba, that works for me

      Important lesson learnt!

Re: regex for trailing zeroes
by choroba (Cardinal) on Feb 23, 2016 at 12:49 UTC
    It seems just replacing * with + does the trick. There's always at least one digit after the dot, right? Just don't use IP addresses in the links.
    s/(-?\d\.\d+)/$1 + 0/ge;
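
A small sketch of the IP address caveat, assuming a link that points at a numeric host. The helper name strip_single_digit and the address 192.168.0.10 are made up for the illustration and do not come from sample.svg.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution shown above.
sub strip_single_digit {
    my ($text) = @_;
    $text =~ s/(-?\d\.\d+)/$1 + 0/ge;
    return $text;
}

# Ordinary coordinates: only the trailing zero is dropped.
print strip_single_digit('x1="20.0390" x2="56.1077"'), "\n";

# A numeric host: "0.10" is treated as a decimal fraction and becomes 0.1.
print strip_single_digit('xlink:href="http://192.168.0.10/logo.png"'), "\n";
```

The coordinates come out as 20.039 and 56.1077, but the host is rewritten to 192.168.0.1, so the link would silently point somewhere else.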

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

      Domain names are allowed to start with a digit. For example 3m.com