uv2007 has asked for the wisdom of the Perl Monks concerning the following question:

Hi. I'd like to know how to convert a percentage within a line into a number, so that (for example) this line:

ENSP00000233379 1058 30 1206 1298 96.1% 13 + 96524483 96533474 8992

will become:

ENSP00000233379 1058 30 1206 1298 0.961 13 + 96524483 96533474 8992

Thanks.

Replies are listed 'Best First'.
Re: Converting percentage into number
by davorg (Chancellor) on Jan 10, 2007 at 11:45 UTC
    #!/usr/bin/perl
    use strict;
    use warnings;

    while (<DATA>) {
        s|(\d+\.\d+)%|$1/100|eg;
        print;
    }

    __DATA__
    ENSP00000233379 1058 30 1206 1298 96.1% 13 + 96524483 96533474 8992

    Depending on your data, you may need to adjust the regex that matches the percentages.
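
As one illustration of such an adjustment (not part of the reply above): a minimal sketch, assuming the percentages may also appear without a decimal part, such as 96%. The sub name pct_to_fraction is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every "NN%" or "NN.N%" in a line as a fraction.
sub pct_to_fraction {
    my ($line) = @_;
    # (?:\.\d+)? makes the decimal part optional, so "96%" matches as well as "96.1%".
    $line =~ s{(\d+(?:\.\d+)?)%}{$1 / 100}eg;
    return $line;
}

print pct_to_fraction("ENSP00000233379 1058 30 1206 1298 96.1% 13 + 96524483 96533474 8992\n");
print pct_to_fraction("ENSP00000233379 1058 30 1206 1298 96% 13 + 96524483 96533474 8992\n");
```

Negative values, exponents, or a space before the % sign would need a different pattern again; this sketch does not attempt them.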

    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      I'm all for using regular expression delimiters other than the slash (/), especially when matching paths. I do think the pipe symbol (|) is potentially confusing, though: at first glance, my eye sees the alternation metacharacter, not a delimiter. Perhaps it's just me.

      Cheers,

      JohnGG

        It's just you. :-p

        I moderately often use alternative delimiters, and when I do I very often choose the pipe symbol - it stands out nicely in the line noise that regexen generally devolve into (even when using /x). Alternatively I use ! for similar reasons, but generally only if I anticipate requiring alternation in the regex.

        And no, I've not been bitten during maintenance by that choice. ;)
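
For anyone weighing the delimiter choices discussed here, a small sketch of the same substitution written with |, ! and paired braces. The variable names are made up for illustration; the three forms are meant to behave identically, so the choice is purely about readability.

```perl
use strict;
use warnings;

my $input = "1298 96.1% 13";

# Same substitution, three delimiter styles; each works on its own copy of $input.
(my $with_pipe  = $input) =~ s|(\d+\.\d+)%|$1/100|e;
(my $with_bang  = $input) =~ s!(\d+\.\d+)%!$1/100!e;
(my $with_brace = $input) =~ s{(\d+\.\d+)%}{$1/100}e;

print "$with_pipe\n$with_bang\n$with_brace\n";
```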


        DWIM is Perl's answer to Gödel
Re: Converting percentage into number
by l.frankline (Hermit) on Jan 10, 2007 at 11:53 UTC

    Hi,

    Hope this code will help you:

    while (<DATA>) {
        if ($_ =~ m#\s+(\d+\.\d+)\%(.+)$#) {
            print $` . " " . $1 / 100 . $2 . "\n";
        }
    }

    __DATA__
    ENSP00000233379 1058 30 1206 1298 96.1% 13 + 96524483 96533474 8992
    ENSP00000233379 1058 30 1206 1298 64.3% 13 + 96524483 96533474 8992
    ENSP00000233379 1058 30 1206 1298 23.2% 13 + 96524483 96533474 8992

    Output:

    ENSP00000233379 1058 30 1206 1298 0.961 13 + 96524483 96533474 8992
    ENSP00000233379 1058 30 1206 1298 0.643 13 + 96524483 96533474 8992
    ENSP00000233379 1058 30 1206 1298 0.232 13 + 96524483 96533474 8992

    regards,
    Franklin

    Don't put off till tomorrow, what you can do today.

      Simplifying slightly:

      while (<DATA>) {
          if (/(\d+\.\d+)%/) {
              print $` . $1 / 100 . $';
          }
      }

      But I still prefer my s/// solution, as using $` and its friends isn't recommended. From perlvar:

      "The use of this variable anywhere in a program imposes a considerable performance penalty on all regular expression matches."
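
If the match-and-print form is still wanted, one way to sidestep $` and $' is to capture the surrounding text explicitly. This is only a sketch: the sub name convert_first_pct is hypothetical, and unlike the loop above it returns non-matching lines unchanged instead of skipping them.

```perl
use strict;
use warnings;

# Hypothetical helper: converts the first "NN.N%" in a line, using capture
# groups for the text before and after it rather than $` and $'.
sub convert_first_pct {
    my ($line) = @_;
    # (.*?) is non-greedy, so the first percentage in the line is the one matched.
    if ($line =~ /\A(.*?)(\d+\.\d+)%(.*)\z/s) {
        return $1 . ($2 / 100) . $3;
    }
    return $line;    # assumption: lines without a percentage pass through as-is
}

print convert_first_pct("ENSP00000233379 1058 30 1206 1298 96.1% 13 + 96524483 96533474 8992\n");
```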
      --
      <http://dave.org.uk>

      "The first rule of Perl club is you do not talk about Perl club."
      -- Chip Salzenberg

Re: Converting percentage into number
by Melly (Chaplain) on Jan 10, 2007 at 12:07 UTC

    Assuming that your percent-values are always preceded by whitespace and are always valid numbers, the following should cope with values with and without decimal places:

    s/(\S+)%/$1*0.01/eg  # e = evaluate right side as an expression;
                         # g = global, i.e. change all percent-values found, not just the first
    map{$a=1-$_/10;map{$d=$a;$e=$b=$_/20-2;map{($d,$e)=(2*$d*$e+$a,$e**2 -$d**2+$b);$c=$d**2+$e**2>4?$d=8:_}1..50;print$c}0..59;print$/}0..20
    Tom Melly, pm@tomandlu.co.uk

      If you're going to worry about with and without decimal places (++ for that), try:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Regexp::Common;

      while (<DATA>) {
          s|($RE{num}{decimal})%|$1 * 0.01|eg;
          print;
      }

      __DATA__
      ENSP00000233379 1058 30 1206 1298 96.1% 13 + 96524483 96533474 8992
      ENSP00000233379 1058 30 1206 1298 96% 13 + 96524483 96533474 8992

      Ok, so you have to go install Regexp::Common. Getting this right every time is well worth that minor expense, IMO. Otherwise, if somehow the data gets corrupted so there is, say, "S%" in there somewhere, your code will generate an error. To take care of "S6%", just add a \b:
      s|\b($RE{num}{decimal})%|$1 * 0.01|eg;
      It will leave the corruption alone this way. Detecting the corruption is as simple as checking if there are remaining %'s after the substitution:
      if (/%/) {
          print STDERR "Input has been corrupted!";
          # die ?
      }
      Ok, maybe I'm going beyond the original post's scope here... ;-)
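
To make the \b guard and the leftover-% check concrete, here is a sketch that does not need Regexp::Common. It swaps in a hand-written number pattern, which is an assumption and probably narrower than what $RE{num}{decimal} accepts. The sub name convert_and_check is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper. Returns the converted line plus a flag that is true
# when a '%' survives the substitution, i.e. something non-numeric preceded it.
sub convert_and_check {
    my ($line) = @_;
    # \b stops "S6%" from being half-converted: there is no word boundary
    # between "S" and "6", so the pattern cannot start at the digit.
    $line =~ s{\b(\d+(?:\.\d+)?)%}{$1 * 0.01}eg;
    my $corrupted = ($line =~ /%/) ? 1 : 0;
    return ($line, $corrupted);
}

for my $in ("1298 96.1% 13", "1298 96% 13", "1298 S6% 13") {
    my ($out, $bad) = convert_and_check($in);
    print $bad ? "corrupted: $out\n" : "ok: $out\n";
}
```

Whether to warn, die, or just log on a corrupted line is left to the caller, which is why the sketch returns a flag instead of printing to STDERR itself.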