Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi All-

I have a couple of input files, each with a column of numbers that I would like to normalize to the range 1 to 3. For example, one file has values ranging from 0.002 to 0.080, another from 0.4 to 0.8, and so on. They all have different ranges, but I would like to normalize them to the same scale.

Is there a way to write a normalization script (or is there an existing module) that takes an input file and a range (1 to 3) as arguments, finds the minimum and maximum of the numbers in the file, then normalizes the values in the file to the specified range? Thanks for your help!

Re: number normalization
by japhy (Canon) on Aug 09, 2003 at 20:51 UTC
    Normalization is just the proportional mapping of one range of numbers onto another, right? So if you have a set of numbers between $l1 (low) and $h1 (high), and want to normalize them to between $l2 (low) and $h2 (high), wouldn't you just do:
    my @normalized = map { $l2 + ($_ - $l1) * ($h2 - $l2) / ($h1 - $l1) } @data;
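
The one-liner above can be wrapped in a small subroutine. This is only a sketch: the name rescale and the sample values are illustrative assumptions, not part of the original reply, and the guard against an empty input range is an addition.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is not from the thread): linearly rescale
# values from the range [$l1, $h1] onto [$l2, $h2] with the formula above.
sub rescale {
    my ($l1, $h1, $l2, $h2, @data) = @_;
    # Added guard: the formula divides by ($h1 - $l1).
    die "input range is empty (low equals high)\n" if $h1 == $l1;
    return map { $l2 + ($_ - $l1) * ($h2 - $l2) / ($h1 - $l1) } @data;
}

# Sample values chosen to match the 0.002 .. 0.080 range from the question.
my @normalized = rescale(0.002, 0.080, 1, 3, 0.002, 0.041, 0.080);
print "@normalized\n";
```

The lowest input lands on $l2 and the highest on $h2; everything in between is placed proportionally.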

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: number normalization
by CountZero (Bishop) on Aug 09, 2003 at 20:56 UTC

    Asking the question is answering it!

    Use the following algorithm:

    1. Call the minimum of your normalized range norm_min and the maximum of your normalized range norm_max.
    2. Go through your list of figures and find the minimum (mini) and maximum (maxi) of the list.
    3. Go again through the list, subtract mini from your number, multiply by (norm_max - norm_min) / (maxi - mini) and finally add norm_min.
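    $norm_min = 1;
    $norm_max = 3;

    # Read every figure, then seed the minimum and maximum from the first one.
    chomp(@figures = <DATA>);
    $mini = $maxi = $figures[0];

    # First pass: find the minimum and maximum of the list.
    foreach (@figures) {
        $mini = $_ if $_ < $mini;
        $maxi = $_ if $_ > $maxi;
    }
    print "$mini $maxi\n";

    # Second pass: map each figure onto norm_min .. norm_max.
    foreach (@figures) {
        my $result = ($_ - $mini) * ($norm_max - $norm_min) / ($maxi - $mini) + $norm_min;
        print "$_: $result\n";
    }
    __DATA__
    10
    3
    50
    -8
    100
    67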

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: number normalization
by Zaxo (Archbishop) on Aug 09, 2003 at 20:58 UTC

    You want to produce a linear function mapping the minimum value in a list to 1 and the maximum in the list to 3. List::Util provides min() and max() functions. The coefficient of the linear term will be $m = (3 - 1)/(max(@list)-min(@list)), and the constant term will be $c = 1 - $m * min(@list). The map builtin will be handy to then apply that to the file data in @list.
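
A minimal sketch of that description, assuming the data is already in @list (the sample values are borrowed from CountZero's reply) and that the maximum and minimum differ:

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Sample data borrowed from CountZero's reply; in practice @list would be
# read from the input file.
my @list = (10, 3, 50, -8, 100, 67);

# Assumes max(@list) != min(@list); otherwise this divides by zero.
my $m = (3 - 1) / (max(@list) - min(@list));   # coefficient of the linear term
my $c = 1 - $m * min(@list);                   # constant term

# Apply the linear function m*x + c to every value.
my @normalized = map { $m * $_ + $c } @list;
printf "%.4f\n", $_ for @normalized;
```

Computing $m and $c once keeps the map block down to a single multiply and add per value.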

    After Compline,
    Zaxo

      Can anyone provide a sample non-linear equation (log, exponential, or anything else you can think of)? I would like to emphasize the larger numbers relative to the smaller ones.
        perl -e ' for( 0.03, 0.4, 5, 10 ) { print int(10**$_), "\n" } '
        1
        2
        100000
        10000000000
Re: number normalization
by bobn (Chaplain) on Aug 09, 2003 at 20:37 UTC

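    #!/usr/bin/perl -w
    # tested A LITTLE
    my ($file, $low, $hi) = @ARGV;

    # Use a separate lexical filehandle rather than reusing $file for both
    # the filename and the handle.
    open(my $fh, '<', $file) or die "open of '$file' failed: $!";
    chomp(@vals = <$fh>);
    close($fh);
    print "@vals\n";

    # Sort numerically to find the smallest and largest input values.
    @inrange = sort { $a <=> $b } (@vals);
    $inmin = $inrange[0];
    $inmax = $inrange[-1];

    # Scale factor from the input range to the requested output range.
    $delta = ( $hi - $low )/($inmax - $inmin);
    @normal = map { $delta * ( $_ - $inmin) + $low } @vals;
    print "@normal\n";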

    --Bob Niederman, http://bob-n.com
Re: number normalization
by Anonymous Monk on Aug 10, 2003 at 13:33 UTC
    (sorry if this shows up twice, i didn't see it the first time i posted it).

    How about normalizing with a non-linear equation, analogous to the linear one below? Here $hi and $lo are the highest and lowest values in the dataset, and $HIVAL and $LOVAL are the bounds of the output range for the normalized dataset.

    $num = $LOVAL + ($output{$key}-$lo)*($HIVAL-$LOVAL)/($hi-$lo);

    thanks for all your input!!

      You can do that by composing my answer with japhy's:

      $l2 = 5; $h2 = 15;
      @data = ( 0.03, 0.1, 0.15, 0.5, 0.7 );

      # exponential normalization
      $l1 = 0.03; $h1 = 0.7;
      ($l1,$h1) = map { 10**$_ } ($l1,$h1);
      @normalized = map { $l2 + (10**$_ - $l1) * ($h2 - $l2) / ($h1 - $l1) } @data;
      print " @normalized \n";
      # 5 5.47560740136863 5.86545098040508 10.306017857952 15

      # logarithmic normalization
      $l1 = 0.03; $h1 = 0.7;
      ($l1,$h1) = map { log($_) } ($l1,$h1);
      @normalized = map { $l2 + (log($_) - $l1) * ($h2 - $l2) / ($h1 - $l1) } @data;
      print " @normalized \n";
      # 5 8.82227791363971 10.1095165638026 13.9317944774423 15
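
The composition in this reply (transform first, then rescale linearly) can be generalized by passing the transform in as a code reference. This is a sketch under assumptions: the name normalize_with is hypothetical and not from the thread, and the low and high bounds are taken from the transformed data with List::Util rather than set by hand, which gives the same bounds whenever the transform is increasing.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper (not from the thread): apply $transform to every value,
# then rescale the transformed values linearly onto [$lo_out, $hi_out].
sub normalize_with {
    my ($transform, $lo_out, $hi_out, @data) = @_;
    my @t = map { $transform->($_) } @data;
    my ($lo, $hi) = (min(@t), max(@t));
    # Added guard: the rescaling divides by ($hi - $lo).
    die "all transformed values are equal\n" if $hi == $lo;
    return map { $lo_out + ($_ - $lo) * ($hi_out - $lo_out) / ($hi - $lo) } @t;
}

my @data = (0.03, 0.1, 0.15, 0.5, 0.7);

# Exponential transform: spreads out the larger values.
my @exp_norm = normalize_with(sub { 10 ** $_[0] }, 5, 15, @data);

# Logarithmic transform: spreads out the smaller values (inputs must be > 0).
my @log_norm = normalize_with(sub { log $_[0] }, 5, 15, @data);

print "@exp_norm\n@log_norm\n";
```

Any other increasing function, such as sqrt or a power, can be dropped in the same way.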