Here's another option:

    use strict;
    use warnings;
    use Text::ASCIITable;
    use open qw(:std :utf8);

    my %hash;
    my $tb = Text::ASCIITable->new();
    $tb->setCols( 'WordF1', 'WordF2', 'Difference' );

    while (<>) {
        next if $. < 3;    # skip the first two input lines

        # collect every "| word | value" pair on the line into a hash of arrays
        push @{ $hash{$1} }, $2 while /\|\s+(\w+)\s+\|\s+([.\d]+)/g;
    }

    for my $word ( keys %hash ) {
        if ( @{ $hash{$word} } == 2 ) {

            # two values found: replace the array with their difference
            $hash{$word} = $hash{$word}->[0] - $hash{$word}->[1];
        }
        else {
            delete $hash{$word};
        }
    }

    # rows in descending Difference
    for my $word ( sort { $hash{$b} <=> $hash{$a} } keys %hash ) {
        $tb->addRow( $word, $word, sprintf( '%0.05f', $hash{$word} ) );
    }

    print $tb;

Usage: perl script.pl inFile [>outFile]

The last, optional parameter directs output to a file.

Output on your dataset:

    .--------------------------------------.
    | WordF1     | WordF2     | Difference |
    +------------+------------+------------+
    | politici   | politici   |    0.01940 |
    | referendum | referendum |    0.01726 |
    | verità     | verità     |    0.01454 |
    | scandalo   | scandalo   |    0.00978 |
    | consenso   | consenso   |    0.00887 |
    | vergogna   | vergogna   |    0.00592 |
    '------------+------------+------------'

The script first builds a hash of arrays (HoA), pairing each word with its associated value(s). Next, it iterates through the hash: words that do not have exactly two values (i.e., words occurring on only one side) are deleted, and for the rest the array is replaced with the calculated difference (first value minus second). Lastly, it builds the table, sorting the rows by descending Difference, since your original table displayed words in descending percentage. Use $hash{$a} <=> $hash{$b} if you want the rows shown in ascending Difference.
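In case it helps to see those three steps in isolation, here is a minimal sketch using only core Perl (no Text::ASCIITable). The words and values in @pairs are made up for illustration; they are not taken from your data.

```perl
use strict;
use warnings;

# Hypothetical [ word, value ] pairs standing in for the parsed input.
my @pairs = (
    [ alpha => 0.05 ], [ beta => 0.02 ], [ gamma => 0.01 ],    # first set
    [ alpha => 0.03 ], [ beta => 0.04 ],                       # second set
);

# Step 1: hash of arrays, word => [ value, value, ... ]
my %hash;
push @{ $hash{ $_->[0] } }, $_->[1] for @pairs;

# Step 2: keep only words with exactly two values, replacing the
# array reference with first-minus-second; delete everything else.
for my $word ( keys %hash ) {
    if ( @{ $hash{$word} } == 2 ) {
        $hash{$word} = $hash{$word}[0] - $hash{$word}[1];
    }
    else {
        delete $hash{$word};
    }
}

# Step 3: sort the surviving words by descending difference.
my @sorted = sort { $hash{$b} <=> $hash{$a} } keys %hash;
printf "%-6s %8.5f\n", $_, $hash{$_} for @sorted;
```

Here gamma has only one value, so it is dropped; alpha (positive difference) sorts ahead of beta (negative difference).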

You said, "I have processed two text files..." I get the impression that each of the two files contains a corpus, and that processing them generated your original table (perhaps you gave a program a list of files to analyze), rather than the two files merely holding word/value pairs. Is this correct? If not, and the files do hold these word/value pairs, consider the file-based solutions already offered.

Hope this helps!

Edit: Below is a script which takes two files containing the two data sets you posted earlier. It's just slightly modified from the script above:

    use strict;
    use warnings;
    use Text::ASCIITable;
    use open qw(:std :utf8);

    my %hash;
    my $tb = Text::ASCIITable->new();
    $tb->setCols( 'WordF1', 'WordF2', 'Difference' );

    while (<>) {

        # first whitespace-separated field is the word, last is the value
        my ( $word, $val ) = (split)[ 0, -1 ];
        push @{ $hash{$word} }, $val;
    }

    for my $word ( keys %hash ) {
        if ( @{ $hash{$word} } == 2 ) {

            # value from inFile1 minus value from inFile2
            $hash{$word} = $hash{$word}->[0] - $hash{$word}->[1];
        }
        else {
            delete $hash{$word};
        }
    }

    # rows in descending Difference
    for my $word ( sort { $hash{$b} <=> $hash{$a} } keys %hash ) {
        $tb->addRow( $word, $word, sprintf( '%0.06f', $hash{$word} ) );
    }

    print $tb;

Usage: perl script.pl inFile1 inFile2 [>outFile]

Output on your datasets:

    .----------------------------------------------------.
    | WordF1            | WordF2            | Difference |
    +-------------------+-------------------+------------+
    | consensi          | consensi          |   0.000626 |
    | disonesti         | disonesti         |   0.000507 |
    | antidemocratico   | antidemocratico   |   0.000102 |
    | antidemocraticità | antidemocraticità |   0.000029 |
    | antidemocratica   | antidemocratica   |  -0.000014 |
    | antidemocratici   | antidemocratici   |  -0.000017 |
    | consensuali       | consensuali       |  -0.000040 |
    | antidemocratiche  | antidemocratiche  |  -0.000130 |
    | consensuale       | consensuale       |  -0.000230 |
    | consenso          | consenso          |  -0.008922 |
    '-------------------+-------------------+------------'
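The only real change in the second script is the parsing line, which takes the first and last whitespace-separated fields of each input line with a list slice. A minimal sketch of just that idiom follows; the sample lines are hypothetical, and I'm assuming your files have the word first and the value last on each line.

```perl
use strict;
use warnings;

# Hypothetical input lines: a word, optional middle columns, then a value.
my @lines = (
    "alpha 12 0.000626\n",
    "  beta   0.000507\n",
);

my @parsed;
for my $line (@lines) {

    # split ' ', $line is what a bare split does to $_: it splits on runs
    # of whitespace and ignores leading whitespace. The slice [ 0, -1 ]
    # then picks the first and last fields.
    my ( $word, $val ) = ( split ' ', $line )[ 0, -1 ];
    push @parsed, [ $word, $val ];
}

print "$_->[0] => $_->[1]\n" for @parsed;
```

Note that an empty line would give undef for both $word and $val, so you may want to skip blank lines if your files contain any.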

In reply to Re: Aligning text and then perfom calculations by Kenosis
in thread Aligning text and then perfom calculations by epimenidecretese
