Print the first column, the unique word in each row, and the number of times that word is repeated in the row

mao9856 has asked for the wisdom of the Perl Monks concerning the following question:

Greetings to all

With the help of the Perl Monks, I was able to get the desired output, which is now my new input. I have a file, abc.txt, with 26 columns and 1072 rows of data like this:

ID    file1   file2   file3   file4   file5 ...file25
ID1   ABC12   ABC12     -     ABC12     -      ABC12
ID2   XYZ11     -       -       -     XYZ11      -
ID3     -       -     EFG21   EFG21     -      EFG21
......
......
ID1072  -     PQR34   PQR34     -     PQR34      -

I want to read each row, print every ID (the first column) unchanged, and print the unique word in that row (say ABC12) along with the number of times it is repeated (4 in this case).

I split the problem into two scripts, intending to combine them afterwards to get the output.

I printed the first column using the following code:

#!/usr/bin/perl
use warnings;
use strict;

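# Print the first whitespace-separated field (the ID) of every line.
open (my $f, '<', 'abc.txt') or die "Cannot open abc.txt: $!";

while (my $line = <$f>) {
  my @elems = split ' ', $line;
  print " ", $elems[0];
  print "\n";
}
close $f;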

This code works and prints the first column as desired.

Then I removed the first column from my data. With the remaining data (pm.txt), I tried the following code. It does give me the unique words and their counts, but it prints only 1057 rows.

#!/usr/bin/perl
use strict;
use warnings;

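# Declared once, outside the loop: counts accumulate over the whole file.
my %wordcount = ();

# Lexical filehandle with an error check instead of a bareword handle.
open(my $fh, '<', 'pm.txt') or die "Cannot open pm.txt: $!";

while (my $line = <$fh>)
{
    chomp($line);
    my @words = split(' ', $line);
    foreach my $word (@words)
    {
        $wordcount{$word} += 1;
    }
}
close($fh);

foreach my $key (keys %wordcount)
{
    print "Word: $key Repeat_Count: $wordcount{$key}\n";
}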
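In the script above, `%wordcount` is declared once before the `while` loop, so every word's count keeps growing across all rows, and the final `foreach` prints one line per distinct word in the whole file. That is most likely why the number of printed lines (1057) has nothing to do with the 1072 rows. A minimal sketch of the per-row alternative, assuming whitespace-separated fields with `-` marking a missing value: declare the hash inside the loop so it starts empty for every line. The two sample rows and the helper `count_per_line` are hypothetical, and the data is read from an in-memory string only to keep the example self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample standing in for pm.txt (ID column already removed);
# fields are whitespace-separated and '-' marks a missing value.
my $sample = <<'END';
ABC12   ABC12     -     ABC12     -      ABC12
XYZ11     -       -       -     XYZ11      -
END

# Count words separately for each line; returns one hash ref per line.
sub count_per_line {
    my ($fh) = @_;
    my @counts;
    while (my $line = <$fh>) {
        my %wordcount;                          # a fresh, empty hash for every row
        $wordcount{$_}++ for split ' ', $line;
        push @counts, \%wordcount;
    }
    return @counts;
}

# In-memory filehandle for the demo; open 'pm.txt' instead for the real file.
open my $fh, '<', \$sample or die "Cannot open sample: $!";
my @per_line = count_per_line($fh);
close $fh;

for my $i (0 .. $#per_line) {
    my $counts = $per_line[$i];
    for my $word (sort keys %$counts) {
        print "Row ", $i + 1, " Word: $word Repeat_Count: $counts->{$word}\n";
    }
}
```

Because each row gets its own hash, ABC12 is counted 4 times in row 1 and never leaks into row 2. The `-` placeholders are still counted here; the per-row structure is the point of the sketch.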

I tried combining the two scripts, but it didn't turn out as I expected. I want my output to look like this:

ID1  ABC12  4
ID2  XYZ11  2
ID3  EFG21  3
......
......
ID1072 PQR34 3

Thank you in advance


Re: print first column, unique word in each row along with number times that word is repeated in each row by Eily (Monsignor) on Jan 10, 2018 at 10:30 UTC
OK, so it looks like you are talking about the thread "find common data in multiple files". Could you please give us the code you used there, inside a <readmore> tag? Right now you parse the input data, format it into a file, and then parse that file again for processing. You could skip the second parsing step if you collect the data in a useful data structure on the first pass.
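A minimal sketch of the one-pass idea, assuming (hypothetically) that each original input file holds one "ID name" pair per line, the two-field layout that poj's script further down reads with `my ( $key, $value ) = split;`. The file names and contents below are made up and held in memory so the example runs on its own. Each ID's name and occurrence count go straight into a hash of hashes, so the final report can be printed without writing and re-parsing an intermediate table.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical contents of three input files, one "ID name" pair per line.
my %inputs = (
    file1 => "ID1 ABC12\nID2 XYZ11\n",
    file2 => "ID1 ABC12\nID1072 PQR34\n",
    file3 => "ID3 EFG21\nID1072 PQR34\n",
);

# One pass over the inputs: remember each ID's name and how many lines mention it
# (one per file, assuming an ID appears at most once within a file).
my %data;    # ID => { name => 'ABC12', count => 2 }
for my $file (sort keys %inputs) {
    open my $fh, '<', \$inputs{$file} or die "Cannot open $file: $!";
    while (my $line = <$fh>) {
        my ($id, $name) = split ' ', $line;
        next unless defined $name;          # skip blank or malformed lines
        $data{$id}{name} = $name;
        $data{$id}{count}++;
    }
    close $fh;
}

# Print the report directly from the hash; no intermediate file is needed.
for my $id (sort keys %data) {
    print join("\t", $id, $data{$id}{name}, $data{$id}{count}), "\n";
}
```

Note that `sort keys` compares strings, so ID1072 sorts before ID2; a numeric sort on the digits would be needed for natural ID order.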
Re^2: print first column, unique word in each row along with number times that word is repeated in each row by mao9856 (Sexton) on Jan 11, 2018 at 03:19 UTC
Yes. This is the code that got me the desired output, which is now my new input: Re^11: find common data in multiple files by poj
Re^3: print first column, unique word in each row along with number times that word is repeated in each row by poj (Abbot) on Jan 11, 2018 at 06:37 UTC
Add 2 more columns into the array and shift the others up

#!/usr/bin/env perl
use strict;
use warnings;

my %data = ();
#@ARGV = map { "File$_" }(1..4);
my $num = @ARGV;

# input
for my $i (0..$num-1){
  open my $fh,'<',$ARGV[$i] or die "$!";
  while (<$fh>) {
    my ( $key, $value ) = split;
    $data{$key}[0] = $value;
    $data{$key}[1] += 1; # count
    $data{$key}[$i+2] = $value;
  }
  close $fh;
}

# output
print join ("\t", 'ID', 'Name','Count', @ARGV),"\n";
foreach my $key ( sort keys %data ) {
  my @line = map { $_ || '-' } @{ $data{$key} }[0..$num+1];
  if (grep $_ eq '-',@line){
    print join ("\t", $key, @line),"\n";
  }
}

poj
Re^4: print first column, unique word in each row along with number times that word is repeated in each row by mao9856 (Sexton) on Jan 11, 2018 at 07:11 UTC
Re: print first column, unique word in each row along with number times that word is repeated in each row by BillKSmith (Monsignor) on Jan 10, 2018 at 13:35 UTC
C:\Users\Bill\forums\monks>type abc.txt
ID1 ABC12 ABC12 - ABC12 - ABC12
ID2 XYZ11 - - - XYZ11 -
ID3 - - EFG21 EFG21 - EFG21
ID1072 - PQR34 PQR34 - PQR34 -

C:\Users\Bill\forums\monks>type mao9856.pl
#!perl -n
use strict;
use warnings;
my @files = split;
print shift @files, "\n";
my %uniq;
$uniq{$_}++ foreach (@files) ;
while (my($name, $count) = each %uniq) {
    print " ", $name, " ", $count, "\n";
}

C:\Users\Bill\forums\monks>perl mao9856.pl abc.txt
ID1
 - 2
 ABC12 4
ID2
 XYZ11 2
 - 4
ID3
 - 3
 EFG21 3
ID1072
 PQR34 3
 - 3

Bill
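Bill's output also counts the `-` placeholders, and because `each` walks the hash in Perl's internal order, the order of words within a row can change between runs. A minimal variation on his per-row hash (a sketch, not Bill's code) drops `-`, sorts the words, and puts the ID, word, and count on one tab-separated line, matching the layout asked for in the question. It assumes whitespace-separated fields and an optional header row starting with `ID`; the helper name `summarize` is hypothetical, and the sample rows are Bill's abc.txt held in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The sample rows from Bill's abc.txt, held in memory for the demo.
my $sample = <<'END';
ID1 ABC12 ABC12 - ABC12 - ABC12
ID2 XYZ11 - - - XYZ11 -
ID3 - - EFG21 EFG21 - EFG21
ID1072 - PQR34 PQR34 - PQR34 -
END

# For each row: the ID, then every distinct non-'-' word with its count.
sub summarize {
    my ($fh) = @_;
    my @out;
    while (my $line = <$fh>) {
        my ($id, @fields) = split ' ', $line;
        next unless defined $id;            # skip blank lines
        next if $id eq 'ID';                # skip a header row, if present
        my %uniq;                           # reset for every row
        $uniq{$_}++ for grep { $_ ne '-' } @fields;
        push @out, join("\t", $id, $_, $uniq{$_}) for sort keys %uniq;
    }
    return @out;
}

# Open 'abc.txt' instead of the in-memory sample for the real data.
open my $fh, '<', \$sample or die "Cannot open sample: $!";
my @lines = summarize($fh);
close $fh;

print "$_\n" for @lines;
```

If a row ever contains two different non-`-` words, this prints one line per word for that ID rather than silently keeping only one.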