mao9856 has asked for the wisdom of the Perl Monks concerning the following question:

Greetings to all

With the help of the Perl Monks I was able to get the desired output shown below, which is now my new input. I have an abc.txt file which has 26 columns and 1072 rows, with data as follows:

ID file1 file2 file3 file4 file5 ...file25
ID1 ABC12 ABC12 - ABC12 - ABC12
ID2 XYZ11 - - - XYZ11 -
ID3 - - EFG21 EFG21 - EFG21
......
......
ID1072 - PQR34 PQR34 - PQR34 -

I want to read each row, print each ID (first column) as is, and print the unique word in that row (say ABC12) together with the number of times it is repeated (4 in this case).

I split this problem into two scripts and then thought of combining them to get the output.

I printed first column using following code:

#!/usr/bin/perl
use warnings;
use strict;

open (my $f, '<', 'abc.txt') or die;
while (my $line = <$f>) {
    my @elems = split ' ', $line;
    print " ", $elems[0];
    print "\n";
}

This code is working and it prints first column as desired output

Then, I removed first column from my data. With the remaining data (pm.txt), I tried following code and it does give me unique word and their counts but it prints only 1057 rows.

#!/usr/bin/perl
use strict;
use warnings;

my %wordcount = ();
open(theFile,"<pm.txt");
while(my $line = <theFile>) {
    chomp($line);
    my @words = split(' ', $line);
    foreach my $word(@words) {
        $wordcount{$word} += 1;
    }
}
foreach my $key(keys %wordcount) {
    print "Word: $key Repeat_Count: " . ($wordcount{$key} ) . "\n";
}

I thought of combining the two scripts for a solution, but it didn't turn out the way I expected. I want my output to look like this:

ID1 ABC12 4
ID2 XYZ11 2
ID3 EFG21 3
......
......
ID1072 PQR34 3

Thank you in advance


Replies are listed 'Best First'.
Re: print first column, unique word in each row along with number times that word is repeated in each row
by Eily (Monsignor) on Jan 10, 2018 at 10:30 UTC

    OK so it looks like you are talking about find common data in multiple files. Could you please give us the code you use from there, in a <readmore> tag? Right now what you want to do is parse the input data, and format it into a file, then parse that file for processing. You could skip the second parsing step if you collect the data in a useful data structure on the first try.

        Add 2 more columns into the array and shift the others up

        #!/usr/bin/env perl
        use strict;
        use warnings;

        my %data = ();
        #@ARGV = map { "File$_" }(1..4);
        my $num = @ARGV;

        # input
        for my $i (0..$num-1){
          open my $fh,'<',$ARGV[$i] or die "$!";
          while (<$fh>) {
            my ( $key, $value ) = split;
            $data{$key}[0] = $value;
            $data{$key}[1] += 1; # count
            $data{$key}[$i+2] = $value;
          }
          close $fh;
        }

        # output
        print join ("\t", 'ID', 'Name','Count', @ARGV),"\n";
        foreach my $key ( sort keys %data ) {
          my @line = map { $_ || '-' } @{ $data{$key} }[0..$num+1];
          if (grep $_ eq '-',@line){
            print join ("\t", $key, @line),"\n";
          }
        }
        poj
Re: print first column, unique word in each row along with number times that word is repeated in each row
by BillKSmith (Monsignor) on Jan 10, 2018 at 13:35 UTC
    C:\Users\Bill\forums\monks>type abc.txt
    ID1 ABC12 ABC12 - ABC12 - ABC12
    ID2 XYZ11 - - - XYZ11 -
    ID3 - - EFG21 EFG21 - EFG21
    ID1072 - PQR34 PQR34 - PQR34 -

    C:\Users\Bill\forums\monks>type mao9856.pl
    #!perl -n
    use strict;
    use warnings;
    my @files = split;
    print shift @files, "\n";
    my %uniq;
    $uniq{$_}++ foreach (@files) ;
    while (my($name, $count) = each %uniq) {
        print " ", $name, " ", $count, "\n";
    }

    C:\Users\Bill\forums\monks>perl mao9856.pl abc.txt
    ID1
     - 2
     ABC12 4
    ID2
     XYZ11 2
     - 4
    ID3
     - 3
     EFG21 3
    ID1072
     PQR34 3
     - 3
    Bill
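
A minimal sketch of the same per-row counting, adapted to the one-line-per-ID output requested in the question. It assumes the `-` placeholder should not be counted and that a header row starting with `ID file1` may be present in the real file. `summarize_row` is a hypothetical helper name, not something from the thread. The sample rows are inlined through an in-memory filehandle so the script runs on its own; opening 'abc.txt' instead should work the same way.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes one whitespace-separated
# row and returns "ID WORD COUNT" strings, one per distinct word in the row,
# ignoring the '-' placeholder (an assumption about the desired output).
sub summarize_row {
    my ($line) = @_;
    my ($id, @cells) = split ' ', $line;
    return unless defined $id;    # blank line: nothing to report

    my (%count, @order);
    for my $word (grep { $_ ne '-' } @cells) {
        # Record each word the first time it is seen, then keep counting.
        push @order, $word unless $count{$word}++;
    }
    return map { "$id $_ $count{$_}" } @order;
}

# Sample rows copied from the question. To process the real file, open
# 'abc.txt' here instead of the in-memory string.
my $sample = <<'END';
ID1 ABC12 ABC12 - ABC12 - ABC12
ID2 XYZ11 - - - XYZ11 -
ID3 - - EFG21 EFG21 - EFG21
ID1072 - PQR34 PQR34 - PQR34 -
END

open my $fh, '<', \$sample or die "Cannot open sample data: $!";
while (my $line = <$fh>) {
    next if $line =~ /^ID\s+file1\b/;    # assumed header row, if present
    print "$_\n" for summarize_row($line);
}
close $fh;
```

With the four sample rows this prints `ID1 ABC12 4`, `ID2 XYZ11 2`, `ID3 EFG21 3` and `ID1072 PQR34 3`, one per line. Because the hash is declared inside the helper, the counts start fresh for every row. A row that contains more than one distinct word would produce one output line per word.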