Re^3: how to change this code into perl

Re^4: how to change this code into perl by Laurent_R (Canon) on Aug 30, 2015 at 17:57 UTC
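OK, here is a real script that should detect all lines having duplicate keys (quick script, untested, no time now, but based on something I do quite often, so hopefully I've got it right):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed: the input file name is given as the first command-line argument.
my $infile = shift @ARGV or die "usage: $0 infile\n";

my ($previous_key, $previous_line);
open my $IN, "<", $infile or die "cannot open $infile: $!";
while (<$IN>) {
    # Key = first word of the line; undef if the line has none.
    my ($key) = /^(\w+)/;
    if (defined $key and defined $previous_key and $key eq $previous_key) {
        # Print the first line of the group (only once), then the current one.
        print $previous_line if defined $previous_line;
        print $_;
        undef $previous_line;
    }
    else {
        # New key: hold this line back until we see the next one.
        $previous_line = $_;
    }
    $previous_key = $key;
}
close $IN;
```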
Re^4: how to change this code into perl by Laurent_R (Canon) on Aug 30, 2015 at 17:42 UTC
Sure. Where there are two entries with the same key, it prints only the second one (the duplicate, not the original); when there are three, it prints only the second and the third. And of course, it works only if the lines are properly sorted by key. If you need to print all the lines that are duplicates, it is slightly more complicated, because you need to keep track of recent history, and then, yes, it is probably better to write a real script. Another way is to use a hash to keep track of everything in memory.
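The one-liner discussed here is not shown in this excerpt, so the following is only a hedged sketch of the behaviour described above, not Laurent_R's code. The sub `repeats_only` (a hypothetical name) keeps a line only when its key equals the key of the line just before it, which is why the first line of each group is lost and why unsorted input misses duplicates. The second hypothetical sub, `all_duplicates_grouped`, illustrates the "hash to keep track of everything in memory" alternative, assuming the key is the first whitespace-delimited field.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of the described behaviour (not the original
# one-liner): with input sorted by key, keep a line only when its key equals
# the key of the line just before it, so the first line of a group is never kept.
sub repeats_only {
    my ($prev_key, @out);
    for my $line (@_) {
        my ($key) = $line =~ /^(\S+)/;
        push @out, $line
            if defined $key && defined $prev_key && $key eq $prev_key;
        $prev_key = $key;
    }
    return @out;
}

# Hash alternative: hold every line in memory, grouped by key, so the input
# need not be sorted. Groups come out in the order their keys were first seen.
sub all_duplicates_grouped {
    my (%lines_by_key, @key_order);
    for my $line (@_) {
        my ($key) = $line =~ /^(\S+)/ or next;   # skip lines with no key
        push @key_order, $key unless exists $lines_by_key{$key};
        push @{ $lines_by_key{$key} }, $line;
    }
    return map  { @{ $lines_by_key{$_} } }
           grep { @{ $lines_by_key{$_} } > 1 } @key_order;
}

my @sorted   = ("a 1\n", "b 1\n", "b 2\n", "b 3\n", "c 1\n");
my @unsorted = ("b 1\n", "a 1\n", "b 2\n");
print repeats_only(@sorted);              # prints "b 2" and "b 3"
print repeats_only(@unsorted);            # prints nothing: the "b" lines are not adjacent
print all_duplicates_grouped(@unsorted);  # prints "b 1" and "b 2"
```

The grouped version trades memory for independence from sort order, which is the trade-off mentioned above.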
Re^4: how to change this code into perl by perlnewbie012215 (Novice) on Aug 30, 2015 at 19:07 UTC
Hi poj, thank you for the quick response. I tried the script but could not get the duplicate rows; the output came up with zero rows. Below is the script I tried:

```perl
open IN,'<','/home/scripts/imageoutcome.txt' or die "Could not open $infile : $!";
my %count = ();
my @lines = ();
while (<IN>){
    push @lines,$_;
    # print $_;
    if (/^(\S+)/){
        ++$count{$1};
    }
}
close IN;
open OUT,'>','/home/scripts/outcome.txt' or die "Could not open $outfile : $!";
#print @lines;
for (@lines){
    if (/^(\S+)/){
        print $count{$1};
        print OUT $_ if $count{$1} > 0;
    }
}
close OUT;
```
Re^5: how to change this code into perl by poj (Abbot) on Aug 30, 2015 at 19:15 UTC
Did you try it with the sample you provided?

```
1 twenty
2 thirty
1 forty
1 fifty
```

Update: Does your file have spaces at the beginning of the lines?

poj
Re^6: how to change this code into perl by perlnewbie012215 (Novice) on Aug 30, 2015 at 20:16 UTC
It seems there were some special characters and spaces at the start of the lines. I deleted those and it's working perfectly now.
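The poster does not say which "special characters" they were, so this is only a hedged sketch under that uncertainty. It shows why a leading space defeats `/^(\S+)/` (the key must start in column 1), a hypothetical `key_of` helper that skips leading whitespace first, and a hypothetical `show_hidden` helper that makes control characters such as tabs or carriage returns visible, which may help spot whatever else is lurking in a file.

```perl
use strict;
use warnings;

# With /^(\S+)/ the key must start in column 1, so a line that begins with
# a space never matches and is silently skipped. This hypothetical helper
# skips leading whitespace before taking the first field.
sub key_of {
    my ($line) = @_;
    my ($key) = $line =~ /^\s*(\S+)/;
    return $key;    # undef for a blank line
}

# Hypothetical helper: show every character outside printable ASCII
# as \xNN (tab as \x09, CR as \x0D, LF as \x0A, ...).
sub show_hidden {
    my ($line) = @_;
    (my $shown = $line) =~ s/([^\x20-\x7E])/sprintf('\\x%02X', ord $1)/ge;
    return $shown;
}

my $sample = "  1 twenty\r\n";                          # leading spaces, CRLF ending
print '[', show_hidden($sample), "]\n";                 # prints [  1 twenty\x0D\x0A]
print $sample =~ /^(\S+)/ ? "match\n" : "no match\n";   # prints no match
print key_of($sample), "\n";                            # prints 1
```

Tolerating leading whitespace in the regex avoids having to edit the data file by hand; other invisible characters would still need their own handling once `show_hidden` has revealed them.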
Re^4: how to change this code into perl by perlnewbie012215 (Novice) on Aug 30, 2015 at 17:29 UTC
The file will be around 20,000 rows, and the first column will always be text.
Re^5: how to change this code into perl by poj (Abbot) on Aug 30, 2015 at 17:44 UTC
```perl
#!perl
use strict;
use warnings;

my $infile  = $ARGV[0];
my $outfile = $ARGV[1];

# First pass: keep every line and count how often each key
# (the first whitespace-delimited field) occurs.
open IN,'<',$infile or die "Could not open $infile : $!";
my %count = ();
my @lines = ();
while (<IN>){
    push @lines,$_;
    if (/^(\S+)/){
        ++$count{$1};
    }
}
close IN;

# Second pass: write, in original order, the lines whose key occurs more than once.
open OUT,'>',$outfile or die "Could not open $outfile : $!";
for (@lines){
    if (/^(\S+)/){
        print OUT $_ if $count{$1} > 1;
    }
}
close OUT;
```

poj
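To try poj's two-pass approach without input and output files, here is a hedged sketch that wraps the same counting logic in a hypothetical sub, `duplicate_lines`, taking a list of lines and returning the duplicates. It is run on the sample data from this thread.

```perl
use strict;
use warnings;

# poj's two-pass idea in a (hypothetical) sub working on a list of lines:
# count each key, then keep every line whose key occurs more than once,
# in the original order.
sub duplicate_lines {
    my @lines = @_;
    my %count;
    for (@lines) {
        ++$count{$1} if /^(\S+)/;
    }
    return grep { /^(\S+)/ && $count{$1} > 1 } @lines;
}

# the sample data from this thread
my @sample_lines = ("1 twenty\n", "2 thirty\n", "1 forty\n", "1 fifty\n");
print duplicate_lines(@sample_lines);   # prints "1 twenty", "1 forty", "1 fifty"
```

Unlike the adjacent-line approach, this does not require the input to be sorted; it holds the whole file in memory, which should be unproblematic for a file of around 20,000 rows.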
Re^4: how to change this code into perl by perlnewbie012215 (Novice) on Aug 30, 2015 at 19:14 UTC
Thank you very much, Laurent_R. I tried the script and it's printing all the rows instead of only the duplicates. This code looks very interesting; can you please explain it?

```perl
#!/usr/bin/perl
my ($previous_key, $previous_line);
open my $IN, "<", '/home/scripts/imageoutcome.txt' or die "cannot open $infile $!";
while (<$IN>) {
    my $key = $1 if /^(\w+)/;
    if ($key eq $previous_key) {
        print $previous_line if defined $previous_line;
        print $_;
        undef $previous_line;
    } else {
        $previous_line = $_;
    }
    $previous_key = $key;
}
```
Re^5: how to change this code into perl by Laurent_R (Canon) on Aug 31, 2015 at 10:47 UTC
"I tried the script and its printing all the rows"

Then you have to show me your input data. I've just tried that script with the following input data:

```
aa blah
bb blah
bb blahblah
bb foo
cc dlqskjf
cc cfkqs
dd dkls
ee dsjkqjs
ff blah
gg klsqdj
gg sqkl
```

and it prints only the lines where the first column is a duplicate, as shown in this output:

```
bb blah
bb blahblah
bb foo
cc dlqskjf
cc cfkqs
gg klsqdj
gg sqkl
```

This seems to work perfectly. As for how it works: it reads the file one line at a time and stores the line ($previous_line), as well as its comparison key ($previous_key), until the next line is read. If the two lines have the same key, I print the previous line (if defined) and the current one; in that case, I undef the previous line to prevent it from being printed twice if there are triplicates. If it does not work properly for you, please show your input and/or test data.
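The explanation above can be checked without a file. Below is a hedged sketch of the same algorithm as a hypothetical sub, `adjacent_duplicates`, over a list of lines (input assumed sorted by key), run on the sample data shown above. Two small changes from the posted script: `my ($key) = ...` replaces `my $key = $1 if ...` (perlsyn documents `my` with a statement-modifier conditional as undefined behaviour), and a `defined` check keeps lines without a key from being compared.

```perl
use strict;
use warnings;

# Same logic as the script above, as a (hypothetical) sub over a list of
# lines sorted by key. A line is held back in $previous_line until we know
# whether the next line has the same key.
sub adjacent_duplicates {
    my ($previous_key, $previous_line, @out);
    for my $line (@_) {
        my ($key) = $line =~ /^(\w+)/;
        if (defined $key && defined $previous_key && $key eq $previous_key) {
            push @out, $previous_line if defined $previous_line;
            push @out, $line;
            undef $previous_line;   # first line of the group is emitted only once
        }
        else {
            $previous_line = $line;
        }
        $previous_key = $key;
    }
    return @out;
}

my @input = map { "$_\n" } (
    'aa blah', 'bb blah', 'bb blahblah', 'bb foo', 'cc dlqskjf', 'cc cfkqs',
    'dd dkls', 'ee dsjkqjs', 'ff blah', 'gg klsqdj', 'gg sqkl',
);
print adjacent_duplicates(@input);   # prints the 7 "bb", "cc" and "gg" lines
```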
Re^4: how to change this code into perl by poj (Abbot) on Aug 30, 2015 at 17:23 UTC
How big are the files, and is the first column always numeric?

poj
Re^4: how to change this code into perl by perlnewbie012215 (Novice) on Aug 30, 2015 at 19:40 UTC
Hi poj, you are correct: I forgot chomp. It's working now. Thank you so much for helping me.
Re^4: how to change this code into perl by perlnewbie012215 (Novice) on Sep 01, 2015 at 22:39 UTC
Hi Laurent_R, that was my bad: I had hidden characters in the file, which is why it did not work. Your script is working. Thank you so much for helping me and explaining it.