Re^3: how to change this code into perl
by BrowserUk (Patriarch) on Aug 30, 2015 at 08:14 UTC
You're right. I tried to type a pattern I've used before, directly into the browser without having had my morning coffee. It should be:
perl -anle"@L = @F, next if $. == 1; print if $L[0] eq $F[0]; @L = @F;" in.txt > out.txt
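For readers who would rather see this as a script, the following is a rough sketch of what the switches do: -n loops over the input, -l strips the newline, and -a splits each line into @F. The sub name adjacent_repeats and the in-memory sample data are made up for illustration and are not part of the original one-liner.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: applies the one-liner's logic to an open filehandle
# and returns the lines the one-liner would print.
sub adjacent_repeats {
    my ($fh) = @_;
    my (@L, @out);                  # @L holds the fields of the previous line
    while (my $line = <$fh>) {      # -n: loop over every input line
        chomp $line;                # -l: strip the trailing newline
        my @F = split ' ', $line;   # -a: split the line into fields
        if ($. == 1) {              # first line: nothing to compare with yet
            @L = @F;
            next;
        }
        # Keep the line only if its first field equals the previous line's.
        push @out, $line
            if defined $L[0] && defined $F[0] && $L[0] eq $F[0];
        @L = @F;
    }
    return @out;
}

# Demo on an in-memory "file" (assumed sample data).
my $sample = "1 twenty\n2 thirty\n1 forty\n1 fifty\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
print "$_\n" for adjacent_repeats($fh);   # prints "1 fifty"
close $fh;
```

Because only neighbouring lines are compared, the first line of each run of equal keys is never printed, and unsorted input hides most repeats.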
syntax error at -e line 2, at EOF
Execution of -e aborted due to compilation errors.
I am pretty new to Perl. What does the error mean? I googled it, but it's not very clear.
Re^3: how to change this code into perl
by perlnewbie012215 (Novice) on Aug 30, 2015 at 16:34 UTC
|
Thank you, Laurent_R! The one-liner is not printing all the lines. Say I have three duplicates: it prints only the last two, or only one of the duplicates, not all of them.
1 twenty
2 thirty
1 forty
1 fifty
output
1 twenty
1 forty
1 fifty
Is there a way to write it as a script instead of a one-liner? Thank you, guys.
OK, a real script that should detect all lines having duplicate keys (quick script, untested, no time now, but based on something I do quite often, so, hopefully, I have it right).
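use strict;
use warnings;

# Assumption: the input file name is passed as the first argument.
my $infile = shift @ARGV or die "usage: $0 infile\n";
my ($previous_key, $previous_line);
open my $IN, "<", $infile or die "cannot open $infile $!";
while (<$IN>) {
    # A list-context match avoids "my $key = $1 if ...",
    # whose behavior is undefined when the condition is false.
    my ($key) = /^(\w+)/;
    if (defined $key and defined $previous_key and $key eq $previous_key) {
        # Print the first line of the group once, then forget it.
        print $previous_line if defined $previous_line;
        print $_;
        undef $previous_line;
    } else {
        $previous_line = $_;
    }
    $previous_key = $key;
}
close $IN;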
Sure, where there are two entries with the same key, it only prints the second one (the duplicate, not the original one); when there are three, it will print only the second one and the third one. And of course, it will work only if the lines are properly sorted.
If you need to print all the lines that are duplicates, then it is slightly more complicated, because you need to keep track of recent history. And then, yes, it is probably better to write a real script.
Another way is to use a hash to keep track of everything in memory.
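As a minimal sketch of the hash idea, assuming the whole file fits in memory: collect the lines under their first field, then keep only the groups with more than one line. The sub name lines_with_repeated_key and the sample data are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every line whose first field occurs
# more than once, grouped by key in first-seen order.
sub lines_with_repeated_key {
    my @lines = @_;
    my (%group, @order);
    for my $line (@lines) {
        my ($key) = $line =~ /^(\S+)/;
        next unless defined $key;                 # skip lines without a key
        push @order, $key unless exists $group{$key};
        push @{ $group{$key} }, $line;
    }
    # Keep only the keys that collected two or more lines.
    return map { @{ $group{$_} } } grep { @{ $group{$_} } > 1 } @order;
}

# Demo with unsorted sample data (assumed).
my @sample = ("1 twenty", "2 thirty", "1 forty", "1 fifty");
print "$_\n" for lines_with_repeated_key(@sample);
```

This version does not need sorted input, but it groups the output by key, so the lines may come out in a different order than in the file.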
#!perl
use strict;
use warnings;

my $infile  = $ARGV[0];
my $outfile = $ARGV[1];

open IN,'<',$infile or die "Could not open $infile : $!";
my %count = ();
my @lines = ();
while (<IN>){
    push @lines,$_;
    if (/^(\S+)/){
        ++$count{$1};
    }
}
close IN;

open OUT,'>',$outfile or die "Could not open $outfile : $!";
for (@lines){
    if (/^(\S+)/){
        print OUT $_ if $count{$1} > 1;
    }
}
close OUT;
poj
Hi poj, thank you for the quick response. I tried the script but could not get the duplicate rows; the output had zero rows. Below is the script I tried:
open IN,'<','/home/scripts/imageoutcome.txt' or die "Could not open $infile : $!";
my %count = ();
my @lines = ();
while (<IN>){
    push @lines,$_;
    # print $_;
    if (/^(\S+)/){
        ++$count{$1};
    }
}
close IN;
open OUT,'>','/home/scripts/outcome.txt' or die "Could not open $outfile : $!";
#print @lines;
for (@lines){
    if (/^(\S+)/){
        print $count{$1};
        print OUT $_ if $count{$1} > 0;
    }
}
close OUT;
1 twenty
2 thirty
1 forty
1 fifty
Update: Does your file have spaces at the beginning of the lines?
poj
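To show why leading spaces would matter: the pattern /^(\S+)/ requires a non-space character at the very start of the line, so an indented line never matches, is never counted, and is never printed. The sketch below compares it with a more lenient pattern; the helper names and the sample lines are made up, and this is only one possible cause of an empty output file.

```perl
use strict;
use warnings;

# Hypothetical helpers for comparing the two key patterns.
sub key_anchored { my ($line) = @_; my ($key) = $line =~ /^(\S+)/;    return $key }
sub key_lenient  { my ($line) = @_; my ($key) = $line =~ /^\s*(\S+)/; return $key }

for my $line ("1 twenty", "  1 forty") {
    my $strict  = key_anchored($line);
    my $lenient = key_lenient($line);
    # An undefined key means the pattern did not match the line at all.
    printf "%-12s anchored=%s lenient=%s\n",
        "'$line'",
        defined $strict  ? $strict  : 'no match',
        defined $lenient ? $lenient : 'no match';
}
```

If the keys may be indented, using /^\s*(\S+)/ in both loops of the script would let such lines be counted.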
Thank you very much, Laurent_R. I tried the script and it's printing all the rows instead of only the duplicates. This code looks very interesting; can you please explain it?
#!/usr/bin/perl
my ($previous_key, $previous_line);
open my $IN, "<", '/home/scripts/imageoutcome.txt' or die "cannot open $infile $!";
while (<$IN>) {
    my $key = $1 if /^(\w+)/;
    if ($key eq $previous_key) {
        print $previous_line if defined $previous_line;
        print $_;
        undef $previous_line;
    } else {
        $previous_line = $_;
    }
    $previous_key = $key;
}
"I tried the script and it's printing all the rows"
Then you have to show me your input data. I've just tried that script with the following input data:
aa blah
bb blah
bb blahblah
bb foo
cc dlqskjf
cc cfkqs
dd dkls
ee dsjkqjs
ff blah
gg klsqdj
gg sqkl
and it prints only the lines where the first column is a duplicate, as shown in this output:
bb blah
bb blahblah
bb foo
cc dlqskjf
cc cfkqs
gg klsqdj
gg sqkl
This seems to work perfectly.
Otherwise, the way it works is that it reads the file one line at a time and stores this line ($previous_line), as well as the comparison key, until the next line is read. If the two lines have the same key, then I print the previous line (if defined) and the current one; in that case, I undef the previous line to prevent it from being printed twice when a key occurs three or more times.
If it does not work properly for you, please show your input and/or test data.
Hi Laurent_R, that was my mistake: I had hidden characters in the file, which is why it did not work. Your script is working. Thank you so much for helping me and explaining it.
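For anyone hitting the same problem, one way to spot hidden characters is to print each line with everything outside printable ASCII turned into a visible escape. This is only a sketch: the sub name show_hidden is invented, and the thread does not say which characters were actually in the file.

```perl
use strict;
use warnings;

# Hypothetical helper: replaces every character outside printable ASCII
# (space through tilde) with a visible \x{..} escape.
sub show_hidden {
    my ($line) = @_;
    (my $visible = $line) =~ s/([^\x20-\x7E])/sprintf('\\x{%02X}', ord $1)/ge;
    return $visible;
}

# A tab and a Windows line ending become visible:
print show_hidden("bb\tblah\r\n"), "\n";   # prints bb\x{09}blah\x{0D}\x{0A}
```

Running each input line through such a helper before comparing keys makes stray tabs, carriage returns, or non-breaking spaces easy to see.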