Re: Read file line by line and check equal lines
by Util (Priest) on Mar 06, 2007 at 07:31 UTC
- use strict and use warnings.
- I don't think that your trick of $line % 2 == 0 can work; both singles and pairs can occur on even or odd lines.
- If you "pre-load" $prev before you begin your while(<>){...} loop, you can avoid keeping track of $line == 1 and $line >= 2.
Here is my (loosely tested) version:
#!perl
use strict;
use warnings;
my $last_line = <DATA>;
my $seen_count = 1;
while (<DATA>) {
if ( $last_line ne $_ ) {
print $last_line if $seen_count == 1;
$last_line = $_;
$seen_count = 0;
}
$seen_count++;
}
print $last_line if $seen_count == 1;
__END__
a1a
a1a
b1b
c1c
c1c
d1d
d1d
e1e
f1f
g1g
g1g
h1h
h1h
i1i
j1j
Re: Read file line by line and check equal lines
by rinceWind (Monsignor) on Mar 06, 2007 at 07:53 UTC
I'll assume you are looking for a Perl solution and give you feedback on that basis; you may be learning Perl, or may not be running on a Unix platform.
First a comment about lexical variables and use strict;. Get yourself into the habit of declaring your variables with my, and only declaring them in the narrowest scope that you need them. A Super search on "use strict" will give you many answers that explain why this is a good idea.
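As a minimal sketch of that habit (the variable names here are mine, not from any post), note how the inner variable lives only inside the loop:

```perl
use strict;
use warnings;

# With 'use strict', misspelling a variable is a compile-time error
# instead of a silently created global.
my @kept;
for my $line ('a1a', 'a1a', 'b1b') {
    my $copy = $line;    # $copy exists only inside this loop body
    push @kept, $copy;
}
# $copy is out of scope here; referring to it would not compile.
```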
Secondly, your test for evenness of the line count is the wrong approach: every singleton line shifts the parity from that point on, so the evenness check breaks as soon as the first unique line goes by.
Here's my stab:
use strict;
use warnings;
my $prev;
while (<DATA>) {
chomp;
# Note that $prev will be undef only on the first time round.
my $curr = $_;
if (!defined $prev)
{
$prev = $curr;
next;
}
if ($curr ne $prev) {
print "$prev\n";
$prev = $curr;
}
else
{
undef $prev;
}
}
print "$prev\n" if defined $prev;
__DATA__
a1a
a1a
b1b
c1c
c1c
d1d
d1d
e1e
f1f
g1g
g1g
h1h
h1h
i1i
j1j
Note that you don't deal with, or say what you want to happen, when you get three or more identical lines in the input.
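One way to cover that case is to count the length of each run of identical lines and print only runs of length one. Here is a loose sketch (my own, modelled on Util's version above), wrapped in a sub so it can be exercised on a plain list:

```perl
use strict;
use warnings;

# Return only the lines whose consecutive run length is exactly 1.
# Handles runs of any length (2, 3, ... identical lines).
sub singles_only {
    my @lines = @_;
    my @out;
    my ($prev, $count);
    for my $line (@lines) {
        if (defined $prev && $line eq $prev) {
            $count++;        # still inside the same run
            next;
        }
        push @out, $prev if defined $prev && $count == 1;
        ($prev, $count) = ($line, 1);    # start a new run
    }
    push @out, $prev if defined $prev && $count == 1;    # flush the last run
    return @out;
}
```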
Re: Read file line by line and check equal lines
by graq (Curate) on Mar 06, 2007 at 08:38 UTC
I thought I would tempt you with this version.
#!/usr/bin/perl
use strict;
use warnings;
my $previous;
&unique while(<DATA>);
sub unique
{
return if $previous and $previous eq $_;
print;
$previous = $_;
}
__DATA__
a1a
a1a
b1b
c1c
c1c
d1d
d1d
e1e
f1f
g1g
g1g
h1h
h1h
i1i
j1j
k1k
k1k
k1k
Why the sub call with a leading ampersand (&unique)? Since Perl 5 that form is no longer necessary, and it has the side effect of passing through (and exposing) the caller's @_ instead of building its own. You are not using @_ at all, so it's unnecessary.
Instead, your sub uses two global variables: one lexical ($previous) and the package variable $_. That should be avoided except in very special cases. Here it is hard to see why you use a sub at all; just expand the code into the loop body. That would be clearer.
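A loose sketch of what that inlined form might look like (using a plain array in place of <DATA> so the snippet stands alone; the data is mine):

```perl
use strict;
use warnings;

my @input = ('a1a', 'a1a', 'b1b', 'c1c', 'c1c', 'd1d');
my @output;
my $previous = '';
for (@input) {
    next if $previous eq $_;    # skip consecutive repeats
    push @output, $_;
    $previous = $_;
}
```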
Anno
That doesn't exactly solve the stated problem, does it? As I read it, the OP wants only those lines that appear exactly once in the input; what you've given prints each line of the input at most once.
Here's a solution (untested, so there's probably boundary problems) that only needs to keep at most 3 lines in memory under the constraint that the lines are already sorted.
#!/usr/bin/perl
use strict; use warnings;
my ($p1, $p2);    # previous line, and the one before that
while (<DATA>) {
    next unless defined $p1 and defined $p2;
    if ($p2 eq $p1) { $p2 = $p1 = undef; redo; }  # drop the pair, reprocess $_
    print $p2;    # $p2 matched neither neighbour: a singleton
} continue { $p2 = $p1; $p1 = $_; }
# flush the unpaired tail (an odd 3+ run still slips through here)
if    (defined $p2 and defined $p1) { print $p2, $p1 unless $p2 eq $p1; }
elsif (defined $p1)                 { print $p1; }
__DATA__
a1a
a1a
b1b
c1c
c1c
d1d
d1d
e1e
f1f
g1g
g1g
h1h
h1h
i1i
j1j
k1k
k1k
k1k
Re: Read file line by line and check equal lines
by diotalevi (Canon) on Mar 06, 2007 at 07:14 UTC
Re: Read file line by line and check equal lines
by McDarren (Abbot) on Mar 06, 2007 at 13:32 UTC
"Also it is huge file so i cannot use array or hash."
How huge?
Have you tried it with a hash - you might be surprised :)
Update: Note - as correctly pointed out by chrism01, the below won't work where you have odd numbers of duplicates. See below for a solution that I believe addresses that issue.
Give the following a go:
#!/usr/bin/perl -w
use strict;
my %wanted;
while (<DATA>) {
exists $wanted{$_} ? delete $wanted{$_} : $wanted{$_}++;
}
print sort keys %wanted;
__DATA__
a1a
a1a
b1b
c1c
c1c
d1d
d1d
e1e
f1f
g1g
g1g
h1h
h1h
i1i
j1j
Output:
b1b
e1e
f1f
i1i
j1j
Update: or as a one-liner:
perl -ne 'exists $x{$_}?delete $x{$_}:$x{$_}++;}{print for sort keys %x;' < input.txt > output.txt
Try running that on your input file. The point of using a hash in that way is that the toggle deletes a key as soon as its duplicate arrives, so the hash only ever holds the lines that still look unique; it's actually quite efficient. Whenever you are thinking "unique", a hash is almost certainly what you want.
Cheers,
Darren :)
|
McDarren,
I like your first version, but it seems to me it'll only work for even numbers of duplicates, e.g. if an item occurs 3 (5, 7, 9...) times, it'll be re-instated/preserved by your script?
Of course, the OP's example file only has duplicates in 2s, but the description doesn't state whether this is always the case.
I agree about using a hash, but I'd keep a count of all lines and test for count == 1 after looping through the input.
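That counting approach could be sketched as follows (the sub name and its wrapper are mine, for illustration):

```perl
use strict;
use warnings;

# Count every line, then keep only those seen exactly once.
# Works for any number of repeats, and doesn't need sorted input,
# at the cost of one hash entry per distinct line.
sub lines_seen_once {
    my @lines = @_;
    my %count;
    $count{$_}++ for @lines;
    return sort grep { $count{$_} == 1 } keys %count;
}
```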
Cheers
Chris
perl -ne '$x{$_}++;}{for(sort keys %x){print if $x{$_}==1;}' < input.txt
(I'm not a golfer by any stretch of the imagination, so I imagine that could be shortened significantly.)
Cheers,
Darren :)
Re: Read file line by line and check equal lines
by hangon (Deacon) on Mar 07, 2007 at 05:22 UTC
Something similar to this should do it. The trick is not to update $lastline until you don't have a match. As a side note, in the past I have successfully loaded around 100K lines into an array. You may be surprised at what Perl can handle.
use strict;
use warnings;

# assuming the filenames come in on the command line
my ($input_file, $output_file) = @ARGV;
open(IN,  '<', $input_file)  or die "Can't read $input_file: $!";
open(OUT, '>', $output_file) or die "Can't write $output_file: $!";
my $lastline = <IN>;
print OUT $lastline;
while (<IN>) {
    next if $_ eq $lastline;    # the trick: only update $lastline on a mismatch
    print OUT $_;
    $lastline = $_;
}
close OUT;
update: corrected typo
Re: Read file line by line and check equal lines
by thezip (Vicar) on Mar 07, 2007 at 06:49 UTC
Update: My apologies -- I completely missed the line about "no arrays or hashes" -- sorry for the noise.
This way has always worked for me:
use strict;
use warnings;
use Data::Dumper;
my %hash;
open(IFH, "<", "data.txt") or die "Can't open data.txt: $!";
while(<IFH>) {
chomp;
# keep a running count of occurrences for each line string
$hash{$_}++;
}
close IFH;
my @uniq = sort grep { $hash{$_} == 1} keys %hash;
print Dumper(\%hash);
print Dumper(\@uniq);
__OUTPUT__
$VAR1 = {
'j1j' => 1,
'i1i' => 1,
'b1b' => 1,
'a1a' => 2,
'f1f' => 1,
'e1e' => 1,
'h1h' => 2,
'c1c' => 2,
'g1g' => 2,
'd1d' => 2
};
$VAR1 = [
'b1b',
'e1e',
'f1f',
'i1i',
'j1j'
];
Where do you want *them* to go today?
Re: Read file line by line and check equal lines
by Moron (Curate) on Mar 06, 2007 at 13:00 UTC
(Update: tested and corrected by now.)
perl -e '$_{ $_ }++ for (<>); print grep { $_{$_}==1 } keys %_;' < input > output
Re: Read file line by line and check equal lines
by thezip (Vicar) on Mar 07, 2007 at 07:51 UTC
use strict;
use warnings;
my @arr = ();
open(IFH, "<", "data.txt") or die "Can't open data.txt: $!";
my $cur = scalar <IFH>;
push @arr, $cur;
# @arr holds, at most, N identical lines:
# i.e. if "d1d" occurs five times in a row, then
# @arr will contain the 5 occurrences of "d1d".
# @arr is reset to a single element whenever a
# new string is encountered.
while($cur = <IFH>) {
if ($cur eq $arr[0]) {
push @arr, $cur;
}
else {
# if here, we have a new string, so check
# the size of @arr to see if current string is unique
print $arr[0] if scalar(@arr) == 1;
@arr = ($cur);
}
}
print $arr[0] if scalar(@arr) == 1;
close IFH;
__OUTPUT__
b1b
e1e
f1f
i1i
j1j
Where do you want *them* to go today?