
I am afraid this:
perl -anle"next unless @L; print if $L[0] eq $F[0]; @L = @F;" in.txt > out.txt
will never print anything: @L starts out empty, so the next statement at the beginning skips the rest of the code on every line, and @L never gets set.

Perhaps this instead:

perl -anwle 'BEGIN{$L = "";} print if $F[0] eq $L; $L = $F[0];' in.txt > out.txt
Although the BEGIN block isn't really necessary if the warnings are not activated:
perl -anle 'print if $F[0] eq $L; $L = $F[0];' in.txt > out.txt
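
For anyone who prefers to read the one-liner as a script, here is a minimal sketch of what it does. The sub name and the sample data are made up for illustration, and the blank-line check is an addition that the one-liner does not have.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Script form of: perl -anle 'print if $F[0] eq $L; $L = $F[0];'
# Returns the lines whose first field equals the first field of the line before.
# (repeats_of_previous_key is a hypothetical name, not part of the original post.)
sub repeats_of_previous_key {
    my @lines = @_;
    my $last  = '';                  # plays the role of $L, initialised as in the BEGIN block
    my @out;
    for my $line (@lines) {
        my @F = split ' ', $line;    # what -a does: split the line on whitespace
        next unless @F;              # skip blank lines (addition, not in the one-liner)
        push @out, $line if $F[0] eq $last;
        $last = $F[0];
    }
    return @out;
}

# Made-up sample, already sorted on the first column.
my @sample = ("aa blah", "bb blah", "bb blahblah", "bb foo", "cc x");
print "$_\n" for repeats_of_previous_key(@sample);
```

Note that only the second and later lines of each group come out; the first line of a group is never printed, because at that point nothing is known yet about the line that follows.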

Re^3: how to change this code into perl
by BrowserUk (Patriarch) on Aug 30, 2015 at 08:14 UTC

      Thank you BrowserUK, I am getting this error:

      syntax error at -e line 2, at EOF Execution of -e aborted due to compilation errors.

      I am pretty new to Perl. What does the error mean? I googled it, but it's not very clear.

Re^3: how to change this code into perl
by perlnewbie012215 (Novice) on Aug 30, 2015 at 16:34 UTC

    Thank you Laurent_R!!! The one-liner is not printing all the lines. Say I have three duplicates: it only prints the last two, or only one duplicate, not all of them.

    1 twenty
    2 thirty
    1 forty
    1 fifty

    output
    1 twenty
    1 forty
    1 fifty

    Is there a way to write it as a script instead of a one-liner? Thank you guys.

      OK, a real script that should detect all lines having duplicate keys (quick script, untested, no time now, but based on something I am doing quite often, so, hopefully, I've it right).
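      my $infile = $ARGV[0];    # input file name, taken from the command line
      my ($previous_key, $previous_line);
      open my $IN, "<", $infile or die "cannot open $infile $!";
      while (<$IN>) {
          my ($key) = /^(\w+)/;    # the first word of the line is the key
          if (defined $key and defined $previous_key and $key eq $previous_key) {
              print $previous_line if defined $previous_line;
              print $_;
              undef $previous_line;    # already printed, do not print it again
          } else {
              $previous_line = $_;
          }
          $previous_key = $key;
      }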
      Sure, where there are two entries with the same key, it only prints the second one (the duplicate, not the original one); when there are three, it will print only the second one and the third one. And of course, it will work only if the lines are properly sorted.

      If you need to print all the lines that are duplicates, then it is slightly more complicated, because you need to keep track of recent history. And then, yes, it is probably better to write a real script.

      Another way is to use a hash to keep track of everything in memory.

      The file will be around 20000 rows, and the first column will always be text.

        #!perl
        use strict;
        use warnings;

        my $infile  = $ARGV[0];
        my $outfile = $ARGV[1];

        open IN,'<',$infile or die "Could not open $infile : $!";
        my %count = ();
        my @lines = ();
        while (<IN>){
          push @lines,$_;
          if (/^(\S+)/){
            ++$count{$1};
          }
        }
        close IN;

        open OUT,'>',$outfile or die "Could not open $outfile : $!";
        for (@lines){
          if (/^(\S+)/){
            print OUT $_ if $count{$1} > 1;
          }
        }
        close OUT;
        poj

      Hi poj, thank you for the quick response. I tried the script and could not get the duplicate rows; the output came up with zero rows. Below is the script I tried:

      open IN,'<','/home/scripts/imageoutcome.txt' or die "Could not open $infile : $!";
      my %count = ();
      my @lines = ();
      while (<IN>){
        push @lines,$_;
        # print $_;
        if (/^(\S+)/){
          ++$count{$1};
        }
      }
      close IN;
      open OUT,'>','/home/scripts/outcome.txt' or die "Could not open $outfile : $!";
      #print @lines;
      for (@lines){
        if (/^(\S+)/){
          print $count{$1};
          print OUT $_ if $count{$1} > 0;
        }
      }
      close OUT;

        Did you try it with the sample you provided ?

        1 twenty
        2 thirty
        1 forty
        1 fifty

        Update : Does your file have spaces at the beginning of the lines ?

        poj

      How big are the files and is the first column always numeric ?

      poj

      Thank you very much Laurent_R. I tried the script and it's printing all the rows instead of only the duplicates. This code looks very interesting, can you please explain it?

      #!/usr/bin/perl
      my ($previous_key, $previous_line);
      open my $IN, "<", '/home/scripts/imageoutcome.txt' or die "cannot open $infile $!";
      while (<$IN>) {
          my $key = $1 if /^(\w+)/;
          if ($key eq $previous_key) {
              print $previous_line if defined $previous_line;
              print $_;
              undef $previous_line;
          } else {
              $previous_line = $_;
          }
          $previous_key = $key;
      }
        I tried the script and its printing all the rows
        Then you have to show me your input data. I've just tried that script with the following input data:
        aa blah
        bb blah
        bb blahblah
        bb foo
        cc dlqskjf
        cc cfkqs
        dd dkls
        ee dsjkqjs
        ff blah
        gg klsqdj
        gg sqkl
        and it prints only the lines where the first column is a duplicate, as shown in this output:
        bb blah
        bb blahblah
        bb foo
        cc dlqskjf
        cc cfkqs
        gg klsqdj
        gg sqkl
        This seems to work perfectly.

        Otherwise, the way it works is that it reads the file one line at a time and stores that line ($previous_line), as well as its comparison key ($previous_key), until the next line is read. If the two lines have the same key, then I print the previous line (if defined) and the current one; in that case, I undef the previous line to prevent it from being printed twice if there are triplicates.
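
To make that look-back logic easier to experiment with, here is a small sketch that wraps it in a sub working on a list of lines instead of a file. The sub name is hypothetical, and the defined checks are an addition so that it runs cleanly under warnings; like the script, it assumes the input is sorted on the key.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns every line whose key (first word) is shared
# with a neighbouring line. Assumes the lines are sorted by key.
sub lines_with_duplicate_keys {
    my @lines = @_;
    my ($previous_key, $previous_line);
    my @out;
    for my $line (@lines) {
        my ($key) = $line =~ /^(\w+)/;
        if (defined $key and defined $previous_key and $key eq $previous_key) {
            # Same key as the line before: emit the held-back line (once) and this one.
            push @out, $previous_line if defined $previous_line;
            push @out, $line;
            undef $previous_line;    # already emitted, so a third duplicate does not repeat it
        }
        else {
            # New key: hold the line back until we know whether a duplicate follows.
            $previous_line = $line;
        }
        $previous_key = $key;
    }
    return @out;
}

my @input = ("aa blah", "bb blah", "bb blahblah", "bb foo",
             "cc dlqskjf", "cc cfkqs", "dd dkls");
print "$_\n" for lines_with_duplicate_keys(@input);
```

With this input, the three bb lines and the two cc lines are printed, while aa and dd are dropped because nothing next to them shares their key.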

        If it does not work properly for you, please show your input and/or test data.

      Hi Laurent_R, that was my bad. I had hidden characters in the file, and that's why it did not work. Your script is working... thank you so much for helping me and explaining it.

      Hi poj, you are correct, I forgot chomp. It's working now. Thank you so much for helping me.
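
Since both problems came down to stray characters in the input (hidden characters, leading spaces, a missing chomp), a minimal sketch of a clean-up step may be useful. Which characters were actually in the file is not stated in the thread, so the list below (a UTF-8 byte order mark, Windows line endings, leading whitespace) is only an assumption, and clean_line is a made-up name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: strips characters that commonly hide in text files.
# The exact list is an assumption; adjust it to what your file really contains.
sub clean_line {
    my ($line) = @_;
    $line =~ s/^\xEF\xBB\xBF//;    # UTF-8 byte order mark (raw bytes) at the start
    $line =~ s/[\r\n]+\z//;        # trailing LF or CRLF, like chomp but also drops \r
    $line =~ s/^\s+//;             # leading spaces or tabs before the key
    return $line;
}

# Brackets make any leftover invisible characters easier to spot.
for my $raw ("  1 twenty\r\n", "\xEF\xBB\xBFbb blah\n") {
    print "[", clean_line($raw), "]\n";
}
```

Calling something like this on each line before extracting the key keeps keys that look identical from comparing as different.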