Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks
I have a file with lines that look like this:
1 ahjewgfje
1 gopjregre
2 kkkkkkk
3 figjiorger
3 rekopfroeer
3 ejfjviknced
4 erjgirjgerio
5 eieuiee
5 reopjtfrpeoi

Can you help me to keep only these lines in which the preceding number appears only once? In the example above, I would keep the lines starting with 1,3 and 5.

Replies are listed 'Best First'.
Re: Remove unique lines from file
by kennethk (Abbot) on Mar 20, 2015 at 22:24 UTC
Re: Remove unique lines from file
by Laurent_R (Canon) on Mar 21, 2015 at 09:33 UTC
    Can you help me to keep only these lines in which the preceding number appears only once? In the example above, I would keep the lines starting with 1,3 and 5.
    That seems contradictory. If you keep the lines starting with 1, 3 and 5, then you keep the lines where the preceding number appears more than once.

    Please explain better what you really want.

    Je suis Charlie.

      They seem to mean the lines where the leading number occurred only one time. There was only one line with a 2 or a 4 at the front. That still doesn't explain why they would keep lines starting with 1.

      Considering the title of the thread, I think he wants to keep lines whose leading number is unique. The confusion between "keep" and "remove" is probably one of viewpoint.
      Bill

        The difference between "keep" and "remove" is as large a gap as exists! Perhaps more of a problem of vocabulary than viewpoint?

        Dum Spiro Spero
Re: Remove unique lines from file
by LanX (Saint) on Mar 20, 2015 at 22:25 UTC
Re: Remove unique lines from file
by AppleFritter (Vicar) on Mar 20, 2015 at 23:22 UTC

    Here's a quick and dirty solution:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @lines   = ();
    my %numbers = ();

    while (<DATA>) {
        my ($number) = m/^(\d+)/;
        push @lines, [$number, $_];
        $numbers{$number}++;
    }

    foreach (@lines) {
        print $_->[1] if $numbers{$_->[0]} > 1;
    }

    __DATA__
    1 ahjewgfje
    1 gopjregre
    2 kkkkkkk
    3 figjiorger
    3 rekopfroeer
    3 ejfjviknced
    4 erjgirjgerio
    5 eieuiee
    5 reopjtfrpeoi

    What this does is iterate through the data (from the special DATA filehandle), extract the number at the beginning of each line using a regular expression (in list context, so it returns the captured values), and populate an array of arrays where each element of the first array is an anonymous two-element array containing the extracted number and the entire line. It also keeps a running total of how often each number was seen.

    Once all that's done it goes through the array of arrays; for each element ($_, representing a line), it checks whether the extracted number ($_->[0], the first element of the anonymous array currently being looked at) has a running total of more than one, and if so, prints the line in question ($_->[1], the second element of the anonymous array).

    One downside is that this'll slurp the entire file into memory before printing anything, which may be a problem if your files are very large.
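
    If memory does become a concern, one possible workaround is to read the file twice: count the leading numbers on the first pass, then rewind and print on the second. The following is only a sketch under a few assumptions: the input is a regular, seekable file (not a pipe), the helper name filter_repeated is made up for illustration, and lines without a leading number are silently skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two-pass variant: only the per-number counts are held in memory,
# never the lines themselves.
# filter_repeated() is a hypothetical helper name, not from the code above.
sub filter_repeated {
    my ($path, $out) = @_;
    my %count;

    open my $in, '<', $path or die "Cannot open $path: $!";

    # Pass 1: tally how often each leading number occurs.
    while (my $line = <$in>) {
        my ($number) = $line =~ m/^(\d+)/ or next;   # skip lines with no leading number
        $count{$number}++;
    }

    # Rewind; this only works on seekable input (an assumption of this sketch).
    seek $in, 0, 0 or die "Cannot rewind $path: $!";

    # Pass 2: print each line whose leading number was seen more than once.
    while (my $line = <$in>) {
        my ($number) = $line =~ m/^(\d+)/ or next;
        print {$out} $line if $count{$number} > 1;
    }

    close $in;
    return;
}

# Usage: perl script.pl input.txt
filter_repeated($ARGV[0], \*STDOUT) if @ARGV;
```

    The trade-off is reading the file from disk twice in exchange for memory use that grows with the number of distinct leading numbers rather than with the number of lines.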

    It'll also work no matter whether lines with the same number are separated by lines with different numbers or whether they're not (as in your sample data). Whether this is a feature or a bug only you can say.
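
    If you would rather treat only adjacent lines with the same number as a group, a streaming variant might look like the sketch below. It buffers just the current run of lines, so it also works on piped input. Note the assumptions: print_runs is a hypothetical helper name, numbers are compared as strings (so "01" and "1" differ, as with the hash keys in the code above), and a number that reappears later, after other numbers, starts a fresh run instead of being counted together with its earlier occurrences.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Streaming variant: groups only *consecutive* lines sharing a leading number.
# print_runs() is a hypothetical helper name, not from the code above.
sub print_runs {
    my ($in, $out) = @_;
    my ($current, @run);

    # Print the buffered run if it holds more than one line, then reset it.
    my $flush = sub {
        print {$out} @run if @run > 1;
        @run = ();
    };

    while (my $line = <$in>) {
        my ($number) = $line =~ m/^(\d+)/ or next;   # skip lines with no leading number
        $flush->() if defined $current && $number ne $current;
        $current = $number;
        push @run, $line;
    }
    $flush->();   # do not forget the final run
    return;
}

# Usage: perl script.pl input.txt
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print_runs($in, \*STDOUT);
    close $in;
}
```

    On the sample data both approaches print the same lines, since equal numbers are already adjacent there; they only differ once the same number shows up in separate places.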