ezekiel has asked for the wisdom of the Perl Monks concerning the following question:

I have a text file where each line starts with a string such as in this example:

    A1234 ... other text here ...
    A1234 ... other text here ...
    B1234 ... other text here ...
    C2133 ... other text here ...
    C2133 ... other text here ...
    C2133 ... other text here ...

Now what I want to do is count how many unique strings appear at the start of the line in the file. In the snippet above there are three: A1234, B1234, C2133.

I can do this by writing a script to loop over the rows, extract the string in question with a regular expression, keeping track of those I've seen as keys of a hash, and adding one to the count if I haven't seen it before.

However, this feels like something I should be able to do as a one-liner, but I can't think of how to do it.

Any suggestions?

Replies are listed 'Best First'.
Re: Counting lines starting with a unique string
by Aristotle (Chancellor) on Jun 18, 2002 at 05:04 UTC
    Well, you can.
    perl -ne '/^(.\d{4})/ && $seen{$1}++; END { print "$k seen $v times\n" while ($k,$v) = each %seen }'
    ____________
    Makeshifts last the longest.
Re: Counting lines starting with a unique string
by jmcnamara (Monsignor) on Jun 18, 2002 at 08:03 UTC

    Here are some crude one-liners:
    perl -ne 'END{print"$v\t$k\n"while($k,$v)=each%h} $h{substr$_,0,5}++' file
    perl -ne 'END{printf"%7d %s\n",$h{$_},$_ for keys%h} $h{substr$_,0,5}++' file
    perl -pe '$h{substr$_,0,5}++ }{ print"$h{$_}\t$_\n"for+keys%h' file

    If you have GNU uniq then you can do the following:
    sort file | uniq -w5 -c

    However, the -w option isn't part of classical uniq.

    --
    John.
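
    Where uniq -w is unavailable, one possible workaround is to trim each line to its prefix before de-duplicating. This is only a sketch: it assumes the prefix is exactly the first five characters, as the substr one-liners above do, and the sample file is invented for illustration:

```shell
# Hypothetical sample file, modelled on the example in the question.
file=$(mktemp)
printf '%s\n' \
  'A1234 other text' \
  'A1234 other text' \
  'B1234 other text' \
  'C2133 other text' \
  'C2133 other text' \
  'C2133 other text' > "$file"

# cut keeps the first five characters of each line, sort -u drops duplicates,
# wc -l counts the remaining lines; tr strips the padding some wc versions add.
unique=$(cut -c1-5 "$file" | sort -u | wc -l | tr -d ' ')
echo "$unique"
rm -f "$file"
```

    This yields the number of distinct prefixes rather than a count per prefix; replacing sort -u | wc -l with sort | uniq -c should give the per-prefix tally without needing -w.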

Re: Counting lines starting with a unique string
by thunders (Priest) on Jun 18, 2002 at 05:47 UTC
    Well, it's more than one line, but it should print the number of unique leading strings (untested):
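    open(my $fh, '<', 'whatever.log') or die "Can't open whatever.log: $!";
    my %seen;
    while (my $line = <$fh>) {
        my ($first) = $line =~ /^(\S+)/ or next;  # skip lines with no leading string
        $seen{$first}++;
    }
    print scalar(keys %seen), "\n";  # number of distinct leading strings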