Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I have a text file where some words/events occur more than once in the same line. I have to number them in order of appearance. I looked at a previous user's question about counting occurrences, but that approach produces a total number of occurrences for each word. What I need is this: given the line

doc123/print doc456/read doc789/print doc145/print doc123/read

the output should be:
doc123/print1 doc456/read1 doc789/print2 doc145/print3 doc123/read2

So I need to count each event (the same word) and rename it by appending a number that reflects the order in which it occurs.
I created a hash table where the key is the event and the value is the count. Before updating the count, I concatenated the value to the event in another variable and then joined the results; for each line the count starts at zero (0). But it doesn't work!!! >:(

Is it OK that I used hash tables, or is there another way, maybe with just regular expressions?
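
A minimal sketch of the hash-table approach described in the question, for comparison with the regex-based replies that follow. It assumes every token has the form doc/event and that tokens are separated by whitespace; the sub name number_events is hypothetical and not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Number each event within one line; the counts restart for every call,
# so each line is numbered independently.
sub number_events {
    my ($line) = @_;
    my %count;    # event name => occurrences seen so far on this line
    return join ' ', map {
        my ($doc, $event) = split m{/}, $_, 2;
        # Tokens without a '/' are passed through unchanged.
        defined $event ? "$doc/$event" . ++$count{$event} : $_;
    } split ' ', $line;
}

print number_events('doc123/print doc456/read doc789/print doc145/print doc123/read'), "\n";
# prints: doc123/print1 doc456/read1 doc789/print2 doc145/print3 doc123/read2
```

Because %count is declared inside the sub, nothing carries over from one line to the next.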

Thanks!!
  • Comment on Count occurrences and rename words in order

Replies are listed 'Best First'.
Re: Count occurrences and rename words in order
by davorg (Chancellor) on Sep 18, 2002 at 15:38 UTC
    #!/usr/bin/perl -w
    use strict;

    $_ = 'doc123/print doc456/read doc789/print doc145/print doc123/read';
    my %counts;
    s|(/\w+)|$1 . ++$counts{$1}|ge;
    print;

    Seems to give the right output for the sample you gave.

    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      If the data is in a file (say logfile) a one-liner could be used.
      $ perl -pe 's|(/\w+)|$1 . ++$counts{$1}|ge' logfile>countedlogfile
      $ cat logfile
      doc123/print doc456/read doc789/print doc145/print doc123/read
      $ cat countedlogfile
      doc123/print1 doc456/read1 doc789/print2 doc145/print3 doc123/read2

      --

      flounder

        Yeah, that works. It's unclear what the original poster wanted to do for multiple lines. I assumed that the counts should be cleared. In which case you'd do something like this:

        $ perl -pe '%counts=();s|(/\w+)|$1 . ++$counts{$1}|ge' logfile>countedlogfile
        --
        <http://www.dave.org.uk>

      Thanks Monk
      It works for the first line of the text file, but the events in the rest of the file continue the numbering from the first line, and the events on each line should be independent.
      But I will use it to count the events in a whole document!! :)

        You just need to reset %counts to be empty before processing each line.

        while (<INPUT>) {
            my %counts;
            s|(/\w+)|$1 . ++$counts{$1}|ge;
            print OUTPUT;
        }
        --
        <http://www.dave.org.uk>

Re: Count occurrences and rename words in order
by BrowserUk (Patriarch) on Sep 18, 2002 at 15:39 UTC

    Something like this?

    #! perl -sw
    my $data = 'doc123/print doc456/read doc789/print doc145/print doc123/read ';
    my @events = qw(print read);

    for (@events) {
        my $count = 1;
        $data =~ s/($_)/$1.$count++/ge;
    }
    print $data;

    __END__
    # Output
    C:\test>198864
    doc123/print1 doc456/read1 doc789/print2 doc145/print3 doc123/read2
    C:\test>

    Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!
      It was something like that. I have more than those two events, but it works!!

      Thank you very much
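
When there are more events than a fixed list can cover, one option is to collect the event names from the data itself before running the per-event loop. This is only a sketch under the assumption that an event is the \w+ run that follows a '/'; the sub name number_by_event_list is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the event list from the data, then number each event in turn.
sub number_by_event_list {
    my ($data) = @_;

    # Collect distinct event names in first-seen order.
    my (%seen, @events);
    for my $event ($data =~ m{/(\w+)}g) {
        push @events, $event unless $seen{$event}++;
    }

    for my $event (@events) {
        my $count = 1;
        # \b stops a short name from matching inside a longer one
        # (or inside a name that an earlier pass already numbered).
        $data =~ s{/\Q$event\E\b}{"/$event" . $count++}ge;
    }
    return $data;
}

print number_by_event_list('doc123/print doc456/read doc789/print doc145/print doc123/read'), "\n";
# prints: doc123/print1 doc456/read1 doc789/print2 doc145/print3 doc123/read2
```

This makes one pass over the string per distinct event, so the single-substitution hash version is likely the better choice for long lines.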
Re: Count occurrences and rename words in order
by Molt (Chaplain) on Sep 18, 2002 at 15:40 UTC

    Not quite sure what this is needed for, but the following seems to meet your requirements well enough.

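    #!/usr/bin/perl
    use warnings;
    use strict;

    my $test = 'doc123/print doc456/read doc789/print doc145/print doc123/read';
    my %count;

    # After splitting on '/', each segment starts with an event name.
    # (\s|$) also matches at the end of the string, so the final event
    # ("read" here, with no trailing whitespace) is numbered as well.
    print join '/',
        grep { s/^([a-z]+)(\s|$)/$1 . ++$count{$1} . $2/ie || $_ }
        split '/', $test;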
      Thanks for your help, it works!! Just one question: why do you need to check the beginning of the string?
      I need this to reschedule the events according to the document (DOC) parameters.

        After the split on '/', each part is tested individually. Since it looks like the command bit has to start at the beginning of the segment, and because I like to make my regexps as specific and safe as possible, I added the ^ anchor there.
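
To make the segments concrete, here is a small sketch that prints what split '/' produces and which segments begin with an event name. The helper leading_event and its (?=\s|$) lookahead are assumptions of this sketch, not part of the reply above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the event name at the start of a segment, or undef if the
# segment does not begin with a purely alphabetic word.
sub leading_event {
    my ($segment) = @_;
    return $segment =~ /^([a-z]+)(?=\s|$)/i ? $1 : undef;
}

my $test = 'doc123/print doc456/read doc789/print';
for my $segment (split '/', $test) {
    my $event = leading_event($segment);
    print "[$segment] => ", (defined $event ? $event : '(no event)'), "\n";
}
# prints:
# [doc123] => (no event)
# [print doc456] => print
# [read doc789] => read
# [print] => print
```

Only the first segment lacks a leading event, because it holds the first document name on its own; every later segment starts with the event that belonged to the previous document.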