icg has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I have an input file with records like:
A 100 B 200 B 300 C 400 D 500 C 600 D 700 ..etc.
A,B,C,D,.. are tags and 100, 200, etc. are the data. Every time a tag is encountered, a unique sequence number needs to be generated for it. For example:
A 100 -> seqno = 1, B 200 -> seqno = 1, B 300 -> seqno = 2, C 400 -> seqno = 1, D 500 -> seqno = 1, C 600 -> seqno = 2, D 700 -> seqno = 2, etc.
Can anyone give a solution?

Thank you, Gowtham

Janitored by holli - fixed formatting

Replies are listed 'Best First'.
Re: Generating sequence nos. for data
by merlyn (Sage) on Jun 07, 2005 at 12:30 UTC
    If I understand you, you want to enumerate each occurrence of the first column of your data (the alphabetic letter). Maybe something like:
        my %sequence;
        while (<>) {
            my @line = split;
            my $sequence = ++$sequence{$line[0]};
            print "@line $sequence\n";
        }

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      Thank you for the reply. If the tags are in continuous order, generating the sequence number would be easy, for example A 100 A 200 A 300: each time A is encountered, the sequence number can be incremented. However, if a series of tags repeats, for example A 100 B 100 A 200 B 300, then for the first occurrence of A and B the sequence number should be 1, and for the next occurrence it should be incremented. In other words, whenever a tag is repeated in the input file, its sequence number should be incremented. The repetition of the tags is not guaranteed; they may or may not occur again. Thank you, Gowtham
        Yes, that's exactly what my code does, presuming "A 100" is on one line, and "B 100" is on the next. You aren't formatting it more clearly than that, so I'm not sure what you mean unless you say more.

        -- Randal L. Schwartz, Perl hacker
        Be sure to read my standard disclaimer if this is a reply.

Re: Generating sequence nos. for data
by salva (Canon) on Jun 07, 2005 at 12:33 UTC
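    use strict;
    use warnings;

    # Assumption: @data holds the records, one "tag value" pair per element;
    # here it is filled from STDIN or the files named on the command line.
    chomp( my @data = <> );

    my %seq;
    for (@data) {
        my ($tag) = split;
        no warnings 'uninitialized';
        my $seq = ++$seq{$tag};
        print "$_ -> seqno = $seq\n";
    }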
Re: Generating sequence nos. for data
by mikeraz (Friar) on Jun 07, 2005 at 12:54 UTC

    I believe what you could use is a hash of arrays. Each hash key would be a tag (A, B, C, ...), and each array would hold the values pushed for that tag, with the array indices (plus one) being the sequence numbers you're after.

    If I'm understanding you correctly, this issue was recently discussed in node464229. They do a fine job describing how to do this.
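    As a rough illustration of the hash-of-arrays idea, here is a minimal sketch. The sample record and the names %values_for and @report are made up for the example; it assumes the fields alternate tag, value, tag, value on one line.

```perl
use strict;
use warnings;

# Hypothetical sample record: alternating tag/value fields on one line.
my @fields = split ' ', 'A 100 B 200 B 300 C 400 D 500 C 600 D 700';

# Hash of arrays: each tag maps to the list of values seen for it, in order.
my %values_for;
while ( my ( $tag, $value ) = splice @fields, 0, 2 ) {
    push @{ $values_for{$tag} }, $value;
}

# The sequence number of a value is its array index plus one.
my @report;
for my $tag ( sort keys %values_for ) {
    my $values = $values_for{$tag};
    for my $i ( 0 .. $#$values ) {
        push @report, "$tag $values->[$i] -> seqno = " . ( $i + 1 );
    }
}
print "$_\n" for @report;

# Keeping the values also allows a lookup by tag and sequence number,
# e.g. the value with seqno 2 for tag B sits at index 1.
my $second_b = $values_for{B}[1];
print "B, seqno 2: $second_b\n";
```

    Note that this prints the results grouped by tag rather than in input order; if input order matters, a plain counter hash is probably the simpler tool.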

    Be Appropriate && Follow Your Curiosity
Re: Generating sequence nos. for data
by graff (Chancellor) on Jun 08, 2005 at 04:20 UTC
    It looks like most of the earlier replies have missed your point that each line of input contains multiple "tag value" pairs. So you need a loop over each input line:
    my %sequences;    # this will be a hash of arrays
    while (<DATA>) {
        my @fields = split;
        for ( my $i = 0; $i < @fields; $i += 2 ) {
            push @{ $sequences{ $fields[$i] } }, $fields[ $i + 1 ];
        }
    }
    for my $tag ( sort keys %sequences ) {
        for my $i ( 0 .. $#{ $sequences{$tag} } ) {
            printf( "%s [ %d ] : %s\n", $tag, $i + 1, $sequences{$tag}[$i] );
        }
    }
    __DATA__
    A 100 B 200 C 400 A 150 C 250 D 550 B 350
    A 200 B 300 C 500 A 600 B 700 C 800 D 900
    Now, you didn't actually say how you want to handle the cases where the same tag occurs on multiple lines of input (like in the sample data provided here): should the index numbers reset to 1 for each line, or should they increment continuously over the entire input stream (as done here)?
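    To make the two readings concrete, here is a small sketch that supports both. The helper number_pairs() and its $reset_per_line flag are hypothetical names invented for this example, and the two input lines are made-up data.

```perl
use strict;
use warnings;

# number_pairs() is a hypothetical helper, not something from the thread.
#   $reset_per_line true  => counters restart at 1 on every input line
#   $reset_per_line false => counters keep running across the whole input
sub number_pairs {
    my ( $reset_per_line, @lines ) = @_;
    my ( %count, @out );
    for my $line (@lines) {
        %count = () if $reset_per_line;
        my @fields = split ' ', $line;
        while ( my ( $tag, $value ) = splice @fields, 0, 2 ) {
            push @out, "$tag $value -> seqno = " . ++$count{$tag};
        }
    }
    return @out;
}

my @lines = ( 'A 100 B 200 A 150', 'A 200 B 300' );    # made-up sample

my @continuous = number_pairs( 0, @lines );
my @per_line   = number_pairs( 1, @lines );

print "continuous: $_\n" for @continuous;
print "per line:   $_\n" for @per_line;
```

    With the continuous reading, "A 200" on the second line gets seqno 3; with the per-line reading it gets seqno 1 again.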

    Some of the stuff you said in a sub-reply to merlyn seemed not to make sense:

    A $$ indicates the start of the record. The record is split on the tag. A hash is populated with key being the tag and value being the data. Some fields within a record repeat in a well defined manner.
    If I understand that, splitting the record into a hash would be completely wrong: it only lets you keep one value for each distinct tag on a line, no matter how many times that tag appears.
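    A tiny sketch of that pitfall, using a made-up record: when the flat tag/value list is assigned straight to a hash, a repeated tag overwrites the earlier value. The name %last_value is invented for the example.

```perl
use strict;
use warnings;

# Made-up record in which tag B appears twice.
my @fields = split ' ', 'A 100 B 200 B 300';

# Assigning the flat tag/value list to a hash: a later pair with the
# same tag silently replaces the earlier one, so "B 200" is lost.
my %last_value = @fields;

print "$_ => $last_value{$_}\n" for sort keys %last_value;
```

    Only two keys survive here, and B holds 300, which is why a counter or an array per tag is needed instead.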

    I hope the snippet provided here makes sense -- note that it keeps all the values for each tag in an array that is stored as the value of the hash element for that tag. Then, we just use the array index (plus 1) to provide the sequence numbers that you want for each of the values.

Re: Generating sequence nos. for data
by TedPride (Priest) on Jun 07, 2005 at 17:55 UTC
    use strict;
    use warnings;

    my %c;
    while (<DATA>) {
        chomp;
        @_ = split / /, $_, 2;
        print "$_ -> seqno = " . ++$c{$_[0]} . "\n";
    }
    __DATA__
    A 100
    B 200
    B 300
    C 400
    D 500
    C 600
    D 700