Generating sequence nos. for data

icg has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Generating sequence nos. for data by merlyn (Sage) on Jun 07, 2005 at 12:30 UTC
If I understand you, you want to enumerate each occurrence of the first column of your data (the alphabetic letter). Maybe something like: `my %sequence; while (<>) { my @line = split; my $sequence = ++$sequence{$line[0]}; print "@line $sequence\n"; }` -- Randal L. Schwartz, Perl hacker Be sure to read my standard disclaimer if this is a reply.
Re^2: Generating sequence nos. for data by icg (Acolyte) on Jun 07, 2005 at 12:58 UTC
Thank you for the reply. If the tags are in continuous order, generating the sequence number is easy. For example, with A 100 A 200 A 300, the sequence number can be incremented each time A is encountered. However, if a series of tags repeats, for example A 100 B 100 A 200 B 300, then the sequence number should be 1 for the first occurrence of A and of B, and should be incremented for each later occurrence. In other words, whenever a tag is repeated in the input file, its sequence number should be incremented. Repetition of the tags is not guaranteed; it may or may not occur. Thank you, Gowtham
Re^3: Generating sequence nos. for data by merlyn (Sage) on Jun 07, 2005 at 13:03 UTC
Yes, that's exactly what my code does, presuming "A 100" is on one line, and "B 100" is on the next. You haven't formatted it any more clearly than that, so I'm not sure what you mean unless you say more.
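To make the claim above concrete, here is a minimal sketch that runs the same per-tag counter over the interleaved example from the question (A 100, B 100, A 200, B 300), assuming one "tag value" pair per line. The helper name `number_lines` is made up for illustration and is not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes a list of "TAG VALUE"
# lines and returns each line with its per-tag sequence number appended.
sub number_lines {
    my @lines = @_;
    my %sequence;    # tag => number of times the tag has been seen so far
    my @out;
    for my $line (@lines) {
        my @fields = split ' ', $line;
        next unless @fields;                  # skip blank lines
        my $n = ++$sequence{ $fields[0] };    # first sighting of a tag yields 1
        push @out, "@fields $n";
    }
    return @out;
}

print "$_\n" for number_lines( 'A 100', 'B 100', 'A 200', 'B 300' );
# prints:
# A 100 1
# B 100 1
# A 200 2
# B 300 2
```

Each tag gets its own counter in the hash, so interleaved tags do not disturb each other's numbering.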
Re^4: Generating sequence nos. for data by icg (Acolyte) on Jun 07, 2005 at 13:22 UTC
Re: Generating sequence nos. for data by salva (Canon) on Jun 07, 2005 at 12:33 UTC
`use strict; use warnings; my %seq; for (@data) { my ($tag) = split; no warnings 'uninitialized'; my $seq = ++$seq{$tag} ; print "$_ -> seqno = $seq\n"; }`
Re^2: Generating sequence nos. for data by merlyn (Sage) on Jun 07, 2005 at 13:05 UTC
`no warnings 'uninitialized';` What's that doing in there? Incrementing from undef to 1 is never a warnable offense.
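A small sketch of that point, assuming a reasonably modern Perl 5: with every warning promoted to a fatal error, pre-incrementing an undefined hash element still succeeds, whereas ordinary addition on the same kind of undefined value does not. The variable names are illustrative only.

```perl
use strict;
use warnings FATAL => 'all';    # promote every warning to a fatal error

my %seq;
my $first = ++$seq{A};          # $seq{A} starts out undef; no warning is raised
print "first = $first\n";       # prints "first = 1"

# By contrast, ordinary addition on an undefined value does warn,
# which under FATAL warnings makes the eval block die.
my %other;
my $ok = eval { my $sum = $other{A} + 1; 1 };
print "addition on undef warned: ", ( $ok ? "no" : "yes" ), "\n";
# prints "addition on undef warned: yes"
```

So the `no warnings 'uninitialized'` line is not needed for the counter itself.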
Re: Generating sequence nos. for data by mikeraz (Friar) on Jun 07, 2005 at 12:54 UTC
I believe what you could use is a hash of arrays. Each hash key would be the tag (A, B, C, ...), and each array would hold the values pushed for that tag, with the array indices giving the sequence numbers you're after. If I'm understanding you correctly, this issue was recently discussed in node464229. They do a fine job describing how to do this. Be Appropriate && Follow Your Curiosity
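A rough sketch of the hash-of-arrays idea described above, assuming one "tag value" pair per input element. The sample data and the name `%values_for` are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: one "TAG VALUE" pair per element.
my @pairs = ( 'A 100', 'B 200', 'A 300' );

my %values_for;    # tag => array ref of values, in order of appearance
for my $pair (@pairs) {
    my ( $tag, $value ) = split ' ', $pair;
    push @{ $values_for{$tag} }, $value;    # the array is created on first push
}

# Array indices start at 0, so the sequence number is index + 1.
for my $tag ( sort keys %values_for ) {
    my $values = $values_for{$tag};
    for my $i ( 0 .. $#$values ) {
        printf "%s %s %d\n", $tag, $values->[$i], $i + 1;
    }
}
# prints:
# A 100 1
# A 300 2
# B 200 1
```

One trade-off of this layout: the output comes out grouped by tag rather than in the original input order, so a plain per-tag counter may be the better fit if input order has to be preserved.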
Re: Generating sequence nos. for data by graff (Chancellor) on Jun 08, 2005 at 04:20 UTC
It looks like most of the earlier replies have missed your point that each line of input contains multiple "tag value" pairs. So you need a loop over each input line, collecting the values into a hash of arrays: `my %sequences; while (<DATA>) { my @fields = split; for ( my $i=0; $i<@fields; $i+=2 ) { push @{$sequences{$fields[$i]}}, $fields[$i+1]; } } for my $tag ( sort keys %sequences ) { my $i = 0; for my $val ( @{$sequences{$tag}} ) { printf( "%s [ %d ] : %s\n", $tag, $i+1, $sequences{$tag}[$i++] ); } } __DATA__ A 100 B 200 C 400 A 150 C 250 D 550 B 350 A 200 B 300 C 500 A 600 B 700 C 800 D 900`
Now, you didn't actually say how you want to handle the cases where the same tag occurs on multiple lines of input (as in the sample data provided here): should the index numbers reset to 1 for each line, or should they increment continuously over the entire input stream (as done here)?
Some of the stuff you said in a sub-reply to merlyn seemed not to make sense: "A $$ indicates the start of the record. The record is split on the tag. A hash is populated with key being the tag and value being the data. Some fields within a record repeat in a well defined manner." If I understand that, splitting the record into a hash would be completely wrong: it only lets you keep one value for each distinct tag on a line, no matter how many times that tag appears.
I hope the snippet provided here makes sense. Note that it keeps all the values for each tag in an array that is stored as the value of the hash element for that tag. Then we just use the array index (plus 1) to provide the sequence numbers that you want for each of the values.
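For the other reading of the question, where numbering starts again at 1 on every input line, a sketch might look like the following. It assumes each line is a whitespace-separated run of "tag value" pairs; the helper name `number_pairs_in_line` and the sample line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: numbers the "tag value" pairs found on ONE line,
# starting again from 1 for every line it is given.
sub number_pairs_in_line {
    my ($line) = @_;
    my @fields = split ' ', $line;
    my %seen;    # fresh for each call, so the counts reset per line
    my @out;
    # Step through the fields two at a time; a trailing unpaired tag is ignored.
    for ( my $i = 0 ; $i + 1 < @fields ; $i += 2 ) {
        my ( $tag, $value ) = @fields[ $i, $i + 1 ];
        push @out, sprintf '%s [ %d ] : %s', $tag, ++$seen{$tag}, $value;
    }
    return @out;
}

print "$_\n" for number_pairs_in_line('A 100 B 200 A 150');
# prints:
# A [ 1 ] : 100
# B [ 1 ] : 200
# A [ 2 ] : 150
```

Moving `%seen` outside the sub would give the continuous numbering instead, so the choice between the two behaviours comes down to where the counter hash is declared.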
Re: Generating sequence nos. for data by TedPride (Priest) on Jun 07, 2005 at 17:55 UTC
`use strict; use warnings; my %c; while (<DATA>) { chomp; @_ = split / /, $_, 2; print "$_ -> seqno = " . ++$c{$_[0]} . "\n"; } __DATA__ A 100 B 200 B 300 C 400 D 500 C 600 D 700`