in reply to Tagging the last elements

Here is my "go" at this (below)....The problem statement is tough and requires some strange looking code.

The data is read into an AoA and also the "ordinal" number before the combination of the "name and code" is also read into a hash table, a HoA with ordinal numbers. "ordinal number" here is a misnomer and is an artifact of first approach to the code. I expect the OP to rename things.

As it turns out, only the combination of the name and code at the end of line is unique. flintoff has JJ113952 and peter also has JJ113952.

The ordinal numbers for each combination of name/code are sorted. I also showed how to sort the original DB with a "weird sort order", but I'm not sure this is necessary and of course this can be modified to get all kinds of orders.

To print the data, if the name/code como is unique, this is a singleton. To print "end" token, there are "gaps" in what I called the ordinal numbers, so I use the maximum one what was seen. This is the -1 index.

#!/usr/bin/perl -w use strict; use Data::Dumper; my @DB; my %hash; while (<DATA>) { my($n1,$n2,$letter,$ordinal,$name,$code) = (my @record = split(/[\s,]+/,$_)); push (@DB, [@record]); push (@{$hash{"$name$code"}},$ordinal); } foreach my $name_code (keys %hash) { @{$hash{$name_code}}= sort{$a<=>$b} @{$hash{$name_code}}; } @DB = sort by_weird_sort_order @DB; #optional sort foreach my $line (@DB) { my($n1,$n2,$letter,$ordinal,$name,$code) = @$line; $ordinal = "single" if @{$hash{"$name$code"}} == 1; $ordinal = "end" if ( @{$hash{"$name$code"}}>1 and (@{$hash{"$name$code"}})[-1] == $ordinal ); printf STDOUT ("%-7s %-7s %-3s %s,%s,%s\n", $n1,$n2,$letter,$ordinal,$name,$code); } sub by_weird_sort_order { my ($a_ordinal,$a_name) = (@$a)[3,4]; my ($b_ordinal,$b_name) = (@$b)[3,4]; $a_name cmp $b_name or $b_ordinal <=> $a_ordinal } #PRINTS: #62556 63635 y single,andrew,JJ113954 #126185 126699 s single,austin,JJ113956 #441474 441538 b end,catherine,JJ029490 #442666 442843 9 8,catherine,JJ029490 #445778 445905 0 7,catherine,JJ029490 #446059 446273 l 6,catherine,JJ029490 #450319 450379 f 5,catherine,JJ029490 #81099 81630 y single,flintoff,JJ113952 #68766 69005 j end,morgan,JJ113955 #63868 63897 h 1,morgan,JJ113955 #158146 158367 i end,peter,JJ113952 #135588 136297 8 2,peter,JJ113952 #135356 135449 3 1,peter,JJ113952 __DATA__ 62556 63635 y 1,andrew,JJ113954 63868 63897 h 1,morgan,JJ113955 68766 69005 j 2,morgan,JJ113955 81099 81630 y 1,flintoff,JJ113952 126185 126699 s 1,austin,JJ113956 135356 135449 3 1,peter,JJ113952 135588 136297 8 2,peter,JJ113952 158146 158367 i 3,peter,JJ113952 441474 441538 b 9,catherine,JJ029490 442666 442843 9 8,catherine,JJ029490 445778 445905 0 7,catherine,JJ029490 446059 446273 l 6,catherine,JJ029490 450319 450379 f 5,catherine,JJ029490

Replies are listed 'Best First'.
New Problem
by crochunter (Novice) on Jul 28, 2009 at 12:38 UTC
    Thanks a lot guys ;) After few more modifications it worked... Now I have another problem and below is my code and how I want to get the output.I want to give serial number based on the value1 and value2.
    Here is my code
    #!/usr/bin/perl my %query_score; while ( <DATA> ) { chomp; ($value1,$value2,$Mark,$Name,$Country) = split(/\t/,$_); push( @{ $query_score{"$Name:$Country"}{position} },$value2); $query_score{"$Name:$Country"}{Mark} = $Mark; $query_score{"$Name:$Country"}{Start} = $value1; } foreach $key ( sort keys %query_score ) { ($Name,$Country) = split(/:/,$key); @positions = sort @{ $query_score{$key}{position} }; $Mark = $query_score{$key}{Mark}; $value1 = $query_score{$key}{Start}; $min = shift(@positions); $max = pop(@positions); print("$value1\t$value2\t$Mark\t$Country\t$min\t$max\n"); } __DATA__ 532 1148 a andrew2 Norway 1547 1573 b mathew3 US 2013 2190 c mathew US 2096 2158 d mathew US 2896 2980 e docker5 UK 3919 4622 f king4 Aus 4180 4353 g king Aus 6621 6758 h lover4 Canada 7475 7568 i nun8 Mexico 7645 7725 j brazil9 Brazil 7817 8008 k brazil9 Brazil 8172 8309 l brazil9 Brazil 8399 8536 m brazil9 Brazil
    I am getting an OUTPUT like this:-
    3919 4622 f king4 Aus 8536 8399 8536 m Brazil 7725 8536 4180 8536 g Aus 4353 6621 8536 h Canada 6758 7475 8536 i Mexico 7568
    BUT I want my output to be like this:-
    Value1 Value2 Mark Name Country SerialNo 532 1148 a andrew2 Norway start 1547 1573 b mathew3 US start 2013 2190 c mathew US between 2096 2158 d mathew US end 2896 2980 e docker5 UK start 3919 4622 f king4 Aus start 4180 4353 g king Aus start 6621 6758 h lover4 Canada start 7475 7568 i nun8 Mexico start 7645 7725 j brazil9 Brazil start 7817 8008 k brazil9 Brazil between 8172 8309 l brazil9 Brazil between 8399 8536 m brazil9 Brazil end
    Thanks in advance
      crochunter:

      Since your example code is small enough, you might try using the debugger to step through it and see what part of the code is making the values disappear. Alternatively, you could use Data::Dumper (or equivalent) to print the data structure in various locations and see what matches your expectations and find where your expectations are violated. This won't be very painful, and it's very helpful in learning the language better.

      ...roboticus
      I'm guessing that the whitespace separating the fields on each line of input may be variable in nature -- not just a single "\t" every time (e.g. sometimes it may be tab preceded and/or followed by spaces, and sometimes it may be just spaces with no tab).

      That's why I suggested the unadorned split for breaking up the input line into fields. That is equivalent to

      split(" ",$_)
      (note the quoted space, not a regex), which says "ignore leading white space in the string, and return the list of strings separated by any amount of any kind of white space."

      If some of your field values are expected to contain a space now and then, and your field separation is variable (not just a single "\t" every time), then you've got a problem with unparsable data, and you need to fix that first.

      (updated to fix formatting)

        The default split is: split (/\s+/,$_); or split (' ',$_);.

        Correction as per graff: split ' ',$_ will split on whitespace. I alway put a regex in there, but this alternate syntax is completely legal. This a bit different than the above split(" ",$_);. First, split takes a regex as the pattern and not a char string, so I'm not sure that " " even works.

        Anyway, splitting on a single space (or tab) is not the same as splitting on a sequence of the whitespace characters. The whitespace family has 5 chars: \s\f\r\n\t. /\s+/ will split on any of them. Since you can't actually see a whitespace char, "is that one space, two spaces or a tab" or whatever can be problematic.

        An interesting thing about this is when processing normal test lines, there is no need to "chomp" when using /\s+/ because \n is one of the split characters.

      First, you should be running with warnings and strict!
      #!/usr/bin/perl -w use strict;
      This provides HUGE clues as to what might be wrong!

      I think you'll find that the /\s+/ hint by graff is needed and also consider:

      $min = shift(@positions); $max = pop(@positions);
      What happens if min and max are the same? i.e. just one position?
      $max = (@positions)[-1]; $min = (@positions)[0];
      will handle that situation.

      Update: a small update, also keep in mind that list slice allows multiple values to the left hand side, my ($min,$max) = (@positions)[0,-1]; would work also. The -1 index means the last one in the array, -2 would be second to last etc. But FAR AND AWAY, the best thing you can do to improve your code is religious use of warnings and strict!