Re: Tagging the last elements

Here is my "go" at this (below)....The problem statement is tough and requires some strange looking code.

The data is read into an AoA and also the "ordinal" number before the combination of the "name and code" is also read into a hash table, a HoA with ordinal numbers. "ordinal number" here is a misnomer and is an artifact of first approach to the code. I expect the OP to rename things.

As it turns out, only the combination of the name and code at the end of line is unique. flintoff has JJ113952 and peter also has JJ113952.

The ordinal numbers for each combination of name/code are sorted. I also showed how to sort the original DB with a "weird sort order", but I'm not sure this is necessary and of course this can be modified to get all kinds of orders.

To print the data, if the name/code como is unique, this is a singleton. To print "end" token, there are "gaps" in what I called the ordinal numbers, so I use the maximum one what was seen. This is the -1 index.

#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my @DB;
my %hash;

while (<DATA>)
{
  my($n1,$n2,$letter,$ordinal,$name,$code) = 
              (my @record = split(/[\s,]+/,$_));
  push (@DB, [@record]);
  push (@{$hash{"$name$code"}},$ordinal);
}

foreach my $name_code (keys %hash)
{
  @{$hash{$name_code}}= sort{$a<=>$b} @{$hash{$name_code}};
}

@DB = sort by_weird_sort_order @DB; #optional sort

foreach my $line (@DB)
{
   my($n1,$n2,$letter,$ordinal,$name,$code) = @$line;
   $ordinal = "single" if @{$hash{"$name$code"}} == 1;
   $ordinal = "end" 
                 if (     @{$hash{"$name$code"}}>1 
                   and (@{$hash{"$name$code"}})[-1] == $ordinal
                    );
   printf STDOUT ("%-7s %-7s %-3s %s,%s,%s\n",
                 $n1,$n2,$letter,$ordinal,$name,$code);
}

sub by_weird_sort_order
{
   my ($a_ordinal,$a_name) = (@$a)[3,4];
   my ($b_ordinal,$b_name) = (@$b)[3,4];
        
         $a_name  cmp $b_name
                  or
        $b_ordinal <=> $a_ordinal                  
}

#PRINTS:
#62556   63635   y   single,andrew,JJ113954
#126185  126699  s   single,austin,JJ113956
#441474  441538  b   end,catherine,JJ029490
#442666  442843  9   8,catherine,JJ029490
#445778  445905  0   7,catherine,JJ029490
#446059  446273  l   6,catherine,JJ029490
#450319  450379  f   5,catherine,JJ029490
#81099   81630   y   single,flintoff,JJ113952
#68766   69005   j   end,morgan,JJ113955
#63868   63897   h   1,morgan,JJ113955
#158146  158367  i   end,peter,JJ113952
#135588  136297  8   2,peter,JJ113952
#135356  135449  3   1,peter,JJ113952

__DATA__
62556    63635    y    1,andrew,JJ113954
63868    63897    h    1,morgan,JJ113955
68766    69005    j    2,morgan,JJ113955
81099    81630    y    1,flintoff,JJ113952
126185    126699    s    1,austin,JJ113956
135356    135449    3    1,peter,JJ113952
135588    136297    8    2,peter,JJ113952
158146    158367    i    3,peter,JJ113952
441474    441538    b    9,catherine,JJ029490
442666    442843    9    8,catherine,JJ029490
445778    445905    0    7,catherine,JJ029490
446059    446273    l    6,catherine,JJ029490
450319    450379    f    5,catherine,JJ029490
[download]

Comment on Re: Tagging the last elements Download Code

Replies are listed 'Best First'.
New Problem by crochunter (Novice) on Jul 28, 2009 at 12:38 UTC
Thanks a lot guys ;) After few more modifications it worked... Now I have another problem and below is my code and how I want to get the output.I want to give serial number based on the value1 and value2. Here is my code #!/usr/bin/perl my %query_score; while ( <DATA> ) { chomp; ($value1,$value2,$Mark,$Name,$Country) = split(/\t/,$_); push( @{ $query_score{"$Name:$Country"}{position} },$value2); $query_score{"$Name:$Country"}{Mark} = $Mark; $query_score{"$Name:$Country"}{Start} = $value1; } foreach $key ( sort keys %query_score ) { ($Name,$Country) = split(/:/,$key); @positions = sort @{ $query_score{$key}{position} }; $Mark = $query_score{$key}{Mark}; $value1 = $query_score{$key}{Start}; $min = shift(@positions); $max = pop(@positions); print("$value1\t$value2\t$Mark\t$Country\t$min\t$max\n"); } __DATA__ 532 1148 a andrew2 Norway 1547 1573 b mathew3 US 2013 2190 c mathew US 2096 2158 d mathew US 2896 2980 e docker5 UK 3919 4622 f king4 Aus 4180 4353 g king Aus 6621 6758 h lover4 Canada 7475 7568 i nun8 Mexico 7645 7725 j brazil9 Brazil 7817 8008 k brazil9 Brazil 8172 8309 l brazil9 Brazil 8399 8536 m brazil9 Brazil [download] I am getting an OUTPUT like this:- `3919 4622 f king4 Aus 8536 8399 8536 m Brazil 7725 8536 4180 8536 g Aus 4353 6621 8536 h Canada 6758 7475 8536 i Mexico 7568` [download] BUT I want my output to be like this:- `Value1 Value2 Mark Name Country SerialNo 532 1148 a andrew2 Norway start 1547 1573 b mathew3 US start 2013 2190 c mathew US between 2096 2158 d mathew US end 2896 2980 e docker5 UK start 3919 4622 f king4 Aus start 4180 4353 g king Aus start 6621 6758 h lover4 Canada start 7475 7568 i nun8 Mexico start 7645 7725 j brazil9 Brazil start 7817 8008 k brazil9 Brazil between 8172 8309 l brazil9 Brazil between 8399 8536 m brazil9 Brazil end` [download] Thanks in advance	[reply] [d/l] [select]
Re: New Problem by roboticus (Chancellor) on Jul 28, 2009 at 12:55 UTC
crochunter: Since your example code is small enough, you might try using the debugger to step through it and see what part of the code is making the values disappear. Alternatively, you could use Data::Dumper (or equivalent) to print the data structure in various locations and see what matches your expectations and find where your expectations are violated. This won't be very painful, and it's very helpful in learning the language better. ...roboticus	[reply]
Re: New Problem by graff (Chancellor) on Jul 28, 2009 at 13:08 UTC
I'm guessing that the whitespace separating the fields on each line of input may be variable in nature -- not just a single "\t" every time (e.g. sometimes it may be tab preceded and/or followed by spaces, and sometimes it may be just spaces with no tab). That's why I suggested the unadorned split for breaking up the input line into fields. That is equivalent to `split(" ",$_)` [download] (note the quoted space, not a regex), which says "ignore leading white space in the string, and return the list of strings separated by any amount of any kind of white space." If some of your field values are expected to contain a space now and then, and your field separation is variable (not just a single "\t" every time), then you've got a problem with unparsable data, and you need to fix that first. (updated to fix formatting)	[reply] [d/l]
Re^2: New Problem by Marshall (Canon) on Jul 30, 2009 at 04:32 UTC
The default split is: `split (/\s+/,$_); or split (' ',$_);`. Correction as per graff: split ' ',$_ will split on whitespace. I alway put a regex in there, but this alternate syntax is completely legal. ~~This a bit different than the above `split(" ",$_);`. First, split takes a regex as the pattern and not a char string, so I'm not sure that " " even works.~~ Anyway, splitting on a single space (or tab) is not the same as splitting on a sequence of the whitespace characters. The whitespace family has 5 chars: \s\f\r\n\t. /\s+/ will split on any of them. Since you can't actually see a whitespace char, "is that one space, two spaces or a tab" or whatever can be problematic. An interesting thing about this is when processing normal test lines, there is no need to "chomp" when using /\s+/ because \n is one of the split characters.	[reply] [d/l] [select]
Re^3: New Problem by graff (Chancellor) on Jul 30, 2009 at 04:47 UTC
Re: New Problem by Marshall (Canon) on Jul 30, 2009 at 04:50 UTC
First, you should be running with warnings and strict! `#!/usr/bin/perl -w use strict;` [download] This provides HUGE clues as to what might be wrong! I think you'll find that the /\s+/ hint by graff is needed and also consider: `$min = shift(@positions); $max = pop(@positions);` [download] What happens if min and max are the same? i.e. just one position? `$max = (@positions)[-1]; $min = (@positions)[0];` [download] will handle that situation. Update: a small update, also keep in mind that list slice allows multiple values to the left hand side, `my ($min,$max) = (@positions)[0,-1];` would work also. The -1 index means the last one in the array, -2 would be second to last etc. But FAR AND AWAY, the best thing you can do to improve your code is religious use of warnings and strict!	[reply] [d/l] [select]