New Problem

Thanks a lot, guys ;) After a few more modifications it worked. Now I have another problem; below are my code and the output I want. I want to assign a serial number based on value1 and value2.
Here is my code:

#!/usr/bin/perl

my %query_score;
while ( <DATA> )

{
    chomp;
    ($value1,$value2,$Mark,$Name,$Country) = split(/\t/,$_);
    push( @{ $query_score{"$Name:$Country"}{position} },$value2);
    $query_score{"$Name:$Country"}{Mark} = $Mark;
    $query_score{"$Name:$Country"}{Start} = $value1;
   
}
foreach $key ( sort keys %query_score )
{
    ($Name,$Country) = split(/:/,$key);
    @positions =  sort @{ $query_score{$key}{position} };
    $Mark = $query_score{$key}{Mark};
    $value1 = $query_score{$key}{Start};
    $min = shift(@positions);
    $max = pop(@positions);
    print("$value1\t$value2\t$Mark\t$Country\t$min\t$max\n");
   
}


__DATA__
532     1148    a       andrew2  Norway
1547    1573    b       mathew3  US
2013    2190    c       mathew  US
2096    2158    d       mathew  US
2896    2980    e       docker5   UK
3919    4622    f       king4  Aus
4180    4353    g    king    Aus
6621    6758    h    lover4    Canada
7475    7568    i    nun8    Mexico
7645    7725    j    brazil9    Brazil
7817    8008    k    brazil9    Brazil
8172    8309    l    brazil9    Brazil
8399    8536    m    brazil9    Brazil

I am getting output like this:

3919    4622    f       king4  Aus    8536                
8399    8536    m    Brazil    7725    8536
4180    8536    g    Aus    4353    
6621    8536    h    Canada    6758    
7475    8536    i    Mexico    7568

But I want my output to be like this:

Value1  Value2  Mark   Name    Country    SerialNo
532     1148    a       andrew2  Norway   start
1547    1573    b       mathew3  US       start
2013    2190    c       mathew  US        between
2096    2158    d       mathew  US        end
2896    2980    e       docker5   UK      start
3919    4622    f       king4  Aus        start
4180    4353    g    king    Aus       start
6621    6758    h    lover4    Canada    start
7475    7568    i    nun8    Mexico    start
7645    7725    j    brazil9    Brazil    start
7817    8008    k    brazil9    Brazil    between
8172    8309    l    brazil9    Brazil    between
8399    8536    m    brazil9    Brazil    end

Thanks in advance

Re: New Problem by roboticus (Chancellor) on Jul 28, 2009 at 12:55 UTC
crochunter: Since your example code is small enough, you might try using the debugger to step through it and see what part of the code is making the values disappear. Alternatively, you could use Data::Dumper (or equivalent) to print the data structure in various locations and see what matches your expectations and find where your expectations are violated. This won't be very painful, and it's very helpful in learning the language better. ...roboticus
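As a rough illustration of the Data::Dumper suggestion, the sketch below builds a hash shaped like the `%query_score` in the original post from two sample rows and dumps it. The two rows are joined with tabs by construction, and the `$Data::Dumper::Sortkeys` setting is an illustrative choice; neither comes from crochunter's script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Sort hash keys in the dump so successive runs are easy to compare.
$Data::Dumper::Sortkeys = 1;

# Two sample rows; tab-separated here by construction (an assumption).
my @rows = (
    "7645\t7725\tj\tbrazil9\tBrazil",
    "7817\t8008\tk\tbrazil9\tBrazil",
);

my %query_score;
for my $row (@rows) {
    my ($value1, $value2, $mark, $name, $country) = split /\t/, $row;
    push @{ $query_score{"$name:$country"}{position} }, $value2;
    $query_score{"$name:$country"}{Mark}  = $mark;     # overwritten by each row
    $query_score{"$name:$country"}{Start} = $value1;   # overwritten by each row
}

# Inspect what was actually stored for each Name:Country key.
print Dumper(\%query_score);
```

In the dump, `position` holds both values while `Mark` and `Start` hold only the values from the last row seen for that key, which is the kind of mismatch with expectations this technique is meant to expose.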
Re: New Problem by graff (Chancellor) on Jul 28, 2009 at 13:08 UTC
I'm guessing that the whitespace separating the fields on each line of input may be variable in nature -- not just a single "\t" every time (e.g. sometimes it may be tab preceded and/or followed by spaces, and sometimes it may be just spaces with no tab). That's why I suggested the unadorned split for breaking up the input line into fields. That is equivalent to `split(" ",$_)` (note the quoted space, not a regex), which says "ignore leading white space in the string, and return the list of strings separated by any amount of any kind of white space." If some of your field values are expected to contain a space now and then, and your field separation is variable (not just a single "\t" every time), then you've got a problem with unparsable data, and you need to fix that first. (updated to fix formatting)
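A minimal sketch of the difference, assuming the first data line is separated by runs of spaces exactly as it is displayed in the post (the real file may contain tabs instead):

```perl
use strict;
use warnings;

# First data line as displayed: fields separated by runs of spaces (assumed).
my $line = "532     1148    a       andrew2  Norway";

my @by_tab   = split /\t/, $line;   # no tab in the line, so one big field
my @by_space = split ' ',  $line;   # any run of whitespace separates fields

print scalar(@by_tab),   " field(s) when splitting on a tab\n";
print scalar(@by_space), " field(s) when splitting on ' '\n";
```

With `/\t/` the whole line lands in the first variable and the remaining ones stay undefined, which would be consistent with the missing columns in the output shown above.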
Re^2: New Problem by Marshall (Canon) on Jul 30, 2009 at 04:32 UTC
The default split is: `split (/\s+/,$_);` or `split (' ',$_);`. Correction as per graff: `split ' ',$_` will split on whitespace. I always put a regex in there, but this alternate syntax is completely legal. ~~This is a bit different than the above `split(" ",$_);`. First, split takes a regex as the pattern and not a char string, so I'm not sure that " " even works.~~ Anyway, splitting on a single space (or tab) is not the same as splitting on a sequence of whitespace characters. The whitespace family has 5 chars: space, \f, \r, \n and \t. /\s+/ will split on any of them. Since you can't actually see a whitespace char, "is that one space, two spaces or a tab" or whatever can be problematic. An interesting thing about this is that when processing normal text lines, there is no need to "chomp" when using /\s+/ because \n is one of the split characters.
Re^3: New Problem by graff (Chancellor) on Jul 30, 2009 at 04:47 UTC
From the "perlfunc" manual description of split: ... If PATTERN is ... omitted, splits on whitespace (after skipping any leading whitespace)... {3rd paragraph} ... As a special case, specifying a PATTERN of space (' ') will split on white space just as "split" with no arguments does. Thus, "split(' ')" can be used to emulate awk's default behavior, whereas "split(/ /)" will give you as many null initial fields as there are leading spaces. A "split" on "/\s+/" is like a "split(' ')" except that any leading whitespace produces a null first field. A "split" with no arguments really does a "split(' ', $_)" internally. {about 7 paragraphs further down}
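The three behaviours described in that perlfunc excerpt can be compared side by side. This is only a small sketch on a made-up string with two leading spaces:

```perl
use strict;
use warnings;

my $line = "  a b";   # made-up input with two leading spaces

my @awk    = split ' ',   $line;   # leading whitespace skipped
my @regex  = split /\s+/, $line;   # leading whitespace gives one null first field
my @single = split / /,   $line;   # one null field per leading space

printf "%-6s %d fields\n", $_->[0], scalar @{ $_->[1] }
    for [ "' '", \@awk ], [ '/\s+/', \@regex ], [ '/ /', \@single ];
```

The counts come out as 2, 3 and 4 fields respectively, matching the "null initial fields" wording in the manual.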
Re: New Problem by Marshall (Canon) on Jul 30, 2009 at 04:50 UTC
First, you should be running with warnings and strict! Start the script with `#!/usr/bin/perl -w` on the first line and `use strict;` on the next. This provides HUGE clues as to what might be wrong! I think you'll find that the /\s+/ hint by graff is needed. Also consider `$min = shift(@positions); $max = pop(@positions);`. What happens if min and max are the same, i.e. there is just one position? `$max = (@positions)[-1]; $min = (@positions)[0];` will handle that situation. Update: also keep in mind that a list slice allows multiple values on the left-hand side, so `my ($min,$max) = (@positions)[0,-1];` would work also. The -1 index means the last one in the array, -2 would be the second to last, etc. But FAR AND AWAY, the best thing you can do to improve your code is religious use of warnings and strict!
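To make the single-position case concrete, here is a small sketch comparing the two approaches on a one-element list. The variable names `@one`, `@copy`, `$min_sp` and `$max_sp` are made up for the illustration.

```perl
use strict;
use warnings;

my @one = (1148);   # a group that has only a single position

# shift/pop: shift removes the only element, so pop has nothing left to return.
my @copy   = @one;
my $min_sp = shift @copy;
my $max_sp = pop @copy;     # undef

# List slice: index 0 and index -1 both refer to the same single element.
my ($min, $max) = (@one)[0, -1];

print "shift/pop: min=$min_sp max=", (defined($max_sp) ? $max_sp : 'undef'), "\n";
print "slice:     min=$min max=$max\n";
```

The slice also leaves the array untouched, whereas `shift` and `pop` modify it. Separately, a bare `sort` compares values as strings; `sort { $a <=> $b }` is the numeric form if the positions can differ in number of digits.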