in reply to Re: Tagging the last elements
in thread Tagging the last elements

Thanks a lot guys ;) After a few more modifications it worked. Now I have another problem; below are my code and the output I want. I want to assign a serial number based on value1 and value2.
Here is my code
#!/usr/bin/perl
my %query_score;
while ( <DATA> ) {
    chomp;
    ($value1,$value2,$Mark,$Name,$Country) = split(/\t/,$_);
    push( @{ $query_score{"$Name:$Country"}{position} },$value2);
    $query_score{"$Name:$Country"}{Mark} = $Mark;
    $query_score{"$Name:$Country"}{Start} = $value1;
}
foreach $key ( sort keys %query_score ) {
    ($Name,$Country) = split(/:/,$key);
    @positions = sort @{ $query_score{$key}{position} };
    $Mark = $query_score{$key}{Mark};
    $value1 = $query_score{$key}{Start};
    $min = shift(@positions);
    $max = pop(@positions);
    print("$value1\t$value2\t$Mark\t$Country\t$min\t$max\n");
}
__DATA__
532 1148 a andrew2 Norway
1547 1573 b mathew3 US
2013 2190 c mathew US
2096 2158 d mathew US
2896 2980 e docker5 UK
3919 4622 f king4 Aus
4180 4353 g king Aus
6621 6758 h lover4 Canada
7475 7568 i nun8 Mexico
7645 7725 j brazil9 Brazil
7817 8008 k brazil9 Brazil
8172 8309 l brazil9 Brazil
8399 8536 m brazil9 Brazil
I am getting an OUTPUT like this:-
3919 4622 f king4 Aus 8536 8399 8536 m Brazil 7725 8536 4180 8536 g Aus 4353 6621 8536 h Canada 6758 7475 8536 i Mexico 7568
BUT I want my output to be like this:-
Value1  Value2  Mark  Name     Country  SerialNo
532     1148    a     andrew2  Norway   start
1547    1573    b     mathew3  US       start
2013    2190    c     mathew   US       between
2096    2158    d     mathew   US       end
2896    2980    e     docker5  UK       start
3919    4622    f     king4    Aus      start
4180    4353    g     king     Aus      start
6621    6758    h     lover4   Canada   start
7475    7568    i     nun8     Mexico   start
7645    7725    j     brazil9  Brazil   start
7817    8008    k     brazil9  Brazil   between
8172    8309    l     brazil9  Brazil   between
8399    8536    m     brazil9  Brazil   end
Thanks in advance

Replies are listed 'Best First'.
Re: New Problem
by roboticus (Chancellor) on Jul 28, 2009 at 12:55 UTC
    crochunter:

    Since your example code is small enough, you might try using the debugger to step through it and see what part of the code is making the values disappear. Alternatively, you could use Data::Dumper (or equivalent) to print the data structure in various locations and see what matches your expectations and find where your expectations are violated. This won't be very painful, and it's very helpful in learning the language better.
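
    As a rough illustration of the Data::Dumper suggestion, the sketch below parses one line the way the posted code does and dumps the result. The helper name parse_line and the space-separated sample line are assumptions made for the example, not something taken from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order makes dumps easier to compare

# Hypothetical helper: split one input line on tabs, as the posted code does.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my ( $value1, $value2, $mark, $name, $country ) = split /\t/, $line;
    return {
        value1  => $value1,
        value2  => $value2,
        mark    => $mark,
        name    => $name,
        country => $country,
    };
}

# Assumed sample: the fields are separated by spaces, not tabs.
my $record = parse_line("2013 2190 c mathew US\n");
print Dumper($record);
```

    If the input really is space-separated, the dump shows the whole line sitting in value1 and every other field undefined, which is the kind of violated expectation worth looking for.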

    ...roboticus
Re: New Problem
by graff (Chancellor) on Jul 28, 2009 at 13:08 UTC
    I'm guessing that the whitespace separating the fields on each line of input may be variable in nature -- not just a single "\t" every time (e.g. sometimes it may be tab preceded and/or followed by spaces, and sometimes it may be just spaces with no tab).

    That's why I suggested the unadorned split for breaking up the input line into fields. That is equivalent to

    split(" ",$_)
    (note the quoted space, not a regex), which says "ignore leading white space in the string, and return the list of strings separated by any amount of any kind of white space."

    If some of your field values are expected to contain a space now and then, and your field separation is variable (not just a single "\t" every time), then you've got a problem with unparsable data, and you need to fix that first.
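
    A small sketch of the difference, using a made-up line with mixed separators (the sample string is an assumption, not the poster's real data):

```perl
use strict;
use warnings;

# Hypothetical sample line: leading spaces, then a mix of spaces and tabs.
my $line = "  2013 \t2190  c\tmathew   US\n";

my @by_tab   = split /\t/, $line;    # breaks only at literal tab characters
my @by_space = split ' ',  $line;    # awk-style: any run of whitespace

print scalar(@by_tab),   " fields from split /\\t/\n";
print scalar(@by_space), " fields from split ' ': @by_space\n";
```

    Here the tab split yields 3 ragged fields, while the quoted-space split yields the 5 clean values 2013, 2190, c, mathew and US.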

    (updated to fix formatting)

      The default split is: split (/\s+/,$_); or split (' ',$_);.

      Correction as per graff: split ' ',$_ will split on whitespace. I always put a regex in there, but this alternate syntax is completely legal. This is a bit different from the above split(" ",$_);. First, split takes a regex as the pattern and not a char string, so I'm not sure that " " even works.

      Anyway, splitting on a single space (or tab) is not the same as splitting on a sequence of whitespace characters. The whitespace family matched by \s has 5 chars: space, \f, \r, \n and \t. /\s+/ will split on any run of them. Since you can't actually see a whitespace char, "is that one space, two spaces or a tab" or whatever can be problematic.

      An interesting thing about this is that when processing normal text lines, there is no need to "chomp" when using /\s+/, because \n is one of the split characters.

        From the "perlfunc" manual description of split:

        ... If PATTERN is ... omitted, splits on whitespace (after skipping any leading whitespace)... {3rd paragraph}

        ...

        As a special case, specifying a PATTERN of space (' ') will split on white space just as "split" with no arguments does. Thus, "split(' ')" can be used to emulate awk's default behavior, whereas "split(/ /)" will give you as many null initial fields as there are leading spaces. A "split" on "/\s+/" is like a "split(' ')" except that any leading whitespace produces a null first field. A "split" with no arguments really does a "split(' ', $_)" internally. {about 7 paragraphs further down}
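
        The three cases described in that passage can be checked with a short sketch; the input string with two leading spaces is a made-up example:

```perl
use strict;
use warnings;

my $line = "  a b\n";    # hypothetical input: two leading spaces, trailing newline

my @awk   = split ' ',   $line;    # ("a", "b")
my @regex = split /\s+/, $line;    # ("", "a", "b")
my @one   = split / /,   $line;    # ("", "", "a", "b\n")

printf "%-14s %d fields\n", "split ' '",    scalar @awk;
printf "%-14s %d fields\n", "split /\\s+/", scalar @regex;
printf "%-14s %d fields\n", "split / /",    scalar @one;
```

        Note that both ' ' and /\s+/ consume the trailing newline as a separator, while / / leaves it attached to the last field.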

Re: New Problem
by Marshall (Canon) on Jul 30, 2009 at 04:50 UTC
    First, you should be running with warnings and strict!
    #!/usr/bin/perl -w
    use strict;
    This provides HUGE clues as to what might be wrong!
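
    As one hedged illustration of such a clue: once the loop variables are declared with my, strict refuses to compile a later use of $value2 outside the loop, which is roughly what the posted print statement does. The snippet below is a contrived stand-in for the original script, compiled with a string eval only so that the error can be printed instead of aborting the example.

```perl
use strict;
use warnings;

# Hypothetical snippet, compiled at run time so the failure can be shown.
# $value2 is declared inside the loop only.
my $snippet = q{
    use strict;
    for my $line ("532 1148") {
        my ($value1, $value2) = split ' ', $line;
    }
    print "$value2\n";    # out of scope here
    1;
};

my $compiled = eval $snippet;
my $error    = $@;
print $compiled ? "compiled cleanly\n" : "strict reported: $error";
```

    The reported message names $value2 as a global symbol that requires an explicit package name, pointing straight at the stale variable.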

    I think you'll find that the /\s+/ hint by graff is needed and also consider:

    $min = shift(@positions); $max = pop(@positions);
    What happens if min and max are the same? i.e. just one position?
    $max = (@positions)[-1]; $min = (@positions)[0];
    will handle that situation.

    Update: a small update, also keep in mind that list slice allows multiple values to the left hand side, my ($min,$max) = (@positions)[0,-1]; would work also. The -1 index means the last one in the array, -2 would be second to last etc. But FAR AND AWAY, the best thing you can do to improve your code is religious use of warnings and strict!
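
    A minimal sketch of the slice approach next to shift/pop, using a hypothetical helper named min_max. The numeric sort block is an extra assumption on top of the advice above: a plain sort compares values as strings, which can misorder numbers of different lengths.

```perl
use strict;
use warnings;

# Hypothetical helper: return the smallest and largest of a list of numbers.
sub min_max {
    my @sorted = sort { $a <=> $b } @_;    # numeric sort; plain sort compares as strings
    return @sorted[ 0, -1 ];               # slice does not modify the array
}

my ( $min, $max ) = min_max(4622);
print "one position:   min=$min max=$max\n";

( $min, $max ) = min_max( 8536, 7725, 8008, 8309 );
print "four positions: min=$min max=$max\n";

# For comparison: shift then pop on a one-element array leaves the second value undefined.
my @positions = (4622);
my $first = shift @positions;
my $last  = pop @positions;
print "shift/pop:      max is ", ( defined $last ? $last : 'undef' ), "\n";
```

    Because the slice reads the array without removing anything, a single position serves as both min and max.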