crochunter has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I was trying tag the record with maximum value. Here is my code so far:-
#!/usr/bin/perl use List::Util qw(min max); open(FH, $ARGV[0]) || die("Cannot open:$!"); while(<FH>) { if($_ =~ /\A(\S+)\t(\S+)\t(\S+)\t(\S+)\,(\S+)\,(\S+)/xmg) { #print "$1\t$2\t$3\t$4,$5,$6\n"; if(($5 == $5) && ($6 == $6)) { push @list, $4; $min = min @list; $max = max @list; if($4 == $max) { #print "$1\t$2\t$3\t$4,$5,$6\n"; print "$max\n"; } } } }
Here is the input file
62556 63635 y 1,andrew,JJ113954 63868 63897 h 1,morgan,JJ113955 68766 69005 j 2,morgan,JJ113955 81099 81630 y 1,flintoff,JJ113952 126185 126699 s 1,austin,JJ113956 135356 135449 3 1,peter,JJ113952 135588 136297 8 2,peter,JJ113952 158146 158367 i 3,peter,JJ113952 441474 441538 b 9,catherine,JJ029490 442666 442843 9 8,catherine,JJ029490 445778 445905 0 7,catherine,JJ029490 446059 446273 l 6,catherine,JJ029490 450319 450379 f 5,catherine,JJ029490
Here is the type of output I am trying to achieve :-
62556 63635 y single,andrew,JJ113954 63868 63897 h 1,morgan,JJ113955 68766 69005 j end,morgan,JJ113955 81099 81630 y single,flintoff,JJ113952 126185 126699 s single,austin,JJ113956 135356 135449 3 1,peter,JJ113952 135588 136297 8 2,peter,JJ113952 158146 158367 i end,peter,JJ113952 441474 441538 b end,catherine,JJ029490 442666 442843 9 8,catherine,JJ029490 445778 445905 0 7,catherine,JJ029490 446059 446273 l 6,catherine,JJ029490 450319 450379 f 5,catherine,JJ029490
Thanks in advance

Replies are listed 'Best First'.
Re: Tagging the last elements
by jbt (Chaplain) on Jul 28, 2009 at 04:17 UTC
    Please "use strict", there are a number of errors in your program.

    It is probably not necessary to check the format of each line before processing.

    It is not clear to me what your first is if statement is supposed to do.

    You will need to rethink the logic to produce the desired output unless the numeric values are sorted by name.

    You will need additional logic to determine single and end.

      I think he wanted
      if( defined $5 and defined $6 ) { }
Re: Tagging the last elements
by graff (Chancellor) on Jul 28, 2009 at 06:45 UTC
    You won't be able to determine the intended output until you have read the entire input file into memory and have gone over all the records to check the numeric values associated with names in the fourth field. Maybe something like this:
    #!/usr/bin/perl use strict; use warnings; my @records; my %max; while (<DATA>) { my @fields = split; my ( $number, $name ) = ( $fields[3] =~ /^(\d+),(\w+)/ ); if ( ! $number or ! $name ) { warn "Line $.: unexpected input ignored: $_"; next; } $max{$name} = $number unless ( exists( $max{$name} ) and $max{$name} >= $number ); push @records, { name => $name, data => $_ }; } for my $rec ( @records ) { my $name = $rec->{name}; my $max = $max{$name}; my $flag = ( $max == 1 ) ? 'single' : 'end'; $rec->{data} =~ s/(\s)$max,$name,/$1$flag,$name,/; print $rec->{data}; } __DATA__ 62556 63635 y 1,andrew,JJ113954 63868 63897 h 1,morgan,JJ113955 68766 69005 j 2,morgan,JJ113955 81099 81630 y 1,flintoff,JJ113952 126185 126699 s 1,austin,JJ113956 135356 135449 3 1,peter,JJ113952 135588 136297 8 2,peter,JJ113952 158146 158367 i 3,peter,JJ113952 441474 441538 b 9,catherine,JJ029490 442666 442843 9 8,catherine,JJ029490 445778 445905 0 7,catherine,JJ029490 446059 446273 l 6,catherine,JJ029490 450319 450379 f 5,catherine,JJ029490

    (updated to add a condition for skipping lines that don't match /\d+,\w+/ in fourth field-- it pays to be careful...)

    In the for loop, the $rec->{data} =~ s/.../.../ will do nothing on the lines that don't contain the max value for a given name, because the regex won't match. For lines that contain a max value, the replacement will be either "single" or "end" depending on what the max value is.

    In order to read from a file name that you put on the command line (i.e. in @ARGV) when you run the script, just remove DATA from while(<DATA>) (and leave off the __DATA__ section at the end) -- there's no need to explicitly open a file whose name is provided via @ARGV.

Re: Tagging the last elements
by Marshall (Canon) on Jul 28, 2009 at 08:12 UTC
    Here is my "go" at this (below)....The problem statement is tough and requires some strange looking code.

    The data is read into an AoA and also the "ordinal" number before the combination of the "name and code" is also read into a hash table, a HoA with ordinal numbers. "ordinal number" here is a misnomer and is an artifact of first approach to the code. I expect the OP to rename things.

    As it turns out, only the combination of the name and code at the end of line is unique. flintoff has JJ113952 and peter also has JJ113952.

    The ordinal numbers for each combination of name/code are sorted. I also showed how to sort the original DB with a "weird sort order", but I'm not sure this is necessary and of course this can be modified to get all kinds of orders.

    To print the data, if the name/code como is unique, this is a singleton. To print "end" token, there are "gaps" in what I called the ordinal numbers, so I use the maximum one what was seen. This is the -1 index.

    #!/usr/bin/perl -w use strict; use Data::Dumper; my @DB; my %hash; while (<DATA>) { my($n1,$n2,$letter,$ordinal,$name,$code) = (my @record = split(/[\s,]+/,$_)); push (@DB, [@record]); push (@{$hash{"$name$code"}},$ordinal); } foreach my $name_code (keys %hash) { @{$hash{$name_code}}= sort{$a<=>$b} @{$hash{$name_code}}; } @DB = sort by_weird_sort_order @DB; #optional sort foreach my $line (@DB) { my($n1,$n2,$letter,$ordinal,$name,$code) = @$line; $ordinal = "single" if @{$hash{"$name$code"}} == 1; $ordinal = "end" if ( @{$hash{"$name$code"}}>1 and (@{$hash{"$name$code"}})[-1] == $ordinal ); printf STDOUT ("%-7s %-7s %-3s %s,%s,%s\n", $n1,$n2,$letter,$ordinal,$name,$code); } sub by_weird_sort_order { my ($a_ordinal,$a_name) = (@$a)[3,4]; my ($b_ordinal,$b_name) = (@$b)[3,4]; $a_name cmp $b_name or $b_ordinal <=> $a_ordinal } #PRINTS: #62556 63635 y single,andrew,JJ113954 #126185 126699 s single,austin,JJ113956 #441474 441538 b end,catherine,JJ029490 #442666 442843 9 8,catherine,JJ029490 #445778 445905 0 7,catherine,JJ029490 #446059 446273 l 6,catherine,JJ029490 #450319 450379 f 5,catherine,JJ029490 #81099 81630 y single,flintoff,JJ113952 #68766 69005 j end,morgan,JJ113955 #63868 63897 h 1,morgan,JJ113955 #158146 158367 i end,peter,JJ113952 #135588 136297 8 2,peter,JJ113952 #135356 135449 3 1,peter,JJ113952 __DATA__ 62556 63635 y 1,andrew,JJ113954 63868 63897 h 1,morgan,JJ113955 68766 69005 j 2,morgan,JJ113955 81099 81630 y 1,flintoff,JJ113952 126185 126699 s 1,austin,JJ113956 135356 135449 3 1,peter,JJ113952 135588 136297 8 2,peter,JJ113952 158146 158367 i 3,peter,JJ113952 441474 441538 b 9,catherine,JJ029490 442666 442843 9 8,catherine,JJ029490 445778 445905 0 7,catherine,JJ029490 446059 446273 l 6,catherine,JJ029490 450319 450379 f 5,catherine,JJ029490
      Thanks a lot guys ;) After few more modifications it worked... Now I have another problem and below is my code and how I want to get the output.I want to give serial number based on the value1 and value2.
      Here is my code
      #!/usr/bin/perl my %query_score; while ( <DATA> ) { chomp; ($value1,$value2,$Mark,$Name,$Country) = split(/\t/,$_); push( @{ $query_score{"$Name:$Country"}{position} },$value2); $query_score{"$Name:$Country"}{Mark} = $Mark; $query_score{"$Name:$Country"}{Start} = $value1; } foreach $key ( sort keys %query_score ) { ($Name,$Country) = split(/:/,$key); @positions = sort @{ $query_score{$key}{position} }; $Mark = $query_score{$key}{Mark}; $value1 = $query_score{$key}{Start}; $min = shift(@positions); $max = pop(@positions); print("$value1\t$value2\t$Mark\t$Country\t$min\t$max\n"); } __DATA__ 532 1148 a andrew2 Norway 1547 1573 b mathew3 US 2013 2190 c mathew US 2096 2158 d mathew US 2896 2980 e docker5 UK 3919 4622 f king4 Aus 4180 4353 g king Aus 6621 6758 h lover4 Canada 7475 7568 i nun8 Mexico 7645 7725 j brazil9 Brazil 7817 8008 k brazil9 Brazil 8172 8309 l brazil9 Brazil 8399 8536 m brazil9 Brazil
      I am getting an OUTPUT like this:-
      3919 4622 f king4 Aus 8536 8399 8536 m Brazil 7725 8536 4180 8536 g Aus 4353 6621 8536 h Canada 6758 7475 8536 i Mexico 7568
      BUT I want my output to be like this:-
      Value1 Value2 Mark Name Country SerialNo 532 1148 a andrew2 Norway start 1547 1573 b mathew3 US start 2013 2190 c mathew US between 2096 2158 d mathew US end 2896 2980 e docker5 UK start 3919 4622 f king4 Aus start 4180 4353 g king Aus start 6621 6758 h lover4 Canada start 7475 7568 i nun8 Mexico start 7645 7725 j brazil9 Brazil start 7817 8008 k brazil9 Brazil between 8172 8309 l brazil9 Brazil between 8399 8536 m brazil9 Brazil end
      Thanks in advance
        crochunter:

        Since your example code is small enough, you might try using the debugger to step through it and see what part of the code is making the values disappear. Alternatively, you could use Data::Dumper (or equivalent) to print the data structure in various locations and see what matches your expectations and find where your expectations are violated. This won't be very painful, and it's very helpful in learning the language better.

        ...roboticus
        I'm guessing that the whitespace separating the fields on each line of input may be variable in nature -- not just a single "\t" every time (e.g. sometimes it may be tab preceded and/or followed by spaces, and sometimes it may be just spaces with no tab).

        That's why I suggested the unadorned split for breaking up the input line into fields. That is equivalent to

        split(" ",$_)
        (note the quoted space, not a regex), which says "ignore leading white space in the string, and return the list of strings separated by any amount of any kind of white space."

        If some of your field values are expected to contain a space now and then, and your field separation is variable (not just a single "\t" every time), then you've got a problem with unparsable data, and you need to fix that first.

        (updated to fix formatting)

        First, you should be running with warnings and strict!
        #!/usr/bin/perl -w use strict;
        This provides HUGE clues as to what might be wrong!

        I think you'll find that the /\s+/ hint by graff is needed and also consider:

        $min = shift(@positions); $max = pop(@positions);
        What happens if min and max are the same? i.e. just one position?
        $max = (@positions)[-1]; $min = (@positions)[0];
        will handle that situation.

        Update: a small update, also keep in mind that list slice allows multiple values to the left hand side, my ($min,$max) = (@positions)[0,-1]; would work also. The -1 index means the last one in the array, -2 would be second to last etc. But FAR AND AWAY, the best thing you can do to improve your code is religious use of warnings and strict!

Re: Tagging the last elements
by Anonymous Monk on Jul 28, 2009 at 04:18 UTC
    $1,$2,$3,$4... are global variables. max/min can change their values, so you have to save them to an intermediate variables. Also, you input/output is identical.