
First you should show what you tried and how it failed (How do I post a question effectively?).

Re^4: sorting and merging in perl
by Sekhar Reddy (Acolyte) on Jul 29, 2018 at 14:37 UTC
    Below is the code. Any help, please? Thank you very much.
        use warnings ;
        use Data::Dumper;

        my %results = ();

        while ( <DATA> ) {
            chomp ;
            my @row = split /,/, $_ ;
            print STDOUT "row values is:".$_."\n";

            if ( (exists $results{ $row[0], $row[1] })) {
                if ( ( $row[2] ) < $results{ $row[0], $row[1] }->{ 'ACTDATE' } ) {
                    $results{ $row[0],$row[1] }->{ 'ACTDATE' } = $row[2] ;
                }
                print STDOUT "inactdate--".$row[3]." \n";
                print STDOUT "in memory inact--".$results{ $row[0] }->{ 'INACTDATE' }."\n";
                if ( ( defined $row[3]) && !( defined $results{ $row[0] }->{ 'INACTDATE' } ) ) {
                    $results{ $row[0] }->{ 'INACTDATE' } = undef ;
                }
                elsif ( !( defined $row[3]) && ( defined $results{ $row[0] }->{ 'INACTDATE' } ) ) {
                    $results{ $row[0] }->{ 'INACTDATE' } = undef ;
                }
                elsif( ( $row[3] ) > ( $results{ $row[0], $row[1] }->{ 'INACTDATE' } ) ) {
                    $results{ $row[0], $row[1] }->{ 'INACTDATE' } = $row[3];
                }
            }
            else {
                # Create new entry in hash
                $results{ $row[0],$row[1] } = {
                    'A1' => $row[0],
                    'B1' => $row[1],
                    'ACTDATE' => $row[2],
                    'INACTDATE' => $row[3],
                }
            }
        }

        foreach ( sort keys %results ) {
            my $a1 = $results{ $_ }->{ 'A1' } ;
            my $b1 = $results{ $_ }->{ 'B1' } ;
            my $actDt = $results{ $_ }->{ 'ACTDATE' } ;
            my $inactDt = $results{ $_ }->{ 'INACTDATE' };
            print "$a1,$b1,$actDt,$inactDt\n" ;
        }

        __DATA__
        7900724655,200906888,20180416,20180522
        7900724655,200906889,20180724,20180728
        7900724655,200906889,20180601,20180720
        7900724655,200906888,20180730,20180830
        7900724655,200906890,20180905,20180930
        7900724655,200906890,20181005,20181030
        7900724655,200906890,20181104,
        7900724666,200906868,20180416,20180522
        7900724666,200906869,20180601,20180720
        7900724666,200906869,20180724,20180728
        7900724666,200906868,20180730,20180830
        7900724666,200906890,20180905,20180930
        7900724666,200906890,20181005,20181030
        7900724666,200906890,20181104,

    -------I am getting error as uninitialized variable----------

        row values is:7900724655,200906888,20180416,20180522
        row values is:7900724655,200906889,20180724,20180728
        row values is:7900724655,200906889,20180601,20180720
        inactdate--20180720
        Use of uninitialized value in concatenation (.) or string at web_New_ADDON.pl line 16, <DATA> line 3.
        in memory inact--
        row values is:7900724655,200906888,20180730,20180830
        inactdate--20180830
        Use of uninitialized value in concatenation (.) or string at web_New_ADDON.pl line 16, <DATA> line 4.
        in memory inact--
        row values is:7900724655,200906890,20180905,20180930
        row values is:7900724655,200906890,20181005,20181030
        inactdate--20181030

    2018-08-02 Athanasius added code tags

      Sekhar Reddy:

      Using code tags (<c>Your code goes here</c>) would be far better than trying to manually format your code:

      • It preserves spaces, so people can see your indentation
      • It's far easier than manually inserting all those paragraph tags
      • It's less error-prone, as you don't have to manually look for <, > (et al.) and patch them up.
      • It lets the site offer a "download" button to let people more easily grab your code

      If you had used code tags, it would look more like the following:

          use Data::Dumper;

          my %results = ();

          while ( <DATA> ) {
              chomp;
              my @row = split /,/, $_;
              print STDOUT "row values is:".$_."\n";
              if ( (exists $results{ $row[0], $row[1] })) {
                  if ( ( $row[2] ) < $results{ $row[0], $row[1] }->{ 'ACTDATE' } ) {
                      $results{ $row[0],$row[1] }->{ 'ACTDATE' } = $row[2] ;
                  }
                  print STDOUT "inactdate--".$row[3]." \n";
                  print STDOUT "in memory inact--".$results{ $row[0] }->{ 'INACTDATE' }."\n";
                  if ( ( defined $row[3]) && !( defined $results{ $row[0] }->{ 'INACTDATE' } ) ) {
                      $results{ $row[0] }->{ 'INACTDATE' } = undef ;
                  }

      Obviously not exactly like that, as I don't know how your code is normally formatted, nor did I bother to clean up the whole thing.

      OK, for the errors you're encountering: I'd suggest you start by looking at how you're accessing the data items you're putting in %results. You're storing some items into the hash like this:

          $results{ $row[0], $row[1] } = {
              'A1' => $row[0],
              'B1' => $row[1],
              'ACTDATE' => $row[2],
              'INACTDATE' => $row[3],
          }

      but you're sometimes referring to them like:

          $results{ $row[0],$row[1] }->{ 'ACTDATE' }

      but at other times you're doing things like:

          $results{ $row[0] }->{ 'INACTDATE' }

      I expect the mismatched hash keys are the reason that you're getting all the "uninitialized value" warnings. Clean that up and that will probably get you closer to your desired result.
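
To make the key mismatch concrete, here is a minimal sketch (not part of the posted code; the sample values are taken from the data above, and the variable names are illustrative only). A subscript such as `$results{ $a1, $b1 }` is joined with `$;` into one string key, so `$results{ $a1 }` addresses a different entry altogether. Reading a nested field through that other key returns undef and also autovivifies an empty entry under it.

```perl
use strict;
use warnings;

my %results;
my ($a1, $b1) = ('7900724655', '200906889');

# A list subscript is joined with $; into a single string key.
$results{ $a1, $b1 } = { ACTDATE => '20180601', INACTDATE => '20180720' };

# The same entry, reached through the explicitly joined key.
my $composite_key = join($;, $a1, $b1);
my $same_entry = $results{$composite_key} == $results{ $a1, $b1 } ? 1 : 0;

# Before any lookup, nothing is stored under the one-field key.
my $before = exists $results{$a1} ? 1 : 0;

# Reading through the one-field key yields undef and, as a side effect,
# autovivifies an empty hash under $results{$a1}.
my $wrong = $results{$a1}{INACTDATE};
my $after = exists $results{$a1} ? 1 : 0;

print "same entry via joined key: $same_entry\n";
print "one-field key exists before lookup: $before, after lookup: $after\n";
print "INACTDATE via one-field key: ", (defined $wrong ? $wrong : 'undef'), "\n";
print "INACTDATE via two-field key: ", $results{ $a1, $b1 }{INACTDATE}, "\n";
```

The autovivified one-field entries would also turn up in a later `sort keys %results` loop with no A1, B1 or date fields, which is likely to produce further warnings at print time.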

      ...roboticus

      When your only tool is a hammer, all problems look like your thumb.

      roboticus did identify the error in the code you posted.

      Here is the code you posted, with some minor changes:

      • Split each line into 4 named variables instead of @row (makes the problem clearer).
      • Don't quote hash keys that consist entirely of word characters (they don't need to be quoted).

          #!/usr/bin/perl
          use strict;
          use warnings;

          my %results;

          open my $fh, '<', 'file1.csv' or die $!;

          while ( <$fh> ) {
              chomp ;
              my ($a1, $b1, $actDt, $inactDt) = split /,/;
              if (exists $results{$a1,$b1} ) {
                  if ($actDt < $results{$a1,$b1}{ACTDATE}) {
                      $results{$a1,$b1}{ACTDATE} = $actDt;
                  }
                  if (!$inactDt || !$results{$a1,$b1}{INACTDATE}) {
                      $results{$a1,$b1}{INACTDATE} = '' ;
                  }
                  elsif($inactDt > $results{$a1,$b1}{INACTDATE} ) {
                      $results{$a1,$b1}{INACTDATE} = $inactDt;
                  }
              }
              else {
                  # Create new entry in hash
                  $results{$a1,$b1} = {
                      A1 => $a1,
                      B1 => $b1,
                      ACTDATE => $actDt,
                      INACTDATE => $inactDt,
                  };
              }
          }

          foreach ( sort keys %results ) {
              my $a1 = $results{ $_ }{A1} ;
              my $b1 = $results{ $_ }{B1} ;
              my $actDt = $results{ $_ }{ACTDATE};
              my $inactDt = $results{ $_ }{INACTDATE};
              print "$a1,$b1,$actDt,$inactDt\n" ;
          }

      The output is erroneous because the hash also merges records with the same key that are not contiguous in the input. The output is:

          7900724655,200906888,20180416,20180830
          7900724655,200906889,20180601,20180728
          7900724655,200906890,20180905,
          7900724666,200906868,20180416,20180830
          7900724666,200906869,20180601,20180728
          7900724666,200906890,20180905,

      To get the results you want doesn't require a hash.

          #!/usr/bin/perl
          use strict;
          use warnings;

          open my $fh, '<', 'file1.csv' or die $!;

          my ($actDt, $inactDt, $count) = ('', '', 0);
          my $previous_key = '';

          while (<$fh>) {
              chomp;
              my ($a1, $b1, $begin, $end) = split /,/;
              my $key = "$a1,$b1";
              if ($previous_key eq $key) {
                  $inactDt = $end;
                  $count++;
              }
              else {
                  # either reached the end of a record or this is the first record
                  print join(",", $previous_key, $actDt, $inactDt), "\n" if $count > 1;
                  $actDt = $begin;
                  $inactDt = $end;
                  $count = 1;
              }
              $previous_key = $key;
          }
          print join(",", $previous_key, $actDt, $inactDt), "\n" if $count > 1;

      This gives the desired results:

          7900724655,200906889,20180601,20180728
          7900724655,200906890,20180905,
          7900724666,200906869,20180601,20180728
          7900724666,200906890,20180905,

      The input file is:

          7900724655,200906888,20180416,20180522
          7900724655,200906889,20180601,20180720
          7900724655,200906889,20180724,20180728
          7900724655,200906888,20180730,20180830
          7900724655,200906890,20180905,20180930
          7900724655,200906890,20181005,20181030
          7900724655,200906890,20181104,
          7900724666,200906868,20180416,20180522
          7900724666,200906869,20180601,20180720
          7900724666,200906869,20180724,20180728
          7900724666,200906868,20180730,20180830
          7900724666,200906890,20180905,20180930
          7900724666,200906890,20181005,20181030
          7900724666,200906890,20181104,

        Hi Athanasius,

        First of all, thank you very much for your effort. I tried your code, but it still gives some incorrect results. Details are below.

        The input data I used:

            7900724666,200906888,20180416,20180522
            7900724666,200906888,20180601,20180720
            7900724666,200906888,20180406,20180411
            7900724677,200906872,20180301,20180330
            7900724677,200906871,20180101,20180228
            7900724677,200906873,20180401,20180420
            7900724688,200906881,20180101,20180228
            7900724688,200906881,20180303,20180330
            7900724688,200906882,20180404,20180430
            7900724688,200906883,20180508,20180620
            7900724699,200906891,20180101,20180228
            7900724699,200906891,20180303,20180330
            7900724699,200906892,20180404,20180430
            7900724699,200906893,20180508,
            7900724611,200906888,20180416,20180522
            7900724611,200906889,20180724,20180728
            7900724611,200906889,20180601,20180720
            7900724611,200906888,20180730,20180830
            7900724611,200906890,20180905,20180930
            7900724611,200906890,20181005,20181030
            7900724611,200906890,20181104,
            7900724622,200906868,20180416,20180522
            7900724622,200906869,20180601,20180720
            7900724622,200906869,20180724,20180728
            7900724622,200906868,20180730,20180830
            7900724622,200906890,20180905,20180930
            7900724622,200906890,20181005,20181030
            7900724622,200906890,20181104,

        The output I got:

            7900724666,200906888,20180416,20180411
            7900724688,200906881,20180101,20180330
            7900724699,200906891,20180101,20180330
            7900724611,200906889,20180724,20180720
            7900724611,200906890,20180905,
            7900724622,200906869,20180601,20180728
            7900724622,200906890,20180905,

        Here the 1st and 4th lines of the output are incorrect. For example, the 4th line is:

            7900724611,200906889,20180724,20180720

        The expected 4th line is:

            7900724611,200906889,20180601,20180728
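
One possible way to get the expected line above is to keep the earliest start and latest end within each contiguous run, instead of assuming the rows of a run are already in date order. The sketch below is only that, a sketch under stated assumptions: `merge_runs` is a hypothetical helper name, an empty end date is assumed to mean open-ended (so it wins over any date), and runs of a single row are skipped, as in the earlier no-hash code. The rows are a subset of the input posted above.

```perl
use strict;
use warnings;

# Hypothetical helper: merge each contiguous run of rows sharing the same
# "a1,b1" key into one row with the earliest start and the latest end.
sub merge_runs {
    my @lines = @_;
    my (@out, $key, $start, $end, $count);

    # Emit the current run, but only if it holds more than one row.
    my $flush = sub {
        push @out, join(',', $key, $start, $end) if defined $key && $count > 1;
    };

    for my $line (@lines) {
        chomp $line;
        my ($a1, $b1, $begin, $finish) = split /,/, $line;
        $finish //= '';    # split drops a trailing empty field
        my $this = "$a1,$b1";

        if (defined $key && $this eq $key) {
            $start = $begin if $begin < $start;
            # Assumption: an empty end date is open-ended and wins.
            $end = ($end eq '' || $finish eq '') ? ''
                 : ($finish > $end)              ? $finish
                 :                                 $end;
            $count++;
        }
        else {
            $flush->();
            ($key, $start, $end, $count) = ($this, $begin, $finish, 1);
        }
    }
    $flush->();
    return @out;
}

my @rows = (
    '7900724666,200906888,20180416,20180522',
    '7900724666,200906888,20180601,20180720',
    '7900724666,200906888,20180406,20180411',
    '7900724677,200906872,20180301,20180330',
    '7900724611,200906888,20180416,20180522',
    '7900724611,200906889,20180724,20180728',
    '7900724611,200906889,20180601,20180720',
    '7900724611,200906888,20180730,20180830',
    '7900724611,200906890,20180905,20180930',
    '7900724611,200906890,20181005,20181030',
    '7900724611,200906890,20181104,',
);

print "$_\n" for merge_runs(@rows);
```

With these rows, the run for 7900724611,200906889 comes out as 7900724611,200906889,20180601,20180728, matching the expected line. Whether the 1st line should likewise become 20180406 to 20180720 depends on what is wanted for rows that are out of date order, which the thread does not state.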
