Re^3: sorting and merging in perl
by marto (Cardinal) on Jul 28, 2018 at 18:25 UTC
      Below is the code. Any help, please? Thank you very much.
      use warnings ;
      use Data::Dumper;

      my %results = ();

      while ( <DATA> ) {
          chomp ;
          my @row = split /,/, $_ ;
          print STDOUT "row values is:".$_."\n";
          if ( (exists $results{ $row[0], $row[1] })) {
              if ( ( $row[2] ) < $results{ $row[0], $row[1] }->{ 'ACTDATE' } ) {
                  $results{ $row[0],$row[1] }->{ 'ACTDATE' } = $row[2] ;
              }

              print STDOUT "inactdate--".$row[3]." \n";
              print STDOUT "in memory inact--".$results{ $row[0] }->{ 'INACTDATE' }."\n";
              if ( ( defined $row[3]) && !( defined $results{ $row[0] }->{ 'INACTDATE' } ) ) {
                  $results{ $row[0] }->{ 'INACTDATE' } = undef ;
              }
              elsif ( !( defined $row[3]) && ( defined $results{ $row[0] }->{ 'INACTDATE' } ) ) {
                  $results{ $row[0] }->{ 'INACTDATE' } = undef ;
              }
              elsif( ( $row[3] ) > ( $results{ $row[0], $row[1] }->{ 'INACTDATE' } ) ) {
                  $results{ $row[0], $row[1] }->{ 'INACTDATE' } = $row[3];
              }
          }
          else {
              # Create new entry in hash
              $results{ $row[0],$row[1] } = {
                  'A1' => $row[0],
                  'B1' => $row[1],
                  'ACTDATE' => $row[2],
                  'INACTDATE' => $row[3],
              }
          }
      }

      foreach ( sort keys %results ) {
          my $a1 = $results{ $_ }->{ 'A1' } ;
          my $b1 = $results{ $_ }->{ 'B1' } ;
          my $actDt = $results{ $_ }->{ 'ACTDATE' } ;
          my $inactDt = $results{ $_ }->{ 'INACTDATE' };
          print "$a1,$b1,$actDt,$inactDt\n" ;
      }

      __DATA__
      7900724655,200906888,20180416,20180522
      7900724655,200906889,20180724,20180728
      7900724655,200906889,20180601,20180720
      7900724655,200906888,20180730,20180830
      7900724655,200906890,20180905,20180930
      7900724655,200906890,20181005,20181030
      7900724655,200906890,20181104,
      7900724666,200906868,20180416,20180522
      7900724666,200906869,20180601,20180720
      7900724666,200906869,20180724,20180728
      7900724666,200906868,20180730,20180830
      7900724666,200906890,20180905,20180930
      7900724666,200906890,20181005,20181030
      7900724666,200906890,20181104,

      ------- I am getting an "uninitialized value" warning ----------

      row values is:7900724655,200906888,20180416,20180522
      row values is:7900724655,200906889,20180724,20180728
      row values is:7900724655,200906889,20180601,20180720
      inactdate--20180720
      Use of uninitialized value in concatenation (.) or string at web_New_ADDON.pl line 16, <DATA> line 3.
      in memory inact--
      row values is:7900724655,200906888,20180730,20180830
      inactdate--20180830
      Use of uninitialized value in concatenation (.) or string at web_New_ADDON.pl line 16, <DATA> line 4.
      in memory inact--
      row values is:7900724655,200906890,20180905,20180930
      row values is:7900724655,200906890,20181005,20181030
      inactdate--20181030

      2018-08-02 Athanasius added code tags

        Sekhar Reddy:

        Using code tags (<c>Your code goes here</c>) would be far better than trying to manually format your code:

        • It preserves spaces, so people can see your indentation
        • It's far easier than manually inserting all those paragraph tags
        • It's less error-prone, as you don't have to manually look for <, > (et al.) and patch them up.
        • It lets the site offer a "download" button to let people more easily grab your code

        If you had used code tags, it would look more like the following:

        use Data::Dumper;

        my %results = ();

        while ( <DATA> ) {
            chomp;
            my @row = split /,/, $_;
            print STDOUT "row values is:".$_."\n";
            if ( (exists $results{ $row[0], $row[1] })) {
                if ( ( $row[2] ) < $results{ $row[0], $row[1] }->{ 'ACTDATE' } ) {
                    $results{ $row[0],$row[1] }->{ 'ACTDATE' } = $row[2] ;
                }
                print STDOUT "inactdate--".$row[3]." \n";
                print STDOUT "in memory inact--".$results{ $row[0] }->{ 'INACTDATE' }."\n";
                if ( ( defined $row[3]) && !( defined $results{ $row[0] }->{ 'INACTDATE' } ) ) {
                    $results{ $row[0] }->{ 'INACTDATE' } = undef ;
                }

        Obviously not exactly like that, as I don't know how your code is normally formatted, nor did I bother to clean up the whole thing.

        OK, for the errors you're encountering: I'd suggest you start looking at how you're accessing the data items you're putting in results. You're storing some items into the hash like this:

        $results{ $row[0], $row[1] } = {
            'A1' => $row[0],
            'B1' => $row[1],
            'ACTDATE' => $row[2],
            'INACTDATE' => $row[3],
        }

        but you're sometimes referring to them like:

            $results{ $row[0],$row[1] }->{ 'ACTDATE' }

        but at other times you're doing things like:

            $results{ $row[0] }->{ 'INACTDATE' }

        I expect the mismatched hash keys are the reason that you're getting all the "uninitialized value" warnings. Clean that up and that will probably get you closer to your desired result.
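To make the mismatch concrete, here is a minimal sketch (not part of the original post; the variable names $found_with_both, $found_with_one and $same_as_join are made up for illustration, and the sample row is taken from the posted data). It relies on the documented Perl behaviour that a comma-separated list in a hash subscript is joined with $; into a single string key:

```perl
use strict;
use warnings;

my %results;
my @row = ('7900724655', '200906889', '20180724', '20180728');

# A list in a hash subscript is joined with $; (by default "\x1C") into
# ONE string key, so this stores the entry under "7900724655\x1C200906889".
$results{ $row[0], $row[1] } = { ACTDATE => $row[2], INACTDATE => $row[3] };

# Looking up with the same two parts finds the entry.
my $found_with_both = exists $results{ $row[0], $row[1] } ? 1 : 0;

# Looking up with only the first part uses a different key string,
# so there is no entry there (and reading a field from it yields undef).
my $found_with_one = exists $results{ $row[0] } ? 1 : 0;

# Building the key by hand with $; gives the same key as the two-part subscript.
my $manual_key   = join( $;, $row[0], $row[1] );
my $same_as_join = exists $results{$manual_key} ? 1 : 0;

print "both parts: $found_with_both\n";
print "first part only: $found_with_one\n";
print "joined by hand: $same_as_join\n";
```

This should report 1 for the two-part lookup, 0 for the one-part lookup, and 1 for the hand-joined key, which is consistent with the one-part lookups in the posted code coming back undefined.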

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

        roboticus did identify the error in the code you posted.

        Here is the code you posted, with some minor changes:

        • Split each line into four named variables instead of @row, which makes the problem clearer.
        • Drop the quotes around hash keys that consist only of word characters, since those don't need to be quoted.

        #!/usr/bin/perl
        use strict;
        use warnings;

        my %results;

        open my $fh, '<', 'file1.csv' or die $!;
        while ( <$fh> ) {
            chomp ;
            my ($a1, $b1, $actDt, $inactDt) = split /,/;
            if (exists $results{$a1,$b1} ) {
                if ($actDt < $results{$a1,$b1}{ACTDATE}) {
                    $results{$a1,$b1}{ACTDATE} = $actDt;
                }
                if (!$inactDt || !$results{$a1,$b1}{INACTDATE}) {
                    $results{$a1,$b1}{INACTDATE} = '' ;
                }
                elsif($inactDt > $results{$a1,$b1}{INACTDATE} ) {
                    $results{$a1,$b1}{INACTDATE} = $inactDt;
                }
            }
            else {
                # Create new entry in hash
                $results{$a1,$b1} = {
                    A1 => $a1,
                    B1 => $b1,
                    ACTDATE => $actDt,
                    INACTDATE => $inactDt,
                };
            }
        }

        foreach ( sort keys %results ) {
            my $a1 = $results{ $_ }{A1} ;
            my $b1 = $results{ $_ }{B1} ;
            my $actDt = $results{ $_ }{ACTDATE};
            my $inactDt = $results{ $_ }{INACTDATE};
            print "$a1,$b1,$actDt,$inactDt\n" ;
        }
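        The output gives erroneous results because the hash also merges records that are not contiguous in the input (for example, the two 200906888 rows, which are separated by the 200906889 rows). The output is: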
        7900724655,200906888,20180416,20180830
        7900724655,200906889,20180601,20180728
        7900724655,200906890,20180905,
        7900724666,200906868,20180416,20180830
        7900724666,200906869,20180601,20180728
        7900724666,200906890,20180905,

        Getting the results you want doesn't require a hash.

        #!/usr/bin/perl
        use strict;
        use warnings;

        open my $fh, '<', 'file1.csv' or die $!;

        my ($actDt, $inactDt, $count) = ('', '', 0);
        my $previous_key = '';

        while (<$fh>) {
            chomp;
            # A limit of -1 keeps a trailing empty field, so $end is ''
            # (not undef) when the inactive date is missing.
            my ($a1, $b1, $begin, $end) = split /,/, $_, -1;
            my $key = "$a1,$b1";
            if ($previous_key eq $key) {
                # Same key as the previous line: extend the current run.
                $inactDt = $end;
                $count++;
            }
            else {
                # either reached the end of a record or this is the first record
                print join(",", $previous_key, $actDt, $inactDt), "\n" if $count > 1;
                $actDt = $begin;
                $inactDt = $end;
                $count = 1;
            }
            $previous_key = $key;
        }
        # Flush the last run once the input is exhausted.
        print join(",", $previous_key, $actDt, $inactDt), "\n" if $count > 1;
        This gives the desired results:
        7900724655,200906889,20180601,20180728
        7900724655,200906890,20180905,
        7900724666,200906869,20180601,20180728
        7900724666,200906890,20180905,
        The input file is:
        7900724655,200906888,20180416,20180522
        7900724655,200906889,20180601,20180720
        7900724655,200906889,20180724,20180728
        7900724655,200906888,20180730,20180830
        7900724655,200906890,20180905,20180930
        7900724655,200906890,20181005,20181030
        7900724655,200906890,20181104,
        7900724666,200906868,20180416,20180522
        7900724666,200906869,20180601,20180720
        7900724666,200906869,20180724,20180728
        7900724666,200906868,20180730,20180830
        7900724666,200906890,20180905,20180930
        7900724666,200906890,20181005,20181030
        7900724666,200906890,20181104,
Re^3: sorting and merging in perl
by poj (Abbot) on Jul 30, 2018 at 14:17 UTC

      It's an additional scenario to the main requirement. I thought this additional scenario might not be handled, and hence I am looking for a new hash to store this result.

        I'm struggling to understand the logic here. In the original post you have records

        7900724677,200906871,20180101,20180228
        7900724677,200906872,20180301,20180330
        7900724677,200906873,20180401,20180420
        

        creating the record

        7900724677,200906871:200906873,20180101,20180420

        but also you now want these records

        7900724655,200906888,20180416,20180522
        7900724655,200906889,20180601,20180720
        7900724655,200906889,20180724,20180728
        7900724655,200906888,20180730,20180830
        7900724655,200906890,20180905,20180930
        7900724655,200906890,20181005,20181030
        7900724655,200906890,20181104,
        

        to create 2 records

        7900724655,200906889,20180601,20180728
        7900724655,200906890,20180905,
        

        Is that correct ?

        poj