Tikplay has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm having issues with this and wondering if someone could provide some help. I'm parsing a .txt file and want to combine duplicated keys and their values. Essentially, for each identifier I want to store its height value. Each "sample" has 2 entries (A & B). I have the file stored like this:

while(...){
    @data= split ("\t", $line);
    $curr_identifier= $data[0];
    $markername= $data[1];
    $position1= $data[2];
    $height= $data[4];
    if ($line >0){
        $result[0] = $markername;
        $result[1] = $position1;
        $result[2] = $height;
        $result[3] = $curr_identifier;
        $data{$curr_identifier}= [@result];
    }
}

This seems to work fine, but my issue is that when I send this data to the function below, it prints the $curr_identifier twice. I only want to populate unique identifiers and check for the presence of its $height variable.

if (!defined $data{$curr_identifier}[2]){
    $output1= "no height for both markers- failed";
}
else {
    if ($data{$curr_identifier}[2] eq " ") {
        $output1 = $markername;
    }
}
print $curr_identifier, $output1 . "\t" . $output1 . "\n";

Basically, if sample height is present for both markers (A&B), then output is both markers.

'1', 'A', 'B'

If height is not present, then output is empty for reported marker.

'2', 'A', ' '

'3', ' ', 'B'

My current output is printing out like this:

1, A

1, B

2, A

2, ' '

3, ' '

3, B'

_DATA_

Name Marker Position1 Height Time

1 A A 6246 0.9706

1 B B 3237 0.9706

2 A 0

2 B B 5495 0.9775

3 A A 11254 0.9694

3 B 0

Re: Combine duplicated keys in Hash array
by kcott (Archbishop) on Jul 03, 2020 at 06:29 UTC

    G'day Tikplay,

    Welcome to the Monastery.

    There are a number of issues with your post which make providing an answer difficult. You seem to be relying on "dynamically scoped variables" which could be causing you problems; however, as you haven't provided sufficient code, that's just a guess. Take a look at perlintro, "How do I post a question effectively?" and SSCCE.

    You've posted the code you did provide in <code> tags, which is good; however, you should also do the same for your data. HTML renders consecutive whitespace characters as a single space: all of the tabs in your input are lost. The code below just uses those spaces for the DATA; the output uses tabs. I also added a couple of dummy lines to test the "failed" logic.
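The practical effect of losing the tabs can be shown with a small sketch. The input line here is hypothetical: it assumes a record whose Position1 and Height fields are empty, which may not match the real file layout.

```perl
use strict;
use warnings;

# Hypothetical record (an assumption, not taken from the real file):
# id 2, marker A, empty Position1 and Height, Time 0.
my $line = "2\tA\t\t\t0";

# Splitting on a literal tab keeps the empty fields in their columns.
my @by_tab = split /\t/, $line;

# Splitting on whitespace treats the run of tabs as a single separator,
# so the empty columns disappear and later fields shift left.
my @by_space = split ' ', $line;

printf "tab split:        %d fields\n", scalar @by_tab;
printf "whitespace split: %d fields\n", scalar @by_space;
```

If trailing fields can also be empty, passing a limit of -1 (`split /\t/, $line, -1`) keeps them; without it, Perl drops trailing empty fields.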

    As best as I can determine, this code performs the basic functionality you want:

    #!/usr/bin/env perl

    use strict;
    use warnings;

    my %data;

    while (<DATA>) {
        chomp;
        next unless length;
        my ($id, $height) = (split)[0,3];
        $height = ' ' unless defined $height;
        push @{$data{$id}}, $height;
    }

    my $format = "'%d', '%s',\t'%s'\n";

    for my $id (sort { $a <=> $b } keys %data) {
        if ($data{$id}[0] eq ' ' && $data{$id}[1] eq ' ') {
            print "$id: both height markers missing - failed\n";
        }
        else {
            printf $format, $id, @{$data{$id}};
        }
    }

    __DATA__
    1 A A 6246 0.9706
    1 B B 3237 0.9706
    2 A 0
    2 B B 5495 0.9775
    3 A A 11254 0.9694
    3 B 0
    4 A
    4 B

    Here's the output:

    '1', '6246', '3237'
    '2', ' ', '5495'
    '3', '11254', ' '
    4: both height markers missing - failed

    — Ken

Re: Combine duplicated keys in Hash array
by Marshall (Canon) on Jul 03, 2020 at 02:48 UTC
    UPDATE: Sorry I think I messed up here. Is this something like you want? My brain is hurting today.
    use strict;
    use warnings;

    while (<DATA>) {
        next unless /\S/;
        chomp;
        my ($name, $markerA, $markerB, $height) = split (/\s+/,$_);
        $name //= "";
        $markerA //= "";
        $markerA ="" if $markerA =~ /^0$/;
        $markerB //= "";
        $markerB ="" if $markerB =~ /^0$/;
        if ($markerA !~ /A/) {
            $markerA = $markerB;
            $markerB = "B";
        }
        $height //= "";
        if (!$height) {
            my $string;
            foreach ($name, $markerA, $markerB) {
                $string .= "\'$_\',";
            }
            chop $string;
            print "$string\n";
        }
    }

    =prints
    '2','A',''
    '3','','B'
    =cut

    #Name Marker1 Marker2 Height Time
    __DATA__
    1 A A 6246 0.9706
    1 B B 3237 0.9706
    2 A 0
    2 B B 5495 0.9775
    3 A A 11254 0.9694
    3 B 0

    I think this could be done better. Below is the admittedly failed, bogus solution from my first post.

    use strict;
    use warnings;

    my %name;
    while(my $line =<DATA>) {
        next unless $line =~ /\S/; #skip blank lines
        chomp $line;
        my ($undef, $markerName,undef, undef, $height) = split (/\s+/, $line);
        $name{$markerName} = $height;
    }
    foreach (sort keys %name) {
        print "$_ => $name{$_}\n";
    }

    =prints
    a => 55
    b => 32
    c => 34
    =cut

    __DATA__
    1 a 23 99999 55
    4 c 55 8888 34
    5 b 45 88888 32
Re: Combine duplicated keys in Hash array
by BillKSmith (Monsignor) on Jul 03, 2020 at 18:40 UTC
    Your %data hash does not save both @result arrays for each "identifier". Note that I have changed your code to push the array reference into an array rather than storing it directly. I think I am processing the resulting hash correctly - at least it duplicates your desired output. Your output suggests that you were attempting to process the data hash inside your loop rather than passing it on as your text indicates. You printed output for every line.

    use strict;
    use warnings;
    use Data::Dumper;

    my %data;
    while (my $line = <DATA>) {
        chomp $line;
        next if $line eq '';
        my @data = split ("\t", $line);
        my $curr_identifier= $data[0];
        my $markername= $data[1];
        my $position1= $data[2];
        my $height= $data[4];
        my @result;
        if ($. > 1){
            $result[0] = $markername;      # line[1]
            $result[1] = $position1;       # line[2]
            $result[2] = $height;          # line[4]
            $result[3] = $curr_identifier; # line[0]
            #$data{$curr_identifier}= [@result];
            push @{$data{$curr_identifier}}, [@result];
        }
    }
    #print Dumper(\%data);

    foreach my $curr_identifier (sort keys %data) {
        my $curr_data = $data{$curr_identifier};

        # Neither defined
        if (!defined $curr_data->[0][2] and !defined $curr_data->[1][2] ) {
            print "$curr_identifier - No height for either marker - failed\n";
            next;
        }

        # Both defined
        my $output0;
        my $output1;
        my $height_0 = $curr_data->[0][2];
        my $height_1 = $curr_data->[1][2];
        my $markername_0 = $curr_data->[0][0];
        my $markername_1 = $curr_data->[1][0];
        if ( defined $height_0 and defined $height_1 ) {
            $output0 = $curr_data->[0][0];
            $output1 = $curr_data->[1][0];
        }
        else {
            # Only one defined
            $output0 = (!defined $height_0) ? $markername_0 : q('');
            $output1 = (!defined $height_1) ? $markername_1 : q('');
        }
        print "$curr_identifier , $output0 , $output1\n";
    }

    __DATA__
    Name Marker Position1 Height Time
    1 A A 6246 0.9706
    1 B B 3237 0.9706
    2 A 0
    2 B B 5495 0.9775
    3 A A 11254 0.9694
    3 B 0
    4 B B
    4 A A

    OUTPUT:

    1 A B
    2 A ''
    3 '' B
    4 - No height for either marker - failed
    Bill
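The difference between assigning to a hash key and pushing onto it can be seen in isolation. This is only a minimal sketch: the rows and the names %overwritten and %collected are made up for illustration and are not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical rows: [identifier, marker, height]
my @rows = ( [1, 'A', 6246], [1, 'B', 3237], [2, 'B', 5495] );

my (%overwritten, %collected);
for my $row (@rows) {
    my ($id, $marker, $height) = @$row;

    # Plain assignment: a later row with the same key replaces the earlier one.
    $overwritten{$id} = [$marker, $height];

    # push onto an autovivified array: every row for the key is kept.
    push @{ $collected{$id} }, [$marker, $height];
}

my $kept_rows = scalar @{ $collected{1} };
print "assignment keeps only marker $overwritten{1}[0] for id 1\n";
print "push keeps $kept_rows rows for id 1\n";
```

With assignment only the last row seen for each identifier survives, so the A entry for identifier 1 is lost; with push both entries remain available after the loop.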
      Thank you! This most closely mimics what I'm going for. Could you explain this part?

      my $height_0 = $curr_data->[0][2];
      my $height_1 = $curr_data->[1][2];
      my $markername_0 = $curr_data->[0][0];
      my $markername_1 = $curr_data->[1][0];

        My use of push changed the nature of your structure %data from a hash of arrays to a hash of arrays of arrays. (refer: perldsc) Please uncomment my "print Dumper" statement and run my code again. Refer to that dump as you read my explanation.

        The keys of the hash %data are the instances of $curr_identifier. The corresponding values are array refs. Each of these refers to an array of two elements (one for each line associated with the curr_identifier). Each of these elements is itself a reference to an array (in fact, each array is an instance of your original "@result").

        My for loop iterates through the keys ($curr_identifier) of the hash %data. For each key, it stores the corresponding value (A reference to an array of arrays) as $curr_data. The next four statements, which you explicitly asked about, dereference that reference (Section "Using References" in perlref) to get the two values of height and the two values of markername. I appended a zero or a one to each name to indicate which of the pair of lines it came from.
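The dereferencing steps described above can be tried on a small hand-built structure. This is a sketch under assumptions: the values are hypothetical and only imitate the shape of %data (identifier => array of rows, each row being marker, position, height, identifier).

```perl
use strict;
use warnings;

# Hypothetical data in the same shape as the hash of arrays of arrays.
my %data = (
    1 => [ [ 'A', 'A',   6246,  1 ], [ 'B', 'B', 3237, 1 ] ],
    2 => [ [ 'A', undef, undef, 2 ], [ 'B', 'B', 5495, 2 ] ],
);

for my $id (sort { $a <=> $b } keys %data) {
    my $curr_data = $data{$id};          # reference to an array of two row refs
    my $first_row = $curr_data->[0];     # reference to the first row
    my $height_0  = $first_row->[2];     # same element as $curr_data->[0][2]
    my $height_1  = $curr_data->[1][2];  # the arrow between subscripts is optional

    # Show undefined heights explicitly instead of triggering a warning.
    my @shown = map { defined($_) ? $_ : 'undef' } ($height_0, $height_1);
    print "$id: $shown[0], $shown[1]\n";
}
```

Writing `$curr_data->[0][2]` is shorthand for `$curr_data->[0]->[2]`: the first subscript picks the row, the second picks the column inside that row.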

        Bill