lshadow has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have three ints describing locations which I read to a hash data structure:

$VAR1 = { '27069_1' => { 'stop' => 214601666, 'start' => 214600991 }, '14644_1' => { 'stop' => 166950686, 'start' => 166950125 }, }

I'd need to remove the elements that have equal start/stop locations the ones that have lt start comparing to other but the stop will be between second elements start and stop. And if the second elements start is between the first elements start and stop and the stop is gt than the first elements stop. I know this is really hard to explain. I try to achieve this by creating the not that complex data structure with:

$tss{$ch[0]}{$id}={ start=>$start, stop=>$stop };

For some reason it manages to remove 'some' elements that have overlapping region in somehow but not all. The removing is done in:

for(keys(%the_hash)){ my $ch=$_; my @a=keys(%{$tss{$ch}}); my $tid=pop(@a); my $s=$tss{$ch}{$tid}{'start'}; my $st=$tss{$ch}{$tid}{'stop'}; foreach(keys(%{$tss{$ch}})){my $k=$_; if($tss{$ch}{$k}{'start'} eq $s and $tss{$ch}{$k}{'sto +p'} eq $st #and $_ ne $tid ){ delete($tss{$ch}{$k}); } ..... and so on

The loop is quite long having just more elif-statements to see if the regions overlap. I think the problem lies somewhere in the looping technique I'm using. I tried others but with no succes. Also I could change the data structure. Now it is having a hash, which has like 20 numeric keys, and each of those having multiple different id's (unique keys) and each id has the start and stop property. I try to access first for each of the 20 first elements as the id's can be groupd due that to smaller groups.

Replies are listed 'Best First'.
Re: deleting elements in nested hash structure
by NetWallah (Canon) on Jun 02, 2012 at 18:00 UTC
    It may be easier to read, manage and extend this, if it is written in OO style.

    Based on corrected BrowserUk's (++) code, I offer:

    use strict; use warnings; use Data::Dump qw[ pp ]; #------------------------------------------ {package Event; sub new{ my ($class,@att) = @_; return bless {map {$_=>shift(@att)} qw|ID start stop|}, $class ; } sub Overlaps_with{ my ($self, $other)=@_; return 0 if $self->{start} > $other->{stop}; return 0 if $self->{stop} < $other->{start}; push @{$other->{OVERLAPS}}, $self->{ID},$self->{OVERLAPS}?@{$sel +f->{OVERLAPS}}:(); return 1; } 1; }#---- End of Package Event ------------------ my %tss = map { my $s = int( rand 1000 ); $_ => Event::->new ($_, $s, $s + int (rand 200) ); } 1 .. 15; pp \%tss; my @ids = sort{ $a <=> $b } keys %tss; for my $first ( 0 .. $#ids ) { for my $second ( $first+1 .. $#ids ) { if ($tss{$ids[$first]}->Overlaps_with ($tss{$ids[$second]})){ ## print "Deleting $ids[$first]\n"; delete $tss{ $ids[$first] }; last; } } } my @x = sort {$a->{start} <=> $b->{start} } values %tss; pp \@x;
    This version tracks the deleted item ID's.

                 I hope life isn't a big joke, because I don't get it.
                       -SNL

Re: deleting elements in nested hash structure
by BrowserUk (Patriarch) on Jun 02, 2012 at 16:04 UTC

    Something like this?

    #! perl -slw use strict; use Data::Dump qw[ pp ]; my %tss = map { my $s = int( rand 1000 ); $_ => { start => $s, stop => $s + int( rand 100 ) } } 1 .. 100; #pp \%tss; my @ids = sort{ $a <=> $b } keys %tss; my @toDelete; for my $first ( 0 .. $#ids ) { for my $second ( $first+1 .. $#ids ) { ## updated per NetWallah's suggestion below push( @toDelete, $ids[ $first ] ), last unless $tss{ $ids[ $first ] }{ start } > $tss{ $ids[ $seco +nd ] }{ stop } or $tss{ $ids[ $first ] }{ stop } < $tss{ $ids[ $seco +nd ] }{ start }; } } #pp \@toDelete; delete @tss{ @toDelete }; pp \%tss; __END__ C:\test>junk { 74 => { start => 982, stop => 1011 }, 80 => { start => 572, stop => 577 }, 87 => { start => 286, stop => 319 }, 90 => { start => 600, stop => 674 }, 91 => { start => 252, stop => 266 }, 92 => { start => 1, stop => 7 }, 94 => { start => 210, stop => 221 }, 95 => { start => 553, stop => 554 }, 96 => { start => 113, stop => 158 }, 97 => { start => 752, stop => 820 }, 98 => { start => 543, stop => 550 }, 99 => { start => 436, stop => 456 }, 100 => { start => 696, stop => 725 }, }

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      um .. shouldn't that be
      if ( $tss{ $ids[ $first ] }{ start } > $tss{ $ids[ $second + ] }{ stop } or $tss{ $ids[ $first ] }{ stop } < $tss{ $ids[ $seco +nd ] }{ start } ){ # No Overlap }else{ push @toDelete, $ids[ $first ]; # <<<< Note: changed $ +first to $ids[first] last; # <<<< Added 'last' to avoid deleting $ +first multiple times }

                   I hope life isn't a big joke, because I don't get it.
                         -SNL

        last; # <<<< Added 'last' to avoid deleting $first multiple times

        It doesn't change the result, but yes, it is a more optimal implementation. I wouldn't code it that way though.

        Whilst if( noOverlap ) { do nothing } else { push @todelete }

        is functionally equivalent to push @toDelete unless noOverlap;

        I find the latter easier on both the eye and the brain.

        Hence, I#ll modify the above this way:

        push( @toDelete, $first ), last unless $tss{ $ids[ $first ] }{ start } > $tss{ $ids[ $seco +nd ] }{ stop } or $tss{ $ids[ $first ] }{ stop } < $tss{ $ids[ $seco +nd ] }{ start };

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        The start of some sanity?