How many hash keys do you have ? Because this is going to scale horribly -- O(N**2).

Rather than crank out a search loop, I'd have a go at getting Perl to do the heavy lifting:

use strict ; use warnings ; my %h = ('this is a test' => 2, 'is a test' => 2, 'a test' => 1, 'this is' => 1, 'is a' => 1, 'unique' => 1, 'also Unique' => 3, ) ; my @k = sort { length($b) <=> length($a) } keys(%h) ; my $s = "\0". join("\0", @k)."\0" ; # Assumption: "\0" doesn't appear + in any key study $s ; foreach my $k (@k) { $s =~ m/(.)$k(.)/ ; if (($1 ne "\0") || ($2 ne "\0")) { delete $h{$k} ; } ; } ; print join(', ', map "'$_'", sort keys %h), "\n" ;
(it might save a little time if all keys of the largest length were removed from @k before the foreach).

Update: unless I've misunderstood, and your objective isn't to remove all substrings...


Further Update: Having tried this on some 100,000 keys (chosen at random, but ensuring at least some substrings) I found that the study $s didn't help.

I was also worried about the (.) in the match. I found that:

my @k = sort { length($b) <=> length($a) } keys(%h2) ; my $s = "\0". join("\0", @k)."\0" ; my @a = () ; my $p = 0 ; foreach my $k (@k) { $p += length($k) + 1 ; push @a, $p ; } ; foreach my $k (@k) { pos $s = 0 ; $s =~ m/$k/g ; next if pos($s) == shift(@a) ; delete $h2{$k} ; } ;
was more than twice as fast (saving 85 seconds on my machine) for 100,000 keys; twice as fast (saving 0.8 seconds) for 10,000 keys; but for 1,000 keys the speed difference was in the noise.

But this:

my @k = sort { length($b) <=> length($a) } keys(%h) ; my $s = '' ; foreach my $k (@k) { delete($h4{$k}) and next if ($s =~ m/$k/) ; $s .= $k . "\0" ; } ;
ran a little faster still, and will work better the more substrings there are -- so this is what I would use.

For completeness:

my @k = sort { length($a) <=> length($b) } keys(%h3) ; my $lx = length($k[-1]) ; my $ll = 0 ; my $li = 0 ; for my $k (@k) { if (length($k) != $ll) { $ll = length($k) ; last if $ll == $lx ; ++$li until length($k[$li]) > $ll ; } ; for my $i ($li..$#k) { next if ($k[$i] !~ /$k/) ; delete $h3{$k} ; last ; } ; } ;
which cranks the search by hand, took 24.5s for 10,000 keys -- compared to 0.75 for the fastest method !


In reply to Re: hash substrings by gone2015
in thread hash substrings by perlcat

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.