in reply to retain longest multi words units from hash

#!/usr/bin/perl

# https://perlmonks.org/?node_id=1219394

use strict;
use warnings;
use Data::Dump 'dd';

my %phrase_counts = (
  'rendition' => '3',
  'automation' => '2',
  'saturation' => '3',
  'mass creation' => 2,
  'automation technology' => 2,
  'automation technology process' => 3,
  'technology process' => 5,
  'automation process' => 2,
  'process automation' => 2,
  );
dd 'before', \%phrase_counts;

my $words = '';
for my $w ( sort { length $b <=> length $a } keys %phrase_counts )
  {
  if( $words =~ join '.*?', map "\\b$_\\b", split ' ', $w )
    {
    delete $phrase_counts{$w};
    }
  else
    {
    $words .= "$w\n";
    }
  }
dd 'after', \%phrase_counts;

Re^2: retain longest multi words units from hash (updated)
by LanX (Saint) on Jul 28, 2018 at 15:32 UTC
    Greetings King of RegEx-Obfuscation! ;-)

    What would be the average length of $words for 1 million entries? 10 MB?

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

    ==== Update

    You can improve performance if you sort by the number of whitespace characters first and only append to $words in chunks of strings that have the same number of whitespace characters.

    There is no point checking whether an n-word string is contained in another n-word string.

    In particular, you can stop searching once you reach the one-word strings.
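
The chunking idea from the update can be sketched roughly as follows. This is only an illustration under assumptions, not code from the thread: the helper names word_count and retain_longest are hypothetical, the word count is used in place of the whitespace count, \Q...\E is added so that regex metacharacters in a word are matched literally, and one-word phrases are still tested but no longer appended to $words.

```perl
use strict;
use warnings;

# Hypothetical helper: number of whitespace-separated words in a phrase.
sub word_count { my @w = split ' ', shift; return scalar @w }

# Hypothetical helper: returns a hash ref with the phrases that are not
# matched inside an already kept phrase that has more words.
sub retain_longest {
    my ($counts) = @_;                      # hash ref: phrase => count
    my %by_n;                               # word count => list of phrases
    push @{ $by_n{ word_count($_) } }, $_ for keys %$counts;

    my %kept  = %$counts;
    my $words = '';                         # kept phrases with more words
    for my $n ( sort { $b <=> $a } keys %by_n ) {
        my @survivors;
        for my $w ( @{ $by_n{$n} } ) {
            # Same test as in the parent node, plus \Q...\E quoting.
            my $re = join '.*?', map "\\b\Q$_\E\\b", split ' ', $w;
            if ( $words =~ /$re/ ) { delete $kept{$w} }
            else                   { push @survivors, $w }
        }
        # Append the chunk only after all of it was tested, so phrases with
        # the same word count are never compared with each other. One-word
        # phrases are the last chunk and need not be appended at all.
        if ( $n > 1 ) { $words .= "$_\n" for @survivors }
    }
    return \%kept;
}

my %phrase_counts = (
    'rendition'                     => 3,
    'automation'                    => 2,
    'saturation'                    => 3,
    'mass creation'                 => 2,
    'automation technology'         => 2,
    'automation technology process' => 3,
    'technology process'            => 5,
    'automation process'            => 2,
    'process automation'            => 2,
);

my $kept = retain_longest( \%phrase_counts );
print "$_ => $kept->{$_}\n" for sort keys %$kept;
```

Whether this saves time in practice depends on the data; $words still grows with every kept multi-word phrase, so the memory question above remains open.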

Re^2: retain longest multi words units from hash
by LanX (Saint) on Jul 28, 2018 at 16:57 UTC
    Your interpretation of the question is different from mine.

    DB<50> p 'a b c' =~ join '.*?', map "\\b$_\\b", split ' ', 'a c'
    1

    You seem to check for ordered subsets (the words appear in the same order, but gaps are allowed), while I only look for contiguous subsequences like a b and b c.
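
The two readings of the spec can be put side by side in a small sketch. The sub names matches_with_gaps and matches_contiguous are made up for illustration, and the \Q...\E quoting is an addition that is not in the one-liner above.

```perl
use strict;
use warnings;

# Reading 1: the words of $phrase appear in $haystack in the same order,
# with anything (except a newline) allowed in between.
sub matches_with_gaps {
    my ( $haystack, $phrase ) = @_;
    my $re = join '.*?', map "\\b\Q$_\E\\b", split ' ', $phrase;
    return $haystack =~ /$re/ ? 1 : 0;
}

# Reading 2: the words of $phrase appear next to each other in $haystack,
# separated only by whitespace.
sub matches_contiguous {
    my ( $haystack, $phrase ) = @_;
    my $re = join '\s+', map "\\b\Q$_\E\\b", split ' ', $phrase;
    return $haystack =~ /$re/ ? 1 : 0;
}

print matches_with_gaps(  'a b c', 'a c' ), "\n";   # prints 1
print matches_contiguous( 'a b c', 'a c' ), "\n";   # prints 0
print matches_contiguous( 'a b c', 'b c' ), "\n";   # prints 1
```

Under the second reading, 'automation process' would survive next to 'automation technology process', because the two words are not adjacent there.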

    Cheers Rolf

      Don't you just love ambiguous specs? :)

        > Don't you just love ambiguous specs? :)

        Almost as much as deciphering your code... ;-)

        Cheers Rolf