in reply to retain longest multi words units from hash

People have posted other solutions. This is probably no better and no worse…

use strict;
use warnings;
use Data::Dump qw/dd/;

my %phrases = (
    'rendition'                     => '3',
    'automation'                    => '2',
    'saturation'                    => '3',
    'mass creation'                 => 2,
    'automation technology'         => 2,
    'automation technology process' => 3,
    'technology process'            => 5,
    'automation process'            => 2,
);

sub filter_wordlist_thing {
    my %output = %{ +shift };

    # only multi-word keys can shadow shorter entries
    for my $key ( grep / /, keys %output ) {
        my @words = split / /, $key;

        # every contiguous sub-phrase of $key, except $key itself
        my @word_combos =
            grep { $_ ne $key }
            map  { join ' ', @words[ $_->[0] .. $_->[1] ] }
            map  { my $start = $_; map [ $start, $_ ], $start .. $#words } 0 .. $#words;

        delete @output{@word_combos};
    }

    \%output;
}

dd filter_wordlist_thing( \%phrases );
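For what it's worth, with the sample %phrases above every key that is a contiguous sub-phrase of some longer multi-word key ('automation', 'automation technology', 'technology process') should get dropped, so the dump ought to come out roughly like this (sketched by hand rather than pasted from a run, so the exact formatting may differ):

    {
      "automation process"            => 2,
      "automation technology process" => 3,
      "mass creation"                 => 2,
      rendition                       => 3,
      saturation                      => 3,
    }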

Re^2: retain longest multi words units from hash
by LanX (Saint) on Jul 29, 2018 at 21:21 UTC

      Yeah, pretty similar, but mine doesn't impose a predetermined maximum number of words on the hash keys. I also optimize by skipping single-word hash keys.

        > but mine doesn't impose a predetermined maximum number of words on the hash keys.

        That was my intention. I could easily have added a new slice level dynamically if needed, but the check would have cost time. OTOH it's very likely the OP can assume a maximum word count, which is still very cheap if chosen generously.

        > I also optimize by skipping single-word hash keys.

        Again, I thought checking for single words might cost more than simply deleting an empty slice.
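        Roughly this kind of thing, assuming keys of at most three words (just a sketch of the idea, not the code I actually posted; %phrases is the OP's hash):

            my %keep = %phrases;
            for my $key ( keys %phrases ) {
                my @w = split ' ', $key;
                # proper contiguous sub-phrases for one-, two- and three-word keys;
                # for a single word the list is empty, so the delete is a cheap no-op
                my @sub = @w == 2 ? @w
                        : @w == 3 ? ( @w, "$w[0] $w[1]", "$w[1] $w[2]" )
                        :           ();
                delete @keep{@sub};
            }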

        I have to admit I could have put the body of partitions() into the delete slice, but I preferred clarity here and left optimization to the OP.

        Cheers Rolf
        (addicted to the Perl Programming Language :)