
Using constant is roughly 3 times faster (about 4 times the rate) than your inline construct.

#! perl -slw
use strict;
use Readonly;
use Benchmark qw[ cmpthese ];

use constant {
    SUBSTITUTES =>       { # substitute these
        'DUTCH'       => 'NETHERLANDS',
        'GERMANY'     => 'DEUTSCHLAND',
        'AUST.'       => 'AUSTRALIA',
    },
    SKIPWORDS => { # skip these
        'BANK'        => 1,
        'CORP'        => 1,
        'GOVERNMENT'  => 1,
        'GOVT'        => 1,
        'LIMITED'     => 1,
        'LTD'         => 1,
        'NPV'         => 1,
        'COM'         => 1,
    },
};

sub wordsConstant {
  return [
    map {
        SUBSTITUTES->{$_} or $_
    } grep {
        !SKIPWORDS->{$_}
    } split /\s+/, shift ];
}

Readonly::Hash
    my %SUBSTITUTES => (
        'DUTCH'       => 'NETHERLANDS',
        'GERMANY'     => 'DEUTSCHLAND',
        'AUST.'       => 'AUSTRALIA',
);

Readonly::Hash
    my %SKIPWORDS => (
        'BANK'        => 1,
        'CORP'        => 1,
        'GOVERNMENT'  => 1,
        'GOVT'        => 1,
        'LIMITED'     => 1,
        'LTD'         => 1,
        'NPV'         => 1,
        'COM'         => 1,
);

sub wordsReadonly {
  return [
    map {
        $SUBSTITUTES{$_} or $_
    } grep {
        ! $SKIPWORDS{$_}
    } split /\s+/, shift ];
}

sub wordsInline {
  return [
    map {
      { # substitute these
        'DUTCH'       => 'NETHERLANDS',
        'GERMANY'     => 'DEUTSCHLAND',
        'AUST.'       => 'AUSTRALIA',
      }->{$_}
        or
      $_
    }
    grep {
      !{ # skip these
        'BANK'        => 1,
        'CORP'        => 1,
        'GOVERNMENT'  => 1,
        'GOVT'        => 1,
        'LIMITED'     => 1,
        'LTD'         => 1,
        'NPV'         => 1,
        'COM'         => 1,
      }->{$_}
    } split /\s+/, shift ];
}

our $testData = uc <<'EOD';
The quick dutch fox jumps over the lazy government dog.
The quick german fox jumps over the lazy bank dog.
The quick aust. fox limited jumps over the lazy corporate dog corp.
EOD

cmpthese -1, {
    constant => q[ wordsConstant( $testData ) ],
    inline   => q[ wordsInline( $testData ) ],
    readonly => q[ wordsReadonly( $testData ) ],
};

__END__
C:\test>559911
            Rate readonly   inline constant
readonly  2133/s       --     -22%     -81%
inline    2745/s      29%       --     -75%
constant 10957/s     414%     299%       --
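
One likely contributor to the gap above is that the inline form evaluates its `{ ... }` constructor every time the map or grep block runs, so a fresh hash is built per word, whereas the constant hands back the same reference each time. The following is a minimal sketch to illustrate that point only; the two-key hash and the variable names are assumptions for the demonstration, not part of the benchmark.

```perl
use strict;
use warnings;
use Scalar::Util qw[ refaddr ];

# Cut-down stand-in for the SKIPWORDS constant in the benchmark (assumption).
use constant SKIPWORDS => { 'BANK' => 1, 'CORP' => 1 };

# Keep every reference alive so an address cannot be reused between iterations.
my @inline   = map { +{ 'BANK' => 1, 'CORP' => 1 } } 1 .. 3;
my @constant = map { SKIPWORDS } 1 .. 3;

# Count the distinct hash addresses seen in each case.
my %inlineAddrs   = map { refaddr( $_ ) => 1 } @inline;
my %constantAddrs = map { refaddr( $_ ) => 1 } @constant;

print 'distinct inline hashes:   ', scalar( keys %inlineAddrs ),   "\n";  # 3
print 'distinct constant hashes: ', scalar( keys %constantAddrs ), "\n";  # 1
```

The inline constructor yields three separate hashes for three iterations, while the constant yields one hash shared by all of them.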

In this specific example, the package is 5000+ lines long, and the words() function, because it starts with w..., will end up near the end of the file - 5000 lines away from the skip and substitute words.

With that much data, I'd definitely be putting it in a separate file of its own. If you do not want to go to the bother of wrapping it up as a module, you could just put the constant hash definition into a file of its own and use a simple do 'wordshash.pl'; just before the associated words() functions. It would probably be better, though, to wrap the function and data into a module and do something like use My::Words qw[ words ]; in the main code.
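
A minimal sketch of the module route might look like the following. The My::Words name and the words export come from the suggestion above; everything else (the Exporter setup, the shortened word lists, and keeping the package inline so the sketch runs as a single file) is an assumption made for illustration. In practice the package would live in its own My/Words.pm file and end with a true value.

```perl
#! perl
use strict;
use warnings;

# Hypothetical My::Words module, shown inline so the sketch is one runnable
# file. Normally this part would be saved as My/Words.pm.
package My::Words;

use Exporter qw[ import ];
our @EXPORT_OK = qw[ words ];

use constant {
    SUBSTITUTES => {            # substitute these
        'DUTCH' => 'NETHERLANDS',
        'AUST.' => 'AUSTRALIA',
    },
    SKIPWORDS => {              # skip these
        'BANK' => 1,
        'LTD'  => 1,
    },
};

# Split on whitespace, drop skip words, then apply substitutions.
sub words {
    return [
        map  { SUBSTITUTES->{$_} or $_ }
        grep { !SKIPWORDS->{$_} }
        split /\s+/, shift
    ];
}

package main;

# Stands in for: use My::Words qw[ words ];
# (called at run time here because the package is defined in this same file).
My::Words->import( qw[ words ] );

my $words = words( uc 'The quick dutch fox ltd' );
print "@$words\n";    # THE QUICK NETHERLANDS FOX
```

This keeps the data and the function that uses it side by side, however long the main package grows. If you go the do route instead, note that from Perl 5.26 onward the current directory is no longer in @INC, so the path would need to be written as './wordshash.pl'.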

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re^3: Does Perl do constant anonymous hash creation optimisation? by BrowserUk
in thread Does Perl do constant anonymous hash creation optimisation? by jaa
