heezy has asked for the wisdom of the Perl Monks concerning the following question:

I have a hash containing language codes:

my %validLanguages = (
    "de"    => "german",
    "en"    => "english",
    "es"    => "spanish",
    "fr"    => "french",
    "it"    => "italian",
    "ja"    => "jap",
    "ko"    => "korean",
    "ru"    => "rus",
    "sv"    => "WHATS THIS",
    "zh"    => "WHATS THIS",
    "zh_TW" => "WHATS THIS",
);

...and I want to build a procedure that checks if a piece of text passed to it as a parameter is made up of only...

Re: Regex checking text is made up of the keys of a hash.
by Zaxo (Archbishop) on Mar 01, 2003 at 03:02 UTC

    Given the fixed hash %validLanguages (note that there is a level of hell for devising that name),
    start the sub definition and shift in one argument:

    sub is_valid_list {
        my $text = shift;

    Now split the string on runs of characters from a class of comma and whitespace:

        my @langs = grep {length} split /[\s,]+/, $text;

    Look for nonexistent keys and return the negation of that list's count in scalar context, which gives false if some were found and true if none were. End of sub.

        ! grep { ! exists $validLanguages{$_} } @langs;
    }

    That's untested but I figure it will work.

    Update: The original failed for strings with [\s,]+ at the start, where split returns an empty leading field (trailing empty fields are dropped by split anyway). Inserted grep {length} to take care of that; the empty field is still defined, so grep {defined} would not remove it.
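
    For reference, a self-contained version of this split-and-grep approach might look like the sketch below. It is an illustration rather than Zaxo's own tested code: the hash keeps only the keys (the values of 1 are placeholders), the sample strings are made up, and the filter is grep { length $_ } because split returns an empty but defined first field when the string begins with a separator.

```perl
use strict;
use warnings;

# Keys taken from the question; the values are placeholders for this sketch.
my %validLanguages = map { $_ => 1 } qw(de en es fr it ja ko ru sv zh zh_TW);

sub is_valid_list {
    my $text = shift;
    # Split on runs of commas/whitespace. A leading separator produces an
    # empty (but defined) first field, so filter on length.
    my @langs = grep { length $_ } split /[\s,]+/, $text;
    # True only when no token is missing from the hash.
    return !grep { !exists $validLanguages{$_} } @langs;
}

# Made-up sample inputs.
for my $sample ('de, en', ' ,de,en ', 'de, xx', 'deen') {
    printf "%-12s %s\n", "'$sample'", is_valid_list($sample) ? 'ok' : 'not ok';
}
```

    Unlike the pure regex solutions, this version rejects run-together codes such as 'deen', because each token has to be an exact key.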

    After Compline,
    Zaxo

Re: Regex checking text is made up of the keys of a hash.
by blakem (Monsignor) on Mar 01, 2003 at 03:04 UTC
    It might not really be what you want, but I believe it fits the specs:
    #!/usr/bin/perl -wT
    use strict;

    my %validLanguages = (
        "de"    => "german",
        "en"    => "english",
        "es"    => "spanish",
        "fr"    => "french",
        "it"    => "italian",
        "ja"    => "japanese",
        "ko"    => "korean",
        "ru"    => "russian",
        "sv"    => "WHATS THIS",
        "zh"    => "WHATS THIS",
        "zh_TW" => "WHATS THIS"
    );

    # test it
    for ('dog','cat','de,en',' ',',,,','deensvzh') {
        printf "%-10s %s a list of languages\n", "'$_'",
            (isItJustAListOfLanguages($_) ? "is" : "is NOT");
    }

    sub isItJustAListOfLanguages {
        my $text = shift;
        # the allowed tokens: every language code, plus whitespace and comma
        my @tokens = (keys %validLanguages, '\s', ',');
        my $tokenpatt = join('|', @tokens);
        # the whole string must consist of one or more allowed tokens
        return $text =~ /^($tokenpatt)+$/;
    }

    __END__
    'dog'      is NOT a list of languages
    'cat'      is NOT a list of languages
    'de,en'    is a list of languages
    ' '        is a list of languages
    ',,,'      is a list of languages
    'deensvzh' is a list of languages

    -Blake

      This is so cool, it works so well and it's only 4 lines!

      thanks a lot!

Re: Regex checking text is made up of the keys of a hash.
by BrowserUk (Patriarch) on Mar 01, 2003 at 04:15 UTC

    You might not want the case insensitivity or to allow spaces between the language token and the comma, but I added them for completeness.

    #! perl -slw
    use strict;

    my %validLanguages = (
        "de"    => "german",
        "en"    => "english",
        "es"    => "spanish",
        "fr"    => "french",
        "it"    => "italian",
        "ja"    => "japanese",
        "ko"    => "korean",
        "ru"    => "russian",
        "sv"    => "WHATS THIS",
        "zh"    => "WHATS THIS",
        "zh_TW" => "WHATS THIS"
    );

    # one language code, optional surrounding whitespace, then a comma or end of string
    my $re_langs = join'|', keys %validLanguages;
    $re_langs = qr[\s*(?:$re_langs)\s*(?:,|$)]io;

    # strip every valid entry; only whitespace may remain
    sub isOnlyLangs{
        my ($string) = @_;
        $string =~ s[$re_langs][]g;
        $string =~ m[^\s*$];
    }

    # terser variant: nothing at all may remain
    sub isOnlyLangs_{
        (my $s = $_[0]) =~ s[$re_langs][]g;
        !$s;
    }

    print isOnlyLangs($_) ? 'Passed : ' : 'Failed : ', "'$_'" for
        'de, en, fr, ja',
        ' de , en , fr , ja , ',
        'monkish fr, en',
        'monkish, fr, en',
        'FR',
        'Fr',
        'fr en',
        'fr , en,',
        'zh_tw';

    Examine what is said, not who speaks.
    1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
    2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
    3) Any sufficiently advanced technology is indistinguishable from magic.
    Arthur C. Clarke.
Re: Regex checking text is made up of the keys of a hash.
by hv (Prior) on Mar 01, 2003 at 14:19 UTC

    It isn't entirely clear how strict you want the match to be: whether any of "defr", "de fr", ",," or "de,,fr" should be accepted. Let's start by joining the valid languages:

    my $re_langs = sprintf '(?:%s)', join '|', keys %validLanguages;

    Note that wrapping the alternatives in non-capturing parens allows me to treat $re_langs as if it were an atom in the examples below.
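
    To see why the grouping matters, here is a small sketch comparing the bare alternation with the grouped one. The three codes are an illustrative subset of the keys, and the variable names $bare and $grouped are made up for this example.

```perl
use strict;
use warnings;

my @codes   = qw(de en fr);                 # illustrative subset of the keys
my $bare    = join '|', @codes;             # de|en|fr
my $grouped = sprintf '(?:%s)', $bare;      # (?:de|en|fr)

# Without the group, the anchors attach only to the first and last
# alternatives: /^de|en|fr\z/ means "starts with de" OR "contains en"
# OR "ends with fr".
my $bare_hit    = 'english' =~ /^$bare\z/    ? 1 : 0;
my $grouped_hit = 'english' =~ /^$grouped\z/ ? 1 : 0;

print "bare: $bare_hit, grouped: $grouped_hit\n";
```

    The bare pattern matches 'english' because it contains "en", while the grouped pattern requires the whole string to be exactly one code and so does not match.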

    Now, if any combination of languages, whitespace and commas is ok:

    $text =~ /^(?:$re_langs|\s|,)*\z/;

    That allows the empty string and all the examples above. To fail on the empty string you can replace the '*' in the pattern (zero or more) with '+' (one or more).

    To allow any combination, but require whitespace or commas separating languages (so that "defr" is not allowed) we require each language to be followed either by a separator or end of string:

    $text =~ /^(?:$re_langs(?=\s|,|\z)|\s|,)*\z/;

    That pattern can also be made simpler and faster if the language strings always start and end with a word character:

    $text =~ /^(?:$re_langs\b|\s|,)*\z/;
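
    The three patterns so far can be compared side by side on the four strings mentioned at the top. This is only a sketch: the codes are listed directly instead of being read from %validLanguages, and the labels loose, lookahead and boundary are names invented for the comparison.

```perl
use strict;
use warnings;

my @codes    = qw(de en es fr it ja ko ru sv zh zh_TW);  # keys of %validLanguages
my $re_langs = sprintf '(?:%s)', join '|', @codes;

my %pattern = (
    loose     => qr/^(?:$re_langs|\s|,)*\z/,              # any mix of codes and separators
    lookahead => qr/^(?:$re_langs(?=\s|,|\z)|\s|,)*\z/,   # code must be followed by a separator or the end
    boundary  => qr/^(?:$re_langs\b|\s|,)*\z/,            # same idea using a word boundary
);

for my $text ('defr', 'de fr', ',,', 'de,,fr') {
    my @verdicts = map { "$_=" . ($text =~ $pattern{$_} ? 'yes' : 'no') }
                   sort keys %pattern;
    printf "%-10s %s\n", "'$text'", join ' ', @verdicts;
}
```

    Only 'defr' separates the variants: the loose pattern accepts it, while the lookahead and word-boundary versions reject it.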

    If additionally the comma is optional but cannot appear multiple times, so that "de fr" is ok but "de,,fr" is not, one way would be to extend the pattern to say precisely that:

    $text =~ m{
        ^
        (?:
            $re_langs \b
          |
            \s
          |
            , (?! \s*, )    # comma not followed by another comma
                            # (not even with intervening whitespace)
        )*
        \z
    }x;

    However it is probably more efficient to encode the patterns that must follow each language:

    $text =~ m{
        ^
        \s* ( ,\s* )?       # allow stuff to precede first language
        (?:
            $re_langs
            (?: \s+ | \s* , \s* | \z )
        )*
        \z
    }x;
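
    As a rough check, the two comma-restricting patterns can be run against a few strings. In this sketch the names $no_double_comma and $explicit_followers are invented, the codes are listed directly, and the optional leading comma uses a non-capturing group.

```perl
use strict;
use warnings;

my @codes    = qw(de en es fr it ja ko ru sv zh zh_TW);  # keys of %validLanguages
my $re_langs = sprintf '(?:%s)', join '|', @codes;

# A comma may not be followed by another comma, even across whitespace.
my $no_double_comma = qr{
    ^
    (?: $re_langs \b | \s | , (?! \s*, ) )*
    \z
}x;

# Spell out what may follow each language instead.
my $explicit_followers = qr{
    ^
    \s* (?: ,\s* )?
    (?: $re_langs (?: \s+ | \s* , \s* | \z ) )*
    \z
}x;

for my $text ('de fr', 'de, fr', 'de,,fr', 'de , , fr', 'defr') {
    printf "%-12s %-3s %-3s\n", "'$text'",
        ($text =~ $no_double_comma    ? 'yes' : 'no'),
        ($text =~ $explicit_followers ? 'yes' : 'no');
}
```

    Both patterns accept 'de fr' and 'de, fr' and reject the other three strings.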

    Finally, if each language must be followed by a comma but the final comma is optional, and all whitespace is optional:

    $text =~ /^\s*(?:$re_langs\s*(?:,\s*|\z))*\z/;
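
    Wrapped in a sub, this strictest form could be used as follows. The sub name is_comma_list is hypothetical, and the codes are again listed directly for the sake of a standalone sketch.

```perl
use strict;
use warnings;

my @codes    = qw(de en es fr it ja ko ru sv zh zh_TW);  # keys of %validLanguages
my $re_langs = sprintf '(?:%s)', join '|', @codes;

# Hypothetical wrapper name, not from the thread.
sub is_comma_list {
    my ($text) = @_;
    # Each language is followed by a comma, except that the last one may
    # end the string instead; whitespace around tokens is optional.
    return $text =~ /^\s*(?:$re_langs\s*(?:,\s*|\z))*\z/ ? 1 : 0;
}

for my $text ('de,en', 'de , en ,', 'de en', 'de,,en', '') {
    printf "%-12s %s\n", "'$text'", is_comma_list($text) ? 'accepted' : 'rejected';
}
```

    Note that the empty string is accepted here, since the group may match zero times; changing the '*' after the group to '+' would require at least one language.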

    I hope that gives you some useful options to consider, but please keep in mind that all the examples above are untested.

    Hugo
Re: Regex checking text is made up of the keys of a hash.
by heezy (Monk) on Mar 01, 2003 at 23:08 UTC

    Thanks to everyone who replied to this posting. I eventually adopted the solution from blakem,

    but my thanks go to all of you!