It isn't entirely clear how strict you want the match to be: whether any of "defr", "de fr", ",," or "de,,fr" should be accepted. Let's start by joining the valid languages:

  # quotemeta guards against keys that contain regex metacharacters
  my $re_langs = sprintf '(?:%s)', join '|', map quotemeta, keys %validLanguages;

Note that wrapping the alternatives in non-capturing parens allows me to treat $re_langs as if it were an atom in the examples below.
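
To illustrate why the grouping matters, here is a small sketch. The contents of %validLanguages are made up for the example (the question doesn't show the real hash), and $grouped and $ungrouped are names introduced only for this comparison:

```perl
use strict;
use warnings;

# Hypothetical hash; only the keys matter here.
my %validLanguages = (de => 1, fr => 1, en => 1);

# Sorted so the alternation comes out in a predictable order.
my $grouped   = sprintf '(?:%s)', join '|', sort keys %validLanguages;
my $ungrouped = join '|', sort keys %validLanguages;

# Grouped: the '+' applies to the whole alternation.
print "grouped matches 'defr'\n"    if 'defr'  =~ /^$grouped+\z/;

# Ungrouped: the pattern reads as /^de|en|fr+\z/, so '^' binds only to
# 'de' and '+\z' only to 'fr', and 'dexyz' slips through.
print "ungrouped matches 'dexyz'\n" if 'dexyz' =~ /^$ungrouped+\z/;
print "grouped rejects 'dexyz'\n"   if 'dexyz' !~ /^$grouped+\z/;
```

With the group in place, quantifiers and anchors written around $re_langs apply to the alternation as a unit.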

Now, if any combination of languages, whitespace and commas is ok:

  $text =~ /^(?:$re_langs|\s|,)*\z/;

That allows the empty string and all the examples above. To fail on the empty string you can replace the '*' in the pattern (zero or more) with '+' (one or more).
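
A quick way to compare the '*' and '+' forms is to run both over the sample strings. This is only a sketch: the hash contents and the sub names loose_ok and nonempty_ok are assumptions made for the example.

```perl
use strict;
use warnings;

my %validLanguages = map { $_ => 1 } qw(de fr en);   # assumed sample data
my $re_langs = sprintf '(?:%s)', join '|', map quotemeta, keys %validLanguages;

sub loose_ok    { $_[0] =~ /^(?:$re_langs|\s|,)*\z/ ? 1 : 0 }   # '*': empty string allowed
sub nonempty_ok { $_[0] =~ /^(?:$re_langs|\s|,)+\z/ ? 1 : 0 }   # '+': empty string rejected

for my $text ('defr', 'de fr', ',,', 'de,,fr', '', 'de,xx') {
    printf "%-8s  loose=%d  nonempty=%d\n",
        "'$text'", loose_ok($text), nonempty_ok($text);
}
```

Note that the '+' form only rules out the empty string; a string made up purely of separators, such as ",,", still passes.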

To allow any combination, but require whitespace or commas separating languages (so that "defr" is not allowed) we require each language to be followed either by a separator or end of string:

  $text =~ /^(?:$re_langs(?=\s|,|\z)|\s|,)*\z/;

That pattern can also be made simpler and faster if the language strings always start and end with a word character:

  $text =~ /^(?:$re_langs\b|\s|,)*\z/;
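
As a sketch of how the lookahead and \b variants compare, assuming a sample hash whose keys are all plain word characters (the sub names are invented for the example):

```perl
use strict;
use warnings;

my %validLanguages = map { $_ => 1 } qw(de fr en);   # assumed sample data
my $re_langs = sprintf '(?:%s)', join '|', map quotemeta, keys %validLanguages;

# Each language must be followed by a separator or end of string.
sub sep_lookahead { $_[0] =~ /^(?:$re_langs(?=\s|,|\z)|\s|,)*\z/ ? 1 : 0 }

# Same idea, relying on a word boundary after the language.
sub sep_boundary  { $_[0] =~ /^(?:$re_langs\b|\s|,)*\z/ ? 1 : 0 }

for my $text ('defr', 'de fr', 'de,fr', 'de,,fr') {
    printf "%-8s  lookahead=%d  boundary=%d\n",
        "'$text'", sep_lookahead($text), sep_boundary($text);
}
```

Both forms should reject "defr" while still accepting "de,,fr", since nothing here restricts repeated commas yet.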

If additionally the comma is optional but cannot appear multiple times, so that "de fr" is ok but "de,,fr" is not, one way would be to extend the pattern to say precisely that:

  $text =~ m{
    ^
    (?: $re_langs \b
      | \s
      | , (?! \s*, ) # comma not followed by another comma
                     # (not even with intervening whitespace)
    )*
    \z
  }x;

However, it is probably more efficient to spell out what may follow each language:

  $text =~ m{
    ^
    \s* (?: ,\s* )?  # allow whitespace and one comma before the first language
    (?: $re_langs
      (?: \s+
        | \s* , \s*
        | \z
      )
    )* \z
  }x;
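
To check that the two single-comma patterns behave alike on a few inputs, something like the following could be used. It is a sketch under assumptions: the hash contents and the sub names lookahead_ok and follower_ok are invented here, and agreement on these samples does not prove the patterns are equivalent in general.

```perl
use strict;
use warnings;

my %validLanguages = map { $_ => 1 } qw(de fr en);   # assumed sample data
my $re_langs = sprintf '(?:%s)', join '|', map quotemeta, keys %validLanguages;

# Variant 1: forbid a comma that is followed by another comma.
sub lookahead_ok {
    return $_[0] =~ m{
        ^
        (?: $re_langs \b
          | \s
          | , (?! \s*, )
        )*
        \z
    }x ? 1 : 0;
}

# Variant 2: spell out what may follow each language.
sub follower_ok {
    return $_[0] =~ m{
        ^
        \s* (?: ,\s* )?
        (?: $re_langs
          (?: \s+ | \s* , \s* | \z )
        )* \z
    }x ? 1 : 0;
}

for my $text ('de fr', 'de , fr', 'de,,fr', 'de, ,fr', 'defr') {
    printf "%-10s lookahead=%d follower=%d\n",
        "'$text'", lookahead_ok($text), follower_ok($text);
}
```

On these samples both variants should accept "de fr" and "de , fr" and reject the doubled-comma and run-together cases.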

Finally, if each language must be followed by a comma but the final comma is optional, and all whitespace is optional:

  $text =~ /^\s*(?:$re_langs\s*(?:,\s*|\z))*\z/;
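
For the comma-required form, a similar sketch (again with an assumed sample hash and a made-up sub name, comma_list_ok):

```perl
use strict;
use warnings;

my %validLanguages = map { $_ => 1 } qw(de fr en);   # assumed sample data
my $re_langs = sprintf '(?:%s)', join '|', map quotemeta, keys %validLanguages;

# Every language needs a trailing comma, except that the last one may
# end the string instead; whitespace around commas is optional.
sub comma_list_ok { $_[0] =~ /^\s*(?:$re_langs\s*(?:,\s*|\z))*\z/ ? 1 : 0 }

for my $text ('de,fr', 'de , fr,', 'de', 'de fr', ',de', 'de,,fr') {
    printf "%-10s %s\n", "'$text'", comma_list_ok($text) ? 'ok' : 'rejected';
}
```

Here "de fr" is rejected because whitespace alone no longer counts as a separator, and ",de" is rejected because a comma may only follow a language.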

I hope that gives you some useful options to consider, but please keep in mind that all the examples above are untested.

Hugo

In reply to Re: Regex checking text is made up of the keys of a hash. by hv
in thread Regex checking text is made up of the keys of a hash. by heezy
