LesleyB has asked for the wisdom of the Perl Monks concerning the following question:

Greetings O Wise Ones

I'm having trouble coming up with a good regular expression for the following problem.

For any given string, I would like to match if the string contains one of the character sequences "\n", "\f", "\t" or "\r" (a backslash followed by a letter, not the actual control character), e.g. match on $str = "Test\n" but not on $str = "Test".

So my code looks like this ...

print $str."\n<br />";
if ( ($str =~ /\\r/) or ($str =~ /\\n/) or ($str =~ /\\f/) or ($str =~ /\\t/) ) { ##0
    print "found ws using multiple regexps and or with double backslashes\n<br />";
}
if ($str =~ /\\r|\\n|\\t|\\f/) { print "matches on ws using simple '|' list with double backslashes\n<br />";} ## 1
if ($str =~ /[\\r,\\n,\\t,\\f]/) { print "matches in character class\n<br />";} ##2
if ($str =~ /\r|\n|\t|\f/) { print "matches on single backslash\n<br />";} ##3
if ( ($str =~ /\\n/) ) { print "matches \\\\n (with double backslash)\n<br />";} ##4
if ($str =~/\s/) { print "matches on \\s \n<br />";}##5

and I get the following results

rrrrrrrrrr
matches in character class
rrrrrrrr\nrr
found ws using multiple regexps and or with double backslashes
matches on ws using simple '|' list with double backslashes
matches in character class
matches \\n (with double backslash)
rrrrrrrr\rr
found ws using multiple regexps and or with double backslashes
matches on ws using simple '|' list with double backslashes
matches in character class

Regexps 0 and 1 appear to work, but is there a better or more succinct way to do it?

My 'character class' regexp 2 doesn't work. It matches on a plain 'r'. I've obviously got something wrong here but I have no idea what. If anyone can enlighten me I'd be glad to learn.

I believe regexp 3, with single backslashes, doesn't work because it is looking for an actual whitespace character such as 0x0a or 0x0d and not a character string representing a whitespace character.

Regexp 4 only tests for \n, so it doesn't complete the task: it fires when I use \n but not when I use \r.

I think regexp 5 fails for the same reason regexp 3 fails.

Is there a character class to deal with this regexp problem and/or a more succinct way of expressing the regexp?

Replies are listed 'Best First'.
Re: How to match a string containing whitespace characters
by ikegami (Patriarch) on Sep 15, 2008 at 22:59 UTC

    I believe regexp 3, with single backslashes doesn't work because it is looking for an actual whitespace character

    Correct. See "Escape sequences" in perlre.

    I think regexp 5 fails for the same reasons regexp No 3 fails

    Correct.
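
    To make the distinction concrete, here is a minimal sketch (the variable and sub names are made up for illustration). It assumes the strings in question hold a backslash followed by a letter, which is what a single-quoted Perl literal produces, whereas a double-quoted literal produces the real control character.

    ```perl
    use strict;
    use warnings;

    my $literal = 'Test\n';   # single quotes: backslash + "n", 6 characters
    my $real    = "Test\n";   # double quotes: one real newline, 5 characters

    # Each helper returns 1 or 0 so the results are easy to print.
    sub has_backslash_n  { my ($s) = @_; return $s =~ /\\n/ ? 1 : 0 }  # backslash then "n"
    sub has_real_newline { my ($s) = @_; return $s =~ /\n/  ? 1 : 0 }  # actual 0x0a
    sub has_whitespace   { my ($s) = @_; return $s =~ /\s/  ? 1 : 0 }  # any real whitespace

    for my $pair ( [ literal => $literal ], [ real => $real ] ) {
        my ($name, $s) = @$pair;
        printf "%-8s length=%d backslash_n=%d real_newline=%d whitespace=%d\n",
            $name, length($s),
            has_backslash_n($s), has_real_newline($s), has_whitespace($s);
    }
    ```

    Only the double-backslash pattern sees the two-character sequence; /\n/ and /\s/ see only the real newline.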

    My 'character class' regexp 2, doesn't work. It matches on a plain 'r'.

    [\\r,\\n,\\t,\\f] will match a single character from the following six: backslash (\), comma (,), "r", "n", "t" and "f". See Using character classes in perlretut.
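
    A quick way to see this is to try the class against strings that contain only one of those six characters. This is just an illustrative sketch; in_class and the sample strings are invented for the demonstration.

    ```perl
    use strict;
    use warnings;

    # The class from attempt #2: it holds backslash, comma, r, n, t and f.
    my $class = qr/[\\r,\\n,\\t,\\f]/;

    # Returns 1 if any single character of the string is in the class.
    sub in_class { my ($s) = @_; return $s =~ $class ? 1 : 0 }

    # Single-quoted samples, so 'x\y' really contains a backslash.
    for my $s ( 'rrrrrrrrrr', 'a,b', 'x\y', 'zzz' ) {
        printf "%-12s %s\n", $s, in_class($s) ? 'match' : 'no match';
    }
    ```

    The first three samples match (on "r", the comma and the backslash respectively) even though none of them contains a backslash-letter pair; only 'zzz' fails.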

    Is there [...] a more succinct way of expressing the regexp?

    From /\\r|\\n|\\t|\\f/, we can factor out the backslash:

    /\\(?:r|n|t|f)/

    And since we are choosing between single characters, we can replace the alternation with a character class:

    /\\[rntf]/
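
    As a sanity check, the three forms can be run side by side. This is a sketch only; verdicts and the sample strings are hypothetical, with the first three samples taken from the question's output.

    ```perl
    use strict;
    use warnings;

    # The original alternation, the factored form, and the character class.
    my @forms = (
        qr/\\r|\\n|\\t|\\f/,
        qr/\\(?:r|n|t|f)/,
        qr/\\[rntf]/,
    );

    # Returns one 1/0 verdict per form, in the order above.
    sub verdicts {
        my ($s) = @_;
        return map { $s =~ $_ ? 1 : 0 } @forms;
    }

    # Single-quoted samples hold a backslash plus a letter;
    # the last one is double-quoted and holds a real newline.
    my @samples = (
        [ 'plain r'       => 'rrrrrrrrrr'    ],
        [ 'backslash n'   => 'rrrrrrrr\nrr'  ],
        [ 'backslash r'   => 'rrrrrrrr\rr'   ],
        [ 'backslash t'   => 'a\tb'          ],
        [ 'real newline'  => "real\nnewline" ],
    );

    for my $sample (@samples) {
        my ($label, $s) = @$sample;
        printf "%-14s %s\n", $label, join ' ', verdicts($s);
    }
    ```

    All three patterns agree on every sample: they match the backslash-letter strings and reject both the plain "r" string and the real newline.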
Re: How to match a string containing whitespace characters
by johngg (Canon) on Sep 15, 2008 at 23:00 UTC
    If you are looking for real newlines, tabs etc. rather than literal \ns or \ts, you can use a character class like this. (Note the use of single- and double-quoting constructs in the array @strings.)

    use strict;
    use warnings;

    my @strings = (
        q {Only literal \t's, \n's etc. here},
        qq{There's a real\nnewline in this},
        qq{A real\ttab\tdelimited\tstring},
    );

    my $rxWS = qr{[\r\n\t\f]};

    foreach my $string ( @strings )
    {
        print qq{$string\n};
        print $string =~ $rxWS
            ? qq{Match found\n\n}
            : qq{No match\n\n}
    }

    The output.

    Only literal \t's, \n's etc. here
    No match

    There's a real
    newline in this
    Match found

    A real tab delimited string
    Match found

    I hope this is of use.

    Cheers,

    JohnGG

Re: How to match a string containing whitespace characters
by moritz (Cardinal) on Sep 15, 2008 at 22:14 UTC
    /\\[rntf]/ perhaps?

    The problem with your attempt #2 is that you try to use commas in the character class, and that you confuse sequences (first a \\, then something else) with a character class (any of ...).

      Or perhaps /\\[rtfn]/ for the humorous similarity :P

      I'm so adjective, I verb nouns!

      chomp; # nom nom nom

Re: How to match a string containing whitespace characters
by LesleyB (Friar) on Sep 16, 2008 at 09:40 UTC

    Thank you all for taking the time to reply and demonstrate solutions and, in particular, thank you ikegami for the explanation of the character class problem.