Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Looking at perlre, why doesn't [\.|\s] below match 'number followed by period OR space character'?

my( $item_num, $event, $sep, $day )=
    $entry =~ m{^ (\d+) [\.|\s] \s+ (.*?) \s+(-|\N{EN DASH}|\N{EM DASH})\s+ (.*?) $}x;

example entries

123 Festival - Sunday      # doesn't work
456. Pool Party - Tuesday  # works

Replies are listed 'Best First'.
Re: Regex, match one or other character class
by choroba (Cardinal) on Apr 25, 2023 at 16:17 UTC
    Because there's only one space after the number, and it is consumed by the character class, so there are no spaces left to be matched by the following \s+.

    You can just say "optional dot" instead, i.e.

    [.]?

    Working example:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use experimental qw( signatures );

    sub match($entry) {
        $entry =~ m{^ (\d+) [.]? \s+ (.*?) \s+(-|\N{EN DASH}|\N{EM DASH})\s+ (.*?) $}x;
    }

    use Test::More tests => 3;

    ok match('123 Festival - Sunday');
    ok match('456. Pool Party - Tuesday');
    ok ! match('789| Gig - Friday');

    Moreover, the | matches itself in a character class, which is not what you want (try the last test against the original regex).
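A minimal sketch of that suggestion, not taken from the original post: it runs the three test entries against the OP's pattern. The helper name matches_original is hypothetical, and the dashes are written as code points (\x{2013}, \x{2014}) instead of \N{...} names, purely to keep the snippet self-contained.

```perl
use strict;
use warnings;

# The OP's pattern, unchanged apart from the dash notation:
# \x{2013} is EN DASH and \x{2014} is EM DASH.
my $original = qr{^ (\d+) [\.|\s] \s+ (.*?) \s+ (-|\x{2013}|\x{2014}) \s+ (.*?) $}x;

# Hypothetical helper: returns 1 if the entry matches the OP's pattern, else 0.
sub matches_original {
    my ($entry) = @_;
    return $entry =~ $original ? 1 : 0;
}

# Inside [...] the pipe is an ordinary character, so '789|' is accepted
# in the same way as '456.' is.
for my $entry ( '123 Festival - Sunday', '456. Pool Party - Tuesday', '789| Gig - Friday' ) {
    printf "%-28s %s\n", $entry, matches_original($entry) ? 'matches' : 'no match';
}
```

The entry with the pipe is accepted because the character class treats | as just one more member of the set, alongside the dot and whitespace.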

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: Regex, match one or other character class
by hippo (Archbishop) on Apr 25, 2023 at 16:20 UTC

    [\.|\s] \s+ matches either a dot (or a pipe - thanks, LanX) or a space, which must then be followed by at least one more space. Only one of your test strings matches this pattern.

    Update: added bracketed clause to specify the pipe - see reply


    🦛

      > [\.|\s]

      > either a dot or a space

      or a literal pipe symbol | ²

      The OP seems to think he can have two alternative character classes inside [...]

      To achieve this he needs (?:[CLASS1]|[CLASS2]) ... (the ?: is for not capturing).°

      Probably he just confused it with (\.|\s) or (?:\.|\s)

      Cheers Rolf
      (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
      Wikisyntax for the Monastery

      edit

      °) Sorry, in hindsight this doesn't make much sense: (?:[CLASS1]|[CLASS2]) is the same as [CLASS1CLASS2], since 'or' operations are associative

      ²) and the [\.] is redundant, it's the same as [.] and won't match everything
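The two footnotes can be checked with a small sketch (an illustration only; the helper name compare_classes and the sample characters are assumptions, not from the thread). It compares an alternation of two classes with the merged class, and [\.] with [.], on a handful of single characters.

```perl
use strict;
use warnings;

# Two spellings of "a dot or a whitespace character".
my $alternation = qr/\A(?:[.]|[\s])\z/;
my $merged      = qr/\A[.\s]\z/;

# Inside a character class the dot is literal, escaped or not.
my $escaped_dot = qr/\A[\.]\z/;
my $plain_dot   = qr/\A[.]\z/;

# Hypothetical helper: returns four 0/1 flags for a one-character string,
# in the order (alternation, merged, escaped dot, plain dot).
sub compare_classes {
    my ($char) = @_;
    return map { $char =~ $_ ? 1 : 0 } $alternation, $merged, $escaped_dot, $plain_dot;
}

for my $case ( [ 'dot', '.' ], [ 'space', ' ' ], [ 'tab', "\t" ], [ 'pipe', '|' ], [ 'letter x', 'x' ] ) {
    my ( $label, $char ) = @$case;
    printf "%-9s alternation=%d merged=%d [\\.]=%d [.]=%d\n", $label, compare_classes($char);
}
```

For every sample the first two flags agree with each other, and so do the last two, which is what the footnotes claim.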

        and the [\.] is redundant, it's the same as [.] and won't match everything

        Yes, I am aware of that. However, the OP's question was to explain why the regex he provided, in all its baroque splendour, did not match both the input lines. I think it would be distracting to go changing his regex and then explain how the modified form works rather than answer the question as posed. There are plenty of things I would change about the given regex to achieve the same end were I to write it, of which this is merely one. The first, though, would be to lose the /x, which is causing far more confusion than it removes in this case IMHO.

        or a literal pipe symbol |

        That's a good observation and I'll amend my post to mention it. Thanks.


        🦛