cheerydog has asked for the wisdom of the Perl Monks concerning the following question:

Hello dear Perl Monks,
My brain is on strike and I cannot figure out a regex with the following properties:

- the string is optional (zero or one occurrences)
- if the string is present, it has to start with '#'
- the string should be captured without the mandatory leading '#'


Example:
'bli bla' -> $1 is empty

'#foo bli bla' -> $1 is 'foo'

Thanks anyway, and highest regards

Cheery

Re: Simple Regex drives me insane
by ikegami (Patriarch) on Mar 30, 2024 at 17:52 UTC

    tl;dr: I recommend forgetting about setting $1 and using

    my $match = /#(\S*)/ ? $1 : "";

    First, some assumptions since the specs aren't clear.

    • You didn't define what "the string" is. I'm going to assume it's "a sequence of zero-or-more non-spaces" (since it can be foo but not foo bli bla). I'll also cover stricter patterns in my answer.
    • Your example shows the "#" at the start of the searched string, but you didn't specify it had to be there. I'm going to assume it doesn't have to be at the start.
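    A minimal runnable sketch of the tl;dr approach under those two assumptions, applied to the question's two examples plus one input of my own where the '#' is not at the start. The helper name extract_tag is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the non-space run after the first '#'
# anywhere in the string, or "" when no '#' is present.
sub extract_tag {
    my ($s) = @_;
    return $s =~ /#(\S*)/ ? $1 : "";
}

for my $s ('bli bla', '#foo bli bla', 'bli #foo bla') {
    printf "%-16s -> '%s'\n", "'$s'", extract_tag($s);
}
```

    Because the match is unanchored, the third input also yields foo; put ^ in front of # if the tag must lead the string.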

    The following is a single match that produces the desired result:

    (Update: There's a much simpler approach using (?|pattern). See tybalt89's answer.)

    If it doesn't have to be a single match, you could also use the following:

    That said, I think setting a variable other than $1 would be better. This allows us to use one of the following:

    • my $match = /#(\S*)/ ? $1 : ""; say $match;
    • my $match = /#\K\S*/ ? $& : ""; say $match;

    And if it's acceptable to produce undef instead of an empty string when the pattern isn't found, you could even use the following:

    • my ( $match ) = /#(\S*)/; say $match;
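    A small script comparing the \K form and the list-assignment form on the question's examples. The sub names via_keep and via_list are made up for illustration; \K needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# \K discards what was matched so far (the '#') from $&, so $& holds just the tag.
sub via_keep {
    my ($s) = @_;
    return $s =~ /#\K\S*/ ? $& : "";
}

# In list context a failed match returns an empty list, leaving $m undef.
sub via_list {
    my ($s) = @_;
    my ($m) = $s =~ /#(\S*)/;
    return $m;
}

for my $s ('bli bla', '#foo bli bla') {
    my $l = via_list($s);
    printf "%-16s keep='%s' list=%s\n", "'$s'", via_keep($s),
        defined $l ? "'$l'" : 'undef';
}
```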

    These are far more readable and maintainable. Among other things, this makes them far less error-prone.

    Also, it would be trivial to adapt these to more restrictive patterns. For example,

    • my $match = /#(\S+)/ ? $1 : ""; say $match;
    • my $match = /#(\w+)/ ? $1 : ""; say $match;
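    To see how the variants differ, here is a sketch running \S*, \S+ and \w+ on two made-up inputs (not from the question): a bare '#' followed later by a real tag, and a tag with trailing punctuation. The sub names loose, nonspace and word are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical subs for the three variants discussed in the reply.
sub loose    { my ($s) = @_; return $s =~ /#(\S*)/ ? $1 : ""; }  # may be empty
sub nonspace { my ($s) = @_; return $s =~ /#(\S+)/ ? $1 : ""; }  # needs 1+ non-space
sub word     { my ($s) = @_; return $s =~ /#(\w+)/ ? $1 : ""; }  # word chars only

# Made-up inputs, not taken from the question.
for my $s ('# x #bar', '#foo, bli bla') {
    printf "%-16s S*='%s' S+='%s' w+='%s'\n",
        "'$s'", loose($s), nonspace($s), word($s);
}
```

    With \S* the bare '#' already counts as a match, so the first input yields an empty string, while \S+ and \w+ skip ahead to bar. On the second input only \w+ drops the comma.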

    Adapting the first version would be horrible.

    • / ^ (?: (?! \#\S ). )* \#? ( \S* ) /x; say $1;   # counterpart of /#(\S+)/
    • / ^ (?: (?! \#\w ). )* \#? ( \w* ) /x; say $1;   # counterpart of /#(\w+)/
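    As a sanity check (my own, not part of the reply), this sketch runs the two single-match forms and their simple counterparts over a few inputs and prints the results side by side. The names single_s, single_w, simple_s and simple_w are hypothetical.

```perl
use strict;
use warnings;

# Single-match forms from the reply: they always match and leave the tag in $1.
sub single_s {
    my ($s) = @_;
    $s =~ / ^ (?: (?! \#\S ). )* \#? ( \S* ) /x or die "no match on '$s'";
    return $1;
}
sub single_w {
    my ($s) = @_;
    $s =~ / ^ (?: (?! \#\w ). )* \#? ( \w* ) /x or die "no match on '$s'";
    return $1;
}

# The simpler counterparts.
sub simple_s { my ($s) = @_; return $s =~ /#(\S+)/ ? $1 : ""; }
sub simple_w { my ($s) = @_; return $s =~ /#(\w+)/ ? $1 : ""; }

for my $s ('bli bla', '#foo bli bla', '# x #bar', '#foo, bli') {
    printf "%-16s \\S: '%s' vs '%s'   \\w: '%s' vs '%s'\n", "'$s'",
        single_s($s), simple_s($s), single_w($s), simple_w($s);
}
```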
Re: Simple Regex drives me insane
by tybalt89 (Monsignor) on Mar 30, 2024 at 21:56 UTC

    Assumptions:

    1. the #xxx do not have to be at the beginning of the string
    2. more than one #xxx can be in the string

    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11158577
    use warnings;

    for ( split /\n/, <<END )
    #foo bli bla
    bli #foo bar baz
    #more than #one is a #string
    more than #one is a #string
    foo bli bla
    # does not match #
    END
      {
      /(?|#(\S+)|()\z)/ and print "found '$1' in '$_'\n";
      }

    Outputs:

    found 'foo' in '#foo bli bla'
    found 'foo' in 'bli #foo bar baz'
    found 'more' in '#more than #one is a #string'
    found 'one' in 'more than #one is a #string'
    found '' in 'foo bli bla'
    found '' in '# does not match #'
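    The script relies on (?|...), branch reset: every alternative inside it numbers its capture groups from the same starting point, so the empty () fallback also lands in $1. The sketch below is my own illustration, not tybalt89's code (sub names are hypothetical); it contrasts the pattern with the same alternation without branch reset, where the fallback group becomes $2. Branch reset needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Branch reset: both alternatives fill $1, so $1 is '' when no tag is found.
sub with_reset {
    my ($s) = @_;
    return $s =~ /(?|#(\S+)|()\z)/ ? $1 : undef;
}

# Plain alternation: the fallback group is numbered 2, so $1 stays undef there.
sub without_reset {
    my ($s) = @_;
    return $s =~ /#(\S+)|()\z/ ? [ $1, $2 ] : undef;
}

for my $s ('bli #foo bar baz', 'foo bli bla') {
    my ($one, $two) = @{ without_reset($s) };
    printf "%-20s reset: \$1='%s'   plain: \$1=%s \$2=%s\n", "'$s'", with_reset($s),
        map { defined($_) ? "'$_'" : 'undef' } ($one, $two);
}
```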
Re: Simple Regex drives me insane
by hippo (Archbishop) on Mar 31, 2024 at 16:58 UTC
Re: Simple Regex drives me insane
by Anonymous Monk on Mar 30, 2024 at 15:06 UTC
    Maybe /(?:#(\w+))?/ assuming \w+ is enough to capture "the string"?
      Thank you very much for your answer. It works, but there is a problem: I would like it to match successfully even if '#foo' is not there (0 or 1 occurrences).

        It already does do that.

        Specifically, it always matches, setting $1 to foo if #\w+ is found at the start of the string, and setting $1 to undef otherwise.
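    A quick way to see this is to print where the match starts. In this sketch (try_optional is a hypothetical name), $-[0] holds the offset of the whole match; because the optional group can match the empty string, the unanchored pattern always succeeds at offset 0, so a '#foo' later in the string is never reached.

```perl
use strict;
use warnings;

# Hypothetical helper: returns $1 (possibly undef) and the offset where
# the overall match of /(?:#(\w+))?/ started.
sub try_optional {
    my ($s) = @_;
    return unless $s =~ /(?:#(\w+))?/;  # cannot fail: the group is optional
    return ($1, $-[0]);
}

for my $s ('#foo bli bla', 'bli bla', 'bli #foo bla') {
    my ($cap, $at) = try_optional($s);
    printf "%-16s matched at offset %d, \$1=%s\n", "'$s'", $at,
        defined $cap ? "'$cap'" : 'undef';
}
```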

Re: Simple Regex drives me insane
by Timka (Acolyte) on Mar 30, 2024 at 17:52 UTC

    How about:

    /
        ^ \# (.*)   # Something starting with a hash.
      |             # Or
        .*          # Anything else.
    /x;

      It doesn't set $1 to foo when matching against #foo bli bla, despite an example stating it should.

      It doesn't set $1 to the empty string when matching against bli bla, despite an example stating it should.

      It doesn't set $1 to foo when matching against bli #foo bla.

      For the last two, it doesn't match and leaves $1 unchanged.


      Update: After the update to the parent, the following holds:

      It doesn't set $1 to foo when matching against #foo bli bla, despite an example stating it should.

      It doesn't set $1 to the empty string when matching against bli bla. However, it does set it to undef, which might be close enough.

      It doesn't set $1 to foo when matching against bli #foo bla. It sets $1 to undef instead.
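    The three results above can be reproduced with the /x pattern from the post, assuming the version shown is the updated one; the wrapper sub timka is hypothetical.

```perl
use strict;
use warnings;

# Timka's /x pattern wrapped in a hypothetical sub that returns $1.
sub timka {
    my ($s) = @_;
    $s =~ /
        ^ \# (.*)   # Something starting with a hash.
      |             # Or
        .*          # Anything else.
    /x or return;
    return $1;
}

for my $s ('#foo bli bla', 'bli bla', 'bli #foo bla') {
    my $r = timka($s);
    printf "%-16s -> \$1=%s\n", "'$s'", defined $r ? "'$r'" : 'undef';
}
```

    The first alternative captures everything after the '#', so the first input gives foo bli bla rather than foo; the other two fall through to .* and leave $1 undef.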

        I updated my last reply.

        But I think it might be better to step back and consider what you are trying to achieve.

        If you can use code, that may be simpler:

        my $in  = "foo";
        my $out = $in =~ /#(.*)/ ? $1 : "";