Bod has asked for the wisdom of the Perl Monks concerning the following question:

I am enforcing some rules on CRM tags and the regexps are pushing my limits. So, a follow-up to a little question over at Re^3: regex in REPLACEMENT in s///

Part A

I have never got chomp to work so I avoid it...

my $tag = 'test text ';
chomp $tag;
print $tag . '#';
# prints: test text #

I understood it to be equivalent to s/ +$//; when used as above with no newlines.

Part B

The tags should be lowercase and exclude most punctuation: extraneous punctuation should be removed and uppercase characters converted to lowercase. Here is what I have tried:

$tag = lc $tag;
$tag =~ s/[^az09 _\-\+\.]//g;

I expected the regexp to substitute anything that is not ^ in the character class [] with an empty string. But it seems to strip out anything that is not an 'a' character or a space.

How should I go about properly working out how to construct a regexp to do what I want?

Replies are listed 'Best First'.
Re: chomp and regexp
by haukex (Archbishop) on Sep 14, 2023 at 14:56 UTC
    I understood [chomp] to be equivalent to s/ +$//; when used as above with no newlines.

    No, as per its docs:

    ... removes any trailing string that corresponds to the current value of $/...

    So normally that's just "\n".

    Update: The idiomatic way to trim other whitespace from strings in Perl is in fact with regexen, e.g. s/^\s+|\s+$//g.
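
As a minimal sketch of the difference (the sample strings are illustrative, and $/ is assumed to still hold its default value of "\n"), the following contrasts chomp with a whitespace-trimming substitution:

```perl
use strict;
use warnings;

# chomp removes only a trailing input record separator ($/, "\n" by default).
my $line = "test text \n";
chomp $line;                 # drops the "\n", keeps the trailing space
print "[$line]\n";           # prints: [test text ]

# With no trailing newline there is nothing for chomp to remove.
my $tag = 'test text ';
chomp $tag;
print "[$tag]\n";            # prints: [test text ]

# Trimming leading and trailing whitespace takes a substitution instead.
(my $trimmed = "  test text \n") =~ s/^\s+|\s+$//g;
print "[$trimmed]\n";        # prints: [test text]
```

So the behaviour in the question is chomp working as documented: the string ended in a space, not a newline.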

    s/[^az09 _\-\+\.]//g

    You appear to be missing the dash in the two ranges: s/[^a-z0-9 _\-\+\.]//g
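
To see why the dashes matter, here is a small sketch that runs both character classes over the same string (the sample tag is made up for illustration). Without the dashes, "az09" is read as the four literal characters a, z, 0 and 9, not as two ranges:

```perl
use strict;
use warnings;

my $input = lc 'Hot-Lead 2023 (VIP)!';    # hypothetical tag

# No dashes: only the literal characters a, z, 0, 9 (plus space _ - + .) survive.
(my $literal = $input) =~ s/[^az09 _\-\+\.]//g;

# With dashes: a-z and 0-9 are ranges, so all lowercase letters and digits survive.
(my $ranged = $input) =~ s/[^a-z0-9 _\-\+\.]//g;

print "[$literal]\n";    # prints: [-a 0 ]
print "[$ranged]\n";     # prints: [hot-lead 2023 vip]
```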

    How should I go about properly working out how to construct a regexp to do what I want?

    Though it doesn't support all the advanced features of Perl, you could use https://regex101.com. When developing regexen, it's always best to have lots of test cases, and not only positive (what you want to match) but also negative (what you don't want to match). The site's "unit test" feature is very useful for that. (Of course there's also my WebPerl Regex Tester.)
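
The same positive and negative test cases can also be kept next to the code with Test::More. This is only a sketch: clean_tag is a hypothetical helper wrapping the corrected substitution, and the sample tags are invented for illustration.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper: lowercase the tag, then drop disallowed characters.
sub clean_tag {
    my ($tag) = @_;
    $tag = lc $tag;
    $tag =~ s/[^a-z0-9 _\-\+\.]//g;
    return $tag;
}

# Each case is [ input, expected output ].
my @cases = (
    # Positive cases: everything here is allowed and must survive.
    [ 'new_lead',  'new_lead'  ],
    [ 'v1.2+beta', 'v1.2+beta' ],
    [ 'Hot-Lead',  'hot-lead'  ],
    # Negative cases: these characters must be stripped.
    [ 'a&b',       'ab'        ],
    [ "tab\there", 'tabhere'   ],
    [ 'what?!',    'what'      ],
);

for my $case (@cases) {
    my ($in, $want) = @$case;
    is clean_tag($in), $want, "clean_tag('$in')";
}

done_testing;
```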

      The idiomatic way to trim other whitespace from strings in Perl is in fact with regexen

      You can now use

      use builtin qw( trim );

      However, it's currently experimental.

      You appear to be missing the dash in the two ranges: s/[^a-z0-9 _\-\+\.]//g

      Oh...it's good to know that I wasn't too far out!
      Thank you.

Re: chomp and regexp
by hippo (Archbishop) on Sep 14, 2023 at 15:32 UTC

    I won't reiterate haukex's excellent points. Instead, it's worth mentioning that if you are just stripping out all characters which match (or don't match) a given set, then you do not need the heavy-duty regex engine at all.

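    use strict;
    use warnings;
    use Test::More tests => 1;

    my $tag  = "A quick-brown fox or 3? %&)( \\\n\n";
    # Two trailing spaces: the ones after "?" and after "(" both survive.
    my $want = 'a quick-brown fox or 3  ';

    $tag = lc $tag;
    # /c complements the set, /d deletes everything that matches it.
    $tag =~ y/a-z0-9 _+.-//cd;
    is $tag, $want;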

    Add more tests of course to make sure you cover all bases. See y/// in perldoc for more.


    🦛