
So, if you choose substitution: s/[^a-zA-Z0-9"' .?!-]//g

if transliteration (would be my choice): y/a-zA-Z0-9"' .?!-//cd

I'd still be inclined to escape the '-' in both cases: someone is all too likely to come along in a couple of years needing to add one more character to the allowed list, and the natural tendency would be to add it to the end, where the unescaped '-' would silently become a range operator.
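To make the maintenance hazard concrete, here is a minimal sketch. The sample string and the extra '_' character are assumptions chosen for illustration, not anything from the original question. It compares the list as written, the list after a careless append, and the escaped form:

```perl
use strict;
use warnings;

# Hypothetical input, made up for this illustration.
my $input = q{a_b-c#d+e!};

# Allowed list as written above: the trailing '-' is a literal hyphen.
(my $orig = $input) =~ y/a-zA-Z0-9"' .?!-//cd;

# A later maintainer appends '_' after the unescaped '-'.
# "!-_" is now a range from '!' (0x21) to '_' (0x5F),
# so '#' and '+' are silently allowed as well.
(my $broken = $input) =~ y/a-zA-Z0-9"' .?!-_//cd;

# With the '-' escaped, appending '_' adds exactly one character.
(my $safe_tr = $input) =~ y/a-zA-Z0-9"' .?!\-_//cd;

# The same escaped list as a negated character class in s///.
(my $safe_re = $input) =~ s/[^a-zA-Z0-9"' .?!\-_]//g;

print "orig:    $orig\n";     # ab-cde!
print "broken:  $broken\n";   # a_b-c#d+e!
print "safe_tr: $safe_tr\n";  # a_b-cde!
print "safe_re: $safe_re\n";  # a_b-cde!
```

The broken variant raises no error or warning, which is the argument for escaping up front.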

Re^3: Is there a way to make these two regex lines cleaner?
by AnomalousMonk (Archbishop) on Apr 17, 2022 at 22:55 UTC

    A hyphen placed at the beginning of a character class or tr/// search/replace list is also interpreted literally:

    Win8 Strawberry 5.8.9.5 (32) Sun 04/17/2022 18:43:19
    C:\@Work\Perl\monks
    >perl
    use strict;
    use warnings;

    my $s = '123-abc-456';
    $s =~ tr/-a-z//cd;
    print "'$s' \n";

    $s = '123-xyz-456';
    $s =~ s/[^-a-z]//g;
    print "'$s' \n";
    ^Z
    '-abc-'
    '-xyz-'

    But one can argue that one is as likely to place new stuff at the start as at the end, so escaping remains wise. :)


    Give a man a fish:  <%-{-{-{-<

Re^3: Is there a way to make these two regex lines cleaner?
by kcott (Archbishop) on Apr 18, 2022 at 04:40 UTC

    I've been putting '-' at the end for a very long time (probably decades) and have never encountered the scenario you describe; however, I'm not averse to a bit of defensive programming. :-)

    Update: The remainder of what I originally wrote is just wrong: I'll put it down to an idiotic brain fart. I've struck it out and, because it was quite long, moved it into a spoiler; the retracted text follows.

    Escaping the '-' in transliteration does not work; it'll actually remove '\' characters. Here's some additional code, for the script I provided above, to demonstrate this.

        my $trans2 = '\\' . $str . '\\';
        print "$trans2\n";
        $trans2 =~ y/a-zA-Z0-9"' .?!\-//cd;
        printf $fmt, 'TRANS2', $trans2;

    Additional output:

        \*a%Z(5)-["'] <.?!>\
        TRANS2 : aZ5-"' .?!

    See "perlop: Quote and Quote-like Operators" (first table indicates no interpolation) and, in "perlop: Quote-Like Operators", "y/SEARCHLIST/REPLACEMENTLIST/cdsr" (full details).

    Curiously, there seems to be a contradiction between what the documentation says and what I both expected and got. In "y/SEARCHLIST/REPLACEMENTLIST/cdsr" (my emphasis):

    "A hyphen at the beginning or end, or preceded by a backslash is also always considered a literal."

    That's from the 5.34.1 version of https://perldoc.perl.org/perl; I'm using 5.34.0, and clearly (from code and output) '\-' is being considered as two separate literals. I don't know if the documentation is simply wrong or, perhaps, badly worded and I'm misinterpreting it: I'd appreciate the thoughts of others with regard to this.

    — Ken
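
The documentation sentence quoted above can be checked directly, which also shows why the retraction in the Update is right. The following is a small sketch, not part of the original posts, and the sample string is an assumption. It counts matches with tr/// (an empty replacement list without /d leaves the string unchanged) and then applies a /cd deletion:

```perl
use strict;
use warnings;

# Assumed sample: a backslash on each side of 'a-b', i.e. \a-b\
my $s = '\\' . 'a-b' . '\\';

# tr/// returns the number of characters matched.
my $hyphen_count    = ($s =~ tr/\-//);   # "\-" is one literal hyphen
my $backslash_count = ($s =~ tr/\\//);   # "\\" is one literal backslash

# Keep only a-z and the hyphen; the backslashes are deleted because
# they are not in the search list, not because the escape failed.
(my $kept = $s) =~ y/a-z\-//cd;

print "hyphens matched by \\-:     $hyphen_count\n";     # 1
print "backslashes matched by \\\\: $backslash_count\n"; # 2
print "kept: $kept\n";                                   # a-b
```

If '\-' were two separate literals, the first count would be 3 and the backslashes would survive the /cd deletion. Neither happens, so the behaviour agrees with the documentation.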