in reply to Is there a way to make these two regex lines cleaner?

G'day bartender1382,

hv wrote: "... most characters do not need escaping in a character class, probably only '-' and ']'.".

See "perlrecharclass: Special Characters Inside a Bracketed Character Class" for a discussion of this.

You have a duplicated double-quote in the class ('\"' and later '"'), so you can drop one of them. Also note that '-' is special because it indicates a range; however, when it's the last character in the class there is nothing for it to form a range with, so it has no special meaning and needs no escape.

haukex suggested transliteration "should be a bit faster". In my experience, it is a lot faster. See "Search and replace or tr" in "Perl Performance and Optimization Techniques". If your line is just by itself, the improvement is unlikely to be noticeable; however, if it occurs in a loop, or a frequently called routine, it could make a big difference: run your own Benchmark to determine this.
[In case you didn't know, y/// and tr/// are synonymous.]
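As a starting point for such a benchmark, here's a minimal sketch using the core Benchmark module. The test data (the sample string from this thread, repeated 100 times) and the sub names by_subst and by_trans are assumptions made for illustration; substitute data that resembles your real input, because the relative timings depend on it.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark qw{cmpthese};

# Hypothetical test data: the thread's sample string, repeated.
my $str = q{*a%Z(5)-["'] <.?!>} x 100;

# Substitution: delete every character not in the allowed class.
sub by_subst {
    my ($s) = @_;
    $s =~ s/[^a-zA-Z0-9"' .?!-]//g;
    return $s;
}

# Transliteration: /c complements the list, /d deletes the matches.
sub by_trans {
    my ($s) = @_;
    $s =~ y/a-zA-Z0-9"' .?!-//cd;
    return $s;
}

# Sanity check: both approaches must produce the same result.
die "Results differ\n" unless by_subst($str) eq by_trans($str);

# A negative count runs each sub for at least that many CPU seconds.
cmpthese(-1, {
    subst => sub { by_subst($str) },
    trans => sub { by_trans($str) },
});
```

No timings are shown here because they vary with the Perl version, the hardware and the input.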

Here's a script that demonstrates the points I made:

    #!/usr/bin/env perl

    use strict;
    use warnings;

    my $str = q{*a%Z(5)-["'] <.?!>};
    my $fmt = "%12s : %s\n";

    my $op_sub = $str;
    $op_sub =~ s/[^a-zA-Z0-9\-\"\'\ \.\?\!"]//g;
    printf $fmt, 'OP', $op_sub;

    my $no_dup_quote = $str;
    $no_dup_quote =~ s/[^a-zA-Z0-9\-\"\'\ \.\?\!]//g;
    printf $fmt, 'NO_DUP_QUOTE', $no_dup_quote;

    my $less_esc = $str;
    $less_esc =~ s/[^a-zA-Z0-9\-"' .?!]//g;
    printf $fmt, 'LESS_ESC', $less_esc;

    my $no_esc = $str;
    $no_esc =~ s/[^a-zA-Z0-9"' .?!-]//g;
    printf $fmt, 'NO_ESC', $no_esc;

    my $trans = $str;
    $trans =~ y/a-zA-Z0-9"' .?!-//cd;
    printf $fmt, 'TRANS', $trans;

Output:

              OP : aZ5-"' .?!
    NO_DUP_QUOTE : aZ5-"' .?!
        LESS_ESC : aZ5-"' .?!
          NO_ESC : aZ5-"' .?!
           TRANS : aZ5-"' .?!

So, if you choose substitution: s/[^a-zA-Z0-9"' .?!-]//g
if transliteration (would be my choice): y/a-zA-Z0-9"' .?!-//cd

— Ken

Re^2: Is there a way to make these two regex lines cleaner?
by hv (Prior) on Apr 17, 2022 at 17:01 UTC

    So, if you choose substitution: s/[^a-zA-Z0-9"' .?!-]//g

    if transliteration (would be my choice): y/a-zA-Z0-9"' .?!-//cd

    I'd still be inclined to escape the '-' in both cases: someone is all too likely to come along in a couple of years needing to add one more character to the allowed list, and the natural tendency would be to add it to the end.
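To make that maintenance scenario concrete, here's a small sketch of what happens if '_' is later appended after an unescaped trailing '-'. The input string and the sub names strip_accidental and strip_escaped are hypothetical, chosen only for illustration. With the hyphen unescaped, '!-_' becomes a range running from '!' to '_' in ASCII, which takes in most punctuation.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical input: the thread's sample string with a '_' added.
my $str = q{*a_%Z(5)-["'] <.?!>};

# '_' appended after the unescaped '-': '!-_' is now a range,
# so most punctuation is silently allowed through.
sub strip_accidental {
    my ($s) = @_;
    $s =~ s/[^a-zA-Z0-9"' .?!-_]//g;
    return $s;
}

# With the '-' escaped, appending '_' just adds one literal character.
sub strip_escaped {
    my ($s) = @_;
    $s =~ s/[^a-zA-Z0-9"' .?!\-_]//g;
    return $s;
}

printf "%10s : %s\n", 'ACCIDENTAL', strip_accidental($str);
printf "%10s : %s\n", 'ESCAPED',    strip_escaped($str);
```

Here the accidental version should leave the string unchanged, because every character in it falls inside the range or is otherwise allowed, while the escaped version should reduce it to a_Z5-"' .?!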

      A hyphen placed at the beginning of a character class or tr/// search/replace list is also interpreted literally:

      Win8 Strawberry 5.8.9.5 (32) Sun 04/17/2022 18:43:19
      C:\@Work\Perl\monks
      >perl
      use strict;
      use warnings;

      my $s = '123-abc-456';
      $s =~ tr/-a-z//cd;
      print "'$s' \n";

      $s = '123-xyz-456';
      $s =~ s/[^-a-z]//g;
      print "'$s' \n";
      ^Z
      '-abc-'
      '-xyz-'

      But one can argue that one is as likely to place new stuff at the start as at the end, so escaping remains wise. :)


      Give a man a fish:  <%-{-{-{-<

      I've been putting '-' at the end for a very long time (probably decades) and have never encountered the scenario you describe; however, I'm not averse to a bit of defensive programming. :-)

      Update: The remainder of what I originally wrote is just wrong: I'll put it down to an idiotic brain fart. I've stricken it and, because it was quite long, removed it to a spoiler.

      — Ken