Re: Is there a way to make these two regex lines cleaner?

hv wrote: "... most characters do not need escaping in a character class, probably only '-' and ']'.".

See "perlrecharclass: Special Characters Inside a Bracketed Character Class" for a discussion of this.
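As a quick illustration of hv's point, here is a minimal sketch (the sample strings are arbitrary). Characters that are metacharacters outside brackets, such as `.`, `?`, `*` and the parentheses, match only themselves inside a bracketed character class:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Inside a bracketed character class these are all literal characters:
# '.', '?', '!', '*', '(' and ')' each match only themselves.
my $class_re = qr/^[.?!*()]+$/;

for my $candidate ('.?!', '(*)', 'abc') {
    my $verdict = $candidate =~ $class_re ? 'matches' : 'no match';
    printf "%-5s : %s\n", $candidate, $verdict;
}
```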

You have duplicated the double-quote in the class ('\"' and, later, '"'), so you can drop one of them. Also note that '-' is special because it indicates a range; however, when it's the last character in the class, there is no range, so it has no special meaning and needs no escape.
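A small sketch of the three cases for '-' (the test characters are arbitrary). An unescaped '-' between two characters forms a range, a trailing '-' is literal, and an escaped '\-' is literal wherever it appears, which also means it does not form a range:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @classes = (
    [ '[a-z]',  qr/^[a-z]$/  ],  # '-' between two chars: a range
    [ '[a-z-]', qr/^[a-z-]$/ ],  # trailing '-': a literal hyphen
    [ '[a\-z]', qr/^[a\-z]$/ ],  # escaped '-': literal, so no range
);

for my $pair (@classes) {
    my ($label, $re) = @$pair;
    my @hits = grep { $_ =~ $re } qw(a m z -);
    printf "%-7s matches: %s\n", $label, join(' ', @hits);
}
```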

haukex suggested that transliteration "should be a bit faster". In my experience, it is a lot faster: see "Search and replace or tr" in "Perl Performance and Optimization Techniques". If the line only runs once, the improvement is unlikely to be noticeable; however, if it occurs in a loop or in a frequently called routine, it could make a big difference. Run your own Benchmark to find out.
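A minimal Benchmark sketch, using the core Benchmark module's `cmpthese`. The input string, repeat count and sub names are arbitrary choices for illustration. Timings vary by machine and perl version, so no figures are shown here:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample input, repeated so each call does a measurable amount of work.
my $input = q{*a%Z(5)-["'] <.?!>} x 100;

# Both subs return a filtered copy so we can check that they agree.
my %candidates = (
    subst => sub { (my $s = $input) =~ s/[^a-zA-Z0-9"' .?!-]//g; return $s },
    trans => sub { (my $s = $input) =~ y/a-zA-Z0-9"' .?!-//cd;   return $s },
);

# Sanity check: both approaches must produce identical output.
die "subst and trans disagree\n"
    unless $candidates{subst}->() eq $candidates{trans}->();

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%candidates);
```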
[In case you didn't know, y/// and tr/// are synonymous.]
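A tiny sketch showing the two spellings side by side on the same sample string:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $str = q{*a%Z(5)-["'] <.?!>};

# y/// and tr/// are the same operator under two names.
(my $with_tr = $str) =~ tr/a-zA-Z0-9"' .?!-//cd;
(my $with_y  = $str) =~ y/a-zA-Z0-9"' .?!-//cd;

print $with_tr eq $with_y ? "identical: $with_tr\n" : "different\n";
```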

Here's a script that demonstrates the various points I've made:

#!/usr/bin/env perl

use strict;
use warnings;

my $str = q{*a%Z(5)-["'] <.?!>};
my $fmt = "%12s : %s\n";

my $op_sub = $str;
$op_sub =~ s/[^a-zA-Z0-9\-\"\'\ \.\?\!"]//g;
printf $fmt, 'OP', $op_sub;

my $no_dup_quote = $str;
$no_dup_quote =~ s/[^a-zA-Z0-9\-\"\'\ \.\?\!]//g;
printf $fmt, 'NO_DUP_QUOTE', $no_dup_quote;

my $less_esc = $str;
$less_esc =~ s/[^a-zA-Z0-9\-"' .?!]//g;
printf $fmt, 'LESS_ESC', $less_esc;

my $no_esc = $str;
$no_esc =~ s/[^a-zA-Z0-9"' .?!-]//g;
printf $fmt, 'NO_ESC', $no_esc;

my $trans = $str;
$trans =~ y/a-zA-Z0-9"' .?!-//cd;
printf $fmt, 'TRANS', $trans;

Output:

          OP : aZ5-"' .?!
NO_DUP_QUOTE : aZ5-"' .?!
    LESS_ESC : aZ5-"' .?!
      NO_ESC : aZ5-"' .?!
       TRANS : aZ5-"' .?!

So, if you choose substitution: s/[^a-zA-Z0-9"' .?!-]//g
if transliteration (would be my choice): y/a-zA-Z0-9"' .?!-//cd
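If you want extra assurance that the two final forms are interchangeable, one way to check is to run both over every printable ASCII character and compare the results. This is a sketch. Limiting the check to printable ASCII is an assumption made to keep the output readable.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Every printable ASCII character, from space (0x20) to '~' (0x7E).
my $all = join '', map { chr($_) } 0x20 .. 0x7E;

(my $via_subst = $all) =~ s/[^a-zA-Z0-9"' .?!-]//g;
(my $via_trans = $all) =~ y/a-zA-Z0-9"' .?!-//cd;

# Both forms should keep exactly the same characters, in the same order.
print "subst: $via_subst\n";
print "trans: $via_trans\n";
print $via_subst eq $via_trans ? "agree\n" : "DISAGREE\n";
```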

— Ken


Re^2: Is there a way to make these two regex lines cleaner? by hv (Prior) on Apr 17, 2022 at 17:01 UTC
kcott wrote: "So, if you choose substitution: `s/[^a-zA-Z0-9"' .?!-]//g` if transliteration (would be my choice): `y/a-zA-Z0-9"' .?!-//cd`".

I'd still be inclined to escape the '-' in both cases: someone is all too likely to come along in a couple of years needing to add one more character to the allowed list, and the natural tendency would be to add it to the end.
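To make this concern concrete, here is a minimal sketch. Appending '+' is a hypothetical future edit, not something from the thread. With a bare trailing '-', the new '+' turns "!-+" into a range (0x21 to 0x2B), which silently allows extra characters. With '\-', it does not:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical maintenance edit: '+' appended to the allowed list.
# Bare '-': "!-+" becomes a range, also admitting '#', '$', '%', '&',
# '(', ')' and '*'. Escaped '\-': '!', '-' and '+' stay literal.
my $bare    = qr/^[a-zA-Z0-9"' .?!-+]$/;
my $escaped = qr/^[a-zA-Z0-9"' .?!\-+]$/;

for my $char ('+', '%', '#') {
    printf "%s  bare: %-3s  escaped: %s\n",
        $char,
        ($char =~ $bare    ? 'yes' : 'no'),
        ($char =~ $escaped ? 'yes' : 'no');
}
```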
Re^3: Is there a way to make these two regex lines cleaner? by AnomalousMonk (Archbishop) on Apr 17, 2022 at 22:55 UTC
A hyphen placed at the beginning of a character class or `tr///` search/replace list is also interpreted literally:

Win8 Strawberry 5.8.9.5 (32) Sun 04/17/2022 18:43:19
C:\@Work\Perl\monks
>perl
use strict;
use warnings;
my $s = '123-abc-456';
$s =~ tr/-a-z//cd;
print "'$s' \n";
$s = '123-xyz-456';
$s =~ s/[^-a-z]//g;
print "'$s' \n";
^Z
'-abc-'
'-xyz-'

But one can argue that one is as likely to place new stuff at the start as at the end, so escaping remains wise. :)

Give a man a fish: <%-{-{-{-<
Re^3: Is there a way to make these two regex lines cleaner? by kcott (Archbishop) on Apr 18, 2022 at 04:40 UTC
I've been putting '`-`' at the end for a very long time (probably decades) and have never encountered the scenario you describe; however, I'm not averse to a bit of defensive programming. :-)

Update: The remainder of what I originally wrote is just wrong: I'll put it down to an idiotic brain fart. I've stricken it and, because it was quite long, removed it to a spoiler.

— Ken