Re: Need to speed up many regex substitutions and somehow make them a here-doc list

See haukex's article Building Regex Alternations Dynamically:

Win8 Strawberry 5.8.9.5 (32)  Sat 10/01/2022 17:18:27
C:\@Work\Perl\monks
>perl

use strict;
use warnings;

use Data::Dump qw(dd);  # for debug

my $text = <<'TEXT';
Regular expressions have the undeserved reputation
of being abstract and difficult to understand.
TEXT
print "before ---$text--- \n";

my @regexlist = split /\n/, <<'REGEX';
a A
i I
e E
REGEX

my %replace = map split, @regexlist;
# dd \%replace;  # for debug

my ($rx_search) =
    map  qr{ $_ }xms,
    join ' | ',
    map  quotemeta,
    reverse sort
    keys %replace
    ;
# dd $rx_search;  # for debug

$text =~ s{ ($rx_search) }{$replace{$1}}xmsg;
print "after +++$text+++ \n";

^Z
before ---Regular expressions have the undeserved reputation
of being abstract and difficult to understand.
---
after +++REgulAr ExprEssIons hAvE thE undEsErvEd rEputAtIon
of bEIng AbstrAct And dIffIcult to undErstAnd.
+++

Update: This approach assumes each text file can be slurped to memory; 2-100 MB should be no problem. It also assumes the number of substitutions is "reasonable"; 150-1000 should be no problem. Care must be exercised in building the $rx_search regex if it is more complex than shown in the example; see haukex's article for tips on this. I have no idea how fast this approach is versus the one you're using now. Good luck :)
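One detail of the construction that is easy to overlook is the `reverse sort` of the keys. The sketch below is only an illustration with hypothetical overlapping keys (`a` and `ab` are not from the original example, and `build_rx` is a helper made up for this demo). It shows that a regex alternation tries alternatives left to right, so a key should come before any of its own prefixes, which is what `reverse sort` arranges.

```perl
use strict;
use warnings;

# Hypothetical overlapping keys: 'a' is a prefix of 'ab'.
my %replace = (a => 'X', ab => 'Y');

# Hypothetical helper: build an alternation from keys in the order given.
sub build_rx {
    my @keys = @_;
    my $alt  = join '|', map quotemeta, @keys;
    return qr{$alt};
}

my $rx_plain   = build_rx(sort keys %replace);          # a|ab
my $rx_longest = build_rx(reverse sort keys %replace);  # ab|a

(my $plain   = 'abc') =~ s{($rx_plain)}{$replace{$1}}g;
(my $longest = 'abc') =~ s{($rx_longest)}{$replace{$1}}g;

print "sort:         $plain\n";    # Xbc  ('a' wins before 'ab' is tried)
print "reverse sort: $longest\n";  # Yc   ('ab' is tried first)
```

With single-character keys, as in the example above, the order makes no difference; it only matters once keys overlap.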

Give a man a fish: <%-{-{-{-<

Re^2: Need to speed up many regex substitutions and somehow make them a here-doc list by LanX (Saint) on Oct 02, 2022 at 10:02 UTC
> This approach assumes each text file can be slurped to memory; 2-100 MB should be no problem

The OP could slice the input into big chunks separated at newline boundaries. If that's not possible, he could alternatively use a sliding window which always continues at the `pos` where the last replacement ended.

On a side note, your `map qr{...} join ...` irritated me a bit, because the processed list has only one element. Not sure if that's the clearest style.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
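A minimal sketch of the chunking idea, under some loud assumptions: no search key contains a newline, the in-memory filehandle stands in for a real large file, and `$chunk_size` is a hypothetical and deliberately tiny value chosen so the demo actually flushes more than once. Lines are accumulated until the chunk is big enough, then the same substitution is applied to the whole chunk.

```perl
use strict;
use warnings;

my %replace   = (a => 'A', i => 'I', e => 'E');
my $alt       = join '|', map quotemeta, reverse sort keys %replace;
my $rx_search = qr{$alt};

# In-memory filehandle standing in for a large input file (assumption).
my $input = "abstract\nand difficult\nto understand\n";
open my $in, '<', \$input or die "open: $!";

my $chunk_size = 16;   # bytes; hypothetical, tiny for demonstration
my ($chunk, $output) = ('', '');

while (defined(my $line = <$in>)) {
    $chunk .= $line;
    # Keep accumulating whole lines until the chunk is large enough,
    # but always flush once the input is exhausted.
    next if length($chunk) < $chunk_size && !eof($in);
    $chunk =~ s{($rx_search)}{$replace{$1}}g;
    $output .= $chunk;   # a real script would print to an output file here
    $chunk = '';
}
close $in;

print $output;
```

Because chunks always end at a newline, no match can be split across two chunks as long as the keys themselves never span lines.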
Re^3: Need to speed up many regex substitutions and somehow make them a here-doc list by AnomalousMonk (Archbishop) on Oct 02, 2022 at 20:02 UTC
> ... your `map qr{...} join ...` irritated me a bit, because the processed list has only one element.

Yeah, that gets to me a bit too, whenever I use it. But that syntax is used in haukex's original article, so I'm willing to consider it an "idiom." :) The important point is that the regex elements be somehow converted into a regex object. It's at this stage that any necessary boundary assertions are added. The only reasonable alternative I can see is something like

    my $rx_search = join ' | ', map quotemeta, reverse sort keys %replace;
    $rx_search = qr{ ... $rx_search ... }xms;

That's slightly more irritating to me and doesn't seem to clarify anything either.

Give a man a fish: <%-{-{-{-<
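As an illustration of adding boundary assertions at the `qr` stage, here is a hedged sketch that wraps the alternation in `\b ... \b` so only whole words are replaced. The word pairs are hypothetical, and `\b` is only a sensible choice when every key begins and ends with a word character; other keys need different assertions.

```perl
use strict;
use warnings;

my %replace = (cat => 'dog', in => 'out');   # hypothetical word-level pairs

my ($rx_search) =
    map  qr{ \b (?: $_ ) \b }xms,   # boundary assertions added here
    join ' | ',
    map  quotemeta,
    reverse sort
    keys %replace
    ;

my $text = 'cat in a catalog, inside';
$text =~ s{ ($rx_search) }{$replace{$1}}xmsg;
print "$text\n";   # dog out a catalog, inside
```

Without the `\b` assertions, the `cat` in "catalog" and the `in` in "inside" would be replaced as well.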
Re^4: Need to speed up many regex substitutions and somehow make them a here-doc list by LanX (Saint) on Oct 02, 2022 at 22:39 UTC
> `$rx_search = qr{ ... $rx_search ... }xms;`

Ok, it's somehow "wasting" a variable, but `my $rx_search = qr{$joined_search}xms;` wouldn't really irritate me.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery