in reply to Re: Reusing a complex regexp in multiple spots, escaping the regexp
in thread Reusing a complex regexp in multiple spots, escaping the regexp
Disclaimer: the following "code" is untested and, what's more, AI-generated; it is meant as a conceptual demo only.
Personally, I would simplify it even further if I knew the full problem domain, and I would use better variable names and more /x and /i modifiers. (The handling of upper and lower case seems inconsistent.)
On a side note: the individual snippets are now much easier to test against the expected input, and these samples could be added to the documentation.
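As a rough sketch of what such a test could look like, here are two of the building blocks from the code further down checked with Test::More. The sample inputs are made up for illustration; the real ones should come from the actual data.

```perl
use strict;
use warnings;
use Test::More;

# Building blocks copied from the demo code
my $hex2  = qr/[0-9A-Fa-f]{2}/;        # exactly 2 hex digits
my $opt_h = qr/[Hh]?/;                 # optional trailing H/h
my $int   = qr/\bINT\s?$hex2$opt_h/;   # INT <byte>

# Hypothetical samples, not taken from the real input
my @should_match = ( 'INT 21', 'INT21h', 'INT 2FH' );
my @should_fail  = ( 'INT 2', 'PRINT 21', 'INT  21' );

# Anchor with \A and \z so the whole sample has to match
like(   $_, qr/\A$int\z/, "matches '$_'" ) for @should_match;
unlike( $_, qr/\A$int\z/, "rejects '$_'" ) for @should_fail;

done_testing();
```

The same samples can double as examples in the POD of whatever module ends up holding the patterns.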
I don't see how Marpa can add more to this.
    # -------------------------------------------------
    # 1. Basic building blocks
    # -------------------------------------------------
    my $hex2   = qr/[0-9A-Fa-f]{2}/;       # exactly 2 hex digits
    my $hex2p  = qr/[0-9A-Fa-f]{2,}/;      # 2 or more hex digits
    my $hexXp  = qr/[0-9A-Fa-fXx]{1,4}/;   # 1-4 hex/X digits
    my $hexXp8 = qr/[0-9A-Fa-fXx]{1,8}/;   # 1-8 hex/X digits
    my $opt_h  = qr/[Hh]?/;                # optional trailing H/h
    my $quotes = qr/"[^"]*"/;              # quoted literal (made optional where used)

    # -------------------------------------------------
    # 2. Reusable sub-patterns
    # -------------------------------------------------
    my $int      = qr/\bINT\s?$hex2$opt_h/;                    # INT <byte>
    my $mem_addr = qr/\bMEM\s?$hexXp:$hexXp$opt_h/;            # MEM <addr>:<addr>
    my $mem_long = qr/\bMEM\s?$hexXp8$opt_h/;                  # MEM <addr>
    my $port_rng = qr/\bPORT\s?$hexXp$opt_h\-$hexXp$opt_h/;    # PORT <range>
    my $port_one = qr/\bPORT\s?$hexXp$opt_h/;                  # PORT <single>
    my $hash_ref = qr/\#[0-9A-Z][0-9]{4}\b/;                   # #<digit or uppercase letter><4 digits>
    my $ea_xhl   = qr/\b(?:E?A[XHL])=$hex2p$opt_h/;            # (E)A[XHL] assignment
    my $reglist  = qr/
        (?:E?[ABCD][XHL]|E?[SD]I|E?[SB]P|[DESC]S)=$hex2p$opt_h
    /x;                                                        # a single register entry
    my $opt_reglist = qr/ (?:\/$reglist)+ /x;                  # one or more "/<reg>" entries

    # -------------------------------------------------
    # 3. Full pattern with named captures
    #    (no "/" inside the /x comments: it would end the qr)
    # -------------------------------------------------
    my $linking_re = qr/
        # 1. INT with optional register list and optional quoted literal
        (?<int_full> $int $opt_reglist? $quotes? )
      |
        # 2. INT with only optional quoted literal
        #    (never reached as written: int_full already matches a bare INT)
        (?<int_simple> $int $quotes? )
      |
        # 3. Accumulator assignment with repeated register list and optional quoted literal
        (?<ea_xhl_full> $ea_xhl (?:\/$reglist)* $quotes? )
      |
        # 4. Hash reference (e.g. #A1234)
        (?<hash_ref> $hash_ref )
      |
        # 5. MEM range (addr:addr) with optional quoted literal
        (?<mem_range> $mem_addr $quotes? )
      |
        # 6. MEM single address with optional quoted literal
        (?<mem_simple> $mem_long $quotes? )
      |
        # 7. At-sign reference (addr:addr) with optional quoted literal
        (?<at_ref> \@$hexXp:$hexXp$opt_h $quotes? )
      |
        # 8. PORT range with optional quoted literal
        (?<port_range> $port_rng $quotes? )
      |
        # 9. PORT single value with optional quoted literal
        (?<port_simple> $port_one $quotes? )
    /x;

    # -------------------------------------------------
    # 4. Usage example
    # -------------------------------------------------
    if ( $linking =~ $linking_re ) {
        my %cap = %+;    # hash of all named captures that matched
        if ( $cap{int_full} ) {
            # handle INT with register list
        }
        elsif ( $cap{mem_range} ) {
            # handle MEM range
        }
        # ...other branches as needed
    }
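Since only one of the top-level named groups can match at a time, the if/elsif chain could probably be replaced by a dispatch table keyed on the capture name. A minimal sketch, using a cut-down two-alternative stand-in for $linking_re; the handler subs and sample strings are hypothetical and only there to show the mechanics:

```perl
use strict;
use warnings;

# Reduced stand-in for $linking_re (two alternatives only)
my $hex2  = qr/[0-9A-Fa-f]{2}/;
my $opt_h = qr/[Hh]?/;
my $re    = qr/
      (?<int_ref>  \bINT\s?$hex2$opt_h )
    | (?<hash_ref> \#[0-9A-Z][0-9]{4}\b )
/x;

# Hypothetical handlers, one per named capture
my %handler = (
    int_ref  => sub { "interrupt link: $_[0]" },
    hash_ref => sub { "table link: $_[0]" },
);

my @out;
for my $text ( 'see INT 21h for details', 'listed in #P0123' ) {
    next unless $text =~ $re;
    my %cap    = %+;          # copy before anything else can reset %+
    my ($name) = keys %cap;   # only the alternative that matched is present
    push @out, $handler{$name}->( $cap{$name} );
}

print "$_\n" for @out;
# interrupt link: INT 21h
# table link: #P0123
```

This only works as long as the named groups are not nested; with nested names, %+ would hold more than one key per match.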
Cheers Rolf
(addicted to the Perl Programming Language :)
Re^3: Reusing a complex regexp in multiple spots, escaping the regexp
by ecm (Initiate) on Apr 13, 2026 at 12:11 UTC | |
by LanX (Saint) on Apr 13, 2026 at 13:30 UTC |