Re^2: Problem with a text-parsing regex

Thank you -- I think you've nailed it.

I'd never thought about using a character-at-a-time approach as you did to handle the first problem. I just assumed it would be much less efficient than trying to use [[:word:]]+, for example. But there's probably no basis for that assumption. (Premature optimization!) That could make it easier in the future for me to tackle these complicated scenarios.

I use v5.34.1, and will take a look at regex_sets.


Replies are listed 'Best First'.

Re^3: Problem with a text-parsing regex
by hv (Prior) on May 07, 2022 at 23:17 UTC

I'd never thought about using a character-at-a-time approach as you did to handle the first problem. I just assumed it would be much less efficient than trying to use [[:word:]]+, for example.

It will be less efficient - but I would always recommend solving the problem first, and worrying about optimization second.

In the general case, a regular expression that has to invoke more regops (regexp operations) will usually be slower than one that invokes fewer; but that extra cost will still be less than the cost of invoking more ops at the Perl level.
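
If you want to check this on your own data, here is a minimal sketch using the core Benchmark module. The sample text and both patterns are made up for illustration and are not from the original problem: one consumes a single word character per iteration behind a lookahead, the other uses one quantified character class that should match the same tokens.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample text; substitute something closer to your real input.
my $text = join ' ', ('foo_bar baz42 qux') x 1000;

# Character-at-a-time: each word character is checked by a lookahead
# (here: "not an underscore") before it is consumed.
sub per_char {
    my @m = $text =~ /((?:(?!_)[[:word:]])+)/g;
    return scalar @m;
}

# Single quantified class: word characters except underscore,
# which needs fewer regops per matched character.
sub char_class {
    my @m = $text =~ /([^\W_]+)/g;
    return scalar @m;
}

# The timing only means something if both patterns find the same tokens.
die "patterns disagree" unless per_char() == char_class();

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, { per_char => \&per_char, char_class => \&char_class });
```

The numbers will vary with the perl version and the input, so treat the table as a hint only. Adding `use re 'debug';` dumps the compiled program, which lets you see the regops each pattern uses.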
