Re: Re: Re: Skipping special tags in regexes

The only problem is that each word in a string may not be unique, so you couldn't just plop things in a hash. Also, the regexes might introduce new words someplace in the string that already existed somewhere else in the string. That is why I tried diffing, but it was just too slow.

Replies are listed 'Best First'.
Re: Re: Re: Re: Skipping special tags in regexes
by CountZero (Bishop) on Dec 05, 2003 at 06:31 UTC

That's true, but I was not necessarily thinking of using a hash. An array-based data structure would probably be OK, and it has the added benefit of preserving the sequence of the words. This would make it a lot easier to construct the "untagged" sentence for regex purposes. Thereafter, one could split the regexed sentence on whitespace and compare this list with the array made by splitting the "original" sentence.

All you have to do then is walk the original list, adding tags to the regexed list where necessary and skipping the newly inserted words in the regexed list. You might still have a problem in cases where you introduce duplicate words next to one another.

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
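The list walk described above could be sketched roughly as follows. This is only a minimal illustration in Python (the same walk maps directly onto Perl arrays), and it makes several assumptions that are not from the thread: the tagged sentence is held as a list of `(word, tag)` pairs, `retag` and `transform` are hypothetical names, and the regexes only insert words rather than delete or rewrite existing ones.

```python
import re

def retag(tagged_words, transform):
    """Re-attach tags after a regex pass (hypothetical helper).

    tagged_words: list of (word, tag) pairs; tag is None for untagged words.
    transform:    function taking the untagged sentence, returning the new one.
    """
    # Build the untagged sentence that the regexes operate on.
    untagged = " ".join(word for word, _ in tagged_words)
    new_words = transform(untagged).split()

    result = []
    i = 0  # position in the original list
    for word in new_words:
        if i < len(tagged_words) and word == tagged_words[i][0]:
            # Matches the next original word: carry its tag over.
            result.append((word, tagged_words[i][1]))
            i += 1
        else:
            # Not the expected original word: treat it as newly inserted.
            result.append((word, None))
    return result

original = [("the", None), ("cat", "B"), ("sat", None)]
result = retag(original, lambda s: re.sub(r"\bcat\b", "big cat", s))
# result == [("the", None), ("big", None), ("cat", "B"), ("sat", None)]
```

The adjacent-duplicate caveat shows up directly in this sketch: if the regex inserts a second "cat" right before the original one, the tag lands on whichever copy the walk meets first, because the two words are indistinguishable by comparison alone.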
