Monks,

I apologize in advance for this being a regex question rather than a Perl-specific one.

I'm trying to clean up a list of data entered in a free-text field (who needs validation anyway?). For the most part the data consists of one or more numbers (which I want to keep) and sometimes color "words", which might appear before, after, or between the numbers. I want to discard some specific colors but not other colors or other text. I'm trying to craft a regex to match the following and remove it:

The word Red and variations thereof (R, r, rd etc.); multiple occurrences; any case

Optionally surrounded by whitespace and parentheses; it's not important whether those are matched

So my current regex substitution looks something like:

s/\s*\(?\s*re?d?\s*\)?\s*//gi

This seemed to be working flawlessly until my spot checks revealed the following humorous example:

12345 Gray 6789 Red => 12345 Gay 6789
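A minimal sketch that reproduces the mishap, assuming the optional parentheses in the pattern are written `\(?` and `\)?`. The helper name `strip_red_naive` is made up for illustration.

```perl
use strict;
use warnings;

# Assumed form of the first substitution: optional "(", a Red-like word
# (r, rd, re, red), optional ")", with any surrounding whitespace.
sub strip_red_naive {
    my ($s) = @_;
    $s =~ s/\s*\(?\s*re?d?\s*\)?\s*//gi;
    return $s;
}

print strip_red_naive('12345 Gray 6789 Red'), "\n";   # prints "12345 Gay 6789"
```

Because only the `r` is mandatory in `re?d?`, the lone "r" inside "Gray" counts as a Red-like word and is removed.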

To avoid workplace embarrassment I thought it best to make sure that the bit I was removing occurred either just before or just after a number rather than in the middle of other text. So my thought is to modify the regex to something like:

s/(\d?)\s*\(?\s*re?d?\s*\)?\s*(\d?)//gi

The problem is that I can't leave both digits optional (as shown) or I'm still in the same boat. I also can't make either one mandatory or I'm dictating a before-number-only or after-number-only match. What I really want is one or the other (or both) but not neither.

As you might guess from the parens around the digits, I also considered checking what matched in the second part and substituting back in the original if I didn't see a digit. However, I ran into various problems ($1 being undefined, the A?B:C syntax not working inside the regex, etc.)
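For reference, a conditional like that can run on the replacement side if the substitution uses the `/e` modifier, which evaluates the replacement as Perl code. This is only a sketch: it assumes the optional parentheses are written `\(?` and `\)?`, adds a capture group around the Red-like chunk so it can be put back, and the name `strip_red_conditional` is hypothetical.

```perl
use strict;
use warnings;

# Capture an optional digit on each side plus the Red-like chunk, then
# decide in Perl code (via /e) whether to drop the chunk or restore it.
# The chunk is dropped only if a digit was captured on at least one side.
sub strip_red_conditional {
    my ($s) = @_;
    $s =~ s{(\d?)(\s*\(?\s*re?d?\s*\)?\s*)(\d?)}{
        (length($1) || length($3)) ? "$1$3" : "$1$2$3"
    }gie;
    return $s;
}

print strip_red_conditional('12345 Gray 6789 Red'), "\n";   # prints "12345 Gray 6789"
```

Since `(\d?)` always takes part in the match, `$1` and `$3` come back as empty strings rather than undefined, so the `length` checks run without warnings.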

So, is there some nice way to do this in a single regex? Can I somehow ask that the regex match one or more of two disjointed parts?
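One possible way to say "a digit before, a digit after, or both, but not neither" in a single regex is to alternate two copies of the pattern, each guarded by a lookaround. This is a sketch under the same assumption about the optional parentheses (`\(?` and `\)?`); `$red` and `strip_red_near_digit` are illustrative names.

```perl
use strict;
use warnings;

# The Red-like chunk, compiled once; /i is kept when it is interpolated.
my $red = qr/\s*\(?\s*re?d?\s*\)?\s*/i;

# Remove the chunk if a digit sits immediately before it (lookbehind)
# or immediately after it (lookahead). The digits are not consumed.
sub strip_red_near_digit {
    my ($s) = @_;
    $s =~ s/(?<=\d)$red|$red(?=\d)//g;
    return $s;
}

print strip_red_near_digit('12345 Gray 6789 Red'), "\n";   # prints "12345 Gray 6789"
```

Because lookarounds are zero-width, the neighbouring digits stay in the string and nothing has to be substituted back in. Note that the surrounding whitespace is removed along with the word, so two numbers separated only by a Red-like word end up joined.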

Thanks,
Cefu

Update: Found a solution that almost matches my requirements and is actually better for what I needed. I'd downvote myself for not thinking this through first if I could. :)

s/(^|\d)\s*\(?\s*re?d?\s*\)?\s*($|\d)/$1$2/gi

While messing around with getting a conditional to work on the right side of the substitution I noticed that it happily substituted no characters when $1 or $2 are undefined. It does the same, without complaint, when they match the anchors ^ and $. So, rather than shoot for some weird hybrid of optional and mandatory I decided to make them both mandatory but with palatable alternatives (the anchors).
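A small runnable sketch of this anchored version, again assuming the optional parentheses are written `\(?` and `\)?`; the helper name `strip_red_anchored` is made up for illustration.

```perl
use strict;
use warnings;

# Both sides are mandatory, but each may be either a digit or a string
# edge (^ or $). The captured digits are put back via $1$2; an anchor
# captures the empty string, so it contributes nothing.
sub strip_red_anchored {
    my ($s) = @_;
    $s =~ s/(^|\d)\s*\(?\s*re?d?\s*\)?\s*($|\d)/$1$2/gi;
    return $s;
}

print strip_red_anchored('12345 Gray 6789 Red'), "\n";
# prints "12345 Gray 6789"
print strip_red_anchored('(Red) 123 Reddish-Orange 456 Orange 789'), "\n";
# prints "123 Reddish-Orange 456 Orange 789"
```

The "r" in "Gray" survives because it has neither a digit nor a string edge on either side.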

Where this differs from my requirements is that it will not match a Red-like word if there is a digit on one side and more text (rather than the beginning or end of the string) on the other. So, for example, with my new regex:

(Red) 123 Reddish-Orange 456 Orange 789 => 123 Reddish-Orange 456 Orange 789

Whereas if someone had come up with a way to get the behavior I asked for above it would have done this:

(Red) 123 Reddish-Orange 456 Orange 789 => 123 dish-Orange 456 Orange 789

The results of my number-or-edge-of-string before and after suit me better.

Requirements translate into code or code translates into requirements..... I never can seem to remember how that's supposed to go. :)

In reply to RegEx to match at least one non-adjacent term by Cefu
