in reply to Curious Regex

You could use a two-stage approach: in the first stage, you extract the entire part from </Mil up to \x9D. In the second stage, you remove the unwanted control characters (\x90 and \x8F in the code) from the extracted substring:

sub remove_ctrls {
    my $s = shift;
    $s =~ tr/\x90\x8F//d;
    return $s;
}

$text =~ s/(<\/Mil.*?>.*?\x9D)/remove_ctrls($1)/esg;

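The /e modifier makes the replacement part be evaluated as Perl code, i.e. remove_ctrls() is called with the captured substring ($1) and its return value becomes the replacement. The /s modifier lets . match newlines as well, and /g applies the substitution to every match.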
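
To see the two stages working together, here is a minimal self-contained sketch. The sample string (including the "</Mil 12>" tag) is made up for illustration and is not taken from the original data; it only assumes that \x90 and \x8F can occur both inside and outside the span that ends in \x9D.

```perl
use strict;
use warnings;

# Stage 2: delete the unwanted control characters from one extracted chunk.
sub remove_ctrls {
    my $s = shift;
    $s =~ tr/\x90\x8F//d;
    return $s;
}

# Hypothetical sample input: control characters before, inside,
# and after a </Mil...> ... \x9D span.
my $text = "ab\x90c</Mil 12>He\x90ll\x8Fo\x9Dxy\x8Fz";

# Stage 1: match each span; /e calls remove_ctrls($1) and
# substitutes its return value for the whole match.
$text =~ s/(<\/Mil.*?>.*?\x9D)/remove_ctrls($1)/esg;

# Only the span was cleaned; the control characters outside it are untouched.
my $expected = "ab\x90c</Mil 12>Hello\x9Dxy\x8Fz";
print $text eq $expected ? "ok\n" : "not ok\n";
```

The control characters outside the matched span survive, which is the point of extracting first and cleaning second.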

Personally, I find this easier to read than overcomplex regexes which would do it all in one go... YMMV, of course.

Re^2: Curious Regex
by HamNRye (Monk) on Feb 11, 2009 at 19:16 UTC

    Thanks... That helps with readability and is easy enough to understand. I don't use the "subroutines in regex" very often and just didn't think of it.

    I've got the code in place and it's working like a champ. The </Mil> tags are actually font declarations (Miller), so this will give me some reusability of the subroutine if files start popping up that use other fonts.

    Thanks for the help Monks! It is very much appreciated.