http://qs1969.pair.com?node_id=539837


in reply to Regular Expresssion TroubleShoot Help plz

update: No -- Crackers2 is correct. Sorry. First time I've wanted to downvote my own post.

Well, it seems your character class isn't working the way you expect. I usually find the print statement to be an excellent debugger. I modified your code a bit -- first, I didn't see a particular need for the \A in a sample this short. I'm also used to looking at regexes without whitespace, and I'm not sure why you used both the s and m modifiers (aren't they contradictory?) {update: never mind that last -- I found "both s and m modifiers (//sm): Treat string as a single long line, but detect multiple lines. '.' matches any character, even "\n". ^ and $, however, are able to match at the start or end of any line within the string." in the docs}
$foo = qq{^snafu^|^foobar^\n};
$foo =~ m/(\W)([^\1]+)\1(\W)/;
$text_qual = $1;
$field_sep = $3;
print "text: $text_qual\n";
print "field: $field_sep\n";
print $2;
print "\n2nd try\n";
$foo2 = qq{^snafu1|^foobar^\n};
$foo2 =~ m/(\W)([^\1]+)\1(\W)/;
$text_qual = $1;
$field_sep = $3;
print "text: $text_qual\n";
print "field: $field_sep\n";
print $2;
yields
H:\script>perl majingz.pl
text: ^
field:
snafu^|^foobar
2nd try
text: ^
field:
snafu1|^foobar
That tells me your [^\1]+ class sucked up everything from s to r, and then the \1 kicked in for the fourth ^. So, your backreference isn't working inside a character class. This isn't quite so surprising (to me, anyway) since a character class doesn't follow many standard regex rules (a period inside a character class, for example, is just a period, escaped or not). I don't see any hard documentation on the failure of backreferences in character classes, but it makes sense to me.
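Here is a minimal sketch of the behaviour described above. It assumes that perl reads \1 inside a bracketed class as the octal escape for chr(1) rather than as a backreference, which would explain why the class swallows nearly everything. The variable names are only illustrative. The second pattern is one possible way to say "any character other than whatever group 1 captured", using a negative lookahead outside of any class. It may or may not be what the original poster actually needs.

```perl
use strict;
use warnings;

my $foo = qq{^snafu^|^foobar^\n};

# Inside [...], \1 is not a backreference; it is taken as the octal
# escape for chr(1), so [^\1]+ matches almost any character.
my $class_matches_ctrl_a = ("\x01" =~ /\A[\1]\z/) ? 1 : 0;

# Original pattern: the class runs greedily to the end of the string,
# then backtracks until \1 (the "^") and a following \W can match.
my ($qual_a, $body_a, $sep_a) = $foo =~ m/(\W)([^\1]+)\1(\W)/;

# Alternative sketch: (?!\1) sits outside any class, so \1 really is a
# backreference there. The group consumes one character at a time as
# long as that character is not the captured qualifier.
my ($qual_b, $body_b, $sep_b) = $foo =~ m/(\W)((?:(?!\1).)+)\1(\W)/;

print "class matches chr(1): $class_matches_ctrl_a\n";
print "original:  qualifier=$qual_a body=$body_a\n";
print "lookahead: qualifier=$qual_b body=$body_b separator=$sep_b\n";
```

With the lookahead version the body stops at the first closing qualifier, so the separator comes out as the | rather than the trailing newline.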

What is somewhat surprising to me is that in the second try (where I assumed the "1" would be part of the character class), the "1" doesn't trigger the class. I'm guessing this is because the "\" is the escape character. I do note that if I double the escape (i.e., [^\\1]), I get the expected result of that class acting on the "1".
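A small sketch to compare the two classes character by character. It rests on the same assumption, that \1 inside a class means chr(1), while \\1 is a literal backslash followed by a literal "1". The hash names are made up for the example.

```perl
use strict;
use warnings;

# Characters to probe, with a printable label for each.
my %label = (
    "1"    => 'digit 1',
    "\\"   => 'backslash',
    "\x01" => 'chr(1)',
    "a"    => 'letter a',
);

my (%single, %double);
for my $ch (sort keys %label) {
    # [^\1]  : negated class excluding only chr(1)
    $single{$ch} = ($ch =~ /\A[^\1]\z/)  ? 1 : 0;
    # [^\\1] : negated class excluding a backslash and the digit 1
    $double{$ch} = ($ch =~ /\A[^\\1]\z/) ? 1 : 0;
    printf "%-9s  [^\\1]: %d   [^\\\\1]: %d\n",
        $label{$ch}, $single{$ch}, $double{$ch};
}
```

If the assumption holds, the digit "1" is accepted by [^\1] and rejected by [^\\1], which matches what the two runs above show.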

I wish I could tell you how to resolve your situation, but I think it's a difficult one: parsing CSV is not an easy task. That's one reason there's a module.