in reply to Regular Expresssion TroubleShoot Help plz
update: No -- Crackers2 is correct. Sorry. First time I've wanted to downvote my own post.
Well, it seems your character class isn't working the way you expect. I usually find the print statement to be an excellent debugger. I modified your code a bit. First, I didn't see a particular need for the \A in so short a sample. I'm also more used to reading regexes without whitespace, and I wasn't sure why you used both the /s and /m modifiers (aren't they contradictory?) {update: never mind that last -- I found this in the docs: "both s and m modifiers (//sm): Treat string as a single long line, but detect multiple lines. '.' matches any character, even "\n". ^ and $, however, are able to match at the start or end of any line within the string."}
```perl
# First try: "^" is the text qualifier, "|" the field separator
$foo = qq{^snafu^|^foobar^\n};
$foo =~ m/(\W)([^\1]+)\1(\W)/;
$text_qual = $1;
$field_sep = $3;
print "text: $text_qual\n";
print "field: $field_sep\n";
print $2;

print "\n2nd try\n";
# Second try: put a "1" in the data to see whether [^\1] excludes it
$foo2 = qq{^snafu1|^foobar^\n};
$foo2 =~ m/(\W)([^\1]+)\1(\W)/;
$text_qual = $1;
$field_sep = $3;
print "text: $text_qual\n";
print "field: $field_sep\n";
print $2;
```

yields

```
H:\script>perl majingz.pl
text: ^
field:

snafu^|^foobar
2nd try
text: ^
field:

snafu1|^foobar
```

That tells me your [^\1]+ class sucked up everything from the "s" to the "r", and then the \1 kicked in for the fourth ^. (The "field:" line is blank because the final (\W) ended up capturing the trailing newline, not the "|".) So your backreference isn't working inside a character class. This isn't all that surprising (to me, anyway), since a character class doesn't follow many of the standard regex rules (a period inside a character class, for example, is just a period, escaped or not). I don't see any hard documentation on backreferences failing inside character classes, but it makes sense to me.
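If the goal is "everything up to the next copy of the qualifier," one workaround I'd try (my suggestion, not something from your code) is a negative lookahead, since a backreference does work there: match one character at a time as long as it isn't whatever the first (\W) captured. A minimal sketch using only your first sample line; the helper name visible() is just something I made up for printing:

```perl
use strict;
use warnings;

my $line = qq{^snafu^|^foobar^\n};

# Original pattern: inside [...] the \1 is not a backreference,
# so [^\1]+ runs straight past the closing "^".
my ($q1, $f1, $s1) = $line =~ m/(\W)([^\1]+)\1(\W)/;

# Lookahead version: take one character at a time, as long as it
# is not whatever the first (\W) captured. (?!\1) can use \1.
my ($q2, $f2, $s2) = $line =~ m/(\W)((?:(?!\1).)+)\1(\W)/;

# Hypothetical helper: show a captured newline as the two characters \n
sub visible { my ($s) = @_; $s =~ s/\n/\\n/g; return $s }

print "class:     qual=$q1 field=", visible($f1), " sep=", visible($s1), "\n";
print "lookahead: qual=$q2 field=", visible($f2), " sep=", visible($s2), "\n";
```

With the lookahead, the qualifier comes back as "^", the field as "snafu" and the separator as "|". It still won't cope with doubled or escaped qualifiers inside a field, though.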
What is somewhat surprising to me is that in the second try (where I assume the "1" is part of the character class), the "1" doesn't trigger the class. I'm guessing this is because the "\" is the escape character, so \1 is read as an escape rather than as a literal "1". I do note that if I double the backslash (i.e., [^\\1]), I get the result I expected: the class does exclude the "1".
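Here's a quick way to see what the two classes actually contain (the test strings are mine). As far as I can tell, inside brackets Perl reads \1 as an octal escape, i.e. the single character with code 1, rather than as a backreference, which fits what the second try shows:

```perl
use strict;
use warnings;

# Inside [...], \1 is an octal escape: the single character chr(1).
my $no_ctrl_a  = qr/^[^\1]+$/;     # any characters except chr(1)
# With the backslash doubled, the class holds "\" and the digit "1".
my $no_bs_or_1 = qr/^[^\\1]+$/;    # any characters except "\" and "1"

my $one_ok   = 'snafu1'    =~ $no_ctrl_a  ? 1 : 0;  # "1" is not excluded
my $one_bad  = 'snafu1'    =~ $no_bs_or_1 ? 1 : 0;  # "1" is excluded
my $ctrl_bad = "snafu\x01" =~ $no_ctrl_a  ? 1 : 0;  # chr(1) is excluded

print "[^\\1]  accepts 'snafu1':      $one_ok\n";
print "[^\\\\1] accepts 'snafu1':      $one_bad\n";
print "[^\\1]  accepts \"snafu\\x01\": $ctrl_bad\n";
```

So [^\\1] only "works" on your data by accident: it excludes a literal "1" (and backslashes), not whatever the qualifier happens to be.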
I wish I could tell you how to resolve your situation, but I think it's a difficult one: parsing CSV files is not an easy task. That's one reason there's a module for it.
In Section
Seekers of Perl Wisdom