PerlMonks  

update: No -- Crackers2 is correct. Sorry. First time I've wanted to downvote my own post.

Well, it seems your character class isn't working the way you expect. I usually find the print statement to be an excellent debugger. I modified your code a bit: first, I didn't see a particular need for the \A in a sample this short. I'm also used to looking at regexes without whitespace, and I'm not sure why you used both the /s and /m modifiers (aren't they contradictory?). {update: never mind that last point. I found this in the docs: "both s and m modifiers (//sm): Treat string as a single long line, but detect multiple lines. '.' matches any character, even "\n". ^ and $, however, are able to match at the start or end of any line within the string."}
$foo = qq{^snafu^|^foobar^\n};
$foo =~ m/(\W)([^\1]+)\1(\W)/;
$text_qual = $1;
$field_sep = $3;
print "text: $text_qual\n";
print "field: $field_sep\n";
print $2;

print "\n2nd try\n";
$foo2 = qq{^snafu1|^foobar^\n};
$foo2 =~ m/(\W)([^\1]+)\1(\W)/;
$text_qual = $1;
$field_sep = $3;
print "text: $text_qual\n";
print "field: $field_sep\n";
print $2;
yields
H:\script>perl majingz.pl
text: ^
field:

snafu^|^foobar
2nd try
text: ^
field:

snafu1|^foobar
That tells me your [^\1]+ class sucked up everything from the s to the r, and the \1 only kicked in for the fourth ^. So your backreference isn't working inside a character class. This isn't so surprising (to me, anyway), since a character class doesn't follow many of the standard regex rules (a period inside a character class, for example, is just a period, escaped or not). I don't see any hard documentation on the failure of backreferences in character classes, but it makes sense to me.
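For what it's worth, here is a quick check of how Perl reads \1 inside brackets. As far as I can tell, inside a bracketed class \1 is taken as an octal escape (the character with code 1) rather than as a backreference, which would explain the output above. The variable names are mine, purely for illustration.

```perl
use strict;
use warnings;

# Inside [...] the sequence \1 is read as an octal escape: chr(1).
my $matches_ctrl_a = ("\x01" =~ /\A[\1]\z/) ? 1 : 0;

# With ^ captured as group 1, [^\1] still accepts a following ^,
# so the class is not excluding the captured character.
my $class_eats_caret = ("^^" =~ /\A(\W)[^\1]\z/) ? 1 : 0;

print "matches chr(1): $matches_ctrl_a\n";
print "class eats ^:   $class_eats_caret\n";
```

Both flags should come out as 1: the class matches chr(1), and it happily consumes the same ^ that group 1 captured.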

What is somewhat surprising to me is that in the second try (where I assumed the "1" would be part of the character class), the "1" doesn't trigger the class. I'm guessing this is because the "\" is the escape character. I do note that if I double the escape (i.e., [^\\1]), I get the expected result of the class stopping on the "1".

I wish I could tell you how to resolve your situation, but I think it's a difficult one: parsing CSV is not an easy task. That's one reason there's a module.
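If you do want to stay with a regex for the simple case, one common idiom for "any character except whatever \1 captured" is a negative lookahead in front of a dot, rather than a character class. This is only a sketch against the first sample string from above. The variable $first_field is my own name, and I have put the \A back so the match can't slide along the string. It makes no attempt to handle qualifiers embedded in the data, which is where real CSV parsing gets hard.

```perl
use strict;
use warnings;

my $foo = qq{^snafu^|^foobar^\n};
my ($text_qual, $first_field, $field_sep);

# (?:(?!\1).)+ consumes one character at a time, but only while the
# next character is not a copy of what group 1 captured.
if ($foo =~ m/\A(\W)((?:(?!\1).)+)\1(\W)/) {
    ($text_qual, $first_field, $field_sep) = ($1, $2, $3);
    print "text: $text_qual\n";
    print "field: $field_sep\n";
    print "$first_field\n";
}
```

On the first sample this should pick ^ as the text qualifier, | as the field separator, and snafu as the first field. It is a guess from a single line, so treat the result as a hint to confirm, not as a parser.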

In reply to Re: Regular Expresssion TroubleShoot Help plz by SamCG
in thread Regular Expresssion TroubleShoot Help plz by MajingaZ
