I'm learning Perl, and am trying to use it to convert a CSV database so I can upload the modified database to another location. The wonderful thing is that the CSV one program puts out can have quoted strings, which can contain commas and quotation marks that aren't marked or escaped in any way. By fiddling around, and trying the regular expression in other applications that have regexp support, I think I've figured out an expression that works by matching the data I need rather than the commas. (I couldn't figure out how to write more complicated, i.e. full regular-expression-based, lookbehinds. Is this possible?)

split /(\".*?\"(?=,))|(.*?(?=,))|(.*?(?=\n))/

The pairs of parentheses are there to preserve the matched data. They also give me the fun part of blank results, which I take care of below, along with an is-defined check before doing anything with the data.
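For reference, here is a minimal sketch of how split handles capturing parentheses. The sample strings are made up and are not taken from the real database. split treats its pattern as the separator, not as the data, so the ordinary fields are whatever lies between matches. Captured text is returned in addition to those fields, and a group that did not take part in a match comes back as undef.

```perl
use strict;
use warnings;

# Made-up sample line, not taken from the real database.
my $line = 'item1,field2,more data';

# split treats the pattern as the separator; the fields are the text between matches.
my @plain = split /,/, $line;

# With capturing parentheses, each captured separator is returned too.
my @with_sep = split /(,)/, $line;

# With alternated groups, a group that did not take part in a match yields undef.
my @mixed = split /(-)|(,)/, '1-10,20';

# Print each result on one line, showing undef entries explicitly.
for my $result (\@plain, \@with_sep, \@mixed) {
    print join('|', map { defined $_ ? $_ : 'undef' } @$result), "\n";
}
```

This should print `item1|field2|more data`, then `item1|,|field2|,|more data`, then `1|-|undef|10|undef|,|20`. A pattern written to match the data itself, as above, therefore turns the data into captured "separators" and leaves whatever sits between matches as the actual fields.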

Now comes the fun part: logically speaking, I can't find anything wrong with it, and every application I've tested it in has worked perfectly. When I slap it into a Perl program, it doesn't. I can post my whole program and a testable part of the database if it's needed.

Then, in the same program (and working off the split string from the expression), I have a statement like this:

my $j=0; if (m/^,/, $i){ print $j++ . ": " . $i . "\n"; }

It helps me weed out blank and irrelevant results, but one problem comes up. The remaining results look kinda like this:

0: item1
1: ,field2
2: ,more data

And so on. The first result is what's getting my attention: there's no comma leading the data, but it matches anyway... how? And where did the commas come from in the first place? They're not part of the initial expression's results...

Any help? I've fiddled with this thing for a few hours, and each time the results of the regular expression fail to make any sense, matching data that doesn't match, and not separating data correctly. The most blatant example would be the above example, where m/^,/ matches a string that doesn't have a comma at the beginning.
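One thing that may be worth checking is the form of the test itself. This is a sketch under assumptions: the sample values are made up and the real loop may differ. In `if (m/^,/, $i)` the comma is Perl's comma operator, so `m/^,/` is matched against `$_`, its result is discarded, and the condition is simply the truth of `$i`. Binding the match to `$i` with `=~` tests `$i` itself.

```perl
use strict;
use warnings;

$_ = '';   # m// without =~ tests $_, so give it a defined value here

my @report;
for my $i ('item1', ',field2') {    # made-up sample values
    # Form from the post: the comma operator evaluates m/^,/ against $_,
    # discards the result, and yields $i, which is true for any string
    # other than '' and '0'.
    my $comma_form = (m/^,/, $i) ? 'true' : 'false';

    # Binding the match to $i tests $i itself.
    my $bound_form = ($i =~ m/^,/) ? 'true' : 'false';

    push @report, "$i: comma form is $comma_form, bound form is $bound_form";
}
print "$_\n" for @report;
```

With these values, the comma form is true for both strings, while the bound form is true only for `,field2`.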

If it matters, perl -v in my terminal returns this (and yes, I'm running OS X):

This is perl, v5.8.6 built for darwin-thread-multi-2level

(with 3 registered patches, see perl -V for more detail)

Copyright 1987-2004, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on this system using `man perl' or `perldoc perl'. If you have access to the Internet, point your browser at http://www.perl.org/, the Perl Home Page.

Thanks!


In reply to Perl is returning... odd results... from regular expressions. Things matching when they shouldn't, and stuff like that. by Groxx
