in reply to Perl is returning... odd results... from regular expressions. Things matching when they shouldn't, and stuff like that.

First off, may we see an excerpt of the relevant records?
Is it a CSV file or database record? It obviously can't be both.

split /(\".*?\"(?=,))|(.*?(?=,))|(.*?(?=\n))/

Using $1 from your capturing parentheses in combination with split seems odd. Think about what happens: split drops the chunks matched by the pattern from the resulting list, while the parentheses capture whatever matches the pattern (\".*?\"(?=,))|(.*?(?=,))|(.*?(?=\n)) itself.
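To make that interaction concrete, here is a minimal sketch using made-up sample strings rather than the actual records. It shows that split returns the captured separator text alongside the ordinary fields, and that with several alternated groups every group contributes a slot per separator, holding undef when that group did not take part in the match.

```perl
use strict;
use warnings;

# Made-up sample data, not taken from the real CSV file.
my $line = 'a,b,,d';

# No capturing parentheses: only the text between separators comes back.
my @plain = split /,/, $line;

# Capturing parentheses: each matched separator is returned as well.
my @with_caps = split /(,)/, $line;

# Two alternated groups: each separator adds one slot per group,
# and the group that did not match contributes undef.
my @multi = split /(,)|(;)/, 'a,b;c';

print join('|', @plain), "\n";
print join('|', @with_caps), "\n";
print join('|', map { defined $_ ? $_ : 'undef' } @multi), "\n";
```

With three alternated groups, as in the pattern above, the returned list therefore contains two undef entries for every separator that matched.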

The lookaheads are okay though.


Re^2: Perl is returning... odd results... from regular expressions. Things matching when they shouldn't, and stuff like that.
by Groxx (Novice) on Jan 11, 2007 at 08:32 UTC
    I guess it wasn't clear enough earlier, sorry. This is for a CSV file so I can nab the relevant data, convert it, and spit it out as a different, re-ordered CSV file for importing. I need to be able to convert between the two CSV data formats (accounting application and website), as the two main pieces of software are completely incompatible with each other. They can both import and export CSV, though, so I figured this would probably be the easiest, most flexible option (learning Perl aside).

    A few chunks of the CSV file (not complete lines, just representative of all the circumstances that could cause problems, with some extra data) are below:

    Bag10x8x24,Poly Bags 10x8x24 gussetted,1,FALSE,FALSE,,"Poly Bags 10x8x24 gussetted metallocene bags; Assoc. Bag # 264-4-64 (500/carton, 1 carton min)",0.00,NC,0,0.00,0.00,NC,0,0.00,0.00,NC
    Bag2.5x3zip,2.5x3x.004 zip lock bag,1,FALSE,FALSE,,"2-1/2 x 3x .004" zip lock bags with hang hole. Assoc Bag item #274-03H",0.00,NC,0,0.00,0.00,NC,0,0.00,0.00,PL1*0.6700000,0,0.00,0.00
    H06045-fullthd,"M6 x 45 hex cap scrw, full thd",1,FALSE,FALSE,"M6 x 45 hex cap scrw, full thd, class 8.8, zinc (C)","M6 x 45 hex cap scrw, full thd, class 8.8, zinc, Bossard article # 1049577",0.16,NC,0,0.00,0.00,NC,0,0.00,0.10,PL1*0.6300000,0,0.00,0.10

    Any number of quotes or commas can appear inside a quote-delimited field, and a line can contain any number of quoted fields. (I'm not sure what the export does if a quote mark is immediately followed by a comma inside a description field; it hasn't happened so far, and it's easy enough to avoid, so it's not really a concern.) There are also many blank fields (no data at all), ALL of which have to be tracked and accounted for.
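One possible way to handle these rules without split is to walk the line field by field. The sketch below is only an illustration under the assumptions just described: a quoted field ends at the first quote that is followed by a comma or the end of the line, inner quotes are not escaped, and empty fields are kept. The name parse_fields is hypothetical, and the sample line is a shortened copy of the second record above. For well-formed CSV, a CPAN module such as Text::CSV would usually be the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split one export line into fields.
# Assumes a quoted field ends at a quote followed by a comma or end of line.
sub parse_fields {
    my ($line) = @_;
    chomp $line;
    my @fields;
    while (1) {
        if ($line =~ /\G"(.*?)"(?=,|\z)/gc) {
            # Quoted field: may contain commas and bare quote marks.
            push @fields, $1;
        }
        else {
            # Unquoted field: everything up to the next comma, possibly empty.
            $line =~ /\G([^,]*)/gc;
            push @fields, $1;
        }
        # A comma means another field follows; otherwise the line is done.
        last unless $line =~ /\G,/gc;
    }
    return @fields;
}

# Shortened copy of the second sample record.
my $sample = 'Bag2.5x3zip,2.5x3x.004 zip lock bag,1,FALSE,FALSE,,'
           . '"2-1/2 x 3x .004" zip lock bags with hang hole. Assoc Bag item #274-03H",0.00,NC';

my @fields = parse_fields($sample);
print scalar(@fields), " fields\n";
print "description: $fields[6]\n";
```

Because the closing quote is recognised only when a comma or the end of the line follows it, a description containing a quote mark directly before a comma would still be cut short, which is the same caveat noted above.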

    As to the split: I noticed while reading through my Perl book that, when the pattern contains parentheses, split returns the matched text (normally discarded) as well as the remaining data (normally retained). If nothing else, I figured it'd be handy for double-checking my regular expressions, as I could see what it dropped too.

    Thanks for the reply!