I have a really nasty legacy flat file I'm trying to parse. I can convert the file, but right now my employer doesn't want to do this.

It appears that DBD::CSV or Text::CSV_XS (I'm thinking the latter actually) will only support a single character field separator and escape character. The current field separator is ' _ ' (that's space underscore space) with _UNSC being the escape string (in the original code it's really just replacing _ with _UNSC_ but it's basically the same thing).

Any pointers on making this work? Do I need to replace Text::CSV_XS with my own module (please, no) or can I make this work somehow?

Upon doing a quick search it appears that Text::xSV is a pure perl solution, but it's got a hard coded limitation that restricts the separator to 1 character as well. I'll see if I can figure out why, but my time is limited--my boss wants this soonest. If I can't make this work then I'll have to go back to the old code *shudder*

Update: It appears I wasn't as clear as I thought I was.

Sample data:
12345 _ value1 _ value2 _ value3 _ ...
12346 _ value1 _ value2 _ value3 _ ...
12347 _ valuewith_UNSC_ _ value2 _ value3 _ ...

I can convert this data easily enough to "standard" csv:

if ( open my $FH, "<dbfile" ) { my @newrows; for my $row ( <$FH> ) { chomp $row; push @newrows, ( join ',', map { s/"/""/g ; $_ = "\"$_\"" } split / _ /, $row ); } # print @newrows to file }

My ultimate goal is to convince my boss to move to a RDBMS of some kind. He *loves* being able to just open the data file and modify something quick and easy. And he sees no reason to learn SQL or hide his data in a binary format. So I need to get DBD::CSV working with the original file so that he can switch between my new code and the live code and see the same data. I can't convince him to change the format, he's used to working with it the way he is. Once I show him how much simpler the code is with DBD::CSV and SQL (albeit simplified via SQL::Statement) I can convince him to move to a RDBMS.

Harley J Pig

In reply to DBD::CSV and really bad legacy flat file by harleypig

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.