Monks --

I'm sure that I am just overlooking something here, but this is driving me bananas. I am trying to process delimited files, whose lines look like this:

acb,def,123,456

etc. The delimiter might be any printable character, including regexp meta characters like ^, |, etc. I need to read the lines from the input, split them into fields, "do stuff", join them back together with (probably) a different delimiter, and then write them to the output. In the process, I need to do a match on the input lines to make sure that they don't already contain the output delimiter.

The rub is that I am trying to allow the user to specify the delimiters on the command line. I'm getting them with getopts just fine, but I run into problems when I'm doing the match (m//) on the new delimiter and split with the old delimiter, when those happen to be regexp meta characters.

$foo = '^'; m/$foo/

isn't working for me no matter how I escape it. I think that the problem is that Perl processes the string twice, once to interpolate the variables and a second time to interpret the regexp. What would be ideal for me is a way to tell Perl that this isn't a regexp, just a literal string, and it should ignore the metacharacters.
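For the "treat it as a literal string" behavior described above, one possible approach is to avoid the regexp entirely for the containment check (using index) and to wrap the delimiter in \Q...\E for the split. The following is a minimal sketch, not a definitive implementation: the helper name redelimit and its argument order are hypothetical, and it assumes the delimiters have already been read from the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: split $line on the literal string $in_delim and
# rejoin the fields with $out_delim. Returns undef when $line already
# contains $out_delim, so the caller can reject or report that line.
sub redelimit {
    my ($line, $in_delim, $out_delim) = @_;

    # index() is a plain substring search, so no regexp escaping is needed.
    return undef if index($line, $out_delim) >= 0;

    # \Q...\E applies quotemeta to the interpolated value, so the delimiter
    # is matched literally. The -1 limit keeps trailing empty fields.
    my @fields = split /\Q$in_delim\E/, $line, -1;

    # "do stuff" with @fields would go here

    return join $out_delim, @fields;
}

print redelimit('acb^def^123^456', '^', '|'), "\n";      # acb|def|123|456
print redelimit("acb\tdef\t123\t456", "\t", ','), "\n";  # acb,def,123,456
```

In this sketch a backslash-escaped tab still matches a literal tab, so the same call handles both the caret and the tab case.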

What am I missing here?

Thanks!

-- David


Update:

I've got it working. One thing I forgot to mention is that the input delimiter might be a tab. \Q...\E, and quotemeta on which it depends, escape everything except word characters (alphanumerics and underscore), which wasn't adequate for me, because they quote the tab.

What I did instead is this:

my $str = "acb^def^123^456";
my $foo = '^';
# Backslash-escape each regexp metacharacter; /g covers delimiters longer than one character.
$foo =~ s/([\\\|\(\)\[\{\^\$\*\+\?\.])/\\$1/g;
my @fields = split /$foo/, $str;
my $out = join '|', @fields;
print "$out\n";

which explicitly quotes the metacharacters and only the metacharacters. This also works for:

my $str = "acb\tdef\t123\t456"; my $foo = "\t";

and should work for any other delimiter, although I have not tested everything. Using an existing delimiter parser is less appealing to me because my (ridiculous) incoming data sources think that

"first,field","second,""field"with','junk","third'field

is three fields. My current program doesn't handle quoted delimiters, but soon I will have to.

Thanks much for your help!

-- David


In reply to escaping variables in regexp by Getch
