Getch has asked for the wisdom of the Perl Monks concerning the following question:

Monks --

I'm sure that I am just overlooking something here, but this is driving me bananas. I am trying to process delimited files, whose lines look like this:

acb,def,123,456

etc. The delimiter might be any printable character, including regexp metacharacters such as ^ and |. I need to read the lines from the input, split them into fields, "do stuff", join them back together with (probably) a different delimiter, and then write them to the output. Along the way, I need to match against each input line to make sure it doesn't already contain the output delimiter.

The rub is that I am trying to allow the user to specify the delimiters on the command line. I'm getting them with getopts just fine, but I run into problems when I match (m//) against the new delimiter or split on the old delimiter and those happen to be regexp metacharacters.

$foo = '^'; m/$foo/

isn't working for me no matter how I escape it. I think the problem is that Perl processes the string twice: once to interpolate the variable and a second time to interpret the result as a regexp. What would be ideal for me is a way to tell Perl that this isn't a regexp, just a literal string, so it should ignore the metacharacter.
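To see why the unescaped delimiter misbehaves, here is a small demonstration (a sketch; the variable names are illustrative only). After interpolation, '^' is compiled as the start-of-string anchor and '|' as an alternation of two empty patterns, so neither one matches the literal character.

```perl
use strict;
use warnings;

my $str = 'acb^def^123^456';

# Unescaped '^' is the start-of-string anchor: it only matches at
# position 0, which split ignores, so nothing is split.
my $caret     = '^';
my @by_anchor = split /$caret/, $str;
print scalar(@by_anchor), " field(s)\n";    # 1 field(s)

# The same anchor makes m// succeed on any string at all, so the
# "does the line contain the delimiter?" check always says yes.
my $hit = ('no caret here' =~ /$caret/) ? 1 : 0;
print "hit=$hit\n";                         # hit=1

# Unescaped '|' is an alternation of two empty patterns; it matches
# the empty string, so split breaks the text into single characters.
my $pipe     = '|';
my @by_empty = split /$pipe/, 'ab|cd';
print "@by_empty\n";                        # a b | c d
```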

What am I missing here?

Thanks!

-- David


Update:

I've got it working. One thing I forgot to mention is that the input delimiter might be a tab. \Q...\E, and the quotemeta function it relies on, escape every ASCII character other than letters, digits and the underscore, which wasn't adequate for me because they quote the tab.
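A quick check of what quotemeta actually does to a tab, as a sketch on a reasonably recent perl: it produces a backslash followed by the tab character, and in a Perl regexp a backslash before a non-word character matches that character literally, so the escaped pattern still splits on tabs. One thing worth ruling out: if the tab is typed on the command line as '\t', most shells pass the two characters backslash and t, not a tab.

```perl
use strict;
use warnings;

my $tab    = "\t";
my $quoted = quotemeta $tab;

# %vd prints the ordinal of each character: 92 is '\', 9 is TAB.
printf "%vd\n", $quoted;                    # 92.9

# A backslash before a non-word character matches that character
# literally, so the escaped tab still matches a real tab.
my @fields = split /\Q$tab\E/, "acb\tdef\t123\t456";
print join('|', @fields), "\n";             # acb|def|123|456
```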

What I did instead is this:

my $str = "acb^def^123^456";
my $foo = '^';
# Backslash-escape each regexp metacharacter (/g also covers multi-character delimiters)
$foo =~ s/([\\\|\(\)\[\{\^\$\*\+\?\.])/\\$1/g;
my @fields = split /$foo/, $str;
my $out = join '|', @fields;
print "$out\n";

which explicitly quotes the metacharacters and only the metacharacters. This also works for:

my $str = "acb\tdef\t123\t456";
my $foo = "\t";

and should work for any other delimiter, although I have not tested everything. Using an existing delimiter parser is less appealing to me because my (ridiculous) incoming data sources think that

"first,field","second,""field"with','junk","third'field

is three fields. My current program doesn't handle quoted delimiters, but soon I will have to.
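On the "have not tested everything" point, a brute-force check is cheap. This sketch splits a sample line on every printable ASCII punctuation character plus tab and space, once with the hand-rolled escaping (the helper name escape_delim is hypothetical; it reuses the character class from the update, with /g added) and once with \Q...\E, and collects any delimiter for which either result is wrong.

```perl
use strict;
use warnings;

# Hypothetical helper: the character class from the update, with /g.
sub escape_delim {
    my ($d) = @_;
    $d =~ s/([\\\|\(\)\[\{\^\$\*\+\?\.])/\\$1/g;
    return $d;
}

# All 32 printable ASCII punctuation characters, plus tab and space.
my @delims = ((grep { /[[:punct:]]/ } map { chr($_) } 33 .. 126), "\t", ' ');

my @failures;
for my $d (@delims) {
    my $line    = join $d, qw(x y z);
    my $hand    = escape_delim($d);
    my $by_hand = join ',', split /$hand/, $line;
    my $by_q    = join ',', split /\Q$d\E/, $line;
    push @failures, $d if $by_hand ne 'x,y,z' or $by_q ne 'x,y,z';
}
print @failures ? "failed for: @failures\n" : "all delimiters ok\n";
```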

Thanks much for your help!

-- David

Re: escaping variables in regexp
by blazar (Canon) on Oct 12, 2006 at 22:14 UTC
    The rub is that I am trying to allow the user to specify the delimiters on the command line.

    You want to read about \Q and \E in perldoc perlre, or about quotemeta.
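A short sketch of the two forms side by side, reusing the '^' delimiter from the question: quotemeta is the function behind \Q, and the escaped text can be compiled once with qr// and reused both for split and for the m// containment check.

```perl
use strict;
use warnings;

my $foo = '^';

# \Q...\E in an interpolating context applies quotemeta to the text.
my $escaped = quotemeta $foo;              # the two characters \^
my $same    = ("\Q$foo\E" eq $escaped);    # true

# Compile once with qr// and reuse the pattern for every line.
my $delim_re = qr/$escaped/;
my @fields   = split $delim_re, 'acb^def^123^456';
print join('|', @fields), "\n";            # acb|def|123|456

# The escaped pattern also fixes the containment check.
my $has_delim = ('no caret here' =~ $delim_re) ? 1 : 0;
print "$has_delim\n";                      # 0
```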

Re: escaping variables in regexp
by GrandFather (Saint) on Oct 12, 2006 at 22:16 UTC

    Something like:

    use strict;
    use warnings;

    my $str = 'acb^def^123^456';
    my $foo = '^';
    my @fields = split /\Q$foo\E/, $str;
    print join '|', @fields;

    Prints:

    acb|def|123|456
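Putting the pieces of the original question together, here is a minimal sketch of the read, split, join loop. The option letters -i and -o, the default delimiters, and the helper convert_line are all hypothetical. The check for the output delimiter uses index(), which does a plain substring search and so needs no escaping at all. The demo reads from an in-memory file so that it runs on its own.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical options: -i input delimiter, -o output delimiter.
getopts('i:o:', \my %opt) or die "usage: $0 [-i delim] [-o delim]\n";
my $in_delim  = defined $opt{i} ? $opt{i} : ',';
my $out_delim = defined $opt{o} ? $opt{o} : '|';

# Returns the re-delimited line, or undef if the line already
# contains the output delimiter (which would corrupt the output).
sub convert_line {
    my ($line, $in, $out) = @_;
    chomp $line;
    # index() is a plain substring search: no regexp, no escaping.
    return if index($line, $out) >= 0;
    # \Q...\E makes the delimiter literal; -1 keeps trailing empty fields.
    my @fields = split /\Q$in\E/, $line, -1;
    # ... "do stuff" with @fields here ...
    return join $out, @fields;
}

# Demo on an in-memory file; a real script would read from <>.
my $data = "acb,def,123,456\nbad|line,here\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @out;
while (my $line = <$fh>) {
    my $converted = convert_line($line, $in_delim, $out_delim);
    if (defined $converted) { push @out, $converted }
    else { warn "line $. contains '$out_delim', skipped\n" }
}
close $fh;
print "$_\n" for @out;
```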

    DWIM is Perl's answer to Gödel
Re: escaping variables in regexp
by swampyankee (Parson) on Oct 12, 2006 at 22:45 UTC

    You could also read about quotemeta.

    emc

    At that time [1909] the chief engineer was almost always the chief test pilot as well. That had the fortunate result of eliminating poor engineering early in aviation.

    —Igor Sikorsky, reported in AOPA Pilot magazine February 2003.
Re: escaping variables in regexp
by grep (Monsignor) on Oct 13, 2006 at 02:35 UTC
    Or you can skip regexes entirely when parsing delimited data. Text::CSV_XS is not just for commas anymore: its sep_char option lets you choose the separator. It also has the advantage of handling escaped and quoted data.

    For brevity, error checking was not added to the code.

    use strict;
    use warnings;
    use Text::CSV_XS;

    # Set up a CSV object to parse the user-supplied separator
    my $csv_in = Text::CSV_XS->new({ sep_char => '^' });
    # Set up a CSV object to handle the output separator
    my $csv_out = Text::CSV_XS->new({ sep_char => '|' });

    open(my $in, '<', 'in_file') or die "$!\n";
    my %hash;
    while (my $line = <$in>) {
        chomp $line;
        my $status = $csv_in->parse($line);
        my @cols = $csv_in->fields();
        $hash{$cols[0]} = $cols[1];    # or whatever
    }
    foreach (keys %hash) {
        # combine() returns a success flag; string() returns the joined line
        $csv_out->combine($_, $hash{$_});
        print $csv_out->string(), "\n";
    }


    grep
    One dead unjugged rabbit fish later