Knoperl has asked for the wisdom of the Perl Monks concerning the following question:

Dear Most Intelligent Monks:

I have a Perl program that works properly on my ActiveState 5.10.1 Build 1006 (Windows XP) box. I have shown the code and the input/outputs below. It essentially opens a file, goes line by line, and deletes the last comma separated value.

Example of Input Text File:

dog,cat,horse,bird
sheep,cow
fish,mouse,rat,tiger,lion

Here is the working Perl Program:

#!/usr/bin/perl -w
use strict;
#!/usr/local/bin/perl
open (MYFILE, 'test2.txt');
while (<MYFILE>) {
    chomp;
    $_ =~ s/,(?:"[^"\r\n]*"|[^,\r\n]*)$//m;
    print "$_\n";
}
close (MYFILE);

Would result in this output:

dog,cat,horse
sheep
fish,mouse,rat,tiger

What I need is for the REGEX used in the program copied below:

$_ =~ s/,(?:"[^"\r\n]*"|[^,\r\n]*)$//m;

to use a backslash (i.e. "\") instead of a comma so that this happens:

Input:

dog\cat\horse\bird
sheep\cow
fish\mouse\rat\tiger\lion

would give me the following Output:

dog\cat\horse
sheep
fish\mouse\rat\tiger

I have tried putting in $_ =~ s/\\((?:"[^"\r\n]*"|[^,\r\n]*)$//m; and it still does not work.

I do not want to use additional modules like Text::CSV_XS or Text::xSV, since I am installing this on multiple Windows machines (yes, I know about CAVA, etc.), and I think it should be simple to fix my REGEX so that it works with the backslash "\".

As someone completely undeserving, I most humbly beg your kindly assistance.

Re: Need help with regex
by kennethk (Abbot) on Dec 16, 2009 at 19:59 UTC
    The simple fix is to replace every comma in your regular expression with an escaped backslash, not just the first, i.e.:

    s/\\(?:"[^"\r\n]*"|[^\\\r\n]*)$//m;
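    To see the corrected expression in action, here is a minimal sketch that applies it to the sample lines from the question. The helper name strip_last_field and the in-memory @lines list are assumptions made for illustration; the original program reads the lines from test2.txt instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines copied from the question's backslash-separated input.
# Single-quoted strings keep each backslash literal.
my @lines = ('dog\cat\horse\bird', 'sheep\cow', 'fish\mouse\rat\tiger\lion');

# Hypothetical helper: remove the last backslash-separated field.
sub strip_last_field {
    my ($line) = @_;
    # \\ matches one literal backslash; [^\\\r\n]* matches a field
    # that contains no backslash, carriage return, or newline.
    $line =~ s/\\(?:"[^"\r\n]*"|[^\\\r\n]*)$//m;
    return $line;
}

print strip_last_field($_), "\n" for @lines;
```

    Note that both the leading separator and the negated character class had to change; leaving a comma inside the class is what kept the earlier attempt from matching.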

    I would also point out that $_ =~ is unnecessary since s/// implicitly binds to the magic variable $_ when you don't provide an explicit binding.
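    As a small illustration of that point, the sketch below runs the same substitution twice, once with the explicit binding and once without, and compares the results. The array names @explicit and @implicit are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines    = ('dog\cat\horse\bird', 'sheep\cow');
my @explicit = @lines;
my @implicit = @lines;

# Explicit binding: the substitution is told to operate on $_.
$_ =~ s/\\(?:"[^"\r\n]*"|[^\\\r\n]*)$//m for @explicit;

# Implicit binding: with no =~, s/// operates on $_ anyway.
s/\\(?:"[^"\r\n]*"|[^\\\r\n]*)$//m for @implicit;

print "explicit: @explicit\n";
print "implicit: @implicit\n";
```

    Both loops modify the array elements in place, because the statement modifier for aliases $_ to each element in turn.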

    And you should probably be using Text::CSV with the pure-Perl implementation Text::CSV_PP if you don't want to deal with XS on Windows. Barring that, split is usually a better choice than a hand-written regex.
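    For the split route, one possible sketch is shown below. The helper name drop_last_field is hypothetical, and unlike the regex this version makes no attempt to handle double-quoted fields, so it assumes the data never contains a backslash inside quotes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: drop the last backslash-separated field using split.
sub drop_last_field {
    my ($line) = @_;
    # A limit of -1 keeps trailing empty fields, so "a\b\" splits into 3 parts.
    my @fields = split /\\/, $line, -1;
    # Leave lines without any separator untouched.
    pop @fields if @fields > 1;
    return join '\\', @fields;
}

print drop_last_field($_), "\n"
    for ('dog\cat\horse\bird', 'sheep\cow', 'fish\mouse\rat\tiger\lion');
```

    Working on a list of fields rather than on the raw string also makes it easy to drop or keep a different field later without rewriting the pattern.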

      Thank you sooooooo much kennethk!!!!

      I will look at Text::CSV_PP. And I will do my best to study harder REGEX!!!