discoTab has asked for the wisdom of the Perl Monks concerning the following question:

I am splitting fields on a ~ and need to find any strings that begin and end with quotation marks (i.e. "disco"). My problem is that some of the quoted strings themselves contain a ~, which breaks my split: "disco~tab". My initial thought was to use a conditional in the regex to find any string that begins with, but does not end with, a '"' and load it into $a1, then find the next string that ends with a '"' and load it into $a2, and then concatenate the two strings. Now I find that there is no good way to do this with a regex due to the lack of an AND conditional, so I'm kind of at a loss... Any help would be greatly appreciated!
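To make the failure concrete, here is a minimal sketch of the problem. The sample record in $line is made up for illustration and is not from my real data:

```perl
use strict;
use warnings;

# Hypothetical sample record: the third field is quoted and contains a '~'.
my $line = 'foo~bar~"disco~tab"~quux';

# A plain split knows nothing about quotes, so it cuts the quoted field in two.
my @fields = split /~/, $line;

print scalar(@fields), " fields\n";   # 5 fields, although the record has 4
print "[$_]\n" for @fields;           # [foo] [bar] ["disco] [tab"] [quux]
```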

Re: RegEx Perl Newbie Question
by AppleFritter (Vicar) on Aug 18, 2014 at 21:07 UTC

    Howdy discoTab, welcome to the Monastery!

    This is a frequently asked question, so it's got its own answer in one of the FAQs, specifically perlfaq4: How can I split a [character]-delimited string except when inside [character]? (I'd link to it, but PerlMonks' link syntax does not play nicely with fragments that contain square brackets.) Your best bet is to use Text::ParseWords, a core module that comes with Perl.
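    A minimal sketch of what that might look like, assuming a sample record of my own invention (the string in $line is illustrative, not your data):

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical sample record with a '~' inside a quoted field.
my $line = 'foo~bar~"disco~tab"~quux';

# Arguments: delimiter pattern, keep flag, then the line(s) to parse.
# A false keep flag strips the quotation marks from quoted fields.
my @fields = quotewords('~', 0, $line);

print "[$_]\n" for @fields;   # [foo] [bar] [disco~tab] [quux]
```

    One thing to watch for: Text::ParseWords treats single quotes as quoting characters too, so a stray apostrophe in a field may trip it up. It is worth testing against a sample of your real input.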

        Neat, thanks! I'd tried the HTML entities the Monastery usually requires for literal square brackets (which did not work), but for some reason I'd not thought of using percent-encoding. Another trick learned! (And who says you can't teach an old pony new tricks?)
Re: RegEx Perl Newbie Question
by AnomalousMonk (Archbishop) on Aug 19, 2014 at 00:42 UTC

    And there's always Text::CSV_XS:

    c:\@Work\Perl>perl -wMstrict -le
    "use Text::CSV_XS;
     use Data::Dump;
     ;;
     my $s = 'foo~bar~\"disco~tab\"~quux';
     print qq{'$s'};
     ;;
     my $csv = Text::CSV_XS->new({ sep_char => '~' })
        or die qq{Text::CSV_XS->new failed: }, Text::CSV_XS->error_diag;
     ;;
     $csv->parse($s)
        or die qq{parse failed: }, Text::CSV_XS->error_diag;
     my @fields = $csv->fields;
     dd \@fields;
    "
    'foo~bar~"disco~tab"~quux'
    ["foo", "bar", "disco~tab", "quux"]

Re: RegEx Perl Newbie Question
by discoTab (Initiate) on Aug 19, 2014 at 15:06 UTC
    Thanks all!!! Text::ParseWords was exactly what I needed!

    My initial split:

    @ColumnValue = split(/~/ , $_);

    Fixed by:

    @ColumnValue = quotewords('~', 1, $_);

    Way more elegant than the convoluted IF regex concatenation path that I was going down!
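
    For anyone who finds this later, here is a rough sketch of how the pieces fit together. The sample record and the @quoted array are illustrative only and not from my actual script:

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical sample record standing in for a line of real input.
my $line = 'foo~bar~"disco~tab"~quux';

# A true keep flag leaves the quotation marks on quoted fields.
my @ColumnValue = quotewords('~', 1, $line);

# Pick out the fields that begin and end with a double quote.
my @quoted = grep { /^".*"$/s } @ColumnValue;

print "[$_]\n" for @ColumnValue;   # [foo] [bar] ["disco~tab"] [quux]
print "quoted: $_\n" for @quoted;  # quoted: "disco~tab"
```

    Passing 0 instead of 1 as the second argument would strip the quotes, which would make the quoted fields impossible to tell apart afterwards.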