in reply to Comma separated list into a hash

Here is another possible approach, as an alternative to split, which is what most people here pointed you to. In fact it is the exact opposite: with split you specify what lies between the things you want to grab, whereas with my approach you specify the things you want to grab. A simple code example:
my @words = /\w+/g;
The parens are optional in this case; it behaves the same as this:
my @words = /(\w+)/g;

Now, as an exercise, how does it work? :)

P.S. The code above expects the string in $_. If you want to use a different variable, use this syntax:

my @words = $string =~ /(\w+)/g;
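
For anyone who wants to compare the two directly, here is a minimal sketch. The sample string and the variable names are made up for illustration; only the match and split idioms themselves come from the post above.

```perl
use strict;
use warnings;

# Hypothetical sample input, not from the original question.
my $string = 'apple,banana,cherry';

# split: describe the separator, i.e. what sits between the items.
my @by_split = split /,/, $string;

# Match in list context with /g: describe the items themselves.
# With no capture groups, each whole match is returned.
my @by_match = $string =~ /\w+/g;

print "@by_split\n";
print "@by_match\n";
```

On this input both lists hold the same three words. The two approaches only diverge once the data contains characters that \w does not match, such as spaces or hyphens inside a value.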

Re: Re: Comma separated list into a hash
by dragonchild (Archbishop) on Apr 26, 2004 at 12:26 UTC
    Your approach grabs words. The problem the OP is trying to solve is grabbing the stuff between commas. A better approach would have been to do something like:

    But that suffers from the same problems that split does when dealing with CSV data, specifically how to handle commas that belong in the data value. This is why regexen and split are poor choices for dealing with CSV data. Parsers are appropriate.
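
The snippet this reply referred to is not shown here. As an assumption about the kind of thing meant, a match on runs of non-comma characters might look like the sketch below; the sample data is invented, and the second input illustrates the embedded-comma problem.

```perl
use strict;
use warnings;

# Assumption: grab each run of characters that is not a comma.
my $plain  = 'red,light green,blue';
my @fields = $plain =~ /([^,]+)/g;
# @fields is ('red', 'light green', 'blue')

# The same pattern on data with a comma inside a quoted value.
my $quoted = 'Smith,"Portland, OR",42';
my @broken = $quoted =~ /([^,]+)/g;
# @broken is ('Smith', '"Portland', ' OR"', '42'): four pieces, not three

print scalar(@fields), " fields from the plain line\n";
print scalar(@broken), " pieces from the quoted line\n";
```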

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

      how to handle commas that belong in the data value

      If quotes are used to disambiguate, it's fairly easy to parse with a regex:

      Of course, that suffers from other problems, such as going all wonky with unbalanced quotes. But it's a fairly simple way to parse well-formed data, and I thought it might be helpful for some to look at.
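
The regex from this reply is not shown here. Purely as an illustration of the idea, and not as the poster's actual code, a pattern that treats a double-quoted string as one field could be sketched like this (sample line invented):

```perl
use strict;
use warnings;

my $line = 'Smith,"Portland, OR",42';

# Each field is either a double-quoted string or a run of non-comma characters.
# The quoted alternative is tried first so its commas are not treated as separators.
my @fields = $line =~ /("[^"]*"|[^,]+)/g;

# Strip the surrounding quotes from the quoted fields.
s/^"(.*)"$/$1/ for @fields;

print join('|', @fields), "\n";
```

With an unbalanced quote the first alternative never matches, so the pattern silently falls back to splitting on every comma, which is the "wonky" behaviour mentioned above.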

        Oooh, close. It is a good regex, but it suffers from the following issues:
        1. As you say, it won't handle malformed data. A major part of a parser's job is to detect data that doesn't conform to the specification. Parsing XML would be easier with a regex if you didn't have to handle error conditions ... *grins*
        2. If your element is surrounded by double-quotes, then an embedded " is encoded as "".
        3. You assume that the element will be surrounded by double-quotes, but single-quotes / apostrophes are also legal.
        4. Embedded newlines are also legal, but your regex won't handle them. (Text::CSV doesn't handle them, either, but Text::xSV does.)
        5. This is a nit, but you don't handle whitespace at the end of the line. A simple \s* would handle that.
        6. You don't handle whitespace between the closing double-quote and the comma.
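
To make points 2, 5 and 6 concrete, here is a rough sketch of a pattern that tolerates doubled quotes and stray whitespace. It is an illustration under my own assumptions (invented sample line, double-quotes only), and it still does nothing about points 1, 3 and 4, so it is no substitute for a real parser.

```perl
use strict;
use warnings;

# Hypothetical line with whitespace around fields and a doubled quote inside one.
my $line = 'Smith , "He said ""hi"", twice" ,42  ';

# Every field starts at the beginning of the line or right after a comma.
# A quoted field may contain "" pairs; \s* absorbs whitespace on both sides.
my @fields = $line =~ /(?:^|,)\s*("(?:[^"]|"")*"|[^,]*?)\s*(?=,|$)/g;

for my $field (@fields) {
    # Only fields that were quoted get unquoted and have "" turned back into ".
    if ($field =~ s/^"(.*)"$/$1/s) {
        $field =~ s/""/"/g;
    }
}

print join('|', @fields), "\n";
```

Anchoring each match on the start of the line or a leading comma, rather than on a trailing separator, avoids picking up a spurious empty match at the end of the string.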
