jeiku has asked for the wisdom of the Perl Monks concerning the following question:

I have a file in the format:
"value 1","something else","other stuff"
"asdf123","omg","hope this works"
"more tests","testy","blah"
But my code doesn't work:
#!/usr/bin/perl
$file = "new.txt";
open(FILE, $file) or die "$!\n";
@data = <FILE>;
close(FILE);
foreach $entry (@data) {
    ($var1, $var2, $var3) = split(/\"\W\"\,/, $entry);
    print "Var1: $var1\n";
    print "Var2: $var2\n";
    print "Var3: $var3\n";
}
I can't seem to get the regular expression in the split statement to work. It needs to match special characters, words and numbers in between the "", delimiters.
Please help, thanks!

2006-03-30 Retitled by planetscape, as per Monastery guidelines
Original title: 'Regex help'

Re: Help with Regex in Split
by Corion (Patriarch) on Mar 29, 2006 at 08:01 UTC

    In general, you should be using one of the CSV parsing modules, like Text::xSV or DBD::AnyData. But if your data really is formatted as simply as you describe, the approach of matching what you want to keep, instead of splitting away what you want to discard, produces good results:

    #!/usr/bin/perl
    foreach $entry (<DATA>) {
        ($var1, $var2, $var3) = ($entry =~ /"([^"]+)"(?:,|$)/g);
        print "Var1: $var1\n";
        print "Var2: $var2\n";
        print "Var3: $var3\n";
    }
    __DATA__
    "value 1","something else","other stuff"
    "asdf123","omg","hope this works"
    "more tests","testy","blah"

    I find that whenever something is hard to split, I'm approaching the problem from the wrong end and should be matching what I want to keep, and vice versa.
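As a rough illustration of the difference, here is a minimal sketch run against a single sample line held in a string (an assumption, so that no input file is needed). The original pattern /\"\W\"\,/ asks for a quote, exactly one non-word character, another quote and then a comma. That sequence never occurs in this data, so split hands back the whole line as one field. Splitting on "," comes closer, but it leaves the opening and closing quotes stuck to the first and last fields. Matching the quoted values directly returns clean fields.

```perl
use strict;
use warnings;

# One line of the sample data, held in a string for the demonstration.
my $line = '"value 1","something else","other stuff"';

# The original pattern: quote, one non-word char, quote, comma.
# Nothing in the line matches, so split returns the whole line.
my @original = split /\"\W\"\,/, $line;

# Splitting on the "," separator handles the middle of the line,
# but the outer quotes stay attached to the first and last fields.
my @naive = split /","/, $line;

# Matching what we want to keep: the text inside each pair of quotes,
# followed by either a comma or the end of the line.
my @matched = $line =~ /"([^"]+)"(?:,|$)/g;

print scalar(@original), " field(s): $original[0]\n";
print join('|', @naive), "\n";
print join('|', @matched), "\n";
```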

Re: Help with Regex in Split
by GrandFather (Saint) on Mar 29, 2006 at 08:15 UTC

    Use a regex to capture what you want rather than trying to deal with the end effects of splitting. Note the use of a __DATA__ section in the sample code; it is a handy way to post self-contained, runnable code to PerlMonks.

    use strict;
    use warnings;

    while (<DATA>) {
        my ($var1, $var2, $var3) = /"([^"]*)"/g;
        print "Var1: $var1\n";
        print "Var2: $var2\n";
        print "Var3: $var3\n";
    }

    __DATA__
    "value 1","something else","other stuff"
    "asdf123","omg","hope this works"
    "more tests","testy","blah"

    Prints:

    Var1: value 1
    Var2: something else
    Var3: other stuff
    Var1: asdf123
    Var2: omg
    Var3: hope this works
    Var1: more tests
    Var2: testy
    Var3: blah
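The loop above reads from DATA, but nothing in it depends on that: it works the same on any filehandle. A minimal sketch, assuming the poster's data is available as a string, opens that string as an in-memory filehandle (supported by open since Perl 5.8) in place of new.txt and collects each line's values into an array of arrays. For the real file, open(my $fh, '<', 'new.txt') would take the place of the in-memory open.

```perl
use strict;
use warnings;

# Stand-in for the poster's new.txt: the sample data in a string.
my $text = <<'END';
"value 1","something else","other stuff"
"asdf123","omg","hope this works"
"more tests","testy","blah"
END

# Opening a reference to a scalar gives an in-memory filehandle.
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

my @records;
while (my $line = <$fh>) {
    # Same capture as above: everything between each pair of quotes.
    my @fields = $line =~ /"([^"]*)"/g;
    push @records, \@fields;
}
close($fh);

for my $rec (@records) {
    my ($var1, $var2, $var3) = @$rec;
    print "Var1: $var1\nVar2: $var2\nVar3: $var3\n";
}
```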

    DWIM is Perl's answer to Gödel
Re: Help with Regex in Split
by davorg (Chancellor) on Mar 29, 2006 at 08:06 UTC

    It's difficult to help when you don't tell us what the problem is. We only know that the code you've given us doesn't do what you want it to do, but you haven't told us what results you were expecting.

    But my telepathy is strong this early in the morning and I think I've worked out what you want. You want to extract the three string values from each line of the data. You're looking for a lightweight CSV parser. Is that right?

    If that's the case, then split might not be the best approach. I'd recommend using Text::ParseWords instead - it's a standard part of the Perl distribution.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::ParseWords;

    my @data = <DATA>;
    foreach my $entry (@data) {
        chomp $entry;    # otherwise the newline ends up in the last field
        my ($var1, $var2, $var3) = parse_line(',', 0, $entry);
        print "Var1: $var1\n";
        print "Var2: $var2\n";
        print "Var3: $var3\n";
    }
    __DATA__
    "value 1","something else","other stuff"
    "asdf123","omg","hope this works"
    "more tests","testy","blah"
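One advantage of Text::ParseWords over a hand-rolled regex is that a delimiter inside quotes is not treated as a separator. A minimal sketch with a hypothetical line whose middle value contains a comma: parse_line(',', 0, ...) still returns three fields, with the surrounding quotes removed because the second argument (keep) is false.

```perl
use strict;
use warnings;
use Text::ParseWords;   # core module; exports parse_line by default

# A hypothetical line whose second value contains a comma inside the quotes.
my $entry = qq{"value 1","hope, this works","blah"\n};

# Remove the trailing newline first so it does not end up in the last field.
chomp $entry;

# Delimiter ',' and keep = 0, so the surrounding quotes are stripped.
my @fields = parse_line(',', 0, $entry);

print scalar(@fields), " fields\n";
print "$_\n" for @fields;
```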
    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Re: Help with Regex in Split
by davido (Cardinal) on Mar 29, 2006 at 08:10 UTC

    Are you looking for this?

    my( @stuff ) = $entry =~ m/"([^",]+)"/g;

    ...instead of using split?

    Important note: if your individual values can contain commas or " quote characters, this isn't the right solution for you. In that case you should probably have a look at Text::CSV or Text::Balanced. Parsing comma-separated data in which quoted values may contain embedded quotes or commas gets really tricky really fast if you try to do it with regular expressions alone. The modules are designed to help in such situations.
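To see the problem this note describes, here is a minimal sketch applying the regex above to a hypothetical line whose middle value contains a comma. The character class [^",] cannot step over that comma, so the middle value is silently skipped and only two values come back instead of three.

```perl
use strict;
use warnings;

# A hypothetical line whose middle value contains an embedded comma.
my $entry = '"value 1","hope, this works","blah"';

# [^",]+ stops at the comma inside the quotes, so "hope, this works"
# never matches as a whole and the match moves on to "blah".
my @stuff = $entry =~ m/"([^",]+)"/g;

print scalar(@stuff), " values: @stuff\n";
```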


    Dave

Re: Help with Regex in Split
by jeiku (Acolyte) on Mar 29, 2006 at 08:22 UTC
    Wow, thank you for the information. I went with Text::ParseWords, as it's exactly what I was looking for. I didn't realise it was that easy. Thank you again, everyone.
Re: Help with Regex in Split
by johngg (Canon) on Mar 29, 2006 at 09:49 UTC
    Briefer but possibly less readable.

    #!/usr/local/bin/perl
    #
    use strict;
    use warnings;

    print
        map {"Var1: $_->[0]\nVar2: $_->[1]\nVar3: $_->[2]\n"}
        map {[/"([^"]+)",?/g]}
        <DATA>;

    __DATA__
    "value 1","something else","other stuff"
    "asdf123","omg","hope this works"
    "more tests","testy","blah"

    Produces

    Var1: value 1
    Var2: something else
    Var3: other stuff
    Var1: asdf123
    Var2: omg
    Var3: hope this works
    Var1: more tests
    Var2: testy
    Var3: blah
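Working backwards from <DATA>, the chain first turns each line into an array reference of its quoted values, then turns each array reference into a formatted block of text. A sketch of the same pipeline split into named steps, with the sample lines held in an array (an assumption standing in for <DATA>) so the intermediate structure can be inspected:

```perl
use strict;
use warnings;

# Stand-in for <DATA>: the sample lines as a list of strings.
my @lines = (
    qq{"value 1","something else","other stuff"\n},
    qq{"asdf123","omg","hope this works"\n},
    qq{"more tests","testy","blah"\n},
);

# Inner map: each line becomes a reference to an array of its quoted values.
# The ,? lets each match also consume the separating comma, if there is one.
# A named array is used here instead of [ ... ] purely for readability.
my @rows = map { my @f = /"([^"]+)",?/g; \@f } @lines;

# Outer map: each array reference becomes one formatted block of text.
my @blocks = map { "Var1: $_->[0]\nVar2: $_->[1]\nVar3: $_->[2]\n" } @rows;

print @blocks;
```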

    Cheers,

    JohnGG