kepler has asked for the wisdom of the Perl Monks concerning the following question:

Hi again, I have some strings from which I wish to extract data; here's an example:
nc,71526435,0,"text text, text text,456 etc...",36.5,121.1,1.4,7.50, 8,"text2 text, text text,123 etc..."
The fields are separated by commas, as you can see; some numbers have spaces before them, others don't. Two fields are enclosed in double quotes. Of course all these values can change, but the numeric fields always hold a number or are absent. Any ideas? Kind regards, Kepler

Re: Extracting data from a string
by Eliya (Vicar) on Feb 21, 2011 at 09:11 UTC

    You haven't said what the semantics of the quotes are, and what you want to extract, but my suspicion is that Text::CSV would be the right tool for the job.
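A minimal sketch of the Text::CSV approach, assuming Text::CSV is installed from CPAN and that the sample line is representative. The `allow_whitespace` option drops the blanks around each comma (so ` 8` comes back as `8`), and the quoted fields keep their embedded commas. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN module, not part of core Perl

# Sample line from the question (PerlMonks line-wrap marker removed).
my $line = 'nc,71526435,0,"text text, text text,456 etc...",36.5,121.1,1.4,7.50, 8,"text2 text, text text,123 etc..."';

# binary => 1 permits non-ASCII bytes; allow_whitespace => 1 strips the
# spaces surrounding each separator, so " 8" is returned as "8".
my $csv = Text::CSV->new({ binary => 1, allow_whitespace => 1 })
    or die 'Cannot create Text::CSV object: ' . Text::CSV->error_diag;

$csv->parse($line)
    or die 'Parse failed: ' . $csv->error_diag;

my @fields = $csv->fields;
print "[$_]\n" for @fields;
```

Each field, including the quoted text with its commas, comes back as one element of `@fields`.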

Re: Extracting data from a string
by Ratazong (Monsignor) on Feb 21, 2011 at 10:19 UTC
    nc,71526435,0,"text text, text text,456 etc...",36.5,121.1,1.4,7.50, 8,"text2 text, text text,123 etc..."

    If you are sure of the format of your input, you can parse such data easily with a regex (see below, and also this node by planetscape). However, your code will likely break if anything in your data format changes, or if the data contains errors. Things that will make your regex complicated include different number formats, strings containing \", and many more. That's why Text::CSV has been recommended to you.

    But now to the regex approach:

    • ^ anchors the match at the beginning of the line
    • (..) grabs two characters (e.g. nc)
    • , matches a comma
    • \s*(\d*) grabs an integer (if one is present), possibly preceded by some blanks
    • "(.*?)" grabs text enclosed in double quotes
    • \s*(\d+\.\d+) grabs a decimal number (containing one dot), possibly preceded by some blanks
    Just combine them as you wish!
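One way of combining the pieces above, as a sketch that assumes every line has exactly the ten-field layout of the sample: a two-character code, two integers, a quoted string, four decimals, an integer and a second quoted string. A changed layout, an empty decimal field or an escaped quote inside a string will make the match fail, which is the fragility described above.

```perl
use strict;
use warnings;

my $line = 'nc,71526435,0,"text text, text text,456 etc...",36.5,121.1,1.4,7.50, 8,"text2 text, text text,123 etc..."';

# Built from the pieces listed above; the x modifier allows spacing and comments.
my $record_re = qr/
    ^(..)                             # two-character code
    ,\s*(\d*) ,\s*(\d*)               # two integers (possibly empty)
    ,"(.*?)"                          # first quoted string
    ,\s*(\d+\.\d+) ,\s*(\d+\.\d+)
    ,\s*(\d+\.\d+) ,\s*(\d+\.\d+)     # four decimal numbers
    ,\s*(\d*)                         # integer, possibly with leading blanks
    ,"(.*?)"$                         # second quoted string
/x;

# In list context the match returns the ten captures; an empty list means no match.
my @fields = $line =~ $record_re
    or die "Line does not match the expected layout\n";

print "[$_]\n" for @fields;
```

The non-greedy `(.*?)` stops at the first double quote that is followed by the rest of the pattern, so the commas inside the quoted text do not end the field.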

    HTH, Rata
Re: Extracting data from a string
by Khen1950fx (Canon) on Feb 21, 2011 at 13:41 UTC
    Using Text::CSV::Simple, you can extract all the data like this:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::CSV::Simple;
    use YAML;
    use YAML::Dumper;

    my $dumper = YAML::Dumper->new;
    $dumper->indent_width(1);

    my $filename = '/root/Desktop/strings.txt';
    my $parser   = Text::CSV::Simple->new;
    my @data     = $parser->read_file($filename);    # a list of array refs, one per line

    print "=========list of data=========", "\n";
    # Pass a reference: a bare @data would be flattened into the hash.
    print $dumper->dump( { dump => \@data } ), "\n";
    If you only want a couple of the fields, for example:
    use strict;
    use warnings;
    use Text::CSV::Simple;
    use YAML;
    use YAML::Dumper;

    my $dumper = YAML::Dumper->new;
    $dumper->indent_width(1);

    my $filename = '/root/Desktop/strings.txt';
    my $parser   = Text::CSV::Simple->new;
    $parser->want_fields(2, 5);    # zero-based: the third and sixth fields
    my @data = $parser->read_file($filename);

    print "=========wanted fields=======", "\n";
    # Pass a reference: a bare @data would be flattened into the hash.
    print $dumper->dump( { dump => \@data } ), "\n";
Re: Extracting data from a string
by tospo (Hermit) on Feb 21, 2011 at 10:23 UTC
    Have you tried any regex so far? What did you get out of it, and what did you expect to get? Can you describe the pattern in more detail? What exactly do you need to capture, and what defines a record in this format?
Re: Extracting data from a string
by viveksnv (Sexton) on Feb 21, 2011 at 09:19 UTC
    Hi

    Did you try split? I hope you did.

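    use strict;
    use warnings;

    my $var = 'nc,71526435,0,"text text, text text,456 etc...",36.5,121.1,1.4,7.50, 8,"text2 text, text text,123 etc..."';
    my @array = split /,/, $var;
    # Iterate with foreach: while (<@array>) would run glob() on the joined fields.
    foreach my $field (@array) {
        print "$field\n";
    }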

    You could also try the modules suggested above.

      Did you? Run your program.
        Hi, yes. It's a mess, because the text between the quotes also contains commas. Kind regards, Kepler
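Since the trouble is the commas inside the quoted text, one workaround that stays with split is to split only on commas followed by an even number of double quotes, i.e. commas that sit outside any quoted field. This is a common Perl idiom swapped in here, not something proposed in the thread, and the sketch assumes quotes are never escaped or doubled inside a field; Text::CSV remains the more robust choice.

```perl
use strict;
use warnings;

my $var = 'nc,71526435,0,"text text, text text,456 etc...",36.5,121.1,1.4,7.50, 8,"text2 text, text text,123 etc..."';

# Split on a comma only if the rest of the line holds an even number of
# double quotes, i.e. the comma is not inside a quoted field.
my @fields = split /,(?=(?:[^"]*"[^"]*")*[^"]*$)/, $var;

for (@fields) {
    s/^\s+|\s+$//g;     # trim blanks such as the one in " 8"
    s/^"(.*)"$/$1/s;    # remove the surrounding double quotes
}

print "[$_]\n" for @fields;
```

The lookahead rescans the rest of the line for every comma, which is fine for short records like these but slows down on very long lines.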