sea has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I need help with figuring out how to search through a comma delimited text file and yank out the right field THEN export that information to a new file. In the data example below, I would like to extract the domain names specifically. I've just cracked the perl book and I keep hitting my head on it so if you could give me a good example to run with I'd appreciate it. I figured that this sort of request would be pretty basic to start with? Any help is greatly appreciated!


Thanks,
Simone :)

"1495107228781574","SomeCompany, Inc.","8335788590132972","kerpow.org","Active","Jan 27 2000 9:02PM","May 1 2004 12:34PM","May 28 2005 2:02AM"
"1495107228781574","SomeCompany, Inc.","1802975834647670","squish.com","Active","Jul 15 2004 1:49PM","May 2 2004 11:05PM","May 19 2005 11:05PM"
"1495107228781574","SomeCompany, Inc.","4430459098697337","wanka.com","Active","Nov 9 2004 9:29AM","May 3 2004 2:14PM","May 16 2005 2:14PM"
"1495107228781574","SomeCompany, Inc.","2113292909703608","blah.com","Active","Jan 16 2003 10:48AM","May 4 2004 12:34PM","May 16 2005 5:48AM"

Replies are listed 'Best First'.
Re: search and delete comma delimited file
by Limbic~Region (Chancellor) on Dec 08, 2004 at 23:25 UTC
    Simone,
    Checking the CPAN is never a bad idea when trying to solve a new problem. Text::xSV is one of a handful of modules that make this easy.
    #!/usr/bin/perl use strict; use warnings; use Text::xSV; my $csv = Text::xSV->new( fh => *DATA ); $csv->bind_header; while ( $csv->get_row() ) { my $row = $csv->extract_hash(); print $row->{Domain}, "\n"; } __DATA__ "Column 1","Company Name","Column 2","Domain","Status","Start Date","E +xpiration Date","Renew Date" "1495107228781574","SomeCompany, Inc.","8335788590132972","kerpow.org" +,"Active", "Jan 27 2000 9:02PM","May 1 2004 12:34PM","May 28 2005 2:02AM" "1495107228781574","SomeCompany, Inc.","1802975834647670","squish.com" +,"Active", "Jul 15 2004 1:49PM","May 2 2004 11:05PM","May 19 2005 11:05PM"
    It should not be hard to take this as an example and use the documentation to get what you want, but I will leave that up to you.

    Cheers - L~R

Re: search and delete comma delimited file
by diotalevi (Canon) on Dec 08, 2004 at 23:32 UTC

    Try this on for size. It reads your file using the standard Text::CSV_XS module and prints just the domain field.

    use strict; use warnings; use constant { FIELD_Unknown_1 => 0, FIELD_CompanyName => 1, FIELD_Unknown_2 => 2, FIELD_Domain => 3, FIELD_Status => 4, FIELD_Unknown_3 => 5, FIELD_Unknown_4 => 6, FIELD_Unknown_5 => 7, }; my $parser = Text::CSV_XS->new(); while ( my $line = <> ) { $parser->parse( $line ); my @columns = $parser->fields; my @new_columns = @columns[ FIELD_Domain, ]; $parser->combine( @new_columns ); print $parser->string; }
Re: search and delete comma delimited file
by jZed (Prior) on Dec 08, 2004 at 23:26 UTC
    If you want to access the file as if it were a database, you can use DBD::CSV which will allow DBI/SQL access to multiple CSV files in a variety of formats, including the one you show (embedded commas in quote-delimited fields). If you aren't a SQL person, then Text::CSV_XS will parse the file for you.
Re: search and delete comma delimited file
by tall_man (Parson) on Dec 08, 2004 at 23:53 UTC
    In this case (grabbing one inner field), you could avoid the fancy modules:
    #!/usr/bin/perl use strict; use warnings; while (<DATA>) { my $domain = (split /","/,$_,5)[3]; print $domain,"\n"; } __DATA__ "1495107228781574","SomeCompany, Inc.","8335788590132972","kerpow.org" +,"Active","Jan 27 2000 9:02PM","May 1 2004 12:34PM","May 28 2005 2:02 +AM" "1495107228781574","SomeCompany, Inc.","1802975834647670","squish.com" +,"Active","Jul 15 2004 1:49PM","May 2 2004 11:05PM","May 19 2005 11:0 +5PM" "1495107228781574","SomeCompany, Inc.","4430459098697337","wanka.com", +"Active","Nov 9 2004 9:29AM","May 3 2004 2:14PM","May 16 2005 2:14PM" "1495107228781574","SomeCompany, Inc.","2113292909703608","blah.com"," +Active","Jan 16 2003 10:48AM","May 4 2004 12:34PM","May 16 2005 5:48A +M"
    Note: this one works even if some company names have commas and the others don't.
Re: search and delete comma delimited file
by McMahon (Chaplain) on Dec 08, 2004 at 23:27 UTC
    There are many ways to do this, but I really like DBD::CSV. The SQL interface is really really powerful for this sort of data wrangling.
Re: search and delete comma delimited file
by Popcorn Dave (Abbot) on Dec 08, 2004 at 23:52 UTC
    Another thing you might try - and this is only if your data holds as your examples - is something I did for a stock tracking program. The trick is though that your data is always in the same format. Just split it and grab what you need.

    In my case, I'm getting static stocks data from Yahoo Finance and I expect it to be in a certain order, so I just split it, and only take what I'm after. If your data is going to change, then this isn't going to do it for you. HTH

    #!/usr/bin/perl use strict; use warnings; my @results; my $relevant; while (<DATA>){ (undef, undef, undef, $relevant) = split ('","',$_); push(@results, $relevant); } print join("\n", @results); __DATA__ "1495107228781574","SomeCompany, Inc.","8335788590132972","kerpow.org" +,"Active","Jan 27 2000 9:02PM","May 1 2004 12:34PM","May 28 2005 2:02 +AM" "1495107228781574","SomeCompany, Inc.","1802975834647670","squish.com" +,"Active","Jul 15 2004 1:49PM","May 2 2004 11:05PM","May 19 2005 11:0 +5PM" "1495107228781574","SomeCompany, Inc.","4430459098697337","wanka.com", +"Active","Nov 9 2004 9:29AM","May 3 2004 2:14PM","May 16 2005 2:14PM" "1495107228781574","SomeCompany, Inc.","2113292909703608","blah.com"," +Active","Jan 16 2003 10:48AM","May 4 2004 12:34PM","May 16 2005 5:48A +M"

    Output:

    kerpow.org
    squish.com
    wanka.com
    blah.com

    Useless trivia: In the 2004 Las Vegas phone book there are approximately 28 pages of ads for massage, but almost 200 for lawyers.
      Popcorn Dave,

      Okay, this absolutely sits side by side with the sunrise, that I so was lucky
      to be able to witness this morning...

      I'm going to walk through the other responses as well so I can see the different
      ways to approach and visualize this data! THANK YOU! :)


      Oh yeah, I also wanted to give you a blurb of, 'this is was what I did to make
      this work on my end.'

      I took this line of code:
      while (<DATA>){

      and changed it to this:
      while (<>){

      so I could make the module read from STDIN and direct it to the new file:
      ./wicked.pl thefile.txt > thedataIneed.txt

      and like I said above...WOW!! What a beautiful sunrise!

      Simone~

      split ('","',$_);

      While this split works on the data set presented, it is definitely not a good general solution for CSV parsing. There are at least two kinds of valid CSV that will break it. In the first place, many CSV generators omit quotation marks around fields that don't contain embedded special characters so this is a valid three-field CSV record in braces: {foo,"a,b",7} in which the quote-comma-quote split would fail because there are not quotes around some of the field-separating commas. Also, if there are embedded quote marks in the data, those are usually escaped by doubling the quote marks so this is also a valid three-field CSV record: {7,"a"",""b",8}, in which the split would fail because it would find a non-field-separating quote-comma-quote pattern.

        No, you're right, I see that now, and that's a very good point. I was just trying to show something that I had used, with the caveat that it may or may not work for the OP's data. In my particular case, the data is just CSV so just a split on ',' with the undefs works.

        Useless trivia: In the 2004 Las Vegas phone book there are approximately 28 pages of ads for massage, but almost 200 for lawyers.