robs has asked for the wisdom of the Perl Monks concerning the following question:

Monks, I have never used Perl before, but I have a wee bit of experience with awk. My problem is that I have a series of large data matrices, stored as comma-delimited ASCII files. They are 18 columns by several thousand lines. I'm trying to pull out three of the 18 columns and do some reformatting to get the raw data into a usable format for Arc/Info (a GIS program), and I need some advice on the best way to do this. A snippet of the raw file looks like this:
    lon lat segno2
    "-69.87" "40.07" "952026060"
    "-69.85" "40.07" "952026060"
    "-69.84" "40.07" "952026060"
    "-65.64" "41.13" "952026064"
    "-65.66" "41.15" "952026064"
    "-65.60" "41.32" "952026064"
    "-65.61" "41.33" "952026064"
    "-65.57" "41.15" "952026066"
    "-65.57" "41.13" "952026066"
    "-65.57" "41.12" "952026066"
    "-65.56" "41.11" "952026066"
    "-65.56" "41.10" "952026066"
    "-65.54" "40.97" "952026066"
    "-65.54" "40.96" "952026066"
    "-65.53" "40.95" "952026066"
    "-65.53" "40.94" "952026066"
    "-65.47" "41.16" "952026067"
    "-65.46" "41.15" "952026067"
And I want it to look like this, preferably without the double quotes:
    "952026060"
    "-69.87", "40.07"
    "-69.85", "40.07"
    "-69.84", "40.07"
    end
    "952026064"
    "-65.64", "41.13"
    "-65.66", "41.15"
    "-65.60", "41.32"
    "-65.61", "41.33"
    end
    "952026066"
    "-65.57", "41.15"
    "-65.57", "41.13"
    "-65.57", "41.12"
    "-65.56", "41.11"
    "-65.56", "41.10"
    "-65.54", "40.97"
    "-65.54", "40.96"
    "-65.53", "40.95"
    "-65.53", "40.94"
    end
    "952026067"
    "-65.47", "41.16"
    "-65.46", "41.15"
    end
    end
Any help or advice on how to proceed would be much appreciated.
Rob

Re: formatting help
by infoninja (Friar) on Jun 30, 2000 at 19:52 UTC
    There are three ways that immediately come to my mind:
    • DBD::CSV (used through DBI) - if you need SQL or the like for any reason (it doesn't look like you will for what you're describing, but just in case...)
    • Text::CSV
    • With a regexp - if the fields don't contain commas within the values, you can split on /,/. If values can contain commas, the regexp will get a little hairier...
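If the values do contain commas inside the quotes, a plain split on /,/ will cut them apart. Text::CSV is the robust choice for that; as a sketch that needs only core Perl, the bundled Text::ParseWords module's parse_line can split on commas while honouring double quotes (a swapped-in alternative, not one of the three options listed above). The record below is made up for illustration. Note that parse_line returns an empty list for lines it can't parse, such as ones with unbalanced quotes.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Split one comma-delimited record into fields, respecting double quotes.
# With $keep = 0, parse_line strips the quotes from each field.
sub split_record {
    my ($line) = @_;
    chomp $line;    # otherwise the newline would stick to the last field
    return parse_line(',', 0, $line);
}

# Hypothetical record whose third field contains an embedded comma.
my @fields = split_record(qq{"-69.87","40.07","a,b"\n});
print join('|', @fields), "\n";    # prints -69.87|40.07|a,b
```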
Re: formatting help
by i43s (Novice) on Jun 30, 2000 at 20:02 UTC
    my $data = {};
    while (<DATA>) {
        chomp;
        my $list = [ split ];                   # split the line on whitespace
        next unless @{$list};                   # skip blank lines
        tr/"//d foreach @{$list};               # strip the double quotes from every field
        # the last field is the key; what is left is the lon/lat pair
        push @{ $data->{ pop @{$list} } }, [ @{$list} ];
    }
    foreach ( sort keys %{$data} ) {
        print "$_\n";
        foreach ( @{ $data->{$_} } ) {
            print join ', ', @{$_};
            print "\n";
        }
        print "end\n";
    }
    print "end\n";                              # final "end" closes the whole file
    __END__
    "-69.87" "40.07" "952026060"
    "-69.85" "40.07" "952026060"
    "-69.84" "40.07" "952026060"
    "-65.64" "41.13" "952026064"
    "-65.66" "41.15" "952026064"
    "-65.60" "41.32" "952026064"
    "-65.61" "41.33" "952026064"
    "-65.57" "41.15" "952026066"
    "-65.57" "41.13" "952026066"
    "-65.57" "41.12" "952026066"
    "-65.56" "41.11" "952026066"
    "-65.56" "41.10" "952026066"
    "-65.54" "40.97" "952026066"
    "-65.54" "40.96" "952026066"
    "-65.53" "40.95" "952026066"
    "-65.53" "40.94" "952026066"
    "-65.47" "41.16" "952026067"
    "-65.46" "41.15" "952026067"
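For anyone new to Perl, here is what the loop body above does to a single record, spelled out one step at a time. It is a sketch using one line from the sample data; the variable names are illustrative.

```perl
use strict;
use warnings;

# One record from the sample, processed the way the loop body above does it.
my $line   = qq{"-69.87" "40.07" "952026060"\n};
my @fields = split ' ', $line;   # whitespace split: ('"-69.87"', '"40.07"', '"952026060"')
tr/"//d for @fields;             # foreach aliases $_ to each element, so tr edits the array in place
my $key = pop @fields;           # removes and returns the last field (the segment number)
print "$key => ", join(', ', @fields), "\n";   # prints 952026060 => -69.87, 40.07
```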
Re: formatting help
by davorg (Chancellor) on Jun 30, 2000 at 20:07 UTC

    I've made a couple of assumptions in my answer.

    1. Your data has no embedded commas (if it does, you'll need to look at Text::CSV_XS).
    2. You are using columns 0, 1 and 2 from the data. You can adjust this in the line with the call to split.

    Here's the code:

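    open(my $dat, '<', 'file.dat') or die "Can't open file.dat: $!\n";
    my %data;
    while (<$dat>) {
        chomp;
        tr/"//d;    # drop the double quotes around each field
        # the grouping key is column 2; the co-ordinates are columns 0 and 1
        my ($key, @coords) = (split(/,/))[2, 0, 1];
        push @{$data{$key}}, \@coords;
    }
    close($dat);
    foreach (sort keys %data) {    # sort so the groups come out in a predictable order
        print "$_\n";
        local $" = ', ';           # separator used when an array is interpolated into a string
        foreach my $coord (@{$data{$_}}) {
            print "@$coord\n";
        }
        print "end\n";
    }
    print "end\n";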

    Basically I'm building up a hash where the key is the grouping column and the value is an array of co-ordinates. Having read the whole file and built up the hash, I then iterate over the hash and print out the values.
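Perl hashes don't remember insertion order, so iterating with plain `keys` gives the groups in an arbitrary order. Sorting the keys works for this data because the segment numbers are all nine digits long, but if you'd rather have the groups in the order they first appear in the file, a common idiom is to record each new key in a separate array. A sketch, assuming the whitespace-separated, quoted layout of the posted snippet; `group_in_order` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Group lon/lat pairs by segment number, keeping segments in first-seen order.
# Assumes each line holds: "lon" "lat" "segno" (whitespace-separated, quoted).
sub group_in_order {
    my @lines = @_;
    my (%coords, @order);
    for my $line (@lines) {
        (my $clean = $line) =~ tr/"//d;    # copy the line and strip quotes
        my ($lon, $lat, $seg) = split ' ', $clean;
        # skip blank lines and the "lon lat segno2" header
        next unless defined $seg && $seg =~ /^\d+$/;
        push @order, $seg unless exists $coords{$seg};   # remember new keys
        push @{ $coords{$seg} }, "$lon, $lat";
    }
    my $out = '';
    for my $seg (@order) {
        $out .= join("\n", $seg, @{ $coords{$seg} }, 'end') . "\n";
    }
    return $out . "end\n";    # final "end" closes the file
}

print group_in_order(
    '"-65.47" "41.16" "952026067"',
    '"-69.87" "40.07" "952026060"',
    '"-65.46" "41.15" "952026067"',
);
```

Here 952026067 is printed before 952026060 because it appears first in the input, even though a sort would put it second.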

RE: formatting help
by Shendal (Hermit) on Jun 30, 2000 at 20:04 UTC
    Here's my solution. It removes any double quotes as well.
    use strict;
    use warnings;
    my ($file) = 'foo.txt';
    my ($cur_segno) = 0;
    open(FILE, $file) or die "Can't open $file: $!\n";
    # This prints each group as it reads, so all rows with the same
    # segno must be next to each other in the file.
    while (<FILE>) {
        next unless s/\"//g;    # this will skip the first line and remove any "
        my (@res) = split;
        if ($res[2] == $cur_segno) {
            print "$res[0], $res[1]\n";
        } else {
            print "end\n" if ($cur_segno);
            print "$res[2]\n";
            print "$res[0], $res[1]\n";
            $cur_segno = $res[2];
        }
    }
    print "end\n";
    close(FILE);
    Hope that helps.
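Shendal's loop prints as it reads, so it needs very little memory, but it only works if all the rows for a segment are contiguous (as they are in the sample). Below is a sketch of the same streaming idea that reads from any filehandle and accepts either commas or whitespace between fields. Accepting both is an assumption, since the posted snippet shows spaces while the question says comma-delimited. `stream_segments` is a hypothetical name, and an in-memory filehandle stands in for the data file.

```perl
use strict;
use warnings;

# Stream records from $fh, starting a new segment whenever the segment
# number changes. Assumes rows for each segment are contiguous.
sub stream_segments {
    my ($fh) = @_;
    my $out = '';
    my $current;                                   # segment currently being printed
    while (my $line = <$fh>) {
        $line =~ tr/"//d;                          # drop double quotes
        my ($lon, $lat, $seg) = grep { length } split /[\s,]+/, $line;
        next unless defined $seg && $seg =~ /^\d+$/;   # skip header and blank lines
        if (!defined $current || $seg ne $current) {
            $out .= "end\n" if defined $current;   # close the previous segment
            $out .= "$seg\n";
            $current = $seg;
        }
        $out .= "$lon, $lat\n";
    }
    return $out . "end\n";                         # final "end" closes the file
}

my $sample = <<'EOT';
lon,lat,segno2
"-69.87","40.07","952026060"
"-69.85","40.07","952026060"
"-65.64","41.13","952026064"
EOT
open(my $fh, '<', \$sample) or die "Can't open in-memory file: $!";
print stream_segments($fh);
```

The trade-off against the hash-based replies is memory versus ordering: streaming handles files of any size, while building a hash first tolerates rows that arrive out of order.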
RE: formatting help
by mrmick (Curate) on Jun 30, 2000 at 22:12 UTC
    Here's a home-brewed example of how to get your format. It's similar to others you have received and probably will receive.
    #!/usr/bin/perl -w
    use strict;
    my %segment = ();
    my $file = "source_data.txt";
    # Open the source file ...
    open (FILE, $file) || die "Can't open the source data file $file\n$!\n";
    while (<FILE>) {
        if (/,/) {                  # skip lines without commas (e.g. blank lines)
            chomp;
            s/\"//g;                # no more double quotes
            my ($segname, @rec) = (split(/,/))[2,0,1];
            push @{$segment{$segname}}, \@rec;
        }
    }
    close (FILE);
    foreach (sort keys %segment) {  # sort so the segments come out in order
        print "$_\n";
        # let's get the new comma + space in there and print
        foreach (@{$segment{$_}}) {
            print join(", ", @$_) . "\n";
        }
        print "end\n";
    }
    print "end\n";                  # final "end" closes the whole file
Re: formatting help
by Buckaroo Buddha (Scribe) on Jun 30, 2000 at 19:58 UTC
    Finally, a node that I can help with *grin*
    I'm assuming that there is no header row in your text files. We'll also assume you want to return columns 6, 12 and 18 (counting from 1, which means array indices 5, 11 and 17 in Perl).
    use strict;
    use warnings;
    my %ALLDATA;
    while (<>) {                      # read each file named on the command line, one line at a time
        chomp;                        # drop the newline so it doesn't end up in the last field
        my @temp = split(',', $_);    # split the line into an array on the commas
        # name the wanted fields for readability (columns 6, 12 and 18 = indices 5, 11, 17)
        my ($section, $value1, $value2) = @temp[5, 11, 17];
        s/"//g for $section, $value1, $value2;          # remove the double quotes
        push @{ $ALLDATA{$section} }, [$value1, $value2];   # add this pair to its section's list
    }
    foreach my $key (sort keys %ALLDATA) {    # go through each section (in order)
        print "$key\n";
        foreach my $pair (@{ $ALLDATA{$key} }) {
            print join(', ', @$pair), "\n";
        }
        print "end\n";
    }
    print "end\n";
    Save the code above to a file and run it from the command line as:

        perl progname.pl datafilename

    Email me if it doesn't work (achesser@nortelnetworks.com); I wrote that off the top of my head.
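Perl arrays start at index 0, so in an 18-column line the 6th, 12th and 18th columns are $temp[5], $temp[11] and $temp[17]; there is no $temp[18]. Below is a small sketch that takes column numbers counted from 1 and returns those fields with the quotes removed, using an array slice. `pick_columns` is a hypothetical helper, the 6/12/18 choice is just the example from the post above, and the sample line is made up.

```perl
use strict;
use warnings;

# Return the requested columns (numbered from 1, the way a person counts them)
# from one comma-delimited line, with the double quotes removed.
# Columns past the end of the line come back as undef.
sub pick_columns {
    my ($line, @cols) = @_;
    chomp $line;
    my @fields = split /,/, $line, -1;             # -1 keeps trailing empty fields
    my @picked = @fields[ map { $_ - 1 } @cols ];  # convert to 0-based indices
    for my $f (@picked) {
        $f =~ tr/"//d if defined $f;
    }
    return @picked;
}

# Hypothetical 18-column line: "c1","c2",...,"c18"
my $line = join(',', map { qq{"c$_"} } 1 .. 18) . "\n";
my ($section, $value1, $value2) = pick_columns($line, 6, 12, 18);
print "$section $value1 $value2\n";    # prints c6 c12 c18
```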
RE: formatting help
by robs (Initiate) on Jun 30, 2000 at 19:52 UTC
    A formatted version of my posting is at http://www.duke.edu/~rss6. Sorry for the illegible text in my original posting.