SixShot has asked for the wisdom of the Perl Monks concerning the following question:
Monks, I must first apologise: I am new to Perl and have been thrown headlong into it to write scripts for a new job. I hope you will be able to offer some help. I am trying desperately to gain experience quickly and have given myself 10 days to learn Perl, so I hope you will forgive this request for help; I am struggling to grasp everything about this language.
I have data in whitespace-delimited (\s+) format:
```
__________ __________ Header1 Header2 Header3 Header4 Header5 Header6
__________ __________ Header1 Header2 Header3 Header4 Header5 Header6
Time Date Header1 Header2 Header3 Header4 Header5 Header6
Days MM/DD/YYYY UNIT UNIT UNIT UNIT UNIT UNIT
Name AA-AA Name1 BAABAB
0.000000 1/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
31.000000 2/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
59.000000 3/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
90.000000 4/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
120.00000 5/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
151.00000 6/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
181.00000 7/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
212.00000 8/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
274.00000 9/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
305.00000 10/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
336.00000 11/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
367.00000 12/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
Name AB_AB Name1 ABABAB
0.000000 1/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
31.000000 2/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
59.000000 3/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
90.000000 4/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
120.00000 5/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
151.00000 6/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
181.00000 7/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
212.00000 8/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
274.00000 9/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
305.00000 10/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
336.00000 11/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
367.00000 12/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
Name AC_AC Name1 BBAABB
0.000000 1/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
31.000000 2/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
59.000000 3/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
90.000000 4/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
120.00000 5/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
151.00000 6/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
181.00000 7/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
212.00000 8/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
274.00000 9/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
305.00000 10/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
336.00000 11/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
367.00000 12/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
```
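As a rough sketch of how such lines might be told apart once the file is read line by line: the hypothetical classify_line() below assumes (this is not stated in the post) that block headers begin with the literal word "Name" or "Name1" and that data rows begin with the numeric Time value. Because the flattened sample doesn't show whether "Name" and "Name1" sit on one line or two, it accepts either.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one input line.
# Assumptions: block headers start with "Name" or "Name1", data rows start
# with the numeric Time value; separators, column titles and the units row
# are reported as 'other'.
sub classify_line {
    my ($line) = @_;
    # split ' ' is Perl's special whitespace split: it skips leading
    # whitespace and splits on runs of \s, like trimming then split /\s+/.
    my @f = split ' ', $line;
    return ('blank') unless @f;

    if ($f[0] eq 'Name' || $f[0] eq 'Name1') {
        # Collect "Name X" / "Name1 Y" pairs, whether on one line or two.
        my %hdr;
        while (@f >= 2) {
            my ($key, $value) = splice @f, 0, 2;
            $hdr{$key} = $value;
        }
        return ('block', \%hdr);
    }
    return ('data', @f) if $f[0] =~ /^\d+(?:\.\d+)?$/;
    return ('other', @f);
}

for my $line ('Name AA-AA Name1 BAABAB', '  0.000000 1/01/2007 0.00000 0.00000') {
    my ($type) = classify_line($line);
    print "$line => $type\n";
}
```

Returning the split fields along with the type means the caller never has to split the same line twice.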
The file varies: the Name and Name1 block headers come in different formats, the number of data blocks varies, and although I have cut each block down to one year here, the data in each block actually extends over 50 years. I have dealt with the file input and the headers (see the script below). I am now at the loop that extracts the blocks of data and assigns an integer index to the block headers 'Name' and 'Name1'... and I am stuck. I need the data in two CSV files. The first is a data file ($outfile):
```
Name_ID,Name1_ID,Header1,Header2,Header3,Header4,Header5,Header6,Date_EN
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,01/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,02/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,03/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,04/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,05/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,06/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,07/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,08/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,09/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,10/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,11/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,12/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,01/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,02/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,03/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,04/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,05/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,06/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,07/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,08/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,09/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,10/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,11/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,12/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,01/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,02/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,03/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,04/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,05/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,06/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,07/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,08/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,09/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,10/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,11/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,12/01/2007
```
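A minimal sketch of turning one whitespace-split data row into one line of that CSV. format_data_row() is a hypothetical helper; it assumes the Time column is dropped (the target CSV has no Time column) and that input dates are M/DD/YYYY and must be zero-padded to MM/DD/YYYY.

```perl
use strict;
use warnings;

# Hypothetical helper: build one output CSV line.
# Input: the two block IDs plus the whitespace-split fields of a data row,
# in file order: Time, Date (M/DD/YYYY), Header1 .. Header6.
# Output order: Name_ID, Name1_ID, Header1 .. Header6, Date_EN (MM/DD/YYYY).
sub format_data_row {
    my ($name_id, $name1_id, @fields) = @_;
    my (undef, $date, @values) = @fields;                 # Time is not written out
    my ($m, $d, $y) = split m{/}, $date;
    my $date_en = sprintf '%02d/%02d/%04d', $m, $d, $y;   # 1/01/2007 -> 01/01/2007
    return join ',', $name_id, $name1_id, @values, $date_en;
}

my $row = '0.000000 1/01/2007 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000';
print format_data_row(1, 1, split(' ', $row)), "\n";
```

Run as is, this prints the first data line of the target file. Reordering with a list assignment (`my (undef, $date, @values) = ...`) keeps the column shuffle in one readable line.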
The second is a names file ($outnamefile), which I want as a log of the integer index IDs assigned to the block headers:
```
Name,Name_ID,Name1,Name1_ID
AA-AA,1,BAABAB,1
AB-AB,2,ABABAB,2
AC-AC,3,BBAABB,3
```
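A hash is the usual Perl tool for handing out integer IDs: look the name up, and if it is not there yet, give it the next number. The sketch below keeps separate counters for Name and Name1 and builds the rows of the names log. id_for() is a hypothetical helper, and since the post doesn't say whether a name that reappears in a later block should reuse its first ID, that behaviour is an assumption here.

```perl
use strict;
use warnings;

# Hypothetical helper: return the integer ID for $name, assigning the next
# one (1, 2, 3, ...) the first time the name is seen.
#   $ids   : hashref name => ID
#   $order : arrayref of names in first-seen order
sub id_for {
    my ($ids, $order, $name) = @_;
    unless (exists $ids->{$name}) {
        push @$order, $name;
        $ids->{$name} = scalar @$order;   # a repeated name keeps its first ID
    }
    return $ids->{$name};
}

# Separate ID sequences for Name and Name1, as in the target names file.
my (%name_ids, @name_order, %name1_ids, @name1_order);
my @log = ('Name,Name_ID,Name1,Name1_ID');
for my $block (['AA-AA', 'BAABAB'], ['AB-AB', 'ABABAB'], ['AC-AC', 'BBAABB']) {
    my ($name, $name1) = @$block;
    my $nid  = id_for(\%name_ids,  \@name_order,  $name);
    my $n1id = id_for(\%name1_ids, \@name1_order, $name1);
    push @log, join ',', $name, $nid, $name1, $n1id;
}
print "$_\n" for @log;
```

The printed lines match the names file above; in the real loop, the push would happen once per block, when its Name/Name1 header is read.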
I am stuck on the logic of this loop in Perl and desperately need help from more experienced Perl Mongers. Please help; it would be much appreciated and would also help me become more familiar with Perl syntax.
Command-line arguments:
```
-n 254 -o "." -r "file" -f ".+" -g ".+"
```
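For reference, this is how the getopts() spec used in the script, 'o:r:p:t:n:f:g:d', reads those arguments: a letter followed by a colon takes a value, and -d is a plain on/off switch. The sketch sets @ARGV by hand to stand in for the shell.

```perl
use strict;
use warnings;
use Getopt::Std;

# Stand-in for the shell: the arguments from the post, quotes already removed.
@ARGV = ('-n', '254', '-o', '.', '-r', 'file', '-f', '.+', '-g', '.+');

# Same spec as the script: a letter followed by ':' takes a value, 'd' is a switch.
my %inputs;
getopts('o:r:p:t:n:f:g:d', \%inputs) or die "Invalid option(s)\n";

print "$_ => $inputs{$_}\n" for sort keys %inputs;
```

This prints one line each for f, g, n, o and r; t and d are absent. Because -t is not given, check_params() falls back to the last underscore-separated part of the file name (after removing a trailing .sss), so a hypothetical input named mydata_plan.sss would select the "plan" header map, while plain "file" gives an ssstype of "file", which is not a key of %hdrTypes and triggers the "Unable to match sss type" die.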
```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use Getopt::Std;
use IO::File;
use Time::HiRes qw(gettimeofday tv_interval);   # better than 1 second resolution

$| = 1;   # force output (autoflush STDOUT)
my $VERSION    = 2.00;
my $timerStart = gettimeofday();

# ----------------------------------------------------------------------
# Process inputs
# ----------------------------------------------------------------------
my %inputs = ();
my %params = ();
getopts('o:r:p:t:n:f:g:d', \%inputs);
get_params(\%inputs, \%params) || die "Error Getting Parameters\n";
check_params(\%params);

# ----------------------------------------------------------------------
# Find and check input files - extract sss type
# ----------------------------------------------------------------------
my $infile = $params{filename};

# ----------------------------------------------------------------------
my $outfile      = File::Spec->catfile($params{outfolder}, $params{rootname});
my $outnamefile  = File::Spec->catfile($params{outfolder}, $params{rootname});
my $outLineCount = $outfile . '_db_proc_linecount.txt';
$outfile     = $outfile . '_db_proc_data.csv';
$outnamefile = $outnamefile . '_db_proc_names.csv';

# ----------------------------------------------------------------------
print "Input SSS File: [$infile]\n";
print "Output SSS File: [$outfile]\n";
print "Output Names File: [$outnamefile]\n";

# ----------------------------------------------------------------------
my $hdrmap;
my %hdrTypes = (
    "fields"  => \&get_fields_area,
    #"flow"   => \&get_fields_flow,
    #"gather" => \&get_fields_gather,
    "regions" => \&get_fields_region,
    "plan"    => \&get_fields_plan,
);
die "Unable to match sss type.\n" unless defined($hdrTypes{$params{ssstype}});
$hdrmap = $hdrTypes{$params{ssstype}}->();
my $namemap = get_fields_names($params{ssstype});

# -----------------------------------------------
# Load and process headers
# -----------------------------------------------
open(I,    '<', $infile)       or die "Unable to open file $infile.\n";
open(O,    '>', $outfile)      or die "Unable to open file $outfile.\n";
open(ON,   '>', $outnamefile)  or die "Unable to open file $outnamefile.\n";
open(OLCT, '>', $outLineCount) or die "Unable to open file $outLineCount.\n";

#-----------------------------------------------------------------------
# Concatenate the three header rows column by column
#-----------------------------------------------------------------------
my @hdrs;
for (my $i = 0; $i < 3; $i++) {
    my $line = <I>;
    chomp($line);
    $line =~ s/^\s+//;
    $line =~ s/\s+$//;
    my @sp = split(/\s+/, $line);
    for (my $j = 0; $j < scalar(@sp); $j++) {
        $hdrs[$j] .= " " . $sp[$j];   # .= on an undef element does not warn
    }
}
print "[" . join(",", @hdrs) . "]\n";
#-----------------------------------------------------------------------
foreach my $hdr (@hdrs) { trim(\$hdr) }

# -----------------------------------------------
# Associate each header with its matching column index
# -----------------------------------------------
print "CHECKING HEADERS ------------\n";
foreach my $key (sort keys %{$hdrmap}) {
    my ($index) = grep { $hdrs[$_] eq $hdrmap->{$key}[0] } 0 .. $#hdrs;
    if (defined($index)) {
        $hdrmap->{$key}[1] = $index;
        print "HEADER MATCH: \t$key\t$hdrmap->{$key}[0]\t[$hdrmap->{$key}[1]]\n";
    }
    else {
        print "HEADER NOT FOUND: \t$key\t$hdrmap->{$key}[0]\n";
    }
}
print "-----------------------------------------------\n";
my @flist     = ();
my @fnamelist = ();
assign_names($hdrmap,  \@flist);
assign_names($namemap, \@fnamelist);
print "[" . join(",", @flist) . "]\n";
# -----------------------------------------------
print O join(",", @flist) . ",RUN_ID\n";
print ON join(",", @fnamelist) . ",RUN_ID\n";
exit();   # stops here for now: the units line below is never read

# -----------------------------------------------
# Map unit conversions if needed
# -----------------------------------------------
my $l_units = <I>;
chomp($l_units);
print "UNITS -- > [$l_units]\n";

sub get_params {
    # -------------------------------------------
    # Input hash from getopts and run parameter hash
    # -------------------------------------------
    my $inr = shift(@_);
    my $pr  = shift(@_);
    # -------------------------------------------
    # Set up default parameters
    # -------------------------------------------
    $pr->{'debug'}      = 0;       # extra output
    $pr->{'runid'}      = -1000;   # dummy ID
    $pr->{'filter'}     = ".*";
    $pr->{'datefilter'} = "";
    $pr->{'filename'}   = "NA";
    #$pr->{'runfolder'} = ".";
    $pr->{'outfolder'}  = ".";
    $pr->{'ssstype'}    = "";      # interpreted from filename
    # -------------------------------------------
    # Get values from inputs
    # -------------------------------------------
    foreach my $key (keys %{$inr}) {
        $pr->{'debug'}      = $inr->{$key} if $key eq 'd';
        $pr->{'runid'}      = $inr->{$key} if $key eq 'n';
        $pr->{'filter'}     = $inr->{$key} if $key eq 'f';
        $pr->{'datefilter'} = $inr->{$key} if $key eq 'g';
        $pr->{'filename'}   = $inr->{$key} if $key eq 'r';
        #$pr->{'runfolder'} = $inr->{$key} if $key eq 'p';
        $pr->{'outfolder'}  = $inr->{$key} if $key eq 'o';
        $pr->{'ssstype'}    = $inr->{$key} if $key eq 't';
    }
    return 1;
}

sub check_params {
    my $pr = shift;
    die "Require RunID if not in debug mode\n"
        if ($pr->{debug} == 0 and $pr->{runid} == -1000);
    (-e $pr->{filename})  or die "Unable to find input filename: $pr->{filename}\n";
    (-e $pr->{outfolder}) or die "Unable to find outfolder: $pr->{outfolder}\n";
    $pr->{filter} = ".*" if $pr->{filter} eq "";
    my ($volume, $dirs, $rootname) = File::Spec->splitpath($pr->{filename});
    $rootname =~ s/\.sss$//;
    my @sp = split(/_/, $rootname);
    $pr->{rootname} = $rootname;
    $pr->{ssstype}  = $sp[-1] if $pr->{ssstype} eq "";
    # -----------------------------------------------
    print "Run Parameters: \n";
    foreach my $key (keys %{$pr}) {
        print "$key -> [$pr->{$key}]\n";
    }
    # -----------------------------------------------
    return 1;
}

sub get_fields_plan {
    my $tblFields = shift;
    $tblFields->{'DATE_EN'} = ["__________ __________ Date", -1];
    $tblFields->{'HEADER1'} = ["Header1 Header1 Header1", -1];
    $tblFields->{'HEADER2'} = ["Header2 Header2 Header2", -1];
    $tblFields->{'HEADER3'} = ["Header3 Header3 Header3", -1];
    $tblFields->{'HEADER4'} = ["Header4 Header4 Header4", -1];
    $tblFields->{'HEADER5'} = ["Header5 Header5 Header5", -1];
    $tblFields->{'HEADER6'} = ["Header6 Header6 Header6", -1];
    return $tblFields;
}

sub assign_headers {
    my $hdrmap   = shift;
    my $headline = shift;
    my $sptagref = shift;   # default for tab separation
    $$sptagref = '\s*,\s*' if $$headline =~ m/,/;
    my @hdrs = split(/$$sptagref/, $$headline);
    foreach my $hdr (@hdrs) { trim(\$hdr) }
    # -----------------------------------------------
    # Associate each header with its matching column index
    # -----------------------------------------------
    print "CHECKING HEADERS ------------\n";
    foreach my $key (sort keys %{$hdrmap}) {
        my ($index) = grep { $hdrs[$_] eq $hdrmap->{$key}[0] } 0 .. $#hdrs;
        if (defined($index)) {
            $hdrmap->{$key}[1] = $index;
            print "HEADER MATCH: \t$key\t$hdrmap->{$key}[0]\t[$hdrmap->{$key}[1]]\n";
        }
        else {
            print "HEADER NOT FOUND: \t$key\t$hdrmap->{$key}[0]\n";
        }
    }
    print "-----------------------------------------------\n";
}

sub assign_names {
    my $hdrmap = shift;
    my $flist  = shift;
    @$flist = ();
    foreach my $key (sort keys %{$hdrmap}) {
        my $i = $hdrmap->{$key}[1];
        push(@{$flist}, $key) if ($i != -1);
    }
}

# --------------------------------------------------------------
# Remove whitespace from the start and end of a string (in place)
# --------------------------------------------------------------
sub trim {
    my $sref = shift;
    $$sref =~ s/^\s+//;
    $$sref =~ s/\s+$//;
}
```
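One way the missing loop could look, as a sketch rather than a tested solution for the real file: read the rest of the file line by line after the units row, treat Name/Name1 lines as the start of a new block (giving each new value the next integer ID), and write every numeric row using the column indices the script already stores in $hdrmap. extract_blocks() is a hypothetical helper; it assumes DATE_EN was matched, that data rows start with the numeric Time value, that a repeated name reuses its first ID, and it writes its own header rows (the target CSV's column order plus the script's RUN_ID column).

```perl
use strict;
use warnings;

# Hypothetical helper: copy every data block from $in to the two CSV handles.
#   $hdrmap : KEY => [header label, column index or -1], as the script builds it
#   $run_id : the -n value, appended to every row as a RUN_ID column
sub extract_blocks {
    my ($in, $out, $names, $hdrmap, $run_id) = @_;

    # Found value columns in sorted-key order (as assign_names() does);
    # DATE_EN is written last to match the target CSV.
    my @keys = grep { $_ ne 'DATE_EN' && $hdrmap->{$_}[1] != -1 } sort keys %$hdrmap;
    my $date_col = $hdrmap->{DATE_EN}[1];
    die "DATE_EN column not found\n" unless defined $date_col && $date_col >= 0;

    print $out join(',', 'Name_ID', 'Name1_ID', @keys, 'DATE_EN', 'RUN_ID'), "\n";
    print $names "Name,Name_ID,Name1,Name1_ID,RUN_ID\n";

    my (%ids, %last_id);   # $ids{Name}{'AA-AA'} = 1; $last_id{Name} = highest ID so far
    my %cur;               # header of the current block: (Name => ..., Name1 => ...)
    my $logged = 1;        # has the current block been written to the names log?

    while (my $line = <$in>) {
        my @f = split ' ', $line;   # whitespace split, leading blanks ignored
        next unless @f;

        if ($f[0] eq 'Name' || $f[0] eq 'Name1') {
            # Block header: accepts "Name X" and "Name1 Y" on one line or two.
            while (@f >= 2) {
                my ($k, $v) = splice @f, 0, 2;
                $cur{$k} = $v;
                $ids{$k}{$v} //= ++$last_id{$k};   # first sighting gets the next integer
            }
            $logged = 0;
            next;
        }

        # Data rows start with the numeric Time value; anything else is skipped.
        next unless $f[0] =~ /^\d+(?:\.\d+)?$/;

        my $nid  = $ids{Name}{  $cur{Name}  // '' } // 0;
        my $n1id = $ids{Name1}{ $cur{Name1} // '' } // 0;
        unless ($logged) {
            print $names join(',', $cur{Name} // '', $nid, $cur{Name1} // '', $n1id, $run_id), "\n";
            $logged = 1;
        }

        my ($m, $d, $y) = split m{/}, $f[$date_col];
        my $date = sprintf '%02d/%02d/%04d', $m, $d, $y;   # 1/01/2007 -> 01/01/2007
        print $out join(',', $nid, $n1id, (map { $f[ $hdrmap->{$_}[1] ] } @keys), $date, $run_id), "\n";
    }
    return 1;
}
```

In the script it would take the place of the two header `print` lines and `exit();`, called after `$l_units` has been read, e.g. `extract_blocks(\*I, \*O, \*ON, $hdrmap, $params{runid});`. Keeping the IDs in a hash of hashes (`$ids{Name}`, `$ids{Name1}`) gives the two independent numbering sequences the names file asks for.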
Re: Space delimted to CSV, Index and data extraction loop
by planetscape (Chancellor) on Jul 30, 2011 at 12:47 UTC

Re: Space delimted to CSV, Index and data extraction loop
by Khen1950fx (Canon) on Jul 30, 2011 at 10:27 UTC

Re: Space delimted to CSV, Index and data extraction loop
by Not_a_Number (Prior) on Jul 30, 2011 at 11:15 UTC

Re: Space delimted to CSV, Index and data extraction loop
by SixShot (Novice) on Jul 30, 2011 at 13:31 UTC