Space delimited to CSV, Index and data extraction loop

SixShot has asked for the wisdom of the Perl Monks concerning the following question:

Monks, I must first apologise: I am new to Perl and have been thrown headlong into it to write scripts for a new job. I hope you will be able to offer some help. I am trying to gain experience rapidly and have set aside a 10-day period to learn Perl. I hope you will forgive my request for help, but I am struggling to grasp everything about this language.

I have data in whitespace-delimited (\s+) format:

__________ __________  Header1     Header2     Header3     Header4     Header5     Header6
__________ __________  Header1     Header2     Header3     Header4     Header5     Header6
Time       Date        Header1     Header2     Header3     Header4     Header5     Header6
Days       MM/DD/YYYY  UNIT        UNIT        UNIT        UNIT        UNIT        UNIT


    Name AA-AA  Name1 BAABAB

0.000000    1/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
31.000000   2/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
59.000000   3/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
90.000000   4/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
120.00000   5/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
151.00000   6/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
181.00000   7/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
212.00000   8/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
274.00000   9/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
305.00000  10/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
336.00000  11/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
367.00000  12/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000

    Name AB_AB  Name1 ABABAB

0.000000    1/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
31.000000   2/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
59.000000   3/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
90.000000   4/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
120.00000   5/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
151.00000   6/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
181.00000   7/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
212.00000   8/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
274.00000   9/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
305.00000  10/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
336.00000  11/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
367.00000  12/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000

    Name AC_AC  Name1 BBAABB

0.000000    1/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
31.000000   2/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
59.000000   3/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
90.000000   4/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
120.00000   5/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
151.00000   6/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
181.00000   7/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
212.00000   8/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
274.00000   9/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
305.00000  10/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
336.00000  11/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000
367.00000  12/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000

This file varies: the Name and Name1 block headers come in varying formats, and the number of data blocks varies. I've limited the data blocks here to one year, but the data extends for 50 years in each block. I have dealt with file input and with the headers (see below). I am now at the loop that extracts the blocks of data and applies an integer index to the block headers 'Name' and 'Name1'... and I am stuck. I need the data to be in two CSV files: a data file ($outfile)

Name_ID,Name1_ID,Header1,Header2,Header3,Header4,Header5,Header6,Date_EN
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,01/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,02/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,03/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,04/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,05/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,06/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,07/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,08/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,09/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,10/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,11/01/2007
1,1,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,12/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,01/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,02/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,03/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,04/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,05/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,06/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,07/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,08/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,09/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,10/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,11/01/2007
2,2,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,12/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,01/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,02/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,03/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,04/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,05/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,06/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,07/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,08/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,09/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,10/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,11/01/2007
3,3,0.00000,0.00000,0.00000,0.00000,0.00000,0.00000,12/01/2007
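The data rows above show the dates zero-padded (01/01/2007), whereas the input has 1/01/2007. A minimal sketch of one way to get from an input row to an output row, assuming the first column (Days) is simply dropped; the helper names pad_date and row_to_csv are made up for illustration and are not part of the posted script:

```perl
use strict;
use warnings;

# Hypothetical helper: pad an M/DD/YYYY date to MM/DD/YYYY.
sub pad_date {
    my ($date) = @_;
    my ( $m, $d, $y ) = split m{/}, $date;
    return sprintf '%02d/%02d/%04d', $m, $d, $y;
}

# Hypothetical helper: turn one whitespace-delimited data row into a CSV row.
# Assumes the layout "Days Date value value ..." from the sample data.
sub row_to_csv {
    my ( $name_id, $name1_id, $row ) = @_;
    my ( $days, $date, @values ) = split ' ', $row;    # $days is not written out
    return join ',', $name_id, $name1_id, @values, pad_date($date);
}

print row_to_csv( 1, 1,
    '31.000000   2/01/2007  0.00000     0.00000     0.00000     0.00000     0.00000     0.00000'
), "\n";
```

split ' ' (a single-space string rather than a regex) splits on runs of whitespace and ignores leading whitespace, so the row does not need trimming first.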

And a names file ($outnamefile), which I want as a log of the integer index IDs applied to the block headers:

Name,Name_ID,Name1,Name1_ID
AA-AA,1,BAABAB,1
AB_AB,2,ABABAB,2
AC_AC,3,BBAABB,3

I am stuck on the logic of this loop in Perl and desperately need help from some more experienced Perl mongers. Please help; it would be most appreciated and would also help me become more familiar with Perl syntax.

Command-line arguments: -n 254 -o "." -r "file" -f ".+" -g ".+"
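For readers unfamiliar with Getopt::Std, here is a small sketch of how that command line ends up in a hash. The option string is the one used in the script below; a letter followed by a colon takes a value, while a bare letter (d) is a boolean flag. The wrapper parse_args is hypothetical and exists only so the parsing can be tried without a real command line:

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical wrapper around getopts(); not part of the posted script.
sub parse_args {
    my @args = @_;
    local @ARGV = @args;    # getopts() reads its input from @ARGV
    my %opt;
    getopts( 'o:r:p:t:n:f:g:d', \%opt ) or die "Bad options\n";
    return \%opt;
}

my $opt = parse_args( '-n', 254, '-o', '.', '-r', 'file', '-f', '.+', '-g', '.+' );
print "$_ => $opt->{$_}\n" for sort keys %$opt;
```

Only the switches actually given appear as keys, so -p, -t and -d are absent from the hash for this command line.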



#!/usr/bin/perl
use File::Spec;
use Getopt::Std;
#use strict;
use IO::File;
use Switch;
use Time::HiRes qw(gettimeofday tv_interval); #get better than 1 second resolution
$| = 1;        #Force output
my $VERSION = 2.00;
my $timerStart = gettimeofday();
# ----------------------------------------------------------------------
# Process Inputs
# ----------------------------------------------------------------------
my %inputs = ();
my %params = ();
getopts('o:r:p:t:n:f:g:d', \%inputs);
&get_params(\%inputs, \%params) || die "Error Getting Parameters\n";
&check_params(\%params);

# ----------------------------------------------------------------------
# Find and check input files - extract sss type
# ----------------------------------------------------------------------
my $infile = $params{filename};
# ----------------------------------------------------------------------
my $outfile = File::Spec->catfile($params{outfolder},$params{rootname});
my $outnamefile = File::Spec->catfile( $params{outfolder}, $params{rootname} );
my $outLineCount = $outfile . '_db_proc_linecount.txt';
$outfile = $outfile . '_db_proc_data.csv';
$outnamefile = $outnamefile .  "_db_proc_names.csv";


# ----------------------------------------------------------------------
print "Input SSS File:  [$infile]\n";
print "Output SSS File: [$outfile]\n";
print "Output Names File: [$outnamefile]\n";
# ----------------------------------------------------------------------


my $hdrmap;
my %hdrTypes = (
    "fields"=>\&get_fields_area, 
    #"flow"=>\&get_fields_flow,
    #"gather"=>\&get_fields_gather,
    "regions"=>\&get_fields_region,
    "plan"=>\&get_fields_plan);
die "Unable to match sss type. " unless defined($hdrTypes{$params{ssstype}});
$hdrmap = $hdrTypes{$params{ssstype}}->();
my $namemap = &get_fields_names($params{ssstype});

# -----------------------------------------------
# Load and Process Headers
# -----------------------------------------------
open (I,"<$infile") or die "Unable to open file $infile.\n";
open (O,">$outfile") or die "Unable to open file $outfile.\n";
open (ON,">$outnamefile") or die "Unable to open file $outnamefile.\n";
open (OLCT, ">$outLineCount") or die "Unable to open file $outLineCount.\n";
# -----------------------------------------------

#-----------------------------------------------------------------------
# Concatenate headers
#-----------------------------------------------------------------------

for ($i=0; $i<3; $i++)
{
    $line = <I>;
    chomp ($line);
    $line =~ s/^\s+//;
    $line =~ s/\s+$//;
    @sp = split (/\s+/,$line);
    
    for ($j=0; $j<scalar(@sp); $j++)
    {
        $hdrs[$j] = $hdrs[$j]." ".$sp[$j];    
    }

}

print "[" . join(",", @hdrs) . "]\n";

#-----------------------------------------------------------------------
foreach my $hdr (@hdrs){trim(\$hdr)}
# -----------------------------------------------
# Associate Header with matching index
# -----------------------------------------------
print "CHECKING HEADERS ------------\n";
foreach my $key (sort keys %{$hdrmap})
{
    my ( $index )= grep { $hdrs[$_] eq $hdrmap->{$key}[0] } 0..$#hdrs;
    if (defined ($index)) 
    { 
        $hdrmap->{$key}[1] = $index; 
        print "HEADER MATCH: " . "\t" . $key . "\t" . $hdrmap->{$key}[0] . "\t[" . $hdrmap->{$key}[1]. "]\n";
    }
    else {
        print "HEADER NOT FOUND: " . "\t" . $key . "\t" . $hdrmap->{$key}[0] . "\n";
    }
}
print "-----------------------------------------------\n";



my @flist = ();
my @fnamelist = ();
&assign_names($hdrmap, \@flist);
&assign_names($namemap, \@fnamelist);
print "[" . join(",", @flist) . "]\n";

# -----------------------------------------------
print O join(",", @flist) . ",RUN_ID\n";
print ON join(",", @fnamelist) . ",RUN_ID\n";

exit();
# -----------------------------------------------
# Map Unit Conversions if needed
# -----------------------------------------------
my $l_units = <I>;
chomp ($l_units);
print "UNITS -- > [$l_units]\n";    





sub get_params($$)
{
    # -------------------------------------------
    # Input Hash from getopts and Run parameter hash
    # -------------------------------------------
    my $inr = shift(@_);
    my $pr = shift(@_);
    # -------------------------------------------
    # Setup default parameters
    # -------------------------------------------
    $pr->{'debug'} = 0;        #Extra output
    $pr->{'runid'} = -1000;    #Dummy ID
    $pr->{'filter'} = ".*";
    $pr->{'datefilter'} = "";
    $pr->{'filename'} = "NA";
    #$pr->{'runfolder'} = ".";
    $pr->{'outfolder'} = ".";
    $pr->{'ssstype'} = "";    # interpreted from filename
    # -------------------------------------------
    # Get Values from inputs
    # -------------------------------------------
    foreach my $key (keys %{$inr})
    {
        $pr->{'debug'} = $inr->{$key} if $key eq 'd';
        $pr->{'runid'} = $inr->{$key} if $key eq 'n';
        $pr->{'filter'} = $inr->{$key} if $key eq 'f';
        $pr->{'datefilter'} = $inr->{$key} if $key eq 'g';
        $pr->{'filename'} = $inr->{$key} if $key eq 'r';
        #$pr->{'runfolder'} = $inr->{$key} if $key eq 'p';
        $pr->{'outfolder'} = $inr->{$key} if $key eq 'o';
        $pr->{'ssstype'} = $inr->{$key}    if $key eq 't';
    }
    return 1;
}


sub check_params()
{
    my $pr = shift;
    die "Require RunID if not in debug mode\n" if ($pr->{debug} == 0 and $pr->{runid} == -1000);
    (-e $pr->{filename}) or die "Unable to find input filename: $pr->{filename}\n";
    (-e $pr->{outfolder}) or die "Unable to find outfolder: $pr->{outfolder}\n";
    $pr->{filter} = ".*" if $pr->{filter} eq "";
    (my $volume,my $dirs,my $rootname) = File::Spec->splitpath($params{filename});
    $rootname =~ s/\.sss$//;
    my @sp = split(/_/,$rootname);
    $params{rootname} = $rootname;
    $params{ssstype} = $sp[-1] if $params{ssstype} eq "";
    # -----------------------------------------------
    print "Run Parameters: \n";
    foreach my $key (keys %{$pr})
    {
        print "$key -> [$pr->{$key}]\n";    
    }
    # -----------------------------------------------
    return 1;    
}


sub get_fields_plan()
{
    my $tblFields = shift;
    $tblFields->{'DATE_EN'} = ["__________ __________ Date",-1];
    $tblFields->{'HEADER1'} = ["Header1 Header1 Header1",-1];
    $tblFields->{'HEADER2'} = ["Header2 Header2 Header2",-1];
    $tblFields->{'HEADER3'} = ["Header3 Header3 Header3",-1];
    $tblFields->{'HEADER4'} = ["Header4 Header4 Header4",-1];
    $tblFields->{'HEADER5'} = ["Header5 Header5 Header5",-1];
    $tblFields->{'HEADER6'} = ["Header6 Header6 Header6",-1];
    return $tblFields;    
}


sub assign_headers()
{
    
    my $hdrmap = shift;
    my $headline = shift;
    my $sptagref = shift;                        # Default for tab separation
    $$sptagref = '\s*,\s*' if $$headline=~ m/,/;
    my @hdrs = split(/$$sptagref/, $$headline);
    foreach my $hdr (@hdrs){trim(\$hdr)} 
    # -----------------------------------------------
    # Associate Header with matching index
    # -----------------------------------------------
    print "CHECKING HEADERS ------------\n";
    foreach my $key (sort keys %{$hdrmap})
    {
        my ( $index )= grep { $hdrs[$_] eq $hdrmap->{$key}[0] } 0..$#hdrs;
        if (defined ($index)) 
        { 
            $hdrmap->{$key}[1] = $index; 
            print "HEADER MATCH: " . "\t" . $key . "\t" . $hdrmap->{$key}[0] . "\t[" . $hdrmap->{$key}[1]. "]\n";
        }
        else {
            print "HEADER NOT FOUND: " . "\t" . $key . "\t" . $hdrmap->{$key}[0] . "\n";
        }
    }
    print "-----------------------------------------------\n";
}


sub assign_names()
{
    my $hdrmap = shift;
    my $flist = shift;
    @$flist = ();
    foreach my $key (sort keys %{$hdrmap})
    {
        my $i = $hdrmap->{$key}[1];
        push(@{$flist}, $key) if ($i != -1);
    }
}

# --------------------------------------------------------------
# Perl trim function to remove whitespace from the start and end of the string
# --------------------------------------------------------------
sub trim()
{
    
    my $sref = shift;
    $$sref =~ s/^\s+//;
    $$sref =~ s/\s+$//;
}

Replies are listed 'Best First'.
Re: Space delimited to CSV, Index and data extraction loop by planetscape (Chancellor) on Jul 30, 2011 at 12:47 UTC

"have embarked over a 10 day period to learn Perl"

You'll need more than 10 days. ;-) After 10 years, I'm still learning...

But welcome, make yourself at home, and don't forget to check out our very fine Tutorials, esp. Getting Started with Perl.

HTH,
planetscape

Re: Space delimited to CSV, Index and data extraction loop by Khen1950fx (Canon) on Jul 30, 2011 at 10:27 UTC

I concentrated on just getting your script in bounds. So far, so good.

The first thing that you want to do is to check and double-check for errors and warnings. Look for simple things such as: Have you declared all your variables? Are all your subroutines defined? Are your variables properly scoped? Have you checked for anything redundant? Is any part of your code unreachable? (Can it be removed?)

Second, you must know exactly what variables, subroutines, packages, and modules your script uses. module_info will tell you exactly what you are doing. For example, I ran `module_info -a` on your script:

Name: /root/Desktop/rework.pl
Version: v2.0.0
Directory:
File: /root/Desktop/rework.pl
Core module: no
Modules used: English File::Spec Getopt::Std IO::File Switch Time::HiRes autodie strict version warnings
Packages created: main 1312019949.17447
Subroutines defined: main assign_headers assign_names check_params get_fields_plan get_params trim

Third, run some checks to see how bad the damage is :)

perl -c script.pl
perl -w script.pl
perl -MO=Lint script.pl
perltidy script.pl

Here's your script as I fixed it, minus the file part:

#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use Getopt::Std;
use IO::File;
use Switch;
use Time::HiRes qw(gettimeofday tv_interval);
use version 0.77; our $VERSION = qv("v2.0.0");
use English qw(-no_match_vars);

local $OUTPUT_AUTOFLUSH = 1;
print my $timer_start = gettimeofday(), "\n";

my(%inputs) = ();
my(%params) = ();
getopts('o:r:p:t:n:f:g:d', \%inputs);
die "Error Getting Parameters\n" unless get_params(\%inputs, \%params);
check_params(\%params);

my $infile = $params{'filename'};
my $outfile = File::Spec->catfile($params{'outfolder'}, $params{'rootname'});
my $outnamefile = File::Spec->catfile($params{'outfolder'}, $params{'rootname'});
my $outlinecount = $outfile . '_db_proc_linecount.txt';
$outfile = $outfile . '_db_proc_data.csv';
$outnamefile = $outnamefile . '_db_proc_names.csv';

print "Output SSS File: $outfile\n";
print "Output Names File: $outnamefile\n";

my $hdrmap;
my(%hdr_types) = (
    fields  => \&get_fields_area,
    regions => \&get_fields_region,
    plan    => \&get_fields_plan
);
die 'Unable to match sss type. ' unless defined($hdr_types{$params{ssstype}});
$hdrmap = $hdr_types{$params{ssstype}}();
# get_fields_names() is not defined in the posted script, so
# get_fields_plan() stands in for it here.
my $namemap = get_fields_plan();

# Lexical filehandles with three-argument open; headers are read from $I.
open my $I, '<', $infile or die "Unable to open file $infile.\n";
open my $O, '>', $outfile or die "Unable to open file $outfile.\n";
open my $ON, '>', $outnamefile or die "Unable to open file $outnamefile.\n";
#open my $OLCT, '>', $outlinecount or die "Unable to open file $outlinecount.\n";

# Join the three header rows column by column, separated by single spaces.
my @hdrs;
foreach my $i (0 .. 2) {
    my $line = <$I>;
    chomp($line);
    $line =~ s/^\s+//;
    $line =~ s/\s+$//;
    my @sp = split(/\s+/, $line);
    foreach my $j (0 .. $#sp) {
        $hdrs[$j] = defined $hdrs[$j] ? $hdrs[$j] . q{ } . $sp[$j] : $sp[$j];
    }
}
print "[" . join(",", @hdrs) . "]\n";

foreach my $hdr (@hdrs) { trim(\$hdr); }

print "CHECKING HEADERS ------------\n";
foreach my $key (sort keys %{$hdrmap}) {
    my ( $index ) = grep { $hdrs[$_] eq $hdrmap->{$key}[0] } 0..$#hdrs;
    if (defined ($index)) {
        $hdrmap->{$key}[1] = $index;
        print "HEADER MATCH: " . "\t" . $key . "\t" . $hdrmap->{$key}[0] . "\t[" . $hdrmap->{$key}[1]. "]\n";
    }
    else {
        print "HEADER NOT FOUND: " . "\t" . $key . "\t" . $hdrmap->{$key}[0] . "\n";
    }
}
print "-----------------------------------------------\n";

my @flist = ();
my @fnamelist = ();
assign_names($hdrmap, \@flist);
assign_names($namemap, \@fnamelist);
print "[" . join(",", @flist) . "]\n";

print $O join(",", @flist) . ",RUN_ID\n";
print $ON join(",", @fnamelist) . ",RUN_ID\n";

exit();    # stops here, as in the original; nothing below this line runs yet

my $l_units = <$I>;
chomp ($l_units);
print "UNITS -- > [$l_units]\n";

use autodie qw(close);
close($O);
close($ON);

sub get_params {
    my $inr = shift;
    my $pr = shift;
    # Defaults, overridden by any switches given on the command line
    $pr->{'debug'} = 0;
    $pr->{'runid'} = -1000;
    $pr->{'filter'} = ".*";
    $pr->{'datefilter'} = "";
    $pr->{'filename'} = "NA";
    $pr->{'outfolder'} = ".";
    $pr->{'ssstype'} = "";
    foreach my $key (keys %{$inr}) {
        $pr->{'debug'} = $inr->{$key} if $key eq 'd';
        $pr->{'runid'} = $inr->{$key} if $key eq 'n';
        $pr->{'filter'} = $inr->{$key} if $key eq 'f';
        $pr->{'datefilter'} = $inr->{$key} if $key eq 'g';
        $pr->{'filename'} = $inr->{$key} if $key eq 'r';
        $pr->{'outfolder'} = $inr->{$key} if $key eq 'o';
        $pr->{'ssstype'} = $inr->{$key} if $key eq 't';
    }
    return 1;
}

sub check_params {
    my $pr = shift;
    die "Require RunID if not in debug mode\n" if ($pr->{debug} == 0 and $pr->{runid} == -1000);
    (-e $pr->{filename}) or die "Unable to find input filename: $pr->{filename}\n";
    (-e $pr->{outfolder}) or die "Unable to find outfolder: $pr->{outfolder}\n";
    $pr->{filter} = ".*" if $pr->{filter} eq "";
    my ($volume, $dirs, $rootname) = File::Spec->splitpath($params{filename});
    $rootname =~ s/\.sss$//;
    my @sp = split(/_/,$rootname);
    $params{rootname} = $rootname;
    $params{ssstype} = $sp[-1] if $params{ssstype} eq "";
    print "Run Parameters: \n";
    foreach my $key (keys %{$pr}) {
        print "$key -> [$pr->{$key}]\n";
    }
    return 1;
}

sub get_fields_plan {
    my $tblFields = shift;
    # Each entry: [ combined three-row header text, column index (-1 = not found) ]
    $tblFields->{'DATE_EN'} = ["__________ __________ Date",-1];
    $tblFields->{'HEADER1'} = ["Header1 Header1 Header1",-1];
    $tblFields->{'HEADER2'} = ["Header2 Header2 Header2",-1];
    $tblFields->{'HEADER3'} = ["Header3 Header3 Header3",-1];
    $tblFields->{'HEADER4'} = ["Header4 Header4 Header4",-1];
    $tblFields->{'HEADER5'} = ["Header5 Header5 Header5",-1];
    $tblFields->{'HEADER6'} = ["Header6 Header6 Header6",-1];
    return $tblFields;
}

sub assign_headers {
    my $hdrmap = shift;
    my $headline = shift;
    my $sptagref = shift;
    $$sptagref = '\s*,\s*' if $$headline =~ m/,/;
    my @hdrs = split(/$$sptagref/, $$headline);
    foreach my $hdr (@hdrs) { trim(\$hdr); }
    print "CHECKING HEADERS ------------\n";
    foreach my $key (sort keys %{$hdrmap}) {
        my ( $index ) = grep { $hdrs[$_] eq $hdrmap->{$key}[0] } 0..$#hdrs;
        if (defined ($index)) {
            $hdrmap->{$key}[1] = $index;
            print "HEADER MATCH: " . "\t" . $key . "\t" . $hdrmap->{$key}[0] . "\t[" . $hdrmap->{$key}[1]. "]\n";
        }
        else {
            print "HEADER NOT FOUND: " . "\t" . $key . "\t" . $hdrmap->{$key}[0] . "\n";
        }
    }
    print "-----------------------------------------------\n";
}

sub assign_names {
    my $hdrmap = shift;
    my $flist = shift;
    @$flist = ();
    # Keep only the fields whose header was found (index other than -1)
    foreach my $key (sort keys %{$hdrmap}) {
        my $i = $hdrmap->{$key}[1];
        push(@{$flist}, $key) if ($i != -1);
    }
}

sub trim {
    my $sref = shift;
    $$sref =~ s/^\s+//;
    $$sref =~ s/\s+$//;
}

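The header step is the part most easily broken, so here it is in isolation. This is only a sketch: the three header rows are joined column by column with single spaces, which is what the lookup strings in get_fields_plan (for example "__________ __________ Date") appear to expect. The names combine_headers and header_index are hypothetical, and the sample rows are cut down to four columns:

```perl
use strict;
use warnings;

# Hypothetical helper: join the header rows column by column, so that a
# column reading Header1 / Header1 / Header1 becomes "Header1 Header1 Header1".
sub combine_headers {
    my @rows = @_;
    my @hdrs;
    for my $row (@rows) {
        my @cols = split ' ', $row;    # split ' ' also skips leading whitespace
        for my $j ( 0 .. $#cols ) {
            $hdrs[$j] = defined $hdrs[$j] ? "$hdrs[$j] $cols[$j]" : $cols[$j];
        }
    }
    return @hdrs;
}

# Hypothetical helper: index of the first combined header equal to $wanted,
# or -1 if there is none (the same "not found" marker the script uses).
sub header_index {
    my ( $wanted, @hdrs ) = @_;
    for my $i ( 0 .. $#hdrs ) {
        return $i if $hdrs[$i] eq $wanted;
    }
    return -1;
}

my @hdrs = combine_headers(
    '__________ __________  Header1     Header2',
    '__________ __________  Header1     Header2',
    'Time       Date        Header1     Header2',
);
print header_index( '__________ __________ Date', @hdrs ), "\n";
```

Building each combined header without a leading separator means no trim pass is needed afterwards.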
Re: Space delimited to CSV, Index and data extraction loop by Not_a_Number (Prior) on Jul 30, 2011 at 11:15 UTC

Here's one way of doing the actual parsing bit (ignoring headers, reading from `__DATA__` and printing to the screen for simplicity):

use strict;
use warnings;

my $name_id = 0;
my %seen_name;

<DATA> for 1 .. 5; # Remove headers

while ( my $line = <DATA> ) {
    next unless $line =~ /\w/;
    if ( $line =~ /Name/ ) {
        $name_id = parse_name( $line );
    }
    else {
        my @tmp = split ' ', $line;
        shift @tmp;                 # drop the Days column
        my $date = shift @tmp;
        print join ',', $name_id, $name_id, @tmp, $date;
        print "\n";
    }
}

print "\n________Names file________\n\n";

for my $key ( sort { $seen_name{$a} <=> $seen_name{$b} } keys %seen_name ) {
    my @names = split /,/, $key;
    my $val = $seen_name{$key};
    # Name,Name_ID,Name1,Name1_ID
    print join( ',', map { ( $_, $val ) } @names ), "\n";
}

sub parse_name {
    my $line = shift;
    my $name = join ',', ( split ' ', $line )[1, 3];
    # Reuse the ID of a name pair seen before; otherwise take the next free ID.
    if ( !$seen_name{$name} ) {
        my $next_id = 1 + keys %seen_name;
        $seen_name{$name} = $next_id;
    }
    return $seen_name{$name};
}

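If Name and Name1 need their own indexes and the output has to go to two files, the same idea could be stretched roughly as follows. This is a sketch under assumptions, not a finished script: it assumes every block starts with a line of the form "Name X  Name1 Y" and that data rows start with a digit, it leaves header handling out, and it does not pad the dates. id_for and convert are hypothetical names; the demo uses in-memory filehandles where real code would open the files named by $outfile and $outnamefile:

```perl
use strict;
use warnings;

# Hypothetical helper: return the ID already assigned to $name in %$index,
# or assign the next free integer (1, 2, 3, ...) the first time it is seen.
sub id_for {
    my ( $index, $name ) = @_;
    if ( !exists $index->{$name} ) {
        my $next_id = 1 + keys %$index;
        $index->{$name} = $next_id;
    }
    return $index->{$name};
}

# Read block-structured text from $in; write data rows to $data_out and one
# line per new Name/Name1 pair to $names_out.
sub convert {
    my ( $in, $data_out, $names_out ) = @_;
    my ( %name_ids, %name1_ids, %logged );
    my ( $name_id, $name1_id );
    while ( my $line = <$in> ) {
        if ( $line =~ /^\s*Name\s+(\S+)\s+Name1\s+(\S+)/ ) {
            my ( $name, $name1 ) = ( $1, $2 );
            $name_id  = id_for( \%name_ids,  $name );
            $name1_id = id_for( \%name1_ids, $name1 );
            print {$names_out} join( ',', $name, $name_id, $name1, $name1_id ), "\n"
                unless $logged{"$name,$name1"}++;
        }
        elsif ( defined $name_id && $line =~ /^\s*\d/ ) {
            my ( $days, $date, @values ) = split ' ', $line;
            print {$data_out} join( ',', $name_id, $name1_id, @values, $date ), "\n";
        }
    }
    return;
}

# Demo on in-memory filehandles with a cut-down version of the sample data.
my $input = <<'END';

    Name AA-AA  Name1 BAABAB

0.000000    1/01/2007  0.00000     0.00000
31.000000   2/01/2007  0.00000     0.00000

    Name AB_AB  Name1 ABABAB

0.000000    1/01/2007  0.00000     0.00000
END
open my $in,        '<', \$input        or die "Cannot read input: $!";
open my $data_out,  '>', \my $data_csv  or die "Cannot open data buffer: $!";
open my $names_out, '>', \my $names_csv or die "Cannot open names buffer: $!";
convert( $in, $data_out, $names_out );
close $_ for $data_out, $names_out;
print $data_csv, $names_csv;
```

Keeping one hash per index means a Name that reappears with a different Name1 keeps its original Name_ID.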
Re: Space delimited to CSV, Index and data extraction loop by SixShot (Novice) on Jul 30, 2011 at 13:31 UTC

From a favourite song of mine... "You get by with a little help from your friends". I cannot express my appreciation enough for your quick and most helpful replies. I am off and running again, and I will update or ask for more advice as I move on. Hmm, 10 days, I know, is asking a lot. I'm like Luke Skywalker when he first uses the Force.