Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I have a spreadsheet with around 47 columns and n rows. I only need certain columns for further processing in my script.

Ideally, I need the values from the first and third columns and from columns 13-49. I would like to print them in comma-separated format like this, or assign the values to variables ($id, $snum, @rs), or store them in a hash:
1, 23,45, 454,54,45,656,56,565,65,....
2,343,4545, 5656,56,6767,67,6,767,6,76,767,...
3,454, 56546,4545,6,45,64,5,45,4,5,4,54,54,5,4,...

#!/software/bin/perl
BEGIN {
    unshift @INC, '/nfs/users/nfs_d/dmb/perl/lib';
    unshift @INC, '/nfs/users/nfs_d/dmb/perl/lib2';
}
use strict;
#use warnings ;
use File::Basename;
use Algorithm::Permute;
use Spreadsheet::ParseExcel;
use Getopt::Long;

my ($excel_file, $outputdir);
&GetOptions(
    "if=s" => \$excel_file,
    "od=s" => \$outputdir,
);

my $parser   = Spreadsheet::ParseExcel->new();
my $workbook = $parser->Parse($excel_file) || die "Cannot parse the file $!\n";

for my $worksheet ( $workbook->worksheets() ) {
    my ( $row_min, $row_max ) = $worksheet->row_range();
    my ( $col_min, $col_max ) = $worksheet->col_range();
    for my $row ( 1 .. $row_max ) {
        my @required_col = (0, 2, 13..47);
        for my $col (@required_col) {
            my $cell = $worksheet->get_cell( $row, $col );
            next unless $cell;
            print ($row, $col),"\n";
            print $cell->value(),"\n";
            # print "Unformatted = ", $cell->unformatted(), "\n";
            print "\n";
        }
    }
}

I would actually like to either store the values of those columns in a variable or print them to a file in CSV format. Thanks in advance for all the suggestions!

Re: Spreadsheet::ParseExcel assigning the column
by Tux (Canon) on Mar 28, 2011 at 15:55 UTC

    Spreadsheet::Read comes with xlscat, a script that offers exactly what you are asking for:

    $ xlscat -c -S1 -C1,3,13-49 spreadsheet.xls

    Enjoy, Have FUN! H.Merijn
Re: Spreadsheet::ParseExcel assigning the column
by Nikhil Jain (Monk) on Mar 28, 2011 at 16:20 UTC

    You can store the values in an array of arrays, like this:

    use Data::Dumper;    # needed for the Dumper() call at the end

    my @store;
    for my $worksheet ( $workbook->worksheets() ) {
        my ( $row_min, $row_max ) = $worksheet->row_range();
        my ( $col_min, $col_max ) = $worksheet->col_range();
        for my $row ( 0 .. $row_max ) {
            my @required_col = (0, 1, 2);
            my @values_store;
            for my $col (@required_col) {
                my $cell = $worksheet->get_cell( $row, $col );
                next unless $cell;
                print "$row, $col\n";
                print $cell->value(), "\n";
                # store values in an array
                push( @values_store, $cell->value() );
                # print "Unformatted = ", $cell->unformatted(), "\n";
                print "\n";
            }
            # push array ref into array
            push( @store, [@values_store] );
        }
    }
    print Dumper(\@store);

    Output like:

    $VAR1 = [
              [
                'Test',
                'Excel',
                'hello'
              ],
              [
                'hello ',
                'test',
                'PerlMonks'
              ]
            ];
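Once the rows are in an array of arrays, each one can be unpacked into the ($id, $snum, @rs) variables the question asks for, or filed in a hash. The following is only a sketch: the contents of @store are stand-in values taken from the sample output in the question, and the %by_id hash and its snum/rs keys are names made up for illustration. Note that it relies on every row having its values in the same positions, which the `next unless $cell` line above does not guarantee when a cell is empty.

```perl
use strict;
use warnings;

# Stand-in data shaped like @store above: one array ref per row.
my @store = (
    [ 1, 23,  45,   454,  54 ],
    [ 2, 343, 4545, 5656, 56 ],
);

my %by_id;    # hypothetical lookup table keyed by the first column
for my $row (@store) {
    # First value is the id, second the sample number, the rest go to @rs.
    my ( $id, $snum, @rs ) = @$row;
    $by_id{$id} = { snum => $snum, rs => \@rs };

    # Print the row back out in comma-separated form.
    print join( ",", $id, $snum, @rs ), "\n";
}
```

A row can then be looked up later as $by_id{2}{snum} or @{ $by_id{2}{rs} }.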
Re: Spreadsheet::ParseExcel assigning the column
by Tux (Canon) on Mar 28, 2011 at 16:30 UTC

    Alternatively, do it yourself ...

    use Text::CSV_XS;

    my $csv = Text::CSV_XS->new ({ binary => 1, eol => "\n", auto_diag => 1 });
    open my $fh, ">", "selection.csv" or die "selection.csv: $!\n";

    # Do you really mean all worksheets?
    foreach my $ws ($workbook->worksheets ()) {
        my ($row_min, $row_max) = $ws->row_range ();
        foreach my $row ($row_min .. $row_max) {
            $csv->print ($fh, [ map {
                my $c = $ws->get_cell ($row, $_);
                $c ? $c->value () : undef;
                } 0, 2, 13..47 ]);
        }
    }

    Enjoy, Have FUN! H.Merijn
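One thing worth checking in any of these versions is the column list itself. The question describes the columns by position ("first, third, 13-49"), while get_cell() in Spreadsheet::ParseExcel counts rows and columns from zero, so the hard-coded 0, 2, 13..47 may or may not be the intended selection. A minimal sketch of a converter follows; expand_columns is a hypothetical helper, not part of any module, and it assumes the spec is written with 1-based column numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a 1-based column spec such as "1,3,13-49"
# into the 0-based indices that get_cell() expects.
sub expand_columns {
    my ($spec) = @_;
    my @cols;
    for my $part ( split /,/, $spec ) {
        if ( $part =~ /^(\d+)-(\d+)$/ ) {
            # A range: shift every column number down by one.
            push @cols, map { $_ - 1 } $1 .. $2;
        }
        elsif ( $part =~ /^(\d+)$/ ) {
            # A single column.
            push @cols, $1 - 1;
        }
        else {
            die "Bad column spec: '$part'\n";
        }
    }
    return @cols;
}

my @required_col = expand_columns("1,3,13-49");
print "@required_col[0 .. 3]\n";                        # first four indices
print scalar(@required_col), " columns selected\n";
```

With the spec from the question this yields indices 0, 2 and 12 through 48, which is 39 columns in total.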
Re: Spreadsheet::ParseExcel assigning the column
by wind (Priest) on Mar 28, 2011 at 16:40 UTC
    You just need to use Text::CSV to output the new data to a csv file:
    #!/software/bin/perl
    BEGIN {
        unshift @INC, '/nfs/users/nfs_d/dmb/perl/lib';
        unshift @INC, '/nfs/users/nfs_d/dmb/perl/lib2';
    }
    use File::Basename;
    use Algorithm::Permute;
    use Spreadsheet::ParseExcel;
    use Getopt::Long;
    use Text::CSV;

    use strict;
    use warnings;

    my @required_col = (0, 2, 13..47);

    GetOptions(
        "if=s" => \my $excel_file,
        "od=s" => \my $outputdir,
    );

    my $parser   = Spreadsheet::ParseExcel->new();
    my $workbook = $parser->Parse($excel_file);
    if ( !defined $workbook ) {
        die $parser->error(), ".\n";
    }

    my $newfile = $excel_file . ".csv";
    my $csv = Text::CSV->new( { eol => "\n" } )
        or die "Cannot use CSV: " . Text::CSV->error_diag();
    open my $fh, '>', $newfile or die "Can't open $newfile: $!";

    my @worksheets = $workbook->worksheets();
    warn "More than 1 worksheet found\n" if @worksheets > 1;

    # Only work with first page.
    my $worksheet = $worksheets[0];
    my ( $row_min, $row_max ) = $worksheet->row_range();

    # Skip header row
    for my $row ( 1 .. $row_max ) {
        my @data = map {
            my $cell = $worksheet->get_cell( $row, $_ );
            $cell ? $cell->value() : '';
        } @required_col;
        print "@data\n";
        $csv->print( $fh, \@data );
    }

      Hi, thanks! Is there any way to return the headers (or column descriptors) of the Excel file using Spreadsheet::ParseExcel? At the moment I am getting them with:
      for my $row ( 0 .. 0 ) {
          @value = map {
              my $cell = $worksheet->get_cell( $row, $_ );
              $cell ? $cell->value() : '';
          } @required_col;
      }
      But I couldn't pass the @value array to another subroutine, which also takes the array of data mentioned before.

      I need both the arrays in the next subroutine.
      #!/software/bin/perl
      BEGIN {
          unshift @INC, '/nfs/users/nfs_d/dmb/perl/lib';
          unshift @INC, '/nfs/users/nfs_d/dmb/perl/lib2';
          unshift @INC, '/nfs/users/nfs_a/aj6/CGP/perl_stuffs/lib/lib/site_perl/5.8.8/';
      }
      use File::Basename;
      use Algorithm::Permute;
      use Spreadsheet::ParseExcel;
      use Getopt::Long;
      use Text::CSV;

      use strict;
      use warnings;

      my @required_col = (0, 1, 2, 13..47);

      GetOptions(
          "if=s" => \my $excel_file,
          "od=s" => \my $outputdir,
      );

      my $parser   = Spreadsheet::ParseExcel->new();
      my $workbook = $parser->Parse($excel_file);
      if ( !defined $workbook ) {
          die $parser->error(), ".\n";
      }

      my $newfile = $excel_file . ".csv";
      my $csv = Text::CSV->new( { eol => "\n" } )
          or die "Cannot use CSV: " . Text::CSV->error_diag();
      open my $fh, '>', $newfile or die "Can't open $newfile: $!";

      my @worksheets = $workbook->worksheets();
      warn "More than 1 worksheet found\n" if @worksheets > 1;

      # Only work with first page.
      my $worksheet = $worksheets[0];
      my @hea = $worksheet->{Header};
      print @hea;
      my ( $row_min, $row_max ) = $worksheet->row_range();

      my @value;
      ##to get the headers of the excel file;
      for my $row ( 0 .. 0 ) {
          @value = map {
              my $cell = $worksheet->get_cell( $row, $_ );
              $cell ? $cell->value() : '';
          } @required_col;
      }

      # Skip header row
      my @data;
      for my $row ( 1 .. $row_max ) {
          @data = map {
              my $cell = $worksheet->get_cell( $row, $_ );
              $cell ? $cell->value() : '';
          } @required_col;
          &changeformat( \@value, \@data );
      }

      sub changeformat {
          my ( $header, $data ) = @_;
          my ( %hash1, %hash2 );
          # since passed through the loop again and again, shifts the
          # first element every time it passes through the loop
          my $plex1 = shift @$header;
          $hash2{$plex1} = \@$header;
          my $plex       = shift @$data;
          my $sangerid   = shift @$data;
          my $supplierid = shift @$data;
          $hash1{$supplierid} = \@$data;
          foreach my $key ( keys %hash1 ) {
              foreach my $k ( keys %hash2 ) {
                  for ( my $i = 0 ; $i <= scalar(@$data) ; $i++ ) {
                      # print "$key,$hash1{$key}[$i], $plex,$hash2{$k}[$i]\n" if ($hash2{$k}[$i] and $key);
                  }
              }
          }
      }

      Any fixes or suggestions on how to pass it?

      thanks a ton

        This

        my @value;
        ##to get the headers of the excel file;
        for my $row ( 0 .. 0 ) {
            @value = map {
                my $cell = $worksheet->get_cell( $row, $_ );
                $cell ? $cell->value() : '';
            } @required_col;
        }

        can be simplified to just

        ##to get the headers of the excel file;
        my @value = map {
            my $cell = $worksheet->get_cell( 0, $_ );
            $cell ? $cell->value() : '';
        } @required_col;

        And since you're modifying the arrays passed to your subroutine, it looks like you just need to dereference them so that you make a copy.

        sub changeformat {
            my ($header, $data) = @_;
            my @header = @$header;
            my @data   = @$data;

        Then only perform your operations on the new @header and @data arrays and not on $header or $data.
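To see why the copy matters, here is a small self-contained sketch of the difference. The header names in @value are made up, and shift_in_place and shift_on_copy are hypothetical stand-ins for the two ways changeformat could treat its arguments.

```perl
use strict;
use warnings;

# Made-up header names standing in for the first spreadsheet row.
my @value = ( 'plex', 'sanger_id', 'supplier_id', 'rs1', 'rs2' );

# Modifies the caller's array: @$header is the same array as @value.
sub shift_in_place {
    my ($header) = @_;
    return shift @$header;
}

# Works on a private copy, so the caller's array is left untouched.
sub shift_on_copy {
    my ($header) = @_;
    my @header = @$header;
    return shift @header;
}

# Called once per data row, as changeformat is in the loop above.
shift_on_copy( \@value ) for 1 .. 3;
print scalar(@value), "\n";    # still 5 elements

shift_in_place( \@value ) for 1 .. 3;
print scalar(@value), "\n";    # only 2 left: 'rs1' and 'rs2'
```

In the original changeformat the same thing happens to @value on every pass through the row loop, which is why the header list shrinks by one element per data row.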