Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

CSV to Excel Converter

by eric256 (Parson)
on Aug 27, 2007 at 22:36 UTC ( [id://635437]=CUFP: print w/replies, xml ) Need Help??

This script takes a csv and converts it to excel. It automatically creates new sheets when you reach 50_000 rows. Excels limit is higher than that around 65k but this way they break into nice units. I often have csv files with 100k+ rows and this is a handy way to get them into excel.

#!/usr/bin/perl use strict; use warnings; use Spreadsheet::WriteExcel; use Text::CSV::Simple; my $infile = shift; usage() unless defined $infile && -f $infile; my $parser = Text::CSV::Simple->new; my @data = $parser->read_file($infile); my $headers = shift @data; my $outfile = shift || $infile . ".xls"; my $subject = shift || 'worksheet'; sub usage { print "csv2xls infile [outfile] [subject]\n"; exit; } my $workbook = Spreadsheet::WriteExcel->new($outfile); my $bold = $workbook->add_format(); $bold->set_bold(1); import_data($workbook, $subject, $headers, \@data); # Add a worksheet sub import_data { my $workbook = shift; my $base_name = shift; my $colums = shift; my $data = shift; my $limit = shift || 50_000; my $start_row = shift || 1; my $worksheet = $workbook->add_worksheet($base_name); $worksheet->add_write_handler(qr[\w], \&store_string_widths); my $w = 1; $worksheet->write('A' . $start_row, $colums, ,$bold); my $i = $start_row; my $qty = 0; for my $row (@$data) { $qty++; if ($i > $limit) { $i = $start_row; $w++; $worksheet = $workbook->add_worksheet("$base_name - $w"); $worksheet->write('A1', $colums,$bold); } $worksheet->write($i++, 0, $row); } autofit_columns($worksheet); warn "Convereted $qty rows."; return $worksheet; } ###################################################################### +######### ###################################################################### +######### # # Functions used for Autofit. # ###################################################################### +######### # # Adjust the column widths to fit the longest string in the column. # sub autofit_columns { my $worksheet = shift; my $col = 0; for my $width (@{$worksheet->{__col_widths}}) { $worksheet->set_column($col, $col, $width) if $width; $col++; } } ###################################################################### +######### # # The following function is a callback that was added via add_write_ha +ndler() # above. It modifies the write() function so that it stores the maximu +m # unwrapped width of a string in a column. # sub store_string_widths { my $worksheet = shift; my $col = $_[1]; my $token = $_[2]; # Ignore some tokens that we aren't interested in. return if not defined $token; # Ignore undefs. return if $token eq ''; # Ignore blank cells. return if ref $token eq 'ARRAY'; # Ignore array refs. return if $token =~ /^=/; # Ignore formula # Ignore numbers #return if $token =~ /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+ +))?$/; # Ignore various internal and external hyperlinks. In a real scena +rio # you may wish to track the length of the optional strings used wi +th # urls. return if $token =~ m{^[fh]tt?ps?://}; return if $token =~ m{^mailto:}; return if $token =~ m{^(?:in|ex)ternal:}; # We store the string width as data in the Worksheet object. We us +e # a double underscore key name to avoid conflicts with future name +s. # my $old_width = $worksheet->{__col_widths}->[$col]; my $string_width = string_width($token); if (not defined $old_width or $string_width > $old_width) { # You may wish to set a minimum column width as follows. #return undef if $string_width < 10; $worksheet->{__col_widths}->[$col] = $string_width; } # Return control to write(); return undef; } ###################################################################### +######### # # Very simple conversion between string length and string width for Ar +ial 10. # See below for a more sophisticated method. # sub string_width { return length $_[0]; }

___________
Eric Hodges

Replies are listed 'Best First'.
Re: CSV to Excel Converter
by graff (Chancellor) on Aug 28, 2007 at 01:22 UTC
    Thanks (and ++)! Just a few minor nits...

    Does everything in the output xls file really have to be bold?

    Given that you start out like this:

    my $infile = shift; usage() unless defined $infile && -f $infile;
    your usage synopsis really should be like this:
    sub usage { print "csv2xls infile [outfile] [subject]\n"; exit; }
    (i.e. no brackets around "infile", because it is a required arg)

    When I saw this:

    # We store the string width as data in the Worksheet object. We us +e # a double underscore key name to avoid conflicts with future name +s. # my $old_width = $worksheet->{__col_widths}->[$col];
    I was reminded of having to face a similar issue with DBI; in that module, the handy convention is that any time you want to add some "novel" attributes (hash keys) to the "normal" module object, you simply prefix the attribute name with "private_" -- I don't remember all the circumstances that made this "the right way to do it" with DBI, but it seems like a nice convention to use in general.

    BTW, doesn't Excel already support a simple means for importing data from a csv file? I can appreciate the notion of having column widths adjusted for me automatically as the data are loaded, but in most cases, adjusting the column width in Excel isn't that much harder. I also understand the issues with really long csv data sets and the need to split them up somehow to be digestible by Excel, but in those cases, I'd have to wonder: why use Excel at all for that kind of data?

    As for the last chunk of your code:

    # Very simple conversion between string length and string width for Ar +ial 10. # See below for a more sophisticated method. # sub string_width { return length $_[0]; }
    Is that supposed to be a joke, or did you actually leave something off at the end when you posted this? (just curious)

      Everything isn't bold, just the headers.

      The usage should definitly be changed.../me goes to fix it now.

      From these two points down I plead guilty! That code is all cut and past from the Spreadsheet::WriteExcel examples and once it worked I never looked back ;). I needed a way to make fairly nice excel files out of 100-300k record CSV files, and so I put this together. I agree its pretty worthless loaded into excel and spread accross six worksheets, but when thats what they want no matter what they are told thats what they get. I'm working with someone who doesn't believe the totals i produce so they want the data, of coures they've never found any discrepancies, but some people just like to be in control ;)


      ___________
      Eric Hodges

        Thanks Eric! This works very well. I am in the middle of changing the field headers, but everything works like it was meant to work.

      Hi, just tried your script, and somehow it does convert "input" to Excel. But not CSV Files. To be specific it converts everything BUT CSV-Files. When I noticed that I made an input file myself like "1";"2";"3" 1;2;3 '1';'2';'3' 1 2 3 The result looked like Failed on "1";"2";"3" and an Excelfile like 1;2;3 (in bold) '1';'2';'3' 1 2 3 each written in the first column of course ;) Any idea why the script fails with CSV-Format?
Re: CSV to Excel Converter
by Tux (Canon) on Sep 21, 2007 at 08:24 UTC

    I don't want to take away the fun you have in creating utilities like this, but have you taken the time to look at the csv2xls utility that I ship with Text-CSV_XS in the examples folder?

    
    usage: csv2xls -s <sep> -q <quot> -w <width> -d <dtfmt>
                   -o <xls> file.csv
           -s <sep>   use <sep>   as seperator char. Auto-detect, default = ','
                      The string "tab" is allowed.
           -e <esc>   use <sep>   as seperator char. Auto-detect, default = ','
                      The string "undef" is allowed.
           -q <quot>  use <quot>  as quotation char. Default = '"'
                      The string "undef" will disable quotation.
           -w <width> use <width> as default minimum column width (4)
           -o <xls>   write output to file named <xls>, defaults
                      to input file name with .csv replaced with .xls
                      if from standard input, defaults to csv2xls.xls
           -F         allow formula's. Otherwise fields starting with
                      an equal sign are forced to string
           -f         force usage of <xls> if already exists (unlink before use)
           -d <dtfmt> use <dtfmt> as date formats.   Default = 'dd-mm-yyyy'
           -D cols    only convert dates in columns <cols>. Default is everywhere.
           -u         CSV is UTF8
           -v <lvl> verbosity (default = 1)
    

    It is a script I already use in real-life for ages now. It is a working script that already deals with big files

    Updated on 2013-04-23 to show most recent usage and remove restriction


    Enjoy, Have FUN! H.Merijn

      Does that automaticaly wrap to a second worksheet when to many records are added?


      ___________
      Eric Hodges

        No, it switches to Spreadsheet::WriteExcel::Big when the incoming size is over a certain bound. I bet that adding switching to a new sheet would be an easy to add option.

        I myself have never wanted that. I want a sheet to be all or nothing. If it doesn't fit, I would have to invent some logic to break up the source, instead of breaking up the destination


        Enjoy, Have FUN! H.Merijn
        how to implement this code after downloading .... i know nothing bout it ...i just want to change an csv to excel properly
Re: CSV to Excel Converter
by Aim9b (Monk) on Sep 20, 2007 at 15:48 UTC
    eric256, thanks for this post. I'm having to do the same thing & I'm looking at differences between your program, using Spreadsheet::WriteExcel, and mine using Win32::OLE. So far, mine works, but could be "cooler". Yes, MS Excel will read a CVS file, but like you, my users want it "created for them", then they'll tweak it. Thanks again, it really helps with the learning curve.
      hi Aim9b, i'm also doing the same thing as you i have to convert a CSV file into an excel but i would like to use win32::OLE cos it's already install in activeperl don;t wanna go through the hassle of installing the other modules anyway like to know how to start off on writing a perl script to convert a CSV file to an excel using win32::OLE
        If I can figure out this weekend how to put code in here using the "Download" feature or the scratchpad, I'll send you what I have so far. Promise not to laugh too hard.
Re: CSV to Excel Converter
by narendra (Initiate) on Oct 06, 2011 at 07:54 UTC
    It is very very helpful. Please keep it up
Re: CSV to Excel Converter
by bofsthus (Initiate) on Apr 22, 2013 at 22:52 UTC
    Can anyone help me with being able to instead of creating a new Workbook just adding a sheet to an existing Workbook?
        I'm trying automate some Data Collections and want a single Spreadsheet then posting it to our internal website. If it was a one off deal I would of course merge the Sheets. Does WriteExcel have a way to do a merge?
Re: CSV to Excel Converter
by Anonymous Monk on Jul 17, 2013 at 08:29 UTC
    There is another CSV to Excel Converter that uses cloud based technology which is growing rapidly these days to convert not only excel but all MS Office files to other formats. If you want to try this API below is the link: http://www.aspose.com/java/excel-component.aspx

      There is another CSV to Excel Converter that uses cloud based technology which is growing rapidly these days to convert not only excel but all MS Office files to other formats. If you want to try this API below is the link: http://www.aspose.com/java/excel-component.aspx

      Too bad it gives you aids and cancer and hepatitis and ebola and backspasms

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: CUFP [id://635437]
Approved by graff
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (4)
As of 2024-03-28 21:07 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found