pooTan has asked for the wisdom of the Perl Monks concerning the following question:

I have been looking at this problem for some time now. I want to go through a bunch of documents, extract some information from an MS Word table in each one, and output it to an HTML file.

Right now I don't care about writing the extracted cell to HTML; I just want to print it to the cmd prompt. The problem is that the script hangs my cmd prompt window when I run it (it used to be worse: it kept beeping until I had to take the laptop battery out), and it does not output what I want. How do I read the cell and print it to the cmd prompt correctly?
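A likely contributor to the beeping, offered as a guess rather than a diagnosis: in Word's object model, the text of a table cell's Range ends with an end-of-cell marker, a carriage return followed by the BEL character (chr(13) . chr(7)). Printing BEL to a console makes it beep. Below is a minimal sketch of stripping that marker before printing. `clean_cell_text` is a hypothetical helper, not part of Win32::OLE, and the raw string is simulated so the example runs without Word.

```perl
use strict;
use warnings;

# Hypothetical helper: remove Word's end-of-cell marker ("\r\a", i.e.
# chr(13) . chr(7)) from text read via $cell->Range->{Text}, so that
# printing it does not make the console beep.
sub clean_cell_text {
    my ($text) = @_;
    return '' unless defined $text;    # OLE calls can return undef on failure
    $text =~ s/[\r\a]+\z//;            # strip trailing CR/BEL characters
    return $text;
}

# Simulate the raw text Word returns for a cell containing "Hello"
my $raw = "Hello\r\a";
print '*', clean_cell_text($raw), "*\n";    # prints *Hello*
```

The same substitution can be applied to `$table->Cell(1,1)->Range->{Text}` before it is printed.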

use warnings;
use strict;

# we are going to be working with MS Word objects
use Win32::OLE qw(in with);

# some sites advise me to use these but I am not sure
# why we need them and can't really understand
# Win32::OLE. If there is a way to decipher them I am happy
# to know
# use Win32::OLE::Const 'Microsoft Word';
# use Win32::OLE::Enum;

use CGI; # we'll be having some HTML writing

my $text = "";
my $directory = "../sample_folder";
opendir (DH, $directory) || die "can't opendir $directory: $!";

# keep only directories (-d) whose names start with "EN-US_",
# which also skips the one redirect item whose name starts with "_"
my @dir_list = grep { (/^EN-US_.+/) && -d "$directory/$_" } readdir(DH);

# we are working with the Word application
my $Word = Win32::OLE->new('Word.Application', 'Quit');
my $root = "C:\\Documents and Settings\\parent_folder";

foreach (@dir_list){
    # extract the language abbreviation
    m/^EN-US_(.+)$/i;
    my $doc = "$root\\EN-US_$1\\sample_doc.doc";
    $Word->Documents->Open("$doc") || die("Unable to open $doc ", Win32::OLE->LastError());
    $Word->{Visible}= 0; # we don't need to see Word in an active window

    # get the first table
    my $table = $Word->ActiveDocument->Tables(1);
    # $table -> Select(); # do I need to select?
    $text = $table->Cell(1,1)->Range->{Text};
    print "*$text*";

    #my $language = lc($1);
    # Prepare OUT_FOOTER file
    #open(OUT_FOOTER, "> $directory/updates/footer_$language.html") || die("can't open $directory/updates/footer_$language.html for writing: $!");

    # close document and Word instance
    print "Closing document and Word\n";
    $Word->ActiveDocument->Close();
    close OUT_FOOTER;
}
$Word->Quit;
closedir DH;
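A minimal sketch of the same loop, assuming the layout described above (one EN-US_* folder per language under a single root, each holding sample_doc.doc). Compared with the script above, it checks the regex match before using $1, works with the document object that Documents->Open returns instead of ActiveDocument, drops the close of OUT_FOOTER (which is never opened, so Perl warns "close() on unopened filehandle"), and sets Win32::OLE->Option(Warn => 3) so that any OLE error croaks with a message instead of quietly returning undef. Selecting the table is not needed to read a cell. On Win32::OLE::Const: it imports Word's named constants (such as wdDoNotSaveChanges) from the type library; this sketch passes the literal 0 instead. The helper names are hypothetical, and the OLE part runs only on Windows with Word installed.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the language code from a folder name such as
# "EN-US_fr". Returns undef on no match instead of reusing a stale $1.
sub language_from_dir {
    my ($dir) = @_;
    return $dir =~ /^EN-US_(.+)$/i ? $1 : undef;
}

# Hypothetical helper: open one .doc read-only and return the text of
# cell (1,1) of its first table, or undef if the document has no tables.
sub read_first_cell {
    my ($word, $path) = @_;
    (my $win_path = $path) =~ tr{/}{\\};    # hand Word a backslash path
    my $doc = $word->Documents->Open({ FileName => $win_path, ReadOnly => 1 });
    my $text;
    if ($doc->Tables->{Count} > 0) {
        # No Select() is needed: read the cell's Range directly.
        $text = $doc->Tables->Item(1)->Cell(1, 1)->Range->{Text};
        $text =~ s/[\r\a]+\z// if defined $text;    # drop the end-of-cell marker
    }
    $doc->Close(0);    # 0 is wdDoNotSaveChanges
    return $text;
}

sub main {
    my ($root) = @_;    # e.g. C:/Documents and Settings/parent_folder (assumed)
    require Win32::OLE;
    Win32::OLE->Option(Warn => 3);    # croak on any OLE error
    my $word = Win32::OLE->new('Word.Application', 'Quit')
        or die "Cannot start Word: ", Win32::OLE->LastError(), "\n";
    $word->{Visible} = 0;

    opendir my $dh, $root or die "can't opendir $root: $!";
    my @dirs = grep { -d "$root/$_" } readdir $dh;
    closedir $dh;

    for my $dir (sort @dirs) {
        my $lang = language_from_dir($dir);
        next unless defined $lang;    # skips ".", "..", "_redirect", ...
        my $text = read_first_cell($word, "$root/$dir/sample_doc.doc");
        printf "%s: *%s*\n", $lang, defined $text ? $text : '(no table)';
    }
    $word->Quit;
}

# Run only when the root folder is passed on the command line.
main(@ARGV) if @ARGV;
```

Usage (script name assumed): `perl read_cells.pl "C:/Documents and Settings/parent_folder"`.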

Re: WIN32::OLE MS WORD TABLE CELL READING PERL
by wfsp (Abbot) on Jan 25, 2011 at 07:07 UTC
    You tell us your script "does not output what I want", but you don't tell us what the output actually was. That would have been helpful.

    I would first establish whether it is in fact Win32::OLE that is locking up your laptop, and one way to do that is to write your script without it. :-)

    Put your script to one side for a moment and write a short script that demonstrates that you can find the files you want to work with. You refer to a sample_folder at the top of the script with a relative path, which Perl resolves against the current working directory rather than the script's location (where do you run your script from?), and in the loop you look in parent_folder with a full path (a better way to go). Is that correct? If it is, you will need more error checking.
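    A minimal sketch of the relative-path issue, assuming sample_folder is meant to sit one level above the script: a relative path like ../sample_folder is resolved against the current working directory, so the result depends on where perl is started from. The core modules FindBin and File::Spec can anchor it to the script's own directory instead; `path_from_script` is a hypothetical helper.

```perl
use strict;
use warnings;
use File::Spec;
use FindBin qw($Bin);    # $Bin: directory of the running script

# Hypothetical helper: resolve a path relative to the script's own
# directory instead of the current working directory.
sub path_from_script {
    my ($relative) = @_;
    return File::Spec->rel2abs($relative, $Bin);
}

# The first line changes with the directory you start perl from;
# the second does not.
print 'cwd-relative:    ', File::Spec->rel2abs('../sample_folder'), "\n";
print 'script-relative: ', path_from_script('../sample_folder'), "\n";
```

    In the question's script, `opendir` would then be given `path_from_script('../sample_folder')` rather than the bare relative path.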

    Something like the following, which adds print statements, might help you see what the script is actually doing; then adjust if necessary.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $directory = q{C:/Users/john/Documents/sample_folder};

    opendir my $dh, $directory or die qq{can't open *$directory*: $!};
    my @dir_list = grep {/^EN-US_.+/ && -d qq{$directory/$_} } readdir $dh;

    # does this print what you expect?
    print qq{dir_list: $directory\n};
    print qq{$_\n} for @dir_list;
    print qq{*****\n};

    for my $dir (@dir_list){
        my $doc = sprintf(q{%s/%s/sample_doc.doc}, $directory, $dir);

        # are these the right files?
        print qq{$doc\n};
        if (-f $doc){
            print qq{found\n};
        }
        else{
            print qq{not found\n};
        }
    }
    When you're sure you can see the files you need, then perhaps write another short script that uses Win32::OLE to open one of the files and get at the table cells.
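    A sketch of such a short script, assuming a single .doc path is passed on the command line. It opens the document read-only, reports how many tables it has, and prints every cell of every table. Win32::OLE::in (the `in` function the question imports) turns an OLE collection into a Perl list. `dump_tables` and `format_cell` are hypothetical names, and the OLE part runs only on Windows with Word installed.

```perl
use strict;
use warnings;

# Hypothetical helper: format one cell for printing, without Word's
# end-of-cell marker ("\r\a").
sub format_cell {
    my ($table_no, $row, $col, $raw) = @_;
    my $text = defined $raw ? $raw : '';
    $text =~ s/[\r\a]+\z//;
    return sprintf 'table %d, cell (%d,%d): %s', $table_no, $row, $col, $text;
}

# Hypothetical helper: open one document read-only and print every cell.
sub dump_tables {
    my ($path) = @_;
    require Win32::OLE;
    Win32::OLE->Option(Warn => 3);    # croak on any OLE error
    my $word = Win32::OLE->new('Word.Application', 'Quit')
        or die "Cannot start Word: ", Win32::OLE->LastError(), "\n";
    $word->{Visible} = 1;    # visible while experimenting, so any dialog Word raises can be seen
    my $doc = $word->Documents->Open({ FileName => $path, ReadOnly => 1 });
    printf "%d table(s) in %s\n", $doc->Tables->{Count}, $path;

    my $n = 0;
    for my $table (Win32::OLE::in($doc->Tables)) {
        $n++;
        # Iterating Range->Cells visits each existing cell, which tends to be
        # more robust with merged cells than looping over row/column counts.
        for my $cell (Win32::OLE::in($table->Range->Cells)) {
            print format_cell($n, $cell->{RowIndex}, $cell->{ColumnIndex},
                              $cell->Range->{Text}), "\n";
        }
    }
    $doc->Close(0);    # 0 is wdDoNotSaveChanges
    $word->Quit;
}

# Run only when a document path is given on the command line.
dump_tables($ARGV[0]) if @ARGV;
```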

    Get both working nicely and then put them together.

      Thanks for your kind reply. I learned a lot from your script; I actually did not know what qq meant, and now I do.

      Btw, what does sprintf(q{%s/%s...}) mean?
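      For reference: q{...} is Perl's single-quote operator and qq{...} its double-quote (interpolating) counterpart, here with braces as delimiters. sprintf fills each %s placeholder in the format string, in order, with the next argument. The folder name EN-US_fr below is an invented example.

```perl
use strict;
use warnings;

my $directory = 'C:/Users/john/Documents/sample_folder';
my $dir       = 'EN-US_fr';    # example folder name (invented)

# q{%s/%s/sample_doc.doc} is the same string as '%s/%s/sample_doc.doc'.
# sprintf replaces the first %s with $directory and the second with $dir.
my $doc = sprintf(q{%s/%s/sample_doc.doc}, $directory, $dir);

# The same result via interpolation with qq{} (double-quote operator):
my $same = qq{$directory/$dir/sample_doc.doc};

print "$doc\n";    # C:/Users/john/Documents/sample_folder/EN-US_fr/sample_doc.doc
print $doc eq $same ? "identical\n" : "different\n";    # identical
```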

      That said, I had tried a similar approach before my original post to see whether I could open the Word files at all, and it worked: I could see the files open. But thanks for the new knowledge. Any more help is appreciated.

      As for the output you asked about, I simply get an empty, zero-length string.
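      One way to narrow down an empty result is to print exactly what came back. As a point to check rather than a conclusion: an empty Word cell usually returns the two-character end-of-cell marker (chr(13) . chr(7)), not a zero-length string, so a value that is truly empty or undef suggests the OLE call failed or is not reading the cell you expect. This sketch uses sprintf's %vd format, which prints the ordinal of each character joined by dots; `describe_text` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical diagnostic: describe exactly what came back from
# $cell->Range->{Text}, character by character.
sub describe_text {
    my ($text) = @_;
    return 'undef (the OLE call probably failed; check Win32::OLE->LastError)'
        unless defined $text;
    return sprintf 'length %d, ordinals [%vd]', length $text, $text;
}

print describe_text("Hello\r\a"), "\n";    # length 7, ordinals [72.101.108.108.111.13.7]
print describe_text("\r\a"), "\n";         # an empty Word cell: length 2, ordinals [13.7]
print describe_text(''), "\n";             # length 0, ordinals []
```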

        Any news on this?