Using OLE to find MS Word paragraph numbers

by Ray Smith (Beadle)
on Oct 20, 2011 at 11:54 UTC

Ray Smith has asked for the wisdom of the Perl Monks concerning the following question:

I have used Win32::OLE to do some parsing of MS Word files, but I've been unsuccessful in my attempts to get the paragraph numbers from the source file.

The following is a likely example in Visual Basic, but I've not been able to successfully translate it into Perl.

If Selection.Paragraphs(1).Range.ListParagraphs.Count = 1 Then
    MsgBox Selection.Paragraphs(1).Range.ListFormat.ListLevelNumber
Else
    MsgBox "Not a numbered paragraph"
End If

The following is a snippet of my working OLE MS Word access:

sub doc2pt {
    my (
        $docfile,
        $docpt,    # Converted text file
    ) = @_;

    require Win32::OLE;
    require Win32::OLE::Enum;

    # NOTE: Win32::OLE appears to need abs path
    my $document = Win32::OLE -> GetObject(abs_path($docfile));
    die "Can't GetObject($docfile) $!\n" if !defined($document);

    print "Extracting Text ...\n";
    open DOCPT, ">$docpt" or die "Can't open docpt file $docpt: $!";

    my $paragraphs = $document->Paragraphs();
    my $enumerate = new Win32::OLE::Enum($paragraphs);
    while (defined($paragraph = $enumerate->Next())) {
        my $style = $paragraph->{Style}->{NameLocal};
        print DOCPT "style==>$style\n";
###     my $range = $paragraph->{Range};
###     my $count = $range->ListParagraph();
###     if (defined($count)) {
###         my $paranum = $range->ListFormatListLevelNumber();
###         print DOCPT "paranum==>$paranum\n";
###     }
        my $text = $paragraph->{Range}->{Text};
        $text =~ s/[\n\r]//g;
        $text =~ s/\x0b/\n/g;
        print DOCPT "text==>$text\n";
    }
    close DOCPT;
}

I would greatly appreciate any suggestions on how to solve this problem.

Replies are listed 'Best First'.
Re: Using OLE to find MS Word paragraph numbers
by Util (Priest) on Oct 20, 2011 at 14:13 UTC

    1. Add my before $paragraph in while(defined($paragraph=$enumerate->Next())).
    2. Typo: Change ListParagraph to ListParagraphs.
    3. Typo: Change ListFormatListLevelNumber to ListFormat->ListLevelNumber.
    4. You might need ListFormat->ListValue instead of (or in addition to) ListFormat->ListLevelNumber.
    5. To match your VB code, check for $count->{Count} == 1 inside the if (defined $count) {...} block.

    Working, tested code:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Cwd qw( abs_path );
    use Data::Dumper;

    doc2pt( "C:/a/perls/pm/PM_932635_data.doc" );

    sub doc2pt {
        die if @_ != 1;
        my ( $doc_path ) = @_;

        ( my $out_path = $doc_path ) =~ s{\.doc$}{.out}
            or die "doc_path '$doc_path' does not end in '.doc'";

        require Win32::OLE;
        require Win32::OLE::Enum;

        # NOTE: Win32::OLE appears to need abs path
        my $abs_doc_path = abs_path($doc_path);
        my $document = Win32::OLE->GetObject($abs_doc_path)
            or die "Can't GetObject($abs_doc_path) $!\n";

        print "Extracting Text ...\n";
        open my $out_fh, '>', $out_path
            or die "Can't open output file '$out_path': $!";

        my $debug = sub {
            die if @_ != 2;
            my ( $name, $value ) = @_;
            local $Data::Dumper::Useqq = 1;
            local $Data::Dumper::Terse = 1;
            printf {$out_fh} "%-10s ==> %s", $name, Dumper $value;
        };

        my $paragraphs = $document->Paragraphs();
        my $enumerate = Win32::OLE::Enum->new($paragraphs);
        while ( my $paragraph = $enumerate->Next() ) {
            my $style = $paragraph->{Style}->{NameLocal};
            $debug->( style => $style );

            my $range = $paragraph->{Range};
            my $count = $range->ListParagraphs();
            if ( defined $count ) {
                my $real_count = $count->{Count};
                my $paranum = $range->ListFormat->ListLevelNumber();
                my $paraval = $range->ListFormat->ListValue();
                $debug->( real_count => $real_count );
                $debug->( paranum => $paranum );
                $debug->( paraval => $paraval );
            }

            my $text = $paragraph->{Range}->{Text};
            $text =~ tr{\n\r}{}d;
            $text =~ tr{\x0b}{\n};
            $debug->( text => $text );
        }
        close $out_fh;
    }
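
The tested code above dumps the list values for every paragraph. Point 5 of the list (the `Count == 1` check from the VB snippet) could be sketched as a small helper like the one below. This is only an illustration, not part of the original reply: the name `list_level` is hypothetical, and the plain nested hashes stand in for real Word Range objects so that the sketch runs without Word installed. It relies on hash-style property access (`$obj->{Property}`), which the code in this thread already uses for `Style` and `Text`; whether that form behaves identically for `ListParagraphs` and `ListFormat` on a live document is an assumption worth checking.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the list level for a
# paragraph's Range, or undef when the paragraph is not a list paragraph.
# Mirrors the VB test "Range.ListParagraphs.Count = 1".
sub list_level {
    my ($range) = @_;
    my $list_paragraphs = $range->{ListParagraphs};
    return unless defined $list_paragraphs;
    return unless ( $list_paragraphs->{Count} // 0 ) == 1;
    return $range->{ListFormat}{ListLevelNumber};
}

# Stand-ins for Word Range objects, for illustration only.
my $numbered = {
    ListParagraphs => { Count => 1 },
    ListFormat     => { ListLevelNumber => 2 },
};
my $plain = { ListParagraphs => { Count => 0 } };

for my $range ( $numbered, $plain ) {
    my $level = list_level($range);
    print defined($level)
        ? "paranum==>$level\n"
        : "Not a numbered paragraph\n";
}
```

Inside the `while` loop of `doc2pt`, the call would presumably be `list_level( $paragraph->{Range} )`, with `ListValue` read the same way if the item number rather than the nesting level is wanted.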
