rajkrishna89 has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to extract some lines from a .doc file into an Excel sheet.

The format of the document is:

1. Hi,perl monks.Have a good day.
2. Checking whether the output is coming.Perlmonks.org
3. HTML formatting is good. Use while formatting.

I'm splitting the document into text and extracting the lines. The output appears in the Excel sheet like this:

First cell: Hi,perl monks.
Second cell: Have a good day
Third cell: Checking whether the output is coming.
Fourth cell: Perlmonks.org
Fifth cell: HTML formatting is good.
Sixth cell: Use while formatting.

But I need the output to look like this, i.e. each numbered point should be extracted as a whole into a single cell:

First cell: Hi,perl monks. Have a good day
Second cell: Checking whether the output is coming.perlmonks.org
Third cell: HTML formatting is good.Use while formatting.

    @files=glob('*.doc');
    foreach my $file (@files) {
        $i=0;$j=0;
        my $var;
        $var = $filename."$file";
        print $var ;
        my $document = Win32::OLE -> GetObject("$var");
        print "Extracting Text ...\n";
        my @array;
        my $paragraphs = $document->Paragraphs();
        my $enumerate = new Win32::OLE::Enum($paragraphs);
        while(my $paragraph = $enumerate->Next()) {
            my $text = $paragraph->{Range}->{Text};
            $text =~ s/[\n\r\t]//g;
            $text =~ s/\x0B/\n/g;
            $text =~ s/\x07//g;
            chomp $text;
            my $Data .= $text;
            @array=split(/\.$/,$Data);
            foreach my $line( @array) {
                if($line =~ m/^Document/sis/) {
                    $i=1;
                    $j=0;
                    $Sheet->Cells($row,$col-1)->{'Value'} = $file;
                }
                if ($i == 1) {
                    $j=$j+1;
                }
                if($line=~ m/$pattern/) {
                    $s=0;
                }
                if ($j > 1 && $s!=0) {
                    $Sheet-> Cells($row,$col+6)->{'Value'} = $line ;
                    $row=$row+1;
                }
            }
        }

help out monks

Re: Extract lines from document to excel
by Don Coyote (Hermit) on Feb 15, 2012 at 17:24 UTC

    From what I can make out, the problem is that you have split each paragraph into sentences and then placed each sentence into a separate cell.

        my $Data = "Sentence One. Sentence Two. Sentence Three. Sentence Four. Sentence Five.";
        @array=split(/\.$/,$Data);
        print "\n";
        print "$_.\n" foreach @array;

    this prints

        Sentence One. Sentence Two. Sentence Three. Sentence Four. Sentence Five.

    Firstly, consider removing the trailing '$'. It anchors the match to the end of the string, so the line is split only at the last '.'. Without it, the snippet prints the following:

        Sentence One.
         Sentence Two.
         Sentence Three.
         Sentence Four.
         Sentence Five.
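
    To see the effect of the '$' anchor in isolation, here is a minimal sketch that counts the fields each pattern produces. The sample string and the variable names are made up for illustration only.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample text, not taken from the original document.
    my $data = "Sentence One. Sentence Two. Sentence Three.";

    # With '$' the pattern matches only a '.' at the very end of the
    # string, so the whole text comes back as a single field.
    my @anchored = split /\.$/, $data;

    # Without '$' every '.' acts as a separator.
    my @unanchored = split /\./, $data;

    printf "anchored:   %d field(s)\n", scalar @anchored;
    printf "unanchored: %d field(s)\n", scalar @unanchored;
    ```

    In both cases split drops the empty trailing field that follows the final '.'.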

    After this point you may need to concatenate pairs of array elements into another array, whose elements you can then write into the cells as required. Try:

        #! /usr/bin/perl -T
        use strict;
        use warnings;

        my $Data = " Sentence One. Sentence Two. Sentence Three. Sentence Four. Sentence Five";
        #my @array=split(/\./,$Data);
        my @array = split /\./, $Data;
        print "\n";
        print "$_.\n" foreach @array;

        #my $count = scalar int(@array/2);
        my $count = int @array/2;
        my @joinarray;
        for (1..$count){
            my $tmp = shift @array;
            $tmp .= ".".shift @array;
            push @joinarray, $tmp ;
        }
        #if (@array){my $tmp2 = shift @array; push @joinarray, $tmp2};
        push @joinarray, shift @array if @array;
        print "\n";
        print "$_.\n" foreach @joinarray;
        exit 0;

    which prints

        Sentence One. Sentence Two.
        Sentence Three. Sentence Four.
        Sentence Five.

    I cannot see where you open a spreadsheet in this snippet, although it contains statements that write to cells as if a spreadsheet were already open. I have therefore assumed that the error lies in the data manipulation and have set out the solution above.

    Changing the flow in this way raises a couple of issues that you will need to consider:

    • Where '.' are not used to grammatically denote the end of a sentence.
    • Where there are more than two sentences within any of the paragraphs.

    For the full stop problem, you may need to refine the pattern so that it matches the actual structure of a sentence ending, such as a '.' followed by a ' ' (space).
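
    As a rough sketch of that idea, the pattern below splits only where a '.' is followed by whitespace. The sample text is hypothetical, and the heuristic is an assumption about how the sentences are written: it will not split "coming.Perlmonks.org" (no space after the full stop), and it will split wrongly after abbreviations such as "e.g. ".

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample text; note the '.' inside "Perlmonks.org".
    my $text = "Checking whether the output is coming. Visit Perlmonks.org today. Done.";

    # Split on whitespace that directly follows a '.'. The lookbehind is
    # zero-width, so the full stop stays attached to its sentence.
    my @sentences = split /(?<=\.)\s+/, $text;

    print "$_\n" for @sentences;
    ```

    Because the '.' in "Perlmonks.org" is not followed by whitespace, that sentence stays in one piece.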

    For paragraphs with more than two sentences, extend the array counting routine so that it supports paragraphs of variable length.
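
    One possible way to handle variable-length paragraphs, sketched under the assumption that each numbered point arrives as its own paragraph string, is to rejoin all the sentences of a paragraph rather than pairing them two at a time. The @paragraphs data and the @cells array are hypothetical stand-ins for the paragraph text and the spreadsheet cells.

    ```perl
    use strict;
    use warnings;

    # Hypothetical paragraph texts, standing in for the text read from
    # each paragraph of the document.
    my @paragraphs = (
        "First point. It has two sentences.",
        "Second point with one sentence.",
        "Third point.  It has three.   All go in one cell.",
    );

    my @cells;
    for my $para (@paragraphs) {
        # Split into sentences, then rejoin every one of them, however
        # many there are, so one paragraph yields exactly one cell.
        my @sentences = grep { length } split /(?<=\.)\s+/, $para;
        push @cells, join ' ', @sentences;
    }

    print "cell $_: $cells[$_ - 1]\n" for 1 .. @cells;
    ```

    Splitting and rejoining here only normalises the spacing between sentences; if that is not needed, the paragraph text can be written to the cell unsplit.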

    If you need further help, please show what output you are getting, and the part of the programme you think is causing the error, and supply a brief explanation of what it is not doing that you want it to do.

    Coyote

    Closing statements updated as an afterthought.