stallion has asked for the wisdom of the Perl Monks concerning the following question:
Scenario:
The following text is taken from a .doc file:
File: check1.asm
Function: Monks
Tag: No
Tag: 001
Tag: Yes
Tag: 002
File: check2.asm
Function: Perl Monks
Tag: Yes
Tag: 003
Tag: No
Tag: 004
File: check3.asm
Function: Experts
Tag: No
Tag: 005
Tag: No
Tag: 006
Function: Perl Experts
Tag: No
Tag: 007
Tag: Yes
Tag: 008
I have to extract the tags that are marked Yes, together with the corresponding function and file name, into an Excel sheet.
The output has to look like this:
Tags  Function      File
002   Monks         check1.asm
003   Perl Monks    check2.asm
008   Perl Experts  check3.asm
I have written the following snippet to extract the tags that are categorized as Yes:
use strict;
use warnings;
use Win32::OLE;
use Win32::OLE qw(in with);
use Win32::OLE::Variant;
use Win32::OLE::Const 'Microsoft Excel';
use Win32::OLE::Const 'Microsoft Word';
use Cwd;
use File::Find;
use Win32::OLE;
use Win32::OLE::Enum;

$Win32::OLE::Warn = 3; # die on errors...

my $out_file = 'check.xls';
open my $out_fh, '>', $out_file or die "Could not open file $out_file: $!";
my $print_next = 0;

#Globals
our $Word;
our $reviewchklists;
my @scriptfiles;

@scriptfiles=glob('*.doc');
foreach my $file (@scriptfiles)
{
    my $var;
    my $filename = "D\:\\";
    $var = $filename."$file";
    print $var ;
    my $document = Win32::OLE -> GetObject("$var");
    print "Extracting Text ...\n";
    my @array;
    my $paragraphs = $document->Paragraphs();
    my $enumerate = new Win32::OLE::Enum($paragraphs);
    while(my $paragraph = $enumerate->Next())
    {
        my $text = $paragraph->{Range}->{Text};
        $text =~ s/[\n\r\t]//g;
        $text =~ s/\x0B/\n/g;
        $text =~ s/\x07//g;
        chomp $text;
        my $Data .= $text;
        @array=split(/\.$/,$Data);
        foreach my $line( @array)
        {
            if ($print_next)
            {
                print $out_fh $line."\n" ; # we add a "\n" ; #No need to chomp - we print the "\n"
                local $\ = "<br>\n";
                local $/="\n\n";
            }
            $print_next = ($line =~ /^Tag\sYes/);
        }
    }
}
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The above snippet prints the following output:
ID : 002
ID : 003
ID : 008
I don't want the "ID :" prefix to be printed. Also, how do I extract the corresponding function and file name?
Help out monks!!!
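For reference, the parsing step on its own could be sketched roughly as below. This is only a minimal sketch, not a tested solution against a real Word document: it assumes the document text has already been pulled into a plain string (for example via the Paragraphs loop above) and that the fields look exactly like the sample, i.e. `File:`, `Function:`, a `Tag: Yes` or `Tag: No` marker, and then a `Tag:` line carrying the number. The sub name `extract_yes_tags` and the `$sample` variable are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the extracted document text and returns one
# [tag, function, file] row for every tag number that follows "Tag: Yes".
sub extract_yes_tags {
    my ($text) = @_;
    my ($file, $function, $want_next) = ('', '', 0);
    my @rows;

    # Walk over every "Key: value" pair. A value ends where the next key
    # starts (or at the end of the text), so this works whether the fields
    # are on separate lines or run together on one line.
    while ($text =~ /\b(File|Function|Tag):\s*(.*?)\s*(?=\b(?:File|Function|Tag):|\z)/gs) {
        my ($key, $value) = ($1, $2);
        if ($key eq 'File') {
            $file = $value;                      # remember the current file
        }
        elsif ($key eq 'Function') {
            $function = $value;                  # remember the current function
        }
        elsif ($value =~ /^(?:Yes|No)$/i) {
            $want_next = lc($value) eq 'yes';    # marker for the tag that follows
        }
        else {
            push @rows, [$value, $function, $file] if $want_next;
            $want_next = 0;
        }
    }
    return @rows;
}

# Sample text copied from the question (assumed layout).
my $sample = <<'END';
File: check1.asm
Function: Monks
Tag: No
Tag: 001
Tag: Yes
Tag: 002
File: check2.asm
Function: Perl Monks
Tag: Yes
Tag: 003
Tag: No
Tag: 004
File: check3.asm
Function: Experts
Tag: No
Tag: 005
Tag: No
Tag: 006
Function: Perl Experts
Tag: No
Tag: 007
Tag: Yes
Tag: 008
END

# Print tab-separated rows with a header line.
print join("\t", qw(Tags Function File)), "\n";
print join("\t", @$_), "\n" for extract_yes_tags($sample);
```

On the sample text this yields the three rows 002, 003 and 008 with their function and file names. The rows are kept as array references so that they could later be written cell by cell into a worksheet instead of being printed.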
Replies are listed 'Best First'.

Re: Extract Multiple Tags
by Marshall (Canon) on Dec 22, 2011 at 01:49 UTC

Re: Extract Multiple Tags
by Anonymous Monk on Dec 21, 2011 at 14:42 UTC

Re: Extract Multiple Tags
by Cristoforo (Curate) on Dec 22, 2011 at 01:30 UTC