Scenario:

The following text is the content of a .doc file:


File: check1.asm

Function: Monks

Tag: No
Tag: 001

Tag: Yes
Tag: 002

File: check2.asm

Function: Perl Monks

Tag: Yes
Tag: 003

Tag: No
Tag: 004

File: check3.asm

Function: Experts
Tag: No
Tag: 005

Tag: No
Tag: 006

Function: Perl Experts

Tag: No
Tag: 007

Tag: Yes
Tag: 008

I have to extract the tags that are marked Yes, along with the corresponding function and file name, into an Excel sheet.

The output has to look like this:

Tags      Function     File
002       Monks        check1.asm
003       Perl Monks   check2.asm
008       Perl Experts check3.asm

I have written the following snippet to extract the tags that are marked Yes:

use strict;
use warnings;
use Win32::OLE qw(in with);
use Win32::OLE::Variant;
use Win32::OLE::Const 'Microsoft Excel';
use Win32::OLE::Const 'Microsoft Word';
use Win32::OLE::Enum;
use Cwd;
use File::Find;
$Win32::OLE::Warn = 3;                                # die on errors...


my $out_file = 'check.xls';

open my $out_fh, '>', $out_file or die "Could not open file $out_file: $!";

my $print_next = 0;
 
 #Globals
our $Word;
our $reviewchklists;
my @scriptfiles;



@scriptfiles=glob('*.doc');
foreach my $file (@scriptfiles)
{ 
    
    # Build the full path; this assumes the script is run from D:\
    my $var = 'D:\\' . $file;
    print "$var\n";
    my $document = Win32::OLE->GetObject($var);

    print "Extracting Text ...\n";
    my @array;
    my $paragraphs = $document->Paragraphs();
    my $enumerate = Win32::OLE::Enum->new($paragraphs);
    while(my $paragraph = $enumerate->Next())
    {

        my $text = $paragraph->{Range}->{Text}; 
          $text =~ s/[\n\r\t]//g;
          $text =~ s/\x0B/\n/g;
          $text =~ s/\x07//g;
      
         
         chomp $text;
        
        my $Data = $text;
        
        @array=split(/\.$/,$Data);
        
        foreach my $line( @array)
        {
                
              if ($print_next)
              {
                  # No need to chomp: we add the "\n" ourselves
                  print $out_fh $line . "\n";
              }
              # Accept both "Tag Yes" and "Tag: Yes"
              $print_next = ($line =~ /^Tag:?\s*Yes/);
                
        }
 
    }

 }       #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The snippet above prints the following output:

ID : 002
ID : 003
ID : 008

I don't want the "ID :" prefix to be printed. Also, how can I extract the corresponding function and file name?

Help out monks!!!
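
One possible approach, as a minimal sketch: remember the most recent File and Function lines while scanning, and emit a row when a tag number follows a Yes marker. It assumes the paragraph text has already been pulled out of the Word document into plain lines shaped like the sample above. The helper name extract_yes_tags is hypothetical, and accepting an "ID : nnn" line as the tag number is an assumption based on the output shown.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the document text as a list of lines and
# returns one [tag, function, file] row for every tag marked "Yes".
sub extract_yes_tags {
    my @lines = @_;
    my ($file, $function, $want_next) = ('', '', 0);
    my @rows;
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;              # trim surrounding whitespace
        next unless length $line;
        if ($line =~ /^File:\s*(.+)$/) {
            # A new file resets the function context
            ($file, $function, $want_next) = ($1, '', 0);
        }
        elsif ($line =~ /^Function:\s*(.+)$/) {
            ($function, $want_next) = ($1, 0);
        }
        elsif ($line =~ /^Tag:?\s*(Yes|No)$/i) {
            # Remember whether the next tag number should be kept
            $want_next = lc($1) eq 'yes';
        }
        elsif ($line =~ /^(?:Tag|ID)\s*:\s*(\d+)$/) {
            # Assumption: the number arrives as "Tag: 002" or "ID : 002"
            push @rows, [$1, $function, $file] if $want_next;
            $want_next = 0;
        }
    }
    return @rows;
}

# Sample data copied from the question
my @sample = split /\n/, <<'END';
File: check1.asm
Function: Monks
Tag: No
Tag: 001
Tag: Yes
Tag: 002
File: check2.asm
Function: Perl Monks
Tag: Yes
Tag: 003
Tag: No
Tag: 004
File: check3.asm
Function: Experts
Tag: No
Tag: 005
Tag: No
Tag: 006
Function: Perl Experts
Tag: No
Tag: 007
Tag: Yes
Tag: 008
END

printf "%-9s %-12s %s\n", 'Tags', 'Function', 'File';
printf "%-9s %-12s %s\n", @$_ for extract_yes_tags(@sample);
```

With the sample data this gives the three rows for tags 002, 003 and 008. The captured number is stored without any "ID :" prefix, so each row could then be written to worksheet cells instead of being printed.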

In reply to Extract Multiple Tags by stallion
