in reply to How can I find a line in a RTF file?

Just out of curiosity: would the RTF::TEXT::Converter module work instead? I have not used it myself, but I suspect it might be simpler to process the plain-text output from this module.

Different approach... Just a thought...

Update:

I became curious about this, so I tried the following:

#!/usr/bin/perl
use Modern::Perl;
use RTF::TEXT::Converter;

my $string;
my $object = RTF::TEXT::Converter->new(output => \$string);
$object->parse_stream( "/home/wjw/tmp/RTF_test.rtf" );

chomp $string;
my @string = split("\n", $string);
foreach my $line (@string) {
    chomp $line;
    next if $line !~ /\w+/;
    next if $line =~ /^\_+\/\_+\/\_+$/;
    say $line;
}

... which is incomplete and has a leaning-toothpick eyesore in the second regex, but it gets one to the point of only having to deal with the text.
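The leaning toothpicks can be avoided by picking a different regex delimiter. A minimal sketch, assuming the same date-blank layout the regex above targets; the sub name `is_date_blank` and the sample strings are made up for illustration:

```perl
use strict;
use warnings;

# Same pattern as /^\_+\/\_+\/\_+$/, written with m{} delimiters so the
# slashes need no escaping (underscores never need escaping in a regex).
sub is_date_blank {
    my ($line) = @_;
    return $line =~ m{^_+/_+/_+$} ? 1 : 0;
}

# Hypothetical sample lines, not taken from a real document.
my @samples = ( '_____/_____/_____', 'Jane Doe, CEO Date', '__/__/__' );
for my $line (@samples) {
    printf "%-20s %s\n", $line, is_date_blank($line) ? 'skip' : 'keep';
}
```

Any paired delimiter works here; `m{}` is just a common choice when the pattern contains slashes.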

Output is as follows:

Use of uninitialized value $cstylename in string ne at /home/wjw/perl5/perlbrew/perls/perl-5.20.0/lib/site_perl/5.20.0/RTF/Control.pm line 1489, <GEN1> line 16.
Document #: 000 Version #: 1 Document Owner: Someone Date of Last Update: 10/15/2013 Written by: Someone Status: Approved General Description Purpose Definitions Procedures Sign-Off Approvals Jane Doe, CEO Date John Doe, CFO Date

I did not dig into the warning issued. Note that installing the module actually pulls in two distributions (RTF-Parser, which provides it, and its dependency RTF-Tokenizer), one of which I think you were using already.
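If the warning is only noise, one option is to filter that single message around the parse call instead of touching the module. A rough sketch, assuming the warning text matches what is shown above; `noisy_parse` is a hypothetical stand-in for the real `parse_stream` call so the example runs without RTF::Parser installed:

```perl
use strict;
use warnings;

# Stand-in for the parser call. It only emits a warning shaped like the
# one shown above, plus an unrelated one, then returns some text.
sub noisy_parse {
    warn "Use of uninitialized value \$cstylename in string ne at Control.pm line 1489.\n";
    warn "some other warning\n";
    return "parsed text";
}

my @kept;    # warnings that were not filtered out
my $text = do {
    # The handler is local to this block, so warnings elsewhere are untouched.
    local $SIG{__WARN__} = sub {
        my ($msg) = @_;
        return if $msg =~ /^Use of uninitialized value \$cstylename/;
        push @kept, $msg;
    };
    noisy_parse();
};

print "kept: $_" for @kept;
```

Only the one known message is dropped; anything else still surfaces, which is safer than switching warnings off wholesale.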

cpanm RTF::TEXT::Converter
--> Working on RTF::TEXT::Converter
Fetching http://www.cpan.org/authors/id/S/SA/SARGIE/RTF-Parser-1.12.tar.gz ... OK
Configuring RTF-Parser-1.12 ... OK
==> Found dependencies: RTF::Tokenizer
--> Working on RTF::Tokenizer
Fetching http://www.cpan.org/authors/id/S/SA/SARGIE/RTF-Tokenizer-1.18.tar.gz ... OK
Configuring RTF-Tokenizer-1.18 ... OK
Building and testing RTF-Tokenizer-1.18 ... OK
Successfully installed RTF-Tokenizer-1.18
Building and testing RTF-Parser-1.12 ... OK
Successfully installed RTF-Parser-1.12
2 distributions installed

...the majority is always wrong, and always the last to know about it...

Insanity: Doing the same thing over and over again and expecting different results...

A solution is nothing more than a clearly stated problem...otherwise, the problem is not a problem, it is a fact

Re^2: How can I find a line in a RTF file?
by thanos1983 (Parson) on Aug 08, 2014 at 01:55 UTC

    Hello wjw,

    OMG, every day that I read posts here I discover so many new Perl modules that I cannot even keep up. :)

    It looks like a possible solution though, thanks for sharing.

    Seeking for Perl wisdom...on the process...not there...yet!
Re^2: How can I find a line in a RTF file?
by kevyt (Scribe) on Aug 10, 2014 at 08:05 UTC
    I got it to work thanks to everyone's help.
    I placed a # in front of "use warnings" in C:\Perl64\site\lib\RTF\Control.pm so I would not see a screenful of warnings about an uninitialized value in that file.
    I still get 3 warnings in the print statement for use of an uninitialized value.
    Thank you very much!

    Test code:
    #!/usr/bin/perl
    use Modern::Perl;
    use RTF::TEXT::Converter;

    my $string;
    my $object = RTF::TEXT::Converter->new(output => \$string);
    $object->parse_stream( '..\Policies\Test\000144028.rtf' );

    chomp $string;
    my @string = split("\n", $string);
    my $number_of_sigs = 0;
    foreach my $line (@string) {
        chomp $line;
        if ($line =~ m/Date of Last Update:(.*)/){
            my $date = $1;
            $date =~ s/^\s+//;
            my @date_of_last_update = split (/ /, $date);
            print "Date of last update is: $date_of_last_update[0]\n";
        }
        if ($line =~ m/Document #:(.*)/){
            print "\nDocument number is $1\n";
        }
        # Sign-Off Approvals
        if ($line =~/_____\/_____\/_____/){
            # print $line;
            $number_of_sigs ++;
        }
    }
    print "\nnumber of signatures is $number_of_sigs \n";

    Finished Code:
    use strict;
    use warnings;
    use HTTP::Date;
    use File::stat;
    use Time::localtime;
    use Modern::Perl;
    use RTF::TEXT::Converter;

    my $fullpath = my $dir = 'C:\Policies\\';
    my %hash;
    my %months = (
        "Jan" => "1",  "Feb" => "2",  "Mar" => "3",  "Apr" => "4",
        "May" => "5",  "Jun" => "6",  "Jul" => "7",  "Aug" => "8",
        "Sep" => "9",  "Oct" => "10", "Nov" => "11", "Dec" => "12"
    );

    #print "full path is $fullpath \n";
    opendir (DIR, $dir) or die $!;
    #print "dir is $dir \n";
    while (my $dir = readdir(DIR)) {
        next if $dir eq '.' or $dir eq '..';    # skip the self and parent entries
        my $file = $fullpath = 'C:\Policies\\' . $dir;
        next unless -d $file;                   # only descend into directories
        opendir (FILE, $file) or die $!;
        while (my $file = readdir(FILE)){
            # print "7 reading $file\n";
            if (($file =~/RTF/ || $file =~/rtf/ )&& $file !~/^~/ ){
                my $path_file = $fullpath . '\\'. $file;
                #print '\nfullpath = ' . $fullpath . '\\' . $file . '\n';
                my $string;
                my $object = RTF::TEXT::Converter->new(output => \$string);
                $object->parse_stream( $path_file );

                # print ctime(stat($fullpath . '\\' . $file)->mtime);
                my ($wday, $mon, $day, $time, $yr ) = split (/\s+/, ctime(stat($fullpath . '\\' . $file)->mtime) ); # split on space
                # print "\n$wday, $mon, $day, $time, $yr \n";
                # print "file is $file my date is $months{$mon}/$day/$yr\n\n" ;
                $hash{$dir}{$file}{UPLOAD_DATE} = $months{$mon} . "/" . $day . "/" .$yr;

                chomp $string;
                my @string = split("\n", $string);
                my $line_num =0;
                foreach my $line (@string) {
                    $line_num ++;
                    #print "Reading line number $line_num in $file\n";
                    chomp $line;
                    if ($line =~ m/Date of Last Update:(.*)/){
                        my $date = $1;
                        $date =~ s/^\s+//;
                        my @last_update_date = split (/ /, $date);
                        #print "Date of last update is: $last_update_date[0] line number $line_num\n";
                        $hash{$dir}{$file}{LAST_UPDATE_DATE} = $last_update_date[0];
                    }
                    # print "5 line is $line \n";
                    if ($line =~ m/Document #:(.*)/){ # not needed because the file name is the policy number
                        #print "Document number is $1 line number $line_num\n";
                        my $Policy_number = $1;
                        $Policy_number =~ s/\t+/ /g;  # replace all tabs with spaces
                        $Policy_number =~ s/^\s+//;   # remove leading spaces
                        my @arr = split (/ /, $Policy_number );
                        $hash{$dir}{$file}{DOC_NUM_AND_VERSION} = $arr[0];
                        # print "array is @arr and element 0 is $arr[0]\n";
                        #print "line is $line \n";
                        #print "1Path is $fullpath " . "\\" . "$file\n";
                    }
                    # Sign-Off Approvals
                    if ($line =~/_____\/_____\/_____/){
                        #print $line;
                        $hash{$dir}{$file}{NUM_OF_SIGS} ++;
                    }
                }
                #print "$hash{$dir}{$file}{NUM_OF_SIGS}\n";
            }
        }
        closedir(FILE);
        #print "dir is $dir\n";
    }
    closedir(DIR);

    print_data (\%hash);

    sub print_data {
        my ($h_ref) = @_;
        open (OUT, '> C:\Dev\Policy_upload_dates.csv') or die ("Can't open the output file $!");
        print OUT "Department,File_Name,Date_of_Last_Update,Upload_Date,Policy_Number,Number_of_Signatures\n";
        foreach my $dir (keys %$h_ref){
            # dereference explicitly; keys on a bare hash reference is not portable across Perl versions
            foreach my $file (keys %{ $h_ref->{$dir} }){
                printf OUT "%s,%s, %s,%s, %s,%s\n",
                    $dir,
                    $file,
                    $h_ref->{$dir}{$file}{LAST_UPDATE_DATE},
                    $h_ref->{$dir}{$file}{UPLOAD_DATE},
                    $h_ref->{$dir}{$file}{DOC_NUM_AND_VERSION},
                    $h_ref->{$dir}{$file}{NUM_OF_SIGS} ;
            }
        }
        close(OUT);
    }

    One of many input files:
    COMPUTER APPLICATION SELECTION Policy and Procedure Document #: 123.456.789 Version #: 6 Document Owner: Date of Last Update: 04/22/2003 Written by: INFORMATION SYSTEMS Status: Approved General Description Purpose To establish guidelines for computer application selection. Policy GENERAL INFORMATION RESPONSIBILITY Procedure A. Pages of text … DEAPRTMENT – INFORMATION TECHNOLOGY Document Control Revision History Ver. 1 02/01/1996 INITIAL Sign-Off Approvals The person responsible for this document must verify accuracy and that the steps for this procedure or work instruction have been tested and validated. After you have approved this document, please sign and date below. ________________________________________________________________ _____/_____/_____ ________________________________________________________________ _____/_____/_____

    Output:
    Department,File_Name,Date_of_Last_Update,Upload_Date,Policy_Number,Number_of_Signatures
    Test,000777012.rtf, 08/23/2007,3/29/2010, 000.777.012,3
    Test,000777034.rtf, 3/27/2013,6/5/2013, 000.777.034,3
    Test,000777056.rtf, 05/10/2013,6/4/2013, 000.777.056,3
    Test,000777078.rtf, 3/28/2013,6/13/2013, 000.777.078,3
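The upload date in the finished code is built by splitting the `ctime` string and mapping month names through `%months`. A possible shortcut, sketched here under the assumption that the unpadded month/day/year format in the output above is the goal, is to format the list returned by the built-in `localtime` directly; `format_mdy` is a made-up helper name:

```perl
use strict;
use warnings;

# Takes the nine-element list returned by the built-in localtime (or gmtime)
# and returns an unpadded "month/day/year" string.
sub format_mdy {
    my @t = @_;
    # $t[3] is day of month, $t[4] is month 0..11, $t[5] is years since 1900
    return join '/', $t[4] + 1, $t[3], $t[5] + 1900;
}

# Current time as an example; for a file, the modification time from the
# built-in stat, (stat $path)[9], would be passed to localtime instead.
print format_mdy( localtime(time) ), "\n";
```

Note that a script which loads Time::localtime, as the finished code does, gets an object from `localtime` rather than a list, so this sketch would need `CORE::localtime` there.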