in reply to How can I find a line in a RTF file?

I don't know if this will be of much help, but it worked for me and prints "Date of Last Update: 10/15/2013".

You may need a workaround unless the dates are always displayed as "01/01/2014" (10 characters). If a date is "1/1/2014" (8 characters), you will have to change the offset added to the match position to "@-"+1 and read one fewer character as well, which should print "Date of Last Update: 1/1/2014".

    use File::Slurp;

    my $file  = read_file("file.rtf");
    my $match = "Date of Last Update: ";

    if ($file =~ /$match/) {
        open my $fh, "<", "file.rtf" or die "Cannot open file.rtf: $!";
        # "@-" interpolates the offset where the match starts (same as $-[0] here)
        my $startpos = "@-" + 2;            # if date is 8 chars then change to "@-"+1
        seek $fh, $startpos, 0;
        read $fh, my $calender_date, 31;    # if date is 8 chars change to 30
        print $calender_date;
    }

And I'm not sure about the signatures, but hopefully the above will be helpful :)

You can also get the position at which the match happens with @- for the beginning of the match or @+ for the end of the match ($-[0] and $+[0] for the whole match).
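
As a small illustration of these two arrays, the sketch below finds a label in a string and reports where the match starts and ends. The sample text and the helper name match_span are made up for the example; they do not come from the original file.

```perl
use strict;
use warnings;

# Hypothetical helper: return (start, end) offsets of the first match of
# $label in $text, or an empty list when there is no match.
sub match_span {
    my ($text, $label) = @_;
    return unless $text =~ /\Q$label\E/;
    # $-[0] is the offset of the first matched character,
    # $+[0] is the offset just past the last matched character.
    return ($-[0], $+[0]);
}

# Made-up sample text standing in for the slurped file contents.
my $sample = "Header line\nDate of Last Update: 10/15/2013\nFooter\n";
my $label  = "Date of Last Update: ";

if (my ($start, $end) = match_span($sample, $label)) {
    print "match starts at offset $start and ends at offset $end\n";
    # In this sample the 10 characters after the match are the date.
    print "date: ", substr($sample, $end, 10), "\n";
}
```

The \Q...\E around the label quotes it, so characters such as ":" are matched literally rather than as regex syntax.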


Never mind the above <.<
Here is working code that will get the exact info you want.

    use File::Slurp;

    my $file   = read_file("1.rtf");
    my $match0 = "Date of Last Update: ";
    my $match1 = "Sign-Off Approvals";
    my $match2 = "Jane Doe, CEO";

    if ($file =~ /$match0/) {
        open my $fh, "<", "1.rtf" or die "Cannot open 1.rtf: $!";
        my $startpos0 = "@-";    # offset where the match starts
        seek $fh, $startpos0 + 2, 0;
        read $fh, my $calender_date, 29;
        if ($calender_date =~ /\\/) {
            # a backslash was read, so re-read with the longer length
            seek $fh, $startpos0 + 2, 0;
            read $fh, $calender_date, 31;
        }
        print "\n";
        print "$calender_date\n\n";
    }

    if ($file =~ /$match1/) {
        open my $fh, "<", "1.rtf" or die "Cannot open 1.rtf: $!";
        my $startpos1 = "@+";    # offset just past the end of the match
        seek $fh, $startpos1 + 42, 0;
        read $fh, my $first_sig, 66;
        print "$first_sig\n";
        print "Jane Doe\n\n";
    }

    if ($file =~ /$match2/) {
        open my $fh, "<", "1.rtf" or die "Cannot open 1.rtf: $!";
        my $startpos2 = "@+";
        seek $fh, $startpos2 + 82, 0;
        read $fh, my $second_sig, 67;
        print "$second_sig\n";
        print "John Doe\n\n";
    }

prints:

    C:\Users\guy\Desktop\tests>test.pl

    Date of Last Update: 10/15/2013

    ______________________________________________________/_____/_____
    Jane Doe

    ______________________________________________________/_____/_____
    John Doe

This simply pattern matches, then seeks to an offset relative to the match.
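
Here is a minimal, self-contained sketch of that match-then-seek idea, in case it helps to see it in isolation. It assumes the file is read as raw bytes both times, so that match offsets and file offsets line up, and it adds no extra offset because the made-up sample has the date directly after the label. The helper name read_after_match and the sample content are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return $length bytes that follow the first
# occurrence of $label in the file at $path (undef if not found).
sub read_after_match {
    my ($path, $label, $length) = @_;

    # Slurp the file as raw bytes so match offsets equal file offsets.
    open(my $in, '<:raw', $path) or die "Cannot open $path: $!";
    my $content = do { local $/; <$in> };
    close $in;

    return undef unless $content =~ /\Q$label\E/;
    my $end_of_match = $+[0];    # offset just past the matched label

    # Re-open the file and jump straight to the end of the match.
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    seek($fh, $end_of_match, 0) or die "seek failed: $!";
    read($fh, my $chunk, $length);
    close $fh;
    return $chunk;
}

# Made-up sample file, written only for this demonstration.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "{\\rtf1 Date of Last Update: 10/15/2013\\par }";
close $tmp_fh;

print read_after_match($tmp_path, "Date of Last Update: ", 10), "\n";
```

With a real document the fixed offsets (such as the +42 and +82 above) still have to be worked out against the actual file contents.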

You should also be able to do something like this (with $first_sig and $second_sig declared outside the if blocks so they are still in scope):

    # 54 underscores, as in the 66-character signature line read above
    my $blank = ("_" x 54) . "/_____/_____";

    if (index($first_sig, $blank) >= 0) {
        print "There is no first signature\n";
    }
    else {
        print "Document has a first signature\n";
    }

    if (index($second_sig, $blank) >= 0) {
        print "There is no second signature\n";
    }
    else {
        print "Document has a second signature\n";
    }

The working code above also handles the case where your date is only 8 characters, e.g. 1/1/2013. If that never happens, then don't worry about it.
This is also based on the example you provided and will only work if every RTF file you're editing is formatted identically to it. Also, FYI, if you open this or any document in a hex editor, you will see the formatting of the file and can manually parse it to get whatever data you want.
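
For the 8-versus-10-character date issue, a regex capture might be simpler than fixed-length reads. The sketch below is only an illustration under an assumption: that the date follows the label directly in the text being searched, with nothing but whitespace in between (which may not hold in raw RTF if control words sit between them). The helper name extract_update_date is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: return the date that follows the label, or undef
# when the label (or a date after it) is not found.
# Accepts 1-2 digit month and day and a 2-4 digit year.
sub extract_update_date {
    my ($text) = @_;
    return $text =~ m{Date of Last Update:\s*(\d{1,2}/\d{1,2}/\d{2,4})} ? $1 : undef;
}

# Made-up sample lines covering the date widths discussed above.
for my $sample ("Date of Last Update: 10/15/2013",
                "Date of Last Update: 1/1/2014",
                "Date of Last Update: 1/1/01") {
    print extract_update_date($sample), "\n";
}
```

Because the capture adapts to the width of the date, no read length needs to change between "01/01/2014" and "1/1/2014".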

Replies are listed 'Best First'.
Re^2: How can I find a line in a RTF file?
by kevyt (Scribe) on Aug 09, 2014 at 18:43 UTC
    Thanks. I almost have this working. I don't know how the index works in the code above. I need to do more searching for @- and @+.

    I'm scanning the documents on Windows because the dates will change if I transfer them to Linux. The documents might also have several pages of text between the "Date of Last Update" and the signatures. I'm on a team of a few people trying to determine why policies take so long to get signed. I have not used Perl extensively for several years. Thanks for your help. I'll update the page with the code and 2 sample files when I am done.

    #!/usr/bin/perl
    use Modern::Perl;
    use RTF::TEXT::Converter;

    my $string;
    my $object = RTF::TEXT::Converter->new(output => \$string);
    $object->parse_stream( '..\Policies\Test\000144027.rtf' );
    chomp $string;
    my @string = split("\n", $string);
    my $number_of_sigs = 0;

    foreach my $line (@string) {
        chomp $line;
        if ($line =~ m/Date of Last Update:(.*)/){
            my $date_of_last_update = $1;
            $date_of_last_update =~ s/^\s+//;
            print "Date of last update is: $date_of_last_update\n";
        }
        if ($line =~ m/Document #:(.*)/){
            print "\nDocument number is $1\n";
        }
        # Sign-Off Approvals
        if ($line =~/'Sign-Off Approvals'/){
            print $line;
            $number_of_sigs ++;
        }
        # next;
        #next if $line !~ /\w+/;
        #next if $line =~ /^\_+\/\_+\/\_+$/;
        #say $line;
    }
    print "number of signatures is $number_of_sigs \n";
      Honestly, it shouldn't matter if there are 400 pages between the date and the signatures, as long as they are all in the same document and there is never another instance of the matched text. Also, with the code I posted, it shouldn't matter if the dates change at all. They can be 01/01/0001 or 1/1/0001, and you can add an extra elsif to handle dates like 1/1/01 as well.
      It's just a simple pattern match; the code then seeks a fixed number of characters from the beginning or the end of the match. I used @- and @+ in the code I posted to show you how they are used. If you want to see how it works, open the file in the HxD editor, look at the text side (right side), and compare with the script.
Re^2: How can I find a line in a RTF file?
by kevyt (Scribe) on Aug 09, 2014 at 08:25 UTC
    Wow. Thanks for all of the replies. I was offline for most of the day but I'll read / try these and let you know how it worked! Thanks so much. Kevin