how to read big postscript files

srikrishnan has asked for the wisdom of the Perl Monks concerning the following question:

Re: how to read big postscript files by Corion (Patriarch) on May 25, 2010 at 07:00 UTC
Perl has no problem reading big files, so it must be something your program does with the input that makes it slow. As you don't show any code or input data, it is quite hard for us to help you in a more concrete fashion. Consider testing whether the files open quickly and correctly in other programs. The embedded TIFF images are likely dumped directly as compressed binary data rather than as ASCII data, so reading the file line by line will likely not work.
Re^2: how to read big postscript files by srikrishnan (Beadle) on May 25, 2010 at 08:53 UTC
Hi Corion,

Thanks for your response. Below I have pasted my code:

    use strict;
    use warnings;
    use Cwd;

    my $filename;
    my $filepath;
    if($ARGV[0]=~m/((.*)[\\\/])?(.*?)\.ps$/i)
    {
        $filename=$3;
        if(defined($1))
        {
            $filepath=$1;
        }
        else
        {
            $filepath=cwd();
            $filepath=~s!/!\\!gi;
            $filepath.="\\";
        }
    }
    else
    {
        Win32::MsgBox("Incorrect argument, Please check", 0, "");
        exit;
    }

    open(F1, "$ARGV[0]") or Win32::MsgBox("Input File cannot be opened", 16, "Error Message");
    undef $/;
    my $line = <F1>;
    close F1;

    my @imgrem;
    my $imgno = 0;
    while($line =~ s/\n\%\%BeginObject\: image(.*?)\n\%\%EndObject/<img$imgno>/msi)
    {
        my $tmp = $&;
        push(@imgrem, $tmp);
        $imgno++;
    }

    $line =~ s/\(\\266\)D r\n/D r\n/msgi;

    while($line =~ m/\[\/Action \<\< \/Subtype \/URI \/URI \((.+?)\) \>\> \/Rect \[(\d+) (\d+) (\d+) (\d+)\] \/Border \[0 0 0\] \/LNK pdfmark\n/gi)
    {
        my $temp = "$&";
        my $contents = $1;
        my $originalcontents = $contents;
        my $x1 = $2;
        my $y1 = $3;
        my $x2 = $4;
        my $y2 = $5;
        $y1 = $y1 - 100;
        if($contents !~ /^(http|www|mailto)/i)
        {
            $contents =~ s/–/\-/gi;
            $contents =~ s/=/\=/gi;
            $contents =~ s/&percnt;/\\%/gi;
            $contents =~ s/&ast;/\*/gi;
            $contents =~ s/&(l|r)squo;/\'/gi;
            $line =~ s/\[\/Action \<\< \/Subtype \/URI \/URI \((.+?)\) \>\> \/Rect \[(\d+) (\d+) (\d+) (\d+)\] \/Border \[0 0 0\] \/LNK pdfmark\n/\[\/Action \<\< \/Subtype \/Caret \/Contents \($contents\) \/Rect \[$x1 $y1 $x2 $y2\] \/Title \(Original Text\) \/Subj \(Inserted Text\) \/Border \[0 0 0\] \/Color \[0 0 1\] \/ANN pdfmark\n/i;
        }
    }

    #while($line =~ s/<img([0-9]+)>/$imgrem[$1]/si){};

    open(F2, ">$filepath$filename-out.ps");
    print F2 $line;
    close F2;
    print "\n\nEnd time ", time() - $^T;

The above code runs successfully on files up to a certain size; for example, it runs on a 100 MB file.

Thanks,
srikrishnan R.
Re^3: how to read big postscript files by Corion (Patriarch) on May 25, 2010 at 09:02 UTC
So you don't have a problem with reading large PostScript files, you have a problem with processing them. It might be easier on your machine if you didn't process the whole file in one go. For example, you could write the images to disk instead of keeping them around in memory. You could also do the replacements on parts of the file instead of on the whole file at once. I'm also quite unclear what the replacement loop is supposed to be doing, but maybe you can rewrite that code using `/ge` from perlre. It seems to make heavy use of `$&`, which tends to be slow.
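A minimal sketch of the `/ge` idea, assuming the link lines look exactly like the pattern in the posted script. The names `convert_link` and `convert_links` are hypothetical, and the entity clean-up from the original loop is left out for brevity. Each link is rewritten where it is found, in a single pass, without `$&`:

```perl
use strict;
use warnings;

# Hypothetical helper: build the Caret annotation for one matched link.
# Web-like targets (http, www, mailto) are returned unchanged.
sub convert_link {
    my ($whole, $contents, $x1, $y1, $x2, $y2) = @_;
    return $whole if $contents =~ /^(?:http|www|mailto)/i;
    $y1 -= 100;    # same vertical shift as in the posted script
    return "[/Action << /Subtype /Caret /Contents ($contents)"
         . " /Rect [$x1 $y1 $x2 $y2] /Title (Original Text)"
         . " /Subj (Inserted Text) /Border [0 0 0] /Color [0 0 1] /ANN pdfmark\n";
}

# Hypothetical wrapper: one pass over the text, replacing each link in place.
sub convert_links {
    my ($text) = @_;
    $text =~ s{
        (                               # capture 1: the whole pdfmark line
          \[/Action\ <<\ /Subtype\ /URI\ /URI\ \((.+?)\)\ >>
          \ /Rect\ \[(\d+)\ (\d+)\ (\d+)\ (\d+)\]
          \ /Border\ \[0\ 0\ 0\]\ /LNK\ pdfmark\n
        )
    }{ convert_link($1, $2, $3, $4, $5, $6) }gex;
    return $text;
}
```

Because the replacement is computed per match, the coordinates and contents always belong to the link being replaced, which the `while` plus `s///` combination does not guarantee.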
Re: how to read big postscript files by dineed (Scribe) on May 25, 2010 at 07:53 UTC
I don't think I've worked with a PostScript file, so apologies if the below doesn't make sense. Is there some marker that indicates the beginning and/or end of the TIFF image? If so, then perhaps you can test for the marker(s) and skip the record containing the TIFF image. It sounds to me like you need to either strip out the TIFF images before entering your read loop (using a regex) or account for the binary data within your read loop. You might even be able to respond to the binary data itself. I did something similar a year or two ago with hex data on a *nix system.
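A rough sketch of the marker approach, assuming each image sits between a line starting with `%%BeginObject: image` and a line starting with `%%EndObject` (the markers that appear in the script posted elsewhere in this thread). `filter_postscript` is a hypothetical name, and the sketch assumes the binary data never contains a line that itself starts with `%%EndObject`. Image objects are copied through untouched, and only the remaining lines are handed to the edit callback, so the whole file never has to sit in memory:

```perl
use strict;
use warnings;

# Hypothetical filter: copy $in to $out, passing image objects through
# unchanged and applying $edit to every other line.
sub filter_postscript {
    my ($in, $out, $edit) = @_;
    binmode $in;     # no newline translation, so binary data survives
    binmode $out;
    my $in_image = 0;
    while (my $line = <$in>) {
        if (!$in_image && $line =~ /^%%BeginObject: image/i) {
            $in_image = 1;
        }
        if ($in_image) {
            print {$out} $line;                        # copy image data as is
            $in_image = 0 if $line =~ /^%%EndObject/;  # end of the image object
            next;
        }
        print {$out} $edit->($line);
    }
    return;
}
```

It could be called with two filehandles opened on the input and output files and a code reference that performs the link replacement on a single line.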
Re^2: how to read big postscript files by srikrishnan (Beadle) on May 25, 2010 at 10:06 UTC
Hi Corion/Dineed,

Thanks for your immediate responses. Our actual requirement is to change the URI links to annotations throughout the PostScript file, because the software used to create the PostScript does not support annotated PostScript files natively. So we are trying to achieve that by writing a Perl script that reads and modifies the original PostScript file.

Thanks,
Srikrishnan R.