in reply to how to read big postscript files

Perl has no problem reading big files, so it must be something your program does with the input that makes it slow. As you don't show any code or input data, it is quite hard for us to help you in a more concrete fashion. Consider testing whether the files open quickly and correctly in other programs. The embedded TIFF images are likely dumped directly as compressed binary data rather than as ASCII data, so reading the file line by line will likely not work.
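
For example, a minimal sketch of reading a file in fixed-size binary chunks instead of line by line (the file name and chunk size here are placeholders, not anything from your program):

use strict;
use warnings;

open(my $fh, '<', 'big.ps') or die "Cannot open big.ps: $!";
binmode($fh);    # the embedded images are binary, so avoid any line-ending translation

my $buf;
while (read($fh, $buf, 1024 * 1024)) {   # 1 MB per chunk
    # process $buf here
}
close($fh);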

Re^2: how to read big postscript files
by srikrishnan (Beadle) on May 25, 2010 at 08:53 UTC

    Hi Corion,

    Thanks for your response.

    Below I have pasted my code:

    use strict;
    use warnings;
    use Cwd;

    # Split the argument into directory and basename of the .ps file
    my $filename;
    my $filepath;
    if ($ARGV[0] =~ m/((.*)[\\\/])?(.*?)\.ps$/i) {
        $filename = $3;
        if (defined($1)) {
            $filepath = $1;
        }
        else {
            $filepath = cwd();
            $filepath =~ s!/!\\!gi;
            $filepath .= "\\";
        }
    }
    else {
        Win32::MsgBox("Incorrect argument, Please check", 0, "");
        exit;
    }

    # Slurp the whole file into memory
    open(F1, "$ARGV[0]")
        or Win32::MsgBox("Input File cannot be opened", 16, "Error Message");
    undef $/;
    my $line = <F1>;
    close F1;

    # Cut the embedded image objects out and replace them with <imgN> placeholders
    my @imgrem;
    my $imgno = 0;
    while ($line =~ s/\n\%\%BeginObject\: image(.*?)\n\%\%EndObject/<img$imgno>/msi) {
        my $tmp = $&;
        push(@imgrem, $tmp);
        $imgno++;
    }

    $line =~ s/\(\\266\)D r\n/\(\)D r\n/msgi;

    # Turn internal URI pdfmarks into Caret annotations
    while ($line =~ m/\[\/Action \<\< \/Subtype \/URI \/URI \((.+?)\) \>\> \/Rect \[(\d+) (\d+) (\d+) (\d+)\] \/Border \[0 0 0\] \/LNK pdfmark\n/gi) {
        my $temp = "$&";
        my $contents = $1;
        my $originalcontents = $contents;
        my $x1 = $2;
        my $y1 = $3;
        my $x2 = $4;
        my $y2 = $5;
        $y1 = $y1 - 100;
        if ($contents !~ /^(http|www|mailto)/i) {
            $contents =~ s/&ndash;/\-/gi;
            $contents =~ s/&equals;/\=/gi;
            $contents =~ s/&percnt;/\\%/gi;
            $contents =~ s/&ast;/\*/gi;
            $contents =~ s/&(l|r)squo;/\'/gi;
            $line =~ s/\[\/Action \<\< \/Subtype \/URI \/URI \((.+?)\) \>\> \/Rect \[(\d+) (\d+) (\d+) (\d+)\] \/Border \[0 0 0\] \/LNK pdfmark\n/\[\/Action \<\< \/Subtype \/Caret \/Contents \($contents\) \/Rect \[$x1 $y1 $x2 $y2\] \/Title \(Original Text\) \/Subj \(Inserted Text\) \/Border \[0 0 0\] \/Color \[0 0 1\] \/ANN pdfmark\n/i;
        }
    }

    # Put the images back (currently disabled)
    #while ($line =~ s/<img([0-9]+)>/$imgrem[$1]/si) {};

    open(F2, ">$filepath$filename-out.ps");
    print F2 $line;
    close F2;
    print "\n\nEnd time ", time() - $^T;

    The above code runs successfully on files up to a certain size; for example, it works on a 100 MB file.

    Thanks

    srikrishnan R.

      So you don't have a problem with reading large PostScript files; you have a problem with processing them.

      Maybe it would be less hard on your machine if you didn't process the whole file in one go. For example, you could write the images to disk instead of keeping them around in memory, as in the sketch below. Also, you could do the replacements on parts of the file instead of on the whole file at once.
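
      Rough, untested sketch, reusing $line, $filepath and $filename from your script; the "-imgN.tmp" naming is just something I made up:

      my $imgno = 0;
      while ($line =~ s/(\n%%BeginObject: image.*?\n%%EndObject)/<img$imgno>/si) {
          # spool the image object straight to a temp file instead of
          # pushing it onto @imgrem
          open(my $img, '>', "$filepath$filename-img$imgno.tmp")
              or die "Cannot write image part $imgno: $!";
          binmode($img);    # image data is binary
          print $img $1;
          close($img);
          $imgno++;
      }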

      Also, I'm not quite clear on what the replacement loop is supposed to be doing, but maybe you can rewrite that code using /ge from perlre. It also makes heavy use of $&, which tends to be slow: once $& appears anywhere in a program, Perl makes an extra copy of the matched text for every pattern match (see perlvar).
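
      Untested, but roughly like this: one s///ge pass that builds each replacement in a sub (make_annotation is a name I made up), so there is no second match against $line and no $&:

      $line =~ s{(\[/Action << /Subtype /URI /URI \((.+?)\) >> /Rect \[(\d+) (\d+) (\d+) (\d+)\] /Border \[0 0 0\] /LNK pdfmark\n)}
                {make_annotation($1, $2, $3, $4, $5, $6)}gie;

      sub make_annotation {
          my ($whole, $contents, $x1, $y1, $x2, $y2) = @_;
          # leave external links untouched
          return $whole if $contents =~ /^(http|www|mailto)/i;
          $y1 -= 100;
          $contents =~ s/&ndash;/\-/gi;
          $contents =~ s/&equals;/\=/gi;
          $contents =~ s/&percnt;/\\%/gi;
          $contents =~ s/&ast;/\*/gi;
          $contents =~ s/&(l|r)squo;/\'/gi;
          return "[/Action << /Subtype /Caret /Contents ($contents)"
               . " /Rect [$x1 $y1 $x2 $y2] /Title (Original Text)"
               . " /Subj (Inserted Text) /Border [0 0 0]"
               . " /Color [0 0 1] /ANN pdfmark\n";
      }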