in reply to use of Backticks to catch console output

G'day Takamoto,

[I created a quick test.pdf, with just the text "Test for PM 11148715", for the tests below. I don't have p2t, but I do have pdftotext, which appears to have the same functionality and accepts the same options.]

You appear to have got bogged down in absolute vs. relative paths and eval code.

If your p2t is in a directory listed in $PATH, you don't technically need a path at all; however, using an absolute path avoids tainting.

If all you want to do is print the PDF text, you can use system() or backticks like one of these:

$ perl -e 'system "pdftotext -nodiag -layout -enc UTF-8 test.pdf -"'
Test for PM 11148715
$ perl -e 'print `pdftotext -nodiag -layout -enc UTF-8 test.pdf -`'
Test for PM 11148715
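If you need the text in a variable rather than printed straight away, backticks return it, and $? tells you whether the command succeeded. The following is only a minimal sketch: run_capture() is a hypothetical helper name, and the running perl ($^X) stands in for pdftotext so that the script works without any PDF tools installed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: run a command string with backticks and
# return its STDOUT, or die with a reason if the command failed.
sub run_capture {
    my ($cmd) = @_;
    my $out = `$cmd`;
    if ($? == -1) {
        die "Failed to run '$cmd': $!\n";
    }
    elsif ($? & 127) {
        die sprintf "'%s' died with signal %d\n", $cmd, $? & 127;
    }
    elsif ($? >> 8) {
        die sprintf "'%s' exited with status %d\n", $cmd, $? >> 8;
    }
    return $out;
}

# Stand-in for the pdftotext call: the running perl prints one line.
my $text = run_capture(qq{"$^X" -le "print 42"});
print $text;
```

In real use, the command string would be the pdftotext line shown above; the status checks stay the same.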

If you want something a little more robust, that avoids the overhead of using the shell, consider capturex() from IPC::System::Simple. Here's an example (p2t_capturex.pl):

#!/usr/bin/env perl

use strict;
use warnings;

use IPC::System::Simple 'capturex';

my $p2t_exe = 'C:/cygwin64/bin/pdftotext.exe';
my $pdf_doc = 'test.pdf';

print capturex(
    $p2t_exe => qw{-nodiag -layout -enc UTF-8}, $pdf_doc, '-'
);

You get the same output as before:

$ ./p2t_capturex.pl
Test for PM 11148715
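To illustrate what bypassing the shell buys you, here is a rough sketch using only core Perl: a list-form piped open, which on *NIX hands the arguments to the program without a shell in between. This is a stand-in for the idea, not how capturex() is implemented; capture_list() is a hypothetical name, and the running perl ($^X) plays the role of the external program. On Windows a program receives a single command-line string, so quoting still comes into play there and the result may differ.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: run a command given as a list (program, then
# arguments) and return everything it writes to STDOUT.
# Pass at least one argument after the program name so that the
# list form of open is used.
sub capture_list {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "Cannot run '$cmd[0]': $!\n";
    local $/;    # slurp mode: read all output in one go
    my $out = <$fh>;
    close($fh)
        or die "'$cmd[0]' failed: " . ($! ? $! : "wait status $?") . "\n";
    return $out;
}

# An argument full of characters a shell would treat specially.
my $tricky = 'a b; echo $HOME "quoted"';

# The child perl simply prints its first argument back.
my $echoed = capture_list($^X, '-e', 'print $ARGV[0]', $tricky);

print $echoed eq $tricky
    ? "argument arrived verbatim\n"
    : "argument was altered: $echoed\n";
```

On a *NIX system I would expect the first message, because no shell ever sees the semicolon, the dollar sign or the quotes. The single-string forms of system and backticks would need careful quoting to achieve the same.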

In case you were wondering, that's the same pdftotext program throughout. Note the identical inode numbers:

$ ls -i1 `which pdftotext` C:/cygwin64/bin/pdftotext.exe
844424931368301 /usr/bin/pdftotext
844424931368301 C:/cygwin64/bin/pdftotext.exe

— Ken

Re^2: use of Backticks to catch console output
by haukex (Archbishop) on Dec 11, 2022 at 10:26 UTC
    If you want something a little more robust, that avoids the overhead of using the shell, consider capturex() from IPC::System::Simple.

    It is indeed much more robust and a good suggestion! Just a small nitpick: on Windows it is almost impossible to avoid the shell*. IPC::System::Simple works around this by using Win32::ShellQuote under the hood.

    * Update: See my clarification further down in the thread.

      G'day haukex,

      "Just a small nitpick: on Windows it is almost impossible to avoid the shell."

      I checked out that module's documentation and code not so long ago, and I've just checked again. Both are still very clear that capturex() does not invoke the shell. Here's a selection of extracts (non-exhaustive):

      From SYNOPSIS:

      # As above, but NEVER invokes the shell.
      my $output = capturex("some_command", @args);

      From source, starting at Line 361 (note the "NO_SHELL"):

      # capturex() is just like backticks/qx, but never invokes the shell.
      sub capturex {
          ...
          if (WINDOWS) {
              return _win32_capture(NO_SHELL, $valid_returns, $command, @args);
          }

      Are you possibly confusing capturex() with capture()? Same source, starting at Line 220 (note the "USE_SHELL"):

      # capture is our way of running a process with backticks/qx semantics
      sub capture {
          ...
          if (WINDOWS) {
              # USE_SHELL really means "You may use the shell if you need it."
              return _win32_capture(USE_SHELL, $valid_returns, $command);
          }

      If not a case of confusion, do you think the documentation, code, or something else, is wrong?

      All links and extracts are from the IPC-System-Simple-1.30 distribution (released "Mar 24, 2020").

      — Ken

        From source, ...

        You need to trace the source a little further and look into _win32_capture, where you'll see that Win32::ShellQuote gets called no matter what the value of $use_shell is.

        Both are still very clear that capturex() does not invoke the shell.

        In my experience the documentation of such modules is usually tailored to *NIX. Either that or people equate using Win32::ShellQuote with "avoiding the shell"*, which isn't quite accurate, and the module does have some edge cases that mean it's not the same as execvp on *NIX - like I said, I was being nitpicky :-)

        Have a look at Re^2: Having to manually escape quote character in args to "system"? for more details on calling commands on Windows and why argument quoting is such an issue there.

        * Update: See my clarification further down in the thread.