Takamoto has asked for the wisdom of the Perl Monks concerning the following question:

I want to capture the output of the following system() call into a variable:

use strict;
use warnings;

my $p2tExe = '/Desktop/p2t';
my $PathDocumentUnicode = "test.pdf";
my $converted_text = "";

eval {
    $converted_text = `/Desktop/p2t -nodiag -layout -enc UTF-8 $PathDocumentUnicode -`
};
print $converted_text;

p2t is the executable of the fantastic pdftotext (XpdfReader) suite. I read that system() doesn't return the output, just the exit status, and that I should use backticks instead. However, the following does not work (Can't exec "/Desktop/p2t": No such file or directory at pdftotext.pl line 9.):

eval { $converted_text = `$p2tExe, "-nodiag", "-layout", "-enc", "UTF-8", "$PathDocumentUnicode", "-"` };

Replies are listed 'Best First'.
Re: use of Backticks to catch console output
by davies (Monsignor) on Dec 10, 2022 at 15:55 UTC

    On Windows:

    C:\Windows\system32>pwd
    /c/Windows/system32
    C:\Windows\system32>perl -E"say `pwd`"
    /c/Windows/system32
    C:\Windows\system32>perl -E"say eval{`pwd`}"
    /c/Windows/system32
    C:\Windows\system32>perl -E"say eval{'pwd'}"
    pwd
    C:\Windows\system32>perl -E"say system('pwd')"
    /c/Windows/system32
    0

    On Linux (Raspbian):

    dr@mail:~ $ pwd
    /home/dr
    dr@mail:~ $ perl -E'say `pwd`'
    /home/dr
    dr@mail:~ $ perl -E'say eval{`pwd`}'
    /home/dr
    dr@mail:~ $ perl -E'say eval{"pwd"}'
    pwd
    dr@mail:~ $ perl -E'say system("pwd")'
    /home/dr
    0

    I think you are confusing yourself by combining eval with backticks. Backticks alone should work, as demonstrated above. I prefer qx() to backticks as I find it clearer, but the effect is identical and a matter of preference. Your error message suggests that the executable is either not where you expect or not named what you think, but that's a different problem. Try the examples I have given with your executable and you should end up on the right road.
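    A minimal sketch of the backticks-only approach, written with qx{} and without eval. It uses $^X (the perl binary that is currently running) as a stand-in for p2t so that it runs anywhere; that substitution and the helper name run_and_capture are illustrative assumptions, not part of the original post. Checking $? afterwards is what makes a failed command visible, which wrapping the backticks in eval does not do.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: run a command line through qx{} and return
    # its output together with the exit code.
    sub run_and_capture {
        my ($command_line) = @_;
        my $output = qx{$command_line};    # same as `$command_line`

        # In scalar context qx returns undef if the command could not be
        # started; $? is -1 in that case.
        return (undef, -1) if !defined $output || $? == -1;

        return ($output, $? >> 8);         # high byte of $? is the exit code
    }

    # Stand-in for something like:
    #   /path/to/p2t -nodiag -layout -enc UTF-8 test.pdf -
    my ($text, $exit) = run_and_capture(qq{"$^X" -e "print 42"});

    print defined $text
        ? "exit $exit, output: $text\n"
        : "command could not be started\n";
    ```

    With the real executable, a non-zero exit code or an undefined result would point at the path or the arguments rather than at the capture mechanism.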

    Regards,

    John Davies

    Almost immediate update as you solved your problem while I was composing my suggestions: it looks as though the executable was not where you expected. The leading two dots are taking you to a relative directory rather than the absolute one you specified originally.

Re: use of Backticks to catch console output
by kcott (Archbishop) on Dec 10, 2022 at 21:49 UTC

    G'day Takamoto,

    [I created a quick test.pdf, with just the text "Test for PM 11148715", for the tests below. I don't have p2t, but I do have pdftotext, which appears to have the same functionality and accepts the same options.]

    You appear to have got bogged down in absolute vs. relative paths and eval code.

    If your p2t is in a directory listed in $PATH, you don't technically need a path at all; however, using an absolute path avoids tainting.

    If all you want to do is print the PDF text, you can use system() or backticks like one of these:

    $ perl -e 'system "pdftotext -nodiag -layout -enc UTF-8 test.pdf -"'
    Test for PM 11148715
    $ perl -e 'print `pdftotext -nodiag -layout -enc UTF-8 test.pdf -`'
    Test for PM 11148715

    If you want something a little more robust, that avoids the overhead of using the shell, consider capturex() from IPC::System::Simple. Here's an example (p2t_capturex.pl):

    #!/usr/bin/env perl

    use strict;
    use warnings;

    use IPC::System::Simple 'capturex';

    my $p2t_exe = 'C:/cygwin64/bin/pdftotext.exe';
    my $pdf_doc = 'test.pdf';

    print capturex(
        $p2t_exe => qw{-nodiag -layout -enc UTF-8}, $pdf_doc, '-'
    );

    You get the same output as before:

    $ ./p2t_capturex.pl
    Test for PM 11148715

    In case you were wondering, that's the same pdftotext program throughout. Note the identical inode numbers:

    $ ls -i1 `which pdftotext` C:/cygwin64/bin/pdftotext.exe
    844424931368301 /usr/bin/pdftotext
    844424931368301 C:/cygwin64/bin/pdftotext.exe
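
    If installing IPC::System::Simple is not an option, a list-form piped open from core Perl also bypasses the shell on Unix-like systems. What follows is a rough sketch, not an equivalent of capturex(): the helper name capture_no_shell is made up for illustration, $^X (the running perl) stands in for pdftotext, and on Windows the list form of pipe open is, as far as I know, only implemented from Perl 5.22 onwards.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: run $program with @args via a list-form piped
    # open and return everything it wrote to STDOUT.
    sub capture_no_shell {
        my ($program, @args) = @_;

        open(my $pipe, '-|', $program, @args)
            or die "Cannot run '$program': $!";

        local $/;                        # slurp mode: read all output at once
        my $output = <$pipe> // '';

        # close() on a pipe returns false if the child exited non-zero.
        close($pipe)
            or die "'$program' failed, exit status " . ($? >> 8);

        return $output;
    }

    # Stand-in for something like:
    #   capture_no_shell('/usr/bin/pdftotext',
    #       qw{-nodiag -layout -enc UTF-8}, 'test.pdf', '-');
    print capture_no_shell($^X, '-e', 'print "captured without a shell\n"');
    ```

    Because the arguments are passed as a list, a file name containing spaces or shell metacharacters needs no quoting here.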

    — Ken

      If you want something a little more robust, that avoids the overhead of using the shell, consider capturex() from IPC::System::Simple.

      It is indeed much more robust and a good suggestion! Just a small nitpick: on Windows it is almost impossible to avoid the shell*. IPC::System::Simple works around this by using Win32::ShellQuote under the hood.

      * Update: See my clarification further down in the thread.

        G'day haukex,

        "Just a small nitpick: on Windows it is almost impossible to avoid the shell."

        I checked out that module's documentation and code not so long ago; I've just checked again. Both are still very clear that capturex() does not invoke the shell. Here's a selection of extracts (non-exhaustive):

        From SYNOPSIS:

        # As above, but NEVER invokes the shell.
        my $output = capturex("some_command", @args);

        From source, starting at Line 361 (note the "NO_SHELL"):

        # capturex() is just like backticks/qx, but never invokes the shell.
        sub capturex {
            ...
            if (WINDOWS) {
                return _win32_capture(NO_SHELL, $valid_returns, $command, @args);
            }

        Are you possibly confusing capturex() with capture()? Same source, starting at Line 220 (note the "USE_SHELL"):

        # capture is our way of running a process with backticks/qx semantics
        sub capture {
            ...
            if (WINDOWS) {
                # USE_SHELL really means "You may use the shell if you need it."
                return _win32_capture(USE_SHELL, $valid_returns, $command);
            }

        If not a case of confusion, do you think the documentation, code, or something else, is wrong?

        All links and extracts are from the IPC-System-Simple-1.30 distribution (released "Mar 24, 2020").

        — Ken

Re: use of Backticks to catch console output
by Takamoto (Monk) on Dec 10, 2022 at 15:53 UTC

    solved.

    eval { $converted_text = `../Desktop/p2t -nodiag -layout -enc UTF-8 $PathDocumentUnicode -` }

    Why I have to add '..' is not clear to me. But it does the trick.

      Putting two dots before the path turns it into a relative path, resolved from the current working directory, whereas a leading / makes it an absolute path. On Windows, the Desktop folder is usually found within the user's folder, which is located under C:\Users

      C:\Users\username\Desktop is a valid path but C:\Desktop is not likely unless you have created such a directory.
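
      To see how Perl itself treats the two spellings, here is a small sketch using the core File::Spec module. The helper name describe_path is an illustrative assumption; the two paths are the ones from this thread, and the resolved result depends on the directory the script is started from.

      ```perl
      use strict;
      use warnings;
      use File::Spec;

      # Hypothetical helper: report whether a path is absolute or relative
      # and what it resolves to.
      sub describe_path {
          my ($path) = @_;
          my $kind = File::Spec->file_name_is_absolute($path)
              ? 'absolute'
              : 'relative';

          # Relative paths are resolved against the current working directory.
          my $resolved = File::Spec->rel2abs($path);

          return ($kind, $resolved);
      }

      for my $path ('/Desktop/p2t', '../Desktop/p2t') {
          my ($kind, $resolved) = describe_path($path);
          my $found = -x $resolved ? 'executable found' : 'no executable there';
          print "$path is $kind, resolves to $resolved ($found)\n";
      }
      ```

      Running this from the same directory as the failing script should show which of the two locations actually holds the executable.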

      Also: It is not common practice in Windows to store exe files in the Desktop directory! The Desktop directory usually has Word documents, pictures, txt files and lnk files.