itsscott has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, below is a little code that I successfully used to reformat an XML file from 2-space indentation to tab indentation. I am curious whether there is a better implementation than this silly system call. Thanks in advance!
# set xmllint to use a tab over 2 spaces
$ENV{XMLLINT_INDENT} = "\t";
# create (gag) a system call to call
# xmllint and then move the temp file
# to the target file
my $syscall = 'xmllint --format '.$outfile.' > '.$outfile.'-lint;mv '.$outfile.'-lint '.$outfile;
# ensure the temp file does not exist.
unlink("$outfile-lint");
system($syscall);

Re: Calling xmllint better than using a system()
by Tanktalus (Canon) on Jul 22, 2011 at 22:47 UTC

    Well, first, define "silly" :-)

    Next, I'd suggest pulling out the mv call and replacing it with a call to Perl's rename after the system. It's more portable, and easier to read, IMO.

    Then the actual reformat... that's debatable. I expect that xmllint, being C code, might be faster than trying to use something else (I'd probably use XML::Twig if it mattered, but I usually wouldn't concern myself with trying to make XML pretty; check the source to last hour of cb at some point to see how ugly the HTML I generate is :-)). There's a point of diminishing returns either way here: for small XML files, the cost of calling out to xmllint may exceed that of an implementation done in XML::Twig; on the other hand, the cost of developing with XML::Twig, given that you already have a working version with xmllint, is going to be significant, too. For larger XML files I would expect xmllint to perform better at this task.
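For comparison, here is a minimal pure-Perl sketch that avoids external tools entirely. Note that it is not the XML::Twig approach mentioned above but a swapped-in, line-based regex: it assumes the file is already pretty-printed with a 2-space indent and turns each leading pair of spaces into a tab. It does not parse XML, so leading whitespace inside multi-line text nodes or CDATA would be rewritten too. The names spaces_to_tabs and retab_file are hypothetical.

```perl
use strict;
use warnings;

# Replace each leading pair of spaces on every line with one tab.
# Assumes the text is already pretty-printed with a 2-space indent;
# it does not parse XML, so leading whitespace inside multi-line text
# or CDATA sections is rewritten as well.
sub spaces_to_tabs {
    my ($text) = @_;
    # /m: ^ matches at every line start; /e: evaluate the replacement
    $text =~ s{^((?:  )+)}{"\t" x (length($1) / 2)}gme;
    return $text;
}

# Hypothetical wrapper: rewrite a file via a temp file plus rename,
# mirroring the temp-file-then-rename approach used in this thread.
sub retab_file {
    my ($path) = @_;
    open(my $in, '<', $path) or die "cannot read $path: $!";
    my $xml = do { local $/; <$in> };   # slurp the whole file
    close $in;

    my $tmp = "$path-tabbed";
    open(my $out, '>', $tmp) or die "cannot write $tmp: $!";
    print {$out} spaces_to_tabs($xml);
    close($out) or die "cannot close $tmp: $!";
    rename($tmp, $path) or die "cannot rename $tmp to $path: $!";
    return;
}
```

Odd leftovers are kept as-is (three leading spaces become a tab plus one space), which is one reason to prefer a real XML tool when the input is not guaranteed to be cleanly indented.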

    By the way, if you can avoid the shell altogether, that's a plus, too. Since you're redirecting the output, this can be annoyingly painful to do, so check whether xmllint has an option for an output file.
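If a tool has no output-file option, one way to still avoid the shell is to redirect STDOUT yourself around a list-form system call. This is a minimal sketch following the dup/restore pattern documented under perldoc -f open; run_to_file is a hypothetical helper name, and it assumes the command writes its result to standard output (as xmllint --format does by default).

```perl
use strict;
use warnings;
use IO::Handle;   # for STDOUT->flush

# Hypothetical helper: run @cmd in list form (no shell) with its
# standard output written to $outfile, then restore STDOUT.
sub run_to_file {
    my ($outfile, @cmd) = @_;
    STDOUT->flush;    # don't let pending output land in $outfile
    open(my $saved, '>&', \*STDOUT) or die "cannot dup STDOUT: $!";
    open(STDOUT, '>', $outfile)     or die "cannot open $outfile: $!";
    my $status = system(@cmd);       # the child inherits the redirected STDOUT
    open(STDOUT, '>&', $saved)      or die "cannot restore STDOUT: $!";
    close $saved;
    return $status;
}
```

Used for the original problem, that would look like `run_to_file("$outfile-lint", 'xmllint', '--format', $outfile) == 0 or die "xmllint failed: $?";`.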

    local $ENV{XMLLINT_INDENT} = "\t";
    my $linted = "$outfile-lint"; # for readability
    unlink $linted;
    system( # here is how we avoid the shell: don't use a string. Use a list
        qw(xmllint --format --output), $linted, $outfile
    ) == 0 or die "xmllint failed: $?"; # don't forget to check the return code!
    rename $linted, $outfile or die "cannot rename $linted to $outfile: $!";
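The reply's code reminds you to check the return code. Here is a small sketch of what that check can look like, decoding $? the way perldoc -f system describes it (-1 means the program could not be started, the low 7 bits hold a signal number, the high byte holds the exit value). describe_status and run_or_die are hypothetical helper names.

```perl
use strict;
use warnings;

# Decode the status system() leaves in $?, following the checks shown
# under perldoc -f system.
sub describe_status {
    my ($status, $err) = @_;
    return "failed to execute: $err" if $status == -1;
    return sprintf('died with signal %d', $status & 127) if $status & 127;
    return sprintf('exited with value %d', $status >> 8);
}

# Hypothetical helper: run a list-form command, die unless it exits 0.
sub run_or_die {
    my @cmd = @_;
    system(@cmd);
    my ($status, $err) = ($?, "$!");   # capture before anything resets them
    die "$cmd[0] " . describe_status($status, $err) . "\n" if $status != 0;
    return;
}
```

In the code above it would replace the bare system call: `run_or_die(qw(xmllint --format --output), $linted, $outfile);`.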
    Hope that helps,

      First off, thank you for the reply. By "silly" I mean using a system() call. It's frowned upon in ANSI C (most of our CGIs are written in ANSI C), and my support programmer says system calls are expensive and that piping is preferred in C for the most part. (Not that I have an opinion; I'm self-taught and a number of things like that escape me.)

      Not to be too much of a newbie, but I don't really get how the above code avoids the shell. Doesn't the list still get passed to the shell? I'll implement that right now; I do prefer the look of it.

      As for pretty XML, I personally couldn't care less either, but the client is paying and demands pretty XML with tabs, not spaces. I will say the tabs are quite a bit better for the huge XML files: my little test file is 201k with 2-space indenting and 184k with 1 tab per indent (about 8% smaller), so this will make a real difference on the multi-megabyte XML files.

        My opinion? Your support programmer needs to do more benchmarking. :-) Using system can be faster in some circumstances. Sometimes, it's a matter of coder's time vs CPU time, especially when the CPU time isn't important. :-)

        The above code does not hit the shell because I'm calling system with a list of parameters instead of a single string. When you call system("some stuff > here") with a single string that contains shell metacharacters (here, the redirection >), perl does not split it on the spaces and call "some" with the parameters "stuff", ">", "here". Instead, it passes the whole thing to the shell and lets the shell do the splitting, and the shell acts on the metacharacters. If you don't need the shell to act on things like redirection and piping between subprocesses, this is all just extra overhead. I know, I just said "using system can be faster"; it all depends on what you're doing. If you need the shell to redirect, then it's far easier to let the shell do it than to do it yourself. But most of the time, if you're actually building a string for the shell to parse, you may as well just build the list instead.
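A small sketch that makes the difference visible: a throwaway child perl ($^X, the interpreter currently running) exits with the number of arguments it received. In list form, 'my file.xml' arrives as one argument. The single-string version contains double quotes, which perl counts as shell metacharacters (perldoc -f system), so it goes to /bin/sh -c and the shell splits the filename in two. The sub names are hypothetical, and the expected results assume a Unix-like shell.

```perl
use strict;
use warnings;

# Each helper runs a child perl that exits with scalar(@ARGV),
# so the exit value tells us how many arguments it received.
sub args_seen_list {
    my @args = @_;
    # List form: no shell, no word splitting; one element, one argument.
    system($^X, '-e', 'exit scalar @ARGV', @args);
    return $? >> 8;
}

sub args_seen_string {
    my ($arg_text) = @_;
    # Single string with shell metacharacters (the double quotes): it is
    # handed to the shell, which splits $arg_text on whitespace.
    # Never interpolate untrusted input into a string like this.
    system(qq{"$^X" -e "exit scalar \@ARGV" $arg_text});
    return $? >> 8;
}

print args_seen_list('my file.xml'), "\n";    # prints 1
print args_seen_string('my file.xml'), "\n";  # prints 2
```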

        When you call system(@list) with more than one element, perl takes a different route: it performs a fork and then an exec, where the exec is an execvp (or similar). This is what the shell itself does under the covers. Since everything is already set up as a list, this call bypasses the shell and goes directly to the executable you want. (Like the shell, execvp looks the executable up in the PATH if necessary, which is a really handy simplification.)
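Roughly what system(@list) does for you, written out by hand with fork, exec and waitpid. This is a simplified sketch: my_system is a hypothetical name, and perl's real system also ignores SIGINT and SIGQUIT in the parent while it waits, which this version does not do.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical, simplified stand-in for system(@cmd): fork a child,
# exec the program directly (no shell), wait, and return the status.
sub my_system {
    my @cmd = @_;
    my $pid = fork();
    die "cannot fork: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: replace this process with the program. The block form
        # of exec never uses the shell, even for a one-element list,
        # and execvp searches PATH for $cmd[0].
        exec { $cmd[0] } @cmd
            or warn "cannot exec $cmd[0]: $!\n";
        # Only reached if exec failed; _exit skips END blocks and the
        # flushing of output buffers the child copied from the parent.
        POSIX::_exit(127);
    }

    # Parent: wait for the child; waitpid leaves its status in $?.
    waitpid($pid, 0);
    return $?;
}
```

This is the fork/execvp dance that C programmers often skip in favour of system(3), which always goes through the shell.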

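        Normal POSIX system always uses the shell. Perl's system, on the other hand, passes a single string to the shell (/bin/sh -c on Unix) only when it contains shell metacharacters; a single string without metacharacters is split into words and handed directly to execvp, and a list with more than one element always goes to execvp, whether or not its elements contain spaces or metacharacters.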

        I always advocate using system(@list) to keep shell metacharacters from having any meaning, with the obvious exception of when you really do want those shell metacharacters to do your work for you :-) It's generally just easier, and cheaper on RAM (one less process running) and on CPU (one less process to initialise/tear down).

        Basically, there's a difference between running another program and "hitting the shell" (using the shell to run another program). In C, I find most people use the shell because system is so much easier than fork/execvp. Perl has already done that work, so avoiding the shell becomes much easier. :-)

Re: Calling xmllint better than using a system()
by CountZero (Bishop) on Jul 23, 2011 at 06:45 UTC
    Hey, Perl is the sysadmin's glue! Perfectly valid use of our beloved language.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James