bwgoudey has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,
I'm reading a file line by line and splitting each line into words. I then want to take each word, run it through a filter program like tr or sed, and receive the output of that command as another string which I can then continue to process.

I tried implementing this in C with multiple pipes and it worked, but it all got a little complicated. I am not really great with Perl, but I was wondering if there is a nice way of doing it that anyone knows about.

Cheers, Benjamin


Replies are listed 'Best First'.
Re: Running a string through a filter program and receiving the output as a string
by davidrw (Prior) on Sep 30, 2005 at 02:39 UTC
    Here's a basic outline -- you might want to modify the split line.
    open FILE, "<", $file;
    while (<FILE>) {
        chomp;
        my @words = split;
        foreach my $word ( @words ){
            # my $newword = `sed $word`;         # UPDATE: original lazy typing
            my $newword = `echo $word | sed`;    # UPDATE: more realistic
            print $newword, "\n";
        }
    }
    close FILE;
    What exactly is the tr/sed command doing? I suspect that there's a native perl way to do it (e.g. see perldoc -f tr).

    As for using pipes instead of backticks, see the Tutorials and perlipc ...
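    For instance, a one-way pipe opened with open's '-|' mode avoids backticks entirely. This is only a minimal sketch: the `tr a-z A-Z` filter and the literal word are stand-ins for whatever command and data you actually have.

    # Read the filter's output through a pipe rather than backticks.
    my $word = 'hello';
    open my $pipe, '-|', "echo '$word' | tr a-z A-Z"
        or die "cannot open pipe: $!";
    my $newword = <$pipe>;
    close $pipe;
    chomp $newword;
    print "$newword\n";    # prints HELLO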

    Update: changed the backtick command .. i was just lazy the first time :)
      That's almost it. By saying "filter program", I think the OP means the words have to be piped to the filter program; filters usually take their input from STDIN.

      Also, from the description of the OP I assume his program will deal with different filters, set by (e.g.) commandline parameters.

      So
      my $newword = `sed $word`;
      should become
      my $newword = `echo '$word' | $filterprogram`;
      Where $filterprogram has to be set somewhere else in the script.
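      Putting it together, a minimal sketch could look like the following. The fallback `tr a-z A-Z` filter is just a placeholder; $filterprogram would normally come from a command-line parameter as described above.

      my $filterprogram = shift @ARGV || q{tr a-z A-Z};   # e.g. passed on the command line
      while ( my $line = <STDIN> ) {
          chomp $line;
          for my $word ( split ' ', $line ) {
              my $newword = `echo '$word' | $filterprogram`;
              chomp $newword;
              print "$newword\n";
          }
      }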

      $\=~s;s*.*;q^|D9JYJ^^qq^\//\\\///^;ex;print
      "What exactly is the tr/sed command doing?"

      From the Unix man pages, in a nutshell:

      "Sed is a stream editor. A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline). While in some ways similar to an editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient. But it is sed's ability to filter text in a pipeline which particularly distinguishes it from other types of editors."
      "The tr utility copies the standard input to the standard output with substitution or deletion of selected characters. The options specified and the string1 and string2 operands control translations that occur while copying characters and single-character collating elements."
        I think he meant in this case specifically, since Perl can do tr and sed tasks quite easily.
Re: Running a string through a filter program and receiving the output as a string
by graff (Chancellor) on Sep 30, 2005 at 03:04 UTC
    Perl can be viewed as an implementation of sed, tr and awk, along with many functions of the shell and of C. You won't need to run command lines from a perl script in order to do tr and sed work, because that sort of thing is an integral part of (and much more capably implemented in) Perl.

    In fact, you won't want to run commands for that kind of stuff, especially on a "one command per word" basis, because the overhead of launching and closing down a new shell to run such a command for every word will slow you down tremendously.
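    If you want to see the difference for yourself, a quick hypothetical comparison with the core Benchmark module makes the per-word shell overhead obvious (the `tr a-z A-Z` filter and the literal word are only stand-ins):

    use Benchmark qw(cmpthese);

    cmpthese( -2, {
        shell_out => sub {                        # spawn a shell for every word
            my $w = `echo 'hello' | tr a-z A-Z`;
        },
        native_tr => sub {                        # stay inside perl
            ( my $w = 'hello' ) =~ tr/a-z/A-Z/;
        },
    });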

    Just use the "tr///" and "s///" operators in Perl, along with the standard IO operations to read/write lines, and split to divide each line into individual words.
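    Something along these lines, for example. This is only a rough sketch: the file name is assumed to be in $file, and the tr/// and s/// patterns are placeholders, since we don't know what the real filters are supposed to do.

    open my $fh, '<', $file or die "cannot open $file: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        for my $word ( split ' ', $line ) {
            $word =~ tr/a-z/A-Z/;           # what `tr a-z A-Z` would do
            $word =~ s/colour/color/g;      # what a simple sed s/// would do
            print "$word\n";
        }
    }
    close $fh;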

Re: Running a string through a filter program and receiving the output as a string
by ioannis (Abbot) on Sep 30, 2005 at 06:03 UTC
    You can always use open2 if you want a bidirectional pipe. This program pipes the first word of each line to tr(1) and reads back the modified text:
    use IPC::Open2;
    open2( REA, WRI, 'tr a-z A-Z' ) or die $!;
    for (<DATA>) {
        print {\*WRI} [ (my $cmd, my @rest) = split ]->[0];
    }
    close WRI;
    print <REA>;
    __END__
    an apple a day
    boat for bloat
      Although, when using System V (SunOS 5.9) today, I found that a defunct process was left behind even after closing both the read and write channels. This was in a subroutine that handled repeated communications with a time-series database language interpreter, and at some point there was one defunct process for each time I had called the subroutine. To fix this, I had to modify the code to tell the parent explicitly to wait. It seems silly that I should have to do that, but on the other hand it is indeed partially documented in perlipc. Here is what I had to do to fix it, applied to the above piece of code instead of mine:
      use IPC::Open2;
      use POSIX ":sys_wait_h";
      my $pid = open2( REA, WRI, 'tr a-z A-Z' ) or die $!;
      for (<DATA>) {
          print {\*WRI} [ (my $cmd, my @rest) = split ]->[0];
      }
      close WRI;
      print <REA>;
      close REA;         # but believe it or not, under SysV the zombie is still there!
      waitpid($pid, 0);  # this will finally kill it cleanly!
      __END__
      # although exit from the program will eventually kill it of course
      an apple a day
      boat for bloat

      -M

      Free your mind

Re: Running a string through a filter program and receiving the output as a string
by TomDLux (Vicar) on Sep 30, 2005 at 13:41 UTC

    Piping things to external programs is slow. Starting up sed costs many microseconds or even milliseconds to spawn an external process, and if you do this in an inner loop, once for each of a few million words, it soon adds up to a long time. It's also unnecessary, since Perl can perform such transformations itself:

    my $old = 'hat|coat|shoe';
    my %new = (
        hat  => 'cap',
        coat => 'jacket',
        shoe => 'boot',
    );
    for my $word ( @words ) {
        $word =~ tr/A-Za-z/N-ZA-Mn-za-m/;   # decode ROT13
        $word =~ s/($old)/$new{$1}/ei;      # convert cheap clothes into expensive ones
    }

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

Re: Running a string through a filter program and receiving the output as a string
by Moron (Curate) on Oct 03, 2005 at 14:43 UTC
    It is almost essential to use a bidirectional pipe (as already described by ioannis) when a step in the processing has to be done outside Perl before continuing. What IS too slow is closing and reopening the pipe more than once, in cases where you could instead pipe a complete dataset in (using print), read the processed results back out, and process them exhaustively in Perl before closing the read pipe. See the sketch below.
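    In other words, open the pipe once, push the whole word list through it, and only then read everything back. This is a rough sketch along the lines of ioannis's IPC::Open2 example above; it assumes the words are already in @words and that `tr a-z A-Z` stands in for the real filter.

    use IPC::Open2;

    my $pid = open2( my $rea, my $wri, 'tr a-z A-Z' );
    print {$wri} "$_\n" for @words;   # pipe the complete dataset in
    close $wri;                       # send EOF so the filter flushes its output
    my @filtered = <$rea>;            # read all the processed results back
    close $rea;
    waitpid( $pid, 0 );               # reap the child, as noted above
    chomp @filtered;

    For very large datasets you would still need to interleave reads and writes (or use a higher-level module) to avoid filling the pipe buffer; perlipc warns about this kind of deadlock with bidirectional pipes.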

    -M

    Free your mind