Re: commands with multiple pipes in perl

If that is your command above, you might want to consider a pure Perl implementation. It appears that you are filtering lines (grep) and then using a series of calls to awk to split the input into fields and subfields. All of this can be done quite easily in four or five lines of Perl (maybe less) using a regular expression and maybe split. A pure Perl implementation is likely to be much faster, as you will only need a single process rather than the 4 you are currently using in your pipe.

The following sample code illustrates how grep and awk can be mapped to Perl constructs. It is a lot more verbose than necessary because I've assigned things to variables to make it clearer exactly what is going on. The real production code could easily be mushed down to no more than four lines inside the while loop and possibly even down to one line (print if regex matches) if splits are replaced by a capturing regular expression:

while (my $line = <DATA>) {
  #grep 'DataDictionary'
  next unless $line =~ /DataDictionary/;

  #awk -F'<pciOFACViolation>' '{print $1}'
  my @aFields = split(/<pciOFACViolation>/, $line);
  my $sFieldICareAbout = $aFields[0];  #$1 in awk

  #awk '{print $3}'
  #split(' ', ...) mimics awk's default: split on runs of whitespace, ignore leading whitespace
  @aFields = split(' ', $sFieldICareAbout);
  $sFieldICareAbout = $aFields[2];  #$3 in awk

  #awk -F'>' '{print $1}'
  @aFields = split(/>/, $sFieldICareAbout);
  $sFieldICareAbout = $aFields[0]; #$1 in awk
  print "$sFieldICareAbout\n";
}

__DATA__
*** *** G1>H>I<pciOFACViolation>DataDictionary
Whan that aprill with his shoures soote
The droghte of march hath perced to the roote,
And bathed every veyne in swich licour
Of which vertu engendred is the flour;
*** *** G2>H>I<pciOFACViolation>DataDictionary
Whan zephirus eek with his sweete breeth
Inspired hath in every holt and heeth
Tendre croppes, and the yonge sonne
Hath in the ram his halve cours yronne,
And smale foweles maken melodye,
That slepen al the nyght with open ye
(so priketh hem nature in hir corages);
*** *** G3>H>I<pciOFACViolation>DataDictionary
Thanne longen folk to goon on pilgrimages,
And palmeres for to seken straunge strondes,
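For comparison, here is a rough sketch of the "mushed down" version: the same grep and awk steps with the intermediate variables squeezed out. The helper name extract_field and the three sample lines are made up for illustration, and it assumes your DataDictionary lines look like the sample data above.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is not from the original command).
# Returns the extracted value, or undef if the line does not qualify.
sub extract_field {
    my ($line) = @_;

    # grep 'DataDictionary'
    return unless $line =~ /DataDictionary/;

    # awk -F'<pciOFACViolation>' '{print $1}'
    my ($before) = split(/<pciOFACViolation>/, $line);

    # awk '{print $3}' (split ' ' splits on runs of whitespace, like awk)
    my $third = (split(' ', $before))[2];
    return unless defined $third;

    # awk -F'>' '{print $1}'
    my ($prefix) = split(/>/, $third);
    return $prefix;
}

# Sample lines taken from the data above.
my @sample = (
    "*** *** G1>H>I<pciOFACViolation>DataDictionary\n",
    "Whan that aprill with his shoures soote\n",
    "*** *** G3>H>I<pciOFACViolation>DataDictionary\n",
);

for my $line (@sample) {
    my $value = extract_field($line);
    print "$value\n" if defined $value;
}
```

On these three lines it prints G1 and G3 and skips the line without "DataDictionary". Lines with fewer than three whitespace-separated fields are skipped rather than printed as empty lines, which is a small departure from what awk would do.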

The one-liner (print if regex) depends heavily on the exact format of each line, particularly the placement of "DataDictionary". To give you a feel for its succinctness, here is the code for the above format of DataDictionary lines.

while(<DATA>) {
  print "$1\n"
    if /^\S+\s+\S+\s+([^>]+).*<pciOFACViolation>.*DataDictionary/;
}

If you are interested in this approach, perhaps you could give us a few sample lines containing "DataDictionary"?

Best, beth

Update: Added code illustrating mapping of grep and awk to Perl constructs.

Update: Added more succinct example using one line (print if regex).

Re^2: commands with multiple pipes in perl by raghu_shekar (Novice) on Mar 17, 2009 at 11:30 UTC
Hi, but the approach you have mentioned here takes a long time as it searches extensively. When I run the entire command from the command line it gives me the output in seconds, but when I put it in the script it takes a long time and I had to end the script: close to 5 minutes and no output. I also reduced the file size and tried again, and it still takes a long time, making me wonder if the script has hung.
Re^3: commands with multiple pipes in perl by ELISHEVA (Prior) on Mar 17, 2009 at 12:34 UTC
Curious. It shouldn't be doing any more searching than grep would, assuming you are reading in one line at a time. (If you slurped the file in as one long line, that could slow you down a lot.) How did you adapt the above code for your situation? Perhaps if you posted the code we might have a better idea of why your program is so slow. When I created a dummy file with the data above repeated 10,000 times (equivalent to a 6.6M file), parsing took only 0.71 seconds (wall clock time). When I upped the size by repeating the data 1,000,000 times (equivalent to a 660M file, more than half a gigabyte), it took 26 seconds.

Best, beth
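To make "one line at a time" concrete, here is a minimal sketch of the reading pattern I am assuming. The helper count_matches and the sample text are invented for illustration, and it reads from an in-memory file handle so that it runs without any file on disk; in a real script you would open your file by path instead.

```perl
use strict;
use warnings;

# Hypothetical helper: reads the handle one line at a time and returns
# how many lines mention DataDictionary. Only one line is held in memory
# at any moment.
sub count_matches {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /DataDictionary/;
    }
    return $count;
}

# Sample text standing in for the real file (an assumption for the demo).
my $sample = "*** *** G1>H>I<pciOFACViolation>DataDictionary\n"
           . "Whan that aprill with his shoures soote\n"
           . "*** *** G2>H>I<pciOFACViolation>DataDictionary\n";

# Opening a reference to a scalar gives an in-memory file handle.
open(my $fh, '<', \$sample) or die "Cannot open in-memory handle: $!";
print count_matches($fh), "\n";
close($fh);

# By contrast, undefining $/ (for example with "local $/;") before the
# read makes <$fh> return the whole file as a single string. A regex
# full of .* then has to work across the entire file instead of one line.
```

If your script sets $/ somewhere, or reads the file into a single scalar before matching, that would be the first thing I would check.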