in reply to shell commands doesn't integrate with perl

Seems to me that your approach is very inefficient. If I understand the OP's code, you are extracting patterns to search for from a "labels" file, and then, for each pattern in a given language (English or Hindi), you are trying to get all the lines from a particular file that contain the pattern.

But you're reading the latter files again and again for each search pattern, whereas you shouldn't need to read them more than once. The code you posted would probably work (if there were no problems with "magic" characters in the regex patterns), but if "en_1000" or "HI_1000" happen to contain lines where two or more patterns match, those lines get printed multiple times. Is that your intention?

If so, the following should do the same thing (and I think it will go quicker):

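```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each line of "labels" holds the English ids and the Hindi ids separated
# by '|'; the ids within each half are separated by ':'.
open( my $labels_fh, '<', 'labels' ) or die "labels: $!\n";
my ( @english, @hindi );
while (<$labels_fh>) {
    chomp;               # keep the trailing newline out of the Hindi regex
    next unless /\S/;    # skip blank lines
    # Turn every ':' into '|' so each half becomes one regex with alternations.
    my ( $eng_indx, $hin_indx ) = map { s/:/|/g; $_ } split /\|/;
    push @english, $eng_indx;
    push @hindi,   $hin_indx;
}
close $labels_fh;

# Read each data file into memory exactly once.
open( my $en_fh, '<', '/home/vikash/pro_1/en_1000' ) or die "en_1000: $!\n";
my @enlines = <$en_fh>;
close $en_fh;

open( my $hi_fh, '<', '/home/vikash/pro_1/HI_1000' ) or die "HI_1000: $!\n";
my @hilines = <$hi_fh>;
close $hi_fh;

# For each label pair, print every matching English line and Hindi line.
for my $i ( 0 .. $#english ) {
    print grep { /$english[$i]/ } @enlines;
    print grep { /$hindi[$i]/ } @hilines;
    print '*' x 50, "\n";
}
```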
(not tested)

Instead of splitting the English and Hindi "labels" strings on ":", this converts the ":" to "|", so that each label string becomes a single regex with alternations. The "*_1000" files are then read into memory only once (which will be a problem if the files are too big to fit in memory).
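To make the label-to-regex step concrete, here is a minimal self-contained sketch. The labels line and data lines are made up; I'm assuming the labels file holds colon-separated English ids, a "|", then colon-separated Hindi ids, as the split in the code above implies. It also goes one step further by escaping each id with quotemeta, so any "magic" characters in an id are matched literally. The helper ids_to_regex is hypothetical, not something from the OP's code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical labels line: English ids, then '|', then Hindi ids;
# ids within each half are separated by ':'.
my $line = "EN-1000-0001-1:EN-1000-0002-1|HI-1000-0001-1:HI-1000-0002-1\n";
chomp $line;    # keep the newline out of the Hindi regex

# Hypothetical helper: escape each id (so '.', '+', '(' etc. are literal)
# and join them with '|' into one alternation regex.
sub ids_to_regex {
    my ($half) = @_;
    my $alternation = join '|', map { quotemeta($_) } split /:/, $half;
    return qr/$alternation/;
}

my ( $eng_half, $hin_half ) = split /\|/, $line;
my $eng_re = ids_to_regex($eng_half);
my $hin_re = ids_to_regex($hin_half);

# Stand-in for the contents of en_1000, read into memory once.
my @enlines = (
    "EN-1000-0001-1 first sentence\n",
    "EN-1000-0003-1 third sentence\n",
    "EN-1000-0002-1 second sentence\n",
);

# One grep per label line; each data line is printed at most once per label.
my @hits = grep { /$eng_re/ } @enlines;
print @hits;    # prints the "first" and "second" lines
```

Because both ids end up in one alternation, a line matching either id is picked up by a single pass of grep, instead of one pass per id.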

Re^2: shell commands doesn't integrate with perl
by vikashiiitdm (Novice) on Jul 18, 2011 at 09:35 UTC

    this code doesn't work at all; all it does is flood my screen with asterisks. please do refer to the sample data files i've posted. i appreciate your help very much

      Which just means it doesn't find any of the patterns you are looking for. If I look at your test data, the patterns all look like "EN-1000-0002-1", while the files have ids like "EN--1000-0002-1". A remarkably obvious difference. No wonder the script doesn't find anything.
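If the doubled hyphen is a consistent quirk of the data files rather than a typo, one option is to build each pattern so that every "-" in a label matches one or more hyphens. This is an untested-in-the-wild sketch that assumes the ids otherwise agree exactly; the label and data lines are illustrative. Correcting the ids in the labels file (or in the data files) would be the simpler fix.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Label as written in the labels file vs. ids as they appear in the data file.
my $label = 'EN-1000-0002-1';
my @lines = ( "EN--1000-0002-1 some text\n", "EN--1000-0002-10 other text\n" );

# Split on '-', escape each piece, and let each separator match one or more
# hyphens; the \b anchors keep "...-1" from also matching "...-10".
my $pattern = join '-+', map { quotemeta($_) } split /-/, $label;
my $re      = qr/\b$pattern\b/;

my @hits = grep { /$re/ } @lines;
print @hits;    # prints only the first line
```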