Perlwanab has asked for the wisdom of the Perl Monks concerning the following question:

I'm opening 30.txt and reading it in through the <SOURCE> filehandle. I open searchfile.txt as <FILE2SEARCH>, assign its lines to @sfile, and then close <FILE2SEARCH>. I read <SOURCE> one record at a time, assign each record to $script and chomp the newline off $script; then I want to grep @sfile for $script, element by element. When the first record of <SOURCE> is done, I want to take the next record from <SOURCE>, grep @sfile for that, and so on. For each match, I want the element from @sfile to be assigned to @line. Instead of @line having only those elements from @sfile matching each $script variable, @line is a duplicate copy of @sfile. What am I missing?

You don't see it below, but I did some testing to check whether each record of <SOURCE> is being recognized: after chomp $script; I added a line to print each $script value, and it printed every $script variable.

#!/usr/bin/perl -w
open(SOURCE, "30.txt") || die "Cannot open: $!";
open(FILE2SEARCH, "searchfile.txt") || die "Cannot open: $!";
@sfile=<FILE2SEARCH>;
close(FILE2SEARCH);
while (<SOURCE>) {
    $script = <SOURCE>;
    chomp $script;
    # The next line is where my problem begins. I have warnings on, but I get no errors.
    @line=grep( /$script/, @sfile );
}
close(SOURCE);
open(DEST, ">>newfile.txt") or die "Can't open new.cfg: $!";
print DEST @line;
close(DEST);
This is what 30.txt looks like:

big.script.sh
onetime.scrip.sh
pay.sh
scripta.sh
scripta.1.sh
scriptbb.sh
scriptbb.1.sh
scriptbb.2.sh

This is what searchfile.txt looks like:

file="^billing.file*" id=none synccmd="/foo/bin/scriptbb2.sh %P %D %F"
file="^pay.file*" id=none synccmd="/foo/bin/pay.sh %P %D %F"
file="^car.file*" id=none synccmd="/foo/bin/big.script.sh %P %D %F"
file="^last.file*" id=none synccmd="/foo/bin/nowhere.script.sh %P %D %F"
file="^grass.file*" id=none synccmd="/foo/bin/grass.script.sh %P %D %F"
file="^cart.file*" id=none synccmd="/foo/bin/cart.script.sh %P %D %F"
file="^mortgage.file*" id=none synccmd="/foo/bin/big.script.sh %P %D %F"
file="^lincoln.file*" id=none synccmd="/foo/bin/onetime.script.sh %P %D %F"
file="^music.file*" id=none synccmd="/foo/bin/scripta.sh %P %D %F"
file="^house.file*" id=none synccmd="/foo/bin/scripta.1.sh %P %D %F"
file="^garage.file*" id=none synccmd="/foo/bin/scripta.1.sh %P %D %F"
file="^tree.file*" id=none synccmd="/foo/bin/scriptbb.1.sh %P %D %F"
file="^foo.file*" id=none synccmd="/foo/bin/scriptbb.2.sh %P %D %F"
file="^fun.file*" id=none synccmd="/foo/bin/notthis.sh %P %D %F"
file="^done.file*" id=none synccmd="/foo/bin/donethis.sh %P %D %F"
file="^cement.file*" id=none synccmd="/foo/bin/cement.sh %P %D %F"
file="^animal.file*" id=none synccmd="/foo/bin/scripta.sh %P %D %F"

This is what @line should look like after the grep:

file="^car.file*" id=none synccmd="/foo/bin/big.script.sh %P %D %F"
file="^mortgage.file*" id=none synccmd="/foo/bin/big.script.sh %P %D %F"
file="^lincoln.file*" id=none synccmd="/foo/bin/onetime.script.sh %P %D %F"
file="^pay.file*" id=none synccmd="/foo/bin/pay.sh %P %D %F"
file="^music.file*" id=none synccmd="/foo/bin/scripta.sh %P %D %F"
file="^animal.file*" id=none synccmd="/foo/bin/scripta.sh %P %D %F"
file="^house.file*" id=none synccmd="/foo/bin/scripta.1.sh %P %D %F"
file="^garage.file*" id=none synccmd="/foo/bin/scripta.1.sh %P %D %F"
file="^tree.file*" id=none synccmd="/foo/bin/scriptbb.1.sh %P %D %F"
file="^foo.file*" id=none synccmd="/foo/bin/scriptbb.2.sh %P %D %F"

Re: grep array elements against another array
by ikegami (Patriarch) on Oct 11, 2009 at 21:07 UTC

    Instead of @line having only those elements from @sfile matching each $script variable, @line is a duplicate copy of @sfile.

    That's incorrect. That should read

    Because @line only has those elements from @sfile that match each $script variable, @line is a duplicate copy of @sfile.

    One of your patterns is apparently matching every line. Specifically, your last pattern is matching every line (since only the results of the last pattern are kept).
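    To find the culprit, you could test each pattern on its own. The sketch below is a minimal diagnostic (not from the thread): it counts how many lines of @sfile each pattern matches and reports any pattern that matches all of them. Blank lines are reported without being tried as patterns, because Perl treats an empty pattern specially (it reuses the last successfully matched pattern). The sample data, including the over-broad ".sh" pattern and the blank line, is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for the lines of 30.txt and searchfile.txt.
my @patterns = ('pay.sh', 'scripta.sh', '.sh', '');
my @sfile = (
    qq{file="^pay.file*" id=none synccmd="/foo/bin/pay.sh %P %D %F"\n},
    qq{file="^music.file*" id=none synccmd="/foo/bin/scripta.sh %P %D %F"\n},
    qq{file="^fun.file*" id=none synccmd="/foo/bin/notthis.sh %P %D %F"\n},
);

my @suspects;    # patterns that are blank or match every line
for my $p (@patterns) {
    if ($p eq '') {
        # Report blank lines instead of matching with an empty pattern.
        push @suspects, '(blank line)';
        next;
    }
    my $count = grep { /$p/ } @sfile;    # grep in scalar context returns the match count
    push @suspects, $p if $count == @sfile;
}
print "Suspicious pattern: $_\n" for @suspects;
```

    Here ".sh" is flagged because "." matches any character, so it matches every line that contains "sh".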

    Here are the problems you do have:

    • while (<SOURCE>) { $script = <SOURCE>;
      skips every second line. I think you want
      while (my $script = <SOURCE>) {
    • You are only keeping the lines that match the last pattern. If you want the lines matched by any pattern, change

      @line=grep( /$script/, @sfile );
      to
      push @line, grep( /$script/, @sfile );

      But that's not good enough. If two patterns can match the same line, you'll end up with duplicate lines in @line. You want something like

      @line = grep( /$script[0]|$script[1]|.../, @sfile );
    • Why are you appending to (awfully named) newfile.txt?

    • Finally, you're using global variables all over the place. Start by adding "use strict;", fix the errors it had been hiding, and switch to lexical file handles too.
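    The first two points can be combined into one loop. A minimal sketch, not the thread's solution: it reads exactly one pattern per iteration, skips blank lines, and uses push plus a %seen hash (a swapped-in de-duplication technique, instead of the single alternation) so overlapping patterns don't add duplicate lines. Unlike one big alternation, this keeps @line grouped in pattern order, which is how the expected @line in the question is ordered. In-memory strings with hypothetical excerpts stand in for 30.txt and searchfile.txt; open the real files instead for actual use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for 30.txt and searchfile.txt.
# "scripta" overlaps with "scripta.sh" to show de-duplication;
# the final empty line mimics a trailing blank line.
my $patterns_text = "big.script.sh\npay.sh\nscripta.sh\nscripta\n\n";
my $search_text = join '',
    qq{file="^pay.file*" id=none synccmd="/foo/bin/pay.sh %P %D %F"\n},
    qq{file="^car.file*" id=none synccmd="/foo/bin/big.script.sh %P %D %F"\n},
    qq{file="^music.file*" id=none synccmd="/foo/bin/scripta.sh %P %D %F"\n},
    qq{file="^fun.file*" id=none synccmd="/foo/bin/notthis.sh %P %D %F"\n};

open(my $search_fh, '<', \$search_text) or die "Can't open search data: $!";
my @sfile = <$search_fh>;
close($search_fh);

open(my $source_fh, '<', \$patterns_text) or die "Can't open pattern data: $!";
my @line;
my %seen;    # lines already collected, keyed by the full line text
while (my $script = <$source_fh>) {    # one read per iteration: no skipped lines
    chomp $script;
    next if $script eq '';             # empty patterns are special in Perl; skip them
    # Add this pattern's matches, dropping lines an earlier pattern already added.
    push @line, grep { !$seen{$_}++ } grep { /$script/ } @sfile;
}
close($source_fh);

print @line;
```

    The result is the car, pay and music lines, in that order; the second match of the music line (via "scripta") is dropped by %seen.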

    Solution:

    #!/usr/bin/perl -w
    use strict;

    my $pat_qfn = '30.txt';
    my $in_qfn  = 'searchfile.txt';
    my $out_qfn = 'newfile.txt';

    my $pat;
    {
        open(my $pat_fh, '<', $pat_qfn)
            or die("Can't open pattern file $pat_qfn: $!\n");
        chomp( my @pats = <$pat_fh> );
        ($pat) = map qr/$_/, join '|', @pats;
    }

    open(my $in_fh, '<', $in_qfn)
        or die("Can't open input file $in_qfn: $!\n");
    open(my $out_fh, '>', $out_qfn)
        or die("Can't create output file $out_qfn: $!\n");

    while (<$in_fh>) {
        print $out_fh $_ if /$pat/;
    }

    Of course, if you don't mind specifying the input and output files on the command line, the program becomes much simpler and much more flexible.

    #!/usr/bin/perl -w
    use strict;

    my $pat_qfn = shift(@ARGV);

    my $pat;
    {
        open(my $pat_fh, '<', $pat_qfn)
            or die("Can't open pattern file $pat_qfn: $!\n");
        chomp( my @pats = <$pat_fh> );
        ($pat) = map qr/$_/, join '|', @pats;
    }

    while (<>) {
        print if /$pat/;
    }

    filter 30.txt searchfile.txt > newfile.txt

    Update: Fixed typo in var name

Re: Perlwanab grep array elements against another array
by moritz (Cardinal) on Oct 11, 2009 at 20:50 UTC
    while (<SOURCE>) { $script = <SOURCE>;

    This ignores every second line in 30.txt, because it reads one item from SOURCE and assigns it to $_, and then another one and assigns it to $script. If that's not what you want, write while (my $script = <SOURCE>) { ... } or so instead.
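    This is easy to confirm with a small, self-contained sketch. An in-memory string with hypothetical contents stands in for 30.txt, and both loop shapes are run over it:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = "one\ntwo\nthree\nfour\n";    # hypothetical four-line 30.txt

# Original shape: the while condition reads a line into $_,
# then the body reads the *next* line into $script.
open(my $fh, '<', \$data) or die "Can't open in-memory data: $!";
my @buggy;
while (<$fh>) {
    my $script = <$fh>;
    chomp $script;
    push @buggy, $script;
}
close($fh);

# Suggested shape: exactly one read per iteration.
open($fh, '<', \$data) or die "Can't open in-memory data: $!";
my @fixed;
while (my $script = <$fh>) {
    chomp $script;
    push @fixed, $script;
}
close($fh);

print "buggy: @buggy\n";    # prints: buggy: two four
print "fixed: @fixed\n";    # prints: fixed: one two three four
```

    With an odd number of lines, the original shape would also read undef on the last pass and chomp would warn about an uninitialized value.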

    It might also help if you gave us a little example of a few inputs, and what output you expect from them.

Re: Perlwanab grep array elements against another array
by biohisham (Priest) on Oct 12, 2009 at 10:35 UTC
    ikegami did justice to your question with a generous explanation, but there are a couple of things I'd like to add. You haven't provided sample inputs for the data you want to run this script on (SOURCE, FILE2SEARCH and DEST), so we can't see what that data looks like, or whether appending to DEST every time you run the script is acceptable; by then DEST would probably contain a lot of duplicates.

    Another thing, you did not abide by the mantra which says:

    use strict;
    use warnings;
    This would save you a lot of debugging headaches, so you may want to pick up this habit right away.

    Now, on to your code. Changing:

    while (<SOURCE>) {
        $script = <SOURCE>;
        chomp $script;
        @line=grep( /$script/, @sfile );
    }
    to:
    while(read(SOURCE, $script,1)){ #reading one byte at a time
        chomp $script;
        @line=grep( /$script/, @sfile );
    }
    made the script work as well, but I can't tell whether it does what you were after, since no samples were provided.
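    To check what that loop actually feeds to grep: read with a length of 1 returns one character per call, so each $script is a single-character pattern, and the trailing newline becomes an empty string after chomp. A quick sketch, using an in-memory string with hypothetical contents as a stand-in for a one-line 30.txt:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = "pay.sh\n";    # hypothetical one-line 30.txt
open(my $fh, '<', \$data) or die "Can't open in-memory data: $!";

my @chunks;
my $script;
while (read($fh, $script, 1)) {    # read() with LENGTH 1: one character per call
    push @chunks, $script;
}
close($fh);

# Show the newline as a literal \n so every chunk is visible.
print join(',', map { $_ eq "\n" ? '\n' : $_ } @chunks), "\n";
# prints: p,a,y,.,s,h,\n
```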
