in reply to comparing multiple files for patterns

welcome 2015_newbie

If you are trying to learn new things, consider the somewhat long one-liner below: it shows a good deal of the power of the Perl command line (see perlrun). The one-liner searches all the other files for occurrences of the lines found in the first file given as argument.

Just as a matter of taste, I've changed 'police' to 'pretty woman' in your example files.

perl -lne '%ln;BEGIN{open $f,shift;map{chomp;$ln{$_}++}<$f>}print qq($ARGV line\t$.\t[$_]) if exists $ln{$_};close ARGV if eof' dog.txt cat.txt other.txt

cat.txt line 2 [pretty woman]
other.txt line 1 [pretty woman]

In detail: perl -lne executes the code given after -e; -l automatically chomps each input line when it is used together with -n or -p (and makes print append a newline); -n wraps the code in a while loop that reads every line of every file passed as argument. Again, see perlrun.
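
To make the switches less magic, here is a rough sketch of what -l and -n add around the -e code. It is only an approximation of what perlrun describes; the temporary file and the @seen array are my own additions so the sketch runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample file, created only so the sketch is self-contained.
my ($fh, $name) = tempfile(UNLINK => 1);
print {$fh} "first\nsecond\n";
close $fh;

my @seen;
{
    local @ARGV = ($name);    # the files that would follow the one-liner
    local $\ = "\n";          # -l: print appends a newline
    while (<>) {              # -n: loop over every line of every file
        chomp;                # -l: strip the trailing newline
        push @seen, $_;
        print "[$_]";         # stand-in for the code given to -e
    }
}
```

Running it prints each line of the sample file between square brackets, one per line.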

In the body of the one-liner we have %ln, which simply names the hash of lines before the BEGIN block uses it; without strict it is a package variable, so the statement is there for the reader more than for perl. The BEGIN block executes as soon as possible: it shifts @ARGV (see shift to know why), depriving the -n loop of its first argument. The shifted argument is opened and the list returned by <$f> (the <FILEHANDLE> operator returns all lines in list context!) is processed by map. The automatic chomp of -l applies only to lines read by the implicit -n loop, not to our explicit <$f>, so in the block we chomp ourselves and increment the value of the key $_ (the current line fed by <$f>) in the hash %ln.
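
Written out in long form, the BEGIN block could look like the sketch below. The @args array stands in for @ARGV and the temporary file for dog.txt; both are assumptions of mine so that the snippet runs anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the first file on the command line.
my ($fh, $first) = tempfile(UNLINK => 1);
print {$fh} "pretty woman\nblack dog\n";
close $fh;

my @args = ($first, 'cat.txt', 'other.txt');   # stand-in for @ARGV
my %ln;

# In the one-liner a bare shift inside BEGIN works on @ARGV.
my $file = shift @args;
open my $f, '<', $file or die "Cannot open $file: $!";
while (my $line = <$f>) {
    chomp $line;        # explicit chomp: -l does not help with <$f>
    $ln{$line}++;       # the line becomes a key of the lookup hash
}
close $f;

print "$_ => $ln{$_}\n" for sort keys %ln;
```

After this only the files still to be searched remain in the argument list, which is exactly what the -n loop will read.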

Now we are in the main body of the one-liner, where -l and -n are in effect: we print the current filename ($ARGV, set when reading with the diamond operator <>; see perlvar), the line number $. (again perlvar) and the current line $_, but only if the corresponding hash entry $ln{$_} exists.

close ARGV if eof closes the special filehandle ARGV (see perlvar) when the end of the current file is reached. This is important because $. is not reset when a filehandle is closed implicitly, as happens when <> moves on to the next file. Remove that part to see $. keep increasing across all the files.
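
A small sketch to see the difference for yourself. The two temporary files and the line_numbers helper are hypothetical, made up only for this demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two hypothetical files of two lines each.
my @files;
for my $content ("a\nb\n", "c\nd\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh;
    push @files, $name;
}

# Collect the value of $. for every line read through <>.
sub line_numbers {
    my ($reset, @names) = @_;
    my @numbers;
    local @ARGV = @names;
    while (<>) {
        push @numbers, $.;
        close ARGV if $reset and eof;   # an explicit close resets $.
    }
    return @numbers;
}

print "with close ARGV:    @{[ line_numbers(1, @files) ]}\n";
print "without close ARGV: @{[ line_numbers(0, @files) ]}\n";
```

With the explicit close the numbering restarts for each file (1 2 1 2); without it the count runs on across the files (1 2 3 4).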

Have fun and happy new year (maybe reading Perl White Magic - Special Variables and Command Line Switches)
L*

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

Re^2: comparing multiple files for patterns -- oneliner explained
by 2015_newbie (Novice) on Jan 04, 2016 at 03:21 UTC
    Hoping everyone had a great New Year and thanks for the replies. I started trying some more tactics to find out how to compare columns and lines. I am trying it as an exercise. There are actually not 50 files, but I meant it could be more than 2. Here are actual files I created:

    first.txt

    /vol/cat,feline
    /vol/dog,canine
    /vol/cat,feline
    /vol/cat,feline
    /vol/amphibian,FROG
    /vol/amphibian,FROG

    second.txt

    9,/vol/elephant,fourfeet
    1999,/vol/dolphin,fish
    10,/vol/cat,feline
    1111,/vol/goldfish,fish
    2222,/vol/spider,arachnid
    5555,/vol/camel,dromedary
    3333,/vol/wolf,canine

    I am trying to do the following:
    1. Select /vol/cat,feline as the element common in the array; this will require that the first column be excluded.
    2. If /vol/cat,feline is found (that is, if an element common to the array is found), print the ID #, for example 10.
    Here is what I did so far:

    use strict;

    sub get_animal {
        open my $FILE, '<', shift or die $!;
        return map {chop; $_ => $_} <$FILE>;
    }

    my %a = get_animal '/tmp/first.txt';
    my %b = get_animal '/tmp/second.txt';
    {
        print "$_\n" for grep {$_} @a{keys %b};
    }

    It works if the column with the numbers in it is deleted from second.txt. I don't know how to make it compare the first and second columns from first file with the second and third columns from the second file. After that, it needs to return the ID # when it finds a match. Any ideas?
      Hello, I cannot fully understand your requirements given the two example files; can you rephrase?

      Some observations:
      • where is use warnings;?
      • do not use uppercase variable names
      • also remember to close your filehandles anyway: it is safer.
      • chop is not chomp
      • Avoid a or b as variable names: the scalars $a and $b are special variables (used by sort), and even if the hashes are not, avoid them anyway. Choose meaningful variable names instead
      • when learning or debugging I think it is preferable to write plain syntax: you have a superfluous bare block, why? The statement inside it is not exactly a beginner's one. How can you inspect it without a place to insert the basic debugging tool, aka print?
      #{
      #    print "$_\n" for grep {$_} @a{keys %b};
      #}
      #
      # should be something like (untested..)
      foreach my $bkey (keys %b) {
          warn "key not defined" unless $bkey;   ## what is the purpose of your "grep {$_}"???
          if ($a{$bkey}) { print "FOUND: [$bkey] in the hash \%a\n" }
          else { print "NOT found key [$bkey] in the hash \%a\n" }
      }
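
      Under one possible reading of the requirement (an assumption on my part: a line of second.txt matches when everything after its first comma appears as a whole line in first.txt, and you then want its ID), a minimal sketch could look like the one below. The sample data is a shortened copy of your files held in arrays, and the names %wanted and @matching_ids are only illustrative.

```perl
use strict;
use warnings;

# Shortened sample data copied from the two files in the question.
my @first_lines  = ("/vol/cat,feline\n", "/vol/dog,canine\n", "/vol/amphibian,FROG\n");
my @second_lines = ("9,/vol/elephant,fourfeet\n", "10,/vol/cat,feline\n", "3333,/vol/wolf,canine\n");

# Remember every "path,kind" pair seen in the first file.
my %wanted;
for my $line (@first_lines) {
    chomp $line;    # chomp removes only the newline; chop removes any last character
    $wanted{$line} = 1;
}

# Split each line of the second file into the ID and the rest,
# then keep the ID when the rest is known from the first file.
my @matching_ids;
for my $line (@second_lines) {
    chomp $line;
    my ($id, $rest) = split /,/, $line, 2;
    push @matching_ids, $id if exists $wanted{$rest};
}

print "ID: $_\n" for @matching_ids;
```

      With this sample data it prints ID: 10. If you need to compare the columns separately rather than as one string, split without the limit and build the key from the fields you care about.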

      L*