Re: How to efficiently search for list of strings in multiple files?

There are a number of things you can do to improve this code. Here's a list of some that I found.

Always use strict and warnings. See "perlintro -- Perl introduction for beginners" for further discussion.
Use lexical variables, not package variables. Use them in the smallest possible scope. They're faster and you won't suffer from any of the classic problems associated with global variables.
Order your list of unions. If you have some domain-specific knowledge regarding which are more likely to be found, put them first; if not, put the shortest strings first because it's quicker to match a short string than a longer one.
Canonicalise your data. It's quite possible there's extraneous whitespace in the union names; long names may be spread over two lines in the agreements; someone may have failed to capitalise part of a name. Whatever you do to the agreement text, do the same to the list of unions.
Avoid regexes wherever possible. Perl's string handling functions, such as index, are typically measurably faster than a regex.
If you're only looking for the first match, don't use grep; List::Util's first function is likely a much better choice.
Given you're dealing with .txt and .pdf files, these are probably relatively small and would easily fit into memory. Slurp the entire text of each file into a single string and perform your matches on that. (See the special variable $/.) If any names are spread over multiple lines, attempting to match on a line-by-line basis will be pointless.
Check your work as you go. Use print statements to check that variables contain reasonable values. Your regex /.pdf.txt/ is quite clearly not doing what you want: a print statement showing that the array @files was empty would have told you that.
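
The canonicalisation point above mentions whitespace, line breaks and capitalisation. Here's a minimal sketch of one way to handle all three with a single routine applied to both the names and the agreement text. The name canonicalise() is my own invention for illustration, and I'm assuming plain lc is good enough for your data (fc, available from Perl 5.16, may be a better fit for non-ASCII names).

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper: squeeze every run of whitespace (including
# newlines) to a single space, trim both ends, and lowercase the result.
sub canonicalise {
    my ($string) = @_;

    # split ' ' splits on runs of whitespace and skips leading whitespace.
    return lc join ' ', split ' ', $string;
}

# Contrived samples in the style of the test data used in this post.
my $name = canonicalise("XYZ  Union\n");
my $text = canonicalise("........ xyz\n   UNION ........");

print "Canonical name: '$name'\n";
print index($text, $name) > -1 ? "MATCH\n" : "No match\n";
```

Because both sides go through the same routine, a name split over two lines or typed in the wrong case still lines up for index.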

I put together a short script to show how all of those points might be implemented. I also dummied up some highly contrived data just to give the script something to work on.

Here's union.txt (note the second line has an extra space):

$ cat union.txt 
ABC Union
XYZ  Union

I then created a number of very short files in two directories. These have one, two or no matches; one has a name spread over two lines with a slew of extra whitespace; one file is completely empty.

$ for i in agreements other; do for j in `ls $i`; do echo "*** $i/$j ***"; cat $i/$j; done; done
*** agreements/abc.txt ***
...
ABC Union
...
*** agreements/abc_xyz.pdf ***
...
XYZ Union and ABC Union
...
*** agreements/def.txt ***
...
DEF Union
...
*** agreements/pqrpdf ***
... temp data ...
*** agreements/xyz.pdf ***
.................. XYZ
   Union .............
*** other/dummy_empty ***

Here's the script to process that data:

#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

use File::Spec;
use List::Util 'first';

{
    my $union_file = 'union.txt';
    my @dirs = qw{agreements other};

    my $unions = get_unions($union_file);

    print "Unions to check:\n";
    print "\t$_\n" for @$unions;

    process_files($_, $unions) for @dirs;
}

sub get_unions {
    my ($union_file) = @_;

    open my $fh, '<', $union_file;

    my @unions;

    while (<$fh>) {
        chomp;
        y/ / /s;
        push @unions, $_; 
    }

    return [ sort { length $a <=> length $b } @unions ];
}

sub process_files {
    my ($dir, $unions) = @_;

    print "Processing directory: $dir\n";

    opendir(my $dh, $dir);

    for (grep /\.(?:txt|pdf)\z/, readdir $dh) {
        my $path = File::Spec::->catfile($dir, $_);

        print "\tProcessing path: $path\n";

        my $text = do { open my $fh, '<:crlf', $path; local $/; <$fh> };
        $text =~ y/ \n/ /s;
        my $found = first { -1 < index $text, $_ } @$unions;

        if (defined $found) {
            print "\t\tMATCH: $found\n";
        }
        else {
            print "\t\tNo matches found.\n";
        }
    }

    return;
}
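
In the spirit of checking your work as you go, the file-name filter can be tried in isolation before it goes anywhere near a directory. This is only a sketch: the names mirror my contrived data, except notes.pdf.txt, which I made up because it is the sort of name the original /.pdf.txt/ pattern does accept.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Mostly the contrived file names from this post; 'notes.pdf.txt' is a
# made-up extra that happens to satisfy the original pattern.
my @names = qw{abc.txt abc_xyz.pdf pqrpdf dummy_empty notes.pdf.txt};

# Unescaped '.' matches any character, so this wants "?pdf?txt" somewhere.
my @loose = grep { /.pdf.txt/ } @names;

# Literal dot, either extension, anchored at the end of the name.
my @strict = grep { /\.(?:txt|pdf)\z/ } @names;

print "loose:  @loose\n";
print "strict: @strict\n";
```

A couple of print statements like these show straight away which names survive each filter.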

Here's the output:

Unions to check:
    ABC Union
    XYZ Union
Processing directory: agreements
    Processing path: agreements/abc.txt
        MATCH: ABC Union
    Processing path: agreements/abc_xyz.pdf
        MATCH: ABC Union
    Processing path: agreements/def.txt
        No matches found.
    Processing path: agreements/xyz.pdf
        MATCH: XYZ Union
Processing directory: other
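
Note that abc_xyz.pdf contains both names but only the first hit is reported, because first stops as soon as it finds one. If you need every name in a file, that's the case where grep is the right tool. A minimal sketch, assuming the text has already been canonicalised; all_matches() is a hypothetical name, not something from the script above:

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper: return every name found in $text, not just the first.
sub all_matches {
    my ($text, $unions) = @_;

    return grep { index($text, $_) > -1 } @$unions;
}

my @unions = ('ABC Union', 'XYZ Union');
my @found  = all_matches('... XYZ Union and ABC Union ...', \@unions);

print "MATCH: $_\n" for @found;
```

The matches come back in the order of the list of names, not the order they appear in the text.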

Take whatever ideas, or actual code, you want from that. I'd recommend you run benchmarks (see the core Benchmark module) to see what improvements you're making; the results would probably also be useful for the person who told you "... my codes aren't efficient ...".
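
As a starting point for measuring, here's a rough sketch that compares index against a regex using cmpthese from the core Benchmark module. The test string is invented filler, so treat any numbers it produces as indicative only and rerun it with text that looks like your real agreements.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use Benchmark 'cmpthese';

# Made-up data: a long filler string with the name near the end.
my $text = ('lorem ipsum ' x 10_000) . 'XYZ Union' . (' dolor sit' x 100);
my $name = 'XYZ Union';

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    index => sub { my $hit = index($text, $name) > -1 },
    regex => sub { my $hit = $text =~ /\Q$name\E/ },
});
```

cmpthese prints a small table of rates and relative percentages; the same pattern works for comparing first against grep, or slurping against line-by-line reading.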

— Ken
