
There are a number of things you can do to improve this code. Here's a list of some of the things I found:

Always use strict and warnings. See "perlintro -- Perl introduction for beginners" for further discussion.
Use lexical variables, not package variables. Use them in the smallest possible scope. They're faster and you won't suffer from any of the classic problems associated with global variables.
Order your list of unions. If you have some domain-specific knowledge regarding which are more likely to be found, put them first; if not, put the shortest strings first because it's quicker to match a short string than a longer one.
Canonicalise your data. It's quite possible there's extraneous whitespace in the union names; long names may be spread over two lines in the agreements; someone may have failed to capitalise part of a name. Whatever you do to the agreement text, do the same to the list of unions.
Avoid regexes where a plain string function will do. For fixed-string searches, Perl's string handling functions, such as index, are typically measurably faster than a regex.
If you're only looking for the first match, don't use grep; List::Util's first function is likely a much better choice.
Given you're dealing with .txt and .pdf files, these are probably relatively small and would easily fit into memory. Slurp the entire text of each file into a single string and perform your matches on that. (See the special variable $/.) If any names are spread over multiple lines, attempting to match on a line-by-line basis will be pointless.
Check your work as you go. Use print statements to check that variables contain reasonable values. Your regex /.pdf.txt/ is quite clearly not doing what you want: a print statement showing that the array @files was empty would have told you that.
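
As a minimal sketch of that kind of check, the snippet below prints what each regex actually selects. The file names are hypothetical, invented only for illustration, and it assumes the intent was to pick up names ending in .txt or .pdf.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical file names, chosen only for illustration.
my @names = qw{abc.txt abc_xyz.pdf pqrpdf notes.pdf.txt xpdfytxt};

# Return the names that match a compiled regex.
sub matching {
    my ($re, @candidates) = @_;
    return grep { $_ =~ $re } @candidates;
}

# Unescaped dots match ANY character, and "pdf" must be followed by "txt".
my @loose = matching(qr/.pdf.txt/, @names);

# Literal dot, either extension, anchored at the end of the name.
my @strict = matching(qr/\.(?:txt|pdf)\z/, @names);

print "loose:  @loose\n";
print "strict: @strict\n";
```

With those names, the loose pattern selects only notes.pdf.txt and xpdfytxt, while the anchored pattern selects abc.txt, abc_xyz.pdf and notes.pdf.txt. A two-line print like this makes the difference obvious straight away.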

I put together a short script to show how all of those points might be implemented. I also dummied up some highly contrived data just to give the script something to work on.

Here's union.txt (note the second line has an extra space):

$ cat union.txt 
ABC Union
XYZ  Union
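
That extra space is exactly what canonicalisation is meant to absorb. Here's a minimal sketch of one way to do it, applied identically to a union name and to agreement text. The canonicalise helper is hypothetical (it's not part of the script further down), and the lowercasing step is my assumption about how you might handle inconsistent capitalisation.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: squeeze whitespace, trim, and fold case.
sub canonicalise {
    my ($text) = @_;
    $text =~ tr/ \t\r\n/ /s;   # runs of spaces, tabs and newlines -> one space
    $text =~ s/\A | \z//g;     # drop a single leading or trailing space, if any
    return lc $text;           # so "XYZ union" and "XYZ Union" compare equal
}

my $name = canonicalise("XYZ  Union\n");
my $text = canonicalise(".................. XYZ\n   Union .............");

print index($text, $name) > -1 ? "found '$name'\n" : "not found\n";
```

Because the folded name is what gets matched, keep the original spelling alongside it if you want to report names as they appear in union.txt.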

I then created a number of very short files in two directories. These have one, two or no matches; one has a name spread over two lines with a slew of extra whitespace; one file is completely empty.

$ for i in agreements other; do for j in `ls $i`; do echo "*** $i/$j ***"; cat $i/$j; done; done
*** agreements/abc.txt ***
...
ABC Union
...
*** agreements/abc_xyz.pdf ***
...
XYZ Union and ABC Union
...
*** agreements/def.txt ***
...
DEF Union
...
*** agreements/pqrpdf ***
... temp data ...
*** agreements/xyz.pdf ***
.................. XYZ
   Union .............
*** other/dummy_empty ***

Here's the script to process that data:

#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

use File::Spec;
use List::Util 'first';

{
    my $union_file = 'union.txt';
    my @dirs = qw{agreements other};

    my $unions = get_unions($union_file);

    print "Unions to check:\n";
    print "\t$_\n" for @$unions;

    process_files($_, $unions) for @dirs;
}

sub get_unions {
    my ($union_file) = @_;

    open my $fh, '<', $union_file;

    my @unions;

    while (<$fh>) {
        chomp;
        y/ / /s;    # squeeze runs of spaces into a single space
        push @unions, $_;
    }

    # Shortest names first: they're the quickest to test.
    return [ sort { length $a <=> length $b } @unions ];
}

sub process_files {
    my ($dir, $unions) = @_;

    print "Processing directory: $dir\n";

    opendir(my $dh, $dir);

    for (grep /\.(?:txt|pdf)\z/, readdir $dh) {
        my $path = File::Spec::->catfile($dir, $_);

        print "\tProcessing path: $path\n";

        # Slurp the whole file: with $/ undefined, <$fh> reads to end of file.
        my $text = do { open my $fh, '<:crlf', $path; local $/; <$fh> };
        # Squeeze runs of spaces and newlines into a single space.
        $text =~ y/ \n/ /s;
        # first() stops at the first union (in list order) found in the text.
        my $found = first { -1 < index $text, $_ } @$unions;

        if (defined $found) {
            print "\t\tMATCH: $found\n";
        }
        else {
            print "\t\tNo matches found.\n";
        }
    }

    return;
}

Here's the output:

Unions to check:
    ABC Union
    XYZ Union
Processing directory: agreements
    Processing path: agreements/abc.txt
        MATCH: ABC Union
    Processing path: agreements/abc_xyz.pdf
        MATCH: ABC Union
    Processing path: agreements/def.txt
        No matches found.
    Processing path: agreements/xyz.pdf
        MATCH: XYZ Union
Processing directory: other
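
Note that agreements/abc_xyz.pdf reports only "ABC Union" even though both names occur in it: first stops at the first element of the list that matches, in list order and not in text order. If you do want every match, grep with the same index test is one option. The following is a rough sketch; the all_matches helper is hypothetical and the sample text is just the contrived data from above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util 'first';

my @unions = ('ABC Union', 'XYZ Union');
my $text   = '... XYZ Union and ABC Union ...';

# Hypothetical helper: every name that occurs somewhere in the text.
sub all_matches {
    my ($haystack, @names) = @_;
    return grep { index($haystack, $_) > -1 } @names;
}

# Stops as soon as one name is found.
my $first = first { index($text, $_) > -1 } @unions;

# Tests every name, so it does more work on long lists.
my @all = all_matches($text, @unions);

print "first: $first\n";
print "all:   ", join(', ', @all), "\n";
```

That's the trade-off: first can return early, grep always walks the whole list.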

Take whatever ideas, or actual code, you want from that. I'd recommend you run benchmarks (the core Benchmark module is handy for this) to see what improvements you're making; the results would probably also be useful for the person who told you "... my codes aren't efficient ...".
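
As a starting point, here's a minimal sketch of such a comparison using cmpthese from the core Benchmark module. The test string and needle are invented for illustration, so substitute real agreement text and union names before drawing any conclusions. The gap between index and a regex for a fixed string may turn out to be small on your data, which is precisely why it's worth measuring.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark 'cmpthese';

# Invented data: a needle buried in the middle of some filler.
my $text   = ('.' x 10_000) . ' XYZ Union ' . ('.' x 10_000);
my $needle = 'XYZ Union';

my %candidates = (
    index => sub { return index($text, $needle) > -1 },
    regex => sub { return $text =~ /\Q$needle\E/ },
);

# Make sure both candidates actually find the needle before timing them.
for my $name (sort keys %candidates) {
    die "$name did not find the needle\n" unless $candidates{$name}->();
}

# A negative count means "run each candidate for at least that many CPU seconds".
cmpthese(-1, \%candidates);
```

cmpthese prints a small table of rates and relative percentages; run it a few times, as the numbers will vary between runs and machines.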

— Ken

In reply to Re: How to efficiently search for list of strings in multiple files? by kcott
in thread How to efficiently search for list of strings in multiple files? by stray_tachyon
