Pattern matching with a variable

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am writing a small script that reads single line entries from a file and then searches for that entry in another file. It all seemed to be working fine, until I encountered a situation where there were brackets in the line entry eg. abc (123). In this case, the code didn't find the entry, I am pretty sure because it's evaulating the bracket as part of pattern (regexp). Since I don't neccassrily know what data I will be getting, is there a way to force my variable to $entry to be interpreted as a string and not a regexp. The code below is a simplified version of what i am using...

use FileHandle::Deluxe;
 
$fh = FileHandle::Deluxe->new("library.txt", allow_insecure_code=>1, a
+uto_chomp=>1);
my $library = $fh->contents;
 
open (DATA, "input.txt");
while (<DATA>)
{
  chomp($_);
  $entry = $_;
 
  if ($library =~ /$entry/i)
  {
    print "found - $entry\n";
  }
  else
  {
    print "new - $entry\n";
  }
}
[download]

Thanks in advance.

Comment on Pattern matching with a variable Select or Download Code

Replies are listed 'Best First'.
Re: Pattern matching with a variable by JediWizard (Deacon) on Nov 03, 2004 at 23:05 UTC
I think you are looking for quotemeta or \Q and \E. See perlre. May the Force be with you	[reply] [d/l]
Re: Pattern matching with a variable by ikegami (Patriarch) on Nov 03, 2004 at 23:16 UTC
`if (index(uc($library), uc($entry)) < 0)` or `if ($library =~ /\Q$entry\E/i)`	[reply] [d/l] [select]
Re: Pattern matching with a variable by TROGDOR (Scribe) on Nov 04, 2004 at 09:01 UTC
There are more efficient ways to implement this algorithm. The implementation you've presented is n squared. That is, as input.txt or library.txt become longer, the program takes exponentially more time to complete. If the files are small, this won't be a big deal. If the files are large, you can make this run much more efficiently by using hashes. Hashes are the fastest way to compare the contents of one list against another list. It is an n algorithm, which means you only take a linear hit in run time as input.txt or library.txt grow. `$library = [contents of library.txt] $input = [contents of input.txt] my %in_library = (); foreach $entry (split(/\n/, $library)) { $in_library{"$entry"} = 1; } my @input = split(/\n/, $input); foreach $entry (@input) { if ($in_library{"$entry"}) { print "found - $entry\n"; } else { print "new - $entry\n"; } }` [download] Regular expressions take time to compile, so you want to avoid putting dynamic regexps in a tight loop. TROGDOR	[reply] [d/l]
Re^2: Pattern matching with a variable by Ytrew (Pilgrim) on Nov 04, 2004 at 17:08 UTC
The implementation you've presented is n squared. That is, as input.txt or library.txt become longer, the program takes exponentially more time to complete Actually, it takes quadradically longer to complete. :-) The point TROGDOR is making is correct: you need to use a faster algorithm. Currently, if you double the input size, it will take more than twice as long to execute the program. The adjective I'm quibbling about is the one that describes how much longer. An exponential time algorithm (one that runs in time proportional to kn, for some constant k), always takes longer than a quadradic time algorithm (one that runs in time proportional to n2), as n "gets big". Here, the number n is the size of our data, and the algorithms are categorized based on how many operations they make relative to n. Actually, this is true of any polynomial time algorithm (a1(k1n) eventually gets bigger than a2(n**k2), for any given set of contants (a1,k2) and (a2,k2), as n increases). In English, this just says that "exponential algorithms are really, really, slow". This is fairly obvious when you consider an algorithm with an exponent of 2: if you time the program,,then add one element, and time it again, the second run takes twice as long. If you add another element, it takes four times as long, and so on. The hardest problems in computer science require exponential space or time to solve. Quadradics are bad, but much more managable. Quicksort is, in the worst case, quadradic, and perl used it for ages. -- Ytrew Q. Uiop	[reply]
Re: Pattern matching with a variable by Anonymous Monk on Nov 03, 2004 at 23:28 UTC
Thanks JediWizard and ikegami, I will give it a go tonight.	[reply]