in reply to Re^3: Comparing pattern
in thread Comparing pattern

Ok, to be more clear.

I have this code:
#!/usr/bin/perl -w
use strict;

my $patterns = "/path/to/patterns.txt";
my $arg1 = shift;

# Read the pattern list, one regex per line.
open (PAT, '<', $patterns) or die "$patterns: $!\n";
my @patterns = <PAT>;
close(PAT);
chomp @patterns;
my $regex_string = join '|', @patterns;

# Slurp the whole file to scan into $_.
open( FILE, "<", "$arg1") or die "$arg1: $!\n";
$_ = do { local $/; <FILE> };
close(FILE);

if ( /($regex_string)/is ) {
    print "\n$arg1\n$1\n";
}
Test list with patterns:
/path/to/patterns.txt
part1.*part2
Foo bar
Other pattern
Test file to scan:
hghghgghghh part1 fff part2 jhhjhjkjk Foo bar kkjkjkj Other pattern
$1 will show all wildcarded text between part1 and part2 and not only the pattern part1.*part2 as it should.
/path/to/file
part1 fff part2

Also, only first pattern found is displayed now. That's not a problem, but I'd also like to know how to display all patterns if a file contains more than one.
Please bear with an inexperienced user like me. Thank you!
Regarding the other problem with xml file to scan, I must do more tests to know exactly where the problem is.

Re^5: Comparing pattern
by graff (Chancellor) on Sep 22, 2009 at 00:32 UTC
    only first pattern found is displayed now. That's not a problem, but I'd also like to know how to display all patterns if a file contains more than one.

    That's easy -- instead of using an "if" statement like this:

    if (/($regex_string)/is) {
    just use a while loop like this -- making sure to add the "g" modifier (and while I'm at it, I'll add some clarification to the output):
    while (/($regex_string)/isg) {
        print "\nmatched in $arg1:\n==$1==\n";
    }
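
To try the loop without touching any files, here is a minimal self-contained sketch. It uses the three sample patterns from the question; the inline `$text` (assuming one item per line in the test file) and the helper name `all_matches` are made up for illustration and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample patterns from the question, joined into one alternation.
my @patterns     = ('part1.*part2', 'Foo bar', 'Other pattern');
my $regex_string = join '|', @patterns;

# Hypothetical stand-in for the slurped file contents (assumed layout).
my $text = "hghghgghghh\npart1 fff part2\njhhjhjkjk\nFoo bar\nkkjkjkj\nOther pattern\n";

# Collect every match. With /g in scalar context, each pass of the loop
# resumes the search where the previous match ended.
sub all_matches {
    my ($re, $string) = @_;
    my @found;
    while ($string =~ /($re)/isg) {
        push @found, $1;
    }
    return @found;
}

print "==$_==\n" for all_matches($regex_string, $text);
```

With this input the loop should print `==part1 fff part2==`, `==Foo bar==` and `==Other pattern==`, one per line. One thing to watch: because of the "s" modifier, `.*` also matches newlines, and it is greedy, so if the file contains a later "part2" the first match will run all the way to the last one and swallow anything in between.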
    As for your other issue:

    $1 will show all wildcarded text between part1 and part2 and not only the pattern part1.*part2 as it should.

    What makes you think it "should" display the string "part1.*part2"? When using the capture variables ($1, $2, ...), the normal situation is to want the actual (complete, literal) string that matched the regex, rather than the regex string with its wildcards.
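
If what you actually want to report is the pattern itself rather than the text it matched, one simple alternative is to test each pattern separately instead of joining them with "|". The sketch below assumes exactly that; the helper name `patterns_found` and the inline sample text are hypothetical, and this is a different technique from the hash-based approach described next.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the pattern strings (not the matched text) that occur in $text.
# Each pattern is tried on its own, so we always know which one matched.
sub patterns_found {
    my ($text, @patterns) = @_;
    return grep { $text =~ /$_/is } @patterns;
}

# Hypothetical sample input based on the test file from the question.
my $text     = "hghghgghghh part1 fff part2 jhhjhjkjk Foo bar kkjkjkj";
my @patterns = ('part1.*part2', 'Foo bar', 'Other pattern');

print "$_\n" for patterns_found($text, @patterns);
```

Here the output should be the two lines `part1.*part2` and `Foo bar`, since "Other pattern" does not occur in the sample text. The trade-off is one regex match per pattern instead of a single combined match, which may matter for very long pattern lists.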

    If you want the wildcard-enabled regexes in your list to return a specific constant string, you'll probably want to include that string in your regex list file, store those replacement strings with their regexes in a separate hash, and add some logic in the while loop shown above that will replace any given matching string with the appropriate constant replacement string. Here's an adapted version of the three files involved: