Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have many large strings that I want to find in a file. The strings contain all kinds of characters. The problem is that some of these strings contain a number (or part of one) that may differ but should still match.

for example

"nnxx.yy2 = 234 abc"

should match

"nnxx.yy2 = 333 abc"
"nnxx.yy2 = 1 abc"
"nnxx.yy2 = 2345 abc"

My idea was to convert such a string into a regular expression. So I would have to escape all regex metacharacters, like * or + or ^ and so on. The number I could then replace with a regular expression like "\d+".

so I would match my example with

m/nnxx\.yy2 = \d+ abc/

My question:

Which characters do I have to escape in a string so that I do not miss any?

How to change the part that can differ is clear to me. (I am afraid I will forget some rarely used characters, and then my script will run into problems after some time.)

Or do you have an idea how to convert a string into a match pattern automatically, in an even simpler way?

MANY THANKS

Replies are listed 'Best First'.
Re: convert string into a match pattern
by hippo (Archbishop) on Dec 20, 2017 at 13:17 UTC
    Which characters do I have to escape in a string so that I do not miss any?

    Use the \Q and \E quoting escapes around all the parts you want to be matched as literal strings:

    use strict;
    use warnings;
    use Test::More;

    my @good = (
        'nnxx.yy2 = 333 abc',
        'nnxx.yy2 = 1 abc',
        'nnxx.yy2 = 2345 abc',
    );
    my @bad = (
        'foo',
        'nnxxayy2 = 1 abc',
        'nnxx.yy2 = 1a abc',
    );

    my $re = qr/^\Qnnxx.yy2 = \E\d+\Q abc\E$/;

    plan tests => @good + @bad;

    for my $str (@good) {
        like ($str, $re, "$str matched");
    }
    for my $str (@bad) {
        unlike ($str, $re, "$str not matched");
    }
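When the strings are held in variables rather than typed into the pattern, quotemeta (the function form of \Q) can do the same job. The following is only a minimal sketch: string_to_pattern is a hypothetical helper name, and the rule that "a number" means a run of digits delimited by whitespace is an assumption made for illustration, not something stated in the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): build an anchored regex from a
# literal string, letting every whitespace-delimited run of digits vary.
sub string_to_pattern {
    my ($string) = @_;

    # Assumption: a "number" is a digit run with no non-space character
    # directly before or after it, so the 2 in "yy2" stays literal.
    my @literal_parts = split /(?<!\S)\d+(?!\S)/, $string, -1;

    # Quote every literal part and put \d+ where each number was.
    my $pattern = join '\d+', map { quotemeta($_) } @literal_parts;

    return qr/^$pattern$/;
}

my $re = string_to_pattern('nnxx.yy2 = 234 abc');
for my $line ('nnxx.yy2 = 1 abc', 'nnxxayy2 = 1 abc') {
    printf "%-20s %s\n", $line, $line =~ $re ? 'matches' : 'does not match';
}
```

Splitting before quoting keeps the number detection independent of the backslashes that quotemeta adds in front of spaces and punctuation.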
Re: convert string into a match pattern
by dave_the_m (Monsignor) on Dec 20, 2017 at 14:10 UTC
    In addition to quotemeta, note also that if you want to match many strings against each line in a file, it's much more efficient to combine all the strings into a single pattern, so something like the following:
    my @strings = (
        'nnxx.yy2 = 234 abc',
        'foo bar = 39 baz *',
        # ....
    );

    my $pattern = join '|', map {
        my $s = quotemeta;
        # quotemeta also backslashes spaces, so allow "\ " between '=' and the number
        $s =~ s/(=(?:\\\s)*)\d+/$1\\d+/g;
        $s
    } @strings;

    my $qr = qr/^($pattern)$/;

    while (<>) {
        print if /$qr/;
    }
    In the code above I only convert numbers into \d+ if preceded by '='. I've also added ^ and $ anchors to the pattern, which you may or may not want.

    Dave.

      Hello Dave

      Many thanks for your comment on making it more efficient. One additional question:

      If I do it your way and combine the search for all strings, is there a simple way to also get the string index in case of a match?

      e.g. 0 for a match with 'nnxx.yy2 = 234 abc', 1 for a match with 'foo bar = 39 baz *',....

      Thanks

        is there a simple way to also get the string index in case of a match
        Not that I can immediately think of. Less simple techniques depend on whether you expect most lines to match at least one of the strings, or most lines to be rejected. If the latter, then you can use my suggested join '|' technique to quickly reject most lines, then use a slower technique only on the matching lines to find which string matched. For example, you could generate a second pattern which includes captures, e.g. /(string1)|(string2)|..../, apply it to the matched lines, and then see which is the first non-undef value out of $-[1], $-[2], .., $-[N]. This pattern is less efficient than /string1|string2|.../ as it isn't internally compiled into a trie and thus has to check every string in turn, which is slow for many strings.

        Dave.
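The two-pass idea described above could be sketched roughly as follows. This is an illustration under stated assumptions, not tested against real data: make_index_finder is a hypothetical helper name, both patterns are anchored with ^ and $, and the substitution allows for the backslash that quotemeta places in front of whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a closure that maps a
# line to the index of the string it matches, or undef if none matches.
sub make_index_finder {
    my @strings = @_;

    my @parts = map {
        my $s = quotemeta;
        # The number after '=' may vary; quotemeta has turned ' ' into '\ '.
        $s =~ s/(=(?:\\\s)*)\d+/$1\\d+/g;
        $s;
    } @strings;

    my $fast = join '|', @parts;                   # plain alternation, no captures
    my $slow = join '|', map { "($_)" } @parts;    # one capture group per string
    my $fast_qr = qr/^(?:$fast)$/;
    my $slow_qr = qr/^(?:$slow)$/;

    return sub {
        my ($line) = @_;
        return unless $line =~ $fast_qr;   # quick rejection of non-matching lines
        return unless $line =~ $slow_qr;   # slower pass, only for matching lines
        # $-[N] is defined only for the group that took part in the match.
        for my $group (1 .. $#-) {
            return $group - 1 if defined $-[$group];
        }
        return;
    };
}

my $find = make_index_finder('nnxx.yy2 = 234 abc', 'foo bar = 39 baz *');
for my $line ('nnxx.yy2 = 1 abc', 'foo bar = 7 baz *', 'no match here') {
    my $index = $find->($line);
    print "$line => ", $index // 'no match', "\n";
}
```

The closure keeps both compiled patterns, so they are built once per list of strings rather than once per line.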

Re: convert string into a match pattern
by Anonymous Monk on Dec 20, 2017 at 12:48 UTC
Re: convert string into a match pattern
by Anonymous Monk on Dec 20, 2017 at 15:33 UTC

    Perfect answers, which helped me a lot to solve my problem and to improve my Perl skills!

    Thanks a lot !!