carolw has asked for the wisdom of the Perl Monks concerning the following question:

How can I search for a string in another string, allowing one character to be anything (.)?

If "abcdfg" is searched for in "djflskgjdslkgjabfoéabcdfg", then "abcdfg", "fbcdfg", "aMcdfg", "abZdfg" etc. are accepted. Basically, ".", denoting any character, could replace any character in "abcdfg", from the first to the last, but only one character at a time.


Replies are listed 'Best First'.
Re: search of a string in another string with 1 wildcard
by hippo (Archbishop) on Jul 09, 2014 at 14:05 UTC

    What have you tried? If not String::Approx then that might be worth a look. You could specify there to be a maximum of 1 substitution with no insertions or deletions which I guess would fulfil your requirements.

    The question is also sufficiently general that it says nothing about the use case. It could well be that there is a more appropriate method which relies on some domain knowledge. Without the context we cannot know about this.

      I get a compilation problem:

      Can't locate String/Approx.pm in @INC (@INC contains: /etc/perl /usr/local/lib/perl/5.14.2 /usr/local/share/perl/5.14.2 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.14 /usr/share/perl/5.14 /usr/local/lib/site_perl .)

      Should I install anything?

        Yes, you will need to install String::Approx before you can use it. If you are unfamiliar with installing third-party modules take a look at A Guide to Installing Modules as a starting point. As String::Approx is not a pure-perl module you will need a C compiler also. gcc is a popular choice.

Re: search of a string in another string with 1 wildcard
by ww (Archbishop) on Jul 09, 2014 at 14:39 UTC

    So, don't search "abcdfg" [ :-) what happened to 'e' ? ]

    But -- more seriously -- what do you mean by the statement that '"abcdfg", "fbcdfg", "aMcdfg", "abZdfg" etc are accepted' when only the first combination actually exists in your all-lower-case-string and the last two require upper case letters?

    From the info in your question, there are several ways to solve your problem -- some quite easy (see the following list) -- but it's easier to avoid off-track suggestions if the question is unambiguous AND we've seen your code (for a hint at what you're actually trying to accomplish).
    • character classes with quantifiers
    • look behinds
    • grouping
    And various combinations of the above.

    See On asking for help -- the "short version" at the top of that node will illuminate your next question.

    Updated: Multiple edits of typos and formatting. Only the second para is actually new. Mea culpa. Brain seized in the heat.



      This seems similar to Multiple Approximate Pattern Matching Problem. Here's my solution:
      #!/usr/bin/perl
      use warnings;
      use strict;
      use feature qw{ say };

      chomp(my $text = <>);
      my @patterns  = split ' ', <>;
      my $threshold = 0 + <>;

      my @ctext = split //, $text;
      my @results;
      for my $pattern (@patterns) {
          my @cpat = split //, $pattern;
          POSITION:
          for my $pos (0 .. @ctext - @cpat) {
              my $mismatches = 0;
              for my $i (0 .. @cpat - 1) {
                  if ($cpat[$i] ne $ctext[$pos + $i]) {
                      next POSITION if ++$mismatches > $threshold;
                  }
              }
              push @results, $pos;
          }
      }
      say join ' ', sort { $a <=> $b } @results;
      لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: search of a string in another string with 1 wildcard
by Anonymous Monk on Jul 09, 2014 at 14:22 UTC
    Brute force approach:
    use feature 'say';    # needed for say()

    my $pattern = 'abcdef';
    my @regexes;
    for (my ($i, $len) = (0, length $pattern); $i < $len; ++$i) {
        my $regex = $pattern;
        substr($regex, $i, 1) = '.';    # replace one character with a wildcard
        push @regexes, $regex;
    }
    say join "\n", @regexes;
    Output:
    .bcdef
    a.cdef
    ab.def
    abc.ef
    abcd.f
    abcde.
    Something like that?
      (should be "push @regexes, qr/$regex/", of course)

      Yes, and then the string should be searched with

      index(myString,@regexes);

      ?

      Doesn't seem to work.

        You need to loop through regexes. Or, maybe something like:
        ...
            push @regexes, $regex;
        }
        my $r = join '|', @regexes;
        $r = qr/($r)/;         # compile the regex
        say "Regex is: $r";    # debug
        my $string_to_search = "djflsbcdefgkgjdslkgjabfoéabcdefg";
        if ($string_to_search =~ $r) {
            say "Found it ($1) at position ", $-[0];
        }
        # there is a useful magic variable @- (LAST_MATCH_START)
        # check perldoc for it
        Output:
        Regex is: (?^u:(.bcdef|a.cdef|ab.def|abc.ef|abcd.f|abcde.))
        Found it (sbcdef) at position 4
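For the other option mentioned above, looping through the regexes, here is a minimal sketch. index() only does a literal substring search and does not accept regexes, which is why the index(myString,@regexes) attempt fails. The helper names one_wildcard_regexes and find_hits are made up for this illustration, and the sample text is the thread's string with the "é" left out to keep the example ASCII-only.

```perl
use strict;
use warnings;

# Hypothetical helper: one compiled regex per position of $query,
# with the character at that position replaced by '.'.
# quotemeta keeps any regex metacharacters in the query literal.
sub one_wildcard_regexes {
    my ($query) = @_;
    my @regexes;
    for my $i (0 .. length($query) - 1) {
        my $head = quotemeta substr($query, 0, $i);
        my $tail = quotemeta substr($query, $i + 1);
        my $src  = $head . '.' . $tail;
        push @regexes, qr/$src/s;    # /s lets '.' match a newline too
    }
    return @regexes;
}

# Hypothetical helper: for each regex, record [position, matched text]
# of its first match in $text (regexes that do not match are skipped).
sub find_hits {
    my ($text, @regexes) = @_;
    my @hits;
    for my $re (@regexes) {
        if ($text =~ /($re)/) {
            push @hits, [ $-[0], $1 ];
        }
    }
    return @hits;
}

my @hits = find_hits('djflsbcdefgkgjdslkgjabfoabcdefg',
                     one_wildcard_regexes('abcdef'));
print "$_->[0]: $_->[1]\n" for @hits;
```

Each regex is tried separately, so the same position can be reported more than once when several of the patterns match there.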

      To extend this problem to any number of wildcards (and not necessarily 1), would it be elegant and efficient to use the same code and change just

      substr($regex, $i, 1) = '.';

      to

      substr($regex, $i, m) = '.';

      where m will be the user's free parameter?

        carolw:

        Not quite. You're changing an $m character substring to a single char, so you could wind up with something like: .cdef, a.def, ab.ef, abc.f, abcd. where you're really wanting ..cdef, a..def, ab..ef, abc..f, abcd..; so you really want something a bit more like:

        substr($regex, $i, $m) = '.' x $m;

        But that's assuming you want your wildcards to be adjacent. If you want the wildcards to be anywhere, you've got a bit more work to do.

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.
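For the case where the $m wildcards may sit anywhere rather than next to each other, one possible sketch is to generate every choice of $m positions and build one pattern per choice. The sub name wildcard_patterns is invented for this example and is not from the thread. Note that the number of patterns is "n choose m", so this only stays practical for short queries and small $m.

```perl
use strict;
use warnings;

# Hypothetical helper: return every regex source string obtained by
# replacing exactly $m characters of $query (at any positions) with '.'.
sub wildcard_patterns {
    my ($query, $m) = @_;
    my @chars = map { quotemeta($_) } split //, $query;
    my @patterns;
    my $choose;
    $choose = sub {
        my ($start, $left, @picked) = @_;
        if ($left == 0) {
            my @copy = @chars;
            $copy[$_] = '.' for @picked;
            push @patterns, join '', @copy;
            return;
        }
        # Stop early enough to leave room for the remaining picks.
        for my $i ($start .. @chars - $left) {
            $choose->($i + 1, $left - 1, @picked, $i);
        }
    };
    $choose->(0, $m);
    undef $choose;    # break the closure's reference to itself
    return @patterns;
}

my $alternation = join '|', wildcard_patterns('abcdef', 2);
my $re = qr/($alternation)/s;
if ('xxabXdeYxx' =~ $re) {
    print "Found ($1) at position $-[0]\n";
}
```

The patterns are joined into one alternation, as in the earlier reply, so a single match operation covers all of them.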

      I would like to slightly change the question:

      How should the code be modified (specifically the '.') if the pattern to be matched is a fixed string, and one position accepts not any character but only a character from a given set?

      $pattern = 'abcdef';

      At the 3rd position, c could only be replaced by a character from the set {r,d,n,f,q,m}, for example.

        One way:

        c:\@Work\Perl>perl -wMstrict -le
        "my $string = 'abcdef';
         ;;
         my $pattern = qr{ [rdnfqm] }xms;
         ;;
         print qq{matched '$1' at offset $-[1]} if $string =~ m{ ($pattern) }xms;
        "
        matched 'd' at offset 3
        The construct  [rdnfqm] defines a "character class". Please see perlre, perlrequick, and perlretut.
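To tie the character class back to the fixed string in the question, a sketch along these lines may help. The sub name class_at_position is hypothetical. It also assumes that the original character (c here) should still be accepted at that position, which the question implies but does not state outright.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex from $query in which the character
# at 1-based position $pos may be the original character or any
# character in $set. Assumption: the original character stays allowed.
sub class_at_position {
    my ($query, $pos, $set) = @_;
    my $head  = quotemeta substr($query, 0, $pos - 1);
    my $orig  = substr($query, $pos - 1, 1);
    my $tail  = quotemeta substr($query, $pos);
    # quotemeta each member so ']', '^', '-' and '\' stay literal
    my $class = join '', map { quotemeta($_) } split //, $orig . $set;
    my $src   = $head . '[' . $class . ']' . $tail;
    return qr/$src/;
}

my $re = class_at_position('abcdef', 3, 'rdnfqm');
for my $candidate (qw(xxabrdefxx abcdef abzdef)) {
    if ($candidate =~ /($re)/) {
        print "$candidate: matched '$1' at offset $-[0]\n";
    }
    else {
        print "$candidate: no match\n";
    }
}
```

For the question's example this yields the equivalent of ab[crdnfqm]def. To make the set apply to every position in turn, the same sub could be called in a loop over $pos.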

Re: search of a string in another string with 1 wildcard
by InfiniteSilence (Curate) on Jul 09, 2014 at 13:49 UTC

    Sounds like you need to apply some form of pattern recognition. A crude method (presuming that what you meant to say was that at minimum some number of your desired characters must exist in the target) could be:

    • Start with any six characters
    • Identify if ANY of your target letters are in the six, if not, move to the next six characters
    • If so, determine if that match meets your minimum requirements for a successful match, store and report the find
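One way to read the steps above is as a sliding window that counts how many positions agree with the query. The sketch below follows that interpretation; the sub name window_matches and the $min_same parameter are assumptions made for this example, not part of the original suggestion.

```perl
use strict;
use warnings;

# Hypothetical helper: slide a window of length($query) over $text and
# return [position, window] for every window in which at least
# $min_same positions hold the same character as $query.
sub window_matches {
    my ($text, $query, $min_same) = @_;
    my $len = length $query;
    my @found;
    for my $pos (0 .. length($text) - $len) {
        my $window = substr $text, $pos, $len;
        # grep in scalar context counts the agreeing positions
        my $same = grep {
            substr($window, $_, 1) eq substr($query, $_, 1)
        } 0 .. $len - 1;
        push @found, [ $pos, $window ] if $same >= $min_same;
    }
    return @found;
}

# With a six-letter query, $min_same = 5 allows one differing character.
my @found = window_matches('xxabXdefxxabcdefx', 'abcdef', 5);
print "$_->[0]: $_->[1]\n" for @found;
```

Setting $min_same to length($query) - 1 corresponds to the single wildcard asked about in the original question.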

    Fortunately (or unfortunately depending upon your perspective), pattern recognition in strings is quite an involved subject. You might consider shopping around for some algorithms to either implement or search CPAN for.

    Celebrate Intellectual Diversity

      Is it not possible to combine the query string (6 letters) with '.' and use grep or =~ to do the search?