Two-dimensional match/regex?

osfameron has asked for the wisdom of the Perl Monks concerning the following question:

I've recently got very interested in Ascii art and animation.

I've stumbled across an interesting problem: how to search for one Ascii picture within another. e.g. does the picture:

  MM
  oo
 <
  -/     contain     /^\     ?
 /^\                  C
: C :
8===8
 |^|
 |||
-~-~
[download]

In pseudocode the best algorithm I've found is:

Data

Target list
Match list (the picture to search for)
Results list

Main sub

Escape the match list for regex special characters, e.g. .+*?()[]{}
For each line of target,
for each match found against first line of match, run submatch specifying the position of the match.
If submatch returned true, add the line and position to the results list.
Return all results.

submatch sub

Taking next lines of target & match, return false if there is no match at specified position.
Return a true value (like the position of the match) if there are no more lines left of the match.
Return the value of the *next* submatch (e.g. recurse)

This is of course just a search for a 2 dimensional literal string, but it would be as easy to search for a list of regexes. Is there any practical use for that?

Is there already a module for doing this type of match? Or can anyone suggest a better algorithm? (I think that recursing makes sense for this problem, but I'm open to suggestions!)

Cheerio!
Osfameron
http://osfameron.perlmonk.org/chickenman/

Comment on Two-dimensional match/regex? Select or Download Code

Replies are listed 'Best First'.
Re: Two-dimensional match/regex? by rob_au (Abbot) on May 22, 2002 at 10:31 UTC
I'm sure I must be missing the problem at hand to a certain extent, but have you looked at the `m` option for regular expression matches which allows for multi-line matches. For example, the regular expression match for the sample match given may look something like the following - `\/\^\\\n.+C` - to match: the character sequence `/^\`, a new-line character (`\n`), one or more characters (`.+`), and, the character, `C`. This regular expression can be used in conjunction with the multi-line match option for regular expressions as follows, similar to how one would use the case-insensitive option for regular expressions: `if ($target =~ /\/\^\\\n.+C/m) { }` [download] For further information see perlman:perlre Update - This regular expression has been tested and does work. The effectiveness of a multi-line regular expression match can be tailored within the regular expression itself - That is, the more specific which you make your regular expression, the greater the specificity of the match, but as with all targeted solutions, there is a trade-off between sensitivity and specificity.	[reply] [d/l] [select]
Re: Re: Two-dimensional match/regex? by osfameron (Hermit) on May 22, 2002 at 10:44 UTC
I think the above won't work as such, because it might find false matches (e.g. where the second line isn't aligned with the first.) But, I could find the first line and then generate a multi-line regex with the rest of the search string, using `/m` as you suggest: something like `/^.{$pos}$string1.\n^.{$pos}$string2./m` to make it search only at the right position... (Apologies for dodgy regex, not tested and probably wrong). thanks for the tip, I should have thought of that already! Cheerio! Osfameron http://osfameron.perlmonk.org/chickenman/	[reply] [d/l] [select]
Re: Re: Re: Two-dimensional match/regex? by rsteinke (Scribe) on May 23, 2002 at 05:31 UTC
You could always surround the terms which lie between the stuff you want to match with (), and check that $1, $2, etc. are the same length as the first, second, etc. line matched.	[reply]
Re: Two-dimensional match/regex? by Aristotle (Chancellor) on May 22, 2002 at 14:26 UTC
You're best of with an array of lines here, I think. Try to match the first regex against each line in succession, if it does use pos with substr and regex with a `^` anchor for the following line(s). Makeshifts last the longest.	[reply] [d/l]
Re: Two-dimensional match/regex? by educated_foo (Vicar) on May 22, 2002 at 15:59 UTC
You might be able to use the `(??{ ... })` construct to construct a regex with each line anchored at the right place. I haven't worked through the details, but it seems possible. Multi-dimensional pattern matching is actually an area of active research, and there are some interesting (and complex) algorithms out there to do it. Depending on time and level of interest, you might want to take a look on citeseer for "two dimensional pattern matching". /s Update: So, I actually tried this, and found you probably don't want to go this way: `my $pat = qr{^ (?{$pre0 = pos}) (.) (?{$x = pos() - $pre0; $pre = qr/.{$x}/; print STDERR "$pre\n +"}) aa (?{$post0 = pos}) (.) (?{$x =pos() - $post0; $post = qr/.{$x}/; print STDERR "$post +\n"}) $ ^(??{ $pre }) bb (??{ $post }) $ ^(??{ $pre }) cc (??{ $post }) $ }xm;` [download] is intended to match aa bb cc but in fact yields: perl% perl -w 2dpm.pl (?-xism:.{8}) zsh: 4283 segmentation fault (core dumped) perl% perl -v This is perl, v5.6.1 built for ppc-linux-thread-multi Not sure if this is my fault or not.	[reply] [d/l] [select]
Re: Two-dimensional match/regex? by robin (Chaplain) on May 23, 2002 at 16:24 UTC
As far as I know, the best algorithm for doing exact two-dimensional matching is still the one that was published in 1993 by Ricardo Baeza-Yates and Mireille R�gnier. You can download their paper about it. I don't know much about two-dimensional regular expressions, but this looks like a good place to start researching the subject if you're interested.	[reply]
Re: Two-dimensional match/regex? by Anonymous Monk on May 23, 2002 at 13:20 UTC
Here's a subroutine that solves the problem if you want an exact match - no regex supported. The solution can be extended to regexes, but you have to be careful, iterating with m//g over a line to "find all matches" will fail, as m//g will never find overlapping matches. `sub multi_match { my ($large, $small) = @_; my @large = split /\n/ => $large; my @small = split /\n/ => $small; my $first = $small [0]; foreach my $i (0 .. @large - @small) { my $pos = 0; LOOP: while ((my $new_pos = index ($large [$i], $first, $pos)) != -1 +) { $pos = $new_pos + 1; foreach my $j (1 .. $#small) { next LOOP unless length $large [$j] >= $new_pos + length $small [$ +j] && $small [$j] eq substr $large [$i + $j], $new_pos, length $small [$j]; } return ($i, $new_pos); } } return; }` [download]	[reply] [d/l]
Re: Two-dimensional match/regex? by abstracts (Hermit) on May 23, 2002 at 19:03 UTC
Here is my stab at the problem. Basically we pad the image with spaces to make a rectangular image. We do the same with the subimage but keeping the width of the subimage the same as the width of the image (padding with spaces). Create a regex by matching anything in space or the same character otherwise. Print the row and column of where the match occured #!/usr/bin/perl -w use strict; my $image = <<'EOF'; MM oo < -/ /^\ : C : 8===8 \|^\| \|\|\| -~-~ EOF my $subimage = <<'EOF'; /^\ C EOF matches($image, $subimage); sub matches{ my ($img, $simg) = @_; my $len = 0; my $normalize = sub { my @arr = split /\n/, shift; for(map length, @arr){ $len = $_ if $len < $_; } join "", map {$_ . ' ' x ($len - length)} @arr; }; $img = $normalize->($img); $simg = quotemeta $normalize->($simg); $simg =~ s/\\ /./g; if($img =~ /(.*?)$simg/){ my $l = length $1; my $row = int($l / $len); my $col = $l % $len; print "Matches at $row x $col\n"; } else { print "No match\n"; } } # Matches at 4 x 1 [download] The row and column start with zero. Hope this helps,,,	[reply] [d/l]
A reply falls below the community's threshold of quality. You may see it by logging in.


Problems? Is your data what you think it is?
	PerlMonks