Re: Pattern Finding

Replies are listed 'Best First'.
Re: Re: Pattern Finding by demerphq (Chancellor) on Sep 17, 2001 at 03:19 UTC
Well I came up with a solution but for bizarre reasons decided to obfu it, so I posted it as Pattern Matching Obfu, you will have to change the string $S as appropriate to your requireements. Your clarification of the constraints on the problem lead to some interesting angles, some that I suspect are unintended. Most especially that irrelevent of the short pattern solution there are likely to be very many long patterns, each of which _ONLY_ match once. My algorthm, in a rather humourous fashion found the following solutions, amongst many others, that meet your critera in a very short amount of time (the \| is the seperator between sub patterns): bookhelloworldh\|ellohellohihell\|ohiworldhihello\|worldhihelloboo\|kpenbo +okpenworld bookhelloworldh\|ellohellohihell\|ohiworldhihello\|worldhihellobook\|penbo +okpenworld bookhelloworldh\|ellohellohihell\|ohiworldhihello\|worldhihellobookp\|enbo +okpenworld bookhelloworldh\|ellohellohihell\|ohiworldhihell\|oworldhihellobookp\|enbo +okpenworld bookhelloworldhe\|llohellohihell\|ohiworldhihello\|worldhihellobookp\|enbo +okpenworld bookhelloworldhel\|lohellohihell\|ohiworldhihello\|worldhihellobookp\|enbo +okpenworld bookhelloworldhel\|lohellohihello\|hiworldhihello\|worldhihellobookp\|enbo +okpenworld bookhelloworldhel\|lohellohihello\|hiworldhihello\|worldhihellobook\|penbo +okpenworld bookhelloworldhel\|lohellohihelloh\|iworldhihello\|worldhihellobook\|penbo +okpenworld bookhelloworldhell\|ohellohihelloh\|iworldhihello\|worldhihellobook\|penbo +okpenworld [download] Yves -- You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM)	[reply] [d/l]
Re: Re: Pattern Finding by tachyon (Chancellor) on Sep 24, 2001 at 21:12 UTC
Have you considered what your 'rules' imply: 3. There could be multiple answers. 4. A Pattern may show up only once. 5. A Pattern may contain single character only Every single char is by definition a pattern. So to is every combination of substrings. There will be quite a few answers. The number will be given by: l + (l-1) + (l-2) + ..... ( l - (l - 1) ) + ( l - l ) where l = length of the string. This is `(l+(l**2))/2` for each and every string under your rules. BTW my substring/dictionary and best match hack returns this: `Best Matches: 6 occurrences of hello 4 occurrences of hi 4 occurrences of world 3 occurrences of book 3 occurrences of oh 2 occurrences of low 2 occurrences of pen C:\>` [download] Your rules are not specific enough to formulate an answer. How do you define what is part of a pattern. Is this "Igohellohellohellohi" a string that contains 3 'hello' or 4 'hi' or 6 'l'..... Perhaps concluding this was the real task? cheers tachyon s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print	[reply] [d/l]
Re: Re: Pattern Finding by runrig (Abbot) on Sep 13, 2001 at 03:20 UTC
Here is something that I believe almost satisfies your requirements (just needs a bit more work which I'm not ready to do at the moment, and its not thoroughly tested). It doesn't do very well without a min and max length for each pattern, so maybe if this was wrapped in a sub which adjusted the min and max to various sizes, and evaluated the results on each pass by some heuristic, it could do fairly well with all of the requirements (and that'll have to wait 'till later): `use warnings; use strict; # Min and max length for each pattern my $min = 2; my $max = 8; # Number of patterns my $num = shift; # Generate pattern to capture words my $words = join ('', map { "(.{$min,$max})" . "(?:" . join( '\|', map("\\$_", 1..$_)) . ")" } 1..$num); $_="bookhelloworldhellohellohihellohiworldhihelloworldhihellobookpenbo +okpenworld"; if (my @pats = /^$words$/) { for my $pat (@pats) { print "[$pat]\n"; } } stan:~/tmp >./tst 5 [book] [hello] [world] [hi] [pen]` [download] Update:* Greatly simplified. Wondering if I'm doing someone's homework. Noticed that its very similar to nardo's approach, but cleaner, I think, and slightly different behavior due to the newest problem definition. Great minds think alike :)	[reply] [d/l]
Re: Re: Re: Pattern Finding by Anonymous Monk on Sep 13, 2001 at 19:15 UTC
Hi, This is one of the classic problem in AI. The problem I posted, is actaully an exercise on segmentation section of the OpenLab on http://www.a-i.com.(You will need to register) I have extended it to some other critera such as 'spaces allowed' to meet more general problems. I tried runrig's solution and it doesn't work when number of patterns is 6, for the condition that one pattern cannot be part of another pattern. I am trying to solve this problem myself also, what I am looking for is good design to begin with. Artist. (My computer doesn't keep the login for more than one page, Please let me know if you know the soltuion).	[reply]
Re: Re: Re: Re: Pattern Finding by runrig (Abbot) on Sep 13, 2001 at 21:12 UTC
for the condition that one pattern cannot be part of another pattern This is the toughest condition, and so I don't think it can be done with a regex, at least not with perl's regex engine (hope someone can prove me wrong :). At every stage of capuring a pattern, you'd have to be able to fail if the longer of the current pattern and each of all past patterns doesn't contain the other. Here's some psuedo perl regex code which, if it worked would accomplish this (hope you get the idea), but I'm using things in the wrong way, the regex engine isn't re-entrant, it uses "$1" instead of "\1" (and in a symbolic reference sort of way), etc, but I though it was interesting nonetheless. It would go right after each pattern caputure in my solution: `join('', map { "(?{(length($$i)>length($$_))$$i !~ /$$_/ \| $$_ !~ /$$i/})" } 1..$_)` [download]	[reply] [d/l]


Don't ask to ask, just ask
	PerlMonks