Re: Seeking the longest string in an array
by ikegami (Patriarch) on Dec 11, 2004 at 22:38 UTC
|
Update: Fixed. As Jenda observed, my original solution solved a problem other than the one presented.
O(N*log(N) + N*(M + M3) + M)
= O(N*log(N) + N*M3)
N = number of words.
M = length of words.
Don't let the M3 scare you. Benchmark them to find out which is best for you.
use strict;
use warnings;
my @data = qw/a mnk ab m b bc abcd cd bcd bd m nk /;
@data = sort { length($a) <=> length($b) } @data;
my %result;
foreach my $word (@data) {
my $uc_word = uc($word);
my $partial_word;
foreach my $start_letter (1..length($uc_word)) {
foreach my $num_letters ($start_letter..length($uc_word)-$start_
+letter+1) {
delete($result{substr($uc_word, $start_letter-1, $num_letters
+)});
}
}
$result{$uc_word} = $word;
}
print join(",", values(%result));
| [reply] [d/l] |
|
|
I don't think your code is correct. You are only removing the items that are equal to some prefix of the currently processed $word. You should delete all substrings, not only the prefixes. That would make your solution O(N*log N + N*M*M). We don't know the expected N and M, but this makes the M much more important.
Jenda
We'd like to help you learn to help yourself
Look around you, all you see are sympathetic eyes
Stroll around the grounds until you feel at home
-- P. Simon in Mrs. Robinson |
| [reply] |
|
|
Shoot, you're right. I misunderstood what his program did.
| [reply] |
Re: Seeking the longest string in an array
by Jenda (Abbot) on Dec 11, 2004 at 23:28 UTC
|
Can you give us some more details? How long are your strings on average? How many do you expect to have in the list? What characters do they contain?
This looks like you might be doing something biology related so I would not be surprised if the strings you are working for consisted of only ACGT characters or something similar.
Also there are two problems with your code. First, you should not be using the $a and $b variables. They are kinda special (used inside the block optionaly passed to sort()) which is why "use strict" did not complain about them not being declared.
Second, you are not excaping special characters in $a when making the regexp, what if the "words" contain some dots, question marks, asterisks, parenthesis, ... You should use $b =~ /\Q$a\E/i if you want to use the regexp to find the substring. There is actually no reason to do that, you'd better use the index() function. No need to escape anything and it should be quicker. Though if you really want your comparison to be case insensitive you'd have to uppercase (or lowercase) both strings before you try to find one in the other.
Last thing, finding the strings length is quick. Much quicker than trying to find a substring. You should revert your condition and only try to find short strings in longer ones. A ten chars long string cannot contain a 20 char long one ;-)
Jenda
We'd like to help you learn to help yourself
Look around you, all you see are sympathetic eyes
Stroll around the grounds until you feel at home
-- P. Simon in Mrs. Robinson |
| [reply] [d/l] |
|
|
Thanks, acutally my strings are relatively short, let's say less than 30, but the number of strings can be large (not that large, for the moment, say 10K), so only N is important, not M. It can be safely assumed that the strings don't contain those special characters.
| [reply] |
Re: Seeking the longest string in an array
by ikegami (Patriarch) on Dec 11, 2004 at 23:13 UTC
|
That's not O(N2). /$a/ involves some serious looping. Probably O(M2). So O(N2*M2) total
N = number of words.
M = length of word.
| [reply] [d/l] |
Re: Seeking the longest string in an array
by nedals (Deacon) on Dec 11, 2004 at 23:32 UTC
|
use strict;
my @data = qw/a mnk ab m b bc abcd cd bcd bd m nk/;
@data = sort {length($b) <=> length($a)} @data;
my @result = ();
while (@data) {
push @result, $data[0];
@data = grep { $data[0] !~ /\Q$_\E/ } @data;
}
print "@result\n";
Update
Changed to reflect ikegami's commets.
Please explain the /\Q...\E/
I'm not so sure about the /../i. The OP may want it to be case sensitive.
Thanks.
Update2:
mpeters,
Having read your post and jenda's above, I've incorporated 'index'. Your solution, as it stands, will not work because
index returns a position on match and the grep expression needs to return false. So I did it this way.
...
while (@data) {
push @result, $data[0];
@data = grep { index($data[0], $_) == -1 } @data;
}
...
| [reply] [d/l] [select] |
|
|
Change cmp to <=>.
Change /.../ to /\Q...\E/.
Change /.../ to /.../i.
O(N*log(N) + N2*M2). That's a worse worst case than the OP. In practice, though, yours would be faster, because the inner loop will be get much smaller than N very fast.
| [reply] [d/l] [select] |
|
|
change $data[0] =~ /\Q$_\E/ to index($data[0], $_).
Should be faster.
| [reply] [d/l] [select] |
|
|
|
|
|
|
use strict;
use Bit::Vector;
my @data = qw/a mnk ab m b bc abcd cd bcd bd m nk/;
@data = sort {length($b) <=> length($a)} @data;
@data = map [$_, uc($_), Bit::Vector->new_Enum(256, join(',', map ord(
+$_), (split //, uc($_))))->Block_Read()], @data;
my @result = ();
while (@data) {
push @result, $data[0][0];
@data = grep { ($data[0][2] & $_->[2]) ne $_->[2] or index($data[0
+][1], $_->[1]) == -1 } @data;
}
print "@result\n";
Whether and how much would this help depends on the strings. In this case I saved just 5 calls to index() from the original 18.
Jenda
We'd like to help you learn to help yourself
Look around you, all you see are sympathetic eyes
Stroll around the grounds until you feel at home
-- P. Simon in Mrs. Robinson |
| [reply] [d/l] |
Re: Seeking the longest string in an array
by dave_the_m (Monsignor) on Dec 12, 2004 at 01:07 UTC
|
my @data = qw/a mnk ab m b bc abcd cd bcd bd m nk /;
my $prev = '';
@data = grep { substr($prev, 0, length) ne $_ and ($prev = $_,1) }
reverse sort @data;
update:
Whoops, I hought it was initial substrings. Thanks ikegami for pointing out my foolishosity!
Dave.
| [reply] [d/l] |
|
|
Doesn't work. Gives
nk, mnk, cd, bd, bcd, abcd
instead of
mnk, abcd, bd
| [reply] [d/l] [select] |
Re: Seeking the longest string in an array
by Arunbear (Prior) on Dec 11, 2004 at 23:51 UTC
|
First, sort the array. Then in your outer loop, use a binary search instead of a linear scan to test if the word is a substring.
| [reply] |
Re: Seeking the longest string in an array
by TedPride (Priest) on Dec 12, 2004 at 21:40 UTC
|
The problem with these solutions is they all require nested comparisons of text. It's much better to require only one comparison per item, and also to only require searching of text that has already been validated as being unique. You do this as follows:
use strict;
use warnings;
my @data = qw/a mnk ab m b bc abcd cd bcd bd m nk/;
(my $result, @data) = sort {length($b) <=> length($a)} @data;
for (@data) { $result .= ' '.$_ if index($result, $_) == -1; }
print $result;
I think you'll find that this method easily beats any other in both processing speed and coding efficiency. | [reply] [d/l] |