Re: Regexp string concat
by delirium (Chaplain) on Jun 14, 2004 at 21:32 UTC
|
I think we can solve this without doing any fancy sorting or iterating. How about a simple regular expression approach using upper and lower case characters as our markers?
#!/usr/bin/perl
use strict;
use warnings;
$\ = $/;
my $string = 'eichenbaumschule';
my @query = qw(baum ums eic chu le);
$string =~ s/$_/uc $_/ieg for @query;
@query = $string =~ /([A-Z]+)/g;
print for @query;
__OUTPUT__
EIC
BAUMSCHULE
| [reply] [d/l] |
|
|
sub delirium{
my( $string, $qRef ) = @_;
$string =~ tr[A-Z][a-z];
$string =~ s/$_/uc $_/ieg for @$qRef;
return reduce{
length $a > length $b ? $a : $b
} split '[a-z]+',;
}
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
| [reply] [d/l] [select] |
|
|
An optimisation I've thought of for this, even though the thread is long dead, is to lc the original string, and have all the strings to match as upper case, then all you need do is:
$string =~ s/$_/$_/i for @$qRef;
I'm sure the regex is considerably faster for having taken the executable bit out of it.
| [reply] [d/l] |
|
|
For a slight speed boost, use compiled regular expressions:
my $string = 'eichenbaumschule';
my @query = qw(baum ums eic chu le);
@query = map { qr/($_)/i } @query;
$string =~ s/$_/\U$1/g for @query;
(I get ~6000 iterations/sec this way vs. ~3000 your way).
Update: Just realized, this is only a factor if you do the same substitutions multiple times in the same program...if you're only doing this once per run, then you may as well do it delirium's way :-) | [reply] [d/l] |
Re: Regexp string concat
by CombatSquirrel (Hermit) on Jun 14, 2004 at 17:37 UTC
|
I hacked this piece of code together. It may work for you or it may not. I think it is pretty straightforward, except for the two-level Schwartzian transform-like anonymous array nesting (Data::Dumper should help, though)...
#!perl
use strict;
use warnings;
my $string = "EICHENBAUMSCHULE";
my @query = qw/EIC BAUM UMS CHU LE/;
@query = sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] }
grep { $_->[1] > -1 }
map { [$_, index($string, $_), index($string, $_) + length($
+_) - 1] }
@query;
my @matches;
@query
or die "No matches";
my $i = -1;
my $p = -2;
for (@query) {
if ($p + 1 < $_->[1]) {
++$i;
}
push @{$matches[$i]}, $_;
$p = $_->[2];
}
@matches = sort { $b->[0] <=> $a->[0] }
map { [$_->[-1]->[2] - $_->[0]->[1] + 1, $_] }
@matches;
print "Longest continuous match has length $matches[0]->[0]\n\n";
print "$string\n";
for (@{$matches[0]->[1]}) {
print ' ' x $_->[1] . $_->[0] . "\n";
}
Hope this helped.
CombatSquirrel.
Entropy is the tendency of everything going to hell.
| [reply] [d/l] |
Re: Regexp string concat
by dragonchild (Archbishop) on Jun 14, 2004 at 17:16 UTC
|
First - doesn't BioPerl have a few functions that will do this?
As for code ... you still don't have all the requirements laid out yet. Specifically:
- Do the matches have to be in the order specified? For example, what should @new_query contain if $string = 'EICHENLEBAUMSCHU'?
- What happens if something in @query isn't there? For example, $string = 'XXX'.
- What happens if there is more than one match? For example, $string = 'EICBAUMEIC'.
- What happens if something isn't there and there's two matches on something else?
- What happens if one element in @query fits into another element? For example, @query = qw( BAUM AU ).
- What happens if there are two possible solutions? For example, $string = 'BAABAB' and @query = qw( AA BA ).
I'm sure there are other possibilities that I didn't think of in the first five minutes of looking at the problem. Remember - if you can't explain the problem to a teddy bear, you can't explain the problem to a computer.
------
We are the carpenters and bricklayers of the Information Age.
Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose
I shouldn't have to say this, but any code, unless otherwise stated, is untested
| [reply] |
|
|
- First - doesn't BioPerl have a few functions that will do this?
- I don't think so.
As for code ... you still don't have all the requirements laid out yet. Specifically:
- * Do the matches have to be in the order specified? For example, what should @new_query contain if $string = 'EICHENLEBAUMSCHU'?
- @new_query = qw( EIC, LEBAUMSCHU);
- * What happens if something in @query isn't there? For example, $string = 'XXX'.
- @new_query = @query
- * What happens if there is more than one match? For example, $string = 'EICBAUMEIC'.
- will not be the case
- * What happens if something isn't there and there's two matches on something else?
- remove double
-
- * What happens if one element in @query fits into another element? For example, @query = qw( BAUM AU ).
- @new_query = qw(BAUM)
- * What happens if there are two possible solutions? For example, $string = 'BAABAB' and @query = qw( AA BA ).
- will not be the case
I'm sure there are other possibilities that I didn't think of in the first five minutes of looking at the problem. Remember - if you can't explain the problem to a teddy bear, you can't explain the problem to a computer.
>thanks for good comments
Murcia
| [reply] |
|
|
Heh. Your requirements seemingly contract each other. Specifically, you state that @new_query = @query if a value in @query isn't matched. However, you say that if something isn't matched and something else is matched twice, remove the double. But, you say that there can never be a double match.
In other words, it doesn't sound like you can explain the requirements to a teddybear. I would spend some time with pencil and paper before going much further. Then, when you have pencil and paper down, convert those into test cases before writing any code.
------
We are the carpenters and bricklayers of the Information Age.
Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose
I shouldn't have to say this, but any code, unless otherwise stated, is untested
| [reply] |
Re: Regexp string concat
by BrowserUk (Patriarch) on Jun 14, 2004 at 18:13 UTC
|
Sorry for the false start, hopefully this does the job.
#! perl -slw
use strict;
use List::Util qw[ reduce ];
sub longestComposite{
my( $string, $qRef ) = @_;
my $copy = ' ' x length $string;
$string =~ s[($_)]{
my $len = length $1;
substr( $copy, pos( $string ), $len ) = $1;
$1;
}ge for @$qRef;
return reduce{
length $a > length $b ? $a : $b
} split ' ', $copy;
}
my $string ="EICHENBAUMSCHULE";
my @query = qw( EIC BAUM UMS CHU LE );
print longestComposite( $string, \@query );
__END__
P:\test>366569
BAUMSCHULE
No promises about efficiency if you're doing this on large volumes of data.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
| [reply] [d/l] |
Re: Regexp string concat
by BrowserUk (Patriarch) on Jun 14, 2004 at 17:00 UTC
|
This doesn't handle any error conditions, like if the query contains non-matches, but it may give you a starting point.
#! perl -slw
use strict;
sub display{
my $string = shift;
return $string . $/,
map{ ' ' x index( $string, $_ ) . "$_\n" } @_;
}
my $string ="EICHENBAUMSCHULE";
my @query = qw( EIC BAUM UMS CHU LE );
print display( $string, @query );
__END__
P:\test>366569
EICHENBAUMSCHULE
EIC
BAUM
UMS
CHU
LE
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
| [reply] [d/l] |
|
|
but this does not give me the longest concatenated string
"BAUMSCHULE"
| [reply] |
Re: Regexp string concat
by davidj (Priest) on Jun 14, 2004 at 16:53 UTC
|
questions regarding the resulting query_string:
do the overlapping and/or touching matches have to be in the same order in which they are indicated in the @query array. That is, given:
my $string ="EICHENBAUMSCHULE";
my @query = qw(EIC BAUM UMS CHU LE);
would the following also match
<BR>
BAUMSLECHU # the CHU and LE are not in the same order as indicated in
+ @query
davidj | [reply] [d/l] [select] |
Re: Regexp string concat
by Happy-the-monk (Canon) on Jun 14, 2004 at 15:15 UTC
|
(please fix that missing closing </pre>-tag in your node. Thank you.)
Sounds like homework to me. I don't think having your homework done by the community is the right way to get through school - or life for that matter.
Tell us your thoughts on that matter and tell us your difficulties or problems, how you tried to solve them, where your insecurities are: we will help you along then.
Cheers, Sören | [reply] |
|
|
Sören,
thanks about your comment ...
I found a more or less working solution, but I do not like the code I wrote ...
and do not want to annoy with it
so I start just giving the task
The real task is coming from a Genome project ...
Displaying peptides in a protein sequence ...
| [reply] |
Re: Regexp string concat
by Jasper (Chaplain) on Jun 15, 2004 at 14:41 UTC
|
Bit-wise oring is your friend. Although the answer comes out lower-cased :( - unless I change those pesky spaces to '@'s - which I just did.
my $string ="EICHENBAUMSCHULE";
my @query = qw(EIC BAUM UMS CHU LE);
my $matchline = '@' x length $string;
for $match(@query) {
my $spacer = index $string, $match;
$matchline |= (('@' x $spacer) . $match) if $spacer > -1;
}
print for sort { length($b) <=> length($a) } $matchline =~ /[^@]+/g;
| [reply] [d/l] |