Re: group things?
by Joost (Canon) on Apr 25, 2008 at 22:26 UTC
|
Please correct me if my assumptions are wrong: basically, you want to find the start and end positions of runs of identical characters that aren't "-".
A straightforward way to do that is to inspect each character in sequence and compare it to the previous one. But that would be slow and annoying. It's easier to use a regex:
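#!/usr/bin/perl -w
use strict;
$_ = "-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----";
# match one word character followed by any number of repeats of it
while (/(\w)(\1*)/g) {
    # pos() is the offset just past the match, i.e. the 1-based end position
    print "$1:".(1+pos()-length "$1$2")."-".pos()."\n";
}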
|
|
Joost, your code works perfectly for me... Unfortunately, I don't seem to understand much of what you do... but thank you for your time...
|
|
As someone else already hinted, you would probably benefit from reading perlretut and pos.
It may seem like overkill for this problem, but learning regular expressions will make string processing problems like this a lot easier. Almost all serious programming languages today have a regular expression library very much like Perl's, so chances are high you'd be able to use that knowledge in some other language too.
|
|
OK, Joost's code works, now it's time for you to go to work.
The code uses only a regex and two built-in functions, pos and length, and string concatenation (the . operator). Time to apply Your Mother's suggestions about documentation access and do some reading. Then, maybe try some variations and see what happens. After that, specific questions will usually elicit pertinent answers.
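For anyone following along, here is the same idea unrolled into named steps. This is only a sketch: the runs helper and the intermediate variables are made up for illustration and are not part of Joost's post. It relies on pos returning the offset just past the last match, which happens to equal the 1-based position of the run's last character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns one "char:start-end"
# string per run of identical word characters, using 1-based positions.
sub runs {
    my ($str) = @_;
    my @out;
    while ( $str =~ /(\w)(\1*)/g ) {
        my $end   = pos($str);         # offset just past the match = 1-based end
        my $len   = length "$1$2";     # first character plus its repeats
        my $start = $end - $len + 1;   # 1-based start of the run
        push @out, "$1:$start-$end";
    }
    return @out;
}

print "$_\n" for runs("-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----");
```

A good first variation to try: print just the start for runs of length 1, so the lone I shows up as I:39 instead of I:39-39.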
Re: group things?
by pc88mxer (Vicar) on Apr 25, 2008 at 22:42 UTC
|
This is related to run length encoding. A regular expression to perform RLE is (found at http://hpaste.org/1987):
sub encode {
    (my $string = shift) =~ s/((.)\2*)/$2 . (length $1)/eg;
    $string;
}
Maybe you can modify this function to capture the beginning and ending positions of each run.
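One possible direction, as a rough sketch only: rather than substituting, walk the string with a global match and read the offsets from the built-in @- and @+ arrays. The name run_positions and the returned structure are assumptions made for illustration, and the [^-] class is swapped in for the dot so that runs of "-" are skipped.

```perl
use strict;
use warnings;

# Hypothetical variation on encode(): collect each run's character,
# 1-based start, and 1-based end instead of rewriting the string.
sub run_positions {
    my ($string) = @_;
    my @runs;
    while ( $string =~ /(([^-])\2*)/g ) {
        # $-[0] is the 0-based offset where the match starts,
        # $+[0] is the 0-based offset just past its end.
        push @runs, [ $2, $-[0] + 1, $+[0] ];
    }
    return @runs;
}

for my $run ( run_positions("-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----") ) {
    my ( $char, $start, $end ) = @$run;
    print "$char: $start-$end\n";
}
```

Returning a list of array references keeps the position finding separate from the printing, so the output format can be changed without touching the regex.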
Update: Spoiler follows...
Re: group things?
by jwkrahn (Abbot) on Apr 25, 2008 at 22:48 UTC
|
$ perl -le'
my $x = q[-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----];
while ( $x =~ /([^-])/g ) {
    my $y = $1;
    pos( $x )--;
    $x =~ /\G$y+/g and print $-[0] + 1, $-[0] + 1 == $+[0] ? "" : "-$+[0]", ":$y";
}
'
6-10:M
17-21:I
26-28:M
32-35:O
39:I
43-46:M
Re: group things?
by Anonymous Monk on Apr 25, 2008 at 22:25 UTC
|
I know how to split the string and read it letter by letter, doing something like:
$str="-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----";
@my_splitted_str=split("",$str);
My problem is: how will Perl "understand" that, for instance, letters 6-10 are M, and print 6-10:M rather than 6:M, 7:M, 8:M, 9:M, 10:M?
|
|
my $str = '-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----';
my $start = 0;    # 0-based offset of the current chunk
for ( split /(-+)/, $str ) {
    next unless length;    # WTF does split return an empty string at the start
    my $char = substr $_, 0, 1;
    if ( $char ne '-' ) {
        print $start+1,
            length > 1
                ? ( "-", $start + length )
                : "",
            ": $char\n";
    }
    $start += length;
}
Output:
6-10: M
17-21: I
26-28: M
32-35: O
39: I
43-46: M
|
|
# WTF does split return an empty string at the start
Because you can interpret "-" as being two empty strings separated by "-". IMHO the real WTF is why it removes the empty trailing string.
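A small demonstration of both behaviours, assuming nothing beyond the documented split rules: a leading separator yields a leading empty field, while trailing empty fields are dropped unless a negative LIMIT is given. The variable names are made up for the example.

```perl
use strict;
use warnings;

# A leading separator produces a leading empty field ...
my @default = split /-/, "-a-b-";
# ... while trailing empty fields are only kept with a negative LIMIT.
my @keep_all = split /-/, "-a-b-", -1;

print scalar(@default),  " fields: [", join( "][", @default ),  "]\n";
print scalar(@keep_all), " fields: [", join( "][", @keep_all ), "]\n";
```

The first line reports 3 fields and the second reports 4, the extra one being the empty string after the final "-".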
|
|
use strict;
use warnings;
my $str="-----MMMMM------IIIII----MMM---OOOO---I---MMMM-----";
my @my_splitted_str=split(//,$str);
# add a sentinel value after the other characters, so
# the end of the last range is properly detected
push @my_splitted_str, "-";
my $range = ""; # we are not initially in a range
my $start_of_range;
for my $index ( 0 .. $#my_splitted_str ) {
    # if we are in a range, see if it ends now
    if ( $range && $range ne $my_splitted_str[$index] ) {
        # the previous character was the last of this range
        my $end_of_range = $index - 1;
        if ($end_of_range == $start_of_range) {
            print $start_of_range + 1, ":", $range, "\n";
        }
        else {
            print $start_of_range + 1, "-", $end_of_range + 1, ":", $range, "\n";
        }
        $range = "";
    }
    # see if this is the start of a range
    if (! $range && $my_splitted_str[$index] ne "-") {
        $start_of_range = $index;
        $range = $my_splitted_str[$index];
    }
}
Re: group things?
by Your Mother (Archbishop) on Apr 25, 2008 at 22:04 UTC
|
Ah, so much homework, so little time.
|
|
I am trying to learn, so if you could give me some advice I would greatly appreciate it. I don't understand why you have to make fun of me...
|
|
Oh, I was just horsing around. Have you tried writing any code yet? Try looking through some of the POD for Perl functions like substr, index, split, and pos, and the page for regexes, perlre. Like so:
# update, adding one with an -f flag for clarity
perldoc -f pos
perldoc perlre
See if you can at least parse the string out a little in any way. If you're stuck, show the code you tried. A little effort is usually rewarded generously here; a question without it tends to get less attention. And as learning experiences go, someone showing you something is drastically less effective than trying to do it yourself, even when you're having trouble. You'll be much more receptive to answers after banging your head on a problem a little. We all are.