Find the boundaries of a substring in a string

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Find the boundaries of a substring in a string by Corion (Patriarch) on Jun 27, 2023 at 08:43 UTC
If you want the indices of a match, take a look at `@+` and `@-` in perlvar.
Re^2: Find the boundaries of a substring in a string by Anonymous Monk on Jun 27, 2023 at 08:56 UTC
Can you help me understand how to use them in my example please?
Re^3: Find the boundaries of a substring in a string by AnomalousMonk (Archbishop) on Jun 27, 2023 at 09:01 UTC
```
Win8 Strawberry 5.8.9.5 (32) Tue 06/27/2023 4:54:38
C:\@Work\Perl\monks
>perl
use strict;
use warnings;

use Data::Dump qw(dd);

my $seq = 'ddBddddBBBBBBBBBBBBDDDDDDDDBBBBBBBBBBBBBBddddddddddBBBBBBBBBBBBBBDDBBBBBBBBddddddddd';

my $rx_sub_str = qr{ B+ }xms;

my @endpoints;
while ($seq =~ / $rx_sub_str /xmsg) {
    push @endpoints, [ $-[0], $+[0] ];
}

dd \@endpoints;
^Z
[[2, 3], [7, 19], [27, 41], [51, 65], [67, 75]]
```

Use of index might be faster than a regex approach for large data.

Give a man a fish: `<%-{-{-{-<`
Re: Find the boundaries of a substring in a string by hippo (Archbishop) on Jun 27, 2023 at 09:19 UTC
TIMTOWTDI. Using your code as a starting point, I would use pos to find the end point of the sequence. Here is a runnable example.

```
use strict;
use warnings;

my $seq = 'dddddddddBBBBBBBBBBBBDDDDDDDDBBBBBBBBBBBBBBddddddddddddddddddddBBBBBBBBBBBBBBDDBBBBBBBBddddddddddddd';

while ($seq =~ /(B+)/g) {
    my $seg        = $1;
    my $seg_length = length ($seg);
    my $seg_end    = pos ($seq);
    my $seg_start  = $seg_end - $seg_length;
    print $seg. "\|" . $seg_start . "-" . $seg_end . "\n";
}
```

🦛
Re^2: Find the boundaries of a substring in a string by Anonymous Monk on Jun 27, 2023 at 09:24 UTC
Thank you both! I noticed in both snippets that the end boundaries are off by 1, or am I counting wrong?
Re^3: Find the boundaries of a substring in a string by hippo (Archbishop) on Jun 27, 2023 at 09:27 UTC
Are you counting from 1 or zero? If from 1 then it should probably be this (to give the first match at 10 to 21 inclusive):

```
use strict;
use warnings;

my $seq = 'dddddddddBBBBBBBBBBBBDDDDDDDDBBBBBBBBBBBBBBddddddddddddddddddddBBBBBBBBBBBBBBDDBBBBBBBBddddddddddddd';

while ($seq =~ /(B+)/g) {
    my $seg        = $1;
    my $seg_length = length ($seg);
    my $seg_end    = pos ($seq);
    my $seg_start  = $seg_end - $seg_length + 1;
    print $seg. "\|" . $seg_start . "-" . $seg_end . "\n";
}
```

To say what you expect the answer to be when posting, have a read of How to ask better questions using Test::More and sample data.

🦛
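Whether an end boundary looks off by one depends on the convention in use. `$-[0]`, `$+[0]` and `pos` describe a zero-based, half-open range: the end offset is the position just past the last matched character. A minimal sketch, assuming a short made-up sample string and a hypothetical helper named `b_runs`, that prints both conventions side by side and uses `substr` to recover each match from its offsets:

```perl
use strict;
use warnings;

# Hypothetical helper: returns one [start, end] pair per run of "B".
# Offsets are zero-based and half-open: end is one past the last "B".
sub b_runs {
    my ($seq) = @_;
    my @runs;
    while ($seq =~ /B+/g) {
        push @runs, [ $-[0], $+[0] ];
    }
    return @runs;
}

my $seq = 'ddBddddBBBdd';    # made-up sample data, not the original input

for my $run (b_runs($seq)) {
    my ($start, $end) = @$run;

    # With half-open offsets the length is simply end minus start.
    my $match = substr $seq, $start, $end - $start;

    # One-based inclusive: shift the start by one, keep the end as is.
    printf "%-4s 0-based [%d, %d)  1-based %d..%d\n",
        $match, $start, $end, $start + 1, $end;
}
```

Under the half-open convention `$end - $start` is the match length, which is why hippo's first snippet can derive the start as `pos` minus `length`.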
Re^4: Find the boundaries of a substring in a string by Anonymous Monk on Jun 27, 2023 at 09:31 UTC
Re^5: Find the boundaries of a substring in a string by hippo (Archbishop) on Jun 27, 2023 at 09:36 UTC
Re^5: Find the boundaries of a substring in a string by Anonymous Monk on Jun 27, 2023 at 09:35 UTC
Re: Find the boundaries of a substring in a string by kcott (Archbishop) on Jun 27, 2023 at 09:44 UTC
Take a look at pos.

```
$ perl -e '
    my $str = "BaBBaBBBaBBBB";
    # start pos: 0 2 5 9
    # end pos:   0 3 7 12
    my $fmt = "%d -> %2d\n";
    while ($str =~ /(B+)/g) {
        my $len = length $1;
        my $pos = pos $str;
        printf $fmt, $pos-$len, $pos-1;
    }
'
0 ->  0
2 ->  3
5 ->  7
9 -> 12
```

Edit: `s/look as pos/look at pos/` Thanks LanX.

— Ken
Re: Find the boundaries of a substring in a string by harangzsolt33 (Deacon) on Jun 28, 2023 at 14:53 UTC
My example will give you the start and end positions without using regex. I did some testing, and it appears that this solution runs much slower than the regex solution. But it's just to show that there is more than one way to do it:

```
#!/usr/bin/perl
use strict;
use warnings;

my @STARTPOS = ();
my @ENDPOS = ();
my $str = "BaaaaBBBBBBaaaaaaaBBBBBBBBBBBBBBBBBBBBaaaBBBBBBBBBBBxB";

print "$str\n\n";
my $i = 0;
while (($i = index($str, 'B', $i)) >= 0)
{
  push(@STARTPOS, $i++);
  # Skip over the rest of the run; 66 is ord('B').
  while (vec($str, $i++, 8) == 66) {}
  # $i-1 is the offset just past the last 'B' of the run.
  push(@ENDPOS, $i-1);
}

# Display results:
for (my $i = 0; $i < @STARTPOS; $i++)
{
  print "\nstart: ", $STARTPOS[$i], "\t-> end: ", $ENDPOS[$i];
}
```
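The speed claims in this thread can be checked with the core Benchmark module. A rough sketch, assuming two hypothetical wrappers (`runs_regex` and `runs_index`) around the approaches posted in this thread and an arbitrary test string. Timings depend on the machine, the Perl build and the data, so no results are shown here:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Regex approach: half-open [start, end) offsets taken from @- and @+.
sub runs_regex {
    my ($str) = @_;
    my @runs;
    push @runs, [ $-[0], $+[0] ] while $str =~ /B+/g;
    return \@runs;
}

# index/vec approach, following the loop in the post above.
sub runs_index {
    my ($str) = @_;
    my @runs;
    my $i = 0;
    while (($i = index($str, 'B', $i)) >= 0) {
        my $start = $i++;
        while (vec($str, $i++, 8) == 66) {}    # 66 is ord('B')
        push @runs, [ $start, $i - 1 ];
    }
    return \@runs;
}

my $str = 'BaaaaBBBBBBaaaaaaaBBBBxB' x 1000;    # arbitrary test data

# Both approaches should agree before their speed is compared.
my $same = join(',', map { "@$_" } @{ runs_regex($str) })
        eq join(',', map { "@$_" } @{ runs_index($str) });
print "results agree: ", ($same ? "yes" : "no"), "\n";

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    regex => sub { runs_regex($str) },
    index => sub { runs_index($str) },
});
```

Changing the multiplier or the mix of characters in `$str` is an easy way to see whether the ranking holds for other inputs.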
Re^2: Find the boundaries of a substring in a string by Anonymous Monk on Jun 28, 2023 at 18:22 UTC
So a worse way, coding perl like C, defeating the purpose, but with much worse performance. Awesome