Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow monks!
I keep trying to understand why I am getting an exception, although there does not seem to be any mistake. I have a script that reads a file and stores some number ranges, which I then use to restructure a string.
The initial string is something like:
PEGYNDRQAVNGSFYKLTFAPTFKVGSIGDFFSRPEIRFYTSWMDWSKKLNNYA ......................................................

Using those ranges, my script replaces the '.' characters with other characters. It seems to hit the error in this chunk:
while ($rest=~/\<REGION seq\_beg\=\"(\d+)\"\s+pdb\_beg\=\"\d+\"\s+seq\_end\=\"(\d+)\"\s+pdb\_end\=\"\d+\"\s+type\=\"(\w+)\"\/\>/mg) {
    $start=$1;
    $end=$2;
    $type=$3;
    $TM_part_to_store = "$start-$end";
    $length_part="$start-$end";
    if($type==1) {
        substr($topo_initial, ($start-1), ($end-$start+1), ($side1 x ($end-$start+1)));
    }
    elsif($type==2) {
        substr($topo_initial, ($start-1), ($end-$start+1), ($opposite{$side1} x ($end-$start+1)));
    }
    elsif($type eq 'B') {
        push @all_TMs_line, $TM_part_to_store;
    }
    else {
        substr($topo_initial, ($start-1), ($end-$start+1), ('U' x ($end-$start+1)));
    }
}

This is where I extract the information I need and change the '.' characters accordingly. For debugging, I made the script print the string length and each $start and $end, in the form string length <TAB> range, to see which substring is out of bounds.
Weirdly enough, for the string on which it fails, which is 413 characters long, the debugging output is:
413	1-6
413	7-15
413	16-34
413	35-46
413	47-53
413	54-67
413	68-83
413	84-85
413	86-95
413	96-112
413	113-118
413	119-133
413	134-142
413	143-153
413	154-160
413	161-174
413	175-181
413	182-186
413	187-193
413	194-217
413	218-224
413	225-237
413	238-244
413	245-266
413	267-273
413	274-282
413	283-290
413	291-304
413	305-311
413	312-320
413	321-328
413	329-347
413	348-355
413	356-369
substr outside of string at myscript.pl

As you can see, although my string is 413 characters long and there are more ranges after 356-369, the script exits with this error for some weird reason.
Any ideas?

Re: substr out of str error - but why?
by hv (Prior) on Jun 18, 2022 at 01:14 UTC

    First off, it would really help if you could provide an SSCCE.

    With the code provided it is far from clear what you are trying to do, or where the error is occurring.

    My first question would be: at what line of the provided code is the error actually occurring?

    My second question would be: is length($opposite{$side1}) always 1, or is it sometimes 0, or more than 1?

    My third question would be: does your real code have use strict and use warnings enabled?

    Then I would observe that the code would be easier to read if you used //x on your regexp, and if you had my $length = $end - $start + 1 and used that throughout. It would also be useful to add more instrumentation: show $type in each iteration, show the full string before and after each change, show all the parameters to substr() before each invocation.
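    A minimal sketch of those suggestions, assuming the <REGION .../> format from the question: a //x regex, one my $length used throughout, and STDERR instrumentation of $type, every substr() argument, and the string before and after each change. The sample input, $side1 = 'i' and %opposite = (i => 'o') are hypothetical. It also compares $type with eq instead of ==, since under use warnings a numeric == against 'B' warns that the argument isn't numeric.

```perl
use strict;
use warnings;

# Hypothetical sample input in the format the question's regex expects.
my $rest = join '',
    '<REGION seq_beg="1" pdb_beg="1" seq_end="3" pdb_end="3" type="1"/>',
    '<REGION seq_beg="4" pdb_beg="4" seq_end="6" pdb_end="6" type="2"/>',
    '<REGION seq_beg="2" pdb_beg="2" seq_end="5" pdb_end="5" type="B"/>',
    '<REGION seq_beg="7" pdb_beg="7" seq_end="8" pdb_end="8" type="U"/>';
my $topo_initial = '.' x 8;
my $side1        = 'i';            # hypothetical value
my %opposite     = (i => 'o');     # hypothetical mapping
my @all_TMs_line;

while ($rest =~ m{
        <REGION \s+ seq_beg="(\d+)" \s+ pdb_beg="\d+"
                \s+ seq_end="(\d+)" \s+ pdb_end="\d+"
                \s+ type="(\w+)" \s* />
    }xg) {
    my ($start, $end, $type) = ($1, $2, $3);
    my $length = $end - $start + 1;     # computed once, used throughout

    if ($type eq 'B') {                 # string comparison: no "isn't numeric" warning
        push @all_TMs_line, "$start-$end";
        next;
    }
    my $fill = $type eq '1' ? $side1
             : $type eq '2' ? $opposite{$side1}
             :                'U';

    # Instrumentation: type, every substr() argument, and the string before and after.
    print STDERR "type=$type offset=", $start - 1, " length=$length fill='$fill'\n";
    print STDERR "before: $topo_initial (", length($topo_initial), ")\n";
    substr($topo_initial, $start - 1, $length, $fill x $length);
    print STDERR "after:  $topo_initial (", length($topo_initial), ")\n";
}
```

    With this output, a "before" length that differs from the "after" length points straight at the replacement that corrupted the string.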

    I suspect that with better instrumentation the answer will be obvious to you, but if you want help from us you need to help us to help you. And that starts with a Short, Self-Contained, Correct Example.

    Hugo

Re: substr out of str error - but why?
by haukex (Archbishop) on Jun 18, 2022 at 04:09 UTC

    In addition to what hv said, especially "Use strict and warnings": do not parse XML with regular expressions (at least your input looks like XML). Since you didn't show your input (the <REGION> tags) or the rest of your code (e.g. the %opposite hash), it's really hard to guess where the problem comes from, but it might be one of those two things.

    use warnings;
    use strict;
    use XML::LibXML;

    my $xml = <<'EOT';
    <root>
      <REGION seq_beg="1" pdb_beg="2" seq_end="3" pdb_end="4" type="1"/>
      <REGION seq_beg="5" pdb_beg="6" seq_end="7" pdb_end="8" type="2"/>
      <REGION seq_beg="9" pdb_beg="10" seq_end="11" pdb_end="12" type="3"/>
    </root>
    EOT

    my $topo_initial = "ABCDEFGHIJK";
    my $side1 = "x";
    my %opposite = ( x => "?" );

    my $dom = XML::LibXML->load_xml(string => $xml);
    for my $node ($dom->findnodes('//REGION')) {
        my $offset = $node->{seq_beg} - 1;
        my $length = $node->{seq_end} - $node->{seq_beg} + 1;
        if ( $node->{type}==1 ) {
            substr($topo_initial, $offset, $length, ($side1 x $length) );
        }
        elsif ( $node->{type}==2 ) {
            substr($topo_initial, $offset, $length, ($opposite{$side1} x $length) );
        }
        else {
            substr($topo_initial, $offset, $length, ('U' x $length) );
        }
    }

    use Test::More tests=>1;
    is $topo_initial, "xxxD???HUUU";
Re: substr out of str error - but why?
by BillKSmith (Monsignor) on Jun 18, 2022 at 19:56 UTC
    Your code has processed several iterations correctly. The error message suggests that the next range starts beyond the end of the current string (4-argument substr only dies when the offset lies past the end; a range that merely ends past it is silently truncated). Are you sure that you are printing the range before you try to use it? Try printing your debug messages to STDERR. There must be either an error in your data or a special case that you are not considering. Use your editor to search for the last 'good' end, then parse the next few records by hand. If you cannot find the problem, write and post a simple program that demonstrates it with these offending records.
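    A small sketch of printing each range before it is used. STDERR is unbuffered, so every debug line appears immediately and in order relative to the error message, whereas STDOUT may be buffered when redirected (setting $| = 1 turns on autoflush for it). The ranges are hypothetical; the last one deliberately starts past the end of a 10-character string, and eval is used only so the failing range can be reported.

```perl
use strict;
use warnings;

my $topo = '.' x 10;
# Hypothetical ranges: the last one starts past the end of the 10-character string.
my @ranges = ([1, 3], [4, 6], [12, 13]);

my $last;
my $ok = eval {
    for my $r (@ranges) {
        my ($start, $end) = @$r;
        $last = "$start-$end";
        # Print the range *before* using it, to unbuffered STDERR.
        printf STDERR "%d\t%s\n", length($topo), $last;
        my $length = $end - $start + 1;
        substr($topo, $start - 1, $length, 'x' x $length);
    }
    1;
};
print STDERR "failed on range $last: $@" unless $ok;
```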
    Bill
Re: substr out of str error - but why?
by LanX (Saint) on Jun 18, 2022 at 08:59 UTC
    Like the other honorable brothers already said, you didn't provide enough information for a decent analysis.

    My guess is that your debugging only shows the initial length of the string, but not the current one.

    If one of your previous replacements resulted in a shorter string because of a bug, you'll end up out of bounds.

    See also hv's remark:

    > is length($opposite{$side1}) always 1.

    I would ask the same question about $side1.
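    A minimal sketch of how that failure mode plays out, assuming (hypothetically) that $opposite{$side1} is an empty string; an undef value behaves the same after an "uninitialized" warning. Replacing 3 characters with 0 characters shrinks the string, so a later range that fit the original length now starts past the end and 4-argument substr dies. If the value were longer than one character, the string would grow instead and later ranges would shift silently.

```perl
use strict;
use warnings;

my $topo_initial = '.' x 10;
my $side1        = 'i';
my %opposite     = (i => '');   # hypothetical bug: empty replacement character

my $length = 3;
substr($topo_initial, 0, $length, $opposite{$side1} x $length);   # 3 chars -> 0 chars
my $after_first = length $topo_initial;   # the string shrank from 10 to 7

# A later range (9-10) was valid for the original string, but offset 8 is now past the end.
my $ok = eval { substr($topo_initial, 8, 2, 'UU'); 1 };
print "length now $after_first; ", ($ok ? "ok\n" : "died: $@");
```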

    On a side note, another criticism of your code: you are escaping too many characters in your regex; neither _ nor " is a metacharacter.
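    A quick check of that point: the pattern from the question and a version without the needless escapes capture the same values from a sample tag (the tag itself is made up). Only the / delimiter needed escaping, and switching to qr{} removes even that.

```perl
use strict;
use warnings;

my $tag = '<REGION seq_beg="356" pdb_beg="1" seq_end="369" pdb_end="14" type="1"/>';

# The pattern from the question, with every escape kept.
my $escaped = qr/\<REGION seq\_beg\=\"(\d+)\"\s+pdb\_beg\=\"\d+\"\s+seq\_end\=\"(\d+)\"\s+pdb\_end\=\"\d+\"\s+type\=\"(\w+)\"\/\>/;

# Same pattern without the needless escapes: <, >, _, = and " are not metacharacters.
my $plain = qr{<REGION seq_beg="(\d+)"\s+pdb_beg="\d+"\s+seq_end="(\d+)"\s+pdb_end="\d+"\s+type="(\w+)"/>};

my @a = $tag =~ $escaped;
my @b = $tag =~ $plain;
print "@a | @b\n";
```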

    Update:

    Furthermore, you are relying on the input coordinates you parse without validating them.

    I can imagine various ways this could fail.
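    One way to validate the coordinates, sketched with a hypothetical helper check_range: it checks each 1-based, inclusive range against the *current* string before substr is called, and dies with a message naming the offending range. It is deliberately stricter than substr itself (an end past the string is rejected rather than truncated), on the assumption that such a range is a data error worth seeing.

```perl
use strict;
use warnings;

# Hypothetical helper: validate a 1-based inclusive range against the current
# string and return the number of characters it covers.
sub check_range {
    my ($str, $start, $end) = @_;
    my $len = length $str;
    die "bad range $start-$end: start < 1\n"           if $start < 1;
    die "bad range $start-$end: end < start\n"         if $end < $start;
    die "bad range $start-$end: beyond length $len\n"  if $end > $len;
    return $end - $start + 1;
}

my $topo = '.' x 413;
my $n = check_range($topo, 356, 369);
substr($topo, 355, $n, 'x' x $n);

# A range running past the end is reported instead of being silently truncated.
my $ok = eval { check_range($topo, 400, 420); 1 };
print $ok ? "ok\n" : $@;
```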

    Cheers Rolf
    (addicted to the Perl Programming Language :)