sarani has asked for the wisdom of the Perl Monks concerning the following question:

The code works.

BUT I've been trying to find out why I'm getting warnings of "substr outside of string" from my code for about three hours now, so I'd be really grateful if someone could tell me what I'm doing wrong.

In this section of code, I extract two cartesian coordinates from two different files, save it to two arrays, and compute the distance between them for each pair of coordinates. The format of the two files is such that the coordinate values I need are bang in the middle of the file. Wait, here is the sample data:

HETATM 1562 O HOH 189 48.728 50.544 -16.104
HETATM 1563 O HOH 190 56.288 63.252 16.442

I'm new to perl and I'm used to use C, so I used a nested foreach loop.

I've read the manual, and I've hit search on both google and the Supersearch here. (I haven't waded through the entire mass in that, but I spent about half an hour trying).

The relevant bit of code:

foreach (@hlines) #each line from file1 was saved to this array. { $xh= substr $_,31,8; $yh= substr $_,40,8; $zh= substr $_,48,8; foreach $pline (@plines) #ditto, each line from file2 saved to this array { $x1= substr $pline,31,8; #using $_ here instead of $pline aborts execu +tion due to compile errors. $y1= substr $pline,40,8; $z1= substr $pline,48,8; $r2=((&square($xh-$x1))+(&square($yh-$y1))+(&square($zh-$z1))); $r=sqrt($r2); print "$i. $r2 and square root is $r\n"; } }

The warning shows up for the inner loop, and I'm using strict, subs, and warnings. This is perl v5.8.0 and I'm on a Red Hat 9.0 Linux system.

I don't understand why I'm getting the warning for the inner but not the outer loop.
The manual says that the warning means that I 'looking' outside the string for the substring. All those three substr in the inner loop have as many characters on either side of them as the outer loop substr. 56 char in a line of input, btw.

What am I missing here?

Replies are listed 'Best First'.
Re: substr in nested foreach loop
by roboticus (Chancellor) on May 22, 2006 at 12:04 UTC
    sarani:

    You didn't show enough code & data to be able to say for sure. But one thing occurs to me: Perhaps you've got blank lines in @plines, so you're asking for characters after the end? Try inserting the following line as the first line in your inner loop to check:

    print '"',$pline,'": len=',length($pline),"\n";

    --roboticus

      roboticus: Thanks :D
      There are no blank lines anywhere in the data file. The lenght reads as 57, and the last substr asks for 8 chars after the offset,48 - that's 56. So no char after the end, unless my basic math is wrong.(Right?) If you'd like to see the entire code, I can put it up...

      UPDATEIt's up on my scratchpad.

      #! /root/bin/perl use strict; use warnings; use subs; sub square; sub read; sub dist; my $r; my $r2; my $i=0; my $xh; my $yh; my $zh; my $f1; my $FILE1; my $hline; my @hlines; $f1=&read; my $x1; my $y1; my $z1; my $f2; my $FILE2; my $pline; my @plines; $f2=&read; open (FILE1, "$f1") or die "Argh! File not opened! $!"; open (FILE2, "$f2") or die "Argh! Second file not opened! $!"; while (<FILE1>) { $hline=<FILE1>; push (@hlines,$hline); } + while (<FILE2>) { $pline=<FILE2>; push (@plines,$pline); } close FILE2; close FILE1; print @hlines; print @plines; &dist; exit; sub dist { foreach $hline (@hlines) { $xh= substr $hline,31,8; $yh= substr $hline,40,8; $zh= substr $hline,48,8; foreach $pline (@plines) { print '"',$pline,'": len=',length($pline),"\n"; $x1= substr $pline,31,8; $y1= substr $pline,40,8; $z1= substr $pline,48,8; $i++; $r2=((&square($xh-$x1))+(&square($yh-$y1))+(&square($zh-$z +1))); $r=sqrt($r2); print "$i. $r2 and square root is $r\n"; if(($r<=2.4)&&($r>=3.6)) { print "match!"} } } print $i; return 1; } sub square { my $num; my $sq; $num=shift; $sq = $num * $num; return $sq; } sub read { print "Enter file name:"; $a=<STDIN>; chomp($a); return $a; }

        Not directly related to your issue, but some comments anyway, in no particular order:

        while (<FILE1>) { $hline=<FILE1>; push (@hlines,$hline); }

        is the same as

        @hlines = <FILE1>;

        Also,

        my $FILE1;

        probably doesn't do what you think. $FILE1 is not the same as the FILE1 you use in your open statement. Either change all places where you use a bare FILE1 to $FILE1 (which would make it a lexical filehandle), or just remove the my $FILE1; declaration.

        In your open call, you don't have to put quotes around the filename. It is however a good idea to use the three-argument form of open, in case someone tries to use something like >datafile as filename and causes you to overwrite the data file that way. So instead of

        open (FILE1, "$f1") or die "Argh! File not opened! $!";

        use something like

        open (FILE1, "<", $f1) or die "Argh! File not opened! $!";

        In your read subroutine, you're using the variable $a without declaring it. With almost any other name this would have caused a compile time error, but $a and $b are special. For that reason, it's best to avoid using either of those names.

        $xh, $yh, $zh, $x1, $y1, $z1, $i, $r and $r2 are only used within your dist subroutine, so you should only declare them in there, and not at the global level

        read is also the name of a built-in function, so it's usually better to avoid using it for any of your own functions unless you explicitely want to override it

        Your shebang line is showing /root/bin/perl as the path to perl. Since /root is usually only accessible by root, this means that you're developing and testing this code as root, which is generally not recommended.

        The use subs; is useless here as far as I can tell.

        Since the two file reads are really unrelated, you can reuse the same filehandle for both of them

        Making the changes above, the code looks something like this:

        #!/root/bin/perl use strict; use warnings; sub square; sub read_filename; sub dist; my $FILE; my $f1 = read_filename; open ($FILE, "<", $f1) or die "Argh! File not opened! $!"; my @hlines = <$FILE>; close $FILE; my $f2 = read_filename; open ($FILE, "<", $f2) or die "Argh! Second file not opened! $!"; my @plines = <$FILE>; close $FILE; print @hlines; print @plines; dist; exit; sub dist { my $i=0; my ($r, $r2); foreach my $hline (@hlines) { my $xh = substr $hline,31,8; my $yh = substr $hline,40,8; my $zh = substr $hline,48,8; foreach my $pline (@plines) { print '"',$pline,'": len=',length($pline),"\n"; my $x1 = substr $pline,31,8; my $y1 = substr $pline,40,8; my $z1 = substr $pline,48,8; $i++; $r2=((square($xh-$x1))+(square($yh-$y1))+(square($zh-$z1)) +); $r=sqrt($r2); print "$i. $r2 and square root is $r\n"; if(($r<=2.4)&&($r>=3.6)) { print "match!"} } } print $i; return 1; } sub square { my $num = shift; my $sq = $num * $num; return $sq; } sub read_filename { print "Enter file name: "; my $fn = <STDIN>; chomp $fn; return $fn; }

        This code can stil be further improved, but it should be a nice step in the right direction.

        Well... I don't know what the problem may be, but maybe you should add a
        print length($hline),"\n";
        before the substr that's dying, just to be sure, maybe even print the line.

        But the main reason I write is to suggest using one unpack instead of three substrs... syntax would be (approximately, not tested)

        ($xh,$yh,$zh) = unpack("x31A8x1A8A8",$hline);
        Pretty sure that would be more efficient.

                        - Ant
                        - Some of my best work - (1 2 3)

Re: substr in nested foreach loop
by Jasper (Chaplain) on May 22, 2006 at 17:09 UTC
    The problem is that this line :

    HETATM 1562 O HOH 189 48.728 50.544 -16.104

    is only 43 characters long, and you're doing $zh= substr $_,48,8; on it. That takes the 48th through to the 55th characters and sticks them in a string. Read the error message with this in mind, and see if you can work it out :)
      There are whitespaces on either side, and length gives 57. The file is formatted such that except for the last line, each line has that many chars. :) That's the cause of my confusion.
        Yeah but no but yeah.

        If you stick a warning before each substr operation giving the length of the string and which characters it's trying to find, I bet you'll trace the problem pretty quickly.
Re: substr in nested foreach loop
by Gilimanjaro (Hermit) on May 22, 2006 at 15:19 UTC

    How often are you getting the warning?

    Beware, that if that last input line of your file ends in a newline, you'll probably get an empty string ('') as your last line from perl...

      I'm getting the warning for the last input line from the file, and I'm getting it the number of lines there are in the outer loop file ... :D That might be it...
        Glad to be of help... ;-)
Re: substr in nested foreach loop
by rminner (Chaplain) on May 22, 2006 at 15:23 UTC
    #HETATM 1562 O HOH 189 48.728 50.544 -16.104 my $num = qr{-?\d+\.\d+}; my $coordinate = qr{\s+($num)\s+($num)\s+($num)\s*$}; sub get_coordinates_for { my $filename = shift; my @coordinates = (); if (open $FH, $filename) { my $linecounter = 0; while (my $line = <$FH>) { $linecounter++; if ($line =~ m{$coordinate}) { my ($x , $y , $z) = ($1 , $2 , $3); push @coordinates , [$x , $y , $z]; } else { warn "[$filename : $linecounter] invalid lineformat!\n +"; } } close $FH; } else { die "failed to open $filename for reading($!)\n"; } return \@coordinates; } my $file1 = shift @ARGV; my $file2 = shift @ARGV; my $coordinates1_ref = get_coordinates_for($file1); my $coordinates2_ref = get_coordinates_for($file2); # calculate the results: foreach my $coordinate1 ( @$coordinates1_ref ) { my ($x , $y , $z) = @$coordinate1; foreach my $coordinate2 (@$coordinates2_ref) { my ($a , $b , $c) = @$coordinate2; my $distance = sqrt( ($x-$a)**2 + ($y-$b)**2 + ($z-$c)**2 ); print "distance ($x,$y,$z) to ($a,$b,$c) = $distance\n"; } }
      A nice way to parse the coordinates is using Regexp::Common.
      use Regexp::Common; ... my $num = $RE{num}{real}; my ($x, $y, $z) = ($1, $2, $3) if $line =~ /($num) ($num) ($num)\s*$/;
        :D thank you.
        Isn't the behavior of "my" with an "if" modifier unspecified? I think
        my ($x, $y, $z) = ($line =~ /($num) ($num) ($num)\s*$/);
        would do what you want here. If it doesn't match all three variables will be undefined.
      Oh wow, that is as smooth as silk.. thank you muchly!