Substr warning

New Novice has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am writing a program that extracts information from websites. As the websites all use the same format, I use a loop to extract the information I am interested in. For some documents, I get these messages:

substr outside of string at extract.pl line 187
use of uninitialized value in string eq at extract.pl line 182
use of uninitialized value in concatenation (.) at extract.pl line 185
use of uninitialized value in concatenation (.) at extract.pl line 185

This message is then repeated endlessly (presumably because of the loop?). What causes the "substr outside of string" warning?

Here is the code the messages refer to; the first line is number 178:

 if ($content_sub2=~'Rapporteur') { 
     $ch=substr($', $ch_count, 1);
     my $ch_count2=0;
     while ($ch_count2<2) {
      if ($ch eq" ") { 
       $ch_count2++;
      }
      $d_ep_rap=$d_ep_rap.$ch;
      $ch_count++;
      $ch=substr($', $ch_count, 1);
     }
    }

Thank you


Replies are listed 'Best First'.
Re: Substr warning by davido (Cardinal) on Sep 24, 2004 at 14:58 UTC
After your code acts upon the first HTML file, do you ever reset `$ch_count` to zero, or does it keep getting incremented with each subsequent file? The code snippet you've provided doesn't tell us that. Also, does `$ch_count` start out as undef? The first scenario in particular could get you into trouble: if you're acting upon multiple files and `$ch_count` has grown to a value that exceeds the length of the postmatch string, you've got a problem. Dave
Re^2: Substr warning by Anonymous Monk on Sep 27, 2004 at 11:07 UTC
Hi, I always reset ch_count. I use this code structure on a number of occasions, and each time I reset ch_count.
Re: Substr warning by JediWizard (Deacon) on Sep 24, 2004 at 14:36 UTC
This is a little off topic, but I think it is worth noting anyway. You might want to consider changing your code so that it does not use $'. The variables $', $` and $& are not set by perl unless you use them somewhere in your code. If you use any of these variables for one regex, perl must supply them for all regexes (a performance hit and a memory hit). I don't see any reason that would stop you from modifying your regex so that the desired additional characters are returned in $1. This might help avoid the errors you are seeing, as well as boost the performance of your script. May the Force be with you
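A minimal sketch of that suggestion, assuming a made-up sample string in `$content` (the real page text is not shown in the thread). A trailing capture group yields the same text that $' would hold, without touching the special variable:

```perl
use strict;
use warnings;

# Hypothetical sample text; the real input is not shown in the thread.
my $content = 'Committee: LIBE Rapporteur: Jean Dupont (PSE)';

# (.*) with /s captures everything after the match, newlines included,
# which is what $' would contain after matching 'Rapporteur'.
my ($rest) = $content =~ /Rapporteur(.*)/s;

if (defined $rest) {
    print "rest: [$rest]\n";
} else {
    print "no match\n";
}
```

If the pattern does not match, `$rest` is undef, so the caller can test for that instead of reading a stale $' left over from an earlier match.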
Re: Substr warning by Happy-the-monk (Canon) on Sep 24, 2004 at 14:09 UTC
The first message means that your variable `$ch` actually is empty at that time in the loop. The second means the same. "`substr outside of string`" means you are trying to get at characters where there is no string left to get them from. Imagine you're trying to get the second character of a string that's only one character long. Cheerio, Sören
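A small sketch of where exactly substr starts to complain, using a made-up two-character string. The handler on `$SIG{__WARN__}` is only there to collect the warning text so it can be printed at the end:

```perl
use strict;
use warnings;

my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };  # collect warnings instead of printing them

my $str = 'ab';                    # length 2, so valid offsets are 0 and 1

my $inside = substr($str, 1, 1);   # 'b'
my $at_end = substr($str, 2, 1);   # offset == length: empty string, no warning
my $beyond = substr($str, 3, 1);   # offset > length: undef, plus a warning

print "inside: '$inside'\n";
print "at_end: '", (defined $at_end ? $at_end : 'undef'), "'\n";
print "beyond: ", (defined $beyond ? "'$beyond'" : 'undef'), "\n";
print "warning: $_" for @warnings;
```

Reading one position past the last character is silent and gives an empty string; only an offset beyond that returns undef and triggers "substr outside of string".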
Re^2: Substr warning by Anonymous Monk on Sep 24, 2004 at 14:16 UTC
Hi, thanks for this. There is plenty of string left; maybe there is a problem with special characters. The problem occurs at least once while trying to read in a French name (with an accent grave). Is there a way to tell perl to ignore these special characters for the moment, so that I can transform them later? Cheers!
Re^3: Substr warning by bart (Canon) on Sep 25, 2004 at 00:33 UTC
No, you're barking up the wrong tree. That warning can only mean that your starting index is bigger than the length of the string or, if the index is negative, that it reaches too far back past the start of the string, since Perl counts a negative index backwards from the end of the string. Special characters have nothing at all to do with it. As an aside: I think your approach is wrong. You're tackling this as if you're programming in C. Perl has better ways to parse strings than this extremely low-level stuff: you really should take a step back, decide what you're actually trying to achieve, and use a regex, or maybe two, to do the same job.
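One plausible way the loop in the question reaches that state (not confirmed anywhere in the thread): if fewer than two space characters follow the match, the counter never reaches 2, the offset walks past the end of the string, and every further iteration produces the three warnings quoted in the question. The sketch below is the same character-by-character idea with a length guard added; `take_until_two_spaces` and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: copy characters from $rest until two spaces have been
# seen, but stop at the end of the string instead of reading past it.
sub take_until_two_spaces {
    my ($rest) = @_;
    my ($out, $pos, $spaces) = ('', 0, 0);
    while ($spaces < 2 && $pos < length $rest) {
        my $ch = substr($rest, $pos++, 1);
        $spaces++ if $ch eq ' ';
        $out .= $ch;
    }
    return $out;
}

# Two spaces available: stops right after the second one.
print '[', take_until_two_spaces('Jean Dupont (PSE)'), "]\n";

# Only one space before the string ends: the unguarded loop would never
# terminate here, the guarded one returns what is there.
print '[', take_until_two_spaces('Jean Dupont'), "]\n";
```

A name followed by a newline or a tab rather than a plain space would trigger the same runaway in the unguarded version, which may be why only some documents are affected.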
Re: Substr warning by TrekNoid (Pilgrim) on Sep 24, 2004 at 15:21 UTC
There's not a lot to go on here, but my gut feeling is that $ch_count must be getting larger than the string you're looking at. It looks to me like you're trying to grab the first two words out of each line... maybe a different approach would work better?

 my $line = 'This is a test';
 my (@words) = split( /\s+/, $line);
 my $twowords = $words[0] . ' ' . $words[1];
 print "$twowords\n";

Trek
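A variation on the split approach above, sketched under the assumption that some lines may hold fewer than two words or start with whitespace. The helper name `first_two_words` is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return up to the first two whitespace-separated words.
sub first_two_words {
    my ($line) = @_;
    my @words = split ' ', $line;            # ' ' mode skips leading whitespace
    my $last  = $#words < 1 ? $#words : 1;   # do not index past the end
    return join ' ', @words[0 .. $last];
}

print first_two_words('This is a test'), "\n";
print first_two_words('  Dupont'), "\n";
```

Splitting on the string `' '` instead of `/\s+/` avoids an empty first field when the line begins with whitespace, and the slice bound keeps `$words[1]` from being read when it does not exist.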
Re: Substr warning by Jenda (Abbot) on Sep 24, 2004 at 14:55 UTC
Could you explain what the code is supposed to do? It seems to me it could be written in one or two lines, but I just can't get it straight. Maybe showing and explaining a bigger chunk would help. All in all, it looks to me like you are making things much more complex than they have to be. Jenda Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live. -- Rick Osborne
Re^2: Substr warning by Anonymous Monk on Sep 27, 2004 at 11:05 UTC
Hi, I am extracting information from an HTML page, which I have stripped of tags etc. beforehand. In this instance, I am interested in the two words following the string "Rapporteur: ". There is probably a better way of doing it, but at the moment I am using $' to get the text after the string I searched for and then read in one character at a time, adding it to my variable until there have been two characters equal to " " (thus, I have two words). I am confused about the substr warning, as I am sure that there are plenty of characters left in the substring. Hope this makes my dilemma clearer.
Re^3: Substr warning by Jenda (Abbot) on Sep 27, 2004 at 13:45 UTC
Yes it does. You can (and should) do this with a single regex. Like this:

 if ($html =~ /Rapporteur: (\S+ \S+)/) {
     $reporters_name = $1;
 }

The regexp searches for "Rapporteur:" followed by a single space, then some non-space characters, a single space, and again some non-space characters. It captures the two words, together with the space between them, in $1. Jenda
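A slightly more tolerant variant of that regex, sketched under the assumption that the words may be separated by any run of whitespace (newlines or tabs left over from tag stripping) and that some pages may not contain two words at all. The function name `rapporteur_name` and the sample inputs are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: return the two words after "Rapporteur:" joined by a
# single space, or undef when the pattern does not match.
sub rapporteur_name {
    my ($text) = @_;
    return $text =~ /Rapporteur:\s+(\S+)\s+(\S+)/ ? "$1 $2" : undef;
}

for my $text ("Rapporteur: Jean Dupont (PSE)",
              "Rapporteur:\n Jean\tDupont",
              "Rapporteur: Dupont") {
    my $name = rapporteur_name($text);
    print defined $name ? "found: $name\n" : "no two-word name found\n";
}
```

Because `\S` matches any non-whitespace character, accented letters in a name need no special handling here, and a page with too little text simply yields undef instead of a stream of warnings.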