http://qs1969.pair.com?node_id=238980

fauxpas has asked for the wisdom of the Perl Monks concerning the following question:

What's a quick way to read in a line (as defined by $/) with a specified maximum length? I want to read in lines of input from unauthenticated connections over the network and I'm worried that by using <FILEHANDLE> I allow someone to blow out the memory on my server. Something that's as easy to use as

while ( $line=read_a_line(FILEHANDLE,$maxlinelen) ) {}

would be nice. =)

Replies are listed 'Best First'.
Re: Read a line with max length ?
by graff (Chancellor) on Feb 27, 2003 at 04:02 UTC
    There is a detail in "perldoc perlvar" about assigning an integer value to $/, which led me to discover the following, which I think is just what you want:
    my $line = ""; my $maxlen = 10; $/ = \1; while (<>) { $line .= $_; last if ( /[\r\n]/ or length( $line ) == $maxlen ); } print $line;
    I haven't tested this thoroughly in terms of what happens with underlying input buffers, but in terms of the behavior of variables and values within the perl script, it seems to do exactly what you'd like.

    Setting $/ to \1 means the input record size is one byte; the while loop will append one character byte at a time to $line, and will terminate either when you read $maxlen bytes or when you get any sort of line termination. (This will work sensibly for all character encodings I've heard of.)
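    The loop above can be wrapped into the read_a_line() interface the original question asked for. A minimal sketch, untested against real sockets, assuming a lexical filehandle and treating any CR or LF as a line terminator:

```perl
use strict;
use warnings;

# Sketch of the read_a_line() interface from the original question,
# built on graff's byte-at-a-time trick; untested on real sockets.
# Returns one line (terminator included), truncated at $maxlen bytes,
# or undef at end of input.
sub read_a_line {
    my ($fh, $maxlen) = @_;
    local $/ = \1;                 # each <$fh> returns a single byte
    my $line = '';
    while (defined(my $byte = <$fh>)) {
        $line .= $byte;
        last if $byte =~ /[\r\n]/ or length($line) >= $maxlen;
    }
    return length($line) ? $line : undef;
}
```

    The loop from the question then becomes while (defined(my $line = read_a_line($fh, $maxlinelen))) { ... }.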

    No doubt this will raise some hackles, because byte-at-a-time reads are a lot of overhead. You could instead set $/ to $maxlen, but then, if you really expect to do line-oriented input and you go back for additional reads during a given connection, you have to make sure that any residue following a line termination is carried over to the next time you clear $line and start filling it again. One way or another, you pay extra for being careful (just believe that it ends up cheaper than being left open to hackers).

    UPDATE: Having thought about this a bit more, I think that any approach that tries to read more than one byte at a time will get into a lot of trouble, if your intention is really to do line-oriented input safely.

    The point is that, as soon as you leave behind the default value of $/ and expect some minimum number of bytes greater than one on each read, you run the risk that (a closing portion of) a line will be left stranded in the input buffer until either: (a) more stuff is written by the remote host to fill the buffer, or (b) you close the connection. This would hose your process, putting it into an indefinite wait. I bow to Elian's more informed experience on this issue -- but also second Zaxo's point about making sure to watch for multiple lines in one read. Thanks, folks!

      You should read more than one byte at a time. It's not particularly dangerous, and works just fine. A buffer size of 1500 is good, since it tends to match the maximum TCP/IP frame size. Sockets have timeouts, so worst case your program will pause waiting for the remote end, but that'll happen for a one-byte read just as often, so it's not a problem that you're avoiding.

      Note that a read on a socket for more data than is available won't stall. If you issue a read for 1500 bytes but there's only 100 available, you'll get 100 back, barring really bizarre OS bugs/features.

      This seems to do the trick, thanks. I'll be careful now and figure out how to be optimal later. ;)
Re: Read a line with max length ?
by FoxtrotUniform (Prior) on Feb 27, 2003 at 03:24 UTC
      What's a quick way to read in a line (as defined by $/) with a specified maximum length?

    Hmm. read FILEHANDLE, $line, $max_len will read up to $max_len bytes into $line (exactly $max_len unless it hits end-of-file), which:

    • Isn't the same thing on multibyte-character input, and
    • Isn't what you want.

    Nonetheless, if you're really worried, it may do the trick.
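    A quick demonstration of that caveat (an in-memory filehandle stands in for a socket here, purely for illustration): read() fills $line with a fixed-size chunk and pays no attention to where line breaks fall.

```perl
use strict;
use warnings;

# Illustrative only: read() grabs a fixed-size chunk regardless of
# line boundaries. An in-memory filehandle stands in for a socket.
open my $fh, '<', \"short line\nrest of stream" or die $!;
my $max_len = 20;
my $n = read $fh, my $line, $max_len;
# $line now holds 20 bytes spanning the newline: "short line\nrest of s"
```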

    --
    F o x t r o t U n i f o r m
    Found a typo in this node? /msg me
    The hell with paco, vote for Erudil!

Re: Read a line with max length ?
by robmueller (Novice) on Jun 18, 2009 at 08:22 UTC
Re: Read a line with max length ?
by cees (Curate) on Feb 27, 2003 at 03:22 UTC
    perldoc -f sysread

    read in your data a piece at a time using sysread, then use index to look for your line breaks so you can split the raw data into lines.
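    A sketch of that approach, combining sysread() with index() and carrying leftover bytes between calls in a caller-supplied buffer. The function name and the 1500-byte read size (Elian's suggestion elsewhere in this thread) are my own choices, not from this reply:

```perl
use strict;
use warnings;

# Sketch of the sysread-plus-index approach; names are illustrative.
# Keeps leftover bytes in $$bufref between calls. Returns one line
# (terminator stripped), or undef on EOF/error or when a line exceeds
# $maxlen bytes. A trailing unterminated line is left in $$bufref.
sub buffered_read_line {
    my ($fh, $bufref, $maxlen) = @_;
    while (1) {
        my $nl = index $$bufref, "\n";
        if ($nl >= 0) {
            return undef if $nl > $maxlen;          # over-long line
            my $line = substr $$bufref, 0, $nl;
            substr($$bufref, 0, $nl + 1) = '';      # drop line + "\n"
            $line =~ s/\r\z//;                      # tolerate CRLF
            return $line;
        }
        return undef if length($$bufref) > $maxlen; # no "\n" in sight
        my $got = sysread $fh, $$bufref, 1500, length $$bufref;
        return undef unless $got;                   # EOF or error
    }
}
```

    Call it with a per-connection buffer: my $buf = ''; while (defined(my $line = buffered_read_line($sock, \$buf, 1024))) { ... }.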

      From the Camel (3rd edition), p. 810:

      You should be prepared to handle the problems (like interrupted syscalls) that standard I/O normally handles for you. Because it bypasses standard I/O, do not mix sysread with other kinds of reads, print, printf, write, seek, tell, or eof on the same filehandle unless you are into heavy wizardry (and/or pain).

      Why sysread over read?


Re: Read a line with max length ?
by Zaxo (Archbishop) on Feb 27, 2003 at 11:43 UTC

    Be careful with local $/ = \$maxlength;. That changes what the diamond operator thinks of as a line: if the input stream contains "foo\nbar\nbaz", then each "line" read will contain as many "\n" characters as happen to fit in $maxlength bytes.

    You may want to keep the default record separator and limit length with something like this:

    while (<FILEHANDLE>) {
        $_ = substr $_, 0, $maxlength if length > $maxlength;
        # ...
    }

    Update: graff is right that this does not avoid problems with extra-long lines. Buffer overflow should not be a problem on most OSes, but forcing the machine into swap and out-of-memory conditions could be an attack.

    After Compline,
    Zaxo

      At the point where this statement executes, with $/ having any sort of string value (including the default line terminator):
      while (<SOCKETHANDLE>) { ...
      I think the potential damage will already be done, if the process at the other end of the socket happens to write, say, 4 GB of data with nothing that matches $/.

      (Then again, I could be wrong about that, 'cuz I haven't tested it... does the <> mechanism provide some sort of safe buffering or allocation method to avoid stuffing an impossible amount of data into $_? If so, this seems magical and quite unexpected.)