PerlMonks  

Re: Strange regex behavior

by Tanktalus (Canon)
on Aug 14, 2005 at 21:46 UTC ( [id://483730]=note )


in reply to Strange regex behavior - beware chunk boundaries!

    $ perl -e '$/=undef;$t=<>;foreach(3815,3975,3871){$i=index($t,"ID=$_");printf"ID=$_:%d(chunk=%d,offset=%d)\n",$i,int($i/1024),$i%1024}' AlphabeticalListing.asp
    ID=3815:103836(chunk=101,offset=412)
    ID=3975:104688(chunk=102,offset=240)
    ID=3871:105271(chunk=102,offset=823)

OK, perhaps I could have used some spaces in that one-liner, but I was having too much fun this way. Oddly, 3975 is the first match in its chunk (chunk 102, offset 240), and 3871 comes later in the same chunk (offset 823), so I would have expected 3871 to be the one that got missed.
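For anyone who does want the spaces, here is a rough, spelled-out sketch of the same arithmetic. The helper name `locate_ids` and the synthetic input are my own inventions for illustration, not part of the original one-liner; the only thing taken from it is the calculation itself: `index` for the byte position, then integer division and modulus by the chunk size.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): report where each ID
# first appears in $text and which fixed-size chunk that position falls in.
sub locate_ids {
    my ($text, $chunk_size, @ids) = @_;
    my @report;
    for my $id (@ids) {
        my $pos = index($text, "ID=$id");    # -1 if the ID is absent
        next if $pos < 0;
        push @report, {
            id     => $id,
            pos    => $pos,
            chunk  => int($pos / $chunk_size),   # zero-based chunk number
            offset => $pos % $chunk_size,        # position inside that chunk
        };
    }
    return @report;
}

# Synthetic input: put "ID=3975" at byte 104688, the position reported above.
my $text = ('x' x 104688) . 'ID=3975';
for my $r (locate_ids($text, 1024, 3975)) {
    printf "ID=%d:%d(chunk=%d,offset=%d)\n", @{$r}{qw(id pos chunk offset)};
}
```

With that synthetic input the chunk and offset come out as 102 and 240, matching the line for 3975 in the output above.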

You should try this as your loop:

    while (1) {
        my $buf;
        my $n = $http->read_entity_body($buf, 1024);
        die "read failed: $!" unless defined $n;
        last unless $n;                       # 0 bytes read: end of body
        push @listing, $buf =~ /ID=(\d+)/g;   # list context + /g: every ID in this chunk
    }

Note that a problem can still occur when the literal string "ID=xxxx" straddles a chunk boundary: say, "ID=3" at the end of one 1024-byte chunk and "975" at the beginning of the next. It's probably easiest to slurp the whole thing in and then do a single global match.
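To make the boundary case concrete, here is a small self-contained sketch. Everything in it is an assumption for illustration: `make_reader` is a hypothetical in-memory stand-in that only mimics the `read($buf, $size)` calling shape used in the loop above, the chunk size is shrunk to 16 bytes, and the sample body is made up so that "ID=3975" is split as "ID=3" / "975". It compares matching chunk by chunk against accumulating the whole body first.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the HTTP body: a chunk reader over an in-memory
# string, so the sketch runs without a network connection.
sub make_reader {
    my ($body) = @_;
    my $pos = 0;
    return sub {
        my $size = $_[1];
        $_[0] = substr($body, $pos, $size);   # fill the caller's buffer
        $pos += length $_[0];
        return length $_[0];                  # 0 signals end of data
    };
}

# Match inside each chunk separately: an ID split across two chunks is
# never seen whole.
sub ids_per_chunk {
    my ($read, $size) = @_;
    my @ids;
    while (1) {
        my $buf;
        my $n = $read->($buf, $size);
        last unless $n;
        push @ids, $buf =~ /ID=(\d+)/g;
    }
    return @ids;
}

# Accumulate the whole body first, then run a single global match.
sub ids_slurped {
    my ($read, $size) = @_;
    my $body = '';
    while (1) {
        my $buf;
        my $n = $read->($buf, $size);
        last unless $n;
        $body .= $buf;
    }
    my @ids = $body =~ /ID=(\d+)/g;
    return @ids;
}

# "ID=3" ends the first 16-byte chunk; "975" starts the second.
my $body = 'ID=3815' . ('.' x 5) . 'ID=3975 ID=3871';
print "per chunk: @{[ ids_per_chunk(make_reader($body), 16) ]}\n";
print "slurped:   @{[ ids_slurped(make_reader($body), 16) ]}\n";
```

On this sample the per-chunk version reports 3815, 3, 3871 (the split ID shows up truncated as "3"), while the slurped version reports 3815, 3975, 3871. If the body is too large to hold in memory, carrying the unmatched tail of each chunk over to the next read would be another option, but slurping is the simpler fix.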
