Re: tags being broken in the wrong places

Writing reliable code like this can be annoyingly tricky. Assuming that you cannot use, say, an HTML or XML parser (like XML::Simple), you might “reasonably assume” that no tag will exceed, say, 512 characters. You then write code that maintains a string buffer of that length (arbitrary, but comfortably longer than 256...). By examining the first character in the buffer, you decide whether the next thing in it is “a tag” (it begins with “<”) or “not a tag.” You write out either the appropriate chunk of non-tag text (everything up to the next “<”) or the complete tag. Then you remove the characters you have just written out from the buffer, read enough new characters (if there are any...) to refill it to 512 characters, and repeat. (If you encounter a tag that exceeds the size of your buffer, well, you’re screwed, but “bits are cheap.”)
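Here is a minimal sketch of that fixed-buffer loop, written in Python for brevity. The names `tokens` and `BUF_SIZE` are made up for illustration, and it assumes a tag is simply everything from a “<” to the next “>”. Because the buffer is topped back up to full size after every chunk, any tag no longer than the buffer is always seen whole; a longer one raises an error instead of being silently split.

```python
import io
from typing import Iterator, TextIO

BUF_SIZE = 512  # assumed maximum tag length (arbitrary, as in the post)


def tokens(stream: TextIO, buf_size: int = BUF_SIZE) -> Iterator[str]:
    """Yield complete tags and chunks of non-tag text from a character
    stream, never holding more than buf_size characters at once."""
    buf = stream.read(buf_size)
    while buf:
        if buf[0] == "<":
            # A tag: it must end inside the (full) buffer.
            end = buf.find(">")
            if end == -1:
                if len(buf) < buf_size:  # buffer not full means we hit EOF
                    raise ValueError("unterminated tag at end of input")
                raise ValueError(f"tag longer than {buf_size} characters")
            chunk = buf[: end + 1]
        else:
            # Not a tag: emit text up to the next '<' (or the whole buffer).
            end = buf.find("<")
            chunk = buf if end == -1 else buf[:end]
        yield chunk
        # Drop what was just written out, then refill the buffer.
        buf = buf[len(chunk):]
        buf += stream.read(buf_size - len(buf))


if __name__ == "__main__":
    for tok in tokens(io.StringIO("hello <b>world</b> bye"), buf_size=8):
        print(repr(tok))
```

Long runs of text may come out in several pieces, which is harmless since they are just written through. The naive “next >” rule will still break on a “>” inside a quoted attribute value or a comment, which is one reason a real parser is preferable when you can use one.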

Obviously, if you can simply slurp the whole file into memory ...
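For contrast, once the whole text is in memory the buffer bookkeeping disappears and a single regular expression can split it into tags and non-tag runs. This is a sketch under the same simplifying assumption (a tag runs from “<” to the next “>”, and the input is well formed); `tokens_from_string` is a hypothetical name.

```python
import re

# Either a complete tag ('<' up to the next '>') or a run of non-tag text.
TOKEN_RE = re.compile(r"<[^>]*>|[^<]+")


def tokens_from_string(text: str) -> list[str]:
    """Split slurped text into tags and non-tag chunks (well-formed input assumed)."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokens_from_string("hello <b>world</b> bye"))
```

On malformed input, such as a stray “<” with no closing “>”, this pattern quietly skips the “<” rather than reporting it.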