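Why do you have to read one line at a time? Line breaks carry no structure in HTML: a newline (\n) is treated as ordinary whitespace everywhere except inside <pre> blocks, so a single element can span many lines, and several elements can share one.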
Why not read the whole file into a single string first and parse that?
Once you have the complete document in memory, finding the plain text is not much work.
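Here is a minimal sketch of the "read everything, then parse" approach. It assumes Python, which the original post does not specify, and uses only the standard library's `html.parser.HTMLParser`. The names `TextExtractor`, `html_to_text` and `file_to_text` are hypothetical, chosen for illustration, and the UTF-8 encoding is an assumption about your file.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects character data, skipping <script>/<style> contents."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text nodes are delivered regardless of where the file's newlines fall.
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse all whitespace runs (newlines included) into single spaces.
    # Simplification: this also flattens the formatting of <pre> blocks.
    return " ".join("".join(parser.parts).split())


def file_to_text(path: str) -> str:
    # Read the whole file in one call instead of line by line (encoding is assumed).
    with open(path, encoding="utf-8") as f:
        return html_to_text(f.read())
```

For example, `html_to_text("<p>Hello\n  <b>world</b></p>")` gives `"Hello world"`, even though the paragraph spans two lines in the source. A line-by-line reader would have to stitch that paragraph back together itself.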
Easy ways to read the whole file include: