Just as a text file is both a set of lines and a stream of bytes, an HTML document is both a tree and a stream of elements. HTML::Parser extracts the latter: it reports start tags, text, and end tags as events in document order, which is equivalent to a depth-first walk of the DOM tree. The advantage of using HTML::Parser for an application like this is the same as the advantage of processing a text file line by line instead of reading the whole file into memory.
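A minimal sketch of the streaming approach, assuming HTML::Parser 3.x is installed. The helper name `mark_word`, the `<mark>` wrapper, and feeding the input as a list of chunks are illustrative choices, not taken from the original question. Markup events are copied through verbatim and only text events are rewritten; `unbroken_text` makes the parser buffer a run of text until the next tag, so a word split across two `parse` calls is still matched. Matches that cross a tag boundary (e.g. `fo<b>o</b>`) and text inside `<script>` are not handled here.

```perl
use strict;
use warnings;
use HTML::Parser 3;

# Hypothetical helper: copy HTML through unchanged, except that every
# occurrence of $word in text content is wrapped in <mark>...</mark>.
# No DOM tree is built; the input is consumed chunk by chunk.
sub mark_word {
    my ($word, @chunks) = @_;
    my $re  = qr/\Q$word\E/;
    my $out = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        # Text events: receive the raw source text (entities not decoded)
        # so it can be echoed back with only the inserted markup added.
        text_h => [
            sub {
                my ($text) = @_;
                $text =~ s{($re)}{<mark>$1</mark>}g;
                $out .= $text;
            },
            'text',
        ],
        # All other events (tags, comments, declarations): copy verbatim.
        default_h => [
            sub {
                my ($text) = @_;
                $out .= $text if defined $text;
            },
            'text',
        ],
    );

    # Buffer each run of text until the next non-text event, so a word
    # split across two parse() calls is still seen as one string.
    $p->unbroken_text(1);

    $p->parse($_) for @chunks;   # feed the document incrementally
    $p->eof;                     # flush any buffered text
    return $out;
}

print mark_word('foo', '<p>a foo', ' b</p><!-- foo -->'), "\n";
# prints <p>a <mark>foo</mark> b</p><!-- foo -->
```

Only the current event and the buffered text run are held in memory; the comment containing `foo` is left alone because it arrives as a comment event, not a text event.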
While an HTML document is unlikely to be too large to fit into memory on a client, our questioner could be building something that runs on a server, with one instance of the program per concurrent client connection; memory use that is modest for a single instance can quickly become very large in aggregate when many clients are active. Building the entire tree is also unnecessary here, because the transformation to be applied is very simple: find and mark occurrences of certain text in a finite sliding window. On a server, building the DOM tree in memory is therefore both wasteful and foolish, and it creates an opportunity for easy DoS attacks, since any client that can make the program process a large document can make the server allocate memory in proportion to it.
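The "finite sliding window" can be sketched in core Perl without any parser. The helper `find_in_stream` below is hypothetical: it reports the absolute offsets of a fixed string in a sequence of chunks while keeping only the last `length($needle) - 1` characters between chunks, which is exactly enough to catch a match split across a chunk boundary. Peak memory is one chunk plus that short tail, regardless of how long the document is.

```perl
use strict;
use warnings;

# Hypothetical helper: return the absolute offsets of $needle in the
# concatenation of @chunks, holding only one chunk plus a short tail
# of the previous data in memory at any time.
sub find_in_stream {
    my ($needle, @chunks) = @_;
    die "needle must be non-empty\n" unless length $needle;

    my $keep   = length($needle) - 1;  # tail long enough to catch a split match
    my $window = '';                   # carried-over tail + current chunk
    my $base   = 0;                    # absolute offset of $window's first char
    my @hits;

    for my $chunk (@chunks) {
        $window .= $chunk;

        # Scan the window; overlapping matches are reported too.
        my $pos = 0;
        while ((my $i = index($window, $needle, $pos)) >= 0) {
            push @hits, $base + $i;
            $pos = $i + 1;
        }

        # Slide the window: keep only the last $keep characters. The tail is
        # shorter than $needle, so a match already reported cannot recur.
        if (length($window) > $keep) {
            my $drop = length($window) - $keep;
            substr($window, 0, $drop, '');
            $base += $drop;
        }
    }
    return @hits;
}

my @hits = find_in_stream('needle', 'hay nee', 'dle hay needle');
print "@hits\n";   # prints "4 15"
```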
Put simply, if you do not actually need the DOM tree, do not waste time and memory building it!