Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I would like to simulate the effect of vi's :set number command on an already generated HTML file, so that every line in the HTML file has a line number at the beginning. The HTML files are almost all text, with a few links and a couple of tables. I tried using HTML::Parser to strip all the tags (with code I found by searching this site) and then placing numbers at the beginning of each line of the resulting text file, but that doesn't work very well for these HTML files.
use strict;
use warnings;
use HTML::Parser;
use LWP::Simple;

my $html = get "http://192.168.0.3/testing.htm";
# Collect every text segment; eof flushes any text still buffered.
HTML::Parser->new(text_h => [\my @accum, "text"])->parse($html)->eof;
my $file = "test.txt";
open(my $fh, '>>', $file) or die "Cannot open $file: $!\n";
print $fh map $_->[0], @accum;
I don't even know if using HTML::Parser is the best way to accomplish this. Does anyone have any other ideas? Any help would be greatly appreciated. ~Jason

Replies are listed 'Best First'.
Re: Add line numbers to HTML
by davido (Cardinal) on Mar 20, 2004 at 06:16 UTC
    Please clarify: You want to put line numbers at the start of each line of the HTML file, or you want to put line numbers at the start of each line displayed in the browser?

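    If you're putting line numbers at the start of each line of the HTML file, it's a Perl one-liner kind of thing. It is quoted here for a Unix shell, with file.html standing in for your file; under Windows cmd.exe, put double quotes around the code instead:

    perl -pi.bak -e '$_ = qq/$.: $_/;' file.html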

    But that will break HTML where tags span lines. And it will look pretty confusing when rendered in the browser since logical file newlines don't equate to browser-rendered newlines.
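
    One way to see whether a given file is at risk is to look for lines that begin inside a still-open tag before numbering anything. The following is only a rough sketch: the helper name lines_inside_tag is made up for illustration, and it naively treats every < as opening a tag and every > as closing one, so comments, scripts, and a literal > inside an attribute value will confuse it.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the 1-based numbers of lines that start
# while a tag opened on an earlier line is still unclosed.
# Naive heuristic: '<' opens a tag, '>' closes it.
sub lines_inside_tag {
    my @lines  = @_;
    my $in_tag = 0;
    my @risky;
    for my $i (0 .. $#lines) {
        push @risky, $i + 1 if $in_tag;
        for my $ch (split //, $lines[$i]) {
            if    ($ch eq '<') { $in_tag = 1 }
            elsif ($ch eq '>') { $in_tag = 0 }
        }
    }
    return @risky;
}

# Example: the <a> tag is split across lines 2 and 3.
my @sample = (
    "<p>Hello\n",
    "<a\n",
    "  href=\"x.htm\">link</a>\n",
    "</p>\n",
);
print join(",", lines_inside_tag(@sample)), "\n";   # prints 3
```

    Any line number it reports is one where a prefix such as "3: " would land inside a tag rather than in the text.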

    On the other hand, if you want every line rendered in the browser to display a line number, you've got a bit of a problem, as HTML output is generally formatted into paragraphs, not lines. Unless you're using <pre> tags, how are you going to determine where new lines begin in someone's browser? Regular HTML text auto-wraps in normal browsers.


    Dave

      I should have been clearer. I would like the line numbers to appear at the beginning of each line as viewed in the browser. I am starting to think that this is going to be impossible... ~Jason
        Unless you can ensure that lines don't wrap, you can't do that. However, <pre> tags do prevent wrapping; they preformat the text to your specifications, regardless of browser window size. That has a BIG disadvantage though. Consider someone using a screen smaller than the one for which your document was designed. The benefit of line numbering will be outweighed by the pain in the rear of having to scroll horizontally to read each line.

        Nevertheless, you do see it done sometimes. See our own Craft section as an example of using <pre> tags, tables, etc., to create formatting that is conducive to line numbering.
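
        A minimal sketch of that kind of layout, assuming the content is plain text (or has already been reduced to text) and that a single <pre> block is acceptable. The function name numbered_pre and the five-column number width are illustrative choices, not anything taken from the Craft section's actual markup.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap plain text in a <pre> block with a line
# number in front of every line. Special characters are escaped so the
# text shows up literally instead of being interpreted as markup.
sub numbered_pre {
    my ($text) = @_;
    my $n   = 0;
    my $out = "<pre>\n";
    for my $line (split /\n/, $text) {
        $line =~ s/&/&amp;/g;    # escape & first so later entities survive
        $line =~ s/</&lt;/g;
        $line =~ s/>/&gt;/g;
        $out .= sprintf "%5d  %s\n", ++$n, $line;
    }
    return $out . "</pre>\n";
}

print numbered_pre("first line\n\nif (a < b) { ... }\n");
```

        Because the browser never rewraps text inside <pre>, each number stays attached to its line, at the cost of the horizontal scrolling mentioned above.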


        Dave

Re: Add line numbers to HTML
by matija (Priest) on Mar 20, 2004 at 06:08 UTC
    It's not entirely clear to me where you want your line numbers. If you just want them where a :set number would put them, then you don't need to parse the file - the code is simple:
    $cnt = 1; while (<>) { printf "%5d%s", $cnt++, $_; }
    Of course, that might produce invalid HTML, for example if a tag is broken across a newline (FrontPage seems to be fond of doing that).

    If you're looking for numbers in the HTML after it has been rendered into text, look up HTML::FormatText.
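
    A rough sketch of that route, assuming HTML::TreeBuilder and HTML::FormatText are installed from CPAN. The sample markup and the 60-column right margin are placeholders, and the numbers refer to lines as wrapped by the formatter, not as any particular browser would wrap them.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Placeholder input; in practice this would be the fetched page.
my $html = '<html><body><h1>Title</h1><p>Some text with a '
         . '<a href="x.htm">link</a>.</p></body></html>';

# Parse the markup into a tree, then render the tree as wrapped text.
my $tree      = HTML::TreeBuilder->new_from_content($html);
my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 60);
my $text      = $formatter->format($tree);
$tree->delete;    # free the parse tree

# Number each line of the rendered text (split /^/ keeps the newlines).
my $n = 0;
my @numbered = map { sprintf "%5d  %s", ++$n, $_ } split /^/, $text;
print @numbered;
```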

    If you want a number prepended to every line of the rendered HTML page in a browser, then I doubt you can do it, particularly once tables get involved. That is not because of any limitation of Perl, but because of the nature of HTML rendering.

Re: Add line numbers to HTML
by b10m (Vicar) on Mar 20, 2004 at 10:09 UTC

    What about:

    $ nl -ba file.old.html > file.new.html
    --
    b10m

    All code is usually tested, but rarely trusted.