in reply to Rendering HTML / capturing pixels

Rendering HTML is far from "easy", especially when it comes to the supposedly "simple" things like tables and images. You might find some inspiration in the converters that turn HTML into PostScript and/or (La)TeX. For the actual rendering, you will also have to consider CSS and the like.

Under Win32, there are two relatively easy ways to capture the image of a web page: either you automate Internet Explorer to display the HTML and then take a screenshot, or you automate Internet Explorer to print the page into a file and then postprocess that file.
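
To make the second route a bit more concrete, here is a minimal sketch of driving Internet Explorer from Perl through Win32::OLE. It is untested guesswork in the sense that it assumes the default Windows printer has already been set up to print to a file (for example a PostScript driver on the FILE: port); the script itself does not configure that. The function name print_page_with_ie is made up for this example, and the numeric constants are the OLECMDID / OLECMDEXECOPT / READYSTATE values from the Windows SDK headers.

```perl
use strict;
use warnings;

# Constants from the Windows SDK; values written out here so that
# the sketch does not depend on a type library being loaded.
use constant OLECMDID_PRINT               => 6;
use constant OLECMDEXECOPT_DONTPROMPTUSER => 2;
use constant READYSTATE_COMPLETE          => 4;

# Hypothetical helper: load $url in IE and send it to the default printer.
# Assumption: the default printer is configured to print to a file.
sub print_page_with_ie {
    my ($url) = @_;

    # Load Win32::OLE at run time so the file still compiles off Windows.
    require Win32::OLE;

    my $ie = Win32::OLE->new('InternetExplorer.Application')
        or die "Cannot start Internet Explorer: "
             . Win32::OLE->LastError() . "\n";

    $ie->{Visible} = 1;
    $ie->Navigate($url);

    # Poll until the document has finished loading.
    sleep 1 while $ie->{Busy} || $ie->{ReadyState} != READYSTATE_COMPLETE;

    # Print without showing the print dialog.
    $ie->ExecWB(OLECMDID_PRINT, OLECMDEXECOPT_DONTPROMPTUSER);

    # Printing is asynchronous; the 5 seconds are an arbitrary guess.
    sleep 5;
    $ie->Quit();
    return 1;
}

# Only act when a URL is given on the command line.
print_page_with_ie($ARGV[0]) if @ARGV == 1;
```

The fixed sleep before Quit() is the weak spot: there is no completion check for the print job in this sketch, so a long page may need a longer wait.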

Under Unix, the only route I see is printing to a file, and there is no way of automating a browser as convenient as the one under Win32. You might be able to write some XS glue to drive one of the rendering engines (KHTML, Gecko), but that's not "easy" per se (IMO).

perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
$d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
($c = $d->accept())->get_request(); $c->send_response( new #in the
HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web

Re: Re: Rendering HTML / capturing pixels
by traveler (Parson) on Feb 27, 2003 at 15:30 UTC
    Some (all? most?) versions of *nix Netscape allow remote control. You start netscape with the "-remote" option. You could probably generate PostScript, as Corion suggests, with the commands openURL() and saveAs(). I have not tried that particular combination. See this for more information.
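
A rough sketch of how that openURL() / saveAs() combination might be scripted from Perl follows. It assumes a Netscape process is already running on the current X display and that the binary is called netscape and is on the PATH. The helper names netscape_remote_commands and run_commands, and the script name page2ps.pl, are invented for this example.

```perl
use strict;
use warnings;

# Build the argument lists for two "netscape -remote" calls:
# one to load the page, one to save it as PostScript.
sub netscape_remote_commands {
    my ($url, $ps_file) = @_;
    return (
        [ 'netscape', '-remote', "openURL($url)" ],
        [ 'netscape', '-remote', "saveAs($ps_file,PostScript)" ],
    );
}

# Run each command in turn (list form, so no shell quoting issues)
# and stop at the first one that fails.
sub run_commands {
    my @commands = @_;
    for my $cmd (@commands) {
        system(@$cmd) == 0
            or die "Command failed (status $?): @$cmd\n";
    }
    return scalar @commands;
}

# Only act when called with both arguments, e.g.
#   perl page2ps.pl http://www.example.com/ /tmp/page.ps
if (@ARGV == 2) {
    my ($url, $ps_file) = @ARGV;
    run_commands(netscape_remote_commands($url, $ps_file));
}
```

Two things to watch for: the openURL() call may return before the page has finished loading, so a pause between the two commands could be necessary, and a URL containing commas or parentheses would probably need escaping before it is put inside openURL().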

    Another option would be to get the Mozilla source and modify it directly or see if something in the source allows what you want.

    Finally, building on what PodMaster said, there is a tkHTML widget here, but I do not know if there is a Perl binding yet. I have not played with it at all.

    HTH, --traveler

      Thanks for the information and links. I will look into the Netscape angle.

      SpaceAce

       If it's just plain text formatting then 'links --dump' might be a way to go.

       I guess it depends on what the motivation for this is. If it is supposed to be used as a CGI script, for example, there might not be an X session running for the graphical browser to use.

      Steve
      ---
      steve.org.uk
Re: Re: Rendering HTML / capturing pixels
by Anonymous Monk on Feb 27, 2003 at 19:32 UTC
    I am not overly concerned with the task being "easy". After all, the easy ones are usually the least interesting :)

    I had already considered browser automation, but I would prefer to make the program as standalone as possible. If I have to depend on a browser to do it, I will probably try to work with a *nix version of Netscape as opposed to going for a Win32 solution.

    SpaceAce