artist has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
A little more explanation:
I would like to capture the page data while surfing the net, and then act on it intelligently.

Ex: I would like to get all links on the current browser page and store them in a local database; then, while surfing other pages, the browser would display only those links which are not already in the database.

I know about HTML::LinkExtor and related modules, and I also know how to start the browser with Win32::OLE. What I need is a way to build the framework.
Ex: How do I put a button in the browser that, when activated, runs the Perl script to capture the data?

Thanks,
Artist
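The extract-and-filter step the question describes can be sketched as follows (a minimal sketch, assuming the CPAN module HTML::LinkExtor is installed; a plain hash stands in for the local database, and the sample HTML and URLs are made up for illustration):

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# A page as it might arrive from the browser (hypothetical sample).
my $html = <<'HTML';
<a href="http://example.com/new">new</a>
<a href="http://example.com/seen">seen</a>
HTML

# Links already stored locally; a hash stands in for the database here.
my %seen = ('http://example.com/seen' => 1);

# HTML::LinkExtor calls us back once per link-bearing tag.
my @links;
my $parser = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    push @links, $attr{href} if $tag eq 'a' && defined $attr{href};
});
$parser->parse($html);
$parser->eof;

# Keep only links not already in the store.
my @fresh = grep { !$seen{$_} } @links;
print "$_\n" for @fresh;   # prints only http://example.com/new
```

Swapping the hash for real database lookups turns this into the filter the question asks for.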

Replies are listed 'Best First'.
Re: catch browser page while surfing
by voyager (Friar) on Jul 05, 2001 at 18:45 UTC
    One approach would be to have a frame set where your surfing is in one frame and a "get links" button/link is in another frame.

    The button would sit in a form with target="content_frame_name"; an onClick JavaScript handler could grab the URL from the surfing frame and pass it to a Perl script that used LWP::Simple's get($url) to fetch the page. You would then use HTML::LinkExtor to extract the links, and for each link not already in your history SQL table, insert it and add it to the list to display.

    When you're done with all the links, send back a page with all the links not in the database.
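The insert-if-absent step against the history table could look like this (a sketch, assuming DBI with DBD::SQLite is available; the table and column names are assumptions, not something voyager specified, and an in-memory database is used so the example is self-contained):

```perl
use strict;
use warnings;
use DBI;

# Hypothetical single-table link history.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE IF NOT EXISTS history (url TEXT PRIMARY KEY)');

# Return the subset of @urls not yet in the table, inserting each
# new one as we go.
sub fresh_links {
    my @urls   = @_;
    my $check  = $dbh->prepare('SELECT 1 FROM history WHERE url = ?');
    my $insert = $dbh->prepare('INSERT INTO history (url) VALUES (?)');
    my @fresh;
    for my $url (@urls) {
        $check->execute($url);
        my ($found) = $check->fetchrow_array;
        $check->finish;
        next if $found;                  # already seen
        $insert->execute($url);
        push @fresh, $url;
    }
    return @fresh;
}
```

The CGI script would call fresh_links on the output of HTML::LinkExtor and send back a page listing only what it returns.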

Re: catch browser page while surfing
by mattr (Curate) on Jul 05, 2001 at 18:52 UTC
    Hmm.. I *think* I understand what you want. I suppose there might be a way to get the information out of Mozilla, but probably not just with Perl and not without a lot of study.

    I expect the best way would be to build your own web browser and not try to stick a button into Netscape. So you have to write a browser and somehow render the page, which is very hard to do well unless plain text is enough for you.

    It seems the HTML::* and LWP modules would be useful for writing a text-only version. You could run this inside a GUI window if you like, but you won't gain much functionality from the window unless you have something special in mind that we don't know about.

    Perhaps someone knows of some magical DLL that will do what you want, but generally I'd say this is not a case where Perl can enhance your current experience of using Internet Explorer as-is. Please be a bit clearer about exactly what you want to do, since it is a little confusing what you mean by "browser".

    Also it seems that if you only display links which are new, you are going to be unable to read a lot of pages that have links embedded in the text, like PerlMonks. Perhaps you want not to remove data, but to add data to the page so as to give you more information about the connectedness of the pages you are browsing?

(ichimunki) Re: catch browser page while surfing
by ichimunki (Priest) on Jul 05, 2001 at 19:43 UTC
    You could write a plug-in to your favorite browser. Probably Perl is not going to apply here.

    You could write a patch for an open source browser like Mozilla. Again, Perl is probably not going to be too much use here (I assume if you know enough C to understand the Mozilla source, you won't need to use Perl to implement the features you want).

    You could write a browser in Perl, or extend one that already exists. I've seen a few projects along these lines:
    the sample browser included in the Tk::HTML module (which I've had mixed success with and which has pretty sparse documentation),
    this one (I wrote this, it has almost no features, it gets ignored a lot and grows very slowly)
    this Tk browser (I don't see any source for this one, but you might email the author)
    this other Tk browser (the most advanced of the group)
Re: catch browser page while surfing
by Anonymous Monk on Jul 05, 2001 at 23:50 UTC
    You could write a proxy server that parses each file it gets from the net, modifying the links in the process so that all links point back to your proxy, which then fetches them in turn for you...
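The link-rewriting half of such a proxy can be sketched like this (a naive regex-based sketch, assuming the CPAN module URI::Escape is installed and a hypothetical proxy endpoint at localhost:8080; real HTML should be rewritten with a proper parser such as HTML::Parser rather than a regex):

```perl
use strict;
use warnings;
use URI::Escape;

# Hypothetical endpoint on our proxy that fetches a given URL.
my $proxy = 'http://localhost:8080/fetch?url=';

# Rewrite every href so the browser requests the target via the proxy.
sub rewrite_links {
    my ($html) = @_;
    $html =~ s{href="([^"]+)"}
              {'href="' . $proxy . uri_escape($1) . '"'}ge;
    return $html;
}

print rewrite_links('<a href="http://example.com/">x</a>'), "\n";
```

Because every click now goes through the proxy, the proxy sees each page body and can run the HTML::LinkExtor/database filtering on it before handing it to the browser.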