in reply to How to strip everything in a string except HTML Link

Hello,
I think HTML::LinkExtor will be a useful tool in your case, and this old node too.

If you want to maintain a list of unique links, you can store them somewhere (plain text, database, Storable file...). First load this cache into the program, building a hash from it (keys are unique, so that helps). Then extract the links and add each one to the hash only if its key does not already exist. On success, write the new copy back to the storage.
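A minimal sketch of that idea, assuming a plain-text cache with one URL per line. The file name links.txt, the sample HTML, and the helper subs (load_cache, extract_links, update_cache, save_cache) are only illustrative names, not anything from your code. Note that the base URL argument of HTML::LinkExtor is optional: without it, the links come back exactly as written in the HTML.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical cache file: one URL per line.
my $cache_file = 'links.txt';

# Load previously stored links into a hash (keys are unique).
sub load_cache {
    my ($file) = @_;
    my %seen;
    return \%seen unless -e $file;
    open my $in, '<', $file or die "Cannot read $file: $!";
    while (my $line = <$in>) {
        chomp $line;
        $seen{$line} = 1 if length $line;
    }
    close $in;
    return \%seen;
}

# Return the href of every <a> tag found in an HTML fragment.
sub extract_links {
    my ($html) = @_;
    my $parser = HTML::LinkExtor->new;    # no callback, no base URL
    $parser->parse($html);
    $parser->eof;
    my @urls;
    for my $link ($parser->links) {
        my ($tag, %attr) = @$link;
        push @urls, "$attr{href}" if $tag eq 'a' && defined $attr{href};
    }
    return @urls;
}

# Add only the links not yet in the hash; return how many were new.
sub update_cache {
    my ($seen, @urls) = @_;
    my $added = 0;
    for my $url (@urls) {
        next if exists $seen->{$url};
        $seen->{$url} = 1;
        $added++;
    }
    return $added;
}

# Write the whole hash back to the storage file.
sub save_cache {
    my ($file, $seen) = @_;
    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} "$_\n" for sort keys %$seen;
    close $out or die "Cannot close $file: $!";
}

# Sample input, assumed for illustration only.
my $html = '<p>News: <a href="http://example.com/a">first</a> and '
         . '<a href="http://example.com/a" target="_blank">again</a></p>';

my $seen  = load_cache($cache_file);
my $added = update_cache($seen, extract_links($html));
save_cache($cache_file, $seen) if $added;
```

The storage is rewritten only when at least one new link was found, so repeated runs over the same page leave the file untouched.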
L*
There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

Re^2: How to strip everything in a string except HTML Link
by Anonymous Monk on May 15, 2015 at 07:32 UTC
    But in there, you know the base:

    my $base = 'http://perlmonks.org/';
    I will have no idea what the bases are; they could be any news affiliate website in the world.

    I just want to remove the other stuff and leave what is in the html link:

    Link: <a href="http://example.com">and Anchor</a>
    If that above were the string, it would remove Link: and leave the rest.

    my $string = q~Link: <a href="http://example.com">and Anchor</a>~;
    $string =~ s/<[a href.... # I cannot remember this string. There was one that worked perfectly: even if the link had target="_blank", it did not matter what else it had... but I cannot find it in any of my files or remember how to write it.


    Also, at this point I've already downloaded the one page they are all on and parsed it down to a single table cell. That cell has other data in it, and I've already pulled out the information I need. All that is left is the remnants, including the HTML link with its anchor text, so I just want to take that string and remove everything except the link and anchor.