in reply to Re^2: GREP Question: Filtering out third-party images with Privoxy
in thread GREP Question: Filtering out third-party images with Privoxy

"I should have been clearer, ..."

No, you were clear enough; I should have been more thorough in my reading of your question. Anyway, I've already picked up on that and updated my response.

Here's how you'd go about using a variable domain name in Perl; I'll leave you to figure out how to implement that in Privoxy. Note: I've added a few more tests.

#!/usr/bin/env perl
use strict;
use warnings;

my $html_fragment = <<'END_HTML';
<img src="http://images.google.com/someimage.jpg" />
<img src="http://images.google.com/NOTsomeimage.jpg" />
<img src="http://google.somesite.org/image.jpg" />
<img src="http://somesite.net/google/image.jpg" />
<img src="http://anythingelse.com/etc.jpg" />
<img src="http://pictures.google.com/someimage.jpg" />
<img src="http://google.com/someimage.jpg" />
END_HTML

my $domain_to_keep = 'google.com';

print "Initial markup:\n";
print $html_fragment;

# Remove every <img> whose URL does not contain the domain followed by "/".
# \Q...\E quotes regex metacharacters (such as ".") in the domain name.
$html_fragment =~ s/\s*<img.*src="http:\/\/(?!.*\Q$domain_to_keep\E\/)[^>]+>//gm;

print "Modified markup:\n";
print $html_fragment;

Output:

Initial markup:
<img src="http://images.google.com/someimage.jpg" />
<img src="http://images.google.com/NOTsomeimage.jpg" />
<img src="http://google.somesite.org/image.jpg" />
<img src="http://somesite.net/google/image.jpg" />
<img src="http://anythingelse.com/etc.jpg" />
<img src="http://pictures.google.com/someimage.jpg" />
<img src="http://google.com/someimage.jpg" />
Modified markup:
<img src="http://images.google.com/someimage.jpg" />
<img src="http://images.google.com/NOTsomeimage.jpg" />
<img src="http://pictures.google.com/someimage.jpg" />
<img src="http://google.com/someimage.jpg" />
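
One caveat with the lookahead in the script: its `.*` will find the domain followed by "/" anywhere later on the line, so a URL that merely carries the domain in its path is kept, as is a host such as notgoogle.com. What follows is a minimal sketch of a stricter variant, not a drop-in replacement. It assumes the src attribute is double-quoted and uses http:// or https://, and the name strip_foreign_images is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: remove <img> tags whose host is neither the given
# domain nor one of its subdomains.
# Assumes a double-quoted src attribute with an http:// or https:// URL.
sub strip_foreign_images {
    my ($html, $domain) = @_;
    $html =~ s{
        \s* <img\b [^>]*? \bsrc="https?://          # tag start up to the scheme
        (?! (?:[\w-]+\.)* \Q$domain\E [/":] )       # host is not the kept domain
        [^>]* >                                     # rest of the tag
    }{}gix;
    return $html;
}

my $html = join "\n",
    '<img src="http://images.google.com/a.jpg" />',
    '<img src="http://notgoogle.com/b.jpg" />',
    '<img src="http://evil.example/google.com/c.jpg" />',
    '<img src="http://google.com/d.jpg" />';

# Only the first and last tags should survive.
print strip_foreign_images($html, 'google.com'), "\n";
```

Here the lookahead is tied to the host part of the URL: an optional run of subdomain labels, the domain itself, and then a "/", a ":" or the closing quote. Using `[^>]` instead of `.` also stops a match from running across tag boundaries.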

-- Ken

Re^4: GREP Question: Filtering out third-party images with Privoxy
by karld12 (Initiate) on Jan 24, 2014 at 10:14 UTC
    Thanks again, Ken. After more research and learning I managed to make it work. Though in the end, I settled on simply deleting all .gif images instead of looking for third-party serving because in practice they tend to be hosted locally, even if the link is pointing elsewhere. (But the third-party search can be done: Privoxy provides a variable $host and a custom option 'D' to use it.)

    As an aside, I must say it's not easy finding info on Perl-style regex/grep if you're not actually coding in Perl. For the life of me I haven't been able to find authoritative info on setting delimiters, and was thrown by Privoxy's liberal use of things other than "/". And so on. Will keep learning!

    Karl

      "As an aside, I must say it's not easy finding info on Perl-style regex/grep if you're not actually coding in Perl. For the life of me I haven't been able to find authoritative info on setting delimiters, and was thrown by Privoxy's liberal use of things other than "/". And so on. Will keep learning!"

      A good place to start would be "perlretut - Perl regular expressions tutorial". This has links to further, relevant information (including more detailed descriptions of the topics covered in the tutorial).
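
On the delimiter point: in Perl, the "/" in m/.../ and s/.../.../ is only the default. Most punctuation characters can be used instead, and so can bracketing pairs such as {}; perlop covers this under "Quote and Quote-like Operators". Below is a small sketch using a made-up line of markup and the delete-all-GIFs idea mentioned above. It shows Perl's behaviour only; whether Privoxy accepts every form that Perl does is not something this demonstrates.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Made-up markup for illustration only.
my $markup = '<p><img src="http://example.com/a/b.gif" /><img src="/local/c.png" /></p>';

# Default delimiter: every "/" inside the pattern must be escaped.
(my $with_slashes = $markup) =~ s/<img\b[^>]*\bsrc="[^"]*\.gif"[^>]*\/>//gi;

# Same substitution with "|" as the delimiter: "/" needs no escaping.
(my $with_pipes = $markup) =~ s|<img\b[^>]*\bsrc="[^"]*\.gif"[^>]*/>||gi;

# Same again with bracketing delimiters.
(my $with_braces = $markup) =~ s{<img\b[^>]*\bsrc="[^"]*\.gif"[^>]*/>}{}gi;

# All three lines printed should be identical.
print "$_\n" for $with_slashes, $with_pipes, $with_braces;
```

Picking a delimiter that does not occur in the pattern is mostly a readability choice; it is common when matching URLs or file paths.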

      Here's the documentation for grep.

      If you have questions arising from any of that documentation, feel free to ask but it would probably be better to raise them in a new thread. Also, the guidelines in "How do I post a question effectively?" will help in getting the best answers.

      -- Ken