in reply to GREP Question: Filtering out third-party images with Privoxy
G'day karld12,
Welcome to the monastery.
I'm not familiar with Privoxy; however, looking at its User Manual, I think this regex will probably do what you want:
s/\s*<img.*src="[^"]*(?<!\/someimage)\.jpg".*>//gm
s/\s*<img.*src="http:\/\/(?!images\.google\.com\/)[^>]+>//gm
Update: My apologies. I originally focussed on the image name but, on rereading your question, I see you want to exclude domains. My original solution is the first regex above (originally in the spoiler); the second regex is the more appropriate solution.
Here's my test of the original (image name) regex:
#!/usr/bin/env perl

use strict;
use warnings;

my $html_fragment = <<'END_HTML';
<img src="http://images.google.com/someimage.jpg" />
<img src="http://images.google.com/NOTsomeimage.jpg" />
<img src="http://google.somesite.org/image.jpg" />
<img src="http://somesite.net/google/image.jpg" />
<img src="http://anythingelse.com/etc.jpg" />
END_HTML

print "Initial markup:\n";
print $html_fragment;

$html_fragment =~ s/\s*<img.*src="[^"]*(?<!\/someimage)\.jpg".*>//gm;

print "Modified markup:\n";
print $html_fragment;
Output:
Initial markup:
<img src="http://images.google.com/someimage.jpg" />
<img src="http://images.google.com/NOTsomeimage.jpg" />
<img src="http://google.somesite.org/image.jpg" />
<img src="http://somesite.net/google/image.jpg" />
<img src="http://anythingelse.com/etc.jpg" />
Modified markup:
<img src="http://images.google.com/someimage.jpg" />
Here's my test of the updated (domain) regex:
#!/usr/bin/env perl

use strict;
use warnings;

my $html_fragment = <<'END_HTML';
<img src="http://images.google.com/someimage.jpg" />
<img src="http://images.google.com/NOTsomeimage.jpg" />
<img src="http://google.somesite.org/image.jpg" />
<img src="http://somesite.net/google/image.jpg" />
<img src="http://anythingelse.com/etc.jpg" />
END_HTML

print "Initial markup:\n";
print $html_fragment;

$html_fragment =~ s/\s*<img.*src="http:\/\/(?!images\.google\.com\/)[^>]+>//gm;

print "Modified markup:\n";
print $html_fragment;
Output:
Initial markup:
<img src="http://images.google.com/someimage.jpg" />
<img src="http://images.google.com/NOTsomeimage.jpg" />
<img src="http://google.somesite.org/image.jpg" />
<img src="http://somesite.net/google/image.jpg" />
<img src="http://anythingelse.com/etc.jpg" />
Modified markup:
<img src="http://images.google.com/someimage.jpg" />
<img src="http://images.google.com/NOTsomeimage.jpg" />
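If you'd rather not hard-code the domain, here's a minimal sketch of the same negative-lookahead idea wrapped in a sub. The name strip_third_party_images and its $allowed_host parameter are my own inventions for illustration, not anything from Privoxy. It assumes double-quoted src attributes with absolute http or https URLs, and it uses [^>]* rather than .* so that a match can't run past the end of the current tag. I haven't tried this inside a Privoxy filter file, only as plain Perl.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper (not part of Privoxy or the original post):
# remove every <img> tag whose src host is not the allowed host.
sub strip_third_party_images {
    my ($html, $allowed_host) = @_;

    $html =~ s{
        \s*                          # leading whitespace, including a newline
        <img\b [^>]*?                # tag start plus any attributes before src
        \ssrc="https?://             # double-quoted absolute URL (an assumption)
        (?! \Q$allowed_host\E / )    # lookahead: host must NOT be the allowed one
        [^>]* >                      # remainder of the tag
    }{}gix;

    return $html;
}

my $html_fragment = <<'END_HTML';
<img src="http://images.google.com/someimage.jpg" />
<img src="http://images.google.com/NOTsomeimage.jpg" />
<img src="http://google.somesite.org/image.jpg" />
<img src="http://somesite.net/google/image.jpg" />
<img src="http://anythingelse.com/etc.jpg" />
END_HTML

my $filtered = strip_third_party_images($html_fragment, 'images.google.com');
print $filtered;
```

With the same test markup as above, this keeps the two images.google.com tags and drops the other three. The \Q...\E quoting means the dots in the host name are matched literally, so you can pass the host as an ordinary string.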
If this doesn't work for you, please provide an example of the HTML and indicate actual and expected output.
-- Ken