Re: Finding plagarized content
by planetscape (Chancellor) on Feb 10, 2009 at 04:45 UTC
|
| [reply] |
Re: Finding plagarized content
by GrandFather (Saint) on Feb 10, 2009 at 02:10 UTC
|
Have you thought about what that task entails? Have you any idea at all just how much data is accessible through the web?
Have you a firm set of criteria for what "duplicating" means in this context? Do you mean the whole site? Whole pages? Parts of pages? Little bits of pages? Edited versions of any of those?
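One way to pin down "duplicating" is to measure how much of your text reappears in a candidate page. The sketch below is only an illustration in Python, not something proposed in this thread: it breaks both texts into overlapping runs of five words ("shingles") and reports what fraction of the original's shingles turn up in the candidate. The function names, the shingle size `k=5`, and the tokenising regex are all assumptions, and any threshold for "copied" is left to you.

```python
import re

def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Text shorter than one window: treat the whole thing as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def containment(original, candidate, k=5):
    """Fraction of the original's shingles that also occur in the candidate."""
    a = shingles(original, k)
    b = shingles(candidate, k)
    if not a:
        return 0.0
    return len(a & b) / len(a)
```

A whole-page copy scores 1.0, an unrelated page scores 0.0, and a page that lifts a few paragraphs lands somewhere in between. Heavily edited copies will score low, which is exactly why the criteria need deciding first.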
You could run (scripted) Google (or similar) searches for key phrases in your content, then pass a Mk I eyeball over the results, but otherwise you have a rather interesting task ahead of you!
Perl's payment curve coincides with its learning curve.
| [reply] |
Re: Finding plagarized content
by Your Mother (Archbishop) on Feb 10, 2009 at 05:31 UTC
|
Google generally makes this pretty easy. "Just put a long uncommon string in quotes." Exact duplicates will turn up (sometimes; it depends on Google's attention and cache status). You might be able to automate it with a Google API, of which there are at least a couple on the CPAN, like Net::Google. Follow their TOS.
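A rough sketch of the "long uncommon string in quotes" step, in Python and making no network calls at all: it picks the sentences that look least ordinary and wraps each in double quotes ready to paste into a search box. The "uncommon" score here (count of distinct words of seven or more letters) is a crude stand-in chosen for illustration, and the function names and defaults are hypothetical. Submitting the queries is left to whichever API you are permitted to use.

```python
import re

def candidate_phrases(text, min_words=8, max_words=12, top=3):
    """Pick up to `top` sentence openings that look long and uncommon."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    scored = []
    for sentence in sentences:
        tokens = sentence.split()
        if len(tokens) < min_words:
            continue  # too short to be distinctive
        window = tokens[:max_words]
        # Assumption: long words are a cheap proxy for uncommon ones.
        stripped = {t.strip(".,;:!?\"'()").lower() for t in window}
        score = sum(1 for w in stripped if len(w) >= 7)
        scored.append((score, " ".join(window)))
    # Highest score first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [phrase for _, phrase in scored[:top]]

def quoted_query(phrase):
    """Wrap a phrase in double quotes for an exact-phrase search."""
    return '"' + phrase.replace('"', '') + '"'
```

For example, `[quoted_query(p) for p in candidate_phrases(page_text)]` gives a handful of exact-phrase queries per page, where `page_text` is your own content.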
| [reply] |
The "Similar pages" links in Google listings may also be useful.
| [reply] |
Re: Finding plagarized content
by ww (Archbishop) on Feb 10, 2009 at 02:05 UTC
|
For what kind of values of "anybody?" Globally (as in "anywhere in the known universe")? Selected (aka 'suspect') domains? Some other value?
And in what medium? Web only? Print (hint: this one might be beyond a script's capabilities)?
And what types of content? Are you looking for plagiarized text? Pirated images? mp3s? Animated .gifs converted to Flash?
Please, ednorton111, read On asking for help and How do I post a question effectively? and refine your question.
That said, if you have modest Perl fu, you might want to try searching CPAN with some terms appropriate to your question... and welcome to PM.
| [reply] |
Re: Finding plagarized content
by dsheroh (Monsignor) on Feb 10, 2009 at 11:58 UTC
|
Personally, I'd deal with this by setting up one or more Google Alerts. Granted, it doesn't involve writing any Perl (at least not on my end), but it saves me from having to choose between spidering the web myself and violating Google's TOS by scraping their search results.
(If you're not familiar with Google Alerts, it's basically a way of setting up a search to run automatically and email you the top new results as the googlebots find them.)
| [reply] |
Re: Finding plagarized content
by Gavin (Archbishop) on Feb 10, 2009 at 11:42 UTC
|
The simple answer: if you don't want to risk your intellectual property being plagiarised, don't put it on a website.
| [reply] |
Re: Finding plagarized content
by DrHyde (Prior) on Feb 11, 2009 at 10:18 UTC
|
I have such a script, but the Hollywood MAFIAA have a patent on the algorithm so I can't share it with you. Not because their patent is valid, you understand, but because I can't afford to prove that it's invalid.
| [reply] |