If you already have some code, please show us. If not, try the following recipe; if that fails, feel free to come back with a code sample that is giving you trouble.
If your only problem is removing duplicate URLs, the following recipe might be helpful:
- open file for reading (1)
- read file line by line; for each line:
  - extract URL (hint: chomp, split, perlre)
  - print (2) line if URL has never been encountered (hint: perlfaq4)
  - remember that URL has now been seen (hint: $seen{$url})
(1,2): Open another file for writing and print to filehandle unless output to STDOUT is sufficient.
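
The recipe above could be sketched roughly as follows. This is only a minimal sketch under one loud assumption: the URL is taken to be the first whitespace-separated field of each line, which may not match your file. The names extract_url and print_unique_urls are made up for illustration; adapt extract_url to your actual line format.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: ASSUMES the URL is the first whitespace-separated
# field of the line. Change this to match the real file format.
sub extract_url {
    my ($line) = @_;
    my ($url) = split ' ', $line;   # undef for blank lines
    return $url;
}

# Reads lines from $in and prints to $out only those lines whose URL
# has not been encountered before. Returns the number of lines printed.
sub print_unique_urls {
    my ($in, $out) = @_;
    my %seen;
    my $printed = 0;
    while (my $line = <$in>) {
        chomp $line;
        my $url = extract_url($line);
        next unless defined $url;    # skip blank lines
        next if $seen{$url}++;       # true from the second encounter on
        print {$out} "$line\n";
        $printed++;
    }
    return $printed;
}

# When a file name is given on the command line, filter it to STDOUT.
if (@ARGV) {
    my $path = shift @ARGV;
    open my $in, '<', $path or die "Cannot open '$path': $!";
    print_unique_urls($in, \*STDOUT);
    close $in;
}
```

Only the %seen hash of URLs is kept in memory, not the lines themselves, and the first occurrence of each URL is printed in its original position.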
Update: Or perform a Super Search with this query for more inspiration...
Update: In response to code presented below:
- Fix1: Change $file_hash{$key} = $value; to $file_hash{$key} //= $value; to save the first match only (the //= operator requires Perl 5.10 or later).
- Fix2: Change for my $key (keys %file_hash) to for my $key (sort keys %file_hash) to potentially recover the original order of entries (this only works if the input was sorted by key to begin with).
- Better: Have another look at the original hint. It lets you process the file without loading its whole contents into memory, which is an advantage when processing huge files.
- Extra-Hint: perltidy
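
To see what Fix1 changes, here is a small sketch. The pairs in @pairs are invented sample data standing in for whatever your code parses from the file, and %last_wins and %first_wins are illustrative names; only the contrast between = and //= is taken from the fix above.

```perl
use strict;
use warnings;

# Hypothetical (key, value) pairs standing in for the parsed file contents.
my @pairs = (
    [ 'http://b.example/', 'line 1' ],
    [ 'http://a.example/', 'line 2' ],
    [ 'http://b.example/', 'line 3' ],    # duplicate key
);

my (%last_wins, %first_wins);
for my $pair (@pairs) {
    my ($key, $value) = @$pair;
    $last_wins{$key}  = $value;      # plain assignment: later duplicates overwrite
    $first_wins{$key} //= $value;    # defined-or assignment: first value is kept
}

# Sorted iteration as in Fix2.
for my $key (sort keys %first_wins) {
    print "$key => $first_wins{$key}\n";
}
```

Note that with this sample data the sorted output puts the a.example entry before the b.example entry, which is not the input order. That is why the streaming recipe is the more robust choice when order matters.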