We'll need more information to (ahem!) address this properly. How large are the files you need to manipulate, and what format are they in? If you have small files, much of the important data can simply be read into memory. If not, you'll have to explore alternatives.

For the sake of argument, we'll assume each line of the file(s) has one email address and nothing else. Use a hash to track the addresses (untested code follows):

    #!/usr/bin/perl -w
    use strict;

    my $in_file  = 'email.log';
    my $out_file = 'new_email.log';

    # Note that this is written to a *different* file
    # so we can go back if we screw up.
    # If that's not good, back up the email log.
    open IN,  "< $in_file"  or die "Can't open $in_file for reading: $!";
    open OUT, "> $out_file" or die "Can't open $out_file for writing: $!";

    my %address;
    while (<IN>) {
        # Print a line only the first time we see it.
        # (No comma after the filehandle in print.)
        print OUT $_ if ! $address{ $_ }++;
    }

    close OUT or die "Can't close $out_file: $!";
    close IN  or die "Can't close $in_file: $!";

You also might want to check out the Perl Cookbook. There's not a line of original code above. All of it was shamelessly stolen from many hours of enjoying this tome.
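
If the files turn out to be too big for an in-memory hash, one of the alternatives is to keep the hash on disk. Here is a minimal sketch (also untested on real data), assuming the same one-address-per-line format. The sub name dedupe_to_file and the scratch name passed as $dbm_base are made up for illustration. It ties the hash to SDBM_File, which ships with Perl; SDBM limits the size of each key/value pair to roughly 1K, so very long lines would need a different DBM such as DB_File.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl;        # O_RDWR, O_CREAT
use SDBM_File;    # core module; stores the hash in $dbm_base.pag / .dir

# Copy $in_file to $out_file, keeping only the first copy of each line.
# The "seen" hash lives on disk rather than in memory.
# dedupe_to_file and $dbm_base are hypothetical names, not from the post above.
sub dedupe_to_file {
    my ($in_file, $out_file, $dbm_base) = @_;

    # Start from an empty scratch database.
    unlink "$dbm_base.pag", "$dbm_base.dir";

    tie my %seen, 'SDBM_File', $dbm_base, O_RDWR | O_CREAT, 0666
        or die "Can't tie $dbm_base: $!";

    open my $in,  '<', $in_file  or die "Can't open $in_file for reading: $!";
    open my $out, '>', $out_file or die "Can't open $out_file for writing: $!";

    while (my $line = <$in>) {
        # Same trick as before: print only on the first sighting.
        print {$out} $line unless $seen{$line}++;
    }

    close $out or die "Can't close $out_file: $!";
    close $in  or die "Can't close $in_file: $!";

    untie %seen;
    unlink "$dbm_base.pag", "$dbm_base.dir";   # throw away the scratch database
}

# Runs only when given three arguments: input, output, scratch name.
dedupe_to_file(@ARGV) if @ARGV == 3;
```

Run it as something like: perl dedupe.pl email.log new_email.log seen_tmp (the script name is up to you). It will be slower than the in-memory version, since every lookup goes to disk, but memory use stays flat.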

Cheers,
Ovid

Join the Perlmonks Setiathome Group or just click on the link and check out our stats.


In reply to (Ovid - hash to control printing) Re: Files by Ovid
in thread Removing duplicate lines from files (was 'Files') by Anonymous Monk
