imlou has asked for the wisdom of the Perl Monks concerning the following question:

If I have a large file and I want to grab only the names and emails from the lines that begin with FROM: or TO:, how would I split them into a hash? Each of those lines contains a name and an email address. Would it be
my %name_emails = map {something} split
Or is there another, more efficient way to do it?

Re: grabbing certain lines
by Wonko the sane (Curate) on Nov 11, 2002 at 17:49 UTC
    Something like this should do the trick.

    my %names_emails = map { /^(?:From|To): (\S+) (\S+)/ } @data;
    You may have to change the two '(\S+)' in the regex to better match your specific data format,
    but this returns whatever the two capturing paren groups caught (the (?:From|To) group does not capture)
    and uses those as key/value pairs to populate the hash. Lines that don't match return an empty list, so they add nothing.
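The question spells the headers FROM: and TO: in capitals, while the regex above is case-sensitive. A minimal sketch, assuming the hypothetical sample lines below and allowing slightly looser spacing than the original regex, that adds the /i modifier and shows non-matching lines simply dropping out of the map:

```perl
use strict;
use warnings;

# Hypothetical sample lines; in practice these would come from the file.
my @data = (
    "FROM: misterbob bob\@mister.com\n",
    "Subject: hello\n",                  # no match, so map returns ()
    "To: lucy lucy\@wherever.c0m\n",
);

# /i makes the header match case-insensitive, so FROM: and From: both work.
# A line that fails to match yields an empty list and adds nothing to the hash.
my %names_emails = map { /^(?:From|To):\s*(\S+)\s+(\S+)/i } @data;

for my $name (sort keys %names_emails) {
    print "$name => $names_emails{$name}\n";
}
```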

    Best Regards,
    Wonko
      This method assumes you're pulling the entire file into @data.

      If you don't want to do that, filter the lines first:
      my @lines = grep { /^(From|To):/ } <FILE>;

        Well, this is what I did:
        if (/^(((From:)|(To:)|(Reply-To:)).+)/) { print "$1\n"; }
        This gives me the whole line, e.g. From: misterbob <bob@mister.com>. But now I want to get rid of the From:, To:, or whatever comes before the ":" and keep misterbob and <bob@mister.com>. I tried splitting on ":" but that didn't seem to work for me. Please advise. Thank you.
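Splitting on ":" can work if the split is limited to two fields. A minimal sketch, assuming a single header line like the one above (the variable names are hypothetical); stripping the angle brackets is optional:

```perl
use strict;
use warnings;

# Hypothetical sample line, taken from the example above.
my $line = "From: misterbob <bob\@mister.com>\n";
chomp $line;

# Split on the first colon only (limit 2), eating any spaces after it,
# so a colon later in the line cannot produce extra fields.
my ($header, $rest) = split /:\s*/, $line, 2;

# split ' ' splits on runs of whitespace and ignores leading whitespace;
# limit 2 keeps everything after the name together as the address.
my ($name, $email) = split ' ', $rest, 2;

# Optionally strip the angle brackets around the address.
$email =~ tr/<>//d;

my %name_emails;
$name_emails{$name} = $email;

print "$header | $name | $email\n";
```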

        I believe you'll find that this still reads the entire file into memory before running the grep (<FILE> in list context returns every line at once), so filtering this way doesn't reduce peak memory use.

        If the size of the file is an issue, you'll want to use a loop to go over it line by line. Something like this should do it:

        my %name2email;
        while (<DATA>) {
            next unless m/^(?:To|From|Reply-To): (\S+)\s+<?([^\s>]+)/;
            $name2email{$1} = $2;
        }

        Note: the regex doesn't cope with a fair bit (multiple addresses, addresses without names, etc.).
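The loop above reads from the DATA section. To run the same idea against a real file without slurping it, one option is to wrap the loop in a sub that takes an open filehandle. A minimal sketch; the sub name collect_addresses and the file name mailbox.txt are hypothetical, and an in-memory filehandle stands in for the file so the sketch runs on its own:

```perl
use strict;
use warnings;

# Reads header lines one at a time from an already-open filehandle, so only
# the current line (plus the growing hash) is held in memory.
sub collect_addresses {
    my ($fh) = @_;
    my %name2email;
    while (my $line = <$fh>) {
        next unless $line =~ m/^(?:To|From|Reply-To): (\S+)\s+<?([^\s>]+)/;
        $name2email{$1} = $2;
    }
    return \%name2email;
}

# With a real file you would write (mailbox.txt is a hypothetical name):
#   open my $fh, '<', 'mailbox.txt' or die "Cannot open mailbox.txt: $!";
# Here an in-memory filehandle stands in for the file.
my $sample = "To: Bob <test\@spam.com>\nSubject: hi\nFrom: lucy test\@wherever.c0m\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $addresses = collect_addresses($fh);
close $fh;

print "$_ => $addresses->{$_}\n" for sort keys %$addresses;
```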

Re: grabbing certain lines
by Wonko the sane (Curate) on Nov 12, 2002 at 00:51 UTC
    I still think that using a regex to break the data up into the two parts you want,
    and catching them in a hash, is the easiest way to achieve what you are asking for.

    Here is the original regex, beefed up slightly:

    #!/usr/local/bin/perl -w
    use strict;
    use Data::Dumper;

    my %names_emails = map { /^(?:To|From|Reply-To): (\S+)\s+<?([^\s>]+)/ } <DATA>;

    print Dumper( \%names_emails );

    __DATA__
    To: Bob <test@spam.com>
    From: lucy test@wherever.c0m
    Reply-To: somebody somebody@spam.com
    To: Bob2 <test2@spam.com>

    Output:
    :!./test.pl
    $VAR1 = {
              'somebody' => 'somebody@spam.com',
              'lucy' => 'test@wherever.c0m',
              'Bob2' => 'test2@spam.com',
              'Bob' => 'test@spam.com'
            };
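One thing this output hides: because the hash is keyed on the name, a second line with the same name silently replaces the first address. If that matters, a minimal sketch (the sample lines are hypothetical) that keeps every address per name in a hash of arrays:

```perl
use strict;
use warnings;

# Hypothetical sample in which the same name appears with two addresses.
my @lines = (
    "To: Bob <test\@spam.com>\n",
    "From: Bob <bob\@elsewhere.com>\n",
    "Reply-To: somebody somebody\@spam.com\n",
);

# Each name maps to an array ref of every address seen for it, in order,
# instead of a single scalar that later lines would overwrite.
my %names_emails;
for (@lines) {
    next unless /^(?:To|From|Reply-To): (\S+)\s+<?([^\s>]+)/;
    push @{ $names_emails{$1} }, $2;
}

for my $name (sort keys %names_emails) {
    print "$name: @{ $names_emails{$name} }\n";
}
```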

    Disagree? Am I missing what you are looking for?