The regex to match names is somewhat fussy. Many names are hyphenated, so a hyphen should probably be included in the character set. Some names may comprise more than two components. You want to grab the whole name, including the white space between its components, but not the white space at the end of the name (there may be none). Given all that (but ignoring a host of other possible gotchas), here's a solution:

#!/usr/bin/perl
use strict;
use warnings;

my %names;
my $nextID = 'AAAAAA';    # magic string increment: 'AAAAAA'++ gives 'AAAAAB', ...

while (<DATA>) {
    # Capture a run of upper-case name components (letters and hyphens),
    # including the white space between components but not after the last one
    if (/( (?:[A-Z-]{2,} (?:(?=\s+[A-Z-]{2})\s+)?)+ )/x) {
        my $name = $1;
        # The first sighting of a name allocates the next ID
        $names{$name} = $nextID++ if ! exists $names{$name};
        # \Q...\E quotes any regex metacharacters that might appear in the name
        s/\Q$name\E/$names{$name}/g;
    }
    print;
}

__DATA__
7/21/2006 6:22:49 start new visit signin - Requirements Passed
7/21/2006 6:22:49 visitor data captured for JON DOE
7/21/2006 6:22:49 visit record saved for JON DOE
7/21/2006 6:22:49 starting to send print job for JON JOE DOE
7/21/2006 8:25:42 visitor data captured for JANE SMITH
7/21/2006 8:25:43 visit record saved for JANE SMITH
7/21/2006 8:25:43 starting to send print job for JANE-BOB SMITH
7/21/2006 8:25:51 finished with visit sign in for JANE-BOB SMITH

Prints:

7/21/2006 6:22:49 start new visit signin - Requirements Passed
7/21/2006 6:22:49 visitor data captured for AAAAAA
7/21/2006 6:22:49 visit record saved for AAAAAA
7/21/2006 6:22:49 starting to send print job for AAAAAB
7/21/2006 8:25:42 visitor data captured for AAAAAC
7/21/2006 8:25:43 visit record saved for AAAAAC
7/21/2006 8:25:43 starting to send print job for AAAAAD
7/21/2006 8:25:51 finished with visit sign in for AAAAAD

Note that the first part of the regex matches a name component. The second part looks ahead to see whether there is white space followed by another name component and, if there is, consumes that white space so it becomes part of the captured name. Those two parts are repeated as many times as there are adjacent name components to match.
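To see why the lookahead matters, here is a small stand-alone sketch that runs the same pattern over a few sample lines, next to a naive variant without the lookahead. The helper `extract_name` and the `$naive_re` pattern are hypothetical, made up for this demo; the square brackets in the output just make any trailing white space visible.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from the script above. A name component is two or more
# upper-case letters or hyphens; the optional group after it consumes
# white space only when the lookahead sees another component coming.
my $name_re = qr/( (?:[A-Z-]{2,} (?:(?=\s+[A-Z-]{2})\s+)?)+ )/x;

# Hypothetical naive variant for comparison: with no lookahead, \s*
# also swallows any white space after the last component.
my $naive_re = qr/( (?:[A-Z-]{2,} \s*)+ )/x;

# Hypothetical helper for this demo: returns the first captured name,
# or undef when the line contains no name.
sub extract_name {
    my ($line, $re) = @_;
    return $line =~ $re ? $1 : undef;
}

for my $line (
    'print job for JON JOE DOE   ',                    # three components, trailing spaces
    'finished with visit sign in for JANE-BOB SMITH',  # hyphenated component
    'start new visit signin - Requirements Passed',    # lone hyphen: no name
) {
    print "'$line'\n";
    for my $pair ([lookahead => $name_re], [naive => $naive_re]) {
        my ($label, $re) = @$pair;
        my $name = extract_name($line, $re);
        printf "  %-10s %s\n", "$label:", defined $name ? "[$name]" : 'no match';
    }
}
```

For the first line the lookahead version captures `[JON JOE DOE]` while the naive one captures `[JON JOE DOE   ]`, trailing spaces and all; the other two lines give the same result either way.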


DWIM is Perl's answer to Gödel

In reply to Re: Log cleanup script by GrandFather
in thread Log cleanup script by abdiel
