The regex to match names is somewhat fussy. Many names are hyphenated, so a hyphen should probably be included in the character set. Some names may comprise more than two components. You want to grab the whole name including internal white space, but not any white space at the end of the name (there may be none). Given all that (but ignoring a host of other possible gotchas), here's a solution:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my %names;
    my $nextID = 'AAAAAA';

    while (<DATA>) {
        if (/(
                (?:[A-Z-]{2,}
                    (?:(?=\s+[A-Z-]{2})\s+)?
                )+
            )/x) {
            my $name = $1;
            $names{$name} = $nextID++ if ! exists $names{$name};
            s/$name/$names{$name}/g;
        }
        print;
    }

    __DATA__
    7/21/2006 6:22:49 start new visit signin - Requirements Passed
    7/21/2006 6:22:49 visitor data captured for JON DOE
    7/21/2006 6:22:49 visit record saved for JON DOE
    7/21/2006 6:22:49 starting to send print job for JON JOE DOE
    7/21/2006 8:25:42 visitor data captured for JANE SMITH
    7/21/2006 8:25:43 visit record saved for JANE SMITH
    7/21/2006 8:25:43 starting to send print job for JANE-BOB SMITH
    7/21/2006 8:25:51 finished with visit sign in for JANE-BOB SMITH
Prints:
    7/21/2006 6:22:49 start new visit signin - Requirements Passed
    7/21/2006 6:22:49 visitor data captured for AAAAAA
    7/21/2006 6:22:49 visit record saved for AAAAAA
    7/21/2006 6:22:49 starting to send print job for AAAAAB
    7/21/2006 8:25:42 visitor data captured for AAAAAC
    7/21/2006 8:25:43 visit record saved for AAAAAC
    7/21/2006 8:25:43 starting to send print job for AAAAAD
    7/21/2006 8:25:51 finished with visit sign in for AAAAAD
Note that the first part of the regex matches a name component. The second part looks ahead to see if there is white space followed by another name component and, if there is, matches that white space so it becomes part of the captured name. Those two parts are repeated as many times as there are adjacent name components to match.
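To make the two parts easier to see, here is a minimal sketch that builds the same pattern from named pieces with qr//. The variable names and the extract_name helper are mine, purely for illustration, and are not part of the script above. It assumes, as the script does, that names are written entirely in capitals.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One name component: two or more capitals or hyphens.
my $component = qr/[A-Z-]{2,}/;

# White space, matched only when another component follows it.
# The lookahead is what keeps trailing white space out of the name.
my $separator = qr/(?=\s+[A-Z-]{2})\s+/;

# A full name: one or more components, each optionally followed
# by a separator. The outer parentheses capture the whole name.
my $name_re = qr/((?:$component(?:$separator)?)+)/;

# Hypothetical helper: return the first name found in a line, or undef.
sub extract_name {
    my ($line) = @_;
    return $line =~ $name_re ? $1 : undef;
}

for my $line (
    'visit record saved for JON JOE DOE',
    'print job for JANE-BOB SMITH   ',
    'start new visit signin - Requirements Passed',
) {
    my $name = extract_name($line);
    print "'$line' => ", (defined $name ? "'$name'" : 'no match'), "\n";
}
```

One of the gotchas that this shares with the original: because the hyphen is in the character set, a run of two or more hyphens on its own would also be treated as a name.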
In reply to Re: Log cleanup script
by GrandFather
in thread Log cleanup script
by abdiel