regex help

rsiedl has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I'm hoping someone can help me/teach me about regexes. I can do basic ones, but I'm trying to do something that has me quite stumped.
Best way to describe it is with an example so here we go:

#!/usr/bin/perl

use strict;
use warnings;

my @full_authors = ( "Smith, John", "Smith, John Ronald", "Johnson, James",
                     "James, Ray Jack", "Van der Burg, Jon", "O'Neil, Sarah" );
my @authors = ( "Smith J", "Jackson J", "James RJ", "Van der Burg J", "O'Neil S" );

# Results should be:
# Smith J = Smith, John
# Jackson J = Jackson J
# James RJ = James, Ray Jack
# Van der Burg J = Van der Burg, Jon
# O'Neil S = O'Neil, Sarah

foreach my $author (@authors) {

        print "$author = ";

        # Regex rules
        # Last ' ' before all-uppercase word should become ', '
        # Every singular or grouped capital letter
        # (i.e. F or RJ) should become F(.*) or R(.*) J(.*)

        # What I have so far
        $author =~ s/ (\w+?)\p{IsUpper}/, $1\(\.*\)/;
        print "[ $author ] : ";

        if ( my ($match) = ( grep $_ =~ /$author/, @full_authors ) ) {
                $author = $match;
                @full_authors = grep { $_ ne $match } @full_authors;
        } # end-if

        print "$author\n";

} # end-foreach

exit;

Any help anyone could provide would be much appreciated.

Cheers,
Reagen

Re: regex help by dragonchild (Archbishop) on Dec 09, 2004 at 15:03 UTC
No need for a regex. Just create a hash where the keys are your full authors' names collapsed and the values are the authors' complete names. Then just do lookups in your hash. If you find something, great! If you don't, oh well.
Being right, does not endow the right to be rude; politeness costs nothing. Being unknowing, is not the same as being stupid. Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence. Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.
Re^2: regex help by rsiedl (Friar) on Dec 09, 2004 at 15:11 UTC
Hey Dragonchild, what do you mean by "full authors collapsed"? Update: I think I have understood. You mean collapse the full author names to the abbreviated form, like Smith, James -> Smith J?
Re^3: regex help by dragonchild (Archbishop) on Dec 09, 2004 at 15:15 UTC
Exactly. Instead of making things hard for yourself, break the problem down to its component parts and solve it the easy way. :-)
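The collapse-and-look-up idea from this subthread could be sketched roughly as below. This is only an illustration, not code posted by anyone in the thread: `collapse_name` and `%full_for` are made-up names, and it assumes every full name has the form "Surname, Given Names" and that the first full name wins when two collapse to the same key.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# collapse_name is a hypothetical helper: "James, Ray Jack" becomes "James RJ".
sub collapse_name {
    my ($full) = @_;
    my ( $surname, $given ) = split /,\s*/, $full, 2;
    return $full unless defined $given;    # no comma: leave the name alone
    my $initials = join '', map { substr( $_, 0, 1 ) } split ' ', $given;
    return "$surname $initials";
}

my @full_authors = ( "Smith, John", "Smith, John Ronald", "Johnson, James",
                     "James, Ray Jack", "Van der Burg, Jon", "O'Neil, Sarah" );
my @authors = ( "Smith J", "Jackson J", "James RJ", "Van der Burg J", "O'Neil S" );

# Key: collapsed name, value: full name (assumption: first one seen wins).
my %full_for;
for my $full (@full_authors) {
    my $key = collapse_name($full);
    $full_for{$key} = $full unless exists $full_for{$key};
}

# An abbreviated name with no entry in the hash is printed unchanged.
for my $author (@authors) {
    my $resolved = exists $full_for{$author} ? $full_for{$author} : $author;
    print "$author = $resolved\n";
}
```

A plain hash lookup needs no pattern matching at all, at the cost of only finding names whose initials match exactly ("Smith J" will not find "Smith, John Ronald").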
Re^4: regex help by rsiedl (Friar) on Dec 09, 2004 at 15:18 UTC
Re^5: regex help by dragonchild (Archbishop) on Dec 09, 2004 at 15:27 UTC
Re: regex help by gopalr (Priest) on Dec 10, 2004 at 06:21 UTC
Hi Reagen, Here is a simple regexp. Let me know if there is any mistake. There is no need to use the second array, i.e. @authors.

#!/usr/bin/perl

use strict;
use warnings;

my @full_authors = ( "Smith, John", "Smith, John Ronald", "Johnson, James",
                     "James, Ray Jack", "Van der Burg, Jon", "O'Neil, Sarah" );

foreach my $author (@full_authors) {
        my $fullauthorname = $author;

        # Repeatedly shrink the next given name after the comma to its initial
        while ( $author =~ s#(, [A-Z]+)[a-z]+\s*#$1# ) { }

        # Drop the comma, leaving e.g. "Smith JR"
        $author =~ s#,##g;
        print "\n$author = $fullauthorname";
}

Regards, Gopal.R.
Re: regex help by Animator (Hermit) on Dec 09, 2004 at 15:42 UTC
You can use grep. Some working code (not shown here): all you need to do to make it suit your needs is check whether @x has elements. If you have 'Smith, John' and 'Smith, Jack' in your @full_authors array and you are searching for 'Smith J', then the @x array will contain both of these elements. Update: added the note about duplicates.
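Since the code from this reply is not reproduced above, here is a minimal sketch of what a grep-based lookup might look like. It is not Animator's code: `pattern_for` and `find_matches` are hypothetical names, and the sketch assumes an abbreviated name is a surname followed by a run of capital initials, with each initial standing for exactly one given name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# pattern_for is a hypothetical helper: it turns "James RJ" into a regex
# matching "James, R<non-spaces> J<non-spaces>" and nothing longer.
sub pattern_for {
    my ($abbrev) = @_;
    # The surname is everything before the final run of capital letters.
    my ( $surname, $initials ) = $abbrev =~ /^(.*)\s+(\p{Lu}+)$/
        or return;
    # One given name per initial: "RJ" becomes 'R\S* J\S*'.
    my $given = join ' ', map { quotemeta($_) . '\S*' } split //, $initials;
    return qr/^\Q$surname\E, $given\z/;
}

# Return every candidate that fits the abbreviated name (possibly several).
sub find_matches {
    my ( $abbrev, @candidates ) = @_;
    my $re = pattern_for($abbrev) or return;
    return grep { $_ =~ $re } @candidates;
}

my @full_authors = ( "Smith, John", "Smith, John Ronald", "Johnson, James",
                     "James, Ray Jack", "Van der Burg, Jon", "O'Neil, Sarah" );
my @authors = ( "Smith J", "Jackson J", "James RJ", "Van der Burg J", "O'Neil S" );

for my $author (@authors) {
    my @x = find_matches( $author, @full_authors );
    my $resolved = @x ? $x[0] : $author;    # check whether @x has elements
    print "$author = $resolved\n";
}
```

Because the pattern is anchored at both ends, "Smith J" matches "Smith, John" but not "Smith, John Ronald"; dropping the `\z` would loosen that if partial initials should also count.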
Re^2: regex help by Animator (Hermit) on Dec 09, 2004 at 15:49 UTC
To suit your other needs (posted in the reply to the previous message): make a hash of the full authors, keyed by their literal names. (I would advise you to try writing that yourself before looking at my solution, which is not shown here.) Once you've done that, you can replace the `@full_authors` in the grep method with `keys %my_own_hash`, and then remove all the elements that were found (or just the first one) from that hash using the `delete` function.
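A small sketch of the hash-plus-`delete` idea described in this reply, under stated assumptions: `%available` and `claim_first` are hypothetical names, the search pattern is hard-coded for the demonstration, and the keys are sorted only so that "the first match" is predictable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @full_authors = ( "Smith, John", "Smith, Jack", "James, Ray Jack" );

# Hash keyed by the literal full name; the value is just a flag.
my %available = map { $_ => 1 } @full_authors;

# claim_first is a hypothetical helper: it returns the first remaining name
# matching $re and deletes it, so the same author is never handed out twice.
sub claim_first {
    my ($re) = @_;
    my ($match) = grep { $_ =~ $re } sort keys %available;
    delete $available{$match} if defined $match;
    return $match;
}

my $first  = claim_first(qr/^Smith, J/);
my $second = claim_first(qr/^Smith, J/);
my $third  = claim_first(qr/^Smith, J/);

for my $result ( $first, $second, $third ) {
    print defined $result ? "$result\n" : "(no match left)\n";
}
```

Deleting the key as soon as it is claimed plays the same role as the `@full_authors = grep { $_ ne $match } @full_authors` line in the original question, without rebuilding the whole array each time.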
Re^2: regex help by Animator (Hermit) on Dec 09, 2004 at 17:41 UTC
Two other techniques to prevent an author from appearing twice in your final result are: (1) use a hash in which the key is the author's name, and each time you find an author, increment its value (and compare it); (2) build a hash that maps each author's full name to its index in the array, and when you find the author's name, use that hash to look up the index and set the value at that index (in the array) to a symbol that can't occur in your data (a number, a semicolon, a colon, ...). I guess that both of these techniques will be faster, but I'm not really sure of this since I'm too lazy to benchmark it :) (so if you want to be sure, you should benchmark it yourself).
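As an illustration of the first technique only (a counter hash keyed by author name), something along these lines might work. The names `%times_used` and `first_unused` are invented for this sketch, and the pattern is hard-coded for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Full name => how many times it has already come up as a match.
my %times_used;

# first_unused is a hypothetical helper: it returns the first candidate that
# matches $re and has not been returned before, or nothing if none is left.
sub first_unused {
    my ( $re, @candidates ) = @_;
    for my $name (@candidates) {
        next unless $name =~ $re;
        next if $times_used{$name}++;    # counter was already non-zero: skip
        return $name;
    }
    return;
}

my @full_authors = ( "Smith, John", "Smith, Jack" );
for my $attempt ( 1 .. 3 ) {
    my $match = first_unused( qr/^Smith, J/, @full_authors );
    print "attempt $attempt: ", ( defined $match ? $match : "(nothing left)" ), "\n";
}
```

Unlike deleting from the array or hash, this leaves the original data untouched and only records which names have been used.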
Re^2: regex help by rsiedl (Friar) on Dec 09, 2004 at 16:06 UTC
Thanks heaps, Animator. Very descriptive. Now I will try to modify it a little to cope with "Smith J Jr" and see if I've learnt something :) Cheers, Reagen
Re: regex help by sasikumar (Monk) on Dec 09, 2004 at 15:07 UTC
Do it with a hash table; that is the best way.

Hi Reagen,