Exactly. Instead of making things hard for yourself, break the problem down to its component parts and solve it the easy way. :-)

Being right, does not endow the right to be rude; politeness costs nothing.
Being unknowing, is not the same as being stupid.
Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re^4: regex help
by rsiedl (Friar) on Dec 09, 2004 at 15:18 UTC
    But that won't be able to deal with authors who have the same initials, will it?
    i.e.
        # Smith, Jack
        $hash{'Smith J'} = "Smith, Jack";
        # Smith, John
        $hash{'Smith J'} = "Smith, John";
    The second assignment overwrites the first, so Jack gets lost...
      A very good point which also illustrates a problem with your current scheme. How do you determine, from "Smith J", if it's supposed to be "Smith, Jack" or "Smith, John"?

      A possible solution would be to have each abbreviation go to an arrayref of all possible authors that match that abbreviation. Then, if there's only one, happy day. If there's more than one, the program punts back to the user.
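
A minimal sketch of the arrayref idea, assuming full names are stored as "Last, First" and abbreviated as "Last F". The sample data, the %candidates hash and the resolve() helper are hypothetical names used only for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data; the real list would come from the database.
my @full_names = ('Smith, Jack', 'Smith, John', 'Jones, Mary');

# Map each abbreviation ("Smith J") to an arrayref of every full name
# that could have produced it.
my %candidates;
for my $full (@full_names) {
    my ($last, $first) = split /,\s*/, $full, 2;
    my $abbrev = $last . ' ' . substr($first, 0, 1);
    push @{ $candidates{$abbrev} }, $full;
}

# Return the full name when exactly one candidate exists; return undef
# for an unknown or ambiguous abbreviation so the caller can decide
# what to do with it.
sub resolve {
    my ($abbrev) = @_;
    my $matches = $candidates{$abbrev};
    return undef unless $matches && @$matches == 1;
    return $matches->[0];
}
```

With this data, resolve('Jones M') returns the single match, while resolve('Smith J') returns undef because two authors share that abbreviation.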

        The current scheme assigns the first "Smith J" to "Smith, Jack" and then removes him from the full authors array, so when the next "Smith J" comes around in @authors, he gets "Smith, John".

        Unfortunately, I can't really "punt it back to the user" :) as it's a script running over a database of several million authors and we don't want to have to worry about manual input.

        Update: Oh, and I should mention that the same author can't appear twice in @authors, so if there are two "Smith J"s they must be different authors.
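
For illustration, the assign-and-remove scheme described above might look roughly like this. It is only a sketch: it swaps the array scan for a hash of per-abbreviation queues (which should scale better over millions of rows), and the sample data, the abbreviate() helper and the %queue name are all hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sample data: the full names on file, and the abbreviated
# author list for one record.
my @full_authors = ('Smith, Jack', 'Jones, Mary', 'Smith, John');
my @authors      = ('Smith J', 'Smith J', 'Jones M');

# Build a "Last F" abbreviation from a "Last, First" full name.
sub abbreviate {
    my ($full) = @_;
    my ($last, $first) = split /,\s*/, $full, 2;
    return $last . ' ' . substr($first, 0, 1);
}

# Queue the full names under their abbreviation, preserving input order.
my %queue;
push @{ $queue{ abbreviate($_) } }, $_ for @full_authors;

# Each abbreviation claims the next unused full name. shift removes the
# name from its queue, so a later duplicate abbreviation gets a different
# author; an abbreviation with nothing left yields undef.
my @resolved = map { shift @{ $queue{$_} || [] } } @authors;
```

Here @resolved pairs the two "Smith J" entries with "Smith, Jack" and "Smith, John" in the order the full names were queued. That pairing depends purely on order, so it guarantees distinct authors but not that each abbreviation lands on the right one.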