in reply to regex: seperating parts of non-formatted names
Massaging messy data into formal structures
is often easier if you do not attempt a complete
algorithmic solution up front. Exploring the data is
much of the problem, so one option is to solve it
as you explore it.
That means solving the problem bit by bit.
Copying out all the two-field lines and solving them
is probably trivial.
This warms you up to do the three-field lines, or maybe
you see that the titles are not very varied, and decide
to handle that aspect first.
By the time you get to the hard cases, your remaining data
set may be quite small.
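
As a rough illustration of that bit-by-bit idea, here is a minimal
sketch in Python. The sample names, the list of titles, and the
function names split_title and triage are all made up for the
example, since I have not seen your actual data. It peels off a
leading title, parses the lines that are left with exactly two
fields, and sets everything else aside for the next pass.

```python
import re

# Hypothetical sample data; the real name list is not shown here.
names = [
    "John Smith",
    "Dr. Jane Doe",
    "Mary Ann van der Berg",
    "Mr. Bob Jones",
    "Prof. Alan M. Turing Jr.",
]

# Assumed small set of titles, the kind you might spot by eyeballing the data.
TITLE = re.compile(r"^(Dr|Mr|Mrs|Ms|Prof)\.?\s+", re.IGNORECASE)


def split_title(line):
    """Peel a leading title off the line, if there is one."""
    m = TITLE.match(line)
    if m:
        return m.group(1), line[m.end():]
    return None, line


def triage(lines):
    """Parse the easy two-field cases; set everything else aside."""
    solved, remaining = [], []
    for line in lines:
        title, rest = split_title(line.strip())
        fields = rest.split()
        if len(fields) == 2:
            solved.append({"title": title, "first": fields[0], "last": fields[1]})
        else:
            # Not handled yet: keep the original line for a later pass.
            remaining.append(line)
    return solved, remaining


solved, remaining = triage(names)
print(len(solved), "solved;", len(remaining), "left to look at")
for line in remaining:
    print("  ", line)
```

Each later pass would add one more rule (three-field lines,
suffixes, and so on) and rerun on whatever is still in remaining,
so the pile you have to inspect by hand keeps shrinking.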
The approach I'm proposing is efficient only in certain
situations, and yours may or may not be one of them.