The regular expression you choose depends a great deal on
how much you can trust your data to follow a pattern. A few examples
(untested, but they should serve to illustrate):
- Your original attempt is vague (it matches 'ABCD xyz 123'), but it works better
if you add a ^ anchor to the front, or a word boundary if you don't want to be
tied to the start of the line:
/\b\w{3}\s+\w{3}\s+\d+/
- If you are certain that the date always starts the line, then the split is certainly a nice option as Trimbach said.
- If you want to be more certain that you get a real date, you could do something like:
/(Sun|Mon|Tue|Wed|Thu|Fri|Sat)\s+(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2}/i
with the i option used if you can't trust the case of the letters, and \s+ rather than \s because
single-digit days are often padded with an extra space. The alternations may make this slower than a
simpler pattern, so if you run it over a lot of lines, benchmark it before relying on it. This should find
a date-like string anywhere in the line (anchor it with ^ if you don't want that, as Trimbach said).
It will also match text that is not followed by the time, timezone and year, so you
might want to extend the regex to match those as well for an extra validity check. Even with all that
specificity, this will still match "Wed Mar 98", which clearly isn't a date. To fix
that, the numeric match could be changed to (0?[1-9]|[12][0-9]|3[01])\b, where the trailing \b
stops it from matching just the 9 of 98, but this is getting pretty messy!
- For a simpler, but less precise, regex, try:
/[A-Z][a-z]{2}\s+[A-Z][a-z]{2}\s+\d{1,2}/
or the slightly more specific but definitely funny looking
/[SMTWF][uoehra][neduit]\s+[JFMASOND][aepuco][nbrylgptvc]\s+\d{1,2}/i
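To see how these trade-offs play out, here is a rough Perl sketch that runs a few of the patterns above against some made-up sample lines. The pattern names, the sample lines and the matching_patterns helper are all mine, and the "loose" pattern is only my guess at what the original attempt looked like.

```perl
use strict;
use warnings;

# Illustrative names for the patterns discussed above.
my %pattern = (
    # Assumed form of the original attempt (no anchor, no boundary).
    loose    => qr/\w{3}\s+\w{3}\s+\d+/,
    # Same thing with a leading word boundary.
    boundary => qr/\b\w{3}\s+\w{3}\s+\d+/,
    # Day and month names spelled out, \s+ so space-padded days match.
    names    => qr/(?:Sun|Mon|Tue|Wed|Thu|Fri|Sat)\s+(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2}/i,
    # As above, but the day of month is limited to 1..31 and must end at a word boundary.
    strict   => qr/(?:Sun|Mon|Tue|Wed|Thu|Fri|Sat)\s+(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+(?:0?[1-9]|[12][0-9]|3[01])\b/i,
);

# Made-up input lines, not real log data.
my @sample = (
    'Wed Mar  5 12:34:56 EST 2003 something happened',
    'ABCD xyz 123',
    'Wed Mar 98',
);

# Returns the names (sorted) of every pattern that matches the line.
sub matching_patterns {
    my ($line) = @_;
    my @hits = grep { $line =~ $pattern{$_} } sort keys %pattern;
    return @hits;
}

for my $line (@sample) {
    my $hits = join(', ', matching_patterns($line)) || 'none';
    printf "%-50s => %s\n", $line, $hits;
}
```

With these samples, only the loose pattern accepts 'ABCD xyz 123', and only the strict one rejects 'Wed Mar 98'.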
So in summary, a regex will match exactly what you tell it to look for (if present), which may
very well not be a date. It may be wise to validate after the match, using something like
Time::ParseDate, in which case you can choose a much simpler, less specific regex.
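As a rough sketch of the "loose regex, then validate" idea: Time::ParseDate lives on CPAN, so to keep this runnable without installing anything I have swapped in the core Time::Local module, whose timegm() dies on out-of-range fields. The leading_date helper, the assumed line layout ("Dow Mon dd hh:mm:ss [TZ] yyyy") and the sample strings are all my own inventions, and the timezone name is skipped rather than interpreted.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Month abbreviation => zero-based month number, as timegm() expects.
my %month_num;
@month_num{qw(jan feb mar apr may jun jul aug sep oct nov dec)} = (0 .. 11);

# Hypothetical helper: returns epoch seconds (fields read as if UTC) when
# the line starts with a valid date, or undef otherwise.
sub leading_date {
    my ($line) = @_;

    # Deliberately loose match; the real checking is left to timegm().
    my ($mon, $day, $h, $m, $s, $year) = $line =~
        /^[A-Za-z]{3}\s+([A-Za-z]{3})\s+(\d{1,2})\s+(\d\d):(\d\d):(\d\d)\s+(?:[A-Z]{2,5}\s+)?(\d{4})\b/
        or return;

    # Reject three-letter words that are not month names.
    my $mon_num = $month_num{ lc $mon };
    return unless defined $mon_num;

    # timegm() croaks on impossible values (day 98, Feb 30, hour 25, ...),
    # so a failed eval simply leaves $epoch undefined.
    my $epoch = eval { timegm($s, $m, $h, $day, $mon_num, $year) };
    return $epoch;
}

for my $line ('Wed Mar  5 12:34:56 EST 2003 something happened',
              'Wed Mar 98 12:34:56 EST 2003 oops') {
    my $epoch = leading_date($line);
    print defined $epoch ? "valid:   $line\n" : "invalid: $line\n";
}
```

The regex here only has to locate the fields, so it can stay short; whether a day number actually exists in that month is decided by the date code, not by the pattern.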
--
I'd like to be able to assign to an luser