I've tried to simplify the file I want to parse. I hope I didn't oversimplify it to the point of hiding my problem:
RECORD 1
###### Full Name 1a
Street Address 1a
City 1a ST1a Zip_1a COUNTY 1a
0######## Full Name 1b
abcABCabc 99/99/9999 Street Address 1b
City 1b ST1b Zip_1b COUNTY 1b
RECORD 2
############ Full Name 2a
99/99/9999 Street Address 2a
City 2a ST2a Zip_2a COUNTY 2a
0### Full Name 2b
abcABCabc 99/99/9999 Street Address 2b
City 2b ST2b Zip_2b COUNTY 2b
Notice a few things:
1) The # signs are actually digits
2) Certain lines may be prefixed by a spurious '0' due to COBOL output
3) The two dates are different inputs (sometimes they appear, sometimes they don't)
4) There are intricacies that make a fixed-width grab impossible.
The following code pulls the data and stores it in variables. Note: this runs inside a loop that is set up correctly. The real file has many other lines I didn't include above, and all the other variables are stored correctly; it's only the second regex inside the if-statement that I'm slipping on.
(variable names and code modified for simplicity)
if ($array[$line] =~ /0?.*?(RECORD .*)/){
    $record = trim($1); # works correctly
    $array[$line+1] =~ /(\d+)(.*)/;
    $id = trim($1); # works correctly
    $name = trim($2); # works correctly
    # still looking at the "a" lines, sometimes there's a date, sometimes no date
    $array[$line+2] =~ /.*?(\d{2}\/\d{2}\/\d{4})?(.*)/;
    $date = trim($1); # when no date it's using the previous $1 that goes into $id
    $address = trim($2); # when no date it's using the previous $2 that goes into $name
    ... code continues ...
Please understand that this is my best attempt at simplifying my code, and the real program is more involved than I'm able to show you. While I welcome best practices, keep in mind that they may already be in place, and know that I appreciate your help (as always).
Update:
I've narrowed it down: the '?' that follows the date-capturing group (the second '?' in the pattern) is not working the way I'd like it to.
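A minimal sketch of what the optional group appears to be doing, using the sample lines from above. In the posted pattern both the lazy `.*?` and the optional date group are allowed to match nothing, so when the date is not at the very start of the line the engine skips it and lets `(.*)` take everything. The variant below moves the lazy prefix inside an optional non-capturing group; this is one possible rewrite, not necessarily the right one for the real data. The sub names `original_pattern` and `split_date_address` are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Posted pattern, for comparison: .*? and the optional group can both
# match the empty string, so a date in mid-line is never captured.
sub original_pattern {
    my ($line) = @_;
    my ($date, $rest) = $line =~ /.*?(\d{2}\/\d{2}\/\d{4})?(.*)/;
    return (defined $date ? $date : '', $rest);
}

# Hypothetical variant: the lazy prefix lives inside the optional group,
# so the engine must try "prefix + date" before it may skip the date.
sub split_date_address {
    my ($line) = @_;
    if ($line =~ m{^(?:.*?(\d{2}/\d{2}/\d{4}))?\s*(.*)$}) {
        # $1 is undef when the group did not take part in the match.
        return (defined $1 ? $1 : '', $2);
    }
    return ('', '');    # pattern failed, e.g. on an embedded newline
}

for my $line ('Street Address 1a',
              '99/99/9999 Street Address 2a',
              'abcABCabc 99/99/9999 Street Address 1b') {
    my ($d1, $a1) = original_pattern($line);
    my ($d2, $a2) = split_date_address($line);
    print "original: [$d1] [$a1]\n";
    print "variant:  [$d2] [$a2]\n";
}
```

Testing whether the match succeeded before reading `$1` and `$2` also matters here, since Perl leaves the capture variables from the last successful match in place when a later match fails. Note that the variant throws away whatever precedes the date (the `abcABCabc` prefix), which is an assumption about what the real program wants.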