in reply to Re^3: Reg Exp to handle variations in the matched pattern
in thread Reg Exp to handle variations in the matched pattern
Ultimately, I'm trying to work out how Perl could parse this file into a structured format. My first thought is to use Perl to rewrite the file so that each logical record ends up on one line, then import those lines into a database and split out the various fields there. That is, unless Perl can do the whole job better and more efficiently on its own. I'm still learning Perl, so I'm sure there is a better way of doing it.
If you look at that file, from line 27:
Licensing at 202/622-2480. The following changes have occurred with respect to the Office of Foreign Assets Control Listing of Specially Designated Nationals and Blocked Persons since January 1, 2002: 01/09/02: The following have been named as "Specially Designated Global Terrorists" [SDGTs] -
There are two distinct patterns that I'm trying to match here, hence my original regexp (\s-\r)|(:\r). After the "January 1, 2002:" text there are two carriage return/line feed pairs (hex values 0D 0A 0D 0A). I'm looking to insert a string between the ":" and the first carriage return. So the first pattern is /(:)\r\n\r\n/ and my substitution code is this:
s/(:)\r\n\r\n/\1\$\$\n/g but of course this insertion is not working.
It may be my hex/text editor, but it tells me there are lots of carriage returns in this data.
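One way to double-check what the editor reports is to count the line endings from Perl itself. The following is only a minimal sketch: the `count_line_endings` helper and the sample string are made up for illustration, and for the real file the data would need to be read through a filehandle opened with `'<:raw'` so that no layer translates the CRLF pairs first.

```perl
use strict;
use warnings;

# Hypothetical helper: count CRLF pairs, bare CRs and bare LFs in a string.
sub count_line_endings {
    my ($data) = @_;
    my $crlf = () = $data =~ /\r\n/g;        # CR immediately followed by LF
    my $cr   = () = $data =~ /\r(?!\n)/g;    # CR not followed by LF
    my $lf   = () = $data =~ /(?<!\r)\n/g;   # LF not preceded by CR
    return ($crlf, $cr, $lf);
}

# Made-up sample standing in for the real file contents.
my $sample = "since January 1, 2002:\r\n\r\n01/09/02: ... [SDGTs] -\r\n\r\n";

my ($crlf, $cr, $lf) = count_line_endings($sample);
printf "CRLF=%d bare CR=%d bare LF=%d\n", $crlf, $cr, $lf;
```

If the CRLF count comes back as zero on the real data while the hex editor shows 0D 0A, the carriage returns are probably being stripped somewhere between the file and the regexp.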
The second pattern is after the '01/09/02: The following have been named as "Specially Designated Global Terrorists" [SDGTs] -' text, where the dash at the end is preceded by a space and followed by two carriage return/line feed pairs, so my match regexp is /(\s-)\r\n\r\n/ and my substitution code is this: s/(\s-)\r\n\r\n/\1\$\$\n/g but of course this insertion is not working either.
The result I'm after would be:
Licensing at 202/622-2480. The following changes have occurred with respect to the Office of Foreign Assets Control Listing of Specially Designated Nationals and Blocked Persons since January 1, 2002:$$ 01/09/02: The following have been named as "Specially Designated Global Terrorists" [SDGTs] -$$
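For reference, the two substitutions can be folded into a single one with an alternation. This is a sketch under stated assumptions, not a diagnosis: the `mark_breaks` helper and the sample text are invented for illustration, and the sketch assumes the whole file is held in one string with its CRLF pairs intact (for example, read through a `'<:raw'` filehandle with `$/` undefined). If the data is processed one line at a time instead, a pattern that needs two consecutive line endings never sees both of them in the same string.

```perl
use strict;
use warnings;

# Hypothetical helper: append "$$" after a ":" or " -" that is followed
# by a blank line, and collapse the two CRLF pairs into a single "\n".
sub mark_breaks {
    my ($text) = @_;
    # $1 is the capture on the replacement side (\1 is meant for use
    # inside the pattern); \$ is a literal dollar sign.
    (my $marked = $text) =~ s/(:|\s-)\r\n\r\n/$1\$\$\n/g;
    return $marked;
}

# Made-up sample standing in for the slurped file contents.
my $text = "Blocked Persons since January 1, 2002:\r\n\r\n"
         . "01/09/02: The following have been named as "
         . "\"Specially Designated Global Terrorists\" [SDGTs] -\r\n\r\n"
         . "FIRST ENTRY\r\n";

print mark_breaks($text);
```

The colon in "01/09/02:" is left alone because it is followed by a space rather than by a blank line.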
Sorry this isn't clearer. It's a bit difficult to explain.
Re^5: Reg Exp to handle variations in the matched pattern
by bitingduck (Deacon) on Feb 23, 2012 at 06:33 UTC | |
by markjrouse (Initiate) on Feb 23, 2012 at 10:48 UTC | |
by bitingduck (Deacon) on Feb 23, 2012 at 17:03 UTC |