
Ultimately, I'm trying to work out how Perl could parse this file into a structured format. My first thought is to use Perl to mark up the file so that each element becomes a single line, then import those lines into a database and extract the various fields from there. Perl may well be able to do the whole job better and more efficiently on its own. I'm still learning Perl, so I'm sure there is a better way of doing it.

If you look at that file, from line 27:

Licensing at 202/622-2480.  The following changes have occurred
with respect to the Office of Foreign Assets Control Listing of
Specially Designated Nationals and Blocked Persons since January 1,
2002:

01/09/02:      The following have been named as "Specially
Designated Global Terrorists" [SDGTs] -

There are two distinct patterns that I'm trying to match here, hence my original regexp (\s-\r)|(:\r). After the "January 1, 2002:" text there are two carriage return/line feed pairs, hex values 0D 0A 0D 0A. I'm looking to insert a string between the ":" and the carriage return, so the first pattern is /(:)\r\n\r\n/ and my substitution code is this:

s/(:)\r\n\r\n/\1\$\$\n/g but of course this insertion is not working

It may be my hex/text editor, but it tells me there are lots of carriage returns in this data.

The second pattern comes after the '01/09/02: The following have been named as "Specially Designated Global Terrorists" [SDGTs] -' text, where the dash at the end is preceded by a space and followed by two carriage return/line feed pairs, so my match regexp is /(\s-)\r\n\r\n/ and my substitution code is s/(\s-)\r\n\r\n/\1\$\$\n/g but of course this insertion is not working either.

The subsequent result would be:

Licensing at 202/622-2480.  The following changes have occurred
with respect to the Office of Foreign Assets Control Listing of
Specially Designated Nationals and Blocked Persons since January 1,
2002:$$
01/09/02:      The following have been named as "Specially
Designated Global Terrorists" [SDGTs] -$$
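
To make the intent concrete, here is a minimal sketch of the substitution I'm after. It assumes the whole file is held in one string with its CRLF line endings intact; the sample text and the variable names `$text` and `$marked` are made up for illustration. It folds both patterns into a single alternation and uses `$1` rather than `\1` in the replacement. If the real file is read line by line, or through a layer that translates CRLF to LF, the `\r\n\r\n` part would presumably never match, which may be part of my problem.

```perl
use strict;
use warnings;

# Sample data with explicit CRLF line endings, standing in for the real file.
my $text = join "\r\n",
    'Specially Designated Nationals and Blocked Persons since January 1,',
    '2002:',
    '',
    '01/09/02:      The following have been named as "Specially',
    'Designated Global Terrorists" [SDGTs] -',
    '',
    'next entry';

# Match either ":" or " -" immediately before a blank line (CRLF CRLF),
# keep the matched text via $1, append a literal "$$", and replace the
# two CRLF pairs with a single "\n".
(my $marked = $text) =~ s/(:|\s-)\r\n\r\n/$1\$\$\n/g;

print $marked;
```

With a real file, I assume it would need to be opened with the `:raw` layer and slurped in one go (for example by localising `$/` to undef) so that the carriage returns and the blank lines are still present when the substitution runs.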

Sorry for not being clearer. It's a bit difficult to explain.

In reply to Re^4: Reg Exp to handle variations in the matched pattern by markjrouse
in thread Reg Exp to handle variations in the matched pattern by markjrouse
