I don't have a problem with making it easier for terrorists to know that we are looking for them, once we've said that publicly.

Update: Often when parsing data, there is no well-defined specification. We have to make guesses or ad hoc rules based on what we see. That is just how the "real world" is. To the best of my knowledge, the code below runs with the complete downloaded file as well as with my DATA segment.

The desired search term is at the beginning of the record, terminated by "," or "(" with trailing spaces removed.
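
As a minimal sketch of that rule (the helper name search_term is made up here for illustration, and the second sample record is invented, not taken from the list), the leading term can be pulled out like this:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of the script in this post:
# return the text before the first "," or "(" in a record,
# with trailing whitespace removed.
sub search_term {
    my ($record) = @_;
    my ($term) = $record =~ /^([^,(]*)/;   # always matches, possibly empty
    $term =~ s/\s+$//;                     # drop trailing spaces
    return $term;
}

# Record taken from the DATA sample in this post.
print search_term('17 NOVEMBER (a.k.a. EPANASTATIKI ORGANOSI 17 NOEMVRI; a.k.a. REVOLUTIONARY ORGANIZATION 17 NOVEMBER) [FTO] [SDGT]'), "\n";

# Invented record, only to show the "," terminator.
print search_term('DOE, John (a.k.a. DOE, Johnny)'), "\n";
```

This should print "17 NOVEMBER" and then "DOE". The sketch assumes the record has already been squeezed onto one line.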

#!/usr/bin/perl -w
use strict;

# to process the file from:
# http://www.treasury.gov/ofac/downloads/sdnlist.txt
# this is about a 93K line file
# that means that it easily fits into memory
#
# to get the valid "records"
# (1) separate the records based upon them having
#     an extra \n between them
#     The records are "paragraphs".
# (2) "squeeze" the lines together so that hyphenated
#     names will get put "back together"
#     This is needed so that simple searches will work.
# (3) Apply heuristics to get rid of the extraneous
#     records, here a "valid input record":
#     (a) can't start with [ and must
#     (b) have a comma or 'a.k.a' in the first 50 characters
#     (c) get rid of leading ' if it is there
#         cannot get rid of ' globally because there are
#         records where this does have meaning.

my @records = map  {s/^'//;$_}         # another heuristic
              grep { !/^\s*\[/ and     # heuristic (rule of thumb)
                     substr ($_,0,50) =~/,|\Qa.k.a.\E/}
              map  {s/\n//g; $_}       # squeeze lines back together
              do { local $/= "\n\n"; (<DATA>)};

# at this point, there are <12K records
# from the 93K lines that we started with

foreach (@records)
{
   # your regex to select a record could maybe go here..
   # also possible to make a translation table
   # of any name back to one of these records
   print "$_\n";
}

__DATA__
Output from: http://www.treasury.gov/ofac/downloads/sdnlist.txt goes here...

ALPHABETICAL LISTING OF SPECIALLY DESIGNATED NATIONALS AND BLOCKED PERSONS ("SDN List"):

This publication of Treasury's Office of Foreign Assets Control ("OFAC") is designed as a reference tool providing actual notice of actions by OFAC with respect to Specially Designated Nationals and ...blah...

17 NOVEMBER (a.k.a. EPANASTATIKI ORGANOSI 17 NOEMVRI; a.k.a. REVOLUTIONARY ORGANIZATION 17 NOVEMBER) [FTO] [SDGT]

32 COUNTY SOVEREIGNTY COMMITTEE (a.k.a. 32 COUNTY SOVEREIGNTY MOVEMENT; a.k.a. IRISH REPUBLICAN PRISONERS WELFARE ASSOCIATION; a.k.a. REAL IRA; a.k.a. REAL IRISH REPUBLICAN ARMY; a.k.a. REAL OGLAIGH NA HEIREANN; a.k.a. RIRA) [FTO] [SDGT]

32 COUNTY SOVEREIGNTY MOVEMENT (a.k.a. 32 COUNTY SOVEREIGNTY COMMITTEE; a.k.a. IRISH REPUBLICAN PRISONERS WELFARE ASSOCIATION; a.k.a. REAL IRA; a.k.a. REAL IRISH REPUBLICAN ARMY; a.k.a. REAL OGLAIGH NA HEIREANN; a.k.a. RIRA) [FTO] [SDGT]

101 DAYS CAMPAIGN (a.k.a. CHARITY COALITION; a.k.a. COALITION OF GOOD; a.k.a. ETELAF AL-KHAIR; a.k.a. ETILAFU EL-KHAIR; a.k.a. I'TILAF AL-KHAIR; a.k.a. I'TILAF AL-KHAYR; a.k.a. UNION OF GOOD), P.O. Box 136301, Jeddah 21313, Saudi Arabia [SDGT]

..etc..
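
The "translation table" idea mentioned in the loop comment could look roughly like the sketch below. The helper name build_name_table is hypothetical, and the parsing rests on assumptions of mine: that every alias is introduced by "a.k.a." and ends at the next ";" or ")", and that when two records share a name the first one seen should win. It needs Perl 5.10 or later for the //= operator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of the script above: map every name
# (primary name plus each a.k.a. alias) to the full record it came from.
sub build_name_table {
    my @records = @_;
    my %record_for;
    for my $record (@records) {
        # Primary name: text before the first "," or "(".
        my ($primary) = $record =~ /^([^,(]*)/;
        # Aliases: text after each "a.k.a." up to the next ";" or ")".
        my @aliases = $record =~ /\Qa.k.a.\E\s*([^;)]+)/g;
        for my $name ($primary, @aliases) {
            $name =~ s/^\s+|\s+$//g;          # trim both ends
            next unless length $name;
            $record_for{$name} //= $record;   # assumption: first record wins
        }
    }
    return \%record_for;
}

# Record taken from the DATA sample in this post.
my $table = build_name_table(
    '17 NOVEMBER (a.k.a. EPANASTATIKI ORGANOSI 17 NOEMVRI; a.k.a. REVOLUTIONARY ORGANIZATION 17 NOVEMBER) [FTO] [SDGT]',
);
print "$_\n" for sort keys %$table;
```

With the one sample record this lists three names, each pointing back to the same record. An alias that itself contains ")" or ";" would be cut short, so treat this as a starting point rather than a complete parser.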

In reply to Re^5: Fine tuning a reg exp by Marshall
in thread Fine tuning a reg exp by markjrouse
