in reply to Re^3: REGEX for date
in thread REGEX for date

I guess it would work like so:
if($line=~m/\s+(January|February|March|April|May|June|July|August|September|October|November|December\s+\d+,\s+\d+)/i) {$res->{fiscal_year_ended} =$1;}
Now what happens when the date has some ASCII characters, like: December  30, 2015? I really appreciate your help!!
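If the stray characters turn out to be HTML entities or non-breaking spaces (an assumption, since the example does not show which characters are involved), one option is to turn them into plain spaces before matching. A minimal sketch, where `normalize_spaces` is a hypothetical helper name and the sample line is made up:

```perl
use strict;
use warnings;

# Hypothetical helper: replace common non-breaking-space spellings
# (named entity, numeric entities, or a raw U+00A0 in already-decoded text)
# with a plain space.
sub normalize_spaces {
    my ($text) = @_;
    $text =~ s/&nbsp;|&#160;|&#xA0;|\xA0/ /gi;
    return $text;
}

my $months = qr/January|February|March|April|May|June|July|August|September|October|November|December/i;

# Made-up sample line, not taken from a real filing.
my $line = 'fiscal year ended December&nbsp;30,&#160;2015';
my $res  = {};

if (normalize_spaces($line) =~ m/\s+((?:$months)\s+\d+,\s+\d+)/) {
    $res->{fiscal_year_ended} = $1;
}

print "$res->{fiscal_year_ended}\n";
```

Normalizing first keeps the date regex itself unchanged, so the same pattern works for both clean and entity-laden lines.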

7 Mar 2017 Athanasius added code tags to display the  

Re^5: REGEX for date
by huck (Prior) on Mar 06, 2017 at 21:25 UTC

    Are you after this line?

    FISCAL YEAR END: 1231
    That looks nothing like your example, but it is the only "fiscal" line I find in the test list of @aonly I used.

    And remember when I said I was concerned with qr/\'\n'/? I'm even more concerned now. I suggest replacing

    for my $line (split qr/\'\n'/, $partial) {
    with
    for my $line (split qr/\n/, $partial) {
    That splits on pure line endings and doesn't require a quote before and after the \n. It makes adding a line like
    if($line=~m/\s*fiscal/mi ) {print $line."\n";}
    much more manageable. As far as I can see it doesn't affect your code, except that it may make the m in /m unnecessary.
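The difference between the two split patterns shows up on a small sample. This is only a sketch: `$partial` here holds made-up text, not a real filing.

```perl
use strict;
use warnings;

# Made-up sample; real filings will differ.
my $partial = "CONFORMED PERIOD OF REPORT: 20151231\nFISCAL YEAR END: 1231\nFILER:\n";

# qr/\'\n'/ only splits where a newline sits between two single quotes,
# so this text comes back as one long chunk.
my @quoted = split qr/\'\n'/, $partial;

# qr/\n/ splits at every line ending.
my @plain = split qr/\n/, $partial;

printf "quoted pattern: %d chunk(s)\n", scalar @quoted;
printf "plain pattern:  %d line(s)\n",  scalar @plain;

# With real lines, a simple per-line filter becomes practical.
for my $line (@plain) {
    print "$line\n" if $line =~ m/\s*fiscal/i;
}
```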

      A few more lines down from the line you mention is what I'm after. The following is working pretty well, but missing a few hits here and there (i.e., no result):
      if($line=~m/\s+((January|February|March|April|May|June|July|August|September|October|November|December)\s+\d+,\s+\d+)/i) {$res->{fiscal_year_ended} =$1;}
      But I'm getting there. Grateful for your help!!!!
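The extra pair of parentheses around the month names matters because `|` has the lowest precedence in a regex: without the inner group, `\s+\d+,\s+\d+` belongs only to the last alternative, December. A small sketch of the difference, using a shortened month list and made-up sample lines:

```perl
use strict;
use warnings;

# Ungrouped: the day/year part attaches only to "December".
my $ungrouped = qr/\s+(March|June|December\s+\d+,\s+\d+)/i;

# Grouped: the day/year part applies to every month.
my $grouped = qr/\s+((March|June|December)\s+\d+,\s+\d+)/i;

for my $line ('year ended June 30, 2016', 'year ended December 31, 2015') {
    my ($loose) = $line =~ $ungrouped;   # first capture only
    my ($tight) = $line =~ $grouped;     # first capture only
    print "ungrouped: '$loose'  grouped: '$tight'\n";
}
```

For the June line the ungrouped pattern captures only the month name, while the grouped pattern captures the full date.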

        It may help if you paste me some example lines from @aonly that exhibit the lines you are after; my test group was just pulled from a simple EDGAR search. This page https://www.sec.gov/Archives/edgar/data/1540531/0000905718-16-001254.txt does not exhibit lines of that type.

        My regex will capture anything that looks like a date, so a match may have nothing to do with the fiscal year end.
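One way to narrow the match is to require a lead-in phrase in front of the date. The sketch below assumes the filings say "fiscal year ended" right before the date; that wording is an assumption and may vary between filings, and the second sample line is made up.

```perl
use strict;
use warnings;

my $months = qr/January|February|March|April|May|June|July|August|September|October|November|December/i;
my $date   = qr/(?:$months)\s+\d{1,2},\s+\d{4}/;

my @lines = (
    'For the quarterly period ended December 27, 2015',   # a date, but not the fiscal year end
    'For the fiscal year ended December 31, 2015',        # made-up line with the assumed phrase
);

my $res = {};
for my $line (@lines) {
    # Only accept a date that directly follows the assumed phrase.
    if ($line =~ m/fiscal\s+year\s+ended\s+($date)/i) {
        $res->{fiscal_year_ended} = $1;
    }
}

print "$res->{fiscal_year_ended}\n";
```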

        Edit: for instance https://www.sec.gov/Archives/edgar/data/1084869/0001437749-16-024828.txt shows a date match on the line

        <P id=PARA12 style="MARGIN-BOTTOM: 0px; TEXT-ALIGN: center; MARGIN-TOP: 0px; LINE-HEIGHT: 1.25"><FONT style="FONT-SIZE: 10pt; FONT-FAMILY: Times New Roman, Times, serif"><B>For the quarterly period ended December 27, 2015</B></FONT></P>

        And I repeat my concern: qr/\'\n'/ should be qr/\n/. Without that change the lines can be very long, and my regex has no ^ for the /m to anchor on.
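To illustrate the point about /m: the modifier only changes where ^ and $ may match, so a pattern without either anchor behaves the same with or without it. A short sketch on a made-up two-line chunk:

```perl
use strict;
use warnings;

# Made-up chunk containing an embedded newline, as an unsplit block would.
my $chunk = "some header text\nFISCAL YEAR END: 1231";

# No anchor: /m makes no difference to whether the pattern matches.
my $plain  = $chunk =~ /\s*fiscal/i  ? 1 : 0;
my $with_m = $chunk =~ /\s*fiscal/mi ? 1 : 0;

# With ^: only /m lets the match start right after the embedded newline.
my $anchored   = $chunk =~ /^\s*fiscal/i  ? 1 : 0;
my $anchored_m = $chunk =~ /^\s*fiscal/mi ? 1 : 0;

print "no anchor: without /m = $plain, with /m = $with_m\n";
print "with ^:    without /m = $anchored, with /m = $anchored_m\n";
```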

      I can work it from here huck!! Thank you so much!!!!