I see that you've arrived at an approach that is working! Great! Sometimes with these things, just getting it done some way is a major hurdle!

For future reference, I went ahead and adapted my match-global (m//g) approach to your new data set. Here's the code, followed by an explanation of the regex. I put single quotes around the printed values so you can see that there aren't any leading or trailing spaces to clean up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $str2 = "902 M 903 Textmessage 904 PO 905 S 906 VAS 907 10 908 3629 909 85290200429/TYPE=thanos\@test.com 910 NA 911 NA 912 NA 913 NA 914 NA 917 0 918 NA 919 Wed,_01_Feb_2017_19:56:23_GMT 922 NA 923 PO 924 NA 925 NA 926 07594d85 927 100 928 20170202035623000+08 929 20170202035623000+08 930 NA 931 85260531042/TYPE=thanos2\@test.2.com 932 1 934 258;3259 920 NA 921 NA 935 NA 936 NA 938 NA 939 NA 940 thanos-local 942 NA 944 NA 945 4880 946 NA 948 NA 950 454000000927816 953 NA 954 13 955 5.3.0 956 NA 957 07594d85 958 NA 961 13 981 NA 982 0 983 85290200429/TYPE=thanos3\@test.3.com 984 Wed,_01_Feb_2017_19:56:23_GMT 985 RegularThanos 986 TEST 987 NA 988 NA 991 NA 992 NA 993 NA 994 123456789 995 NA 996 NA 997 NA 998 NA 603 0E552E92 602 0 617 NA 618 NA 621 NA This is a test line that I want to Capture2 635 NA 636 NA 637 NA 638 NA 639 This is a test line that I want to Capture";

my (%hash) = $str2 =~ /(\d{3})\s+(.+?)\s*(?=\d{3}|$)/g;

foreach my $key ( sort { $a <=> $b } keys %hash ) {
    print "$key => \'$hash{$key}\'\n";
}

__END__
259 => '920 NA'
602 => '0'
603 => '0E'
617 => 'NA'
618 => 'NA'
621 => 'NA This is a test line that I want to Capture2'
629 => '909'
635 => 'NA'
636 => 'NA'
637 => 'NA'
638 => 'NA'
639 => 'This is a test line that I want to Capture'
789 => '995 NA'
816 => '953 NA'
880 => '946 NA'
902 => 'M'
903 => 'Textmessage'
904 => 'PO'
905 => 'S'
906 => 'VAS'
907 => '10'
908 => '3'
910 => 'NA'
911 => 'NA'
912 => 'NA'
913 => 'NA'
914 => 'NA'
917 => '0'
918 => 'NA'
919 => 'Wed,_01_Feb_'
921 => 'NA'
922 => 'NA'
923 => 'PO'
924 => 'NA'
925 => 'NA'
926 => '0'
927 => '100'
928 => '2'
929 => '2'
930 => 'NA'
931 => '8'
932 => '1'
934 => '258;'
935 => 'NA'
936 => 'NA'
938 => 'NA'
939 => 'NA'
940 => 'thanos-local'
942 => 'NA'
944 => 'NA'
945 => '4'
948 => 'NA'
950 => '4'
954 => '13'
955 => '5.3.0'
956 => 'NA'
957 => '0'
958 => 'NA'
961 => '13'
981 => 'NA'
982 => '0'
983 => '8'
984 => 'Wed,_01_Feb_'
985 => 'RegularThanos'
986 => 'TEST'
987 => 'NA'
988 => 'NA'
991 => 'NA'
992 => 'NA'
993 => 'NA'
994 => '1'
996 => 'NA'
997 => 'NA'
998 => 'NA'
```
The key line, of course, is this one:

    my (%hash) = $str2 =~ /(\d{3})\s+(.+?)\s*(?=\d{3}|$)/g;
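To see what that line actually hands to the hash, here is a minimal sketch run on a short, made-up prefix of the data (the names $s, @flat and @greedy are just for illustration). It prints the flat list the match returns in list context, and then what happens if the ? is left off (.+?), which turns the non-greedy capture into a greedy one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A short prefix of the data, for illustration only.
my $s = "902 M 903 Textmessage 904 PO";

# In list context, m//g returns every capture group of every match in order,
# so two capture groups yield a flat key, value, key, value, ... list.
my @flat = $s =~ /(\d{3})\s+(.+?)\s*(?=\d{3}|$)/g;
print join('|', @flat), "\n";      # prints: 902|M|903|Textmessage|904|PO

# Assigning that flat list to a hash pairs up consecutive elements.
my %hash = @flat;

# Without the ?, .+ first grabs everything to the end of the string, and the
# look-ahead is already satisfied there by $: one key, one huge value.
my @greedy = $s =~ /(\d{3})\s+(.+)\s*(?=\d{3}|$)/g;
print join('|', @greedy), "\n";    # prints: 902|M 903 Textmessage 904 PO
```

The same flat list is what the one-line `my (%hash) = ...` assignment consumes directly, without the intermediate array.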

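First we capture a run of 3 digits with (\d{3}). Then \s+ throws away the whitespace after those digits. Next, (.+?) captures a run of any characters. The ? makes this match "non-greedy", i.e. as short as possible; without it, .+ would gobble up the entire rest of the line! Now comes the tricky part: how do we tell (.+?) when to stop grabbing? There might or might not be an unwanted space before the next key (at the end of the string there is none), so \s* discards it if it is there. The "real work" is done by (?=\d{3}|$). The ?= makes this a look-ahead assertion: we stop grabbing when either 3 digits or the end of the string comes next. Although this expression is in parentheses, it captures nothing and consumes nothing; it only checks that the condition is true, so the match ends just before those 3 digits, as if they had never been seen. When the /g (global) modifier starts the next match from that point, those same 3 digits are picked up by the first capture group as the next key. In list context the whole match therefore returns key, value, key, value, ..., which is exactly what the hash assignment needs.

One caveat, visible in the output above: \d{3} matches any 3 consecutive digits, not only a standalone 3-digit key, so the look-ahead also fires on digits inside a value. That is why 908 comes out as '3' instead of '3629' and 919 is cut off at 'Wed,_01_Feb_', why bogus keys such as 259, 629, 789, 816 and 880 appear, and why the real keys 909, 920, 946, 953 and 995 are missing. For this data, requiring whitespace on both sides of the next key avoids that: /(\d{3})\s+(.+?)(?=\s+\d{3}\s|\s*$)/g.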
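The look-ahead (?=\d{3}|$) also fires on 3 digits inside a value, such as the 629 in 3629. Here is a minimal sketch comparing it with a tightened variant, (\d{3})\s+(.+?)(?=\s+\d{3}\s|\s*$), on a made-up excerpt of the data. It assumes, as holds for this data, that every key is a whitespace-delimited 3-digit word and that no separate 3-digit word appears in the middle of a value (a value that is itself 3 digits, like 100, is fine). The hash names %loose and %tight are just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up excerpt of the data containing the cases that trip up the original regex.
my $str = "907 10 908 3629 909 85290200429/TYPE=thanos\@test.com 910 NA "
        . "919 Wed,_01_Feb_2017_19:56:23_GMT 922 NA 927 100 928 NA";

# Original pattern: the look-ahead stops at ANY 3 consecutive digits,
# even when they sit inside a value.
my %loose = $str =~ /(\d{3})\s+(.+?)\s*(?=\d{3}|$)/g;

# Tightened pattern (assumption: keys are whitespace-delimited 3-digit
# words): stop only before whitespace + 3 digits + whitespace, or before
# optional trailing whitespace at the end of the string.
my %tight = $str =~ /(\d{3})\s+(.+?)(?=\s+\d{3}\s|\s*$)/g;

for my $key (sort { $a <=> $b } keys %tight) {
    printf "%s => '%s' (original regex: %s)\n", $key, $tight{$key},
        exists $loose{$key} ? "'$loose{$key}'" : 'missing';
}
```

With the tightened pattern, 908 keeps '3629', 909 and the full date for 919 survive, and 927 still gets its 3-digit value '100', because (.+?) must take at least one character before the look-ahead is tried.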

Anyway, the general point is that you can "look ahead" at what comes next and use that as the condition for ending an otherwise match-almost-anything capture.
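Here is a minimal sketch of that idea in isolation, on a made-up string: the look-ahead lets the first match end just before the digits, and because a look-ahead consumes nothing, pos() is left pointing at them. The \G anchor then starts the next match exactly at pos(), which mirrors how the next /g iteration resumes from where the previous match ended. The variable names are just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $s = "abc123def";    # made-up example string
my ($letters, $pos_after_letters, $digits, $pos_after_digits);

# Grab letters, but only if 3 digits come next. The look-ahead checks the
# digits without consuming them, so pos() is left right after "abc".
if ($s =~ /([a-z]+)(?=\d{3})/g) {
    ($letters, $pos_after_letters) = ($1, pos($s));
    print "letters: '$letters', pos: $pos_after_letters\n";   # letters: 'abc', pos: 3
}

# \G anchors this match at pos(), so the digits the look-ahead only peeked at
# are still there to be captured for real.
if ($s =~ /\G(\d{3})/g) {
    ($digits, $pos_after_digits) = ($1, pos($s));
    print "digits: '$digits', pos: $pos_after_digits\n";      # digits: '123', pos: 6
}
```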


In reply to Re^3: How to match and extract string after exact 3 digits [RESOLVED] by Marshall
in thread How to match and extract string after exact 3 digits [RESOLVED] by thanos1983
