
I see that you've arrived at an approach that is working! Great! Sometimes with these things, just getting it done some way is a major hurdle!

For future info, I went ahead and adapted my match global approach to your new data set. Here's the code and then some explanation of the regex follows. I added some single quotes around the values so you could see that there aren't any leading or trailing spaces to clean up.

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $str2 = "902 M 903 Textmessage 904 PO 905 S 906 VAS 907 10 908 3629 "
         . "909 85290200429/TYPE=thanos\@test.com 910 NA 911 NA 912 NA 913 NA "
         . "914 NA 917 0 918 NA 919 Wed,_01_Feb_2017_19:56:23_GMT 922 NA 923 PO "
         . "924 NA 925 NA 926 07594d85 927 100 928 20170202035623000+08 "
         . "929 20170202035623000+08 930 NA 931 85260531042/TYPE=thanos2\@test.2.com "
         . "932 1 934 258;3259 920 NA 921 NA 935 NA 936 NA 938 NA 939 NA "
         . "940 thanos-local 942 NA 944 NA 945 4880 946 NA 948 NA "
         . "950 454000000927816 953 NA 954 13 955 5.3.0 956 NA 957 07594d85 "
         . "958 NA 961 13 981 NA 982 0 983 85290200429/TYPE=thanos3\@test.3.com "
         . "984 Wed,_01_Feb_2017_19:56:23_GMT 985 RegularThanos 986 TEST 987 NA "
         . "988 NA 991 NA 992 NA 993 NA 994 123456789 995 NA 996 NA 997 NA "
         . "998 NA 603 0E552E92 602 0 617 NA 618 NA "
         . "621 NA This is a test line that I want to Capture2 635 NA 636 NA "
         . "637 NA 638 NA 639 This is a test line that I want to Capture";

my (%hash)= $str2 =~/(\d{3})\s+(.+?)\s*(?=\d{3}|$)/g; 

# Print the pairs in numeric key order; the quotes make stray spaces visible.
foreach my $key ( sort { $a <=> $b } keys %hash )
{
   print "$key => '$hash{$key}'\n";
}


__END__
259 => '920 NA'
602 => '0'
603 => '0E'
617 => 'NA'
618 => 'NA'
621 => 'NA This is a test line that I want to Capture2'
629 => '909'
635 => 'NA'
636 => 'NA'
637 => 'NA'
638 => 'NA'
639 => 'This is a test line that I want to Capture'
789 => '995 NA'
816 => '953 NA'
880 => '946 NA'
902 => 'M'
903 => 'Textmessage'
904 => 'PO'
905 => 'S'
906 => 'VAS'
907 => '10'
908 => '3'
910 => 'NA'
911 => 'NA'
912 => 'NA'
913 => 'NA'
914 => 'NA'
917 => '0'
918 => 'NA'
919 => 'Wed,_01_Feb_'
921 => 'NA'
922 => 'NA'
923 => 'PO'
924 => 'NA'
925 => 'NA'
926 => '0'
927 => '100'
928 => '2'
929 => '2'
930 => 'NA'
931 => '8'
932 => '1'
934 => '258;'
935 => 'NA'
936 => 'NA'
938 => 'NA'
939 => 'NA'
940 => 'thanos-local'
942 => 'NA'
944 => 'NA'
945 => '4'
948 => 'NA'
950 => '4'
954 => '13'
955 => '5.3.0'
956 => 'NA'
957 => '0'
958 => 'NA'
961 => '13'
981 => 'NA'
982 => '0'
983 => '8'
984 => 'Wed,_01_Feb_'
985 => 'RegularThanos'
986 => 'TEST'
987 => 'NA'
988 => 'NA'
991 => 'NA'
992 => 'NA'
993 => 'NA'
994 => '1'
996 => 'NA'
997 => 'NA'
998 => 'NA'

The key line is of course my (%hash)= $str2 =~/(\d{3})\s+(.+?)\s*(?=\d{3}|$)/g;

First we capture a sequence of 3 digits, then throw away the whitespace that follows them, then capture a sequence of any characters. The ? in (.+?) makes this match "non-greedy"; without it, the match would gobble up the entire rest of the line!

Now comes the tricky part: how to tell (.+?) to stop grabbing stuff? There might or might not be an unwanted space before the next key (at the end of the line there is no extra space), which is what the \s* is for. The "real work" is done by (?=\d{3}|$). The ?= means that this is a "look ahead" assertion: we stop grabbing stuff when either a sequence of 3 digits or the end of the string is coming up next. Although this expression is in parens, it does not "capture" anything and it does not consume anything either; once the condition is satisfied, the match position stays in front of those digits. It's like these trailing 3 digits were never looked at. When the /g (global) modifier starts the next match, those same 3 digits that caused us to stop get matched by the first capture group at the beginning of the regex.

One caveat: \d{3} matches any 3 consecutive digits, including 3 digits inside a longer number. That is why the output above has entries such as 908 => '3' and 629 => '909' (the value 3629 got split). Values that contain runs of 3 or more digits need a stricter pattern.
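To see that the look ahead really leaves the next key in place, here is a small sketch that prints pos() after each match. The sample string is made up for illustration and is not the data from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical short sample, not the data set from the thread.
my $sample = "902 M 903 Textmessage 904 two words";

while ( $sample =~ /(\d{3})\s+(.+?)\s*(?=\d{3}|$)/g ) {
    # pos() is the offset where the next /g match attempt will start.
    # It points at the next key, so the look ahead did not consume it.
    printf "key=%s value='%s' next match starts at offset %d\n",
        $1, $2, pos($sample);
}

# Prints:
# key=902 value='M' next match starts at offset 6
# key=903 value='Textmessage' next match starts at offset 22
# key=904 value='two words' next match starts at offset 35
```

Offsets 6 and 22 are exactly where 903 and 904 begin, so each key is still available to the first capture group on the next pass.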

Anyway, it is possible to "look ahead" to see what would happen and use that as a basis to stop capturing the previous "match almost anything" match.
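The listing above shows what happens when a value contains 3 or more digits in a row: 908 comes out as '3' and a stray key 629 appears, because \d{3} also matches inside 3629. A possible tightening, offered as a sketch and not as the method used in the thread, is to require that a key is a standalone 3 digit token. It assumes keys are always separated from their neighbours by whitespace. The shortened sample string is built from fields of the data above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A shortened sample built from fields of the string above.
my $str = '907 10 908 3629 909 85290200429/TYPE=thanos@test.com '
        . '927 100 934 258;3259 639 This is a test line';

# (?<!\S) = not preceded by a non-space char (start of string or whitespace)
# (?!\S)  = not followed by a non-space char (whitespace or end of string)
my %hash = $str =~ /(?<!\S)(\d{3})\s+(.+?)\s*(?=(?<!\S)\d{3}(?!\S)|$)/g;

foreach my $key ( sort { $a <=> $b } keys %hash ) {
    print "$key => '$hash{$key}'\n";
}

# Prints:
# 639 => 'This is a test line'
# 907 => '10'
# 908 => '3629'
# 909 => '85290200429/TYPE=thanos@test.com'
# 927 => '100'
# 934 => '258;3259'
```

This still cannot tell a key from a value when the value contains a standalone 3 digit token after its first word (for example "927 100 200"); the format itself is ambiguous there.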

In reply to Re^3: How to match and extract string after exact 3 digits [RESOLVED] by Marshall
in thread How to match and extract string after exact 3 digits [RESOLVED] by thanos1983
