
Here's another approach based on the extraction regex used by haj here. The line-by-line while-loop approach in haj's example scales to enormous input files. But if your input files are guaranteed never to grow beyond, say, a few million lines, it may be easier to "slurp" the entire file into a scalar (i.e., a single string) and process it all at once, as in the example below. (If you are uncertain about the file slurping process, please ask for more info.) This example needs Perl version 5.10+ for the \K regex operator, but this is easily worked around.
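For anyone unsure about the slurping step mentioned above, here is a minimal sketch of one common way to do it: localize the input record separator $/ so that a single read returns the whole file. The helper name slurp_file and the throwaway temp file are assumptions made for illustration; they are not part of the original example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not from the original post): return a file's
# entire contents as one string.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open '$path': $!";
    local $/;    # input record separator undefined in this scope: read it all
    my $content = <$fh>;
    close $fh or die "Cannot close '$path': $!";
    return defined $content ? $content : '';    # empty file yields ''
}

# Demonstration with a throwaway file standing in for real input.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "Foo bar eid- 1234 gkn\n", "eid 762 biff\n";
close $tmp_fh or die "Cannot close temp file: $!";

my $data          = slurp_file($tmp_path);
my $newline_count = ($data =~ tr/\n//);    # tr counts without modifying
print "slurped $newline_count line(s) into one scalar\n";
```

Because local restores $/ when the sub returns, line-by-line reads elsewhere in the program are unaffected.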

c:\@Work\Perl\monks>perl -wMstrict -le
"use 5.010;
 ;;
 use Data::Dumper qw(Dumper);
 ;;
 my $data = qq{Foo bar -baz boff eid- 1234 gkn 12-34_loanmaster\n}
          . qq{Fizz :faz foz6 eid - 4532 gkn 34-21-hostmasfer\n}
          . qq{Do :not capture xeid - 999 gkn 34-21-xxx\n}
          . qq{Also do :not capture eid999 gkn 34-21-xxx\n}
          . qq{eid 762 biff bam1 zot@\n}
          ;
 print qq{[[$data]] \n};
 ;;
 my $separator = qr{ \s* - \s* | \s+ }xms;
 ;;
 my $captured_eids =
 my @EIDs = $data =~ m{ \b eid $separator \K \d+ }xmsg;
 ;;
 if ($captured_eids) {
   print 'captured EID(s): ', Dumper \@EIDs;
   }
 else {
   print 'no EIDs captured';
   }
"
[[Foo bar -baz boff eid- 1234 gkn 12-34_loanmaster
Fizz :faz foz6 eid - 4532 gkn 34-21-hostmasfer
Do :not capture xeid - 999 gkn 34-21-xxx
Also do :not capture eid999 gkn 34-21-xxx
eid 762 biff bam1 zot@
]]

captured EID(s): $VAR1 = [
          '1234',
          '4532',
          '762'
        ];

Defining $separator separately allows finer control of this aspect of the match IMHO. Please see perlre, perlretut, and perlrequick. Also see the core module Data::Dumper.
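To illustrate the point about control, here is a small sketch in which the same match is run with two different separators. The helper extract_eids and the stricter $strict pattern are hypothetical, added only to show that the separator can change without touching the rest of the regex.

```perl
use 5.010;    # for the \K operator, as in the example above
use strict;
use warnings;

my $data = <<'END_DATA';
Foo bar -baz boff eid- 1234 gkn 12-34_loanmaster
Fizz :faz foz6 eid - 4532 gkn 34-21-hostmasfer
Do :not capture xeid - 999 gkn 34-21-xxx
Also do :not capture eid999 gkn 34-21-xxx
eid 762 biff bam1 zot@
END_DATA

# Hypothetical helper: the same match as above, with the separator passed in.
# Call it in list context; in scalar context m//g returns only true/false.
sub extract_eids {
    my ($text, $separator) = @_;
    return $text =~ m{ \b eid $separator \K \d+ }xmsg;
}

my $loose  = qr{ \s* - \s* | \s+ }xms;    # separator from the example above
my $strict = qr{ \s* - \s* }xms;          # assumed variant: hyphen required

my @loose_eids  = extract_eids($data, $loose);
my @strict_eids = extract_eids($data, $strict);

print "loose:  @loose_eids\n";
print "strict: @strict_eids\n";
```

With this data the loose separator should yield 1234 4532 762, while the strict one drops 762 because "eid 762" has no hyphen.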

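Update: For pre-5.10 Perls, in place of the
m{ \b eid $separator \K \d+ }xmsg
match regex use the work-around (tested)
m{ \b eid $separator (\d+) }xmsg
(no \K operator). In list context, a /g match with a capture group returns the captured text of each match, so @EIDs ends up with the same digits.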
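As a script rather than a one-liner, the pre-5.10 work-around might look like the sketch below. It is the same data and regex as above, with file-scope variable names chosen for illustration. The chained assignment also shows why $captured_eids works as a test: a list assignment evaluated in scalar context yields the number of elements on its right-hand side.

```perl
use strict;
use warnings;

my $data = <<'END_DATA';
Foo bar -baz boff eid- 1234 gkn 12-34_loanmaster
Fizz :faz foz6 eid - 4532 gkn 34-21-hostmasfer
Do :not capture xeid - 999 gkn 34-21-xxx
Also do :not capture eid999 gkn 34-21-xxx
eid 762 biff bam1 zot@
END_DATA

my $separator = qr{ \s* - \s* | \s+ }xms;

# No \K here: the capture group (\d+) is what m//g returns in list context.
# The outer scalar assignment receives the count of matches found.
my $count = my @eids = $data =~ m{ \b eid $separator (\d+) }xmsg;

if ($count) {
    print "captured $count EID(s): @eids\n";
}
else {
    print "no EIDs captured\n";
}
```

This should print "captured 3 EID(s): 1234 4532 762", matching the output of the \K version.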

Give a man a fish: <%-{-{-{-<

In reply to Re: How to search an substring and eliminate before and after the substring (updated) by AnomalousMonk
in thread How to search an substring and eliminate before and after the substring by Murali_Newbee
