Murali_Newbee has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perl monks,

I have to find a numeric id and remove the parts of the line before and after it.

My file contains:

Something something -something1 something eid- 1234 gkn 12-34_loanmaster

Something :something something6 eid - 4532 gkn 34-21-hostmasfer

eid 762 something something1 something@

etc

etc

desired output:

1234

4532

762

I want to find the 1234 after eid. The problem is that I cannot find it by position, because eid can be anywhere in the line, and the id does not always have exactly 3 or 4 digits.

I'm very very new to Perl, could someone help me with this?


Replies are listed 'Best First'.
Re: How to search an substring and eliminate before and after the substring
by haj (Vicar) on Jul 26, 2018 at 15:37 UTC

    Hello Murali_Newbee,

    Being very new to Perl has happened to all of us at some time in the past, so welcome to the journey of learning it!

    With Perl, you would usually not remove the parts before and after the id, but simply grab the id from every line. Grabbing interesting stuff is done with "capturing" it by using regular expressions - the starting point to read would be the tutorial at perlretut.

    I highly recommend reading this tutorial because there might be some misinterpretation of your requirement in my suggestion. Save the following code in a file, say test.pl and run it with perl test.pl <your_input_file.

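        use 5.014;
        use strict;
        use warnings;

        while (defined(my $line = <STDIN>)) {
            # Capture the digits after "eid", allowing an optional hyphen and spaces.
            my ($id) = $line =~ /\beid\s*-?\s*(\d+)/;
            say $id if defined $id;    # skip lines that contain no eid
        }
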
    As you get more familiar with Perl, you'll learn many ways to make this more compact. In fact, this is one of those problems that can be solved quite well with a one-liner:

        perl -n -E '/\beid\s*-?\s*(\d+)/ and say $1' your_data_file

    The fine print of this invocation can be found in perlrun.

      Thank you so much

Re: How to search an substring and eliminate before and after the substring
by QM (Parson) on Jul 26, 2018 at 15:04 UTC
    Try
    my @match = $string =~ m/eid[ -]+(\d+)/;
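
    A minimal sketch of how that match could be applied to the sample lines from the question. The helper name extract_eid is hypothetical, and returning undef for lines without an eid is an assumption about the desired behavior:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the digits following "eid", or undef if there are none.
sub extract_eid {
    my ($string) = @_;
    # In list context the match returns the captured groups,
    # or an empty list when the pattern does not match.
    my @match = $string =~ m/eid[ -]+(\d+)/;
    return $match[0];
}

my @lines = (
    'Something something -something1 something eid- 1234 gkn 12-34_loanmaster',
    'Something :something something6 eid - 4532 gkn 34-21-hostmasfer',
    'eid 762 something something1 something@',
    'etc',
);

for my $line (@lines) {
    my $id = extract_eid($line);
    print "$id\n" if defined $id;    # lines without an eid are skipped
}
```

    Note that [ -]+ requires at least one space or hyphen after eid, so a string like "eid999" would not match.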

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

Re: How to search an substring and eliminate before and after the substring
by roboticus (Chancellor) on Jul 26, 2018 at 17:04 UTC

    Murali Newbee:

    When teaching someone programming, I generally try to get them to write down what they want, then ask them a few questions to get them to state what they want in the simplest form possible. Once you go through that exercise a few times, you'll find it easier to do the whole process in your head.

    It seems you've provided enough information, so I'll show you an imaginary dialog:

    MN: I'm trying to get a numeric ID into a variable.

    Robo: OK, if that's the only number on the line, you could try something like:

      if ($var =~ /(\d+)/) { $ID = $1 }

    Robo: Is it the only number on the line?

    MN: No, there could be numbers in several places.

    Robo: OK, then, how can you tell the ID from the other numbers:

    • Is it the first (or second, third, ..., last) one on the line?
    • Does it have a particular number of digits?
    • Does it have a particular suffix or prefix?
    • Is it something else I haven't come up with?

    MN: It's always got "eid-" or "eid -" or "eid - " before it.

    Robo: OK, then you'll want a regular expression to look for "eid", some optional spaces, possibly a hyphen, perhaps some more spaces and then a number, right?

    MN: Yeah, that sounds about right.

    Robo: OK, then, you'll want something like:

        $ cat t.pl
        use strict;
        use warnings;

        my @examples = (
            'Something something -something1 something eid- 1234 gkn 12-34_loanmaster',
            'Something :something something6 eid - 4532 gkn 34-21-hostmasfer',
            'eid 762 something something1 something@',
        );

        for my $v (@examples) {
            if ($v =~ /
                    eid         # Prefix for the ID
                    \s*         # might have some spaces
                    (-\s*)?     # maybe a hyphen with more spaces
                    (\d+)       # has one or more digits
                /x) {
                print "Found ID <$2> in <$v>\n";
            }
        }

        $ perl t.pl
        Found ID <1234> in <Something something -something1 something eid- 1234 gkn 12-34_loanmaster>
        Found ID <4532> in <Something :something something6 eid - 4532 gkn 34-21-hostmasfer>
        Found ID <762> in <eid 762 something something1 something@>

    I frequently find that coding is a process of breaking a problem down into smaller and smaller pieces. Once a piece is small enough, you can state it clearly enough that solving it becomes straightforward. From there, convert it into code. As you gain experience in programming, you'll find it easier and easier to do most of the process in your head and just write down the code, as it seems QM, haj and anonymized user 468275 did for you.

    Update: I didn't mean to slight the other respondents, I'm just having a slow morning today.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      Thank you so much, friend. Your explanation is very understandable, and I should learn to think like you. Your answer helps many people who are new to Perl.

Re: How to search an substring and eliminate before and after the substring (updated)
by AnomalousMonk (Archbishop) on Jul 26, 2018 at 16:26 UTC

    Here's another approach based on the extraction regex used in haj's reply. The line-by-line while-loop processing approach used in haj's example will scale to handle enormous input files, but if your input files can be guaranteed never to grow larger than, say, a few million lines, it may be easier to "slurp" the data of the entire file into a scalar (i.e., a single string) and process it all at once, as in the example below. (If you are uncertain about the file slurping process, please ask for more info.) This example needs Perl version 5.10+ for the \K regex operator, but this can easily be worked around.

        c:\@Work\Perl\monks>perl -wMstrict -le
        "use 5.010;
         ;;
         use Data::Dumper qw(Dumper);
         ;;
         my $data =
           qq{Foo bar -baz boff eid- 1234 gkn 12-34_loanmaster\n} .
           qq{Fizz :faz foz6 eid - 4532 gkn 34-21-hostmasfer\n} .
           qq{Do :not capture xeid - 999 gkn 34-21-xxx\n} .
           qq{Also do :not capture eid999 gkn 34-21-xxx\n} .
           qq{eid 762 biff bam1 zot@\n}
           ;
         print qq{[[$data]] \n};
         ;;
         my $separator = qr{ \s* - \s* | \s+ }xms;
         ;;
         my $captured_eids = my @EIDs = $data =~ m{ \b eid $separator \K \d+ }xmsg;
         ;;
         if ($captured_eids) {
             print 'captured EID(s): ', Dumper \@EIDs;
             }
         else {
             print 'no EIDs captured';
             }
         "
        [[Foo bar -baz boff eid- 1234 gkn 12-34_loanmaster
        Fizz :faz foz6 eid - 4532 gkn 34-21-hostmasfer
        Do :not capture xeid - 999 gkn 34-21-xxx
        Also do :not capture eid999 gkn 34-21-xxx
        eid 762 biff bam1 zot@
        ]]
        captured EID(s): $VAR1 = [
                  '1234',
                  '4532',
                  '762'
                ];

    Defining $separator separately allows finer control of this aspect of the match IMHO. Please see perlre, perlretut, and perlrequick. Also see the core module Data::Dumper.

    Update: For pre-5.10 version Perls, in place of the
        m{ \b eid $separator \K \d+ }xmsg
    match regex use the work-around (tested)
        m{ \b eid $separator (\d+) }xmsg
    (no  \K operator).
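
    For readers unsure about slurping, here is a rough sketch of reading a whole file into one string and then applying the capture-group form of the regex. The helper names slurp_file and all_eids are hypothetical, and taking the file name from the command line is an assumption:

```perl
use strict;
use warnings;

# Hypothetical helper: read an entire file into a single string ("slurp" it).
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;              # undefine the input record separator for this scope
    my $data = <$fh>;      # with $/ undefined, one read returns the whole file
    close $fh;
    return $data;
}

# Hypothetical helper: return every EID found in a string (call in list context).
sub all_eids {
    my ($data) = @_;
    my $separator = qr{ \s* - \s* | \s+ }xms;
    # With /g in list context, the match returns all captured (\d+) groups.
    return $data =~ m{ \b eid $separator (\d+) }xmsg;
}

# Assumed usage: perl script.pl your_data_file
if (@ARGV) {
    print "$_\n" for all_eids(slurp_file($ARGV[0]));
}
```

    Using local keeps the change to $/ confined to slurp_file, so line-by-line reads elsewhere in the program are unaffected.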


    Give a man a fish:  <%-{-{-{-<

      Thank you so much, friend. The regexp worked for me :D

Re: How to search an substring and eliminate before and after the substring
by anonymized user 468275 (Curate) on Jul 26, 2018 at 15:15 UTC
    I recommend looking at perlre on perldoc.perl.org first. There are two main operators, m// and s///, for match and substitute. It's a lot to learn, but there's no time like the present. To take an example in detail, suppose I want to change all digits into X in the string $s:
    $s =~ s/\d/X/g;
    The =~ binds the string to a regex operator, in this case s for substitute. The '/' is the most common delimiter. You need two for a match and three for a substitution. \d is the digit token, X is a literal, and the g at the end means replace all occurrences.
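
    A short sketch of that substitution in a complete script. The sample string is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample string containing several digits.
my $s = 'order 12 shipped 2018-07-26';

# s///g modifies $s in place and returns the number of substitutions made.
my $count = ($s =~ s/\d/X/g);

print "$s\n";        # prints: order XX shipped XXXX-XX-XX
print "$count\n";    # prints: 10
```

    Without the g modifier, only the first digit would be replaced.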

    There are lots of tokens and modifiers. In principle, a complex match is built simply by concatenating terms, e.g. ^\d+\S requires the \d+ to start at the beginning of the string, and the \S matches one non-space character after the digits. Note that \S can itself match a digit: if no other non-space character follows, \d+ backtracks and gives up its last digit, so "12" matches while "1" does not.
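
    A quick way to check how ^\d+\S behaves is to try it on a few strings. The test strings below are hypothetical, chosen to exercise the anchor and the \S term:

```perl
use strict;
use warnings;

my %result;
for my $str ('123abc', 'abc123', '1 abc', '12', '1') {
    # ^ anchors at the start, \d+ needs at least one digit,
    # and \S needs one more non-space character after those digits.
    $result{$str} = $str =~ /^\d+\S/ ? 'match' : 'no match';
    printf "%-9s %s\n", "'$str'", $result{$str};
}
```

    Here '123abc' and '12' match, while 'abc123', '1 abc' and '1' do not. In the '12' case the regex engine backtracks so that \d+ takes "1" and \S takes "2".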

    Bon voyage on your journey through perlre!

    One world, one people

      Thank you so much, friend.