How to search an substring and eliminate before and after the substring

Murali_Newbee has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: How to search an substring and eliminate before and after the substring by haj (Vicar) on Jul 26, 2018 at 15:37 UTC
Hello Murali_Newbee,

Being very new to Perl has happened to all of us at some time in the past, so welcome to the journey of learning it!

With Perl, you would usually not remove the parts before and after the id, but simply grab the id from every line. Grabbing the interesting part is done by "capturing" it with a regular expression; the starting point to read would be the tutorial at perlretut. I highly recommend reading this tutorial because there might be some misinterpretation of your requirement in my suggestion.

Save the following code in a file, say `test.pl`, and run it with `perl test.pl <your_input_file`.

```perl
use 5.014;
use strict;
use warnings;

while (defined (my $line = <STDIN>)) {
    my ($id) = $line =~ /\beid\s-?\s(\d+)/;
    say $id;
}
```

Over time, as you get more familiar with Perl, you'll learn many ways to make this more compact. In fact, this is one of the problems that can be solved pretty well with a one-liner:

`perl -n -E '/\beid\s-?\s(\d+)/; say $1;' your_data_file`

The fine print of this invocation can be found in perlrun.
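A minimal sketch of the same capture idea with a guard for lines that contain no id. For such a line, the script above would print an empty line along with an "uninitialized value" warning. The sub name `extract_id` and the sample lines are hypothetical, and the pattern is loosened to `\s*-?\s*` on the assumption that the hyphen and the spaces around it are all optional; adjust it to the real data.

```perl
use strict;
use warnings;

# Hypothetical helper: return the digits that follow "eid", or undef if
# the line has no such id. Assumes the hyphen and the surrounding spaces
# are all optional, so this also accepts "eid999".
sub extract_id {
    my ($line) = @_;
    my ($id) = $line =~ /\beid\s*-?\s*(\d+)/;
    return $id;
}

# Made-up sample lines, not the original poster's data.
my @lines = (
    'foo eid- 1234 gkn 12-34',
    'bar eid - 4532 gkn',
    'no id on this line',
);

for my $line (@lines) {
    my $id = extract_id($line);
    next unless defined $id;    # skip lines without an id
    print "$id\n";
}
```

If a line without an id should be reported instead of skipped, replace the `next` with a `warn` that names the offending line.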
Re^2: How to search an substring and eliminate before and after the substring by Murali_Newbee (Novice) on Jul 27, 2018 at 08:02 UTC
Thank you so much
Re: How to search an substring and eliminate before and after the substring by QM (Parson) on Jul 26, 2018 at 15:04 UTC
Try `my @match = $string =~ m/eid[ -]+(\d+)/;`

-QM
--
Quantum Mechanics: The dreams stuff is made of
Re: How to search an substring and eliminate before and after the substring by roboticus (Chancellor) on Jul 26, 2018 at 17:04 UTC
Murali Newbee:

When teaching someone programming, I generally try to get them to write down what they want, then ask them a few questions to get them to state what they want in the simplest form possible. Once you go through that exercise a few times, you'll find it easier to do the whole process in your head. It seems you've provided enough information, so I'll show you an imaginary dialog:

MN: I'm trying to get a numeric ID into a variable.

Robo: OK, if that's the only number on the line, you could try something like: `if ($var =~ /(\d+)/) { $ID = $1 }`

Robo: Is it the only number on the line?

MN: No, there could be numbers in several places.

Robo: OK, then, how can you tell the ID from the other numbers?

- Is it the first (or second, third, ..., last) one on the line?
- Does it have a particular number of digits?
- Does it have a particular suffix or prefix?
- Is it something else I haven't come up with?

MN: It's always got "eid-" or "eid -" or "eid - " before it.

Robo: OK, then you'll want a regular expression to look for "eid", some optional spaces, a hyphen, perhaps some more spaces and then a number, right?

MN: Yeah, that sounds about right.

Robo: OK, then, you'll want something like:

```
$ cat t.pl
use strict;
use warnings;

my @examples = (
    'Something something -something1 something eid- 1234 gkn 12-34_loanmaster',
    'Something :something something6 eid - 4532 gkn 34-21-hostmasfer',
    'eid 762 something something1 something@',
);

for my $v (@examples) {
    if ($v =~ /
            eid        # Prefix for the ID
            \s*        # might have some spaces
            (-\s*)?    # maybe a hyphen with more spaces
            (\d+)      # has one or more digits
        /x) {
        print "Found ID <$2> in <$v>\n";
    }
}

$ perl t.pl
Found ID <1234> in <Something something -something1 something eid- 1234 gkn 12-34_loanmaster>
Found ID <4532> in <Something :something something6 eid - 4532 gkn 34-21-hostmasfer>
Found ID <762> in <eid 762 something something1 something@>
```

I frequently find the process of coding to be breaking a problem down into smaller and smaller pieces. Once each piece is small enough, state the problem clearly enough to make it straightforward. From there, convert it into code. As you gain experience in programming, you'll find it easier and easier to do most of the process in your head and just write down the code, as it seems QM, haj and anonymized user 468275 did for you.

Update: I didn't mean to slight the other respondents, I'm just having a slow morning today.

...roboticus

When your only tool is a hammer, all problems look like your thumb.
Re^2: How to search an substring and eliminate before and after the substring by Murali_Newbee (Novice) on Jul 27, 2018 at 08:07 UTC
Thank you so much, friend. Your explanation is very understandable, and I should learn to think like you. Your answer will help many people who are new to Perl.
Re: How to search an substring and eliminate before and after the substring (updated) by AnomalousMonk (Archbishop) on Jul 26, 2018 at 16:26 UTC
Here's another approach based on the extraction regex used by haj here. The line-by-line `while`-loop processing approach used in haj's example will scale to handle enormous input files, but if your input files can be guaranteed never to grow larger than, say, a few million lines, it may be easier to "slurp" the data of the entire file into a scalar (i.e., a single string) and process it all at once, as in the example below. (If you are uncertain about the file slurping process, please ask for more info.) This example needs Perl version 5.10+ for the `\K` regex operator, but this can easily be worked around.

```
c:\@Work\Perl\monks>perl -wMstrict -le
"use 5.010;
 ;;
 use Data::Dumper qw(Dumper);
 ;;
 my $data =
   qq{Foo bar -baz boff eid- 1234 gkn 12-34_loanmaster\n} .
   qq{Fizz :faz foz6 eid - 4532 gkn 34-21-hostmasfer\n} .
   qq{Do :not capture xeid - 999 gkn 34-21-xxx\n} .
   qq{Also do :not capture eid999 gkn 34-21-xxx\n} .
   qq{eid 762 biff bam1 zot@\n}
   ;
 print qq{[[$data]] \n};
 ;;
 my $separator = qr{ \s* - \s* | \s+ }xms;
 ;;
 my $captured_eids = my @EIDs = $data =~ m{ \b eid $separator \K \d+ }xmsg;
 ;;
 if ($captured_eids) {
   print 'captured EID(s): ', Dumper \@EIDs;
   }
 else {
   print 'no EIDs captured';
   }
 "
[[Foo bar -baz boff eid- 1234 gkn 12-34_loanmaster
Fizz :faz foz6 eid - 4532 gkn 34-21-hostmasfer
Do :not capture xeid - 999 gkn 34-21-xxx
Also do :not capture eid999 gkn 34-21-xxx
eid 762 biff bam1 zot@
]]

captured EID(s): $VAR1 = [
          '1234',
          '4532',
          '762'
        ];
```

Defining `$separator` separately allows finer control of this aspect of the match IMHO. Please see perlre, perlretut, and perlrequick. Also see the core module Data::Dumper.

Update: For pre-5.10 version Perls, in place of the `m{ \b eid $separator \K \d+ }xmsg` match regex use the work-around (tested) `m{ \b eid $separator (\d+) }xmsg` (no `\K` operator).

Give a man a fish: `<%-{-{-{-<`
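For readers unsure about the slurping step mentioned in this reply, here is a minimal sketch of one common way to do it: localize `$/` (the input record separator) so that a single read returns the whole file. The helper name `slurp` and the sample text are hypothetical, and an in-memory filehandle stands in for a real data file so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: read everything from an open filehandle into one string.
sub slurp {
    my ($fh) = @_;
    local $/;               # undefined record separator: read to end of file
    return scalar <$fh>;
}

# An in-memory filehandle stands in for a real file here. With a real file:
#   open my $fh, '<', $path or die "Cannot open $path: $!";
my $text = "Foo eid- 1234 gkn\nBar eid - 4532 gkn\nskip xeid - 999\neid 762 zot\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $data = slurp($fh);
close $fh;

# Same extraction as in the reply, in the capture-group form that also
# runs on Perls older than 5.10.
my $separator = qr{ \s* - \s* | \s+ }xms;
my @eids = $data =~ m{ \b eid $separator (\d+) }xmsg;

print "captured: @eids\n";    # captured: 1234 4532 762
```

Because `local` restores `$/` when the sub returns, line-by-line reads elsewhere in the program are not affected.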
Re^2: How to search an substring and eliminate before and after the substring (updated) by Murali_Newbee (Novice) on Jul 27, 2018 at 08:04 UTC
Thank you so much, friend. The regexp worked for me :D
Re: How to search an substring and eliminate before and after the substring by anonymized user 468275 (Curate) on Jul 26, 2018 at 15:15 UTC
I recommend looking at perlre on perldoc.perl.org first. There are two main operators, `m//` and `s///`, for match and substitute. It's a lot to learn, but there's no time like the present.

To take an example in detail: I want to change all digits into X in the string `$s`:

`$s =~ s/\d/X/g;`

The `=~` binds the string to a regex operator, in this case `s` for substitute. Slashes are the most common delimiters; you need two for a match and three for a substitution. `\d` is the digit token, `X` is a literal, and the `g` at the end means "replace all occurrences". There are lots of tokens and modifiers.

In principle, complex matching is achieved simply by concatenating terms. For example, `^\d+\S` requires the `\d+` to start at the beginning of the string, and the `\S` matches a non-whitespace character after the digits. (Note that `\d+` will give back a digit if it has to, so `\S` can also match the last digit: `^\d+\S` matches "12".)

Bon voyage on your journey through perlre!

One world, one people
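A small sketch to try the substitution and the anchored match described in this reply. The sample strings are made up. One subtlety worth checking for yourself: `\d+` can backtrack and give up its last digit, so `\S` may match a digit too.

```perl
use strict;
use warnings;

my $s = 'room 12, floor 3';

# With /g every digit is replaced.
(my $masked = $s) =~ s/\d/X/g;
print "$masked\n";              # room XX, floor X

# Without /g only the first digit is replaced.
(my $first_only = $s) =~ s/\d/X/;
print "$first_only\n";          # room X2, floor 3

# ^ anchors the match at the start; \S needs one non-whitespace character.
for my $t ('42abc', '42 abc', '4 abc', 'abc42') {
    my $result = $t =~ /^\d+\S/ ? 'matches' : 'does not match';
    print "'$t' $result\n";
}
# '42abc' matches
# '42 abc' matches         (\d+ backs off to "4", \S takes "2")
# '4 abc' does not match   (nothing left for \S after the single digit)
# 'abc42' does not match   (no digit at the start)
```

To insist that the character after the digits is neither whitespace nor a digit, a character class such as `[^\s\d]` can be used in place of `\S`.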
Re^2: How to search an substring and eliminate before and after the substring by Murali_Newbee (Novice) on Jul 27, 2018 at 08:03 UTC
Thank you so much, friend