Capturing RegExp Matches

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Capturing RegExp Matches by Abigail-II (Bishop) on Jul 03, 2002 at 14:06 UTC
Parsing email addresses out of text is far from trivial. Any ASCII character can be part of an email address, including NUL characters, white space and control characters. Here are some examples of valid email addresses:

```
@example.net
"\""@foo.bar
fred&barny@example.com
---@example.com
foo-bar@example.net
"127.0.0.1"@[127.0.0.1]
Muhammed.(I am the greatest) Ali @(the)Vegas.WBA
':; $@[] ()@[]
```

As for the general question, "how do I store all strings that a regex matches", just use the regex in list context, with a `/g` modifier. If you have capturing parens in your regex, you'll have to put a set of parens around the whole regex and filter out the submatches (but it's probably easier to turn the capturing parens into non-capturing ones).

Abigail
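A minimal sketch of the general answer above, using a made-up `key=value` string rather than email addresses (the sample data and variable names are hypothetical, not from the original question). The alternation `key|id` needs a group, so the sketch shows both options: wrapping a pattern that already contains a capturing group and then filtering, versus switching the inner group to non-capturing `(?:...)`:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data; the alternation (key|id) needs a group.
my $text = 'key1=abc; id2=de; key3=f';

# Capturing inner group: each match yields the outer group AND the inner one.
my @mixed = ( $text =~ /((key|id)\d=\w+)/g );

# Filter out the submatches: keep only the outer group (even indices).
my @whole = @mixed[ grep { $_ % 2 == 0 } 0 .. $#mixed ];

# Easier: make the inner group non-capturing. With no capturing parens,
# //g in list context returns the whole matches directly.
my @simple = ( $text =~ /(?:key|id)\d=\w+/g );

print "$_\n" for @simple;
```

Here `@mixed` holds `key1=abc key id2=de id key3=f key`, while `@whole` and `@simple` both hold only the three full matches, which is why the non-capturing version is usually the less fiddly choice.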
Re: Capturing RegExp Matches by Chady (Priest) on Jul 03, 2002 at 13:29 UTC
Your quest will get harder and harder unless you invest your effort in one of the Email::* modules you can easily find on CPAN. They are all written and tested by people who know what they are doing, and they will save you a lot of debugging.

He who asks will be a fool for five minutes, but he who doesn't ask will remain a fool for life.

Chady \| http://chady.net/
Re: Capturing RegExp Matches by arturo (Vicar) on Jul 03, 2002 at 13:38 UTC
The general answer to this sort of problem is to use capturing parentheses in your regex in combination with the /g modifier:

```perl
# assume $regex has been built with qr// and matches what you want
my $regex = qr/foobar/;

# assuming $data contains all the data you want to scan
my @things_im_interested_in = ( $data =~ /($regex)/g );

# or, if it doesn't and you are doing this from multiple data
# strings
push @things_im_interested_in, $data_chunk =~ /($regex)/g;
```

In list context, the /g modifier returns all the matches in the string, so those match operators return a list of the matching parts of the string.

However, as a read through the Perl FAQ (or even a search on this site for, say, "Email Address") will reveal, using regexes to match email addresses is a dicey issue in the first place.

I mistrust all systematizers and avoid them. The will to a system shows a lack of integrity -- F. Nietzsche
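A self-contained, runnable version of arturo's snippet, as a sketch only: the pattern `qr/fo+bar/`, the sample strings and the `@chunks` array are placeholders assumed for illustration, standing in for a real pattern and real input:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder pattern built with qr//; a real program would use its own regex.
my $regex = qr/fo+bar/;

# Placeholder input: all the data to scan, in a single string.
my $data = 'foobar xyz fooobar abc fobar';

# Capturing parens around $regex plus /g in list context return every match.
my @things_im_interested_in = ( $data =~ /($regex)/g );

# When the data arrives in several strings, push each chunk's matches.
my @chunks = ( 'no match here', 'foobar twice foobar' );
for my $data_chunk (@chunks) {
    push @things_im_interested_in, $data_chunk =~ /($regex)/g;
}

print "$_\n" for @things_im_interested_in;
```

The match inside `push` is evaluated in list context because `push` takes a list, so a chunk with no matches simply contributes nothing.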
Re: Capturing RegExp Matches by dda (Friar) on Jul 03, 2002 at 13:27 UTC
Are you talking about something like this:

```perl
#!/usr/bin/perl -w
use strict;

my $page = <<__EOT;
jksjdsjk some\@one.com nnbcx jdsjl;'pejbkscd (aaa\@bbb.com) sdkmlsd
__EOT

my @emails = ($page =~ /\b\S+?\@\S+?\b/gs);
foreach (@emails) {
    print "$_\n";
}
```

--dda
Re: Re: Capturing RegExp Matches by Chady (Priest) on Jul 03, 2002 at 13:33 UTC
Your code is choking on its own data. The non-greedy `\S+?` after the `@` stops at the first word boundary, which is the one just before the `.` in `.com`, so the `.com` gets stripped out. You cannot filter an email address with one simple regex.
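A minimal sketch reproducing the truncation Chady describes, using dda's sample text. The "tighter" pattern is an illustrative assumption, not a recommended email matcher: it keeps the `.com` on this sample but, in line with Abigail's list, still rejects valid addresses such as `"\""@foo.bar`:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# dda's sample text, written as a plain string instead of a heredoc.
my $page = "jksjdsjk some\@one.com nnbcx jdsjl;'pejbkscd (aaa\@bbb.com) sdkmlsd\n";

# dda's pattern: the non-greedy \S+? after the @ stops at the first \b,
# which sits between "one" and ".com", so the TLD is lost.
my @lazy = ( $page =~ /\b\S+?\@\S+?\b/gs );

# Hypothetical tighter pattern: common local-part characters, an @,
# then one or more dot-separated domain labels. No capturing groups,
# so //g in list context returns the whole matches.
my $simple_addr = qr/[\w.+-]+\@[\w-]+(?:\.[\w-]+)+/;
my @tighter = ( $page =~ /$simple_addr/g );

print "lazy:    @lazy\n";
print "tighter: @tighter\n";

# Still too strict: a valid quoted local part from Abigail's list is rejected.
my $quoted    = '"\""@foo.bar';    # single quotes keep the backslash as-is
my $quoted_ok = $quoted =~ /\A$simple_addr\z/ ? 1 : 0;
print "quoted address accepted: ", ( $quoted_ok ? "yes" : "no" ), "\n";
```

On this input `@lazy` holds `some@one` and `aaa@bbb`, while `@tighter` holds `some@one.com` and `aaa@bbb.com`; the quoted address is not accepted, which is the kind of gap the Email::* modules are meant to cover.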
Re: Re: Re: Capturing RegExp Matches by dda (Friar) on Jul 03, 2002 at 13:44 UTC
Ohh, stupid me :) Thanks. --dda