Re: How to capture quantified repeats?
by BrowserUk (Patriarch) on Sep 22, 2010 at 19:04 UTC
|
$ham = "spam\tspam\tspam\t\tyam\tclam";;
@jam = split /\t+/, $ham;;
print "@jam";;
spam spam spam yam clam
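Note that /\t+/ treats a run of tabs as a single separator, so the empty field between the two tabs disappears. If field positions matter, splitting on a single tab keeps it. A minimal sketch of the difference, assuming the same sample string (the variable names @collapsed and @positional are made up for illustration):

```perl
use strict;
use warnings;

my $ham = "spam\tspam\tspam\t\tyam\tclam";

# /\t+/ collapses the double tab: the empty field is gone.
my @collapsed = split /\t+/, $ham;

# /\t/ keeps the empty field between the two tabs.
# A limit of -1 also keeps trailing empty fields, which split drops by default.
my @positional = split /\t/, $ham, -1;

printf "%d fields: %s\n", scalar @collapsed,  join('|', @collapsed);
printf "%d fields: %s\n", scalar @positional, join('|', @positional);
```

The first line reports 5 fields, the second 6, with an empty string in the fourth position.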
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: How to capture quantified repeats?
by kennethk (Abbot) on Sep 22, 2010 at 20:22 UTC
|
After reading other responses and your replies, I offer two solutions:
- Text::CSV and its ilk: you are dealing with large delimited data files, so why not use a module designed to handle those?
- One quantified regex per field: if you have a line in memory and you know you want to get the 1st, 4th and 5th terms, why not just grab those terms?
#!/usr/bin/perl
use strict;
use warnings;
my $ham = "spam\tspam\tspam\t\tyam\tclam";
my @jam;
for my $i (0, 3, 4) {
push @jam, $ham =~ /(?:[^\t]*\t){$i}([^\t]*)/;
}
print join("\n", '**', @jam, '**', '');
You could even code that into a single expression if you only wanted to run it once.
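One possible way to fold the loop into a single expression is map, which runs the same match once per index. This is only a sketch under the same assumptions as the loop above (same sample string, zero-based indices 0, 3 and 4); the leading ^ is an addition that pins each match to the start of the string:

```perl
use strict;
use warnings;

my $ham = "spam\tspam\tspam\t\tyam\tclam";

# For each index, skip that many "field plus tab" pairs, then capture the next field.
# In list context the match returns its single capture, even when it is empty.
my @jam = map { $ham =~ /^(?:[^\t]*\t){$_}([^\t]*)/ } 0, 3, 4;

print join("\n", '**', @jam, '**', '');
```

The 4th term of the sample string is the empty field between the two tabs, so the middle element of @jam is an empty string.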
| [reply] [d/l] |
Re: How to capture quantified repeats?
by JavaFan (Canon) on Sep 22, 2010 at 19:48 UTC
|
Is there NO way to capture all the matches to a numerically quantified subexpression?
Indeed, there isn't. The number of capture groups is set by the number of capturing parens in the regular expression†.
This should be trivial, yet it seems to be impossible.
I'm glad someone thinks the code dealing with regular expressions in perl is trivial. AFAIK, you're unique. I'd think that patches would be more than welcome.
†Actually, since 5.10, the number of capturing parens is an upper bound, due to the (?|...) branch reset construct.
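To illustrate the point, here is a small sketch (the strings and variable names are invented for the example). A quantified group still owns only one capture slot, so only its last repetition survives; the common workaround is to match one item at a time with /g; and branch reset lets several pairs of parens share one slot:

```perl
use strict;
use warnings;

my $str = "abc";

# One pair of capturing parens means one capture slot, however often the
# group repeats. Only the last repetition is kept.
my @last = $str =~ /^(?:(\w)){3}\z/;

# Workaround: match a single item repeatedly with /g in list context.
my @all = $str =~ /(\w)/g;

# Branch reset (5.10+): two pairs of parens, but both alternatives fill $1,
# so the pattern has one capture group rather than two.
my @reset = "b2" =~ /(?|a(\d)|b(\d))/;

print "last:  @last\n";
print "all:   @all\n";
print "reset: @reset\n";
```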
Re: How to capture quantified repeats?
by TomDLux (Vicar) on Sep 22, 2010 at 20:23 UTC
|
How about using variables to turn confusion into clarity? Better yet, go to the library and borrow Perl Best Practices to see why '\A' and '\z' are better than '^' and '$'. Spreading out your regex and adding comments helps, too.
my $ham = "spam\tspam\tspam\tyam\tclam";
my $word = qr{[^\t]+};
my $sep = qr{\t};
my $capture = qr{($word)};
my (@jam ) = ($ham =~ m{\A            # enforce beginning of string
                        $word $sep    # skip first word and separator
                        $word $sep    # and the second
                        $capture $sep # capture the third word, skip its separator
                        $capture $sep # capture the fourth word, skip its separator
                        $word         # skip the last word
                        \z            # and then it's the end of the string
                       }xms);
print join("\n", '**', @jam, '**', '');
Using the debugger helps, too. Along the way I noticed that your string has a double tab '\t\t' but your regex only ever accepts single tabs, and that you specify end of string '$' when there's still another word to go.
But you're doing too much work. You could split() on '\t' and select only the components you want. If it's the 3rd & 4th ...
my @jam = ( split "\t", $ham )[2,3];
If you do need to use a regex, do you need to check whether there is a word after your capture? Do you need to enforce there is nothing after that last word? Simplify your regex for greater happiness.
As Occam said: Entia non sunt multiplicanda praeter necessitatem.
|
|
my $ham = "spam\tspam\tspam\t\tyam\tclam";
my @jam;
$ham =~ s/^[^\t]*\t[^\t]*((?:\t[^\t]*){3})\t[^\t]*$/push @jam,(split "\t",$1);$1/eg;
print join("\n", '**', @jam, '**', '');
maybe?
Re: How to capture quantified repeats?
by moritz (Cardinal) on Sep 23, 2010 at 07:09 UTC
|
Is there NO way to capture all the matches to a numerically quantified subexpression?
Of course there is:
use v6;
if "abc" ~~ /(.)+/{
say $0.join(", ");
}
In Perl 6, quantifying a capturing group or atom just results in an array of Match objects.
Perl 6 - links to (nearly) everything that is Perl 6.
|
|
Thanks. Not sure whether Perl 6 is an option, but I'll take a look.
Re: How to capture quantified repeats?
by umasuresh (Hermit) on Sep 22, 2010 at 19:28 UTC
|
You may have better luck changing the code to:
my $ham = "spam\tspam\tspam\t\tyam\tclam";
my @jam = ($ham =~ (m/^[^\t]+\t[^\t]+\t([^\t]+)(\t\t)([^\t]+)\t[^\t]+$/));
print join("\n", '**', @jam, '**', '');
prints:
**
spam
yam
**
Re: How to capture quantified repeats?
by umasuresh (Hermit) on Sep 22, 2010 at 19:47 UTC
|
If you know which fields you want to extract from a really large file, e.g. the fields at (zero-based) indices 2 and 4, you can try
my ($col1, $col2) = (split(/\t/, $ham))[2,4] ;
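For a large file you would normally apply that slice line by line rather than to one string. A rough sketch, assuming tab-separated input; the in-memory filehandle and the two sample rows merely stand in for the real file, and @rows is a hypothetical accumulator:

```perl
use strict;
use warnings;

# Stand-in for a real file: two tab-separated lines.
my $data = "spam\tspam\tspam\t\tyam\tclam\na\tb\tc\td\te\tf\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @rows;
while (my $line = <$fh>) {
    chomp $line;
    # The limit of -1 keeps trailing empty fields; the slice picks indices 2 and 4.
    my ($third, $fifth) = (split /\t/, $line, -1)[2, 4];
    push @rows, [$third, $fifth];
    print "$third\t$fifth\n";
}
close $fh;
```

Lines with fewer than five fields would leave $fifth undefined, so real data may need a check before printing.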
Re: How to capture quantified repeats?
by james2vegas (Chaplain) on Sep 23, 2010 at 15:18 UTC
|
Sure, you can, but you'd probably need to use an extra variable, or two:
use strict;
use warnings;
my $ham = "spam\tspam\tspam\t\tyam\tclam";
my @foo;
my @jam = ($ham =~ (m/^[^\t]*\t[^\t]*(?:\t([^\t]*)(?{push @foo, $^N})){3}\t[^\t]*$/));
print join("\n", '**', @jam, '**', '');
print join("\n", '**', @foo, '**', '');
You may need to do some variation of the local variable dance as shown in perlretut if backtracking is a concern.
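A sketch of what that dance might look like here, modelled on the local-inside-(?{ }) pattern from the perl regex documentation. The names capture_repeats, @acc and @captured are made up for this example, and \z replaces the trailing $ of the code above, so a trailing newline is not tolerated:

```perl
use strict;
use warnings;

# Package variables, because local only works on those, not on lexicals.
our (@acc, @captured);

# Hypothetical helper: returns the three middle fields, or an empty list on no match.
sub capture_repeats {
    my ($line) = @_;
    local @acc = ();
    @captured = ();
    $line =~ m{
        ^ [^\t]* \t [^\t]*                  # first two fields
        (?: \t ([^\t]*)                     # one repetition: separator, then field
            (?{ local @acc = (@acc, $^N) }) # undone automatically on backtracking
        ){3}
        \t [^\t]* \z                        # last field, then end of string
        (?{ @captured = @acc })             # copy out once the whole match has succeeded
    }x or return;
    return @captured;
}

my @foo = capture_repeats("spam\tspam\tspam\t\tyam\tclam");
print join("\n", '**', @foo, '**', '');
```

Because each push is a localized assignment, a repetition that is later backtracked over does not leave a stale entry behind; the final code block copies the surviving values out before the localized state is unwound.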
|
|
There are a couple of issues with that code, but the OP made it clear he only wants a yes/no answer. I was going to post something of the kind, but it's not nearly as good as BrowserUk's solutions.