How about this loop?
foreach my $filename (sort keys %mycorpus) {
    my $titles  = '';
    my $counter = 0;
    while ($mycorpus{$filename} =~ /title:#(.*?)#\s*$/gm) {
        if ($counter++) {
            last if $counter++;    # skip the rest of the matches
            # can also be used to print warnings about multiple titles
            # and check $1 against $titles if they are the same, or not
        }
        else {
            $titles = $1;          # first match, we can store it,
            print "$titles \n";    # or print it out
        }
    }
}
The output is:
this is text I want 1
this is text I want 2
this is text I want 3
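The comments in the loop hint that the later matches could be reported rather than skipped. Here is a minimal sketch of that idea, not a definitive implementation. The sample %mycorpus contents, the %first_title hash and the @warnings array are all made up for illustration; the real data comes from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real %mycorpus comes from the original question.
my %mycorpus = (
    'a.txt' => "title:#first title#\nbody text\ntitle:#second title#\n",
    'b.txt' => "title:#only title#\nbody text\n",
);

my %first_title;    # filename => first title found (assumed helper)
my @warnings;       # messages about extra title lines (assumed helper)

foreach my $filename (sort keys %mycorpus) {
    my $counter = 0;
    while ($mycorpus{$filename} =~ /title:#(.*?)#\s*$/gm) {
        if ($counter++) {
            # second or later match: record it instead of silently skipping
            my $kind = $1 eq $first_title{$filename} ? 'duplicate' : 'different';
            push @warnings, "$filename: $kind extra title '$1'";
        }
        else {
            $first_title{$filename} = $1;    # first match, keep it
        }
    }
}

print "$_: $first_title{$_}\n" for sort keys %first_title;
warn "$_\n" for @warnings;
```

Because the loop runs to the end instead of calling last, every extra title in a file is reported, not only the second one.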
You can also replace the while with an if; it then matches only the first title:# line.
foreach my $filename (sort keys %mycorpus) {
    my $titles = '';
    if ($mycorpus{$filename} =~ /title:#(.*?)#\s*$/m) {
        $titles = $1;
        print "$titles \n";
    }
}
The output is the same. I think you wanted the /m (multiline) modifier, which lets $ match at the end of every line inside your file-dump string rather than only at the end of the whole string.
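To see what /m changes, here is a small sketch that runs the same pattern with and without the modifier. The $dump string is a made-up two-line sample, not data from the thread.

```perl
use strict;
use warnings;

# Hypothetical two-line file dump, for illustration only.
my $dump = "title:#this is text I want 1#\nsome other line\n";

# Without /m, $ matches only at the end of the string
# (or just before a final trailing newline).
my $without_m = $dump =~ /title:#(.*?)#\s*$/  ? "match: $1" : "no match";

# With /m, $ also matches just before every embedded newline.
my $with_m    = $dump =~ /title:#(.*?)#\s*$/m ? "match: $1" : "no match";

print "without /m: $without_m\n";    # prints "without /m: no match"
print "with /m:    $with_m\n";       # prints "with /m:    match: this is text I want 1"
```

Without /m the title line is followed by more text, so $ cannot anchor there and the match fails.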
Edit: restructured the loop to allow more post-processing (the comments note what can be done there). I also removed the /g (global) modifier in the "if" example, as it is not needed there.
In reply to Re: Remove all duplicates after regex capture
by FreeBeerReekingMonk
in thread Remove all duplicates after regex capture
by Maire