in reply to Remove all duplicates after regex capture
How about this loop?
```perl
foreach my $filename (sort keys %mycorpus) {
    my $titles  = '';
    my $counter = 0;
    while ($mycorpus{$filename} =~ /title:#(.*?)#\s*$/gm) {
        if ($counter++) {
            # A second (or later) match: skip the rest of the matches.
            # Before the "last", this branch can also print warnings about
            # multiple titles, or check $1 against $titles to see whether
            # they are the same.
            last;
        }
        else {
            $titles = $1;          # first match: we can store it,
            print "$titles \n";    # or print it out
        }
    }
}
```
The output is:

```
this is text I want 1
this is text I want 2
this is text I want 3
```
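To try the loop without the original files, here is a self-contained sketch run against hypothetical sample data (the file names and contents in %mycorpus are made up, and the sub name first_titles is only for illustration). It swaps the $counter for a defined-check, warns on STDERR when a later title differs from the first one (one of the follow-ups the comments mention), and matches against a copy of each dump, because leaving a //g loop with last keeps pos() set on the string being matched.

```perl
use strict;
use warnings;

# Hypothetical file dumps; some contain more than one title line.
my %mycorpus = (
    'file1.txt' => "title:#this is text I want 1#\nbody\ntitle:#this is text I want 1#\n",
    'file2.txt' => "title:#this is text I want 2#\nmore body\n",
    'file3.txt' => "intro\ntitle:#this is text I want 3#\ntitle:#something else#\n",
);

# Return the first title of each file, in sorted file-name order.
sub first_titles {
    my ($corpus) = @_;
    my @found;
    foreach my $filename (sort keys %$corpus) {
        # Work on a copy: "last" leaves pos() set on the matched string,
        # and a fresh copy per file keeps that from leaking into later
        # /g matches on the hash value itself.
        my $text = $corpus->{$filename};
        my $title;
        while ($text =~ /title:#(.*?)#\s*$/gm) {
            if (defined $title) {
                # A later title: warn if it differs, then stop looking.
                warn "$filename: extra title '$1' differs from '$title'\n"
                    if $1 ne $title;
                last;
            }
            $title = $1;    # first match: keep it
        }
        push @found, $title if defined $title;
    }
    return @found;
}

print "$_\n" for first_titles(\%mycorpus);
```

STDOUT shows the same three lines as above; the warning about file3.txt goes to STDERR.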
You can also replace the while with an if; then it only matches the first title:# line.
```perl
foreach my $filename (sort keys %mycorpus) {
    my $titles = '';
    if ($mycorpus{$filename} =~ /title:#(.*?)#\s*$/m) {
        $titles = $1;
        print "$titles \n";
    }
}
```
The output is the same. I think you wanted the /m (multiline) regexp modifier, so that $ matches at the end of each line inside your file-dump string, not only at the end of the whole string.
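A small sketch of what /m changes, using a hypothetical two-line dump: without /m, $ can only match at the very end of the string (or just before a final newline), so only the last title line can satisfy the pattern; with /m, $ also matches before each embedded newline. Note that /m does not make . match a newline (that is /s).

```perl
use strict;
use warnings;

# Hypothetical dump with two title lines.
my $dump = "title:#first title#\ntitle:#second title#\n";

# Without /m: $ only matches at the end of the string, so the
# first line cannot match and the capture comes from the last line.
my ($without_m) = $dump =~ /title:#(.*?)#\s*$/;

# With /m: $ also matches before the embedded newline, so the
# first title line matches.
my ($with_m) = $dump =~ /title:#(.*?)#\s*$/m;

print "without /m: $without_m\n";    # without /m: second title
print "with /m:    $with_m\n";       # with /m:    first title
```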
Edit: restructured the loop to allow more post-processing (the comments say what can be done there). I also removed the /g (global) modifier from the "if" example, as it is not needed there.
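Why /g is unnecessary (and can surprise you) in an if: in scalar context, a successful m//g stores its end position in pos(), and the next m//g on the same string resumes from there; a failed match resets it. The sketch below, with a hypothetical dump and illustrative array names, runs the same test three times with and without /g.

```perl
use strict;
use warnings;

my $dump = "title:#one#\ntitle:#two#\n";

# With /g in scalar (boolean) context, each attempt resumes at pos($dump).
my @with_g;
for (1 .. 3) {
    if ($dump =~ /title:#(.*?)#\s*$/gm) {
        push @with_g, $1;
    }
    else {
        push @with_g, '(no match)';    # a failed match also resets pos($dump)
    }
}

# Without /g every attempt starts again from the beginning of the string.
my @without_g;
for (1 .. 3) {
    push @without_g, $dump =~ /title:#(.*?)#\s*$/m ? $1 : '(no match)';
}

print "with /g:    @with_g\n";       # with /g:    one two (no match)
print "without /g: @without_g\n";    # without /g: one one one
```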
Re^2: Remove all duplicates after regex capture
by Maire (Scribe) on Aug 19, 2018 at 10:17 UTC
by haukex (Archbishop) on Aug 19, 2018 at 10:49 UTC
by FreeBeerReekingMonk (Deacon) on Aug 19, 2018 at 19:28 UTC