Here's a solution that doesn't depend on the order of the lines. It finds all the titles with a regex, counts them in a hash, and then selects the one that appears exactly once. It dies if no such title is found and warns if there is more than one. You haven't specified a few things about the title: line, such as whether the titles can contain # characters and what kind of text might appear after the closing #.
use warnings;
use strict;

my %mycorpus = (
    a => "<blah blah blah blah
title:#this is text I want 1#
blah blah blah",
    b => "blah
title:#this is text I do not want#
title:#this is text I want 2#
blah
title:#this is text I do not want#
blah",
    c => "blah blah
title:#this is text I do not want#
title:#this is text I want 3#
title:#this is text I do not want#
title:#this is text I do not want#
blah",
);

for my $filename (sort keys %mycorpus) {
    my %titles;
    # count every "title:#...#" line; /m lets ^ match at the start of each line
    $titles{$1}++ while $mycorpus{$filename} =~ m{ ^ title:\# (.*) \# }xmg;
    # keep only the titles that were seen exactly once
    my @once = grep { $titles{$_}==1 } sort keys %titles;
    die "No title found in $filename" unless @once;
    warn "More than one title found in $filename" if @once>1;
    my $title = $once[0];
    print "$title\n";
}

__END__
this is text I want 1
this is text I want 2
this is text I want 3
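The capture above is a greedy (.*), so if a title: line ever had more than one # on it, the captured title would run up to the last # on that line. If it turns out that titles never contain # (an assumption, since the question doesn't say), a stricter capture that stops at the first # avoids that. Here is a minimal sketch under that assumption; unique_titles is a hypothetical helper name, not part of the code above.

```perl
use warnings;
use strict;

# Hypothetical helper: returns the titles that occur exactly once in $text.
# ASSUMPTION: a title never contains '#', so the capture stops at the
# first '#' after "title:#" instead of the last '#' on the line.
sub unique_titles {
    my ($text) = @_;
    my %count;
    # [^\#\n]* cannot run past the first closing '#' or the end of the line
    $count{$1}++ while $text =~ m{ ^ title:\# ([^\#\n]*) \# }xmg;
    # same "seen exactly once" filter as in the code above
    return grep { $count{$_} == 1 } sort keys %count;
}

my $text = "blah\ntitle:#wanted# note #1\ntitle:#dup#\ntitle:#dup#\n";
print "$_\n" for unique_titles($text);
```

With the original (.*), the line "title:#wanted# note #1" would instead yield "wanted# note ", so which version is appropriate depends on the answers to the open questions about the title: line.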
In reply to Re: Remove all duplicates after regex capture by haukex, in thread Remove all duplicates after regex capture by Maire