throop has asked for the wisdom of the Perl Monks concerning the following question:

Brethren

I seek to split text, based on a bunch of keywords in the text, and shove it into a hash. I've been staring at 'Chapter 6: Pattern Matching' in the Perl Cookbook, but I just can't seem to get my mind around it. I'm pretty sure I need to be using the \G anchor, and non-greedy matching, but... I have a set of keywords, stored as hash keys, like

$Keyptr = {PREFACE => 1, ANALYSIS => 1, DEFERRED => 1, 'NAME / NUMBER' => 1, CONCLUSION => 1, EFFECTS => 1, REMARKS => 1}

My input looks like

@inputs = ("fronttrash PREFACE: mumble ^M ANALYSIS: yada ^M yada CONCLUSION: drone drone ^M REMARKS: ixnay", "ANALYSIS: Chuckle REMARKS: Yada2 DEFERRED: blahblah ^M blah ^M NAME / NUMBER: John 369")

I want to coerce it into something like
[{PREFACE => "mumble", ANALYSIS => "yada ^M yada", CONCLUSION => "drone drone", REMARKS => "ixnay"}, {ANALYSIS => "Chuckle", REMARKS => "Yada2", "NAME / NUMBER" => "John 369", DEFERRED => "blahblah^M blah"}]
Is this the kind of thing that can/should be done in a single golfed line? Alternatively, is there a module that I should be looking at?

Thanks
Throop

Replies are listed 'Best First'.
Re: Regex help - butchering text into paragraphs
by pc88mxer (Vicar) on May 02, 2008 at 19:03 UTC
    One way is to use split:
    my @list;
    for my $line (@inputs) {
        my @fields = split(/(PREFACE|ANALYSIS|...): /, $line);
        shift @fields;
        my $record = {};
        while (@fields) {
            my ($tag, $content) = splice(@fields, 0, 2);
            $record->{$tag} = $content;
        }
        push(@list, $record);
    }
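
For anyone who wants to run the split approach end to end, here is a minimal sketch. It assumes the keyword list comes from the keys of $Keyptr, as in the question, instead of being typed out in the pattern. It also assumes that the "^M" in the sample data is literal text and that a trailing "^M" should be dropped, which is how the desired output in the question reads. The names $alt and %record are only illustrative.

```perl
use strict;
use warnings;

# Keyword set and sample inputs copied from the question.
my $Keyptr = {
    PREFACE => 1, ANALYSIS => 1, DEFERRED => 1, 'NAME / NUMBER' => 1,
    CONCLUSION => 1, EFFECTS => 1, REMARKS => 1,
};
my @inputs = (
    "fronttrash PREFACE: mumble ^M ANALYSIS: yada ^M yada CONCLUSION: drone drone ^M REMARKS: ixnay",
    "ANALYSIS: Chuckle REMARKS: Yada2 DEFERRED: blahblah ^M blah ^M NAME / NUMBER: John 369",
);

# Build the alternation from the hash keys. quotemeta escapes the
# spaces and the slash in 'NAME / NUMBER'.
my $alt = join '|', map { quotemeta($_) } sort keys %$Keyptr;

my @list;
for my $line (@inputs) {
    # The capturing group makes split return the keywords as well as
    # the text between them.
    my @fields = split /($alt):\s*/, $line;
    shift @fields;    # drop whatever precedes the first keyword
    my %record;
    while (@fields) {
        my ($tag, $content) = splice @fields, 0, 2;
        $content = '' unless defined $content;    # keyword with no text after it
        # Assumption: trailing whitespace and trailing literal "^M"
        # markers are noise and can be removed.
        $content =~ s/(?:\s|\^M)+\z//;
        $record{$tag} = $content;
    }
    push @list, \%record;
}
```

After this runs, @list holds one hash reference per input string. If a keyword occurs twice in one string, the later text overwrites the earlier one, which may or may not be what is wanted.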
Re: Regex help - butchering text into paragraphs
by mwah (Hermit) on May 02, 2008 at 22:42 UTC
    Throop:
    Is this the kind of thing that can/should be done in a single golfed line? Alternatively, is there a module that I should be looking at?

    Is this part of a larger project? If so, golf wouldn't make much sense.

    A simple way, in addition to the solution already posted, would be to combine the keys into a single search expression.

    From your description, it isn't completely clear (to me) what you are trying to do. I'll give my alternative below:

    # ==> $Keyptr, @inputs as in your code
    my $rg = join '|', map quotemeta, keys %$Keyptr;
    my @hits;
    push @hits, { /($rg):(.+?)(?=(?:$rg)|$)/g } for @inputs;
    # that's it

    You could dump the result w/the following loop:

    ...
    for my $rec (@hits) {
        print "====\n";
        print "$_ => $rec->{$_}\n" for sort keys %$rec
    }
    ...

    This would then print:

    ====
    ANALYSIS => yada ^M yada
    CONCLUSION => drone drone ^M
    PREFACE => mumble ^M
    REMARKS => ixnay
    ====
    ANALYSIS => Chuckle
    DEFERRED => blahblah ^M blah ^M
    NAME / NUMBER => John 369
    REMARKS => Yada2
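
The (.+?) capture in the pattern keeps the whitespace on either side of each value. A possible variant of the same global-match idea that trims it is sketched below. The changes are my own assumptions, not part of the code above: the keywords are sorted longest first so that one keyword can never shadow a longer one sharing its prefix, the lookahead also requires the colon, and \s* on both sides of the capture does the trimming. The array @keywords is an illustrative stand-in for keys %$Keyptr.

```perl
use strict;
use warnings;

# Keywords and sample inputs copied from the question.
my @keywords = ('PREFACE', 'ANALYSIS', 'DEFERRED', 'NAME / NUMBER',
                'CONCLUSION', 'EFFECTS', 'REMARKS');
my @inputs = (
    "fronttrash PREFACE: mumble ^M ANALYSIS: yada ^M yada CONCLUSION: drone drone ^M REMARKS: ixnay",
    "ANALYSIS: Chuckle REMARKS: Yada2 DEFERRED: blahblah ^M blah ^M NAME / NUMBER: John 369",
);

# Longest keywords first, so a short keyword cannot match where a
# longer one beginning with the same letters was intended.
my $rg = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) || $a cmp $b } @keywords;

my @hits;
for my $line (@inputs) {
    # In list context, //g returns every (keyword, text) capture pair,
    # which is the flat list a hash assignment expects. The lazy
    # capture stops at the next "KEYWORD:" or at the end of the string.
    my %rec = $line =~ /($rg):\s*(.*?)\s*(?=(?:$rg):|\z)/gs;
    push @hits, \%rec;
}
```

This leaves the literal "^M" markers in the values; only the surrounding whitespace is removed. The dump loop shown earlier works unchanged on @hits.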

    Regards

    mwa