G'day kpr,

Welcome to the monastery.

"Is it possible to combine two patterns into normal split function or m/.../g pattern and capture the text from the middle?"

Yes, this is possible. There's also many ways to do it. I see a number of solutions have already been posted. Here's another one.

I've only included code that's been available since v5.8 or earlier. I've shown how to use the 'x' modifier to make your regex easier to read, easier to maintain and, generally, easier to deal with. Uncomment any of the "my $re = ..." lines to see that they all work the same (obviously, only uncomment one at a time). I've used test data that you posted as well as that provided by some of the monks who've already posted solutions.

Here's the code. Either "perlre - Perl regular expressions" or "perlvar: Variables related to regular expressions" should supply answers to any questions you have; if not, feel free to ask.

#!/usr/bin/env perl -l use strict; use warnings; my @lines = ( 'Iteration {Applied Field} {Total Energy} Mx', '{blaa blaa blaa}', 'Iteration {Applied Field} {Total Energy} Mx', 'Iteration {Applied Field} A {Total Energy} Mx F G {Third tes +t line}', 'Iteration {Applied Field} {Total Energy} Mx Fx {a} B C D {E F +} G', 'Iteration {Applied Field} {Total Energy} {Foo} { a b c d } + Mx', ); my $re_hard_to_read = qr/(?>[{]([^}]+)|\s*(?![{}])(\S+))/; my $re_easy_to_read = qr/(?> [{] ( [^}]+ ) | \s* (?! [{}] ) ( \S+ ) )/x; my $re_fully_annotated = qr/ (?> # start non-capturing, non-backtracking, alternati +on [{] # MATCH: exactly one literal left brace ( # start capture [^}]+ # CAPTURE: one or more of any character except rig +ht brace ) # end capture | # - OR - \s* # MATCH: zero or more whitespace (?! # start zero-width negative lookahead assertion [{}] # ASSERT: next character is not left or right brac +e ) # end zero-width negative lookahead assertion ( # start capture \S+ # CAPTURE: one or more non-whitespace characters ) # end capture ) # end non-capturing, non-backtracking, alternation /x; #my $re = $re_hard_to_read; #my $re = $re_easy_to_read; my $re = $re_fully_annotated; for (@lines) { # Print output heading print join "\n$_\n" => ('-' x 60) x 2; # Array to hold header names my @header_names; # Capture header names push @header_names => $+ while /$re/g; # Output header names print join "\n" => @header_names; }

The output starts like this:

------------------------------------------------------------ Iteration {Applied Field} {Total Energy} Mx ------------------------------------------------------------ Iteration Applied Field Total Energy Mx ------------------------------------------------------------ {blaa blaa blaa} ------------------------------------------------------------ blaa blaa blaa

Here's the rest:

-- Ken


In reply to Re: Splitting string using two overlapping patterns by kcott
in thread Splitting string using two overlapping patterns by kpr

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.