I'm trying to split header names for columns in a data file. The issue I encountered was awkward formatting. A simple example of such a header line is: 'Iteration {Applied Field} {Total Energy} Mx'
Each element is separated by \s+. However, if an element contains whitespace, it is surrounded by curly brackets. The two patterns I try to match are '\s+{([\s\w:]+)}\s+' and '\s+([\w:]+)\s+'. The latter sometimes overlaps the first, because it also matches the words inside the brackets ('{blaa blaa blaa}').
I'm looking for guidance on how to improve my current solution, i.e. try pattern 1 and, if it fails, try pattern 2, then remove the matching part from the beginning of the string. Is it possible to combine the two patterns into a normal split call or an m/.../g pattern and capture the text from the middle?

    use warnings;
    use strict;

    my $str = 'Iteration {Applied Field} {Total Energy} Mx';
    while (length($str) > 0) {
        #print "str:$str:\n";
        # Pattern 1: a name in curly brackets, which may contain whitespace.
        if ($str =~ m/^(\{([\s\w:]+)\}(\s+)?)/) {
            print "1: $2\n";
            # \Q...\E quotes the matched text so the brackets are taken literally.
            $str =~ s/^\Q$1\E//;
        }
        # Pattern 2: a bare name without whitespace.
        elsif ($str =~ m/^(([\w:]+)(\s+)?)/) {
            print "2: $2\n";
            $str =~ s/^\Q$1\E//;
        }
        else { die "error"; }
    }

Result:

    2: Iteration
    1: Applied Field
    1: Total Energy
    2: Mx
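As a rough sketch of how the two patterns might be combined into a single m/.../g match, the bracketed form and the bare form can be written as alternatives of one regex, each with its own capture group. The helper name `split_header` is hypothetical and not part of the original code, and the character classes are simply the ones from the two patterns above; this is one possible approach under those assumptions, not necessarily the best one.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the header
# names in order, with the curly brackets stripped from bracketed names.
sub split_header {
    my ($line) = @_;
    my @names;
    # One regex with two alternatives: a bracketed name captured in $1,
    # or a bare name captured in $2. Only one of the two is defined per match.
    while ($line =~ /\{([\s\w:]+)\}|([\w:]+)/g) {
        push @names, defined $1 ? $1 : $2;
    }
    return @names;
}

my @names = split_header('Iteration {Applied Field} {Total Energy} Mx');
print "$_\n" for @names;
```

Because matching with /g resumes after the closing bracket of each bracketed name, the bare-name alternative never gets a chance to match the words inside the brackets. Note that this version silently skips any characters that fit neither alternative instead of dying with an error as the loop above does.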
In reply to Splitting string using two overlapping patterns by kpr