If you have a pattern which matches the beginning of a "paragraph", you can use the following code to partition a string into "paragraphs". Notes: If the first match occurs after the beginning of the string, the first element will not match the pattern. The pattern should not match the empty string.

    @list = split /(?=PATTERN)/;

For example, to split AIX stanza files (e.g. /etc/security/passwd):

    # Choose one of the two patterns:
    #   my $pat = qr/^[ \t]*[^\s:]+:[ \t]*$/m;   # allow leading/trailing ws
    my $pat = qr/^[^\s:]+:/m;
    $_ = slurp_file;
    my @stanzas = split /(?=$pat)/o;

If you have a pattern which matches the end of a "paragraph", you can use the following code to partition a string into "paragraphs". Notes: This code properly handles a missing delimiter at the end of the string. The pattern should not match the empty string.

    @list = /( .*? PATTERN | .+ )/gsx;

For example, to split paragraphs based on one or more blank lines at the end of a paragraph, use the following. Note the added complication of handling a non-newline-terminated line at the end of the string.

    my $pat = qr/(?:^[ \t]*\n)+(?:[ \t]+\z)?/m;
    $_ = slurp_file;
    my @list = /( .*? $pat | .+ )/ogsx;

If you don't care about capturing the blank lines between paragraphs, you can use the following code. Notes: This properly handles a non-newline-terminated blank line at the end of the string. The first list element will be empty if the string starts with a blank line. The second line of code will remove such a list element.

    my @list = split /^\s*(?:\n|\z)/m;
    shift @list if @list && $list[0] eq "";   # remove empty first element

Here is a pattern which can be used to split a string based on a delimiter followed by zero or more blank lines. It properly handles a non-newline-terminated blank line at the end of the string.

    my $delim = qr/^[ \t]*SOMETHING[ \t]*$/m;
    my $pat = qr/$delim(?:\n[ \t]*)*(?:\n|\z)/o;

Here are two subroutines which can be used to partition a string into paragraphs. Both default to $_ when no string is passed.

    # Partition a string into paragraphs based on a
    # pattern which matches the beginning of a paragraph.
    sub partition_para_beg {
        my ($pat, $str) = @_;
        $str = $_ unless defined $str;
        if ("" =~ /$pat/) {
            require Carp;
            Carp::croak("invalid pattern matches empty string: \"$pat\"\n");
        }
        # Split the string that was passed in, not $_.
        return split /(?=$pat)/, $str;
    }

    # Partition a string into paragraphs based on a
    # pattern which matches the end of a paragraph.
    sub partition_para_end {
        my ($pat, $str) = @_;
        $str = $_ unless defined $str;
        if ("" =~ /$pat/) {
            require Carp;
            Carp::croak("invalid pattern matches empty string: \"$pat\"\n");
        }
        return $str =~ /(.*?(?:$pat)|.+)/gs;
    }
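To make the two techniques concrete, here is a minimal, self-contained sketch that applies a beginning-of-paragraph pattern and an end-of-paragraph pattern to small inline strings. The sample data and the variable names ($stanzas, $text, @by_header, @by_blank) are made up for illustration, and the end pattern is a simplified form that ignores a trailing non-newline-terminated blank line.

```perl
use strict;
use warnings;

# Hypothetical sample data in the style of an AIX stanza file.
my $stanzas = "root:\n\tpassword = abc\n\tflags =\n\ndaemon:\n\tpassword = *\n";

# Beginning-of-paragraph pattern: a stanza header such as "root:".
# The lookahead makes split cut *before* each header without consuming it.
my $beg = qr/^[^\s:]+:/m;
my @by_header = split /(?=$beg)/, $stanzas;

# Hypothetical text whose last paragraph has no trailing delimiter.
my $text = "one\ntwo\n\nthree\n\n\nfour";

# End-of-paragraph pattern: one or more blank lines (simplified).
# The ".+" alternative picks up a final paragraph with no delimiter.
my $end = qr/(?:^[ \t]*\n)+/m;
my @by_blank = $text =~ /(.*?$end|.+)/gs;

printf "%d stanzas, %d paragraphs\n", scalar @by_header, scalar @by_blank;
# prints "2 stanzas, 3 paragraphs"
```

Each stanza keeps its header and each paragraph keeps its trailing blank lines, so joining either list reproduces the original string.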
In reply to Parsing a string into "paragraphs" by jrw