while (<CONFIG>) {
    if (/^\s*#/) {
        # ignore comment line
    } elsif (/^\s*$/) {
        # ignore blank line
    } elsif (/(\w+)\s*=\s*[<]{2}(\w+)/) {
        # heredoc
        (my $name, local $/) = ($1, "\n$2"); # ++ysth
        $config{$name} = <CONFIG>;
        chomp $config{$name}; # as etcshadow points out.
    } elsif (/(\w+)\s*=\s*(.*?)\s*$/) {
        # regular pair
        $config{$1}=$2;
    } else {
        warn "Ptooey: Could not parse config line: $_\n";
    }
}
This does not handle the sorts of heredocs where the type of quoting is specified (e.g., <<'HEREDOC'), however. That could be a future improvement, if you need it.
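For anyone who wants to try the loop without a real file, here is a minimal sketch that wraps it in a sub and reads from an in-memory filehandle. The sub name parse_config and the sample text are made up for illustration; the regexes and the $/ trick are the same as in the loop above.

```perl
use strict;
use warnings;

# parse_config is a hypothetical wrapper, not part of the original post.
sub parse_config {
    my ($text) = @_;
    my %config;
    # Open the string itself as a file, so no file on disk is needed.
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    while (<$fh>) {
        if (/^\s*#/) {
            next;    # comment line
        } elsif (/^\s*$/) {
            next;    # blank line
        } elsif (/(\w+)\s*=\s*[<]{2}(\w+)/) {
            # heredoc: read up to a newline followed by the terminator
            (my $name, local $/) = ($1, "\n$2");
            $config{$name} = <$fh>;
            chomp $config{$name};    # strips the "\n$2" terminator
        } elsif (/(\w+)\s*=\s*(.*?)\s*$/) {
            $config{$1} = $2;        # regular pair
        } else {
            warn "Could not parse config line: $_";
        }
    }
    close $fh;
    return %config;
}

# Made-up sample input.
my $sample = "# a comment\nname = example\n\nmotd = <<EOT\nHello,\nworld\nEOT\ncolor = blue\n";
my %config = parse_config($sample);
print "$_ => [$config{$_}]\n" for sort keys %config;
```

The motd value comes back as the two lines between the marker and the terminator, and the pairs before and after the heredoc parse as usual because $/ is restored once the elsif block ends.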
if you have a regex could you explain how it works?
The first couple are pretty basic, assuming you know that \s matches whitespace (spaces, tabs, and so forth), so I'll let you figure those out on your own. The other two bear more explaining... I'll start with the last one:
/(\w+)\s*=\s*(.*?)\s*$/
\w matches a word character (letters, numbers, underscore, ...). + means one or more, and the parens capture those word characters to $1. Then you have an equal sign (possibly surrounded by zero or more whitespace characters). After that, this variation slurps forward, taking as few characters as possible (that's what the ? is for, to make it non-greedy) for $2, until it encounters the whitespace at the end of the line.
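If it helps to see the captures, here is a small sketch that runs that regex over a few made-up lines. The helper name split_pair and the inputs are only for illustration.

```perl
use strict;
use warnings;

# split_pair is a hypothetical helper: returns (name, value) or an empty list.
sub split_pair {
    my ($line) = @_;
    return $line =~ /(\w+)\s*=\s*(.*?)\s*$/ ? ($1, $2) : ();
}

# Made-up lines with whitespace in different places.
for my $line ("colour=blue\n", "  path =  /usr/local/bin   \n", "title = two words \n") {
    my ($name, $value) = split_pair($line);
    print "name=[$name] value=[$value]\n";
}
```

Because (.*?) is non-greedy and is followed by \s*$, the trailing spaces end up outside $2, while whitespace inside the value (as in "two words") is kept.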
The one you're probably most interested in is the one that does the here document:
} elsif (/(\w+)\s*=\s*[<]{2}(\w+)/) {
    # heredoc
    (my $name, local $/) = ($1, $2);
    $config{$name} = <CONFIG>;
The first part is the same, matching the name of the config option and the equal sign, with any surrounding whitespace. I put the less-than symbol in a character class because I couldn't remember whether it's a special character in the main part of a regular expression. (I don't think so, but I wanted to be safe and give you code I knew would work.) The {2} is just a quantifier, telling how many times we want to match the preceding atom, so altogether that matches two less-than symbols in a row. Then, as before, it matches a series of one or more word characters.
Now, the trick is that I didn't use the regex to match the rest of the here document: I grabbed the key from the regex and also the string used to mark the end of the here document, then I set the input record separator ($/) to that string, which causes the next read on the filehandle to go forward until it hits that point. This does have a weakness, in that a true here document can have that string in the document as long as it's not on a line by itself, but for config file purposes I figured I'd take the shortcut. The local qualifier on the assignment to $/ ensures that when the elsif block is exited the input record separator returns to its normal state, so that subsequent reads on the filehandle work as per normal.
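To watch $/ do its thing in isolation, here is a rough sketch using an in-memory filehandle. The helper read_until and the sample text are invented for the demonstration. It tries the terminator on its own, as in the excerpt, and then with a leading newline, as in the full loop at the top.

```perl
use strict;
use warnings;

# read_until is a hypothetical helper: reads one "record" ending in
# $separator, then one ordinary line after $/ has been restored.
sub read_until {
    my ($text, $separator) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my $body;
    {
        local $/ = $separator;   # only in effect inside this block
        $body = <$fh>;           # reads through the first $separator
        chomp $body;             # chomp removes the current $/ value
    }
    my $next_line = <$fh>;       # $/ is "\n" again here
    close $fh;
    return ($body, $next_line);
}

# Made-up text where the terminator also appears inside the body.
my $text = "line one\nsee the END of this\nEND\nafter\n";

my ($short) = read_until($text, "END");     # stops at the first END
my ($full)  = read_until($text, "\nEND");   # stops at END starting a line

print "terminator alone: [$short]\n";
print "with newline:     [$full]\n";
```

With the bare terminator the read stops in the middle of the second line, which is the weakness described above. Putting "\n" in front of it narrows that to an END at the start of a line, though a line that merely begins with the terminator would still end the read early.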
$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$ ;->();print$/
In reply to Re: regex for here doc?
by jonadab
in thread regex for here doc?
by smackdab