Re^3: Edit complex file
by johngg (Canon) on May 03, 2006 at 19:15 UTC
The next construct is the tricky bit. The (?= ... ) is called a zero-width positive look-ahead assertion; I think I've got that right. Basically, the regular expression engine keeps track of where it has reached in the string; the look-ahead says to the engine, staying where you are, look further along from this point to see whether a given pattern matches. In our case we are looking for one of two things: one or more digits followed by a tab (the \d+\t) or the end of the string (the \z), in effect EOF. The (?: ... ) uses the '(' and ')' to group the alternation ('|' is the regular expression "or"), and the ?: makes the group non-capturing (it switches off regular expression memory), because we aren't interested in what the look-ahead finds, only that it has found it. The line that performs the match
does a couple of things. It takes our previously constructed regular expression and matches it against $_, which is the default target. The thing to note is that the match is done globally, with the /g flag. Because of /g, the engine keeps going along the string finding matches, and because the match is in list context and the expression uses capturing parentheses (regular expression memory), everything it captures is assigned to the @items list, all in one fell swoop. As an aside, if we had slurped the file into a lexical variable instead,
you couldn't rely on the default matching against $_, so you would bind the match to that variable explicitly with the =~ operator.
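Here is a minimal sketch of both forms of the match. The sample data, the variable names ($data, $rxExtract, @same) and the capturing part of the pattern are my own assumptions for illustration; only the look-ahead follows the description above.

```perl
use strict;
use warnings;

# Hypothetical sample data: each item starts with digits and a tab
# and may run over several lines.
my $data = "1\tfirst line\nsecond line\n2\tonly line\n";

# Assumed pattern: capture an item lazily, stopping where the look-ahead
# sees the next "digits then tab" or the end of the string.
# The /s modifier lets . match newlines inside an item.
my $rxExtract = qr{(\d+\t.*?)(?=(?:\d+\t|\z))}s;

# Matching against a lexical variable needs the explicit =~ binding.
my @items = $data =~ /$rxExtract/g;

# Matching against $_ uses the default target, so no binding is needed.
my @same;
for ($data) {
    @same = /$rxExtract/g;
}

printf "%d items\n", scalar @items;
```

Because the look-ahead consumes nothing, the next match starts right at the digits that ended the previous item, so no data falls between the elements.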
We now have each data item in its own element in the list, but the items still contain the unwanted newlines that you wish to turn into tabs. We can again use a look-ahead assertion, this time in a substitution. We want to replace a newline only if it is followed by another character; it doesn't matter which character. We don't want to touch the last newline in the data item, as we want that in our modified data file, and it will not be followed by anything else. The \n(?=.) says a newline followed by some single character, and because the look-ahead consumes no characters, leaving the pointer just after the newline, only the newline gets replaced. The final loop
iterates over @items, aliasing each element in turn to $_ and doing a global substitution of any newline in the middle of the data item with a tab. I hope this makes things clearer for you. Cheers, JohnGG
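For reference, a minimal sketch of the substitution step, assuming @items already holds data shaped like the hypothetical items below (the statement-modifier form of the loop is one way to write it, not necessarily the original):

```perl
use strict;
use warnings;

# Hypothetical items, as they might look after the extraction step.
my @items = ("1\tfirst line\nsecond line\n", "2\tonly line\n");

# Replace a newline with a tab only when another character follows it.
# Without the /s modifier . does not match a newline, and the final
# newline has nothing after it, so it is left alone.
s/\n(?=.)/\t/g for @items;

print @items;
```

Note that, written this way, a newline followed directly by another newline is also left untouched, since . does not match a newline unless /s is in effect.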
by Anonymous Monk on May 03, 2006 at 21:16 UTC