in reply to Wierd funky problems with split

split takes a regex as its first argument. The | characters are thus being interpreted as regex alternation characters. This works:

use strict;

my ($pointer, $id, $title);

while (<DATA>) {
    chomp;
    ($pointer, $id, $title) = split(/\|\|/);
    print "pointer=$pointer, id=$id, title=$title\n"
}

__DATA__
23||record1||The Title
1054||record2||The Title #2
2023||record3||The Title #3

But note that I used / as the delimiters, not ". This is the part I don't quite understand. With " you need to escape each pipe with two backslashes, presumably because there's an extra interpolation pass or something? In other words, why do I need this when using double quotes?

($pointer, $id, $title) = split("\\|\\|");

I should know this but don't off the top of my head. Looking it up now ...

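Without escaping the | characters, the pattern is an alternation of empty alternatives, which matches the empty string, so you're telling split to split between every single character. The result I get when I print using your example is the first three characters of each line, as expected.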
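To make the difference concrete, here is a minimal sketch comparing the unescaped and escaped patterns on one of the sample records. The variable names ($line, @chars, @fields) are my own and not from the original question.

```perl
use strict;
use warnings;

# One record from the sample data above.
my $line = '23||record1||The Title';

# Unescaped: /||/ is an alternation of empty alternatives, so it matches
# the empty string and split returns one field per character.
my @chars = split(/||/, $line);

# Escaped: the pattern matches the literal two-character delimiter.
my @fields = split(/\|\|/, $line);

print "unescaped, first three fields: @chars[0..2]\n";   # 2 3 |
print "escaped, all fields: ", join(', ', @fields), "\n"; # 23, record1, The Title
```

Assigning the unescaped result to three scalars keeps only those first three one-character fields, which matches the output described above.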

Re: Re: Wierd funky problems with split
by steves (Curate) on Jan 02, 2003 at 04:38 UTC

    To try and answer my own question, I believe section 5.3 of Programming Perl, 3rd Edition answers this in the following paragraphs:

    There is some amount of overlap between the characters that a pattern can match and the characters an ordinary double-quoted string can interpolate. Since regexes undergo two passes, it is sometimes ambiguous which pass should process a given character. When there is ambiguity, the variable interpolation pass defers the interpretation of such characters to the regular expression parser.

    But the variable interpolation pass can only defer to the regex parser when it knows it is parsing a regex. You can specify regular expressions as ordinary double-quoted strings, but then you must follow normal double-quote rules. Any of the previous metasymbols that happen to map to actual characters will still work, even though they're not being deferred to the regex parser. But you can't use any of the other metasymbols in ordinary double quotes (or in any similar constructs such as `...`, qq(...), qx(...), or the equivalent here documents). If you want your string to be parsed as a regular expression without doing any matching, you should be using the qr// (quote regex) operator.

    In other words, with double quotes there is no clue up front it's going to be used as a regex, so normal double quote interpolation occurs right off. In the case of "\|\|", that pass produces "||", which is then passed to the regex parser. By using "\\|\\|" that first pass instead produces "\|\|" which the regex parser interprets as literal (escaped) pipes. I believe using /\|\|/ causes it to be treated as a regex up front, bypassing that initial removal of the backslashes.
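The two passes can be observed directly by printing what each double-quoted string contains before split ever sees it. This is only a sketch of the behaviour described above; the variable names are made up for illustration, and qr// is included because the quoted passage recommends it.

```perl
use strict;
use warnings;

# What the double-quote pass hands on to the regex parser.
my $one_backslash = "\|\|";     # backslashes are consumed here
my $two_backslash = "\\|\\|";   # each \\ collapses to a single backslash

print "one backslash each:  $one_backslash\n";   # ||
print "two backslashes each: $two_backslash\n";  # \|\|

my $line = '23||record1||The Title';

# All three of these split on the literal || delimiter.
my @from_string = split($two_backslash, $line);  # string compiled as a regex at run time
my @from_slash  = split(/\|\|/, $line);          # parsed as a regex from the start
my @from_qr     = split(qr/\|\|/, $line);        # precompiled regex object

print join(', ', @from_string), "\n";
print join(', ', @from_slash), "\n";
print join(', ', @from_qr), "\n";
```

Using /.../ or qr/.../ avoids the doubled backslashes because the text is known to be a regex when it is first parsed.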