but it doesn't work right.
I got burned by this a couple of months ago. The first argument to split, with one exception, is a regular expression, whether it looks like one or not. And '|' has a special meaning in regular expressions. Change
($pointer, $id, $title) = split("||");
to
($pointer, $id, $title) = split "\|\|";
and you'll get better results.
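For reference, the "one exception" is a pattern given as a string holding a single space. In that case split skips leading whitespace and splits on runs of whitespace, much as awk does. A minimal sketch, with a made-up sample string that is not from the original post:

```perl
use strict;
use warnings;

# Hypothetical sample line, chosen only to show the difference.
my $line = "  alpha  beta gamma";

# ' ' as a string is the special case: leading whitespace is skipped
# and fields are separated by runs of whitespace.
my @awk_style = split ' ', $line;

# / / as a regex is not special: it splits on every single space,
# so empty fields show up between consecutive spaces.
my @regex_style = split / /, $line;

print scalar(@awk_style),   " fields: ", join(",", @awk_style),   "\n";
print scalar(@regex_style), " fields: ", join(",", @regex_style), "\n";
```

Every other first argument, quoted or not, ends up being treated as a regular expression.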
Your answer is not quite right. Test with this:
$a = "12||23||34||45";
@a = split("\|\|", $a);
print join(",", @a);
Instead of giving you,
12,23,34,45
It gives you
1,2,|,|,2,3,|,|,3,4,|,|,4,5
You have to say:
$a = "12||23||34||45";
@a = split(/\|\|/, $a);
print join(",", @a);
Or, if you want to use quotes, say:
$a = "12||23||34||45";
@a = split("\\|\\|", $a);
print join(",", @a);
The reason is simple:
- The "|" has to be escaped inside a regexp, but not inside double quotes.
- The "\" has to be escaped inside double quotes.
So if you say "\\|",
- First the double quotes interpret it as "\|",
- Then the regexp interprets that as a literal "|".
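The two passes can be made visible by printing what the double quotes leave behind before split ever sees it. This is only a sketch built on the test string used in this post; the variable names are made up for illustration:

```perl
use strict;
use warnings;

# What the double-quote pass produces.
my $one_backslash = "\|\|";     # the quotes drop the backslashes: ||
my $two_backslash = "\\|\\|";   # the quotes keep one backslash each: \|\|

print "one backslash:   $one_backslash\n";
print "two backslashes: $two_backslash\n";

my $data = "12||23||34||45";

# The regex || is an alternation of empty patterns, so it matches the
# empty string and split breaks the input between every character.
my @wrong = split($one_backslash, $data);

# The regex \|\| matches two literal pipes.
my @right = split($two_backslash, $data);

print join(",", @wrong), "\n";
print join(",", @right), "\n";
```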
split takes a regex as its first argument. The | characters are thus being interpreted as regex alternation characters. This works:
use strict;
my ($pointer, $id, $title);
while (<DATA>) {
    chomp;
    ($pointer, $id, $title) = split(/\|\|/);
    print "pointer=$pointer, id=$id, title=$title\n";
}
__DATA__
23||record1||The Title
1054||record2||The Title #2
2023||record3||The Title #3
But note that I used the / quotes -- not ". This is one I don't quite understand. Using " you need to escape with two backslashes; presumably because there's an extra interpolation pass or something? In other words, why do I need this using double quotes?
($pointer, $id, $title) = split("\\|\\|");
I should know this but don't off the top of my head. Looking it up now ...
Without escaping the | characters you're telling split to split on single characters. The result I get when I print using your example is the first three characters of each line as expected.
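A small sketch of that failure, run against the first record from the __DATA__ section above. With the pipes unescaped, each character becomes its own field and the three variables receive the first three characters:

```perl
use strict;
use warnings;

my $line = "23||record1||The Title";

# Unescaped pipes: the pattern matches the empty string, so the line is
# split between every character and only the first three are kept.
my ($pointer, $id, $title) = split("||", $line);

print "pointer=$pointer, id=$id, title=$title\n";
```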
To try and answer my own question, I believe section 5.3 of Programming Perl, 3rd Edition answers this in the following paragraphs:
There is some amount of overlap between the characters that a pattern can match and the characters an ordinary double-quoted string can interpolate. Since regexes undergo two passes, it is sometimes ambiguous which pass should process a given character. When there is ambiguity, the variable interpolation pass defers the interpretation of such characters to the regular expression parser.
But the variable interpolation pass can only defer to the regex parser when it knows it is parsing a regex. You can specify regular expressions as ordinary double-quoted strings, but then you must follow normal double-quote rules. Any of the previous metasymbols that happen to map to actual characters will still work, even though they're not being deferred to the regex parser. But you can't use any of the other metasymbols in ordinary double quotes (or in any similar constructs such as `...`, qq(...), qx(...), or the equivalent here documents). If you want your string to be parsed as a regular expression without doing any matching, you should be using the qr// (quote regex) operator.
In other words, with double quotes there is no clue up front it's going to be used as a regex, so normal double quote interpolation occurs right off. In the case of "\|\|", that pass produces "||", which is then passed to the regex parser. By using "\\|\\|" that first pass instead produces "\|\|" which the regex parser interprets as literal (escaped) pipes. I believe using /\|\|/ causes it to be treated as a regex up front, bypassing that initial removal of the backslashes.
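The qr// operator mentioned in the quoted passage gives the same up-front regex treatment while still letting the pattern live in a variable. A minimal sketch, assuming the same record layout as above ($sep is just an illustrative name):

```perl
use strict;
use warnings;

# qr// is parsed as a regex from the start, so one backslash per pipe
# is enough, exactly as with /\|\|/.
my $sep = qr/\|\|/;

my $line = "23||record1||The Title";
my ($pointer, $id, $title) = split($sep, $line);

print "pointer=$pointer, id=$id, title=$title\n";
```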
You could also just change the record delimiter to a tab, ":" or "=", then split on \t, : or = and avoid the extra typing :-)
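A quick sketch of what that would look like, assuming the file could be rewritten with a different separator. The sample records here are made up; none of these separators is a regex metacharacter, so no escaping is needed:

```perl
use strict;
use warnings;

# Hypothetical records using a tab and a colon instead of "||".
my $tabbed = "23\trecord1\tThe Title";
my $coloned = "23:record1:The Title";

my @from_tab   = split(/\t/, $tabbed);
my @from_colon = split(/:/,  $coloned);

print join(",", @from_tab),   "\n";
print join(",", @from_colon), "\n";
```

This only works if the chosen separator can never appear inside a field such as the title.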
I'm going to hazard a guess that this is an Ultimate Bulletin Board file he's trying to parse - that's certainly what it looks like. If this is the case, switching the separator is non-trivial (mostly because UBB, or at least each of the versions I've used, is a horrible, multi-file, spaghetti mess). Why they chose || as a separator I'll never know.
Thanks to everyone who helped, your solutions worked fine :)
I feel pretty stupid now, but I got it :)
($pointer, $id, $title) = split ("||");
I'm not a big fan of leaning toothpicks, so in this situation I'd probably use '\|\|' rather than /\|\|/ or "\\|\\|" but there isn't much chance of making this statement beautiful given the bizarre choice of field separator. Use of quote operators such as q() only clutters things up more. You would be wise to write yourself some clarifying comments here. :-)
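For comparison, here are the three spellings side by side on one sample record. The fourth, a character class, is not from this thread; it is included only as one more way to avoid backslashes altogether:

```perl
use strict;
use warnings;

my $line = "23||record1||The Title";

my @single  = split('\|\|',   $line);  # single quotes keep the backslashes
my @slashes = split(/\|\|/,   $line);  # regex delimiters, one backslash each
my @double  = split("\\|\\|", $line);  # double quotes need doubled backslashes
my @class   = split(/[|][|]/, $line);  # | is literal inside a character class

# Each spelling should yield the same three fields.
print join(",", @$_), "\n" for \@single, \@slashes, \@double, \@class;
```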