toonski has asked for the wisdom of the Perl Monks concerning the following question:

okay, i'm using an index file to store information about a title, and it looks kinda like this:

23||record1||The Title
1054||record2||The Title #2
2023||record3||The Title #3

so i open the file index and store the data in a hash (associative array) like this:

open (FILE, "file.index");
while (<FILE>) {
    chomp;
    ($pointer, $id, $title) = split ("||");
    $fileid{$id} = $pointer;
    $filetitle{$id} = $title;
}

but it doesn't work right. and when i actually print the $id, $pointer, $title, it returns the length of the split string rather than the split string itself. it's weird.

Re: Wierd funky problems with split
by dws (Chancellor) on Jan 02, 2003 at 03:46 UTC
    but it doesnt work right.

    I got burned by this a couple of months ago. The first argument to split, with one exception, is a regular expression, whether it looks like one or not. And '|' has a special meaning in regular expressions. Change

    ($pointer, $id, $title) = split("||");

    to

    ($pointer, $id, $title) = split "\|\|";

    and you'll get better results.
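
    The "one exception" is presumably the single-space string ' ', which split treats specially instead of as a one-space regex. A minimal sketch of the difference, using a made-up input line (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $line = "  alpha   beta gamma";   # sample input, invented for illustration

# A literal single-space string is the special case: leading whitespace
# is skipped and the string is split on runs of whitespace.
my @awk_mode = split ' ', $line;

# A regex of one space splits on every single space, so leading and
# repeated spaces produce empty fields.
my @regex_mode = split / /, $line;

print scalar(@awk_mode),   " fields: ", join(",", @awk_mode),   "\n";
print scalar(@regex_mode), " fields: ", join(",", @regex_mode), "\n";
```

    Anything else in that first slot, "||" included, goes through the regex engine.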

      Your answer is not quite right. Test with this:

      $a = "12||23||34||45";
      @a = split("\|\|", $a);
      print join(",", @a);

      Instead of giving you

      12,23,34,45

      it gives you

      1,2,|,|,2,3,|,|,3,4,|,|,4,5

      You have to say:

      $a = "12||23||34||45";
      @a = split(/\|\|/, $a);
      print join(",", @a);

      Or, if you want to use quotes, say:

      $a = "12||23||34||45";
      @a = split("\\|\\|", $a);
      print join(",", @a);

      The reason is simple:
      1. The "|" has to be escaped within a regexp, but not within quotes.
      2. The "\" has to be escaped within quotes.
      So if you say "\\|",
      1. First it is interpreted by the quotes as "\|",
      2. Then it is further interpreted by the regexp as a literal "|".
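
      The two passes can be made visible by printing what the double quotes hand to the regex engine. This is only a sketch; the variable names are mine, not from the thread:

```perl
use strict;
use warnings;

# What the regex engine receives after double-quote processing.
my $one_backslash = "\|\|";     # the quotes drop the backslashes: ||
my $two_backslash = "\\|\\|";   # the quotes turn each \\ into \: \|\|

print "one backslash  : $one_backslash (", length($one_backslash), " chars)\n";
print "two backslashes: $two_backslash (", length($two_backslash), " chars)\n";

# Only the second string still escapes the pipes when used as a pattern.
my $record = "12||23||34||45";
my @fields = split $two_backslash, $record;
print join(",", @fields), "\n";
```
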
      Right on! And just to emphasize it in your own mind (and that of those who will come after you), you should always write it like a regex, e.g. split /\|\|/;

      Just a stylistic point...

      jdporter
      The 6th Rule of Perl Club is -- There is no Rule #6.

Re: Wierd funky problems with split
by steves (Curate) on Jan 02, 2003 at 04:03 UTC

    split takes a regex as its first argument. The | characters are thus being interpreted as regex alternation characters. This works:

    use strict;
    my ($pointer, $id, $title);
    while (<DATA>) {
        chomp;
        ($pointer, $id, $title) = split(/\|\|/);
        print "pointer=$pointer, id=$id, title=$title\n"
    }
    __DATA__
    23||record1||The Title
    1054||record2||The Title #2
    2023||record3||The Title #3

    But note that I used the / quotes -- not ". This is one I don't quite understand. Using " you need to escape with two backslashes; presumably because there's an extra interpolation pass or something? In other words, why do I need this using double quotes?

    ($pointer, $id, $title) = split("\\|\\|");

    I should know this but don't off the top of my head. Looking it up now ...

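    Without escaping the | characters you're telling split to split on single characters: the pattern || is an alternation of empty alternatives, so it matches the empty string between every pair of characters. The result I get when I print using your example is the first three characters of each line, one per variable, as expected.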
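
    A small sketch of that behaviour, using the first sample line from the question (the @chars array is only there to count the pieces):

```perl
use strict;
use warnings;

my $line = "23||record1||The Title";

# "||" is compiled as a regex made of empty alternatives, so it matches
# the empty string and split breaks the line into single characters.
my @chars = split("||", $line);

# Assigning to three scalars keeps only the first three pieces.
my ($pointer, $id, $title) = split("||", $line);

print "pieces: ", scalar(@chars), "\n";
print "pointer=$pointer, id=$id, title=$title\n";
```
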

      To try and answer my own question, I believe section 5.3 of Programming Perl, 3rd Edition answers this in the following paragraphs:

      There is some amount of overlap between the characters that a pattern can match and the characters an ordinary double-quoted string can interpolate. Since regexes undergo two passes, it is sometimes ambiguous which pass should process a given character. When there is ambiguity, the variable interpolation pass defers the interpretation of such characters to the regular expression parser.

      But the variable interpolation pass can only defer to the regex parser when it knows it is parsing a regex. You can specify regular expressions as ordinary double-quoted strings, but then you must follow normal double-quote rules. Any of the previous metasymbols that happen to map to actual characters will still work, even though they're not being deferred to the regex parser. But you can't use any of the other metasymbols in ordinary double quotes (or in any similar constructs such as `...`, qq(...), qx(...), or the equivalent here documents). If you want your string to be parsed as a regular expression without doing any matching, you should be using the qr// (quote regex) operator.

      In other words, with double quotes there is no clue up front it's going to be used as a regex, so normal double quote interpolation occurs right off. In the case of "\|\|", that pass produces "||", which is then passed to the regex parser. By using "\\|\\|" that first pass instead produces "\|\|" which the regex parser interprets as literal (escaped) pipes. I believe using /\|\|/ causes it to be treated as a regex up front, bypassing that initial removal of the backslashes.
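
      To check that reading, here is a sketch that runs the same record through the three spellings discussed: a /.../ pattern, a double-quoted string with doubled backslashes, and a qr// object as the quoted passage suggests. The variable names are hypothetical:

```perl
use strict;
use warnings;

my $record = "23||record1||The Title";

my $sep = qr/\|\|/;   # compiled as a regex up front, one backslash per pipe

my @from_slashes = split /\|\|/,    $record;
my @from_string  = split "\\|\\|",  $record;
my @from_qr      = split $sep,      $record;

# All three should print the same three fields.
print join(" / ", @$_), "\n" for \@from_slashes, \@from_string, \@from_qr;
```
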

Re: Wierd funky problems with split
by JamesNC (Chaplain) on Jan 02, 2003 at 05:06 UTC
    you could also just change the record delimiter to a tab, a : or an = and split on \t, : or = instead :) and avoid the extra typing :-)
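
    A sketch of what that would look like with a tab, assuming the index file could be rewritten (the tab-separated line below is invented for illustration):

```perl
use strict;
use warnings;

# Hypothetical tab-separated version of one index line.
my $line = "23\trecord1\tThe Title";

# A tab is not a regex metacharacter, so no pipe escaping is needed.
my ($pointer, $id, $title) = split /\t/, $line;
print "pointer=$pointer, id=$id, title=$title\n";
```

    The catch is that whichever separator is chosen must never appear inside a title.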

      I'm going to hazard a guess that this is an Ultimate Bulletin Board file he's trying to parse - that's certainly what it looks like. If this is the case, switching the separator is non-trivial (mostly because UBB, or at least each of the versions I've used, is a horrible, multi-file, spaghetti mess). Why they chose || as a separator I'll never know.

        lol; i'm actually writing online versions of those old quickhelp files for dos.

        http://perl.qb45.com/qboho/qb45hlp.cgi

      thanks to everyone that helped, your solutions worked fine :) i feel pretty stupid now, but i got it :)
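
      For reference, the loop from the question with the escaped pattern applied might look roughly like this. It is a sketch, not the poster's final code: the in-memory filehandle stands in for file.index so it runs on its own, and the lexical filehandle, the error check and the split limit of 3 are my additions.

```perl
use strict;
use warnings;

# Sample data from the question; an in-memory filehandle stands in for
# file.index so the sketch is self-contained.
my $index = join "\n",
    '23||record1||The Title',
    '1054||record2||The Title #2',
    '2023||record3||The Title #3';

open my $fh, '<', \$index or die "Cannot open index: $!";

my (%fileid, %filetitle);
while (my $line = <$fh>) {
    chomp $line;
    # A limit of 3 keeps any later "||" inside the title field.
    my ($pointer, $id, $title) = split /\|\|/, $line, 3;
    $fileid{$id}    = $pointer;
    $filetitle{$id} = $title;
}
close $fh;

print "$_: pointer=$fileid{$_}, title=$filetitle{$_}\n" for sort keys %fileid;
```
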
Re: Wierd funky problems with split
by virtualsue (Vicar) on Jan 02, 2003 at 21:42 UTC
    ($pointer, $id, $title) = split ("||");

    I'm not a big fan of leaning toothpicks, so in this situation I'd probably use '\|\|' rather than /\|\|/ or "\\|\\|" but there isn't much chance of making this statement beautiful given the bizarre choice of field separator. Use of quote operators such as q() only clutters things up more. You would be wise to write yourself some clarifying comments here. :-)
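
    A sketch of the single-quoted form, next to one other way to avoid hand-written backslashes. The \Q...\E variant is a swapped-in alternative that nobody in the thread proposed:

```perl
use strict;
use warnings;

my $line = "23||record1||The Title";

# Single quotes leave the backslashes alone, so the regex engine sees \|\|.
my @single = split '\|\|', $line;

# \Q...\E escapes the metacharacters for you (an alternative of mine,
# not something suggested in the thread).
my @quoted = split /\Q||\E/, $line;

print join(",", @single), "\n";
print join(",", @quoted), "\n";
```
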