stalepretzel has asked for the wisdom of the Perl Monks concerning the following question:

I am writing code that works with files, and I need to use the input record separator, $/. By default, $/ is set to "\n". When I try to set it to another value, such as "ENDOFLINE", it stops working: when I try to print a single line, it prints the entire file. It's as if it has taken the value undef. However, the interesting part is that when I print the actual VALUE of $/, it still prints ENDOFLINE.

SUMMARY: 1) $/ has a value of "\n" and works perfectly UNTIL:
2) $/ is set to a new value (e.g. "ENDOFLINE").
3) $/ appears to take the value of undef.
4) When $/ is read as a string, it appears to be correct! Ex: print "$/"; outputs the desired value: "ENDOFLINE"

ALSO (just remembered): $/ functions as expected when it is set to a string of length 1 (such as "d" or "7").

I've asked on thescripts.com and webdeveloper.com, and nobody could find a problem with my code. I rewrote it, and it still didn't work. I took code from the blogs.perl.org tutorial that I learned Perl from, and I experienced the same problem. My conclusion: $/ DOESN'T WORK!!! (?) Does anyone have any advice? My next step seems to be to uninstall and reinstall my perl engine. I thought I'd ask first.

Thanks so much!

The code follows:

Well, here's the code. I figured that it hadn't helped so far, but you're right, rgiskard, it always helps.

Perl:

#!usr/bin/perl
use warnings;
use strict;

open(XMLFILE, "students.xml");
$/ = "</student>";
for (<XMLFILE>){
    print $_;
    print "ALALALALALALALA\n";
}
close(XMLFILE);

students.xml: (in the same directory, of course)

<?xml version="1.0" encoding="UTF-8"?>
<studentlist>
  <class id="Geography">
    <student>
      <name>Lexi</name>
      <gender>F</gender>
    </student>
    <student>
      <name>Nelle</name>
      <gender>F</gender>
    </student>
    <student>
      <name>Josh</name>
      <gender>M</gender>
    </student>
    <student>
      <name>Jason</name>
      <gender>M</gender>
    </student>
    <student>
      <name>Ben</name>
      <gender>M</gender>
    </student>
    <student>
      <name>Larry</name>
      <gender>M</gender>
    </student>
  </class>
  <class id="English">
    <student>
      <name>Caleb</name>
      <gender>M</gender>
    </student>
    <student>
      <name>Emily</name>
      <gender>F</gender>
    </student>
    <student>
      <name>Adelle</name>
      <gender>F</gender>
    </student>
    <student>
      <name>Mike</name>
      <gender>M</gender>
    </student>
  </class>
</studentlist>

OUTPUT:

imac:~/begperl J$ perl abc.plx
??<?xml version="1.0" encoding="UTF-8"?>
<studentlist>
  <class id="Geography">
    <student>
      <name>Lexi</name>
      <gender>F</gender>
    </student>
    <student>
      <name>Nelle</name>
      <gender>F</gender>
    </student>
    <student>
      <name>Josh</name>
      <gender>M</gender>
    </student>
    <student>
      <name>Jason</name>
      <gender>M</gender>
    </student>
    <student>
      <name>Ben</name>
      <gender>M</gender>
    </student>
    <student>
      <name>Larry</name>
      <gender>M</gender>
    </student>
  </class>
  <class id="English">
    <student>
      <name>Caleb</name>
      <gender>M</gender>
    </student>
    <student>
      <name>Emily</name>
      <gender>F</gender>
    </student>
    <student>
      <name>Adelle</name>
      <gender>F</gender>
    </student>
    <student>
      <name>Mike</name>
      <gender>M</gender>
    </student>
  </class>
</studentlist>ALALALALALALALA
imac:~/begperl J$

The "perl -v" command returns: This is perl, v5.8.6 built for darwin-thread-multi-2level

Replies are listed 'Best First'.
Re: Input Record Separator WON'T WORK!
by jdporter (Paladin) on Jan 25, 2008 at 00:33 UTC

    Strange... It works just fine for me. But I have some comments:

    • You should use while(<>), not for(<>). The latter reads the whole file into memory first, which you say (rightly) that you don't want to do.
    • Are you really wanting to process an XML file? If so, then this is a pretty bad way to do it. XML::Simple and several others are readily available, and will Do The Right Thing without you having to think twice about it.

    use XML::Simple;
    my $ds = XMLin "664168.xml";
    use Data::Dumper;
    print Dumper $ds;
    A word spoken in Mind will reach its own level, in the objective world, by its own weight
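
To illustrate the while-versus-for point above, here is a minimal sketch that reads one record at a time with $/ set to "</student>". The sample string and the in-memory filehandle are assumptions made so the snippet runs on its own; with a real file you would open students.xml instead.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for students.xml.
my $xml = "<class><student><name>Lexi</name></student>"
        . "<student><name>Nelle</name></student></class>";

# Read from an in-memory filehandle so the sketch is self-contained.
open my $fh, '<', \$xml or die "Cannot open in-memory file: $!";

my @records;
{
    # local restores the previous value of $/ when the block ends.
    local $/ = "</student>";

    # while fetches one record per iteration; for (<$fh>) would
    # read every record into a list before the loop body runs.
    while (my $record = <$fh>) {
        push @records, $record;
        print "RECORD: $record\n";
    }
}
close $fh;
```

The records are collected in @records here only so the result can be inspected; on a huge file you would process each record inside the loop and discard it.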
Re: Input Record Separator WON'T WORK!
by rgiskard (Hermit) on Jan 25, 2008 at 00:07 UTC

    It would be great if you could include some of your code to look at; you'll find it helps a lot.

    Otherwise, everybody (myself included) will probably suggest that you start using regular expressions to solve your tokenizing dilemma.

    Update: Here's an example of code using that variable you keep talking about. Lemme know if it solves your problem; otherwise post some code.
    my $schtuff = "blah ENDOFLINE blah ENDOFLINE blah ENDOFLINE\n";
    open my $DEADEND, ">schtuff.txt" or die 'try again';
    {
        print $DEADEND $schtuff;
    }
    close $DEADEND;

    open my $moreSchtuff, "<schtuff.txt" or die $!;
    {
        local $/ = 'ENDOFLINE';
        while (<$moreSchtuff>) {
            print $_ . "<-- SHAZAM! properly tokenized!\n";
        }
    }
    close $moreSchtuff;
    5.8.8 Output
    blah ENDOFLINE<-- SHAZAM! properly tokenized!
    blah ENDOFLINE<-- SHAZAM! properly tokenized!
    blah ENDOFLINE<-- SHAZAM! properly tokenized!
      I'm not sure I understand.... I mean... I'll need to use regular expressions to WORK with the data that I read, but this file has the potential to be HUGE, and it would be very memory intensive if I had to read the whole file at once. Could you clarify?
      Sigh..... It works, and I can't even be excited because I'm still not sure why... but thanks for your help. I'll use all of your suggestions if it stops working again....
Re: Input Record Separator WON'T WORK!
by toolic (Bishop) on Jan 25, 2008 at 00:35 UTC
    rgiskard did a good job of demonstrating the record separator issue.

    But, it looks like your real problem is that you want to parse an XML file. Have you tried using any of the modules available on CPAN, such as XML::Simple?

      Okay, everybody, it works! Here's the lowdown: it was a text-encoding problem. UTF-8 and UTF-16 weren't cooperating. Duh. That also explains why just ONE character worked. It seems Perl could deal with 0000000012345678 as an 8-bit value, but a STRING of those (000000001234567800000000685708540000000064739567) was treated as 00000000, 12345678, 00000000, 68570854, etc., which obviously DIDN'T match.

      I'm happy to have resolved this one. You're correct about for vs. while. Oops! Thanks.

      And yes, I'm constantly reminded to consult CPAN. But I'm just learning Perl. I'm not using modules; I'm writing everything myself. The more code you write, the more you learn. I know it's reinventing the wheel. That's how you learn! "Good programmers write good programs, great programmers steal great programs." Obviously, I'm trying to learn to be a good programmer before I become a great programmer. Thanks for your help, everyone!
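
The encoding mismatch described above can be reproduced with a small sketch. It assumes a reasonably modern Perl with PerlIO layers; the sample text, the temporary file, and the helper name count_records are all hypothetical. The file is written as UTF-16, then read once as raw bytes and once through an :encoding(UTF-16) layer, with $/ set to "</student>" both times.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample text; written to a temp file as UTF-16 (with BOM).
my $text = "<list><student>A</student><student>B</student></list>";

my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;
open my $out, '>:encoding(UTF-16)', $path or die "Cannot write $path: $!";
print {$out} $text;
close $out or die "Cannot close $path: $!";

# Count the records seen when the file is opened with the given mode.
sub count_records {
    my ($mode) = @_;
    open my $in, $mode, $path or die "Cannot read $path: $!";
    local $/ = "</student>";
    my $count = 0;
    while (defined(my $record = <$in>)) {
        $count++;
    }
    close $in;
    return $count;
}

# Raw bytes: each character occupies two bytes, so the ten-byte
# separator never appears contiguously and the file is one record.
my $raw = count_records('<:raw');

# Decoded: the layer turns bytes into characters before $/ is applied.
my $decoded = count_records('<:encoding(UTF-16)');

print "raw bytes: $raw record(s)\n";
print "decoded:   $decoded record(s)\n";
```

This is also consistent with a single-character separator such as "d" appearing to work on the raw bytes: that one byte does occur in the UTF-16 data, while a multi-character string does not occur contiguously.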

        Using the IRS fails in the following situations:

        <student><!-- </student> -->...</student>
        <student>...</student >
        <student />

        And possibly more. (charset? entities?)

        Also, it fails miserably (i.e. reads the entire file into memory) at detecting errors.

        If you really do wish to reinvent the wheel, time to learn and start using something else.
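
Two of the failure cases listed above can be demonstrated with a short sketch. The input strings and the helper name split_records are hypothetical, and an in-memory filehandle is used so the snippet runs on its own.

```perl
use strict;
use warnings;

# Split a string into records using "</student>" as the separator.
sub split_records {
    my ($data) = @_;
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    local $/ = "</student>";
    my @records;
    while (defined(my $record = <$fh>)) {
        push @records, $record;
    }
    close $fh;
    return @records;
}

# Whitespace inside the closing tag is legal XML, but the literal
# separator no longer matches, so two students come back as one record.
my @space = split_records("<student>A</student ><student>B</student>");

# A closing tag inside a comment matches the separator, so one
# student is cut into two records.
my @comment = split_records("<student><!-- </student> -->C</student>");

printf "space case:   %d record(s)\n", scalar @space;
printf "comment case: %d record(s)\n", scalar @comment;
```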