archer has asked for the wisdom of the Perl Monks concerning the following question:

Is it possible to use regular expressions in the input separator "$/"? I have changed the input separator from the default "\n" to some other text and it worked fine, but when I assigned a regular expression to the input separator, it didn't work. Am I missing something here? Provide your ideas!!!

$/ = "Separator"      (worked)
$/ = "Separator \d"   (didn't work)

Replies are listed 'Best First'.
Re: Usage of regular expressions in input separator
by Eliya (Vicar) on Dec 30, 2011 at 12:32 UTC

    No, this is not possible — as mentioned in the docs:

    "Remember: the value of $/ is a string, not a regex. awk has to be better for something. :-)"
Re: Usage of regular expressions in input separator
by JavaFan (Canon) on Dec 30, 2011 at 12:51 UTC
    No, it's not possible.

    I expect the reason is that, with some patterns, you may have to read the entire input stream up to EOF, and put the data back on the input stream once it is clear there is no longer match (suppose your delimiter is $/ = /(.).*\1/). That may be a costly operation (especially in memory usage), or your process could just "hang" forever (if it's trying to read all of standard input, or reading from a (bidirectional) pipe or network socket).

    I can see the point, but I would be willing to pay the price. Sure, in degenerate cases it would be costly (so, don't do that). In practice, people would use patterns (like the one you gave) that only require a limited lookahead.

    But it's too late in the game to change $/ from a fixed string to a pattern -- nor does it seem to be an itch of any of the active porters. So, I don't expect this to change any time soon.
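
    A minimal sketch of the fixed-string behaviour described above, assuming an in-memory filehandle and made-up sample data (neither comes from the original question). The characters assigned to $/ are compared literally, so \d only ever matches a backslash followed by "d":

```perl
use strict;
use warnings;

# Hypothetical sample data: one "Separator <digit>" and one literal 'Separator \d'.
my $data = 'one Separator 1 two Separator \d three';

open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my @records;
{
    # Single quotes keep the backslash, so $/ is the 12 characters
    # S-e-p-a-r-a-t-o-r, space, backslash, d.
    local $/ = 'Separator \d';
    @records = <$fh>;
}
close $fh;

# Only the literal text 'Separator \d' ends a record; "Separator 1" does not.
print scalar(@records), " records\n";
print "[$_]\n" for @records;
```

    In the double-quoted form from the question, \d is not a recognised string escape, so the separator ends up as the literal text "Separator d" (with a warning under use warnings). Either way, no digit matching takes place.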

Re: Usage of regular expressions in input separator
by NetWallah (Canon) on Dec 30, 2011 at 14:49 UTC
    Depending on how complex your requirements are, you may be able to use Stream::Reader to match one of multiple delimiters, and accomplish your task.

    Although it does not support regular expressions, in your example case, you could use

    map {"Separator $_"} 0..9
    as your delimiter list.

                "Battle not with trolls, lest ye become a troll; and if you gaze into the Internet, the Internet gazes also into you."
            -Friedrich Nietzsche: A Dynamic Translation

Re: Usage of regular expressions in input separator
by hyvatti (Initiate) on Nov 27, 2024 at 07:43 UTC

    With PerlIO::via you can add a layer that converts whatever you want to line feeds. For example, if you want to accept CR and LF as line feeds:

    package PerlIO::via::normeol;

    sub FILL {
        my ($obj,$fh) = @_;
        my ($c);
        my $n = read ($fh, $c, 1);
        return undef unless $n;
        $c =~ tr/\r/\n/;
        return $c;
    }
    1;

    use PerlIO::via::normeol;
    open (A, "<:via(normeol)", "foo.bar");
    while (<A>) { ...
Re: Usage of regular expressions in input separator
by AnomalousMonk (Archbishop) on Dec 31, 2011 at 19:11 UTC
Re: Usage of regular expressions in input separator
by ww (Archbishop) on Dec 30, 2011 at 15:45 UTC

    Since you, Robin Hood, provided no sample data nor indication of your required output, this is a WAG... but it may provide a workaround... or some ideas for one.

    #!/usr/bin/perl
    use Modern::Perl;
    use Data::Dumper;

    #945627 Workaround if the distinction among elements in each data
    # segment need not be retained; if retention
    # is required, read DATA into a HoA with the
    # separator-and-its-following-digit(s) as keys.

    say "\n\t \$/ is a string, not a regex," .
        "\n\t so, using an input_separator without any regex metachar \n";

    $/ = "FOO";
    my @newarr;

    my @arr = <DATA>;
    for my $item(@arr) {
        $item =~ s/\n//sg;
        if ( $item =~ /^\d+(.+?)(?:FOO)*$/s ) {
            my $out = $1;
            push @newarr, $out;
        } else {
            say "\t Discarding $item (ie, \$arr[1])";   # discarding the initial "FOO" in $arr[1]
        }
    }

    print Dumper @newarr;

    =head OUTPUT

         $/ is a string, not a regex,
         so, using an input_separator without any regex metachar

         Discarding FOO (ie, $arr[1])
    $VAR1 = 'abcdefghi';
    $VAR2 = 'jkl-123-';
    $VAR3 = 'mnopqrstu';
    $VAR4 = 'vwxyz';

    =cut

    __DATA__
    FOO0
    abc
    def
    ghi
    FOO1
    jkl
    -123-
    FOO2
    mno
    pqr
    stu
    FOO3
    vwxy
    z

    Of course, it's also possible that this has no bearing on your problem... :-(
Re: Usage of regular expressions in input separator
by TJPride (Pilgrim) on Dec 30, 2011 at 16:06 UTC
    You could read in the whole file and regex on that, but I'm assuming that's not something you want to do. You could read in chunks and look for the separator that way, but what if the separator crosses a chunk boundary? For instance, if you're matching on Separator \d+, the boundary could split Separator 23 into Separator 2|3. That's no good. Lastly, if your file is in multiple lines, this is fairly easy using a line-by-line technique:

    use strict;
    use warnings;

    my ($data, @records);

    open (FH, 'data.txt') || die;
    while (<FH>) {
        $data .= $_;
        push @records, $1 while $data =~ s/(.*?)Separator \d+//s;
    }
    push @records, $data;

    use Data::Dumper;
    print Dumper(\@records);

    Data:

    Record A Separator 9 Record B Separator 10 Record C Separator 11 Record D

    Output:

    $VAR1 = [ 'Record A ', ' Record B ', ' Record C ', ' Record D' ];
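
    The chunk-boundary problem mentioned above can be sketched as follows. This is only an illustration under stated assumptions: make_record_reader is a hypothetical helper (not a core or CPAN API), the sample data is read from an in-memory handle, and the approach is only meant for separators that need limited lookahead, such as Separator \d+. A separator match is accepted only if it ends before the end of the buffered data (or at EOF), so a greedy \d+ cannot be cut short by a chunk boundary.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an iterator that yields one record per call,
# splitting the stream on the regex $sep_re.
# Assumes $sep_re never matches the empty string.
sub make_record_reader {
    my ( $fh, $sep_re, $chunk_size ) = @_;
    $chunk_size ||= 4096;
    my $buffer = '';    # unconsumed input, private to this handle
    my $eof    = 0;

    return sub {
        while (1) {
            # Accept a match only if more data follows it (or we are at EOF).
            if ( $buffer =~ $sep_re and ( $eof or $+[0] < length($buffer) ) ) {
                my $record = substr( $buffer, 0, $-[0] );
                substr( $buffer, 0, $+[0], '' );    # drop record + separator
                return $record;
            }
            if ($eof) {
                return undef unless length($buffer);
                my $record = $buffer;               # trailing record
                $buffer = '';
                return $record;
            }
            # Append the next chunk to whatever is left over.
            my $n = read( $fh, $buffer, $chunk_size, length($buffer) );
            die "read failed: $!" unless defined $n;
            $eof = 1 if $n == 0;
        }
    };
}

my $data = 'Record A Separator 9 Record B Separator 10 '
         . 'Record C Separator 11 Record D';
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

# A chunk size of 7 makes one read end between the two digits of "Separator 11".
my $next = make_record_reader( $fh, qr/Separator \d+/, 7 );

my @records;
while ( defined( my $record = $next->() ) ) {
    push @records, $record;
}
print "[$_]\n" for @records;
```

    Because the leftover buffer lives in the closure, each filehandle gets its own state. Call the iterator with defined(), since an empty record is a valid return value.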
Re: Usage of regular expressions in input separator
by jdrago999 (Pilgrim) on Dec 30, 2011 at 22:31 UTC

    It would be slick if we could:

    $/ = sub { my ($line) = @_; $line =~ m{Separator\s+\d+}; };
      It would be slick if we could:
      $/ = sub { my ($line) = @_; $line =~ m{Separator\s+\d+}; };

      Yes, but the whole point of $/ is to make lines or records from the bits in a file. So there is no "line" before $/ ...

      You could feed the code ref with chunks of a file, but even that would not be sufficient. Imagine a file with a two-byte record separator (e.g. that old CR-LF from DOS). The first chunk ends with the first byte of the record separator (i.e. CR), the second chunk begins with the second byte of the record separator (i.e. LF). Unless you manage to maintain some state information, you would not be able to detect the record separator.

      That state information has to be per file handle, or else you mix data from different files. So you cannot use global or state variables, unless you also pass the handle to the code ref and use it to index arrays or hashes with status data.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

        Thanks for giving my "wouldn't it be cool if..." the full treatment.

        In the meantime, I suppose we'll have to:

        open my $ifh, '<', $filename
          or die "Cannot open '$filename' for reading: $!";
        local $/;
        foreach my $chunk ( split /Separator\s+\d+/, scalar(<$ifh>) ) {
            # yay chunk!
        }

        Unfortunately this will not do well for very large files. We'd have to check against the regexp as each byte is read into memory.

        # I might be way off-base here:
        no warnings 'uninitialized';
        my $pattern = qr{Separator\s\d+};
        my $callback = sub { warn "Chunk: @_" };
        binmode($ifh);
        my $buffer = '';
        # Read one byte per call. No OFFSET argument: sysread's OFFSET is a
        # position inside $byte, not in the file, so passing an increasing
        # offset would pad $byte with "\0" bytes.
        while( sysread($ifh, my $byte, 1) ) {
            $buffer .= $byte;
            if( $buffer =~ $pattern ) {
                $callback->( $buffer );
                $buffer = '';
            }
        }

        Even that won't work correctly, and it would be really, really slow.

        I only wrote it here for the sake of discussion.
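
        One reason the byte-at-a-time check cannot work as intended is that \d+ is already satisfied by the first digit. A small sketch with made-up input, where a loop over the characters of a string stands in for the one-byte sysread calls:

```perl
use strict;
use warnings;

my $pattern = qr{Separator\s\d+};
my $input   = 'A Separator 23 B';    # made-up sample input

my ( $buffer, @chunks ) = ('');
for my $byte ( split //, $input ) {    # stands in for sysread($fh, $byte, 1)
    $buffer .= $byte;
    if ( $buffer =~ $pattern ) {
        # Fires as soon as the first digit arrives.
        push @chunks, $buffer;
        $buffer = '';
    }
}
push @chunks, $buffer if length $buffer;

print "[$_]\n" for @chunks;    # prints [A Separator 2] then [3 B]
```

        The second digit of the separator leaks into the next chunk, and the separator text itself is still part of the first one.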