PerlMonks  

Re: Usage of regular expressions in input separator

by jdrago999 (Pilgrim)
on Dec 30, 2011 at 22:31 UTC ( [id://945692]=note )


in reply to Usage of regular expressions in input separator

It would be slick if we could:

$/ = sub { my ($line) = @_; $line =~ m{Separator\s+\d+}; };

Re^2: Usage of regular expressions in input separator
by afoken (Chancellor) on Dec 31, 2011 at 13:13 UTC
    It would be slick if we could:
    $/ = sub { my ($line) = @_; $line =~ m{Separator\s+\d+}; };

    Yes, but the whole point of $/ is to make lines or records from the bits in a file. So there is no "line" before $/ ...

    You could feed the code ref with chunks of a file, but even that would not be sufficient. Imagine a file with a two-byte record separator (e.g. that old CR-LF from DOS). The first chunk ends with the first byte of the record separator (i.e. CR), the second chunk begins with the second byte of the record separator (i.e. LF). Unless you manage to maintain some state information, you would not be able to detect the record separator.

    That state information has to be per file handle, or else you mix data from different files. So you cannot use global or state variables, unless you also pass the handle to the code ref and use it to index arrays or hashes with status data.
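To make the per-handle bookkeeping concrete, here is a minimal sketch, assuming a fixed two-byte CR-LF separator and chunks that the caller reads from each handle. The helper name `feed_chunk` and the `%pending` hash are hypothetical, invented for illustration; nothing like them is built into Perl. The hash is keyed by `Scalar::Util::refaddr` of the handle, so leftovers from different files never mix.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Leftover bytes per file handle, keyed by the handle's reference address.
my %pending;

# Hypothetical helper: take one chunk that was read from $fh and return
# the records that are now complete. Whatever follows the last CR-LF
# (possibly ending in a lone CR) is held back until the next call.
sub feed_chunk {
    my ($fh, $chunk) = @_;
    my $key    = refaddr($fh);
    my $buffer = ( $pending{$key} // '' ) . $chunk;

    # A limit of -1 keeps trailing empty fields, so the final element is
    # always the not-yet-terminated remainder (possibly empty).
    my @records = split /\r\n/, $buffer, -1;
    $pending{$key} = pop @records;

    return @records;
}
```

A chunk ending in CR yields nothing for that record until the following chunk supplies the LF. A real implementation would also have to delete the `%pending` entry when a handle is closed, since reference addresses can be reused.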

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Thanks for giving my "wouldn't it be cool if..." the full treatment.

      In the meantime, I suppose we'll have to:

      open my $ifh, '<', $filename
        or die "Cannot open '$filename' for reading: $!";
      local $/;
      foreach my $chunk ( split /Separator\s+\d+/, scalar(<$ifh>) ) {
        # yay chunk!
      }

      Unfortunately this will not do well for very large files, because it slurps the whole file into memory before splitting. To avoid that, we'd have to check against the regexp as each byte is read into memory.

      # I might be way off-base here:
      my $pattern  = qr{Separator\s+\d+};
      my $callback = sub { warn "Chunk: @_" };
      binmode($ifh);
      my $buffer = '';
      # Read one byte at a time; each call replaces the contents of $byte.
      while ( sysread($ifh, my $byte, 1) ) {
        $buffer .= $byte;
        if ( $buffer =~ $pattern ) {
          $callback->( $buffer );
          $buffer = '';
        }
      }

      Even that won't work correctly, and it would be really, really slow.

      I only wrote it here for the sake of discussion.
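For the sake of the same discussion, here is a rough sketch of a block-wise alternative. It is not a built-in feature and not a definitive implementation: `make_record_reader` is a hypothetical name, and the approach assumes the separator pattern never matches the empty string and that a match ending before the end of the buffer is final (true for `Separator\s+\d+`, not for every regexp). The per-handle state lives in a closure, and a match that touches the end of the buffer is not trusted until more data arrives or EOF is reached, so `Separator 12` is not cut after the `1`.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an iterator that yields one record per call
# and undef when the input is exhausted. State is private to each closure,
# so several handles can be read side by side.
sub make_record_reader {
    my ($fh, $separator, $block_size) = @_;
    $block_size ||= 4096;
    my $buffer = '';
    my $eof    = 0;

    return sub {
        while (1) {
            # Accept a match only if it ends before the end of the buffer
            # (or at EOF); otherwise the next block might extend it.
            if ( $buffer =~ $separator and ( $eof or $+[0] < length $buffer ) ) {
                my ($start, $end) = ( $-[0], $+[0] );
                my $record = substr $buffer, 0, $start;
                substr( $buffer, 0, $end ) = '';
                return $record;
            }
            if ($eof) {
                return unless length $buffer;   # nothing left
                my $record = $buffer;           # final, unterminated record
                $buffer = '';
                return $record;
            }
            # Append the next block to whatever is still pending.
            my $n = read( $fh, $buffer, $block_size, length $buffer );
            die "read failed: $!" unless defined $n;
            $eof = 1 if $n == 0;
        }
    };
}
```

Usage would look like `my $next = make_record_reader($ifh, qr{Separator\s+\d+});` followed by `while ( defined( my $chunk = $next->() ) ) { ... }`. Memory use is bounded by the longest record plus one block rather than by the file size.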
