magawake has asked for the wisdom of the Perl Monks concerning the following question:

I have a file like this:
    $ cat file
    *start*of*junk*data*this is all junk data 01*end*of*junk*data*binary data binary data binary data
    *start*of*junk*data*this is all junk data 02*end*of*junk*data*binary data binary data binary data
    *start*of*junk*data*this is all junk data 03*end*of*junk*data*binary data binary data binary data
    *start*of*junk*data*this is all junk data 04*end*of*junk*data*binary data binary data binary data
My intention is to get only the binary data. Therefore, I am using the range operator (..) to go from "*end*of*junk*data*" to "*start*of*junk*data*". Of course, once I get the binary data I will use unpack() to view the data. Here is what I have so far.
    #!/usr/bin/perl -w
    use strict;

    my $file = "file";
    open FILE, $file or die $!;
    while (<FILE>) {
        if (m{"*end*of*junk*data*".*?}i .. m{"*start*of*junk*data*".*?}) {
            print;
        }
    }
    close (FILE);
For some reason I am not able to get the "range" operator to only display binary data for me. Any ideas? TIA

Replies are listed 'Best First'.
Re: Range question
by ikegami (Patriarch) on Apr 10, 2009 at 03:41 UTC

    In scalar context, it's usually called the flip-flop operator.

    What it's doing is printing every line that contains *end*, every line that contains *start*, and every line in between. If *start* or *end* occurs in the middle of a line, so be it: the whole line is printed. If multiple *start* or *end* tags exist in one line, so be it.
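
    To see that line-level behaviour in isolation, here is a minimal sketch. The sub name flip_flop_lines and the BEGIN/END markers are made up for illustration and are not from the original post. It shows that the flip-flop selects whole lines, even when both markers sit in the middle of a single line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the lines selected by a scalar-context flip-flop: it turns on at a
# line matching $from and turns off after the next line matching $to.
# (Hypothetical helper, for illustration only.)
sub flip_flop_lines {
    my ($from, $to, @lines) = @_;
    my @picked;
    for (@lines) {
        # Both matches test $_; the whole line is kept or dropped.
        push @picked, $_ if /$from/ .. /$to/;
    }
    return @picked;
}

# Markers on separate lines: the marker lines and everything between them.
print flip_flop_lines(qr/BEGIN/, qr/END/,
    "header\n", "BEGIN one\n", "middle\n", "END one\n", "trailer\n");

# Both markers inside one line: the entire line is selected, including the
# text before BEGIN and after END.
print flip_flop_lines(qr/BEGIN/, qr/END/,
    "junk END data BEGIN junk\n", "next line\n");
```

    Because the operator only answers "is this line inside the range?", it cannot cut a line at the marker. That is why the binary data and the junk come out together.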

    The whole concept of a line is kinda odd when dealing with binary data anyway. Usually one reads fixed-width blocks of data when dealing with binary data because you never know how long a line will be. (0 bytes? 1 byte? The entire file?)

    Assuming fixed-width delimiters, your code should look like this:

    #!/usr/bin/perl -w
    use strict;

    use constant BLK_SIZE => 64*1024;

    sub process {
        my ($data) = @_;
        print($data);  # Or whatever
    }

    # Append up to BLK_SIZE more bytes to the caller's buffer.
    sub read_more {
        my $fh = shift;
        my $rv = read($fh, $_[0], BLK_SIZE, length($_[0]));
        die $! if !defined($rv);
        return $rv;
    }

    my $start = '*start*of*junk*data*';
    my $end   = '*end*of*junk*data*';

    my $qfn = "file";
    open( my $fh, '<:raw:perlio', $qfn )
        or die("open $qfn: $!\n");

    my $buf = '';
    my $in_junk = 0;
    while (read_more($fh, $buf)) {
        if (!$in_junk) {
            # Good data runs up to the next start-of-junk marker.
            my $pos = index($buf, $start);
            if ($pos >= 0) {
                process(substr($buf, 0, $pos, ''));
                substr($buf, 0, length($start), '');
                $in_junk = 1;
                redo;
            } else {
                # Keep a tail in case a marker straddles two blocks.
                process(substr($buf, 0, -length($start)+1, ''));
            }
        } else {
            # Discard everything up to and including the end-of-junk marker.
            my $pos = index($buf, $end);
            if ($pos >= 0) {
                substr($buf, 0, $pos+length($end), '');
                $in_junk = 0;
                redo;
            } else {
                substr($buf, 0, -length($end)+1, '');
            }
        }
    }

    process($buf) if !$in_junk;

      Assuming constant delimiters, a simpler solution:

      #!/usr/bin/perl -w
      use strict;

      sub process {
          my ($data) = @_;
          print($data);  # Or whatever
      }

      my $start = '*start*of*junk*data*';
      my $end   = '*end*of*junk*data*';

      my $qfn = "file";
      open( my $fh, '<:raw:perlio', $qfn )
          or die("open $qfn: $!\n");

      for (;;) {
          # Read good data up to the next start-of-junk marker.
          $/ = $start;
          my $good = <$fh>;
          last if !defined($good);
          chomp($good);  # Strips the trailing $start marker
          process($good);

          # Read and discard the junk up to the end-of-junk marker.
          $/ = $end;
          my $junk = <$fh>;
          last if !defined($junk);
      }
Re: Range question
by jwkrahn (Abbot) on Apr 10, 2009 at 03:41 UTC

    You are not using the range operator, you are using the flip-flop operator.   .. in list context is the range operator while .. in scalar context (like in your code example) is the flip-flop operator.
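
    The difference between the two contexts can be sketched as follows. The sub name flip_flop_flags and the numbers 3 and 6 are arbitrary choices for illustration, not anything from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# List context: .. is the range operator and builds a list of values.
my @range = (3 .. 6);
print "range:     @range\n";   # range:     3 4 5 6

# Scalar context: .. is the flip-flop operator. It becomes true when the left
# operand is true and stays true until the right operand has been true.
# (Hypothetical helper, for illustration only.)
sub flip_flop_flags {
    my @numbers = @_;
    my @flags;
    for my $n (@numbers) {
        # Scalar assignment puts .. in scalar context.
        my $on = ($n == 3) .. ($n == 6);
        push @flags, $on ? 1 : 0;
    }
    return @flags;
}

my @flags = flip_flop_flags(1 .. 8);
print "flip-flop: @flags\n";   # flip-flop: 0 0 1 1 1 1 0 0
```

    Note that the loop itself uses 1 .. 8 in list context, so both meanings of the same two dots appear in this one script.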

      Why don't you turn that nit into a patch for perlop which calls it the range operator in both instances?
Re: Range question
by CountZero (Bishop) on Apr 10, 2009 at 09:36 UTC
    Are you absolutely sure that your binary data cannot contain (an) EOL character(s)? Because if it does, you cannot read the file on a line-by-line basis.

    This leads me to think of an alternative solution.

    Read your whole file into a scalar and use

    split /\*start\*of\*junk\*data\*.*?\*end\*of\*junk\*data\*/, $my_whole_file
    to get at your binary data. Of course this will only work if your whole file can fit within one scalar.
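
    A minimal sketch of that slurp-and-split idea is below. The sub names slurp_raw and binary_chunks are invented for this example. It also adds the /s modifier, which is an assumption on top of the pattern above, so that the junk section may itself contain newlines. The demo runs on an in-memory string rather than a real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a whole file into one scalar, without any newline translation.
# (Hypothetical helper; not called in the demo below.)
sub slurp_raw {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "open $path: $!\n";
    local $/;                   # undef $/ makes <$fh> return everything
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Split the content on "start marker, junk, end marker" and return what is
# left between the junk sections. (Hypothetical helper.)
sub binary_chunks {
    my ($content) = @_;
    # \Q...\E quotes the literal asterisks; /s lets . match newlines (assumed).
    my $junk = qr/\Q*start*of*junk*data*\E.*?\Q*end*of*junk*data*\E/s;
    # split yields an empty leading field when the data starts with junk.
    return grep { length } split $junk, $content;
}

my $sample = "*start*of*junk*data*junk 01*end*of*junk*data*AAAA"
           . "*start*of*junk*data*junk 02*end*of*junk*data*BBBB";

print "chunk: $_\n" for binary_chunks($sample);
```

    With a real file the call would look like binary_chunks(slurp_raw("file")), and each returned chunk could then be handed to unpack().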

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James