Hi Monks

I'm onto move advanced regular expressions now and I'm playing with (?{...})

As a learning exercise I'm attempting to parse PNG files with a regex. Again, emphasis on the learning exercise :)

After an 8 byte header, PNG files contains a sequential list of chunks. Each chunk consists of a 32-bit big endian length field, 32-bit ASCII tag, n bytes of data, where n is the value set in the length field, and a 32-bit big endian CRC. The full specification, should you require it can be found here on w3.org

I have created the following test code, which loads a PNG from __DATA__, skips the header and attempts to extract chunks with my regex.

I am successfully unpacking the length field within the (?{...})construct and extract the data to a scalar. I am also attempting to move the position with the regex by assigning to pos(). While the assignment seems to work, as in there are no errors, it seems to have no affect. I am extracting the first four bytes of the data with the CRC value.

Is it not possible to move the position within the regex within a (?{...}) block? I originally thought to extract the data with a (?<data>.{$len}) construct, where $len is set within the (?{...}) but this also didn't work. I assume this is because the value of $len is set during regex compilation? (which would be undef)

I've also attempted to make a version which uses the (??{...}) construct, but this is very new to me. This seems simpler as you can unpack the length field and return a string which matches that many bytes, but it has an odd issue with matching the last chunk. Not too sure if I should include it within this node, but advice for why this doesn't work is also much appreciated.

Many thanks

(?{...}) version:

#!/usr/bin/env perl use strict; use warnings; use 5.016; # Parsing vars my ($data); # Rebuild PNG image local $/; my $input = <DATA>; chomp $input; $input = pack "H*", $input; # Skip PNG Header $input = substr( $input, 8 ); while ($input =~ /# four byte length (?<len>....) # four byte tag (?<tag>....) # variable length data (?{ # unpack length my $len = unpack "N", $+{len}; # extract data $data = substr($_, pos(), $len); # update position pos() += $len}) # four byte crc (?<crc>....)/mgx) { say "Chunk len:", unpack "N", $+{len}; say "Chunk tag:", $+{tag}; say "Chunk data:", unpack "H*", $data; say "Chunk data len:", length $data; say "Chunk CRC: ", unpack "H*", $+{crc}; say "---"; } __END__ __DATA__ 89504e470d0a1a0a0000000d4948445200000010000000100803000000282d0f530000 +015c504c544547704c4f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7 +eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb5 +4f7eb54f7eb44f7eb54f7eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7 +eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb54f7eb34f7eb54f7eb5 +4f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb44f7 +eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb5 +4f7eb54f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb54f7 +eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb44f7eb5 +4f7eb54f7eb54f7eb54f7eb34f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb54f7 +eb44f7eb54f7eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb4 +4f7eb54f7eb54f7eb5a29b27fc0000007374524e5300242d75280b02fa8bae61fdf7f +bf5e266d109dd6462faf9fc6b0dde6571b3d9b0031f01e0cd83d64505928fc51eb405 +582a426d052f12e5840e63ecea046ab248d0f6327c3e0c4ff311b67960822ef295f8a +093af76e3683c3d7213eb02192cbb8106f190db5ad42cc19ba7504c36150ec28de169 +2647000000cb4944415418d3636040005b264b0614e0e7e50c24f595617ccf201e5f3 +706062e16289f8d39b0d83fc09521980bc237737011e6f370b260080fb11390111195 +55b0b6f7f671b7d2658808e3e665176397282e2e766404298de229860173539040226 +f71b18db43848401eac222db9983d9445925385bb584e0a2490915a1c13c9c0c0aac1 +5fccc10612c8e1e74b8806d28682eacc60fb8b84d3536281b4910ea731588043303b3 +30e481ba8ea690b8104720b05b2e2813413879a96124840282fbf2009c4d034615504 +520015a424be37c5c7be0000000049454e44ae426082

(??{...}) version:

#!/usr/bin/env perl use strict; use warnings; use 5.016; our $VERSION = 0.1; # Rebuild PNG image local $/; my $input = <DATA>; chomp $input; $input = pack "H*", $input; # Skip PNG Header $input = substr( $input, 8 ); while ( my ( $len, $tag, $data, $crc ) = $input =~ / # four byte length (....) # four byte tag (....) # variable length data (??{ # unpack length my $len = unpack "N", $1; print STDERR "$2 $len\n"; return "(.){$len}"}) # four byte crc (....)/gx ) { say "Chunk len:", unpack "N", $len; say "Chunk tag:", $tag; say "Chunk data:", unpack "H*", $data; say "Chunk data len:", length $data; say "Chunk CRC: ", unpack "H*", $crc; } __DATA__ 89504e470d0a1a0a0000000d4948445200000010000000100803000000282d0f530000 +015c504c544547704c4f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7 +eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb5 +4f7eb54f7eb44f7eb54f7eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7 +eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb54f7eb34f7eb54f7eb5 +4f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb44f7 +eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb5 +4f7eb54f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb54f7 +eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb44f7eb5 +4f7eb54f7eb54f7eb54f7eb34f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb54f7 +eb44f7eb54f7eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb4 +4f7eb54f7eb54f7eb5a29b27fc0000007374524e5300242d75280b02fa8bae61fdf7f +bf5e266d109dd6462faf9fc6b0dde6571b3d9b0031f01e0cd83d64505928fc51eb405 +582a426d052f12e5840e63ecea046ab248d0f6327c3e0c4ff311b67960822ef295f8a +093af76e3683c3d7213eb02192cbb8106f190db5ad42cc19ba7504c36150ec28de169 +2647000000cb4944415418d3636040005b264b0614e0e7e50c24f595617ccf201e5f3 +706062e16289f8d39b0d83fc09521980bc237737011e6f370b260080fb11390111195 +55b0b6f7f671b7d2658808e3e665176397282e2e766404298de229860173539040226 +f71b18db43848401eac222db9983d9445925385bb584e0a2490915a1c13c9c0c0aac1 +5fccc10612c8e1e74b8806d28682eacc60fb8b84d3536281b4910ea731588043303b3 +30e481ba8ea690b8104720b05b2e2813413879a96124840282fbf2009c4d034615504 +520015a424be37c5c7be0000000049454e44ae426082

output:

IHDR 13 PLTE 348 tRNS 115 IDAT 203 DAT 52041 Quantifier in {,} bigger than 32766 in regex; marked by <-- HERE in m/ +(.){ <-- HERE 52041}/ at regex2.pl line 17, <DATA> chunk 1.

To me, this looks like the IDAT chunk match is failing so it's backtracking and seeking forwards


In reply to Modifying pos() from within (?{...}) by mxb

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.