mxb has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks
I'm onto move advanced regular expressions now and I'm playing with (?{...})
As a learning exercise I'm attempting to parse PNG files with a regex. Again, emphasis on the learning exercise :)
After an 8 byte header, PNG files contains a sequential list of chunks. Each chunk consists of a 32-bit big endian length field, 32-bit ASCII tag, n bytes of data, where n is the value set in the length field, and a 32-bit big endian CRC. The full specification, should you require it can be found here on w3.org
I have created the following test code, which loads a PNG from __DATA__, skips the header and attempts to extract chunks with my regex.
I am successfully unpacking the length field within the (?{...})construct and extract the data to a scalar. I am also attempting to move the position with the regex by assigning to pos(). While the assignment seems to work, as in there are no errors, it seems to have no affect. I am extracting the first four bytes of the data with the CRC value.
Is it not possible to move the position within the regex within a (?{...}) block? I originally thought to extract the data with a (?<data>.{$len}) construct, where $len is set within the (?{...}) but this also didn't work. I assume this is because the value of $len is set during regex compilation? (which would be undef)
I've also attempted to make a version which uses the (??{...}) construct, but this is very new to me. This seems simpler as you can unpack the length field and return a string which matches that many bytes, but it has an odd issue with matching the last chunk. Not too sure if I should include it within this node, but advice for why this doesn't work is also much appreciated.
Many thanks
(?{...}) version:
#!/usr/bin/env perl use strict; use warnings; use 5.016; # Parsing vars my ($data); # Rebuild PNG image local $/; my $input = <DATA>; chomp $input; $input = pack "H*", $input; # Skip PNG Header $input = substr( $input, 8 ); while ($input =~ /# four byte length (?<len>....) # four byte tag (?<tag>....) # variable length data (?{ # unpack length my $len = unpack "N", $+{len}; # extract data $data = substr($_, pos(), $len); # update position pos() += $len}) # four byte crc (?<crc>....)/mgx) { say "Chunk len:", unpack "N", $+{len}; say "Chunk tag:", $+{tag}; say "Chunk data:", unpack "H*", $data; say "Chunk data len:", length $data; say "Chunk CRC: ", unpack "H*", $+{crc}; say "---"; } __END__ __DATA__ 89504e470d0a1a0a0000000d4948445200000010000000100803000000282d0f530000 +015c504c544547704c4f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7 +eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb5 +4f7eb54f7eb44f7eb54f7eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7 +eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb54f7eb34f7eb54f7eb5 +4f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb44f7 +eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb5 +4f7eb54f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb54f7 +eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb44f7eb5 +4f7eb54f7eb54f7eb54f7eb34f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb54f7 +eb44f7eb54f7eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb4 +4f7eb54f7eb54f7eb5a29b27fc0000007374524e5300242d75280b02fa8bae61fdf7f +bf5e266d109dd6462faf9fc6b0dde6571b3d9b0031f01e0cd83d64505928fc51eb405 +582a426d052f12e5840e63ecea046ab248d0f6327c3e0c4ff311b67960822ef295f8a +093af76e3683c3d7213eb02192cbb8106f190db5ad42cc19ba7504c36150ec28de169 +2647000000cb4944415418d3636040005b264b0614e0e7e50c24f595617ccf201e5f3 +706062e16289f8d39b0d83fc09521980bc237737011e6f370b260080fb11390111195 +55b0b6f7f671b7d2658808e3e665176397282e2e766404298de229860173539040226 +f71b18db43848401eac222db9983d9445925385bb584e0a2490915a1c13c9c0c0aac1 +5fccc10612c8e1e74b8806d28682eacc60fb8b84d3536281b4910ea731588043303b3 +30e481ba8ea690b8104720b05b2e2813413879a96124840282fbf2009c4d034615504 +520015a424be37c5c7be0000000049454e44ae426082
(??{...}) version:
#!/usr/bin/env perl use strict; use warnings; use 5.016; our $VERSION = 0.1; # Rebuild PNG image local $/; my $input = <DATA>; chomp $input; $input = pack "H*", $input; # Skip PNG Header $input = substr( $input, 8 ); while ( my ( $len, $tag, $data, $crc ) = $input =~ / # four byte length (....) # four byte tag (....) # variable length data (??{ # unpack length my $len = unpack "N", $1; print STDERR "$2 $len\n"; return "(.){$len}"}) # four byte crc (....)/gx ) { say "Chunk len:", unpack "N", $len; say "Chunk tag:", $tag; say "Chunk data:", unpack "H*", $data; say "Chunk data len:", length $data; say "Chunk CRC: ", unpack "H*", $crc; } __DATA__ 89504e470d0a1a0a0000000d4948445200000010000000100803000000282d0f530000 +015c504c544547704c4f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7 +eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb5 +4f7eb54f7eb44f7eb54f7eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7 +eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb54f7eb34f7eb54f7eb5 +4f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb44f7 +eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb5 +4f7eb54f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb54f7 +eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb44f7eb5 +4f7eb54f7eb54f7eb54f7eb34f7eb54f7eb54f7eb54f7eb54f7eb44f7eb54f7eb54f7 +eb44f7eb54f7eb54f7eb44f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb54f7eb4 +4f7eb54f7eb54f7eb5a29b27fc0000007374524e5300242d75280b02fa8bae61fdf7f +bf5e266d109dd6462faf9fc6b0dde6571b3d9b0031f01e0cd83d64505928fc51eb405 +582a426d052f12e5840e63ecea046ab248d0f6327c3e0c4ff311b67960822ef295f8a +093af76e3683c3d7213eb02192cbb8106f190db5ad42cc19ba7504c36150ec28de169 +2647000000cb4944415418d3636040005b264b0614e0e7e50c24f595617ccf201e5f3 +706062e16289f8d39b0d83fc09521980bc237737011e6f370b260080fb11390111195 +55b0b6f7f671b7d2658808e3e665176397282e2e766404298de229860173539040226 +f71b18db43848401eac222db9983d9445925385bb584e0a2490915a1c13c9c0c0aac1 +5fccc10612c8e1e74b8806d28682eacc60fb8b84d3536281b4910ea731588043303b3 +30e481ba8ea690b8104720b05b2e2813413879a96124840282fbf2009c4d034615504 +520015a424be37c5c7be0000000049454e44ae426082
output:
IHDR 13 PLTE 348 tRNS 115 IDAT 203 DAT 52041 Quantifier in {,} bigger than 32766 in regex; marked by <-- HERE in m/ +(.){ <-- HERE 52041}/ at regex2.pl line 17, <DATA> chunk 1.
To me, this looks like the IDAT chunk match is failing so it's backtracking and seeking forwards
|
---|
Replies are listed 'Best First'. | |
---|---|
Re: Modifying pos() from within (?{...})
by haukex (Archbishop) on Apr 26, 2018 at 11:56 UTC | |
by mxb (Pilgrim) on Apr 26, 2018 at 13:18 UTC | |
by haukex (Archbishop) on Apr 26, 2018 at 13:50 UTC | |
Re: Modifying pos() from within (?{...})
by tybalt89 (Monsignor) on Apr 26, 2018 at 12:46 UTC | |
by mxb (Pilgrim) on Apr 26, 2018 at 13:07 UTC | |
Re: Modifying pos() from within (?{...})
by Anonymous Monk on Apr 26, 2018 at 13:42 UTC | |
by mxb (Pilgrim) on Apr 26, 2018 at 14:23 UTC | |
by Eily (Monsignor) on Apr 27, 2018 at 15:10 UTC | |
by AnomalousMonk (Archbishop) on Apr 27, 2018 at 16:31 UTC | |
by Anonymous Monk on Apr 26, 2018 at 13:55 UTC |