borgis has asked for the wisdom of the Perl Monks concerning the following question:

I have some hex data in which I am trying to locate the start and end of some text data (there could be one or many text areas). x0002000000 identifies the start and x00020400 identifies the end. What is the best way to loop through this hex, extracting the text, using the above start and end values? Thanks.

    045a010000020000004c4553
    4f4e4452412043414c4c4544
    204241434b2c204920414456
    495345442054484154205745

Replies are listed 'Best First'.
Re: Reading hex data
by Zaxo (Archbishop) on Aug 09, 2005 at 00:47 UTC

    We need to know how those are encoded in your data. Are they octets representing ASCII characters or are they 32-bit binary words? Your expressions for the start and stop strings contain ten and eight characters, respectively, so assumptions are hard to make.

    Take a look at the index function. It could work in either case, so long as the search substring used the same encoding as the data.
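A minimal sketch of the index approach, assuming the data is a string of ASCII hex digits and the markers are given in the same form. The sub name `extract_regions` is hypothetical, and the sample string is a shortened version of the posted data with an end marker appended, since the original sample has none:

```perl
use strict;
use warnings;

# Return the decoded text of every region found between $start and $end.
# Assumption: $hex, $start and $end are all strings of hex digits.
sub extract_regions {
    my ($hex, $start, $end) = @_;
    my @texts;
    my $pos = 0;
    while (1) {
        # Find the next start marker at or after the current position.
        my $from = index($hex, $start, $pos);
        last if $from < 0;
        $from += length $start;

        # Find the end marker that follows it.
        my $to = index($hex, $end, $from);
        last if $to < 0;

        # Convert the hex digits in between to raw characters.
        push @texts, pack('H*', substr($hex, $from, $to - $from));
        $pos = $to + length $end;
    }
    return @texts;
}

# Shortened sample with an end marker appended (an assumption).
my $hex = '045a010000020000004c45534f4e4452412043414c4c4544'
        . '204241434b00020400';
print "$_\n" for extract_regions($hex, '0002000000', '00020400');
```

Because index works on plain substrings, it does not check that a hit starts on a byte boundary; if that matters for the real data, the offsets returned by index would need an extra parity check.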

    After Compline,
    Zaxo

Re: Reading hex data
by saintmike (Vicar) on Aug 09, 2005 at 01:19 UTC
    Find the relevant text using a regular expression, then go through it two characters at a time and use hex() and chr() to decode (added a closing 00020400 to your data):
        my $raw = join '', <DATA>;
        $raw =~ s/\n//g;
        if(my($encoded) = ($raw =~ /0002000000(.*?)00020400/)) {
            while($encoded =~ /(..)/g) {
                print chr (hex($1)), "\n";
            }
        }
        __DATA__
        045a010000020000004c45534f4e
        4452412043414c4c454420424143
        4b2c204920414456495345442054
        48415420574500020400
    cracks your 'code', printing one character per line:
    L E S O N D R A C A L L E D B A C K ,

      The same, but with an unpack:

          my $raw = do { local $/; <DATA> }; # inline slurp
          $raw =~ s/\s//g;
          print map { chr hex } unpack '(A2)*', $1
              if $raw =~ m/0002000000(.*?)00020400/;

      Perhaps a bit too idiomatic, but I prefer using unpack to split characters into fixed widths. It is a tiny bit faster, though not blazingly so:

                      Rate  regex unpack
          regex   953336/s     --   -38%
          unpack 1549170/s    62%     --
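The benchmark code itself was not posted. A comparison of this kind could be set up roughly as follows with the core Benchmark module; the sub names and the test string are hypothetical, and the rates will differ from machine to machine:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test input: the hex digits for "LESONDRA CALLED BACK".
my $encoded = '4c45534f4e4452412043414c4c4544204241434b';

# Decode by walking the string two characters at a time with a regex.
sub decode_regex {
    my ($hex) = @_;
    my $out = '';
    $out .= chr hex $1 while $hex =~ /(..)/g;
    return $out;
}

# Decode by splitting into two-character fields with unpack.
sub decode_unpack {
    my ($hex) = @_;
    return join '', map { chr hex } unpack '(A2)*', $hex;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    regex  => sub { decode_regex($encoded) },
    unpack => sub { decode_unpack($encoded) },
});
```

Both subs build and return the decoded string rather than printing it, so the timing is not dominated by I/O.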
Re: Reading hex data
by GrandFather (Saint) on Aug 09, 2005 at 01:13 UTC

    Taking the simplest interpretation of your problem description and modifying the sample data somewhat, the following may be what you want:

        use warnings;
        use strict;

        my $start = '0020';
        my $end = '0204';
        my $data = join "", <DATA>;

        pos ($data) = 0;
        while (pos ($data) < length ($data)) {
            last if $data !~ /\G.*?($start)/gis;
            my $begin = pos ($data);
            last if $data !~ /\G.*?($end)/gis;
            my $end = pos ($data) - length ($end);
            print ((substr $data, $begin, $end - $begin) . "\n");
        }
        __DATA__
        045a010000020000004c4553
        4f4e4452412043414c4c4544
        204241434b2c204920414456
        00020400
        495345442054484154205745

    prints:

        000004c4553
        4f4e4452412043414c4c4544
        204241434b2c204920414456
        00
    Update: Using index per Zaxo's suggestion would probably be faster and cleaner than this.

    Perl is Huffman encoded by design.
Re: Reading hex data
by GrandFather (Saint) on Aug 09, 2005 at 00:43 UTC

    There is no x00020400 in the input data supplied. Should there be more data in your sample?


    Perl is Huffman encoded by design.
Re: Reading hex data
by anonymized user 468275 (Curate) on Aug 09, 2005 at 09:56 UTC
    There is a start token in the sample data, but there are 8 hex digits in front of it. Therefore IMO the OP has made a typo and means to suggest that

    - the start token is x00020000

    - his data is 32 bit (a.k.a 8 decoded nybbles)

    The second trap to avoid is matching across 8 byte boundaries. This can be avoided using unpack, as can be seen in the code of one of the replies, although there it was expressed as a preference rather than a necessity.
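A rough sketch of an aligned search, taking the reading above at face value: 32-bit words of 8 hex digits each, a start token of 00020000 and an end token of 00020400. The sub name `extract_aligned` is hypothetical, and the end token is appended to the posted sample as an assumption, since the original data has none:

```perl
use strict;
use warnings;

# Collect the text between a start word and an end word, comparing only
# whole 8-digit words so that a match can never straddle a word boundary.
sub extract_aligned {
    my ($hex, $start, $end) = @_;
    $hex =~ s/\s+//g;                    # drop spaces and newlines
    my @words = unpack '(A8)*', $hex;    # one element per 32-bit word
    my (@texts, @current);
    my $inside = 0;
    for my $word (@words) {
        if (!$inside) {
            $inside = 1 if $word eq $start;
        }
        elsif ($word eq $end) {
            # End of a region: decode the collected words to characters.
            push @texts, pack('H*', join('', @current));
            @current = ();
            $inside  = 0;
        }
        else {
            push @current, $word;
        }
    }
    return @texts;
}

# The posted sample with an end token appended (an assumption).
my $sample = '045a010000020000004c4553 4f4e4452412043414c4c4544 '
           . '204241434b2c204920414456 495345442054484154205745 00020400';

for my $text (extract_aligned($sample, '00020000', '00020400')) {
    (my $clean = $text) =~ tr/\0//d;     # strip NUL padding for display
    print "$clean\n";
}
```

Under this reading the first word after the start token is 004c4553, so the decoded text begins with a NUL byte, which the loop above strips before printing.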

    One world, one people