PerlMonks  

Hex numbers (e.g. memory addresses) pseudonymising for comparable logging output

by etj (Deacon)
on Feb 18, 2022 at 12:23 UTC

PDL has a debug mode which tells you in some detail what it's doing, including giving memory addresses (the joy of working in C). I'm currently tracking down the underlying cause of https://github.com/PDLPorters/pdl/issues/356, and have narrowed it down to a small repro case where a command-line switch makes it either croak, or not. Either mode produces several hundred lines of debug output. Diffing the two cases is useless because the addresses get randomised by https://en.wikipedia.org/wiki/Address_space_layout_randomization. If only there were a tool that could consistently pseudonymise those addresses, so that the first one seen gets replaced by ADDR1, the second by ADDR2, and so on, for easier diffing.

Perl to the rescue!

#!/usr/bin/env perl
# address-pseudonymise [file] or read STDIN
use strict;
use warnings;
my (%addr2number, $i);
while (<>) {
  s:^==\d+==:==[PID]==:; # if you used valgrind, replace process ID
  s:0x([0-9a-f]+): '[ADDR'.($addr2number{$1} //= ++$i).']' :gie;
  print;
}

Replies are listed 'Best First'.
Re: Hex numbers (e.g. memory addresses) pseudonymising for comparable logging output
by Fletch (Bishop) on Feb 18, 2022 at 16:29 UTC

    Similar techniques are also useful for strace output; I've done similar things to timestamps (converting them all to be relative to the start time) and to addresses in return values (like what's done here).

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

      Might be useful to extract into a little App::something that could take various REs and apply a specified pseudonymisation to them, e.g. subtract a value, or just replace with a label+sequence.
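A rough sketch of what such a rule-driven filter might look like. None of this is an existing module: `make_filter`, `sequence` and `relative` are hypothetical names, and the sample lines are invented, only loosely shaped like timestamped trace output. Each rule pairs a regex (with one capture group) with an action that turns the captured text into its replacement.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Build a filter from [regex, action] rules. Each regex needs exactly one
# capture group; the whole match is replaced by what the action returns.
# (Hypothetical helper, not part of any published App:: module.)
sub make_filter {
    my @rules = @_;
    return sub {
        my ($line) = @_;
        for my $rule (@rules) {
            my ($re, $action) = @$rule;
            $line =~ s/$re/$action->($1)/ge;
        }
        return $line;
    };
}

# Action: label + sequence number, in order of first appearance.
# Keys are lower-cased so 0xAB and 0xab share a label; that is an
# assumption here, the original script keys on the text as written.
sub sequence {
    my ($label) = @_;
    my (%seen, $n);
    return sub { "[$label" . ($seen{ lc $_[0] } //= ++$n) . "]" };
}

# Action: subtract the first value seen, giving offsets from the start.
sub relative {
    my $base;
    return sub {
        $base //= $_[0];
        return sprintf "+%.3f", $_[0] - $base;
    };
}

my $filter = make_filter(
    [ qr/0x([0-9a-f]+)/i,   sequence('ADDR') ],
    [ qr/^(\d+\.\d+)(?= )/, relative()       ],
);

# Invented sample input: a leading epoch timestamp and a hex address.
print $filter->($_) for (
    "1645186980.250 mmap(NULL, 4096) = 0x7f3a1000\n",
    "1645186980.750 munmap(0x7f3a1000, 4096) = 0\n",
);
```

Keeping each action as a closure means its state (the address table, the base timestamp) stays private to one rule, so several labels or several "subtract a value" rules could coexist in one filter.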
Re: Hex numbers (e.g. memory addresses) pseudonymising for comparable logging output
by LanX (Saint) on Feb 18, 2022 at 13:46 UTC
    Pardon my ignorance.

    Are you saying the anonymizing of addresses - which is done for security - is so weak that a simple algorithm can decrypt them?

    Or am I missing something?

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      ISTM that etj's code simply replaces each address, in order of first appearance within a run, with a numbered placeholder, so that the addresses differing from run to run doesn't ruin his diffs. The addresses within each run are consistent but between runs are not. He isn't decrypting anything, just saying the first address seen will always be ADDR1 and the second ADDR2, etc.
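To make that concrete, here is a small sketch. It wraps the same substitution as the original script in a sub (`pseudonymise` is a name made up for this example) and feeds it two invented "runs" that have the same structure but different addresses.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Replace each distinct hex address with a numbered placeholder, numbered
# in order of first appearance. Each call starts with a fresh mapping,
# which plays the role of one run of the original script.
sub pseudonymise {
    my @lines = @_;
    my (%addr2number, $i);
    s{0x([0-9a-f]+)}{ '[ADDR' . ($addr2number{$1} //= ++$i) . ']' }gie for @lines;
    return @lines;
}

# Two made-up runs: same sequence of events, different addresses.
my @run1 = ("alloc 0x7f3a10 parent 0x7f3a58\n", "free 0x7f3a10\n");
my @run2 = ("alloc 0x55d2c0 parent 0x55d308\n", "free 0x55d2c0\n");

my @out1 = pseudonymise(@run1);
my @out2 = pseudonymise(@run2);

print @out1;
print "runs match\n" if join('', @out1) eq join('', @out2);
```

Both runs come out as `alloc [ADDR1] parent [ADDR2]` followed by `free [ADDR1]`, so a diff between them would show only genuine differences in behaviour.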

      Is that clearer or even more confusing?


      🦛

        IOW he's identifying identical addresses by the order given in the trace's output to rename them consistently?

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

Re: Hex numbers (e.g. memory addresses) pseudonymising for comparable logging output
by MikeKethy (Initiate) on Feb 20, 2022 at 13:57 UTC
    Thank you so much for the explanation. I'm so glad to have read it here.

Node Type: CUFP [id://11141464]
Front-paged by Corion