Hex numbers (e.g. memory addresses) pseudonymising for comparable logging output

PDL has a debug mode which tells you in some detail what it's doing, including giving memory addresses (the joy of working in C). I'm currently tracking down the underlying cause of https://github.com/PDLPorters/pdl/issues/356, and have narrowed it down to a small repro case where a command-line switch determines whether it croaks or not. Either way it produces several hundred lines of debug output. Diffing the two cases as-is is useless because the addresses differ from run to run, thanks to https://en.wikipedia.org/wiki/Address_space_layout_randomization. If only there were a tool that could consistently pseudonymise those addresses, replacing the first one seen with [ADDR1], the second with [ADDR2], and so on, for easier diffing.

Perl to the rescue!

#!/usr/bin/env perl

# address-pseudonymise [file] or read STDIN

use strict;
use warnings;

my (%addr2number, $i);
while (<>) {
  s:^==\d+==:==[PID]==:; # if you used valgrind, replace process ID
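  # number each distinct address in order of first appearance;
  # lc() so that 0xABC and 0xabc count as the same address under /i
  s:0x([0-9a-f]+):
    '[ADDR'.($addr2number{lc $1} //= ++$i).']'
  :gie;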
  print;
}

Replies are listed 'Best First'.
Re: Hex numbers (e.g. memory addresses) pseudonymising for comparable logging output by Fletch (Bishop) on Feb 18, 2022 at 16:29 UTC
Similar techniques are also useful for `strace` output; I've done similar things to timestamps (converting them all to be relative to the start time) and to addresses in return values (like what's done here). The cake is a lie. The cake is a lie. The cake is a lie.
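For the timestamp part of that, here is a minimal sketch. It assumes the trace was taken with `strace -ttt`, so that each line starts with an epoch timestamp in `seconds.microseconds` form; `relative_timestamps` is a made-up helper name, and the two sample lines are invented for illustration. The arithmetic is done in whole microseconds to avoid floating-point rounding on large epoch values.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite a leading "seconds.microseconds" timestamp
# (assumed strace -ttt format) as an offset from the first timestamp seen.
sub relative_timestamps {
    my @lines = @_;
    my $start;    # first timestamp seen, in microseconds
    for (@lines) {
        s{^(\d+)\.(\d{6})}{
            my $usec = $1 * 1_000_000 + $2;
            $start //= $usec;
            my $delta = $usec - $start;
            sprintf '%d.%06d', int($delta / 1_000_000), $delta % 1_000_000;
        }e;
    }
    return @lines;
}

# Invented sample input, not real strace output.
print relative_timestamps(
    "1645200000.123456 openat(...) = 3\n",
    "1645200000.623457 read(...) = 42\n",
);
```

Lines without a leading timestamp pass through unchanged, so this could be chained with the address replacement above.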
Re^2: Hex numbers (e.g. memory addresses) pseudonymising for comparable logging output by etj (Deacon) on Feb 19, 2022 at 09:00 UTC
Might be useful to extract into a little App::something that could take various REs and apply a specified pseudonymisation to them, e.g. subtract a value, or just replace with a label+sequence.
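A rough sketch of what the label+sequence half of that idea might look like. Nothing here exists on CPAN as far as this sketch is concerned: `make_pseudonymiser`, the rule format (`re` and `label` keys) and the sample line are all assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical: build a closure from a list of { re => qr/.../, label => '...' }
# rules. Each distinct match of a rule's regex becomes [LABELn], numbered per
# label in order of first appearance.
sub make_pseudonymiser {
    my @rules = @_;
    my (%seen, %count);
    return sub {
        my ($line) = @_;
        for my $rule (@rules) {
            my ($re, $label) = @$rule{qw(re label)};
            $line =~ s{($re)}{
                '[' . $label . ($seen{$label}{$1} //= ++$count{$label}) . ']'
            }ge;
        }
        return $line;
    };
}

my $pseudonymise = make_pseudonymiser(
    { re => qr/0x[0-9a-f]+/i,        label => 'ADDR' },
    { re => qr/(?<=^==)\d+(?===)/,   label => 'PID'  },   # valgrind-style prefix
);
print $pseudonymise->("==1234== Invalid read at 0xdead (called from 0xbeef)\n");
```

Rules are applied in order, so a later regex must not match the placeholders produced by an earlier one. The "subtract a value" variant could be another rule type with its own replacement callback, which is left out here.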
Re: Hex numbers (e.g. memory addresses) pseudonymising for comparable logging output by LanX (Saint) on Feb 18, 2022 at 13:46 UTC
Pardon my ignorance. Are you saying the anonymizing of addresses - which is done for security - is so weak that a simple algorithm can decrypt them? Or am I missing something? Cheers Rolf (addicted to the Perl Programming Language :)
Re^2: Hex numbers (e.g. memory addresses) pseudonymising for comparable logging output by hippo (Bishop) on Feb 18, 2022 at 14:12 UTC
ISTM that etj's code simply replaces each address in turn of each run by a numbered placeholder so them being different addresses doesn't ruin his diffs. The addresses within each run are consistent but between runs are not. He isn't decrypting anything, just saying the first address seen will always be ADDR1 and the second ADDR2, etc. Is that clearer or even more confusing? 🦛
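To make that concrete, here is a small sketch using the same substitution wrapped in a sub (`pseudonymise` is a made-up name, and the two "runs" are invented sample lines, not real PDL output). The raw addresses differ between the runs, but the numbered placeholders come out identical.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution from the original script:
# each distinct address gets a number in order of first appearance.
sub pseudonymise {
    my @lines = @_;
    my (%addr2number, $i);
    s{0x([0-9a-f]+)}{'[ADDR' . ($addr2number{lc $1} //= ++$i) . ']'}gie
        for @lines;
    return @lines;
}

# Two invented runs: same structure, different addresses.
my @run1 = ("alloc 0x55d0a1b2\n", "use 0x55d0a1b2 with 0x7ffe0010\n");
my @run2 = ("alloc 0x5619c3f4\n", "use 0x5619c3f4 with 0x7ffc9a20\n");

# Both calls print:
#   alloc [ADDR1]
#   use [ADDR1] with [ADDR2]
print pseudonymise(@run1);
print pseudonymise(@run2);
```

The numbering restarts for each call, which corresponds to running the script once per log file before diffing.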
Re^3: Hex numbers (e.g. memory addresses) pseudonymising for comparable logging output by LanX (Saint) on Feb 18, 2022 at 15:10 UTC
IOW he's identifying identical addresses by the order given in the trace's output to rename them consistently? Cheers Rolf (addicted to the Perl Programming Language :)
Re^4: Hex numbers (e.g. memory addresses) pseudonymising for comparable logging output by hippo (Bishop) on Feb 18, 2022 at 15:15 UTC
Re^5: Hex numbers (e.g. memory addresses) pseudonymising for comparable logging output by LanX (Saint) on Feb 18, 2022 at 18:19 UTC
Re: Hex numbers (e.g. memory addresses) pseudonymising for comparable logging output by MikeKethy (Initiate) on Feb 20, 2022 at 13:57 UTC
Thank you so much for the explanation. I'm so glad to read it here.

