Interactive scripting with debugger

My sampling may be very skewed, but I am surprised at how few Perl programmers among my direct acquaintances take advantage of the Perl debugger as a coding (as opposed to debugging) tool. In combination with Emacs's shell mode, it enables me to write my scripts interactively. Maybe there are better alternatives to what I illustrate below (and I am eager to learn of them), but if not, I hope at least some of you will find the following technique useful.

NB: the Emacs stuff below is not essential to the technique I want to illustrate here, just convenient; all the interaction with the Perl debugger can be done directly by invoking it from a regular shell instead of an Emacs shell. On the other hand, I have never run the Perl debugger on Windows, so I can't say how much of what I illustrate below applies there.

For example, suppose that I want to write a script to munge a large-ish text file, mongo.tab, whose structure and constraints are not entirely clear to me. I know that its first line contains headers and that the fields on each line are separated by tabs, but I still have questions such as: Are the entries in the first column unique? Does this or that regular expression capture all the rows I am interested in? Are there empty cells, and if so, what fraction of all the cells do they make up?

So I start by writing the first part of the script:

use strict;
use warnings;

chomp( my @lines = do { local @ARGV = 'mongo.tab'; <> } );
my @headers = split /\t/, shift @lines;
my @records = map [ split /\t/ ], @lines;
1;

This gets me to the point where all the lines have been reduced to records of fields, and I'm ready to do some exploring. (The last line, consisting only of "1;", is a "breakpoint hook" for the debugger, as I'll show in a minute.)
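There is one caveat for the empty-cell question. Without a LIMIT argument, split discards trailing empty fields, so a row whose last cells are empty produces a shorter record, and those cells silently drop out of any count. Passing a limit of -1 keeps them. The sketch below mirrors the loading step above, but it reads a small hypothetical sample from an in-memory string instead of mongo.tab:

```perl
use strict;
use warnings;

# Hypothetical sample standing in for mongo.tab: a header line, then rows.
# The second data row ends with two empty cells.
my $data = "id\tname\tcode\tnote\n"
         . "1\talpha\tAAA\tok\n"
         . "2\tbeta\t\t\n";

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
chomp( my @lines = <$fh> );
close $fh;

my @headers = split /\t/, shift @lines;

# Default split: trailing empty fields are silently dropped.
my @short = map [ split /\t/ ], @lines;

# A limit of -1 keeps every field, including trailing empty ones.
my @records = map [ split /\t/, $_, -1 ], @lines;

printf "%d headers; last row has %d fields by default, %d with limit -1\n",
    scalar @headers, scalar @{ $short[-1] }, scalar @{ $records[-1] };
```

Running it prints "4 headers; last row has 2 fields by default, 4 with limit -1".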

Then, right from within my editor, Emacs, I split the window into top and bottom halves (C-x 2), switch to the lower one (C-x o) and start a shell interaction buffer there (M-x shell). The short script listed above stays in the top half. Finally, I fire up the Perl debugger right in the Emacs shell buffer, giving it my newborn script as fodder:

% perl -d munge.pl
Loading DB routines from perl5db.pl version 1.23
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(munge.pl:4):    chomp( my @lines = do { local @ARGV = 'mongo.tab'; <> } );
  DB<1>

The debugger (affectionately known as DB) shows the first executable line of my script and waits for my instructions. In this case I am not interested in debugging my code (I know it's flawless :-) ). I just want to get to the point where I can use Perl interactively to explore the data I'm dealing with. So I use the command "c 7". The c command is short for "continue", and when given a line number it continues until that line is reached. This tells DB to let the script run but stop at line 7, where I had placed the "breakpoint hook" mentioned earlier. That line is basically a no-op executable line where DB can stop the script after all the lines I am interested in have executed.

After a few seconds of digestion, DB tells me where the script has been stopped, and gives me another prompt:

  DB<1> c 7
main::(munge.pl:7):    1;
  DB<2>
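Because "c 7" names a line number, the hook has to be retargeted whenever lines are added above it. A variant, borrowed from tlm's reply further down this thread, lets the script request the stop itself by setting $DB::single, after which a bare "c" is enough. The following is a sketch under that assumption, with hypothetical inline data standing in for the parsed file. Outside the debugger, the assignment does nothing.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parsed contents of mongo.tab.
my @headers = qw(id name);
my @records = ( [ 1, 'alpha' ], [ 2, 'beta' ] );

# Breakpoint hook that does not depend on line numbers: under perl -d,
# setting $DB::single makes the debugger stop at the next statement,
# i.e. the "1;" below, with @headers and @records still in scope.
{
    no warnings 'once';    # $DB::single is mentioned only once here
    $DB::single = 1;
}
1;
```

If you start it with perl -d, a bare c at the first prompt should stop on the final "1;".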

OK, time to find out what we've got. First, how many rows and columns do we have? To find out, I use the p command (short for print) to print the sizes of @records and @headers, which give the numbers of rows and columns, respectively:

  DB<2> p scalar @records
16215
  DB<3> p scalar @headers
118
  DB<4>

OK, about 16K rows and 118 columns. Let's see whether the entries in the first field are unique. To do this I use the entries in this field as keys for a hash, %h, and check whether the number of keys in this hash equals the number of records:

  DB<4> $h{ $_ }++ for map $_->[ 0 ], @records
  DB<5> p scalar keys %h
16215
  DB<6>

The number matches the number of rows we got earlier, so the entries in the first field are indeed unique. Had there been duplicates, the number of keys in %h would have been smaller than the number of rows (that is, the number of records in @records).
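The same check can be packaged as a reusable subroutine that also reports which values are repeated. In this sketch, duplicates_in_column and the sample records are hypothetical:

```perl
use strict;
use warnings;

# Return the values in column $col that occur more than once, each paired
# with its count, most frequent first (ties broken alphabetically).
sub duplicates_in_column {
    my ( $records, $col ) = @_;
    my %count;
    $count{ $_->[$col] }++ for @$records;
    return map  { [ $_, $count{$_} ] }
           sort { $count{$b} <=> $count{$a} || $a cmp $b }
           grep { $count{$_} > 1 } keys %count;
}

# Hypothetical records: the first column is unique, the second is not.
my @records = (
    [ 'r1', 'BBX' ],
    [ 'r2', 'CQZ' ],
    [ 'r3', 'BBX' ],
);

for my $col ( 0, 1 ) {
    my @dups = duplicates_in_column( \@records, $col );
    my $msg  = @dups
        ? join( ', ', map { "$_->[0] x$_->[1]" } @dups )
        : 'all values unique';
    print "column $col: $msg\n";
}
```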

Now let's see whether the entries in the second field are unique (I happen to know that it is supposed to be a "near synonym" of the first field). We repeat the same trick, which is easy because my DB has readline and history enabled. I can step back through my interaction history to the next-to-last line and then edit it just as I would edit any other line in an Emacs buffer. With readline and history enabled, this kind of editing also works when the DB session is started from an ordinary shell. To step back through the history I use M-p. Had I started the DB session from a regular Unix shell such as bash, tcsh or zsh, I would use C-p instead, but that key combination has a different meaning inside an Emacs buffer, which is where this interaction takes place. So I make a few small changes to the previous line to test the uniqueness of the entries in the second field:

  DB<6> $h2{ $_ }++ for map $_->[ 1 ], @records
  DB<7> p scalar keys %h2
9027

Aha! The entries in the second field are not unique. Let's find out which entry appears most often in the second column. I'll sort the keys of %h2 in descending order of their values, which record how many times each key was encountered when the hash was populated:

  DB<8> p ( sort { $h2{ $b } <=> $h2{ $a } } keys %h2 )[ 0 ]

  DB<9>

Huh? Nothing? It looks like the most common key may be the empty string; let's check:

  DB<10> p $h2{''}
2904
  DB<11>
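Because p prints an empty string as nothing at all, the "Huh? Nothing?" moment above is easy to run into. The standalone sketch below performs the same ranking but quotes each value and shows its count, so empty strings and stray spaces stay visible. The sample values are hypothetical. Ties are broken alphabetically only to make the order deterministic.

```perl
use strict;
use warnings;

# Hypothetical second-column values, including empty strings and a
# trailing space in "BBX ".
my @values = ( '', 'BBX ', '', 'CQZ', 'BBX ', '' );

my %h2;
$h2{$_}++ for @values;

# Rank by count (descending), then alphabetically for equal counts.
my @ranked = sort { $h2{$b} <=> $h2{$a} || $a cmp $b } keys %h2;

# Print the top three (or fewer) with quotes around each value.
my $last = $#ranked < 2 ? $#ranked : 2;
printf qq{%2d. "%s" (%d)\n}, $_ + 1, $ranked[$_], $h2{ $ranked[$_] }
    for 0 .. $last;
```

This prints ` 1. "" (3)`, ` 2. "BBX " (2)` and ` 3. "CQZ" (1)` on separate lines.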

OK, so we have almost 3K empty entries in the second column. That's not terribly interesting, so what about the second most common entry in the second column? Same trick: I sort in descending order, but this time I pick out the item that comes in second place:

  DB<12> p ( sort { $h2{ $b } <=> $h2{ $a } } keys %h2 )[ 1 ]
"BBX "
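The question posed at the start, what fraction of all the cells are empty, can be answered in the same spirit. The sketch below works over a small hypothetical table. It assumes the records were split with a limit of -1, and it counts a missing trailing cell as empty as well:

```perl
use strict;
use warnings;

# Hypothetical parsed table; empty cells are kept as ''.
my @headers = qw(id alias code);
my @records = (
    [ 'r1', '',    'AAA' ],
    [ 'r2', 'BBX', ''    ],
    [ 'r3', '',    'CCC' ],
);

my ( $empty, $total ) = ( 0, 0 );
my @empty_per_col = (0) x @headers;

for my $rec (@records) {
    for my $i ( 0 .. $#headers ) {
        my $cell = $rec->[$i];
        $total++;
        next if defined $cell && length $cell;    # non-empty cell
        $empty++;                                 # '' or missing
        $empty_per_col[$i]++;
    }
}

printf "%d of %d cells empty (%.1f%%)\n", $empty, $total, 100 * $empty / $total;
printf "  %-6s %d\n", $headers[$_], $empty_per_col[$_] for 0 .. $#headers;
```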

Hey, waitaminnit! What's that space doing there, after 'BBX'? None of these entries is supposed to have leading or trailing whitespace. It looks like someone goofed when generating the file. No matter, we have to deal with it.
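Before fixing the split, it may be worth checking how widespread the stray whitespace is. This sketch counts, per column, the cells that start or end with whitespace. The sample lines are hypothetical, and they are split on a bare tab as in the first version of the script:

```perl
use strict;
use warnings;

# Hypothetical lines split on a bare tab, keeping empty fields.
my @records = map [ split /\t/, $_, -1 ],
    "r1\tBBX \tAAA", "r2\tCQZ\t BBB", "r3\tBBX\tCCC";

# Count, per column index, the cells with leading or trailing whitespace.
my %padded;
for my $rec (@records) {
    for my $i ( 0 .. $#$rec ) {
        $padded{$i}++ if $rec->[$i] =~ /^\s|\s$/;
    }
}

print "column $_: $padded{$_} padded cell(s)\n" for sort { $a <=> $b } keys %padded;
```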

So I switch back to the top half of my editor window and fix the regexp used for splitting the records into fields:

my $re = qr/ *\t */;
my @headers = split /$re/, shift @lines;
my @records = map [ split /$re/ ], @lines;

I define a regexp object, $re, that I can use in both splits. Note that I don't define it as /\s*\t\s*/. Since \s also matches a tab, that pattern would swallow consecutive tabs, and any empty field between them would disappear from the split.
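The difference between the two patterns is easy to check on a single line. This sketch uses a hypothetical line that contains a stray space and an empty field:

```perl
use strict;
use warnings;

# Hypothetical line: a trailing space after "BBX", then an empty field.
my $line = "r1\tBBX \t\tCCC";

my @spaces = split / *\t */,   $line, -1;    # only spaces around the tab
my @ws     = split /\s*\t\s*/, $line, -1;    # \s also matches "\t"

print scalar(@spaces), " fields: ", join( '|', @spaces ), "\n";
print scalar(@ws),     " fields: ", join( '|', @ws ),     "\n";
```

The first split yields 4 fields (r1|BBX||CCC) and the second only 3 (r1|BBX|CCC). Neither pattern removes a space at the very start or end of a line. For that, you would need a separate s/^ +| +$//g on each line before splitting.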

With this change in place, I re-parse the file by restarting the script with the R command (short for Restart):

  DB<13> R
Warning: some settings and command-line options may be lost!

Loading DB routines from perl5db.pl version 1.23
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(munge.pl:4):    chomp( my @lines = do { local @ARGV = 'mongo.tab'; <> } );
  DB<12>

...and I am ready for some more exploring.

I hope the above example gives you an idea of the power of using DB as what amounts to a "Perl shell". I've only scratched the surface, having illustrated just three commands (c, p and R), but this meditation is already getting a bit long, so I had better stop. For more info on DB, see perldebug.

the lowliest monk

Janitored by holli - retitled from Interactive scripting with DB (1/13/0)

Re: Interactive scripting with DB by mugwumpjism (Hermit) on May 26, 2005 at 02:54 UTC
There's surely no better way to "become one" with a program, putting yourself "in its shoes", than running it inside an interactive debugger. You might also like these tips and tricks.

Putting the line `kill 2, $$` in your code will cause the debugger to break at that point in the source (on Unix, anyway). This works by emulating a Ctrl+C keypress (signal 2 is SIGINT, the interrupt signal). This is great for setting breakpoints deep down in your code.

If you ever turn strings into real functions via `eval`, then make sure to add a "magic line number comment", e.g.:

sub some_auto_generated_function {
    my $self = shift;
# line 1 "some/preprocessedfile.foo"
    print $self->frop;
# line 7 "originalfile.pm"
}

If you do this, the Emacs debugger will correctly step through the right source files. Sadly, though, the old perl5db.pl interactive debugger doesn't actually display the source properly.

Devel::ebug is a brand new debugger that doesn't have a lot of the limitations of the old debugger, in particular the segfaults, but instead brings a whole lot of new Storable-related ones! Joy! Anyway, this debugger is client/server with a very lightweight in-program "server", so you can debug remote programs. If you could write an Emacs-compatible communication layer for it, you could bring this debugging to the "next level" of Perl debuggers, which might include the debugger for Perl 6.

$h=$ENV{HOME};my@q=split/\n\n/,`cat $h/.quotes`;$s="$h/." ."signature";$t=`cat $s`;print$t,"\n",$q[rand($#q)],"\n";
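You can see the effect of such a comment without the debugger. In the sketch below, whose function and file names are hypothetical, an error raised from string-eval'd code is reported against the file and line named in the directive rather than against "(eval 1)". The directive has to start at the beginning of a line.

```perl
use strict;
use warnings;

# Code generated as a string, with a "# line" directive mapping the
# following line back to a (hypothetical) original source file.
my $code = <<'CODE';
sub generated_function {
    my $arg = shift;
# line 7 "originalfile.pm"
    die "no argument" unless defined $arg;
    return "frop: $arg";
}
1;
CODE

eval $code or die "Could not compile generated code: $@";

my $ok = generated_function('x');
print "$ok\n";

# The die message names originalfile.pm, line 7.
eval { generated_function() };
print "Reported as: $@";
```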
Re^2: Interactive scripting with DB by tlm (Prior) on May 26, 2005 at 03:42 UTC
"Putting the line `kill 2, $$` will cause the debugger to break at that point in the source (on Unix, anyway)."

I had never seen that trick before. Cool. I have been using this one instead:

$DB::single = 1;

That causes the debugger to enter "single-step" mode, which amounts to programmatically setting a breakpoint at the next executable line. This is particularly useful for making the debugger stop at places that come before the first executable line, e.g.:

BEGIN { $DB::single = 1; }
use Foo;

The code above will enable stepping through the loading of Foo, which would otherwise happen before the point at which DB normally starts (i.e. the first executable line). I see that `kill 2, $$` works well for this too. One more trick for the bag.

the lowliest monk
Re: Interactive scripting with DB by perrin (Chancellor) on May 26, 2005 at 02:44 UTC
In my experience, most Perl programmers (in fact, most programmers) don't use the debugger at all.
Re^2: Interactive scripting with DB by tlm (Prior) on May 26, 2005 at 04:30 UTC
It sure looks that way. I find it inexplicable... It's like writing Perl programs without ever using hashes: it is certainly possible, but why forgo the benefits of such a powerful tool? the lowliest monk
Re^3: Interactive scripting with DB by demerphq (Chancellor) on May 26, 2005 at 08:35 UTC
You can count me in the "rarely use the debugger" school. I've always found using the Perl debugger a frustrating process. I use Win32, so it tends not to play nicely, but even when it isn't playing silly-buggers I still find it annoying. It's like inspecting a house through a keyhole: you never quite know how what you are looking at fits into the big picture. I've also had far too much experience with code that has been "patched" from the debugger, and not well.

I think the issue is that the debugger tells us "what", and usually the real problem involves a lot of "why". Debugging with print statements IMO tends to promote a holistic view of the program. It's necessary to have a good understanding of the program flow and processes to debug this way, and I think overall this promotes a better quality of programming. If you need to step through your code in a debugger to understand why it isn't doing the right thing, then I think it's not unreasonable to argue that your code is too complex and needs to be rethought. And such rethinking won't happen while you are in the debugger.

I leave you with a quote by a famous programmer about why he doesn't much like debuggers. Admittedly, the debugging he is talking about is kernel debugging, which is a somewhat specialized area, but I think the points are valid nonetheless:

I happen to believe that not having a kernel debugger forces people to think about their problem on a different level than with a debugger. I think that without a debugger, you don't get into that mindset where you know how it behaves, and then you fix it from there. Without a debugger, you tend to think about problems another way. You want to understand things on a different _level_. It's partly "source vs binary", but it's more than that. It's not that you have to look at the sources (of course you have to - and any good debugger will make that _easy_). It's that you have to look at the level _above_ sources. At the meaning of things. Without a debugger, you basically have to go the next step: understand what the program does. Not just that particular line.

This is an excerpt from this posting by Linus Torvalds.

---
$world=~s/war/peace/g
Re^4: Interactive scripting with DB by data64 (Chaplain) on May 26, 2005 at 21:23 UTC
Re^3: Interactive scripting with DB by adrianh (Chancellor) on May 26, 2005 at 09:42 UTC
"It's like writing Perl programs without ever using hashes: it is certainly possible, but why forgo the benefits of such a powerful tool?"

Because some people find they get more benefit from other techniques. See "Are debuggers good?" for a long vitriolic thread on this very topic :-) For example, I personally find doing TDD a far more effective use of my time than spending time in the debugger. The mere fact that I need to drop into the debugger is a sign that I've fouled up earlier, since I've obviously written code that is too hard for me to understand. The incidence of my using the debugger on my own code is as close to zero as makes no difference. About the only time I use the debugger now is when poking at other people's code and doing exploratory testing, or maybe trying out a one-liner.
Re^4: Interactive scripting with DB by tye (Sage) on May 26, 2005 at 15:59 UTC
Re^5: Interactive scripting with DB by adrianh (Chancellor) on May 27, 2005 at 00:45 UTC
Re^5: Interactive scripting with DB (effective ways?) by demerphq (Chancellor) on May 27, 2005 at 07:30 UTC
Re^3: Interactive scripting with DB by Anonymous Monk on May 26, 2005 at 04:50 UTC
There are no benefits. You're a dyslexic wondering why everyone isn't dyslexic.
Re: Interactive scripting with DB by Mutant (Priest) on May 26, 2005 at 09:11 UTC
I think the "you should be using the debugger" sentiment misses one really important point: TIMTOWTDI. If the debugger works for you, then great. Personally, I find print/warn is all I need (usually with Data::Dumper to format the data). I have used the debugger in the past, but like demerphq, I find that it usually operates at a lower level than the one my thinking works at.
Re: Interactive scripting with DB by mattk (Pilgrim) on May 26, 2005 at 02:28 UTC
While it's not as powerful as using the debugger itself, I use a small block based around eval to achieve mostly the same thing:

while (1) {
    print "$0> ";
    my $input = <STDIN>;
    last unless defined $input;         # stop at end of input (Ctrl-D)
    chomp $input;
    last if $input =~ m/^(q|quit)$/;    # "q" or "quit" ends the session
    my $res = eval $input;              # run the line as Perl code
    chomp(my $out = $@ ? $@ : $res);    # show the error if any, else the result
    print "$out\n";
}
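Below is a variant of the loop above, sketched as a subroutine that reads from and writes to arbitrary filehandles so that in-memory input can drive it. The name repl and the sample session are hypothetical. One limitation is worth knowing. Each line is a separate string eval, so a my variable declared on one line is gone by the next. For that reason, the sketch disables strict vars inside the eval so that package variables such as $x can carry state between lines.

```perl
use strict;
use warnings;

# Read Perl snippets from $in, evaluate each one, and write the result
# (or the error message) to $out, prefixed by a prompt.
sub repl {
    my ( $in, $out, $prompt ) = @_;
    while (1) {
        print {$out} "$prompt> ";
        my $input = <$in>;
        last unless defined $input;        # end of input
        chomp $input;
        last if $input =~ /^(q|quit)$/;    # explicit quit
        # Undeclared variables in the snippet become package variables.
        my $res  = do { no strict 'vars'; eval $input };
        my $text = $@ ? $@ : defined $res ? $res : '';
        chomp $text;
        print {$out} "$text\n";
    }
}

# Drive it with an in-memory session and capture the transcript.
my $session = "\$x = 6 * 7\n\$x + 1\nq\n";
open my $in,  '<', \$session or die "Cannot open input: $!";
open my $out, '>', \my $log  or die "Cannot open output: $!";
repl( $in, $out, 'demo' );
close $out;
print $log, "\n";
```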
Re: Interactive scripting with DB by bmann (Priest) on May 26, 2005 at 19:03 UTC
Another option is ptkdb, a Tk-based point-and-click debugger. It offers quick variable inspection, easy breakpoint setting (with or without conditions), watch lists, sub lists, breaking on warnings, etc. It offers nothing that can't be done with DB, but the learning curve is a little less challenging.
Re: Interactive scripting with debugger by redhotpenguin (Deacon) on May 27, 2005 at 07:11 UTC
I use the Perl debugger every day. Thanks for showing me some more cool things to do with it. It's great for stepping through unit tests, similar to the example you have shown here.
Re: Interactive scripting with debugger by bsb (Priest) on May 31, 2005 at 03:39 UTC
I regularly use the debugger to quickly explore a module or programming environment. Often just running through the synopsis of a module in the debugger gives you a good idea of how it works (and where it breaks). In my own projects I'll have a "dbg" script which loads everything, sets up a few objects and leaves me at the debugger prompt:

#!/usr/bin/perl -dl
BEGIN { DB::parse_options("NonStop=1"); }

use B::Deparse;
$deparse = B::Deparse->new();
sub code { return $deparse->coderef2text($_[0]); }
# p code ( \&Some::sub )

# Set up env here

print <<'TIPS';
-----------------------------------------------------
instructions and defined variables here
-----------------------------------------------------
TIPS

DB::parse_options("NonStop=0");
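B::Deparse, which the code helper in the script above uses, can also be tried on its own. Here is a minimal sketch in which double_it is a hypothetical example sub:

```perl
use strict;
use warnings;
use B::Deparse;

my $deparse = B::Deparse->new();

# Hypothetical sub whose source we want to see again.
sub double_it { my ($n) = @_; return $n * 2 }

# coderef2text returns the body of the sub as Perl source, in braces.
my $text = $deparse->coderef2text( \&double_it );
print "sub double_it $text\n";
```

The printed body should contain "return $n * 2;" along with any pragmas that were in effect. The exact layout depends on the Perl version.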

