
Parallel maintenance on many projects, part I

by brian_d_foy (Abbot)
on Sep 01, 2004 at 22:04 UTC ( id://387725 )

I have a medium-sized directory where I store all my CVS working copies. Everything is in one place, even though they might come from different servers (or even different source control products). I want to do a lot of parallel maintenance in these directories, and as I go about this, I am going to write this article in parallel too. I won't skimp on the details, or hide the stupid mistakes I make. You get to see how I actually do something, including the false starts, momentary digressions, and V-8 moments ("D'oh! I shoulda used a command line!").

I start in my Dev directory, which looks like it should mean "device" but really means "development". If the computer and I agree to stay out of each other's file systems, I don't think this will be a problem.

[1246]$ pwd
/Users/brian/Dev
[1247]$ ls
Apache          NPR             iCab
Articles        Net             iphoto_shell
Business        Netscape        ora-weblog-stats
CGI_MetaFAQ     Object          orn-weblogs
CVSROOT-bdf     Palm            pause_id
CVSROOT-ns      Pod             pausecgi
CVSROOT-panix   Polyglot        pdf-rotate
CVSROOT-rss     Test            perlbrowser
ConfigReader    Thoth           perlfaq
Configs         Tie             release
Courses         Tk              scrathpad
Data            XML             scriptdist
Devel           YAPJ            text_density
HTML            bin-personal    use.perl
HTTP            cpan            weather
Mac             doc2png         weblint++
MacOSX          grepurl         weblogs.d
Module          httpsniffer     webreaper

In that list are new projects, like "Palm", and projects I haven't bothered to look at in a couple years, like "CGI_MetaFAQ" and whatever I might have in "Devel" (as I write this, I don't remember what that is).

I have two things I need to do, and they would normally be manageable if I didn't have my finger in so many things. Most of these directories look like Perl distributions, and I want to check each distribution against a list of things I think should be true (e.g. CVS is up to date, has a README, has a META.yml, has a pod.t, and so on).

The other task involves cleaning up the CVS/Root files. SourceForge used to give out per-project CVS server names like cvs.perl-isbn.sourceforge.net because it could do wildcard sub-domains. When they upgraded BIND, they lost that feature. However, I'm stuck with a bunch of CVS/Root files that have the old host names.

I can't successfully run cvs update in those directories since they have no valid host to connect to.

I'm going to handle the CVS/ROOT problem first, because it should be easier, and once I fix it, I shouldn't have to do it again. Even before I start I predict that 80% of the work is already done, meaning that 80% of the directories are up-to-date in the CVS repository, so I only need to delete my working copy and check out the current HEAD. That updates the CVS/ROOT files automatically. How do I find out which directories qualify?

I reflexively pull out Perl and type "use File::Find", but even though that might be faster than find(1), I have a small corpus and programmer time is more important. If find(1) takes twice as long, I don't care. I also make a note that I need to start my new File::Find::Functions project: collect subroutines that people can shove right into File::Find::find.
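Since I mentioned File::Find anyway, here's roughly what the pure-Perl version of the search would look like. This is only a sketch, demonstrated on a throwaway directory tree rather than my real Dev directory, and the hostname it writes is just sample data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use File::Find qw(find);
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# build a scratch tree with a single CVS/Root file in it
my $dir = tempdir( CLEANUP => 1 );
make_path( "$dir/Business/ISBN/CVS" );
open my $fh, '>', "$dir/Business/ISBN/CVS/Root" or die "open: $!";
print $fh ":ext:comdog\@cvs.sourceforge.net:/cvsroot/perl-isbn\n";
close $fh;

# the File::Find equivalent of  find . -name 'R[oO][oO][tT]' :
# collect every file named "root", matching case-insensitively
my @roots;
find( sub { push @roots, $File::Find::name if lc($_) eq 'root' }, $dir );

print scalar(@roots), "\n";   # 1
```

The wanted() callback is exactly the sort of thing a File::Find::Functions module could collect.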

How many directories do I need to look at? I use find(1) to list them.

[1276]$ find . -name ROOT
./cpan/CVS/ROOT
./cpan/t/CVS/ROOT
./HTTP/Size/t/CVS/ROOT
./Test/CVS/ROOT
./Test/Data/CVS/ROOT
./Test/File/CVS/ROOT
./Test/File/lib/CVS/ROOT
./Test/ISBN/CVS/ROOT
./Test/ISBN/lib/CVS/ROOT
./Test/ISBN/t/CVS/ROOT

Huh? What's up with that? I should have a bunch of those files. I was expecting most of those to be CVS working copies. I look in some of the other directories. Most of the time the file is actually "CVS/Root". I probably knew that.

I modify my find(1) and count the lines of output.

[1280]$ find . -name R[oO][oO][tT] | wc -l
347

So how many do I need to fix? A good CVS/Root file should have "cvs.sourceforge.net" as the host. Hosts like cvs.perl-isbn.sourceforge.net are stale. I take the output from find(1) and pipe it to xargs, where I simply cat(1) each file then use grep(1)'s -v switch to print the lines that don't match "cvs.sourceforge". Before I do that, I realize I have another candidate for Randal Schwartz's "Useless use of cat" Award. I can give grep(1) the filenames directly, and to much better advantage: grep(1) will prepend the file name to the output, as in this excerpted output.

[1286]$ find . -name R[oO][oO][tT] | xargs grep -v "cvs.sourceforge"
./Business/ISBN/CVS/Root::ext:comdog@cvs.perl-isbn.sourceforge.net:/cvsroot/perl-isbn
./Business/ISBN/scripts/CVS/Root::ext:comdog@cvs.perl-isbn.sourceforge.net:/cvsroot/perl-isbn
./Business/ISBN/t/CVS/Root::ext:comdog@cvs.perl-isbn.sourceforge.net:/cvsroot/perl-isbn
./Business/ISBN-Data/CVS/Root::ext:comdog@cvs.perl-isbn.sourceforge.net:/cvsroot/perl-isbn
./Business/ISBN-Data/t/CVS/Root::ext:comdog@cvs.perl-isbn.sourceforge.net:/cvsroot/perl-isbn

Now I want a list of those directories. My first try doesn't work.

find . -name R[oO][oO][tT] | xargs grep -v "cvs.sourceforge" 2>/dev/null | perl -lnaF: -e "$F[0] =~ s{^./}{}; print $F[0]"
Can't modify single ref constructor in substitution (s///) at -e line 1, near "s{^./}{};"
Execution of -e aborted due to compilation errors.

What the heck is this? Who's this "single ref constructor"? I try it without the substitution.

[1303]$ find . -name R[oO][oO][tT] | xargs grep -v "cvs.sourceforge" 2>/dev/null | perl -aF: -lne "print $F[0]"
ARRAY(0x800368)
ARRAY(0x800368)
ARRAY(0x800368)

Why is $F[0] an anonymous array? Am I going crazy? I look at all of @F. It certainly looks like a normal array. I go back to the first place I learned about -F: Randal's first Unix Review article.

[1305]$ find . -name R[oO][oO][tT] | xargs grep -v "cvs.sourceforge" 2>/dev/null | perl -aF: -lne "$, = ' <<<>>> '; print @F"
./Tie/Toggle/t/CVS/Root <<<>>>  <<<>>> [snip...]

I go back a step and print the dereferenced first element of @F. It's even odder.

[1310]$ find . -name R[oO][oO][tT] | xargs grep -v "cvs.sourceforge" 2>/dev/null | perl -aF: -lne "print @{$F[0]}"
0

Have you spotted the error yet? I should have, because I make it often enough, and it's not a Perl error. That 0 is very telling. Compare the previous versions with the correctly working version. Now I'm sure that I'm a bonehead. For bonus points, figure out where the anonymous array came from (now I know). The excerpted output shows just the directory names.

[1313]$ find . -name R[oO][oO][tT] | xargs grep -v "cvs.sourceforge" 2>/dev/null | perl -aF: -lne '$F[0] =~ s{^./}{}; print $F[0]'
weather/CVS/Root
weather/t/CVS/Root
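For the record, the anonymous array came from the shell, not from Perl. Inside double quotes, the shell expands $F (which is unset, so it expands to nothing), and perl receives a bare [0], which is an anonymous array constructor. This little demonstration, which isn't part of the session, shows what Perl actually saw:

```perl
use strict;
use warnings;

# The shell turned  "print $F[0]"  into  print [0] . A bare [0] is an
# anonymous array reference, which stringifies as ARRAY(0x...).
my $what_perl_saw = [0];

print ref($what_perl_saw), "\n";     # ARRAY
print "@{ $what_perl_saw }", "\n";   # 0 -- the single element, just like above
```

Single quotes keep the shell's hands off $F, which is why the last command line works.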

Now I want to know which of those directories have changes that need to be checked into CVS. Those are the ones that need attention. Rather than continue this already-too-long command line, I redirect the output into a file, but before I do that I modify the command line to remove the trailing /CVS/Root from the directory names. I end up with a list of directories I want to run cvs update in.

[1319]$ find . -name R[oO][oO][tT] | xargs grep -v "cvs.sourceforge" 2>/dev/null | perl -aF: -lne '$F[0] =~ s{^./|/CVS/ROOT$}{}ig; print $F[0]' > bad_cvs_root.txt

For the next step I want to change into each of those directories and run cvs update. I need to look at the output. I realize that the network traffic is going to take a while, so I should minimize it by looking at the highest directory possible in each working copy. I then realize if I wanted to do that, I could make my list with ls -1. Oh well. Had I done that I wouldn't have reminded myself about proper shell syntax. So let's give it a go, but first, I need to go to the kitchen to get some water.

My first pass is very simple. I figure I will use this program again, perhaps as a nightly check to see what I've forgotten to check in.

I don't let myself use all the quick and dirty tricks, and I throw in File::Spec::Functions with an eye toward portability.

#!/usr/bin/perl
use warnings;
use strict;

use File::Spec::Functions qw(catfile);

my $Base = "/Users/brian/Dev";

@ARGV = catfile( $Base, 'bad_cvs_root.txt' );

while( <> )
    {
    chomp;
    my $dir = catfile( $Base, $_ );
    print "checking $dir ...\n";

    chdir $dir or do { warn "Could not chdir $dir: $!"; next };

    my $output = `cvs update 2>&1`;
    print $output;
    print "-" x 73, "\n";
    }

Well, it only 20% works. Remember all those bad CVS/ROOT files? I never fixed them, although I am changing into each directory to check cvs. I get a lot of bad host errors because I still have the problem I started with. I need to fix that first. Remember when I went to the kitchen to get water? When I came back I got a bit ahead of myself. Taking the break got me out of the flow.

First (again), I need a list of all the bad CVS/ROOT files. In my last command line, I stripped off the CVS/ROOT portion. Now I remember why they were there. No worries, though. I just go back a bit in the shell's history and try again.

[1329]$ find . -name R[oO][oO][tT] | xargs grep -v "cvs.sourceforge" 2>/dev/null | perl -aF: -lne '$F[0] =~ s{^./}{}ig; print $F[0]' > bad_cvs_root_no_really.txt

I modified my previous program to go through the files for me, and I give Ingy's IO::All module a spin so I don't have to worry about open() and friends (and IO::All is just cool). I realized once I ran this program that it's really just another perl command line if I use in-place editing, but I'm a bit leery of really screwing up, so I mollify myself with my appropriate caution. I first print what I want to put back into the files before I actually do it, and I save the original file contents in cvs-root-originals.txt. If I mess up, I can at least recreate the files. Check out that spiffy IO::All append mode.

#!/usr/bin/perl
use warnings;
use strict;

use IO::All;
use File::Spec::Functions qw(catfile);

my $Base = "/Users/brian/Dev";

@ARGV = catfile( $Base, 'bad_cvs_root_no_really.txt' );

while( <> )
    {
    chomp;
    my $file = catfile( $Base, $_ );

    my $contents < io( $file );

    "$_^^^$contents" >> io( "cvs-root-originals.txt" );

    $contents =~ s{comdog\@cvs\.(.*\.)sourceforge}
                  {comdog\@cvs\.sourceforge}i;

    # $contents > io( $file );

    print "$contents", "-" x 73, "\n";
    }

To do the real thing, I uncomment the line to write the information back to the file. After I really run it, I look at one of the files to ensure it worked, but I get that little shot of adrenaline when I think I've really screwed up. By now you might have the idea that I'm as bad a coder as anyone else. I think you would be generous saying that. Remember, an expert is someone who has made every mistake.

[1351]$ more Business/ISBN/scripts/CVS/Root
Business/ISBN/scripts/CVS/Root: No such file or directory

Even with the momentary terror, I've made enough stupid mistakes that I know not to panic right away. It turns out I'm just in the wrong directory.

[1352]$ pwd
/Users/brian/Desktop
[1353]$ more ~/Dev/Business/ISBN/scripts/CVS/Root

Now I go back to my program to check the state of CVS. I don't get any unsuccessful connection attempts this time. I need to determine when the CVS output is interesting (i.e. I need to deal with changes), or when I can ignore it.

In some cases, I just get progress output, and I don't need to do anything.

-------------------------------------------------------------------------
checking /Users/brian/Dev/Test/HTTPStatus...
cvs update: Updating .
cvs update: Updating lib
cvs update: Updating t
-------------------------------------------------------------------------
checking /Users/brian/Dev/Test/HTTPStatus/lib...
cvs update: Updating .
-------------------------------------------------------------------------
checking /Users/brian/Dev/Test/HTTPStatus/t...
cvs update: Updating .

In a lot of cases, I need to do something. While my script is running, I can go through the output and start fixing things. This script is a bit rough, and it's checking a lot of directories more than once. When it checks Business/ISBN, for instance, it also checks Business/ISBN/t because CVS descends into sub-directories. Indeed, this all starts as Business, which is the top directory in this tree.

checking /Users/brian/Dev/Business/ISBN ...
? t/untitled text 6
? t/xisbn.t
cvs update: Updating .
cvs update: Updating scripts
cvs update: Updating t
C t/load.t
C t/pod.t
cvs update: move away t/xisbn.t; it is in the way
C t/xisbn.t
-------------------------------------------------------------------------
checking /Users/brian/Dev/Business/ISBN/scripts...
cvs update: Updating .
-------------------------------------------------------------------------
checking /Users/brian/Dev/Business/ISBN/t...
? untitled text 6
? xisbn.t
cvs update: Updating .
C load.t
C pod.t
cvs update: move away ./xisbn.t; it is in the way
C xisbn.t

This gets back to the first task. Remember way back at the beginning when I said I had two things to do? Now I'm on the first one: ensuring things are as they should be. Before I start mucking around in the directories, I want to ensure I'm working with up-to-date working copies.

I need to modify my program to check only the top-level directory. I add a %Seen hash to track which directories I have checked. If I run into a directory I have already checked, I skip to the next one. The speedup is very noticeable: I make about 50 network connections instead of 300.

And, since I'm trying to cut out as much uninteresting output as possible, I want to output nothing if the working copy is up-to-date. In my release(1) program, I have code that already does this. Stolen directly from the release(1) source (I could use Module::Release which made this a function, too, but I'm going to change a lot of it), I add a parse_cvs() subroutine to my program.

#!/usr/bin/perl
use warnings;
use strict;

use File::Spec::Functions qw(catfile);

my $Base = "/Users/brian/Dev";

@ARGV = catfile( $Base, 'bad_cvs_root.txt' );

my %Seen = ();

while( <> )
    {
    chomp;

    my $top_level = (split m|/|, $_)[0];  # ugh, not portable
    next if exists $Seen{ $top_level };
    $Seen{ $top_level }++;

    my $dir = catfile( $Base, $top_level );
    print "checking $dir ... ";

    chdir $dir or do { warn "Could not chdir $dir: $!"; next };

    my @output  = `cvs update 2>&1`;
    my $message = parse_cvs( @output );
    print $message ? "\n$message" : "up-to-date\n";
    }

sub parse_cvs
    {
    my %cvs_state;

    my %message = (
        C   => 'These files have conflicts',
        M   => 'These files have not been checked in',
        U   => 'These files were missing and have been updated',
        A   => 'These files were added but not checked in',
        '?' => q|I don't know about these files|,
        );

    my @cvs_states = keys %message;

    foreach my $state ( @cvs_states )
        {
        my $regex = qr/^\Q$state /;
        $cvs_state{$state} = [
            map  { my $x = $_; $x =~ s/$regex//; $x }
            grep /$regex/, @_
            ];
        }

    local $" = "\t";
    my $rule = "-" x 50;

    my $string = '';
    foreach my $key ( sort keys %cvs_state )
        {
        my $list = $cvs_state{$key};
        next unless @$list;
        $string .= sprintf "\t$message{$key}\n\t$rule\n\t@$list\n\n";
        }

    return $string;
    }
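That "ugh, not portable" split on /, by the way, could be handled by splitdir from the same File::Spec::Functions module the program already loads. A small sketch, using a made-up input path rather than a real line from bad_cvs_root.txt:

```perl
use strict;
use warnings;

use File::Spec::Functions qw(splitdir);

# portable version of  (split m|/|, $_)[0] : let File::Spec break the
# path on whatever directory separator the current OS uses
my $path      = 'Business/ISBN/t/CVS/Root';   # hypothetical input line
my $top_level = ( splitdir($path) )[0];

print "$top_level\n";   # Business
```

I left the quick split in the program because all my paths come from find(1) on this machine, but the portable spelling costs nothing.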

This gets me a very nice report: 279 lines of output (with plenty of whitespace and blank lines). Most of these seem to be new pod.t files. I vaguely remember writing something to replace all the pod.t files with the latest interface (and maybe I should write it again since Andy recently updated it to use taint checking).

checking /Users/brian/Dev/Tie ...
        These files have conflicts
        --------------------------------------------------
        Cycle/t/pod.t

        These files have not been checked in
        --------------------------------------------------
        Toggle/t/pod.t

checking /Users/brian/Dev/use.perl ...
        These files have not been checked in
        --------------------------------------------------
        journal_reader/_journal.tmpl

checking /Users/brian/Dev/weather ...
        I don't know about these files
        --------------------------------------------------
        .lwpcookies

        These files have not been checked in
        --------------------------------------------------
        t/pod.t

checking /Users/brian/Dev/weblint++ ...
        These files have not been checked in
        --------------------------------------------------
        weblint++       t/pod.t

Okay, that's enough for me to work on for now. It's a good first step to getting my act together and doing a lot of needed maintenance on this stuff. But it's time to take a break.

Before I stop writing, though, I have a few ideas on what's next: how about a report that pulls in stuff from RT too? And automatically running all the tests in all the projects? Next time, next time ... :)

brian d foy <>

Replies are listed 'Best First'.
Re: Parallel maintenance on many projects, part I
by duff (Parson) on Sep 02, 2004 at 14:38 UTC
    Have you spotted the error yet?

    I have to admit, I was perplexed for a good minute or two before I saw it. And the reason I didn't spot it sooner was because I always use single quotes on the command line. There's probably a lesson here about cultivating good habits or something. (Where "good" is platform dependent :-)

    Excellent walk through BTW! I don't think I'd have the courage to publicly divulge all of my false starts and small yet confounding mistakes (there are far too many of them ;-). I look forward to part II.


Re: Parallel maintenance on many projects, part I
by drewbie (Chaplain) on Sep 02, 2004 at 18:24 UTC
    First, ++ on a great post. I find that I learn the most these days from posts that give me a glimpse into someone else's thought processes. Thinking Outside The Box if you will, where the Box is my brain. :-)

    I'm curious how you put together such a detailed article? Did you use the shell's command history? Did you use cvs to save each script as you modified it? Did you plan on writing up this as a post before hand? Inquiring minds want to know. The key question is how did you make all this reproducible?

Node Type: perlmeditation [id://387725]
Approved by kvale
Front-paged by Courage