
I recently started using git (actually cogito, which is sorta the same thing). I liked it the best out of the twenty or so choices while I was trying to grow out of cvs. It has a lot of really slick features, but one thing that it doesn't have is those lovely little rcs tags (e.g., $Id: example.pl,v 1.352 jettero 2003-09-13 19:00:00$).

I wrote a quick little app to go through and replace the $Id$, $Revision$, and $Date$ tags. I was happy with it. It used File::Find to find all the files, File::Copy to make a temp file, and utime to fix the mtime after the search and replace.
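For readers who want something concrete, here is a rough sketch of that search-and-replace step. It is not the app described above: the names expand_tags and rewrite_file are made up for illustration, File::Temp stands in for the hand-made temp file, and the expanded form "$Tag: value $" is an assumption about what the tags should look like afterwards.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Copy qw(move);
use File::Basename qw(dirname);

# Replace $Id$, $Revision$ and $Date$ (expanded or not) with fresh values.
# %value maps a tag name to its new text; tags without a value are kept as-is.
sub expand_tags {
    my ($text, %value) = @_;
    $text =~ s{(\$(Id|Revision|Date)(?::[^\$\n]*)?\$)}{
        my ($whole, $tag) = ($1, $2);
        exists $value{$tag} ? "\$$tag: $value{$tag} \$" : $whole
    }ge;
    return $text;
}

# Rewrite one file in place, keeping its mode, atime and mtime.
# Returns 1 if the file changed, 0 if there was nothing to do.
sub rewrite_file {
    my ($path, %value) = @_;
    my ($mode, $atime, $mtime) = (stat $path)[2, 8, 9];
    defined $mtime or die "stat $path: $!";

    open my $in, '<:raw', $path or die "open $path: $!";
    my $text = do { local $/; <$in> } // '';
    close $in;

    my $new = expand_tags($text, %value);
    return 0 if $new eq $text;

    # Write beside the original so the final move stays on one filesystem.
    my ($out, $tmp) = tempfile(DIR => dirname($path));
    binmode $out;
    print {$out} $new or die "write $tmp: $!";
    close $out or die "close $tmp: $!";
    chmod $mode & 07777, $tmp or die "chmod $tmp: $!";
    move($tmp, $path) or die "move $tmp -> $path: $!";
    utime $atime, $mtime, $path or die "utime $path: $!";
    return 1;
}
```

Something like rewrite_file('example.pl', Date => '2003-09-13 19:00:00 UTC') would then touch only the $Date$ tag and put the old mtime back afterwards.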

I started running into trouble though, because it didn't understand the .gitignore file (which has the same format as the .cvsignore). I began adding clumsy glob support to my app, but it presented more trouble than I would have thought. I ended up using open my $in, '-|', qw(git ls-files) or die $! to find the applicable files, since git knows how to list the files it's watching as a built-in function. This fits well with the overall git model, because it's a collection of scripts that work together; but I'm still interested in how you'd do it in perl.
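A minimal sketch of the git ls-files approach, for illustration only. The sub names are hypothetical, and the -z flag (NUL-terminated output) is an addition to the command quoted above so that odd file names survive intact.

```perl
use strict;
use warnings;

# Read NUL-terminated path names from any filehandle.
sub read_nul_paths {
    my ($fh) = @_;
    local $/ = "\0";    # one record per NUL-terminated name
    my @paths;
    while (defined(my $path = <$fh>)) {
        chomp $path;    # strips the trailing NUL
        push @paths, $path if length $path;
    }
    return @paths;
}

# List the files git tracks under the current directory.
sub tracked_files {
    open my $in, '-|', qw(git ls-files -z) or die "cannot run git: $!";
    my @paths = read_nul_paths($in);
    close $in
        or die "git ls-files failed: " . ($! || "exit status " . ($? >> 8));
    return @paths;
}
```

Calling tracked_files() from inside a working tree should return the same names that git ls-files prints, without any .gitignore handling in perl at all.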

There's no way to guess the git revision before the commit, so instead of the revision I ended up using a Digest::SHA1 that skips the contents of the rcs tags. Actually, I prepend the number of lines and the number of bytes in the file also. For the $Date$ I used the mtime of the file.

my $h    = $sha1->b64digest;
my $rev  = "$lines.$bytes.$h"; $rev =~ tr/$/_/;
my $date = strftime('%Y-%m-%d %H:%M:%S UTC', gmtime($mtime));
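To put that fragment in context, here is one way the pieces could fit together. This is a sketch under assumptions, not the original app: it uses the core Digest::SHA module instead of Digest::SHA1, the sub names are invented, and it assumes the line and byte counts are taken after the tag contents are stripped, so the result stays the same once the tags are expanded.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_base64);
use POSIX qw(strftime);

# Collapse expanded tags such as "$Date: ... $" back to "$Date$" so the
# digest does not depend on whatever the tags currently contain.
sub strip_tag_contents {
    my ($text) = @_;
    $text =~ s/\$(Id|Revision|Date):[^\$\n]*\$/\$$1\$/g;
    return $text;
}

# Build "lines.bytes.digest" from the tag-stripped text.
# Assumes $text is a byte string (file read in raw mode).
sub content_rev {
    my ($text) = @_;
    my $clean = strip_tag_contents($text);
    my $lines = ($clean =~ tr/\n//);    # count newlines
    my $bytes = length $clean;
    return join '.', $lines, $bytes, sha1_base64($clean);
}

# Format an mtime the same way as the fragment above.
sub date_stamp {
    my ($mtime) = @_;
    return strftime('%Y-%m-%d %H:%M:%S UTC', gmtime $mtime);
}
```

Because the tags are collapsed before hashing, a file with "$Id$" and the same file with an expanded "$Id: ... $" produce the same revision string.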

My main question is concerning the best way to find all the files git finds. Before I thought to use qw(git ls-files) I was going to try opendir and build up a list of globs found in the .gitignore files I found along the way. (Something like the following, which is obviously incomplete.)

use warnings;
use strict;
use Text::Glob qw(glob_to_regex);

sub do_a_dir {
    my $dir   = shift;
    my @globs = @_;    # compiled globs inherited from the parent dirs
    my @files = ();

    # Read this dir's .gitignore first, so its globs are known before
    # any entry is tested.
    if( open my $in, '<', "$dir/.gitignore" ) {
        while(<$in>) {
            chomp;
            next if m/\A\s*(?:#.*)?\z/;    # skip blank lines and comments
            push @globs, glob_to_regex( $_ );
        }
        close $in;
    }

    opendir my $dh, $dir or die "hrmph: $!";
    while( defined( my $ent = readdir $dh ) ) {
        next if $ent =~ m/\A\.\.?\z/;
        next if $ent eq ".gitignore" or &a_glob_matches($ent, @globs);

        my $path = "$dir/$ent";
        if( -d $path ) {
            push @files, &do_a_dir($path, @globs);

        } elsif( -f $path ) {
            push @files, $path;

        } else {
            # agahrr, I don't want to think about these monsters right now
            # I probably mean to skip them if I'm using opendir...
        }
    }
    closedir $dh;

    return @files;
}

sub a_glob_matches {
    my $ent = shift;
    my @globs = @_;

    for( @globs ) {
        return 1 if $ent =~ $_;
    }

    return 0;
}

Questions:

How do you find all the appropriate files, skipping the files/dirs that match globs? Is opendir the way to go (doubtful)? Is it better to do these things in the &wanted callback of File::Find, with some kind of local our or global? Anything else of interest?
How do you test strings to see if they match globs? glob doesn't seem to help with that without issuing a bunch of syscalls that aren't necessary. Text::Glob seems to do the job perfectly, but are there better choices?
All other input welcome. I'm sure I missed things, it's my way. I'm basically looking for any tools that make the job of replacing rcs tags easier and more robust.
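One possible shape for the File::Find route, offered as a sketch rather than an answer. It filters in the preprocess hook instead of &wanted, so ignored directories are never entered. The glob_to_re helper is a deliberately tiny, hypothetical stand-in for Text::Glob's glob_to_regex (it knows only "*" and "?"), and neither it nor find_unignored implements real .gitignore semantics such as negation, anchored patterns, or per-directory ignore files.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Translate a shell-style glob into an anchored regex. Handles only
# "*", "?" and literal characters; this is NOT full .gitignore syntax.
sub glob_to_re {
    my ($glob) = @_;
    my $re = join '', map {
          $_ eq '*' ? '[^/]*'
        : $_ eq '?' ? '[^/]'
        :             quotemeta $_
    } split //, $glob;
    return qr/\A$re\z/;
}

# Walk $root and return the plain files whose names match none of @globs.
# Matching entries are dropped in preprocess, so find() never descends
# into an ignored directory.
sub find_unignored {
    my ($root, @globs) = @_;
    my @ignore = map { glob_to_re($_) } @globs;
    my @files;

    find({
        no_chdir   => 1,
        preprocess => sub {
            return grep { my $name = $_; !grep { $name =~ $_ } @ignore } @_;
        },
        wanted => sub {
            push @files, $File::Find::name if -f $File::Find::name;
        },
    }, $root);

    @files = sort @files;
    return @files;
}
```

A call such as find_unignored('.', '.git', '*~', '*.o') keeps the glob list in a lexical that the callbacks close over, which sidesteps the local our or global question entirely.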

-Paul

In reply to replacing rcs tags in perl by jettero
