Re: Help with a faster loop

If you are the anonymonk who posted the reply above about Searching XML files, be aware that it would be prudent to use a proper XML parsing module if you are going to search for content in XML files.

If you are really familiar with and confident about how your XML files are created, and if the XML markup is simple, then sure, you can tailor a regex solution for your data, and it might be more efficient than using a parsing module. But using a parser is not so very complicated (and not so very slow, either).

Here's a demonstration that ought to do what you want in terms of searching for content in XML files; it includes the good suggestions from the previous replies, and adds a few other tweaks as well. Note that it filters out all the irrelevant file names during the readdir phase:

#!/usr/bin/perl

use strict;
use warnings;
use XML::Parser;

my ( $path, $pattern ) = @ARGV;
die "Usage: $0  path pattern\n lists files in path that contain pattern\n"
    unless ( length($path) and -d $path and $pattern =~ /\S/ );

my $found_files = process_files( $path, $pattern );
print "the following files in $path contain '$pattern'\n",
    join( "\n", @$found_files ), "\n";

sub process_files
{
    my ( $path, $pattern ) = @_;

    my @found = ();
    my $ignore = qr/\.(?:zip|lfa|txt) | UASTG |
                    defines | sccpch | sms81154 | sms97767
                   /x;

    # use a lexical directory handle rather than the bareword D
    opendir( my $dh, $path ) or die "opendir failed on $path: $!";

    for my $file ( grep { -f "$path/$_" and !/$ignore/ } readdir $dh ) {
        my $nfound = read_file( $path, $file, $pattern );
        push @found, "$path/$file: $nfound" if ( $nfound );
    }
    closedir $dh;

    return \@found;
}

sub read_file
{
    my ( $path, $file, $pattern ) = @_;
    my $nfnd = 0;
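    # three-argument open, so odd characters in a file name are never
    # taken as an open mode
    if ( open my $fh, '<', "$path/$file" ) {
        # count every chunk of character data that matches the pattern
        my $xml = XML::Parser->new( Handlers =>
                                   { Char => sub { $nfnd++ if $_[1] =~ /$pattern/ }
                                   } );
        $xml->parse( $fh );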
    }
    else {
        warn "open failed on $path/$file: $!\n";
    }
    return $nfnd;
}

Lots of monks like to recommend other XML modules that are more elaborate or "sophisticated" than the basic XML::Parser, but for your particular case (if I understand it right), this one is a pretty good match.
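One caveat worth sketching: the XML::Parser docs note that a single run of text may generate multiple calls to the Char handler, so a pattern that spans two chunks could be missed when each chunk is tested on its own. Here is a minimal variant that buffers the text and tests it at each start or end tag. The count_matches name and the sample XML are made up for illustration, and note that this counts matching runs of text between tags rather than matching Char calls, so the numbers can differ from the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: count runs of text (between tags) that match
# $pattern, buffering character data so that text delivered in several
# Char calls is matched as one string.
sub count_matches {
    my ( $xml_text, $pattern ) = @_;
    my ( $buffer, $count ) = ( '', 0 );

    # Test the buffered text, then reset it for the next run of text.
    my $flush = sub {
        $count++ if $buffer =~ /$pattern/;
        $buffer = '';
    };

    my $parser = XML::Parser->new(
        Handlers => {
            Char  => sub { $buffer .= $_[1] },    # just collect the text
            Start => $flush,                      # a tag ends the current run
            End   => $flush,
        },
    );
    $parser->parse($xml_text);
    return $count;
}

# Made-up sample data: the entity reference is decoded by the parser,
# so the pattern is written against the plain text 'AT&T'.
print count_matches( '<doc><co>AT&amp;T</co><co>IBM</co></doc>', 'AT&T' ), "\n";
```

The same three handlers could be dropped into read_file above in place of the single Char handler, if you find that matches are being missed.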


Replies are listed 'Best First'.
Re^2: Help with a faster loop by gzayzay (Sexton) on Mar 02, 2006 at 15:29 UTC
Thanks for this code. It looks very cool. My concern about using the XML parser is that if I want to share my code with another person, I think they will need to have the XML::Parser module installed before they can run the code. If XML::Parser is a core module, then that takes care of the problem. However, I don't think it is a core module. Correct me if I am wrong. Again, I really like your code, and I will be using it for internal purposes since I have XML::Parser installed on my machine.

Edman
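For what it's worth, XML::Parser is indeed not a core module, so anyone you share the script with would need to install it from CPAN. One way to check this kind of thing yourself is the Module::CoreList module, which ships with recent perls. A minimal sketch, where core_status is just a made-up helper name:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Module::CoreList;

# Hypothetical helper: first_release() returns the first perl version
# that bundled the module, or undef if it has never shipped with perl.
sub core_status {
    my ($module) = @_;
    my $first = Module::CoreList->first_release($module);
    return defined $first ? "core since perl $first" : "not a core module";
}

print "$_: ", core_status($_), "\n" for qw( XML::Parser Data::Dumper );
```

XML::Parser should come back as "not a core module", while Data::Dumper is there for comparison as a module that does ship with perl.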
