Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

file.txt
<root>
<book></book>
<name>Sydney</name>
</root>
#!/usr/bin/perl -w
use strict;

our $path_to_dir = '/home/path_to_dir/';

sub search_directory($path_to_dir) {
    my ($listFiles) = @_;
    # Insert / at the end if the / is not specified
    $listFiles .= '/' if ($listFiles !~ /\/$/);
    #print "The directory path is $path\n";
    # Loop over all the files in the directory
    for my $checkFile (glob($listFiles.'*')) {
        ## Check the directory
        if (-d $checkFile) {
            ## If the file is a directory, then loop again.
            &search_directory($checkFile);
        }
        my $count = 0;
        open(IN, "$checkFile") or die "$!";
        while (<IN>) {
            if ($_ =~ m/<book><\/book>/ and /<name>*.*Sydney*.*/) {
                $count++;
            }
        }
        if ($count >= 1) {
            print "filename:$checkFile and count : $count\n";
        }
    }
}

&search_directory($path_to_dir);
How can I print the filename if the file contains both the string
<book></book> and the string <name>Sydney</name>,
where Sydney must be matched case-insensitively?

Replies are listed 'Best First'.
Re: Search in file
by Corion (Patriarch) on Sep 15, 2009 at 07:48 UTC

    I guess your post is a continuation of "find the xml files". Why didn't you tell us that?

    You have all the parts ready, but you will have to combine them differently. The problem in your code is that you read your file line by line in the

    while (<IN>) { ... }
    loop, and I assume that your XML file contains the empty book tag and Sydney on two different lines.

    There are two approaches to solving this:

    1. For large files: Read the file line by line and remember for each file whether you've seen an empty book tag, and also whether you've seen Sydney. After you've read through the file, check whether you've seen both.
    2. For small files: Read the file as a whole into one scalar and check whether the scalar contains both the book tag and Sydney.
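    3. (There's always a third option) Use a proper XML parser and XPath queries to check whether there are empty book elements, //book[not(node())], and name elements whose text is Sydney, //name[text()="Sydney"]. (A test like //book[text()=""] never matches <book></book>, because an empty element has no text node to compare.) This is the recommended and least fragile solution.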
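
    As a rough illustration of the first two options, here is a minimal sketch using only core Perl. The sub names has_both_linewise and has_both_slurp are made up for this example, and the patterns assume the tags appear literally as <book></book> and <name>Sydney</name>, as in the sample file. Adjust them if your real files have attributes or whitespace inside the tags.

```perl
use strict;
use warnings;

# Option 1: read line by line and remember what has been seen so far.
sub has_both_linewise {
    my ($file) = @_;
    open(my $in, '<', $file) or die "Cannot open $file: $!";
    my ($seen_book, $seen_name) = (0, 0);
    while (my $line = <$in>) {
        $seen_book = 1 if $line =~ m{<book></book>};
        $seen_name = 1 if $line =~ m{<name>Sydney</name>}i;
        last if $seen_book && $seen_name;    # both found, stop reading
    }
    close($in);
    return ($seen_book && $seen_name) ? 1 : 0;
}

# Option 2: slurp the whole file into one scalar and test that scalar.
sub has_both_slurp {
    my ($file) = @_;
    open(my $in, '<', $file) or die "Cannot open $file: $!";
    my $content = do { local $/; <$in> };    # undefined $/ reads everything
    close($in);
    $content = '' unless defined $content;   # guard against an empty file
    return ($content =~ m{<book></book>}
         && $content =~ m{<name>Sydney</name>}i) ? 1 : 0;
}
```

    Inside the directory loop from the question, either sub could replace the counting code, for example: print "filename: $checkFile\n" if has_both_linewise($checkFile); Directories should be skipped (or only recursed into) before calling it, since opening a directory as a file will not give useful results.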

    For the "case insensitive" part, see the /i modifier in perlre.
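
    A small sketch of what that could look like. The sub name names_sydney is hypothetical. A trailing /i would make the whole pattern case-insensitive, including the tag name; the inline form (?i:...) used here limits it to the city name, on the assumption that only Sydney should be matched loosely.

```perl
use strict;
use warnings;

# Match <name>...</name> with the city name in any letter case.
# (?i:...) switches on case-insensitivity for that group only.
sub names_sydney {
    my ($line) = @_;
    return $line =~ m{<name>(?i:Sydney)</name>} ? 1 : 0;
}

for my $line ('<name>Sydney</name>', '<name>SYDNEY</name>', '<name>sydney</name>') {
    print "match: $line\n" if names_sydney($line);
}
```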