Help regarding a perl script

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a Perl script which recursively reads .JPG files form directories and sub directories and process them by triggering an external application and generate a xml log file for each image and after that it is trying to reading these xml log files and check for certain errors . if error exists it generates the file name and type of error. In my below script the first part is working I mean it is reading image files and creating the log files but the second one reading these log files for errors is not working. Can I get help regarding this logic and the mistakes in the script. Newbie to Perl so tried to some extent but not knowing exactly how to combine these two logic.


use File::Find;
use File::Path;
use XML::Simple;
use File::Basename;
use strict;
use warnings;
use diagnostics;

# Get the directory from the command line  or use the default director
+y


$search = shift || '/home/files/Input_test1';


# Get an array of all subdirectories


find sub { push @dirs, $File::Find::name if -d }, $search ;


for $dir ( @dirs )

{

#   opendir $dh, $dir or do { warn "Cannot open '$dir' $!" ; next ; } 
+;

   opendir( DIR, "$dir" ) ;

   @files = grep( /\.jpg$/, readdir( DIR ) ) ;

   closedir( DIR ) ;

   (my $logdir = $dir)=~s|^\Q$search/\E||i;

   $logdir =~ s|/IMAGES$||i;

  $logdir = "/home/output/logs/$logdir/NEW";

   
     mkdir $logdir;

# Creating a log file with the same filename to generate the logs


   foreach $file ( @files )
   
   {
      print "$dir/$file\n" ;
      $logfilename =$file;

      $logfilename =~s/\.jpg$/.xml/;

      print "$logdir/$logfilename";
      
   
 
      @args = ("/usr/software/fits-0.4.2/fits.sh", "-i", "$dir/$file",
+"/usr/software/fits-0.4.2/fits.sh", "-o", "$logdir/$logfilename");

      system( @args ) == 0 or die "system @args failed: $?";
   
   }

my $outfile = "$logdir"

open OUTF, "> $outfile" or die print " can't create logfile; $!";


my $xml = XML::Simple->new;

  foreach my $fileName ($logfilename) {

 my ($filename, $directories, $suffix) = fileparse($fileName);

    my $file = $xml->XMLin($fileName) or die "Failed for $fileName: $!
\n";
    my $format = $file->{identification}{'identity'}{'format'};

    if ($format ne 'JPEG File Interchange Format') {
        print  OUTF  "$filename | Identity Format Error\n";
     
    }
[download]

Comment on Help regarding a perl script Download Code

Replies are listed 'Best First'.
Re: Help regarding a perl script by Marshall (Canon) on Dec 08, 2010 at 01:39 UTC
When asking a question, please post the code that you ran. There is no way that what you posted will run with `use strict; use warnings` in effect - although you show those lines at the top of your code. `foreach my $fileName ($logfilename)` is not a loop. This will just set $fileName equal to $logfilename and run the following code one time. I think you would be better off moving this log file checking code to right after you have created it - no need for another loop. You can move `my $xml = XML::Simple->new;` outside the loop and reuse this $xml object. Creating an object involves a certain amount of overhead and there is no need to keep doing it. As far as indenting goes, there are two styles, the trailing curly style and the "line up the curlies" style. `foreach my $file (@file_list) { code here... } or foreach my $file (@file_list){ code here... }` [download] Many folks find the first way easier to use as you can tell at simple glance whether you've got matched curly braces or not. And for that reason, I recommend this way for all beginners. Either way is considered acceptable. But use one way, do not combine ways - consistency is important. For indentation number of spaces, use 3 or 4. Two is too few and 5 or more adds nothing to understandability. The academic papers on this say that there is a slight advantage to 4 over 3 spaces. But the difference is not enough to "write home to mom about". Never use tab characters because number of spaces in a "tab" is highly variable between systems and there is no longer any reason to conserve a few bytes of blanks by using a single tab character. Note: you can set your editor to "convert tabs to spaces" and set tab stops to increments of 3 or 4 spaces. Don't attempt to get "overly tricky" or brief. Perl is short enough without doing that. I was surprised that `my $search = shift \|\| '/home/files/Input_test1';` actually worked. The command line args are in an array called @ARGV. `my $search = shift @ARGV \|\| '/home/files/Input_test1';` better conveys the intent. Also of course, @_ is context dependent - your code won't work if I put it in a subroutine. There are many places where the parens, () around function arguments can be eliminated - this is true of the "built-in" functions, like shift above. There is no need to write: shift (@ARGV). There are ways to define user functions that can mimic built-in functions and allow the omission of parens - that doesn't mean that omitting parens everywhere you can is a good idea. find() is defined in the File::Basename module and your use statement actually imported that function name (find). I would use parens when calling it because it is not a built-in function. Some folks figure that never using parens around built-in functions is a good idea. My opinion is more measured - meaning "sometimes this a good idea when clarity is enhanced". Update:I got off the track of the original question... Some ideas for you... The code seems to be interested in processing .jpg files. The File::find function will visit and call the specified function for every file within a directory tree. A directory is actually just a special kind of a file. So instead of collecting directory names that may or may not contain .jpg files, why not just get the path_names of all of the .jpg files as you go? And then process these fullpath file names (all of which will end in .jpg)? In collect_jpg_files() below, the conditions that identify a .jpg file could be expressed as a single regex. But in this case, I doubt that there will be much difference in run-time speed - I just did it the obvious simple step-by-step way. Also of course as your code shows, it is possible to have an anonymous subroutine instead of a named subroutine, i.e. collect_jpg_files() for File::find. But often having a name for a sub adds value in terms of clarity - I suspect that in your application, some more descriptive name than "collect_jpg_files" is appropriate. It is certainly possible for "collect_jpg_files()" to do some processing of the .jpg files. But that may or may not be advisable. File::find is doing a "change working directory" as it descends through the starting path name. Error recovery can be problematic if something unexpected happens. As you can see dealing with sub-directories is a hassle. I would certainly consider not replicating something that is a hassle yet again for your output. For example, you could just increase the filename string by pre-pending some directory info (of course changing /'s to .'s or some such thing). Also consider adding "_" (underscore) to filenames as appropriate to make parsing easier. Anyway, if you are following me so far, you could wind up with 2 directories: $results/worked/...paths.xml and $results/bad/...paths.jpg, where "paths" is enough to re-create the original pathname to the source file. Here is simple code that will get the fullpath names to the .jpg files into a single array. I think another poster pointed out that $base_dir below can be specified as relative to the path that this Perl program is running in. I highly recommend against doing that in most cases. In general it is desirable to "break the connection" between "where the program is running" and "where the data lives". `#!/usr/bin/perl -w use strict; use File::Find; my $base_dir = "C:/Temp"; my @jpg_files; find(\&collect_jpg_files, $base_dir); print @jpg_files; # for debugging #prints: C:/Temp/test_file.jpg foreach my $jpg_filepath (@jpg_files) { my $log_filepath = $jpg_filepath; $log_filepath =~ s/jpg$/xml/; #... blah .. blah ... } sub collect_jpg_files { return unless (-f); # only real files, not directories return unless (/\.jpg$/); # name ends in .jpg push (@jpg_files, $File::Find::name); }` [download]	[reply] [d/l] [select]
Re: Help regarding a perl script by 7stud (Deacon) on Dec 07, 2010 at 23:32 UTC
That code is pretty messy. File::Find will descend into all subdirectories and retrieve all the files therein, so you don't need to pick out directories and search them yourself. You just need to provide find() with: (1) a subroutine reference and (2) the starting directory. You create a subroutine reference like this: `sub do_stuff { my $full_path_to_file = $File::Find::name; ... } my $sub_ref = \&do_stuff; #or find(\&do_stuff, $start_dir);` [download] Do not try to cram the subroutine definition/reference into the call to find(). A lot of perl programmers favor brevity over clarity. Do not follow their lead. They have forever cursed perl as a write only language. The full path to the file will be contained in a variable called $File::Find::name--but the "full path" will be relative to the starting directory you provide. So if you provide "." for the starting directory, then the "full paths" will look like: ./dir1/dir2/file1, etc., i.e. they won't actually be full paths. You need the full path (or a path relative to the current directory) to open the file later. If you only provide a file name, e.g. file1, perl will try to open a file called file1 in the current directory. a)Get rid of all constructs like this: "$var_name". The quotes are extra typing and do nothing. The correct use of double quotes to interpolate a variable is for situations when you have something additional in the string, for instance: `print "The number of people was: $count.\n";` [download] b) Poor indenting makes code hard to read. Indent 4 spaces--do not use tabs, do not use 3 spaces, do not use 6 spaces, do not use 0 spaces. For-loops are written, spaced, and indented like this in perl: `for my $val (@vals) { ... ... }` [download] c) Putting blank lines after every line of code is poor use of spacing. Try to group related code and then use a blank line to separate other sections. d) Do not us a pipe(\|) as a separator in s/// or m/// or anything else. Use a / or braces {}, and that's it. Just because you can do something does not mean you should. e) You are required to start every perl program with these lines: `use strict; use warnings; use 5.010; #if using perl 5.10+` [download] That will require that you declare all variables with my()--which all good programmers do--or you will get an error. f) print() and say() are your friends. Before you try to open a file, print out the file name to see if you actually have the correct full path.	[reply] [d/l] [select]
Re^2: Help regarding a perl script by ikegami (Patriarch) on Dec 08, 2010 at 00:36 UTC
do not use tabs, do not use 3 spaces, do not use 6 spaces, do not use 0 spaces. For-loops are written, spaced, and indented like this in perl: This is far from accepted. The code for Perl itself doesn't follow that convention. "$var_name". The quotes are extra typing and do nothing. Actually, that construct creates a new string, which is a waste, which makes the reader wonder if it was somehow necessary to create a new string there. That makes the construct bad, not useless. You are required to start every perl program with these lines: ... `use 5.010; #if using perl 5.10+` What? No! print() and say() are your friends. Before you try to open a file, print out the file name to see if you actually have the correct full path. naw, just be sure to include the path in the error message when the call fails.	[reply] [d/l]