Re: Help regarding a perl script

When asking a question, please post the code that you ran. There is no way that what you posted will run with use strict; use warnings in effect - although you show those lines at the top of your code.

foreach my $fileName ($logfilename) is not a loop. This will just set $fileName equal to $logfilename and run the following code one time. I think you would be better off moving this log file checking code to right after you have created it - no need for another loop.

You can move my $xml = XML::Simple->new; outside the loop and reuse this $xml object. Creating an object involves a certain amount of overhead and there is no need to keep doing it.

As far as indenting goes, there are two styles, the trailing curly style and the "line up the curlies" style.

foreach my $file (@file_list)
{
   code here...
}
or
foreach my $file (@file_list){
   code here...
}
[download]

Many folks find the first way easier to use as you can tell at simple glance whether you've got matched curly braces or not. And for that reason, I recommend this way for all beginners. Either way is considered acceptable. But use one way, do not combine ways - consistency is important.

For indentation number of spaces, use 3 or 4. Two is too few and 5 or more adds nothing to understandability. The academic papers on this say that there is a slight advantage to 4 over 3 spaces. But the difference is not enough to "write home to mom about". Never use tab characters because number of spaces in a "tab" is highly variable between systems and there is no longer any reason to conserve a few bytes of blanks by using a single tab character. Note: you can set your editor to "convert tabs to spaces" and set tab stops to increments of 3 or 4 spaces.

Don't attempt to get "overly tricky" or brief. Perl is short enough without doing that. I was surprised that my $search = shift || '/home/files/Input_test1'; actually worked. The command line args are in an array called @ARGV. my $search = shift @ARGV || '/home/files/Input_test1'; better conveys the intent. Also of course, @_ is context dependent - your code won't work if I put it in a subroutine.

There are many places where the parens, () around function arguments can be eliminated - this is true of the "built-in" functions, like shift above. There is no need to write: shift (@ARGV). There are ways to define user functions that can mimic built-in functions and allow the omission of parens - that doesn't mean that omitting parens everywhere you can is a good idea. find() is defined in the File::Basename module and your use statement actually imported that function name (find). I would use parens when calling it because it is not a built-in function. Some folks figure that never using parens around built-in functions is a good idea. My opinion is more measured - meaning "sometimes this a good idea when clarity is enhanced".

Update:I got off the track of the original question... Some ideas for you...

The code seems to be interested in processing .jpg files. The File::find function will visit and call the specified function for every file within a directory tree. A directory is actually just a special kind of a file. So instead of collecting directory names that may or may not contain .jpg files, why not just get the path_names of all of the .jpg files as you go? And then process these fullpath file names (all of which will end in .jpg)?

In collect_jpg_files() below, the conditions that identify a .jpg file could be expressed as a single regex. But in this case, I doubt that there will be much difference in run-time speed - I just did it the obvious simple step-by-step way.

Also of course as your code shows, it is possible to have an anonymous subroutine instead of a named subroutine, i.e. collect_jpg_files() for File::find. But often having a name for a sub adds value in terms of clarity - I suspect that in your application, some more descriptive name than "collect_jpg_files" is appropriate.

It is certainly possible for "collect_jpg_files()" to do some processing of the .jpg files. But that may or may not be advisable. File::find is doing a "change working directory" as it descends through the starting path name. Error recovery can be problematic if something unexpected happens.

As you can see dealing with sub-directories is a hassle. I would certainly consider not replicating something that is a hassle yet again for your output. For example, you could just increase the filename string by pre-pending some directory info (of course changing /'s to .'s or some such thing). Also consider adding "_" (underscore) to filenames as appropriate to make parsing easier.

Anyway, if you are following me so far, you could wind up with 2 directories: $results/worked/...paths.xml and $results/bad/...paths.jpg, where "paths" is enough to re-create the original pathname to the source file.

Here is simple code that will get the fullpath names to the .jpg files into a single array. I think another poster pointed out that $base_dir below can be specified as relative to the path that this Perl program is running in. I highly recommend against doing that in most cases. In general it is desirable to "break the connection" between "where the program is running" and "where the data lives".

#!/usr/bin/perl -w
use strict;
use File::Find;

my $base_dir = "C:/Temp";
my @jpg_files;

find(\&collect_jpg_files, $base_dir);
print @jpg_files;                     # for debugging 

#prints: C:/Temp/test_file.jpg

foreach my $jpg_filepath (@jpg_files)
{

   my $log_filepath = $jpg_filepath;
   $log_filepath =~ s/jpg$/xml/;
   #... blah .. blah ...
}

sub collect_jpg_files
{
   return unless (-f);       # only real files, not directories
   return unless (/\.jpg$/); # name ends in .jpg
   push (@jpg_files, $File::Find::name);
}
[download]

Comment on Re: Help regarding a perl script Select or Download Code