Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
may have been slightly better. Also either a pre-mapping to "$dir/$_" or a chdir into $dir may have been advantageous. Incidentally, as far as the former option is concerned, you know that to do it in a really portable way you should have used File::Spec, don't you? (This is just FYI, I use stuff like "$dir/$_" myself all the time.)

Actually, no I know there are modules/methods that should be used to clean up paths, but it isn't something I have really done as of yet. Exactly what should I be doing?? If you dont mind explaining the whats and whys. Should I be using the canonpath or catfile methods? The manpage says that canonpath does a logical cleanup.. What exactly is that?


All in all, and maintaining an approach similar to yours, I may have done:

my (@files,@dirs);
for (readdir DIR) {
    next if $_ eq '.' or $_ eq '..';
    $_="$dir/$_";
    push @files, $_ if -f;
    push @dirs, $_ if -d _;
}

Thanks for your variation. The string comparison would be a better way to go in this situation. Regexes are probably my weakest point with perl, so this is something I have been overusing trying to get used to. The performace hit is good mention. thanks.


Now most file and directory opening functions support lexical handles. Well, that's not strictly true (from the technical point of view), but that's a good approximation to the truth. And with a lexical dirhanle you wouldn't need an explicit close.

I am almost possitive I have seen mention here or in one of the perl books that you should close filehandles and check for error code ( oops ) just as you do with calls to open... And that it is common mistake that people only check for success on open and not the close.


Well, I can't comment on your whole logic. But rather than having $recurse flag I would try to arrange things in other to have at some point a recursion for @dirs (this hypothetical @dirs not being yours), so that when @dirs is empty simply no recursion happens...

The point of the flag is if you do not want recursion at all, the main $recursion flag can be set to 0 and then it will only return images in the top directory.


        if (@$files) {    
            push @files, @$files;
        }    

Why not

        push @files, @$files;
instead? (I don't know which is faster, but I suspect they execute at much the same speed, and the latter is more clear, IMHO.)

Actually I added that test trying to search out a bug and left it in. The error was that I made the mistake of doing an explicit return undef; instead of just return; when being called in a list context. So I was returning a list of one value "undef" and then later trying to dereference the undef value... ooops.


    foreach $file (@files) {
        open FH, $file or
            die "Error opening $file: $!\n";
Ditto as above wrt using lexical filehandles. Also, some disagree, but I always recommend using the three args form of open.

I obviously do not understand your reference to lexical filehandles. Are you sugesting I dont check for error? HuH?? I must be having a brain fart because I am not following you.

I have heard mention that calling the three arg method is better, but have not heard good reasoning for this.


        if ( $data =~ /^BM/ ) {
            $type = 'BMP';
        }elsif ( $data =~ /^GIF879a/) {
            $type = 'GIF';
        }elsif ( $data =~ /^\xFF\xD8/ ) {
            $type = 'JPG';
        }else {
            $type = undef;
        }

Ouch! It doesn't mach my PNG images, not to say a few other tenths of popular formats, not to mention more exotic ones... Seriously this may be a good reason for using a dedicated module...

Here....

  • elsif ( data =~ /^\x89PNG\x0d\x0a\x1a\x0a/) { type = 'PNG'; }
Now it checks your pngs... Happy? :) Actually, this wasnt written just for the goal of seeing how easy it would be to identify a couple common formats. Also, if you are looking just for jpegs and gifs why search for pngs? I was not trying to identify every obscure image, even some of the more popular formats in this case.


No need for scalar. No need for the outer parentheses. No need for the inner ones either. No need for the if modifier, and no need for the second statement:
return @images;
is just as fine!

That is definetly cleaner. Perls flexibility of giving different values depending on how it is called can sometimes be confusing. So I have developed a defense mechanism that whenever any errors are encountered or I am unsure on precedence order then add paranthesis or force return mode scalar. Although not necessary, I think if not tooo overboard it isnt bad habit. What do you think given that reasoning?


Here you're returning a possibly huge list of files to scan them subsequently for images. I would check them on the fly, that is what I would do in the File::Find 'wanted' subroutine.

Ultimately, this is what should be done. However, it was a fun adventure and I learned a few new things... Namely the use of /z instead of $ in regexes when you dont want match before the newline. Not to do an explicit return undef; because you may get unexpected results when called in list context. As well as the many thing you have pointed out..

Thanks.....

zzSPECTREz

In reply to Re^2: Find images regardless of filetype extension. by zzspectrez
in thread Find images regardless of filetype extension. by zzspectrez

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others rifling through the Monastery: (6)
As of 2024-03-29 08:17 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found