bscheiman has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Perl Monks,

I come to you with a question that has been troubling me for the past few hours. The problem itself is:

INPUT: (directory structure)

clocks001.jpg clocks002.jpg clocksvert001.jpg clocksvert002.jpg
clocksvertred001.jpg daisies001.jpg ...

I need to deduce the category, subcategories and image numbers (I already solved that part) from the filenames without using an external database, and then write the result to a MySQL database through DBI.

The ideal output would look something like this:

clocks001.jpg -> category: clocks; subcats: null/null IM#: 001
clocks002.jpg -> category: clocks; subcats: null/null IM#: 002
clocksvert001.jpg -> category: clocks; subcats: vert/1 IM#: 001
clocksvert002.jpg -> category: clocks; subcats: vert/1 IM#: 002
clocksvertred001.jpg -> category: clocks; subcats: vert/1 red/1 IM#: 001 (vert is subcat 1 under clocks, red is subcat 1 under vert)

and so on.

Thanks for sharing your wisdom.

Re: Directory Structure parsing
by Aragorn (Curate) on Apr 01, 2003 at 19:58 UTC
    A possible solution:
    #!/usr/bin/perl -w
    use strict;

    # Here go all the top-level categories
    my @categories = qw(clocks daisies);

    # And the subcategories.
    my @subcats = qw(vert red);

    # Directory contents. You could also loop through STDIN or
    # something. See the comment below the foreach loop.
    my @names = qw(clocks001.jpg clocks002.jpg clocksvert001.jpg
                   clocksvert002.jpg clocksvertred001.jpg daisies001.jpg);

    foreach my $name (@names) {
    # while (my $name = <STDIN>) {
    #     chomp($name);

        # Get image number and strip off the extension
        my $output = "$name -> ";
        my $im_no = $name =~ s/(\d+)\..+$// ? $1 : '';

        foreach my $cat (@categories) {
            if ($name =~ s/^$cat//) {
                $output .= "category: $cat; ";
                last;    # only one top-level category per file
            }
        }

        $output .= "subcats:";
        foreach my $subcat (@subcats) {
            if (not $name) {
                $output .= " null/null";
                last;
            }
            if ($name =~ s/^$subcat//) {
                $output .= " $subcat/1";
            }
        }
        $output .= " IM#: $im_no";
        print "$output\n";
    }

    Arjen

    Update: Fixed a small bug in the output.

      Thinking some more about it: if the same subcategory can occur more than once (as suggested by the <subcat>/number), then you can replace the code
      if ($name =~ s/^$subcat//) {
          $output .= " $subcat/1";
      }
      with
      if ($name =~ /^$subcat/) {
          $output .= " $subcat/";
          my $c = 0;
          $c++ while $name =~ s/^$subcat//;
          $output .= $c;
      }
      # Special case: we don't want "null/null" after
      # adding valid subcats.
      last if not $name;
      This way, filenames like "clocksvertvertred001.jpg" are displayed like

      clocksvertvertred001.jpg -> category: clocks; subcats: vert/2 red/1 IM#: 001
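
Putting the original loop and the replacement together, the whole thing could be sketched as a single function. This is only an illustration: the name `describe` is made up here, the category and subcategory lists are still hard-coded as in the reply, and the "null/null" check is moved in front of the loop so it is only emitted when no subcategory text is left at all.

```perl
use strict;
use warnings;

# Assumed, hard-coded lists, exactly as in the reply above.
my @categories = qw(clocks daisies);
my @subcats    = qw(vert red);

# Hypothetical helper: turn one filename into one line of output.
sub describe {
    my ($name) = @_;
    my $output = "$name -> ";

    # Strip the image number and extension, keeping the number.
    my $im_no = $name =~ s/(\d+)\..+$// ? $1 : '';

    # Strip the first matching top-level category.
    for my $cat (@categories) {
        if ($name =~ s/^$cat//) {
            $output .= "category: $cat; ";
            last;
        }
    }

    $output .= "subcats:";
    $output .= " null/null" if not $name;    # nothing left: no subcategories

    for my $subcat (@subcats) {
        last if not $name;
        if ($name =~ /^$subcat/) {
            # Count how many times this subcategory repeats.
            my $c = 0;
            $c++ while $name =~ s/^$subcat//;
            $output .= " $subcat/$c";
        }
    }
    return "$output IM#: $im_no";
}

print describe($_), "\n" for qw(clocks001.jpg clocksvertvertred001.jpg);
```

Note that the subcategories are tried in the order they appear in `@subcats`, so a name like "clocksredvert001.jpg" would lose its "vert" part with these lists.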

      Arjen

Re: Directory Structure parsing
by Improv (Pilgrim) on Apr 01, 2003 at 19:23 UTC
    Do you already know all the possible categories/subcategories? Is there a tree structure, or is it more free-form? Basically, we need to know how we can tell the end of the name of a category and the beginning of the name of a subcategory. If we knew, for example, that there were just categories, clocks, birds, and eels, and subcategories for each include vert and mood, we might have something like:
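    while (defined(my $file = readdir(DIRHANDLE))) {
        next if $file =~ /^\./;    # skip ".", ".." and hidden files
        my %thisline;              # start fresh for every file

        if ($file =~ s/^clocks//) { $thisline{category} = 'clocks'; }
        if ($file =~ s/^birds//)  { $thisline{category} = 'birds'; }
        ...
        if (! defined($thisline{category})) {
            die "No category defined!\n";
        }

        if ($file =~ s/^vert//) { $thisline{subcategory} = 'vert'; }
        ...
        # And so on, for 2nd level subcategories.
    }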
    On the other hand, depending on how your categories and subcategories are, another approach might be better. If you can clarify your question, I can probably give you a better answer.
      As a matter of fact, it's a free-form structure. The categories, subcategories, sub-subcategories (you get the idea) aren't known, that's part of why I need them to go into a DBI database. I know this isn't the best approach to do it. I also thought of using folders instead of names in the images: clocks/vert/red/001.jpg. But this would take some time since it's 250 megabytes worth of images. Oh well. I'll do whatever I have to do. Thanks again.
Re: Directory Structure parsing
by bscheiman (Initiate) on Apr 02, 2003 at 04:22 UTC
    I thought of a way to do it, but unfortunately I'm not completely sure how to write it. I seem to get lost every time I think I've got my thoughts arranged. But anyway: my idea consists of two while/for loops that compare the string part of each filename (i.e., without the numbers) against the filenames that went through the loop(s) previously.

    Example:
    file1: clocks - category clocks
    file2: clocksvert - category clocks (matches above file) subcategory vert (part of filename that doesn't match)

    I hope this makes sense. Thanks once again.
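
One way the idea above might be written down, as a rough sketch rather than a finished solution: collect the letter part ("stem") of every filename, process the shortest stems first, and treat the longest already-seen stem that is a prefix of the current one as its parent. Whatever is left after the parent is the new subcategory. The function name `deduce_paths` and the returned data layout are invented for this example, and it assumes every level also exists as a file of its own (so "clocksvert" only shows up if some "clocks" image exists too).

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref
#   filename => { path => [category, subcat, ...], number => '001' }
sub deduce_paths {
    my @names = @_;

    # Split each name into its stem and its image number.
    my (%stem_of, %number_of, %unique);
    for my $name (@names) {
        next unless $name =~ /^(\D+)(\d+)\.[^.]+$/;
        ($stem_of{$name}, $number_of{$name}) = ($1, $2);
        $unique{$1} = 1;
    }

    # Shortest stems first, so parents are known before their children.
    my @stems = sort { length($a) <=> length($b) or $a cmp $b } keys %unique;

    my %path;    # stem => array ref of category levels
    for my $stem (@stems) {
        # Longest already-known stem that is a prefix of this one.
        my ($parent) = sort { length($b) <=> length($a) }
                       grep { index($stem, $_) == 0 } keys %path;
        $path{$stem} = defined $parent
            ? [ @{ $path{$parent} }, substr($stem, length $parent) ]
            : [ $stem ];
    }

    my %info;
    for my $name (keys %stem_of) {
        $info{$name} = {
            path   => $path{ $stem_of{$name} },
            number => $number_of{$name},
        };
    }
    return \%info;
}

my $info = deduce_paths(qw(clocks001.jpg clocks002.jpg clocksvert001.jpg
                           clocksvert002.jpg clocksvertred001.jpg daisies001.jpg));
for my $name (sort keys %$info) {
    my ($cat, @subcats) = @{ $info->{$name}{path} };
    printf "%s -> category: %s; subcats: %s IM#: %s\n",
        $name, $cat, (@subcats ? join('/', @subcats) : 'null'),
        $info->{$name}{number};
}
```

The weak spot is the stated assumption: if the directory holds "clocksvert001.jpg" but no plain "clocks" image, "clocksvert" ends up as a top-level category, and a category whose name happens to start with another category's name is wrongly treated as its child.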