nirelit has asked for the wisdom of the Perl Monks concerning the following question:

hello community,
Say we have the following structure in our filesystem:

dir1 dir2 dir3 dir4
dir stands for directory, of course. In dir1 there is a file1.txt containing numbers, like below:
6576576 898798789 5645436549 76567576576 876876876876

The same goes for dir2: it has a file2.txt containing numbers, like below:

6576576 89879878963 56454365492 765675765763 8768768768765

And so on for the rest of the folders. What we need is a new file (like an index) built from all the directories and the values in their files, like below:

dir1;6576576,898798789,5645436549,76567576576,876876876876
dir2;6576576,89879878963,56454365492,765675765763,8768768768765

And secondly, another index file with the reverse information:

6576576;dir1,dir2

Any ideas on how you would approach this?
Best

Replies are listed 'Best First'.
Re: creating an index of files contents
by haukex (Archbishop) on Sep 23, 2016 at 09:43 UTC

    Hi nirelit,

    Any ideas on how you would approach this?

    One way is to use the core module File::Find: you can get the full filename of each found file from the variable $File::Find::name and use File::Spec's splitdir to split the filename into its components. The reverse index could be built with a hash, assuming there is enough memory for all the entries to fit.
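
    As a rough sketch of the File::Find approach (not a finished solution), the following builds the forward index only. It assumes one number per line, takes the directory name to be the path component just before the file name, and creates a small throwaway tree with File::Temp so it runs on its own; the sample values and the %index name are illustrative.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical sample tree: dir1/file1.txt and dir2/file2.txt.
my $root   = tempdir(CLEANUP => 1);
my %sample = (
    dir1 => [qw(6576576 898798789 5645436549)],
    dir2 => [qw(6576576 89879878963 56454365492)],
);
for my $dir (sort keys %sample) {
    my ($n)  = $dir =~ /(\d+)/;
    my $path = File::Spec->catdir($root, $dir);
    mkdir $path or die "mkdir $path: $!";
    my $file = File::Spec->catfile($path, "file$n.txt");
    open my $out, '>', $file or die "open $file: $!";
    print {$out} "$_\n" for @{ $sample{$dir} };
    close $out or die "close $file: $!";
}

my %index;    # directory name => array ref of numbers, in file order
find(sub {
    # find() has chdir'ed into the directory, so $_ is the bare file name.
    return unless -f $_ && /\.txt\z/;
    # $File::Find::name is the full path; the component before the
    # file name is the directory the file lives in.
    my @parts = File::Spec->splitdir($File::Find::name);
    my $dir   = $parts[-2];
    open my $fh, '<', $_ or die "open $File::Find::name: $!";
    while (my $line = <$fh>) {
        push @{ $index{$dir} }, split ' ', $line;
    }
    close $fh;
}, $root);

# One "dir;n1,n2,..." line per directory, sorted by directory name.
my @lines = map { "$_;" . join(',', @{ $index{$_} }) } sort keys %index;
print "$_\n" for @lines;
```

    Splitting each line on whitespace means the same code also copes with several numbers on one line, should the files turn out to be laid out that way.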

    That's just one way; there are also the modules File::Find::Rule, Path::Class, and Path::Tiny, but the approach above has the advantage of using only core modules. The advantage of Path::Class and Path::Tiny is that they have built-in methods to slurp files, which would make creating your index files a little easier.

    Hope this helps,
    -- Hauke D

      Thank you for your quick reply; an example with code would also be much appreciated.

      Update:
      Thank you for your quick reply; a more detailed description of your approach to the reverse index would also be much appreciated.

        Hi nirelit,

        You'll find some example code under the links I provided and by searching for the names of the modules (one example of many). Note that your request for "example with code" can be (mis)understood to mean "please do my work for me for free", which is considered rude. A good rule of thumb is that people will expend roughly as much effort answering a question as the person asking the question put into it; personally I'm quite happy to provide code for those who have shown their efforts. My suggestion is that you try writing some code, and if you have trouble with it please feel free to post here (guidelines on good questions) and I'm sure people will be happy to help more.

        Regards,
        -- Hauke D

        Hi nirelit,

        a more detailed description of your approach to the reverse index would also be much appreciated

        Please mark updates to your nodes as such. See How do I change/delete my post?, especially "It is uncool to update a node in a way that renders replies confusing or meaningless".

        Anyway, I'll give you the following hint: as you're reading your input files, let's say you have the current directory name stored in the variable $curdir, and you're reading your input file line-by-line (see for example "Files and I/O" in perlintro). Then you could do something like this:

        my %reversetbl; # at the beginning
        # ... open each file ...
        while (<$fh>) {
            chomp; # remove newline
            # ...
            $reversetbl{$_}{$curdir}++;
        }
        # at the end:
        for my $k (keys %reversetbl) {
            print $k, ';', join(',', keys %{ $reversetbl{$k} } ), "\n";
        }

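        This will give you the "6576576;dir1,dir2" output you want. Note that keys returns entries in no particular order, so add a sort if the order of the directories (or of the output lines) matters.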

        To see what this code is doing, you can look at the data structure (a "hash of hashes") with Data::Dumper, for example: use Data::Dumper; print Dumper(\%reversetbl);.
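
        To make that concrete, here is a small self-contained sketch of the same hash-of-hashes idea. It assumes the numbers have already been read; the %input hash is only a stand-in for the files, and its values are made-up sample data. Both levels of keys are sorted so the output is repeatable.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical in-memory stand-in for the files: directory => numbers.
my %input = (
    dir1 => [qw(6576576 898798789)],
    dir2 => [qw(6576576 89879878963)],
);

my %reversetbl;    # number => { directory => times seen }
for my $curdir (sort keys %input) {
    $reversetbl{$_}{$curdir}++ for @{ $input{$curdir} };
}

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper(\%reversetbl);

# Sorting both levels makes the output order repeatable.
my @lines = map {
    my $k = $_;
    "$k;" . join(',', sort keys %{ $reversetbl{$k} });
} sort keys %reversetbl;
print "$_\n" for @lines;
```

        Because the inner level is a hash, a number that appears twice in the same file still lists its directory only once; the count stored as the value records how often it was seen.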

        Hope this helps,
        -- Hauke D

Re: creating an index of files contents
by GotToBTru (Prior) on Sep 23, 2016 at 12:01 UTC

    A good first step would be to write out how you would do it without a computer. Assembling either index by itself is not complex, but doing both at the same time can seem daunting.

    Just off the top of my head, hashes and array references will be useful. Work through the Tutorials here on data types. Write some code and if you get stuck, ask!
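
    For instance, a minimal sketch with a hash of array references, assuming the numbers are already in memory (the names %by_dir and %by_number and the sample values are illustrative only): build the forward index first, then derive the reverse one from it.

```perl
use strict;
use warnings;

# Forward index: directory => array ref of numbers (hypothetical data).
my %by_dir = (
    dir1 => [qw(6576576 898798789)],
    dir2 => [qw(6576576 89879878963)],
);

# Reverse index: number => array ref of directories containing it.
my %by_number;
for my $dir (sort keys %by_dir) {
    push @{ $by_number{$_} }, $dir for @{ $by_dir{$dir} };
}

# Print the forward index, then the reverse index.
print "$_;", join(',', @{ $by_dir{$_} }),    "\n" for sort keys %by_dir;
print "$_;", join(',', @{ $by_number{$_} }), "\n" for sort keys %by_number;
```

    Arrays keep the numbers in their original order, but a number repeated within one file would list its directory twice in the reverse index, so duplicates would need to be filtered out if they can occur.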


Re: creating an index of files contents
by kcott (Archbishop) on Sep 25, 2016 at 04:10 UTC

    G'day nirelit,

    Welcome to the Monastery.

    Please do not wrap your entire post in 'pre' tags; in fact, try not to use them at all. Put code and data in '<code>...</code>' blocks (which provides us with a [Download] link to the source); put paragraph text in '<p>...</p>' blocks; use other elements (e.g. the 'ul', 'ol' and 'dl' lists) as appropriate.

    Useful reading for the Initiate:

    Enjoy your time at the Monastery and learning Perl.

    — Ken