creating an index of files contents

nirelit has asked for the wisdom of the Perl Monks concerning the following question:

hello community,
Say we have the following structure in our filesystem:

dir1
dir2
dir3
dir4

Here "dir" stands for directory, of course. In dir1 there is a file, file1.txt, containing numbers, one per line, like below:

6576576
898798789
5645436549
76567576576
876876876876

The same goes for dir2: it contains a file, file2.txt, with numbers in it, like below:

6576576
89879878963
56454365492
765675765763
8768768768765

And so on for all the rest of the directories. What we need is a new file (an index) that lists, for each directory, the values found in its file, like below:

dir1;6576576,898798789,5645436549,76567576576,876876876876
dir2;6576576,89879878963,56454365492,765675765763,8768768768765

Secondly, we need another index file holding the reverse information, that is, for each value the directories it appears in:

6576576;dir1,dir2

Any ideas on how you would approach this?
Best


Re: creating an index of files contents by haukex (Archbishop) on Sep 23, 2016 at 09:43 UTC
Hi nirelit,

"Any ideas on how would you approach this?"

One way is to use the core module File::Find. You can get the full filename of each found file from the variable `$File::Find::name`, and use File::Spec's `splitdir` to split the filename into its components. The reverse index could be done with a hash, assuming there is enough memory for all the entries to fit in.

That's just one way. There are also the modules File::Find::Rule, Path::Class, and Path::Tiny, but the approach above has the advantage that it uses only core modules. The advantage of Path::Class and Path::Tiny is that they also have built-in methods to slurp files, which would make creating your index files a little easier.

Hope this helps,
-- Hauke D
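A minimal sketch of the File::Find and File::Spec approach described in this reply, not the poster's own code. It assumes each directory sits directly under one root and holds a `.txt` file with one value per line. The sub name `build_index` and the throwaway tree created with File::Temp are hypothetical, added only so the example runs on its own.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;
use File::Temp qw(tempdir);

# Walk $root and return a hash ref: directory name => array ref of the
# values found in the .txt files directly inside that directory.
sub build_index {
    my ($root) = @_;
    my %index;
    find({
        no_chdir => 1,    # keep $File::Find::name usable as a path to open
        wanted   => sub {
            return unless -f $File::Find::name && /\.txt\z/;
            # Path relative to $root, e.g. "dir1/file1.txt"
            my $rel   = File::Spec->abs2rel($File::Find::name, $root);
            my @parts = File::Spec->splitdir($rel);
            return unless @parts == 2;    # only root/dirN/file.txt
            my $dir = $parts[0];
            open my $fh, '<', $File::Find::name
                or die "Cannot open $File::Find::name: $!";
            while (my $line = <$fh>) {
                chomp $line;
                push @{ $index{$dir} }, $line if length $line;
            }
            close $fh;
        },
    }, $root);
    return \%index;
}

# Demo on a temporary tree that mirrors the layout in the question
# (sample data shortened; the tree is removed when the script exits).
my $root   = tempdir(CLEANUP => 1);
my %sample = (
    dir1 => [ 'file1.txt', qw(6576576 898798789 5645436549) ],
    dir2 => [ 'file2.txt', qw(6576576 89879878963 56454365492) ],
);
for my $dir (sort keys %sample) {
    my ($file, @values) = @{ $sample{$dir} };
    mkdir File::Spec->catdir($root, $dir) or die "Cannot mkdir $dir: $!";
    my $path = File::Spec->catfile($root, $dir, $file);
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} "$_\n" for @values;
    close $out or die "Cannot close $path: $!";
}

my $index = build_index($root);
print "$_;", join(',', @{ $index->{$_} }), "\n" for sort keys %$index;
# prints:
# dir1;6576576,898798789,5645436549
# dir2;6576576,89879878963,56454365492
```

With `no_chdir` set, `$_` holds the same full path as `$File::Find::name`, which is why the `.txt` pattern match and the `open` can both work on the path as found.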
Re^2: creating an index of files contents by nirelit (Initiate) on Sep 23, 2016 at 09:58 UTC
Thank you for your quick reply. An example with code would also be much appreciated.

Update: Thank you for your quick reply. A more detailed description of your approach to the reverse index would also be much appreciated.
Re^3: creating an index of files contents by haukex (Archbishop) on Sep 23, 2016 at 10:35 UTC
Hi nirelit,

You'll find some example code under the links I provided and by searching for the names of the modules (one example of many).

Note that your request for "example with code" can be (mis)understood to mean "please do my work for me for free", which is considered rude. A good rule of thumb is that people will expend roughly as much effort answering a question as the person asking the question put into it; personally I'm quite happy to provide code for those who have shown their efforts.

My suggestion is that you try writing some code, and if you have trouble with it please feel free to post here (guidelines on good questions) and I'm sure people will be happy to help more.

Regards,
-- Hauke D
Re^3: creating an index of files contents by haukex (Archbishop) on Sep 23, 2016 at 12:24 UTC
Hi nirelit,

"a more detailed description on your approach with the reverse index would be also much appreciated"

Please mark updates to your nodes as such. See How do I change/delete my post?, especially "It is uncool to update a node in a way that renders replies confusing or meaningless".

Anyway, I'll give you the following hint: as you're reading your input files, let's say you have the current directory name stored in the variable `$curdir`, and you're reading your input file line-by-line (see for example "Files and I/O" in perlintro). Then you could do something like this:

    my %reversetbl; # at the beginning
    # ... open each file ...
    while (<$fh>) {
        chomp; # remove newline
        # ...
        $reversetbl{$_}{$curdir}++;
    }
    # at the end:
    for my $k (keys %reversetbl) {
        print $k, ';', join(',', keys %{ $reversetbl{$k} } ), "\n";
    }

This will give you the "`6576576;dir1,dir2`" output you want (`keys` returns entries in no particular order, so wrap both calls in `sort` if the order matters). To see what this code is doing, you can look at the data structure (a "hash of hashes") with Data::Dumper, for example: `use Data::Dumper; print Dumper(\%reversetbl);`.

Hope this helps,
-- Hauke D
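The hint above can be fleshed out into a runnable sketch. Assumptions, none of which come from the thread: the `%content` hash of strings stands in for the real files (each one is read through an in-memory filehandle), the sub name `reverse_index` is hypothetical, and both levels of keys are sorted so the output is repeatable.

```perl
use strict;
use warnings;

# Stand-ins for the real files: directory name => file contents.
# Real code would open each file on disk instead.
my %content = (
    dir1 => "6576576\n898798789\n",
    dir2 => "6576576\n89879878963\n",
);

# Build a hash of hashes: value => { directory => times seen }.
sub reverse_index {
    my (%content_of) = @_;
    my %reversetbl;
    for my $curdir (sort keys %content_of) {
        # Opening a filehandle on a scalar reference reads from the string.
        open my $fh, '<', \$content_of{$curdir}
            or die "Cannot open in-memory file for $curdir: $!";
        while (my $line = <$fh>) {
            chomp $line;                  # remove newline
            next unless length $line;     # skip blank lines
            $reversetbl{$line}{$curdir}++;
        }
        close $fh;
    }
    return \%reversetbl;
}

my $rev = reverse_index(%content);
for my $value (sort { $a <=> $b } keys %$rev) {
    print $value, ';', join(',', sort keys %{ $rev->{$value} }), "\n";
}
# prints:
# 6576576;dir1,dir2
# 898798789;dir1
# 89879878963;dir2
```

Using the directory name as the inner hash key means a value that occurs twice in the same file still lists that directory only once.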
Re^4: creating an index of files contents by nirelit (Initiate) on Sep 23, 2016 at 13:03 UTC
Re: creating an index of files contents by GotToBTru (Prior) on Sep 23, 2016 at 12:01 UTC
A good first step would be to write out how you would do it without a computer. Assembling either index by itself is not complex, but doing both at the same time can seem daunting. Just off the top of my head, hashes and array references will be useful. Work through the Tutorials here on data types. Write some code and if you get stuck, ask!

But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)
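One way to picture the "hashes and array references" suggestion, as a sketch only: hold the first index as a hash of array references, then derive the second index from it in a separate step, so the two are never built at the same time. The `%forward` sample data and the sub name `invert` are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Forward index: directory => array ref of values (sample data only).
my %forward = (
    dir1 => [qw(6576576 898798789)],
    dir2 => [qw(6576576 89879878963)],
);

# Turn directory => [values] into value => [directories].
sub invert {
    my ($fwd) = @_;
    my (%reverse, %seen);
    for my $dir (sort keys %$fwd) {
        for my $value (@{ $fwd->{$dir} }) {
            next if $seen{$value}{$dir}++;    # skip repeats within one directory
            push @{ $reverse{$value} }, $dir;
        }
    }
    return \%reverse;
}

my $reverse = invert(\%forward);
for my $value (sort { $a <=> $b } keys %$reverse) {
    print "$value;", join(',', @{ $reverse->{$value} }), "\n";
}
```

Because the directories are visited in sorted order, each array of directory names comes out sorted without a further `sort` call.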
Re: creating an index of files contents by kcott (Archbishop) on Sep 25, 2016 at 04:10 UTC
G'day nirelit,

Welcome to the Monastery.

Please do not wrap your entire post in 'pre' tags; in fact, try not to use them at all. Put code and data in '`<code>...</code>`' blocks (which provides us with a [Download] link to the source); put paragraph text in '`<p>...</p>`' blocks; use other elements (e.g. the 'ul', 'ol' and 'dl' lists) as appropriate.

Useful reading for the Initiate:
Writeup Formatting Tips
What shortcuts can I use for linking to other information?
Posting on PerlMonks

Enjoy your time at the Monastery and learning Perl.

-- Ken