Retrieving meta information from txt

fanticla has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Retrieving meta information from txt by chrestomanci (Priest) on Dec 10, 2010 at 12:07 UTC
The way I see it, you have two possible approaches.

You could construct an index where each word of interest is stored with the meta information that it is associated with.

`open my $srcFile, '<', 'source.txt' or die $!;
my $metaText = "";
while ( my $line = <$srcFile> ) {
    if ( $line =~ m/<text>(.*)<\/text>/ ) {
        # Replace the old meta text with the new one.
        $metaText = $1;
    }
    else {
        # Normal line. The old meta text applies.
        # split ' ' splits on whitespace and skips a leading empty field.
        my @words = split ' ', $line;
        foreach my $word (@words) {
            store_in_index($metaText, $word);
        }
    }
}
close $srcFile;`

In the example above, the `store_in_index()` function could be anything from storage in a simple hash to a relational database. It would depend on how much data you have and how long you want to keep it.

An alternative approach would be to read the file backwards: first for the word the user is looking for, and then for the meta information line that it relates to. Off the top of my head, I am not sure how that would be done, but I am sure there are ways.
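As a minimal sketch of the indexing approach, assuming a plain in-memory hash is enough: the body of `store_in_index()` below, the `lookup_meta()` and `build_index()` helpers, and the sample data are all hypothetical and only illustrate the "simple hash" end of the range mentioned above.

```perl
use strict;
use warnings;

# Hypothetical index: lower-cased word => { meta text => 1 }.
my %index;

# One possible store_in_index(): remember that $word occurred under $metaText.
sub store_in_index {
    my ($metaText, $word) = @_;
    $index{ lc $word }{$metaText} = 1;
}

# Return the sorted list of meta texts under which $word was seen.
sub lookup_meta {
    my ($word) = @_;
    return sort keys %{ $index{ lc $word } || {} };
}

# Build the index from an open file handle, one line at a time.
sub build_index {
    my ($fh) = @_;
    my $metaText = "";
    while ( my $line = <$fh> ) {
        if ( $line =~ m/<text>(.*)<\/text>/ ) {
            $metaText = $1;          # a new header replaces the old one
        }
        else {
            store_in_index($metaText, $_) for split ' ', $line;
        }
    }
}

# Demo on an in-memory file; the sample data is invented for illustration.
my $sample = "<text>Author A, 1999</text>\nthe quick fox\n"
           . "<text>Author B, 2005</text>\nthe lazy dog\n";
open my $fh, '<', \$sample or die $!;
build_index($fh);
close $fh;

print join("; ", lookup_meta('the')), "\n";   # word seen under both headers
print join("; ", lookup_meta('dog')), "\n";   # word seen under one header
```

Keying the inner level on the meta text keeps each header only once per word, however often the word occurs in that section.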
Re^2: Retrieving meta information from txt by Anonymous Monk on Dec 10, 2010 at 12:09 UTC
File::Bidirectional / File::ReadBackwards
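A rough sketch of the backwards scan using File::ReadBackwards (a CPAN module, so it has to be installed separately). The function name `meta_reading_backwards()` is made up for this example, and it assumes each header sits on a single `<text>...</text>` line. It reports the header for the last occurrence of the word in the file.

```perl
use strict;
use warnings;

# Scan $path from the end for a line containing $word, then keep reading
# backwards until the <text>...</text> line that governs it.
# Returns the header text, or nothing if the word or header is not found.
sub meta_reading_backwards {
    my ($path, $word) = @_;
    require File::ReadBackwards;    # CPAN module, loaded on first call
    my $bw = File::ReadBackwards->new($path)
        or die "can't read $path: $!";
    my $found = 0;
    while ( defined( my $line = $bw->readline ) ) {
        $found = 1 if !$found && index($line, $word) >= 0;
        return $1 if $found && $line =~ m/<text>(.*)<\/text>/;
    }
    return;
}
```

Called as `meta_reading_backwards('source.txt', 'fox')`, this never reads further back than the header it needs, which may matter for large files.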
Re: Retrieving meta information from txt by jethro (Monsignor) on Dec 10, 2010 at 12:18 UTC
Instead of first searching and then looking for the tag, you could read the file so that each text is already separated from its tag. For example, you can change $/ to "<text>" so that when you read the file, you read it not line by line but text by text. Or just parse line by line and start a new array index whenever a line begins with '<text>'. Keep one array for the tags and one for the texts, with corresponding tag and text at the same index, and you have all the information "finely sliced at your fingertips" ;-)
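One way the $/ idea might look, as a sketch only: `read_sections()` and the sample data are invented names for illustration, and the code assumes every header has the form `<text>...</text>`.

```perl
use strict;
use warnings;

# Read "text by text" by setting the input record separator to "<text>".
# Returns two parallel arrays: $tags->[$i] belongs to $texts->[$i].
sub read_sections {
    my ($fh) = @_;
    my (@tags, @texts);
    local $/ = "<text>";            # each read now ends at the next "<text>"
    while ( my $chunk = <$fh> ) {
        chomp $chunk;               # strips the trailing "<text>" separator
        # A chunk looks like "meta</text>\nbody..."; anything before the
        # first "<text>" in the file has no closing tag and is skipped.
        next unless $chunk =~ m{\A(.*?)</text>\s*(.*)\z}s;
        push @tags,  $1;
        push @texts, $2;
    }
    return (\@tags, \@texts);
}

# Demo on an in-memory file; the sample data is invented for illustration.
my $sample = "<text>Author A</text>\nfirst body\nmore\n"
           . "<text>Author B</text>\nsecond body\n";
open my $fh, '<', \$sample or die $!;
my ($tags, $texts) = read_sections($fh);
close $fh;

# Print the tag of every section whose text contains the search word.
for my $i ( 0 .. $#$texts ) {
    print "$tags->[$i]\n" if $texts->[$i] =~ /\bsecond\b/;
}
```

Using `local` restores the normal line-by-line behaviour of `$/` as soon as the sub returns.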
Re^2: Retrieving meta information from txt by Anonymous Monk on Dec 10, 2010 at 13:30 UTC
`my @something;
{
    # Assumes $fh is a file handle already opened for reading.
    my $before = [];
    my $after  = [];
    my $curr   = $before;
    my $meta   = [];
    LOOP:
    while (<$fh>) {
        if (/^\Q..../) {
            # A line starting with a literal "...." switches from "before" to "after".
            $curr = $after;
            next LOOP;
        }
        if (/^</) {
            # A tag line closes the current record (if non-empty) and starts a new one.
            push @something, { meta => $meta, before => $before, after => $after }
                if grep { @$_ } $meta, $before, $after;
            $before = [];
            $after  = [];
            $curr   = $before;
            $meta   = [$_];
            next LOOP;
        }
        push @$curr, $_;
    }
    # Flush the last record; test the array contents, since a reference is always true.
    if ( grep { @$_ } $meta, $before, $after ) {
        push @something, { meta => $meta, before => $before, after => $after };
    }
}`
Re: Retrieving meta information from txt by bart (Canon) on Dec 10, 2010 at 12:00 UTC
"shows in a Tk::Text Widget the portion of the text containing that word. This works fine."

If that is working, then you almost have what you want. Likely, the program now searches back up from the found word to a line containing "`<text>`" to find the beginning of the section, and looks further down for the next occurrence. So hook into the first part: as it must have searched for the start, all you have to do is retrieve the contents:

`my ($meta) = substr($everything, $sectionstart) =~ /<text>(.*?)<\/text>/;`
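For illustration, here is how the search back to the section start could be done when the whole file is held in one string. This is a sketch under assumptions: `meta_for_word()` is a made-up name, and the original program's own search code is not shown in the thread, so `index` and `rindex` stand in for it.

```perl
use strict;
use warnings;

# Given the whole file in one string and a search word, return the meta
# text of the section containing the first hit, or nothing if not found.
sub meta_for_word {
    my ($everything, $word) = @_;
    my $hit = index($everything, $word);
    return if $hit < 0;
    # Search backwards from the hit for the start of its section.
    my $sectionstart = rindex($everything, '<text>', $hit);
    return if $sectionstart < 0;
    my ($meta) = substr($everything, $sectionstart) =~ /<text>(.*?)<\/text>/;
    return $meta;
}

# Demo; the sample data is invented for illustration.
my $everything = "<text>Author A</text>\nalpha beta\n"
               . "<text>Author B</text>\ngamma delta\n";
my $meta = meta_for_word($everything, 'gamma');
print "$meta\n" if defined $meta;
```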
Re: Retrieving meta information from txt by cdarke (Prior) on Dec 10, 2010 at 13:59 UTC
If you are on Microsoft Windows using NTFS then you can use an Alternate Data Stream (ADS), which is used for exactly your purpose with, for example, Notepad. ADS streams are marked with a :suffix. For example, from cmd.exe (the final line is the output of `more`):

`echo secret > fred.txt:hidden
more < fred.txt:hidden
secret`

The hidden ADS is otherwise not normally visible. In Perl you can open, read, and write ADS files in the same way as any other.
Re: Retrieving meta information from txt by elef (Friar) on Dec 10, 2010 at 18:19 UTC
If all you want to do is add the header (meta info) to every hit, this can probably be done in two very simple lines of code: always store the current (latest) header and add it to the results.

If you do the main data lookup line by line with a while loop or similar construct, and supposing that no line has more than one pair of <text> tags (i.e. new texts are on new lines) and that the opening and closing text tags are always on the same line (i.e. there are no line breaks within the header), something like this should work:

`my $currentheader = '';
while (<FILE>) {
    # If the current line has a header, save it; otherwise keep the last saved header.
    if (/<text>(.*)<\/text>/) { $currentheader = $1 }
    # Your data search code goes here. A plain substring match on $searchword
    # (a placeholder for your own search term) stands in for it.
    if (index($_, $searchword) >= 0) { print "$currentheader: $_" }
}`

This way you only go through the file once. Of course this may not be doable if you use some exotic solution for your search word matching instead of a while loop.

Edit: I just noticed that chrestomanci already proposed a more elaborate execution of the same idea.
Re: Retrieving meta information from txt by Anonymous Monk on Dec 10, 2010 at 11:41 UTC
"I am not searching for a peace of code, but just for the right inspiration...Thanks. Cla"

Peace of code, is that like peace of mind? If peace doesn't inspire you, maybe you're not looking for inspiration?