Your Mother and bart told me to write this up during a conversation in the Chatterbox after talking about getting data manually out of HTML files. I have looked at a lot of HTML parsers out there, but I can't seem to get a grip on how they work. I would like to parse these files into a csv file and separate description files.
Note: get_data_file is a home-rolled subroutine I wrote to make it easier to get files from my data directory.
I can start the script easily enough...
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

my @files;
# Called once per entry found; $_ holds the bare file name.
sub wanted { push @files, $_; }
find(\&wanted, 'C:/Documents and Settings/ME/My Documents/fantasy/Role_playing/Magic_items/Spell_scrolls');

for my $file (@files) {
}
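One possible refinement of the search above, offered only as a sketch: the wanted routine as written also collects directory names, and $_ is just the bare name, which is awkward to open later. The helper name find_html_files is made up for this example, and it assumes the scroll files end in .html or .htm.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return the full paths of all .html/.htm
# files under $dir, sorted by path.
sub find_html_files {
    my ($dir) = @_;
    my @files;
    find(
        sub {
            # $_ is the bare file name; $File::Find::name is the full path.
            push @files, $File::Find::name if -f $_ && /\.html?$/i;
        },
        $dir
    );
    @files = sort @files;
    return @files;
}
```

Calling find_html_files with the Spell_scrolls directory would then give a list that can be looped over and opened directly.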
The lines in the csv would be...
open(my $spell_csv, '>>', get_data_file('Role_playing','Spell_list.csv'));
# One pipe-delimited line per spell; the trailing newline keeps each spell on its own line.
print $spell_csv "$spell_name|$school|$level|$range|$duration|$area_of_effect|$components|$casting_time|$saving_throw|$note\n";
push @spell_list, $spell_name;
For the description below all of that, the text with all of the html included would be written into a separate .txt file for each spell. If there are any lines in the description that begin with the word Note, put the note in the .csv file.
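The Note rule could be sketched like this. It is only a guess at the matching, since it assumes the description arrives as one string with a line per note and that a note line really starts with the word Note (optionally followed by a colon). The name extract_notes is invented for the example, and the description itself is left untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of every line in $description
# that begins with the word "Note" (leading whitespace allowed).
sub extract_notes {
    my ($description) = @_;
    my @notes;
    for my $line (split /\n/, $description) {
        # \b keeps words such as "Notes" or "Noted" from matching.
        if ($line =~ /^\s*Note\b[:\s]*(.*)$/) {
            push @notes, $1;
        }
    }
    return @notes;
}
```

The returned list could be joined with a space to fill the $note column of the csv line. If the Note lines are wrapped in tags in the real files, the pattern would need adjusting.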
open(my $spell_description, '>', get_data_file('Role_playing/Spell_descriptions',"$spell_name.txt"));
print $spell_description $description;
After that is all created, get the name of the original file and create a .pl file with the same name in the same directory.
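use File::Basename;   # provides basename()

# Inside the loop over @files: $file is the current HTML file
# ($0 would be the name of this script, not of the HTML file).
my $html_file = basename($file);
my $pl_file = $html_file;
$pl_file =~ s!html$!pl!;
open(my $new_pl_file, '>', $pl_file);
# Each spell name is written as a quoted string so the generated call is valid under strict.
print $new_pl_file q{#!/usr/bin/perl
use strict;
use warnings;
use lib "C:/Documents and Settings/ME/My Documents/fantasy/files/perl/lib";
use RolePlaying::SpellList qw(print_spell_scroll);

print_spell_scroll(} . join(',', map { qq{"\Q$_\E"} } @spell_list) . q{);};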
Once the files are parsed and the new perl files created, delete the html files.
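The cleanup step might look something like the following sketch. The name delete_files is invented here, and it assumes the list holds paths that can be reached from the current directory. Since deleting is not reversible, it reports what it could not remove rather than dying partway through.

```perl
use strict;
use warnings;

# Hypothetical helper: delete each file in the list and return
# the paths that could not be removed.
sub delete_files {
    my @files = @_;
    my @failed;
    for my $file (@files) {
        # unlink returns the number of files it removed (0 on failure).
        unlink $file or push @failed, $file;
    }
    return @failed;
}
```

It would be safest to call this only after checking that the csv line, the description file, and the new .pl file were all written for every HTML file.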
In reply to Parsing HTML into various files by Lady_Aleena