in reply to Extracting Data from a File

Hopefully you can install modules on the server, or the modules you need are already there. If not, note that there are a number of ways to bundle modules with your code (this was discussed here recently), so they aren't strictly needed on the server. You can ship them with whatever you deploy: either they live alongside your code or they get installed, whichever way you go.

CSV: Text::CSV_XS. Separating things by commas sounds straightforward (join ',', @list), but there are corner cases, such as fields that themselves contain commas, quotes, or newlines, that will have you throwing your hands up in frustration. Text::CSV_XS handles those cases for you, both for parsing and writing.
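
As a rough illustration of the kind of corner case meant here, the sketch below writes one row the naive way and once through Text::CSV_XS, then reads both back. The row contents are made up for the example, and the constructor options (binary, auto_diag) are just a common starting point, not a requirement.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical row: one field contains a comma, another contains double quotes.
my @row = ('index.html', 'Widgets, gadgets and more', 'He said "hi"');

# Naive approach: the embedded comma is indistinguishable from a separator,
# so splitting the line back up yields more fields than we started with.
my $naive        = join ',', @row;
my @naive_fields = split /,/, $naive;

# Text::CSV_XS quotes and escapes fields as needed when writing ...
my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
$csv->combine(@row) or die "combine failed";
my $line = $csv->string;

# ... and undoes that quoting when parsing the line again.
$csv->parse($line) or die "parse failed";
my @parsed = $csv->fields;

printf "naive: %d fields, Text::CSV_XS: %d fields\n",
    scalar @naive_fields, scalar @parsed;
```

The naive round trip loses track of where the second field ends, while the Text::CSV_XS round trip should hand back the original three fields unchanged.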

Parsing HTML: if the files are well-formed XHTML, I prefer XML::Twig, but if not, check CPAN for HTML parsers. Again, you probably could use a regex to search for your meta names, but unless the tags are always in exactly the same format (all on one line, always with name before content, neither of which is required by HTML), it will be painful. Let a module do the heavy work for you: you should be able to ask for the meta element with a name attribute of 'description', then ask for the value of its 'content' attribute.

Looking for the HTML files: File::Find, which *is* included in the perl you're using, though I've seen others prefer File::Find::Rule, which is not included in the core Perl distribution (but may be installed on your server anyway).
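
A minimal sketch of the File::Find part, using only core modules. The helper name find_html_files and the choice to match both .html and .htm are assumptions made for the example, not something the original question specifies.

```perl
use strict;
use warnings;
use File::Find;

# Return every plain file under $dir whose name ends in .html or .htm.
sub find_html_files {
    my ($dir) = @_;
    my @found;
    find(
        sub {
            # Inside the callback, $_ is the bare file name (find() has
            # changed into its directory) and $File::Find::name is the
            # path including the directory.
            push @found, $File::Find::name if -f $_ && /\.html?\z/i;
        },
        $dir,
    );
    my @sorted = sort @found;
    return @sorted;
}

# Hypothetical usage: start directory from the command line, default '.'.
print "$_\n" for find_html_files(@ARGV ? $ARGV[0] : '.');
```

Collecting the paths into a list first, rather than doing the parsing inside the callback, keeps the directory walk separate from whatever you later do with each file.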

Once you have all the documentation and modules, and you have your plan on how to "distribute" your code to the server, you can pull it all together. If you write it well, I'm guessing that your code will amount to 20-50 lines. That's it. With the contents of CPAN, I can't think of another scripting language that is better suited to what you're doing.

Re^2: Extracting Data from a File
by Corion (Patriarch) on Nov 11, 2010 at 15:19 UTC

    If this is really about only extracting <meta tags from HTML files, HTML::HeadParser is a limited parser written for that.
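
A small sketch of what that could look like, assuming HTML::HeadParser is installed (it ships with the HTML-Parser distribution on CPAN). The sample page is invented for the example; it deliberately puts content before name and splits the tag across lines, the cases a one-line regex would trip over.

```perl
use strict;
use warnings;
use HTML::HeadParser;

# Hypothetical page used only for this example.
my $html = <<'HTML';
<html><head>
<title>Example page</title>
<meta content="A short summary of the page"
      name="description">
<meta name="keywords" content="perl, csv, html">
</head><body><p>Hello</p></body></html>
HTML

my $parser = HTML::HeadParser->new;
$parser->parse($html);

# <meta name="..."> elements are exposed as X-Meta-* headers,
# and the <title> element as the Title header.
my $title       = $parser->header('Title');
my $description = $parser->header('X-Meta-Description');
my $keywords    = $parser->header('X-Meta-Keywords');

print "$title | $description | $keywords\n";
```

For files on disk, the module also offers a parse_file method, so the string above would be replaced by a file name coming from the directory search.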

Re^2: Extracting Data from a File
by globaldre (Initiate) on Nov 11, 2010 at 18:59 UTC
    Could you give me an example of what it would look like? I am assuming that File::Find and other similar modules are just packaged classes, used in the same way as in .NET or Java, correct? I apologize again for my lack of knowledge.