Hello - I am a .NET developer with very limited Perl knowledge. I am trying to read some values from several files and write them out as a CSV file. I could write an .exe in C# to do this, but due to security concerns I am not allowed to run an .exe on the server. I know Perl is already installed on the server, and running a Perl script wouldn't raise any concerns. I've been reading through several Perl books, but haven't come across any examples of how to get this done. Any ideas or suggestions are greatly appreciated.

I have some files in a directory structure that looks like this:
1. - root
  1.1 - html
    1.1.1 - html2010
        file1.html
        file2.html
        file3.html
        etc
    1.1.2 - html2010
        file1.html
        file2.html
        file3.html
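
A minimal sketch of how a directory tree like the one above could be walked, using only the core File::Find module so that nothing extra needs installing on the server. The helper name find_html_files and the command-line argument are assumptions made for illustration; they are not from the original post.

```perl
use strict;
use warnings;
use File::Find;

# Collect every .html/.htm file beneath a starting directory.
# find_html_files is a hypothetical helper name, not part of any module.
sub find_html_files {
    my ($root) = @_;
    my @files;
    find(
        sub {
            # Inside the callback $_ is the bare file name and
            # $File::Find::name is the path including the directory.
            push @files, $File::Find::name if -f $_ && /\.html?\z/i;
        },
        $root,
    );
    my @sorted = sort @files;
    return @sorted;
}

# Usage (assumed): perl list_html.pl root
if (@ARGV) {
    print "$_\n" for find_html_files($ARGV[0]);
}
```

Run against the top-level directory, this should print one path per line for each HTML file in every subdirectory, which can then be fed to whatever does the extraction.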

I need to read the content of the "description" and "keywords" meta tags (test1,test2,test3,testk1,testk2,etc) from each file and output it as a CSV file.

The html pages look something like this:

<!-- File 1 -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en" dir="ltr">
<head>
<title>Test Page1</title>
<meta name="description" content="test1,test2,test3,test4,test5"/>
<meta name="keywords" content="testk1,testk2,testk3,testk4,testk5"/>
</head>
<body>
Body of the page 1
</body>
</html>

<!-- File 2 -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en" dir="ltr">
<head>
<title>Test Page2</title>
<meta name="description" content="test6,test7,test8,test9,test10"/>
<meta name="keywords" content="testk6,testk7,testk8,testk9,testk10"/>
</head>
<body>
Body of the page 2
</body>
</html>
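
For pages shaped like the two samples above, a rough sketch of the extraction and CSV step might look like the following. It uses a regular expression rather than a real HTML parser, so it assumes the name attribute comes before content and that both values are double-quoted, exactly as in the samples. The names meta_content, csv_field and csv_row, and the column layout (file, description, keywords), are assumptions for illustration only.

```perl
use strict;
use warnings;

# Return the content="..." value of <meta name="$name" ...>, or '' if absent.
# Assumes name precedes content and both use double quotes (as in the samples).
sub meta_content {
    my ($html, $name) = @_;
    return $html =~ /<meta\s+name\s*=\s*"\Q$name\E"\s+content\s*=\s*"([^"]*)"/i
        ? $1
        : '';
}

# Quote one CSV field: wrap it in double quotes and double any embedded quotes.
# Quoting matters here because the meta values themselves contain commas.
sub csv_field {
    my ($value) = @_;
    $value =~ s/"/""/g;
    return qq{"$value"};
}

# Build one CSV line from a list of fields.
sub csv_row {
    my @fields = @_;
    return join(',', map { csv_field($_) } @fields) . "\n";
}

# Usage (assumed): perl meta2csv.pl file1.html file2.html > meta.csv
if (@ARGV) {
    print csv_row('file', 'description', 'keywords');
    for my $path (@ARGV) {
        my $fh;
        unless (open $fh, '<', $path) {
            warn "Cannot open $path: $!\n";
            next;
        }
        my $html = do { local $/; <$fh> };    # slurp the whole file
        close $fh;
        print csv_row(
            $path,
            meta_content($html, 'description'),
            meta_content($html, 'keywords'),
        );
    }
}
```

If the real pages vary more than the samples do (single quotes, reordered attributes, tags split across lines), a proper parser module would be the safer choice; the regex here is only meant to show the shape of the task.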

I have a very limited understanding of Perl. Feel free to make a recommendation if you think another scripting language would be more suitable for this task.


In reply to Extracting Data from a File by globaldre
