PerlMonks  

Re: Particular HTML contents to CSV or DB

by nicpon (Initiate)
on Aug 25, 2005 at 15:44 UTC [id://486594]


in reply to Particular HTML contents to CSV or DB

All the HTML files were generated in PHP. Here is a link to give an idea of what the HTML files look like: http://www.nicpon.net/mAHuEGF.html. What I need from each file is the bold product name and the rest of the info for that product (all the fields under the product name). So one way would be to first take the name and the rest of the info, insert them into a new file as comma-separated values, and then strip all the HTML. (I can use CSV because I can then easily import it into the database, and that way I don't have to worry about a database connection from the script.) The other way would be to first take just the product part of each HTML file and insert it into a new file, since the product info always starts and ends with the same markup. My other question is: how do I get a list of all the files in the folder?
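
On the last question, a minimal sketch using only core Perl (opendir/readdir plus File::Spec). The helper name list_html_files and the choice to match only *.htm/*.html files are assumptions made for illustration; nothing in the post says how the files are named.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the sorted paths of the regular files
# directly inside $dir whose names end in .htm or .html.
sub list_html_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @names = grep { /\.html?\z/i } readdir($dh);
    closedir($dh);

    # Build full paths and keep plain files only (skips directories).
    my @paths  = grep { -f $_ } map { File::Spec->catfile($dir, $_) } @names;
    my @sorted = sort { $a cmp $b } @paths;
    return @sorted;
}

# Usage: list the HTML files in the directory named on the command line,
# or in the current directory if none is given.
my $dir = @ARGV ? $ARGV[0] : '.';
print "$_\n" for list_html_files($dir);
```

glob("$dir/*.html") would be a shorter alternative; readdir is used here so that directory names containing spaces or wildcard characters need no special handling.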

Replies are listed 'Best First'.
Re^2: Particular HTML contents to CSV or DB
by ww (Archbishop) on Aug 25, 2005 at 15:48 UTC
Re^2: Particular HTML contents to CSV or DB
by Ctrl-z (Friar) on Aug 26, 2005 at 15:03 UTC

    As the pages were generated programmatically, I think you would be more successful scraping the relevant data out of the surrounding markup. This avoids traversing the DOM or mucking around with regular expressions. See

    Template::Extract
    Text::Scraper
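
As a rough illustration of the same idea (cutting the data out of fixed surrounding markup), here is a core-Perl sketch. It does not use either module's API. The start/end markers, the `<b>` name tag, and the `<br>`-separated fields are all assumptions, since the real markup is not shown in the thread; the helper names are hypothetical too. It still leans on a few small regular expressions for tag stripping, so treat it as a starting point only.

```perl
use strict;
use warnings;

# Hypothetical markers: the real start/end markup is not shown in the thread.
my $START = '<!-- product start -->';
my $END   = '<!-- product end -->';

# Return every chunk of $html found between $START and $END.
sub product_blocks {
    my ($html) = @_;
    my @blocks;
    my $pos = 0;
    while ((my $from = index($html, $START, $pos)) >= 0) {
        $from += length $START;
        my $to = index($html, $END, $from);
        last if $to < 0;                  # unterminated block: stop
        push @blocks, substr($html, $from, $to - $from);
        $pos = $to + length $END;
    }
    return @blocks;
}

# Turn one block into fields: the bold name first, then each remaining
# non-empty line of text with its tags removed.
sub block_fields {
    my ($block) = @_;
    my ($name) = $block =~ m{<b>(.*?)</b>}is;
    $name = '' unless defined $name;
    $block =~ s{<b>.*?</b>}{}is;          # drop the name from the rest
    $block =~ s{<br\s*/?>}{\n}gi;         # assumed: fields separated by <br>
    $block =~ s{<[^>]*>}{}g;              # crude tag stripping
    my @rest;
    for my $line (split /\n/, $block) {
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        push @rest, $line if length $line;
    }
    return ($name, @rest);
}

# Quote every field and double embedded quotes, so commas inside a
# field cannot break the CSV row.
sub csv_line {
    return join ',', map { my $f = $_; $f =~ s/"/""/g; qq{"$f"} } @_;
}

# Usage with a made-up page fragment.
my $html = <<'HTML';
<html><body>
<!-- product start -->
<b>Widget "Pro"</b><br>
Price: 9.99<br>
Colour: red, blue
<!-- product end -->
</body></html>
HTML

print csv_line(block_fields($_)), "\n" for product_blocks($html);
```

For real data, a CSV module from CPAN is a safer choice than the hand-rolled csv_line above, which only handles quoting and embedded commas.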




    time was, I could move my arms like a bird and...
