coldfingertips has asked for the wisdom of the Perl Monks concerning the following question:
I am using WWW::Mechanize to scrape a site that has images with text next to them. I want to go through the page, pull out all the images, and put them in an array. I'd then use a similar regex to collect all the text and place it in a second array (the images and text have to stay in the order in which they are found).
I have a regex that strips all the useless junk out of the HTML file, keeping just the table that I'm looking for. I'm not sure how to loop over $page (the content dump) to pull out every unique instance of an image WITHOUT using the image function within this module. Using the image function would still leave me stranded when trying to get the text that goes with each image.
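One way to pull every image out of $page without the module's image function is a global match in list context. What follows is a minimal sketch, not the poster's code: the `$page` string here is a hypothetical stand-in for what `$mech->content` returns, and the pattern assumes every `src` value is quoted.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the page content (in the real script this
# would come from WWW::Mechanize, e.g. my $page = $mech->content;).
my $page = <<'HTML';
<tr><td><img src='http://images.tek-tips.com/items/image001.gif' alt='Image001'> Description of image here</td></tr>
<tr><td><img src='http://images.tek-tips.com/items/image002.gif' alt='Image002'> Description of image here</td></tr>
HTML

# In list context, m//g returns every captured group, in document order.
# Assumption: src values are wrapped in single or double quotes.
my @images = $page =~ /<img\s[^>]*?\bsrc\s*=\s*['"]([^'"]+)['"]/gi;

print "$_\n" for @images;
```

Because the matches come back in the order they appear in the page, `@images` keeps the same ordering as the source HTML.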
Below is a sample of what I am working with.
I want all images to be in @images and all text next to that image be in @text. There is definitely a way to go through this in one pass and collect both but would it be easier having two separate regexes to do this?
</a><br><br><table width="100%" cellpadding="2" cellspacing="0" border="0">
<tr><td align="left" valign="bottom"><img src='http://images.tek-tips.com/items/image001.gif' alt='Image001' width='40' height='40' border='0'> Description of image here</td><td align="right" valign="bottom"></td> </tr>
<tr><td align="left" valign="bottom"><img src='http://images.tek-tips.com/items/image002.gif' alt='Image002' width='40' height='40' border='0'> Description of image here</td><td align="right" valign="bottom"></td>
Regexes are not my strong point, and I appreciate any and all help getting the data extracted.
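The one-pass approach asked about above could look roughly like the sketch below. It rests on several assumptions that hold for the posted sample but may not hold for the whole site: the description follows its `<img>` tag directly, contains no markup of its own, and no attribute value contains a `>`. The `$page` string is copied from the sample rather than fetched, and `$src` and `$desc` are illustrative names.

```perl
use strict;
use warnings;

# Sample rows from the post; in the real script $page would already
# hold the table extracted from the WWW::Mechanize content dump.
my $page = <<'HTML';
<tr><td align="left" valign="bottom"><img src='http://images.tek-tips.com/items/image001.gif' alt='Image001' width='40' height='40' border='0'> Description of image here</td><td align="right" valign="bottom"></td> </tr>
<tr><td align="left" valign="bottom"><img src='http://images.tek-tips.com/items/image002.gif' alt='Image002' width='40' height='40' border='0'> Description of image here</td><td align="right" valign="bottom"></td>
HTML

my (@images, @text);

# Scalar-context m//g resumes after the previous match on each iteration,
# so one loop collects both pieces and keeps them paired by index.
while ($page =~ /<img\s[^>]*?\bsrc\s*=\s*['"]([^'"]+)['"][^>]*>\s*([^<]*)/gi) {
    my ($src, $desc) = ($1, $2);
    $desc =~ s/\s+\z//;      # trim trailing whitespace from the description
    push @images, $src;
    push @text,   $desc;
}

print "$images[$_] => $text[$_]\n" for 0 .. $#images;
```

Collecting both captures in the same loop guarantees that `$images[$i]` and `$text[$i]` belong together, which two independent regexes would not if one of them missed a row. For markup less regular than this sample, an HTML parser is generally more robust than a regex.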

Replies are listed 'Best First'.

Re: Pulling all instances of a regex out
by Roy Johnson (Monsignor) on Oct 04, 2005 at 19:02 UTC

Re: Pulling all instances of a regex out
by SamCG (Hermit) on Oct 04, 2005 at 19:00 UTC

Re: Pulling all instances of a regex out
by davido (Cardinal) on Oct 04, 2005 at 18:57 UTC

Re: Pulling all instances of a regex out
by GrandFather (Saint) on Oct 04, 2005 at 19:42 UTC

Re: Pulling all instances of a regex out
by wfsp (Abbot) on Oct 05, 2005 at 10:25 UTC