I am using WWW::Mechanize to scrape a site that has images with text next to them. I want to go through the page, pull out every image, and put them in an array. I'd then use a similar regex to slurp up all the text and place it in a second array (the images and text have to stay in the same order in which they are found).
I have a regex that strips all the useless junk out of the HTML file, keeping just the table I'm looking for. I'm not sure how to loop over $page (the content dump) to pull out every instance of an image WITHOUT using the images function within this module. Using that function would still leave me stranded when trying to get the text that goes with each image.
Below is a sample of what I am working with:

</a><br><br><table width="100%" cellpadding="2" cellspacing="0" border="0"><tr><td align="left" valign="bottom"><img src='http://images.tek-tips.com/items/image001.gif' alt='Image001' width='40' height='40' border='0'> Description of image here</td><td align="right" valign="bottom"></td> </tr>
<tr><td align="left" valign="bottom"><img src='http://images.tek-tips.com/items/image002.gif' alt='Image002' width='40' height='40' border='0'> Description of image here</td><td align="right" valign="bottom"></td>

I want all images to be in @images and all text next to that image to be in @text. There is definitely a way to go through this in one pass and collect both, but would it be easier to have two separate regexes to do this?
Regexes are not my strong point, and I appreciate any and all help getting the data extracted.
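For reference, here is a rough sketch of what the one-pass version might look like. It assumes $page already holds the trimmed-down table shown in the sample (the heredoc below is only a stand-in for it), that every src attribute is quoted, and that the description runs from the end of the img tag to the next '<'. It is an illustration under those assumptions, not a general HTML parser, and markup that differs from the sample would need a different pattern.

```perl
use strict;
use warnings;

# Stand-in for the cleaned-up content dump (hypothetical; copied from the sample).
my $page = <<'HTML';
<table width="100%" cellpadding="2" cellspacing="0" border="0"><tr><td align="left" valign="bottom"><img src='http://images.tek-tips.com/items/image001.gif' alt='Image001' width='40' height='40' border='0'> Description of image here</td><td align="right" valign="bottom"></td> </tr><tr><td align="left" valign="bottom"><img src='http://images.tek-tips.com/items/image002.gif' alt='Image002' width='40' height='40' border='0'> Description of image here</td><td align="right" valign="bottom"></td>
HTML

my (@images, @text);

# One pass with /g: each iteration picks up the next <img> tag.
#   $1 = the quote character around the src value
#   $2 = the src value itself
#   $3 = the text after the tag, up to the next '<'
while ($page =~ m{<img\s[^>]*?\bsrc=(['"])(.*?)\1[^>]*>\s*([^<]*)}gi) {
    my ($src, $desc) = ($2, $3);
    $desc =~ s/\s+\z//;      # trim trailing whitespace from the description
    push @images, $src;      # same index in both arrays keeps the pairs aligned
    push @text,   $desc;
}

print "$images[$_] => $text[$_]\n" for 0 .. $#images;
```

Because both pushes happen inside the same loop iteration, $images[$n] and $text[$n] always refer to the same table cell, which is harder to guarantee with two separate regexes run one after the other.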
In reply to Pulling all instances of a regex out by coldfingertips