Hello,
I have spent several days perusing all of the good ideas on this site about how to parse and manipulate tables, but alas, no one has specifically asked about this particular situation, which I will describe.
Currently, I am using (with permission) vendor web sites that present product data in tabular form. My goal is to integrate their pages into our e-commerce system. I fetch the pages with HTTP::Request, modify them, and then re-serve them as if they were our own. (This is a simplification: some pages are static, so we fetch those with wget from a cron job and store them locally to be polite.) The tables are consistent. I need to extract the part number, description and application from each row, and then insert a form into each row containing a button that adds the item to the shopping cart.
Thus far, my approach has been to parse the pages using regular expressions, split and join. It works nicely (today), but from what I have read it is not the proper approach. My ultimate goal is to use one of the modules to accomplish this in a cleaner, more robust fashion. It looks like HTML::ElementTable is the way to go, but most examples I have seen build the table from scratch. The CPAN docs show that this module operates on HTML::Element objects, but the only way I know of to build those from an HTML string is with HTML::TreeBuilder, which appears to be very CPU-hungry.
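To make the regexp/split/join style concrete, here is a minimal sketch of that kind of processing. It is not the production code: the sample rows, the three-column layout, the `/cart/add` action, the `part` field name and the `cart_form` helper are all assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample page: part number, description, application.
my $html = <<'HTML';
<table>
<tr><td>AB-100</td><td>Oil filter</td><td>Small engines</td></tr>
<tr><td>AB-200</td><td>Air filter</td><td>Lawn tractors</td></tr>
</table>
HTML

# Hypothetical cart endpoint and field name (assumptions, not a real API).
sub cart_form {
    my ($part) = @_;
    return qq{<form method="post" action="/cart/add">}
         . qq{<input type="hidden" name="part" value="$part">}
         . qq{<input type="submit" value="Add to cart"></form>};
}

# Split the page on row ends, patch each row, then join it back together.
my @rows = split m{</tr>}i, $html;
for my $row (@rows) {
    # Capture the first three cells; chunks that do not match are left alone.
    if ( $row =~ m{<tr[^>]*>\s*<td[^>]*>(.*?)</td>\s*<td[^>]*>(.*?)</td>\s*<td[^>]*>(.*?)</td>}is ) {
        my ( $part, $description, $application ) = ( $1, $2, $3 );
        # Append one extra cell holding the form for this part number.
        $row .= '<td>' . cart_form($part) . '</td>';
    }
}
my $patched = join '</tr>', @rows;
print $patched;
```

This works only as long as the vendor markup keeps exactly this shape. Nested tables, attributes containing `>`, or a part number that needs HTML escaping would all break it, which is the fragility I would like to get away from.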
Is there a better way to create the HTML::Element objects from an HTML string? Also, once I do the necessary manipulation, will the as_HTML method recreate the original document satisfactorily? Is this even the direction I want to go with this?
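For comparison, this is a minimal sketch of the tree-based direction using HTML::TreeBuilder and HTML::Element directly. The sample rows, the `/cart/add` action and the `part` field name are again assumptions for illustration only, and the sketch assumes every data row has at least three plain cells and no nested tables.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Element;

# Hypothetical sample page: part number, description, application.
my $html = <<'HTML';
<table>
<tr><td>AB-100</td><td>Oil filter</td><td>Small engines</td></tr>
<tr><td>AB-200</td><td>Air filter</td><td>Lawn tractors</td></tr>
</table>
HTML

# Build the HTML::Element tree from the string.
my $tree = HTML::TreeBuilder->new_from_content($html);

for my $row ( $tree->look_down( _tag => 'tr' ) ) {
    my @cells = $row->look_down( _tag => 'td' );
    next unless @cells >= 3;    # skip header or spacer rows

    my ( $part, $description, $application ) =
        map { $_->as_trimmed_text } @cells[ 0 .. 2 ];

    # Hypothetical cart form; push_content turns the array refs into elements.
    my $form = HTML::Element->new( 'form', method => 'post', action => '/cart/add' );
    $form->push_content(
        [ 'input', { type => 'hidden', name => 'part', value => $part } ],
        [ 'input', { type => 'submit', value => 'Add to cart' } ],
    );

    # Append a new cell holding the form to the end of the row.
    my $cell = HTML::Element->new('td');
    $cell->push_content($form);
    $row->push_content($cell);
}

# Escape only <, > and &; indent with two spaces; emit every closing tag.
my $out = $tree->as_HTML( '<>&', '  ', {} );
$tree->delete;
print $out;
```

Note that the output is serialized from the tree rather than copied from the input, so whitespace and attribute order can change, and TreeBuilder by default adds any missing html, head and body elements. Whether that counts as "satisfactory" is part of what I am asking.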
Many thanks in advance for any wisdom that may be shared.
With kind regards,
Mark