Hello,

I have spent several days perusing all of the good ideas on this site about how to parse and manipulate tables, but alas, no one has specifically asked about this particular situation, which I will describe.

Currently, I am using (with permission) vendor web sites which have product data in tabular form. My goal is to integrate their pages into our e-commerce system. I grab the pages with HTTP::Request, modify them, and then re-serve them as if they were our own. (This is a simplification: some pages are static, so we fetch them with wget from a cron job and store them locally to be polite.) The tables are consistent, and I need to extract the part number, description and application from each row, and then insert a form into each row containing a button that adds the item to the shopping cart.

Thus far, my approach has been to parse the pages with regular expressions, split and join. It works nicely (today), but from what I have read it is not the proper approach. My ultimate goal is to use one of the modules to accomplish this in a cleaner, more robust fashion. It looks like HTML::ElementTable is the way to go, but most examples I have seen build the table from scratch. The CPAN docs say that this module operates on HTML::Element objects, but the only way I know of to build those from an HTML string is with HTML::TreeBuilder, which appears to be very CPU-hungry.

Is there a better way to create the HTML::Element objects from an HTML string? Also, once I do the necessary manipulation, will the as_HTML method recreate the original document satisfactorily? Is this even the direction I want to go with this?
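For concreteness, here is a minimal sketch of the HTML::TreeBuilder route being asked about. It is only an illustration under stated assumptions: the sample markup, the three-column layout, the /cart/add URL and the "part" field name are all made up, and the real vendor pages will differ.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;    # CPAN module (HTML-Tree distribution), not core
use HTML::Element;

# Hypothetical vendor markup: part number, description, application.
my $html = <<'HTML';
<html><body><table>
<tr><th>Part</th><th>Description</th><th>Application</th></tr>
<tr><td>AB-100</td><td>Oil filter</td><td>Pickup</td></tr>
<tr><td>AB-200</td><td>Air filter</td><td>Sedan</td></tr>
</table></body></html>
HTML

# Parse the string into a tree of HTML::Element objects.
my $tree = HTML::TreeBuilder->new_from_content($html);

for my $row ( $tree->look_down( _tag => 'tr' ) ) {
    my @cells = $row->look_down( _tag => 'td' );
    next unless @cells == 3;    # skips the header row, which has only <th> cells

    my ( $part, $desc, $app ) = map { $_->as_text } @cells;

    # Build a new cell holding the add-to-cart form.
    # The action URL and the field name are placeholders.
    my $form_cell = HTML::Element->new_from_lol(
        [ 'td',
            [ 'form', { action => '/cart/add', method => 'post' },
                [ 'input', { type => 'hidden', name => 'part', value => $part } ],
                [ 'input', { type => 'submit', value => 'Add to cart' } ],
            ],
        ]
    );
    $row->push_content($form_cell);
}

# Serialize the modified tree; the empty hashref asks for all end tags.
print $tree->as_HTML( undef, '  ', {} );

# Free the tree explicitly, as the HTML::Tree docs recommend.
$tree->delete;
```

Note that as_HTML serializes the parsed tree rather than replaying the original bytes, so whitespace, attribute quoting and any markup the parser repaired may come out differently from the source page.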

Many thanks in advance for any wisdom that may be shared.

With kind regards,

Mark

In reply to Table Manipulation by PerlPilgrim
