NERDVANA has a good point in Re: perl Mojo DOM CSS syntax issues, saying that:
In particular, <div class="JMWMJ"><div class="toI8Rb OSrXXb usbThf"> looks like it was auto-generated by some tool on the remote end, so you can expect those classes to change to new random strings any time the remote side gets recompiled.
In the long term this is going to be a problem, but one whose solution can be fully automated. That is unlike the problem of the website changing its structure, for example by adding or removing divs.
The solution to div classes/ids being renamed is to keep some HTML documents from the website, saved at a time when your program worked, and diff their attributes against the current HTML. The diff tells you how the div classes/ids changed, and you can pass that information to your script so it can revise its anchors.
Here is my 3AM-whipped-up code, which uses XML::Diff -- which, despite its name, works on HTML too, at least when the markup is well-formed enough to parse as XML:
use strict;
use warnings;

use XML::Diff;

my $html1 =<<EOH;
<html>
<body>
<div id="1">
  <div id="2"></div>
</div>
</body>
</html>
EOH

my $html2 =<<EOH;
<html>
<body>
<div id="4">
  <div id="5"></div>
</div>
</body>
</html>
EOH

my $diff = XML::Diff->new();
my $diffgram = $diff->compare(
    -old => $html1,
    -new => $html2,
);
print $diffgram;
and the result reveals the changed div ids:
<?xml version="1.0"?>
<xvcs:diffgram xmlns:xvcs="http://www.xvcs.org/">
  <xvcs:update id="2" first-child-of="/html/body">
    <xvcs:attr-update name="id" old-value="1" new-value="4"/>
  </xvcs:update>
  <xvcs:update id="1" first-child-of="/html/body/div">
    <xvcs:attr-update name="id" old-value="2" new-value="5"/>
  </xvcs:update>
</xvcs:diffgram>
The so-called diffgram can tell your program its new anchors. With the new anchors automatically fixed, all you have to do is deal with structural changes in the website. Which is a Sisyphean task with a Herculean twist but, hey, having no standards and no APIs, or obfuscating important information inside unstructured HTML, is how Capitalism creates jobs for the plebes and profits for the bosses. What the legend did not tell us is that every time Sisyphus' rock rolls back down the hill some fatcat makes a few drachmas.
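As a rough sketch of how a script might read the new anchors out of such a diffgram: the code below parses the XML printed above with XML::LibXML and collects each attr-update into an old-to-new lookup. The helper name renamed_attributes and the shape of the returned hash are my own assumptions for illustration, not part of XML::Diff, and the diffgram is hard-coded so the snippet runs on its own.

```perl
use strict;
use warnings;

use XML::LibXML;
use XML::LibXML::XPathContext;

# The diffgram from above, hard-coded so this sketch is self-contained.
# In a real script it would be the return value of XML::Diff's compare().
my $diffgram = <<'EOD';
<?xml version="1.0"?>
<xvcs:diffgram xmlns:xvcs="http://www.xvcs.org/">
  <xvcs:update id="2" first-child-of="/html/body">
    <xvcs:attr-update name="id" old-value="1" new-value="4"/>
  </xvcs:update>
  <xvcs:update id="1" first-child-of="/html/body/div">
    <xvcs:attr-update name="id" old-value="2" new-value="5"/>
  </xvcs:update>
</xvcs:diffgram>
EOD

# Hypothetical helper (not part of XML::Diff): returns a hashref of the
# form { attribute_name => { old_value => new_value } }.
sub renamed_attributes {
    my ($xml) = @_;

    my $doc = XML::LibXML->load_xml(string => $xml);
    my $xpc = XML::LibXML::XPathContext->new($doc);
    # Namespace URI taken from the diffgram's root element.
    $xpc->registerNs(xvcs => 'http://www.xvcs.org/');

    my %map;
    for my $node ($xpc->findnodes('//xvcs:attr-update')) {
        my $name = $node->getAttribute('name');
        my $old  = $node->getAttribute('old-value');
        my $new  = $node->getAttribute('new-value');
        # Skip attributes that were added or removed rather than changed.
        next unless defined $name && defined $old && defined $new;
        $map{$name}{$old} = $new;
    }
    return \%map;
}

my $renamed = renamed_attributes($diffgram);
for my $attr (sort keys %$renamed) {
    for my $old (sort keys %{ $renamed->{$attr} }) {
        printf "%s: %s -> %s\n", $attr, $old, $renamed->{$attr}{$old};
    }
}
```

A scraper could then consult this lookup before building its selectors, e.g. replacing a stored id of 1 with $renamed->{id}{1}. Note that this only covers attribute values that changed in place; it does nothing for structural changes.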
bw, bliako