Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Regular Expression:

I need a regex to fetch data from an HTML file.

The contents are

<td id="Au"><%=openInvoice[0]%></td> <td id="Kun"><%=openInvoice[1]%></td> <td id="Re"><%=openInvoice[2]%></td> <td id="Rec"><%=openInvoice[3]%></td> <td id="Zah"><%=openInvoice[4]%></td> <td id="Off"><%=openInvoice[5]%></td>

The regex which I have with me is

<td id='Au'>(\D\d*\d)

but this gives data for only one column, and if I add similar regexes for the other columns, none of them work.

Please help

Replies are listed 'Best First'.
Re: Regular Expression: I need a regex to fetch data from an html file
by CountZero (Bishop) on Feb 27, 2012 at 09:57 UTC
    Actually, I think it would be handy if you tell us what data you wish to extract. Is it only the numeric parameter of openInvoice[0]?

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
      <tr><td id='Auf'>50956866</td> <td id='Ku'>D510848</td> <td id='Rec'>18.10.2011</td> <td id='Re'>EUR 118,95</td> <td id='Za'>EUR 0,00</td> <td id='Off'>EUR 118,95</td>
      This was the HTML file I wanted to extract the data from. I finally have a solution and just wanted to share it with you monks. The regex:
      <td id='AuftragsId'>(.*)</td>\s*<td id='KundenNr'>(.*)</td>\s*<td id='RechnungsDatum'>(.*)</td>\s*<td id='RechnungsBetragAktuell'>(.*)</td>\s*<td id='ZahlungsBetragAktuell'>(.*)</td>\s*<td id='OffenePosten'>(.*)</td>
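
      A greedy (.*) can swallow more than one cell when several rows sit on the same line, so a variant that captures each cell separately may hold up better. The following is only a sketch run against the sample row posted above; the %cell hash and the generic id capture are assumptions for illustration, not part of the original solution.

```perl
use strict;
use warnings;

# Sample row copied from the post above.
my $html = q{<tr><td id='Auf'>50956866</td> <td id='Ku'>D510848</td> <td id='Rec'>18.10.2011</td> <td id='Re'>EUR 118,95</td> <td id='Za'>EUR 0,00</td> <td id='Off'>EUR 118,95</td>};

# Capture each cell's id and its text. [^']+ and [^<]* cannot run past
# the end of the attribute or the cell, unlike a greedy (.*).
# %cell is a hypothetical name used only in this sketch.
my %cell;
while ( $html =~ m{<td\s+id='([^']+)'>([^<]*)</td>}g ) {
    $cell{$1} = $2;
}

# Print the cells sorted by id.
print "$_ => $cell{$_}\n" for sort keys %cell;
```

      This still assumes single-quoted id attributes and cells without nested tags; anything less regular is a job for a real HTML parser.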

        There are a lot of nice modules on CPAN that will do your extraction in a more robust way, i.e. they won't break if the maker of the table makes small changes in the text.

        Some places to start:
        HTML::TableExtract
        HTML::TreeBuilder
        HTML::TokeParser

        Unless you're trying to do something really out there (and maybe even then), someone has probably already solved more than half of your problem and posted a module that does it reliably.

Re: Regular Expression: I need a regex to fetch data from an html file
by Anonymous Monk on Feb 27, 2012 at 09:27 UTC

    but this gives data only for one column

    No, it doesn't match the data you posted at all.

    and if I attach similar sort of regex for another columns none of them work

    Can you show that code?
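
    The mismatch is easy to check. Below is a small sketch that runs the regex from the question against the first cell as posted; the variable names are made up for the example. It suggests two separate reasons for the failure: the posted data uses double quotes around the id, and the cell content starts with <%= rather than a non-digit followed by digits.

```perl
use strict;
use warnings;

# First cell of the template exactly as posted in the question.
my $posted = q{<td id="Au"><%=openInvoice[0]%></td>};

# The regex from the question, unchanged.
my $re = qr{<td id='Au'>(\D\d*\d)};
my $matched = $posted =~ $re ? 1 : 0;

# Hypothetical variant that accepts either quote style around the id.
# It still fails, because \D consumes '<' and the next character is
# '%', which is not a digit.
my $re_quotes = qr{<td id=["']Au["']>(\D\d*\d)};
my $quotes_fixed = $posted =~ $re_quotes ? 1 : 0;

print "original regex matches posted data: ", ( $matched ? "yes" : "no" ), "\n";
print "after allowing double quotes: ", ( $quotes_fixed ? "yes" : "no" ), "\n";
```

    Both lines print "no" for the data as posted, which is why seeing the real input and the real code matters here.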