in reply to Re: Extracting variable-length strings between delimiters
in thread Extracting variable-length strings between delimiters

While I'm looking at the responses, here's the requested data sample (read in from a file with a consistent name):

...<tr><td colspan="3">
<span><strong>SomeNameHere</strong></span> <span class="small">(Transaction ID #5HN04039SW052A35R)</span><br><br>
</td></tr>
<tr><td colspan="3"><hr class="dotted"></td></tr>
<tr><td colspan="3"><br class="h10"></td></tr>
...<td class="item-title" width="40%">Name of the first item</td>
<td align="center" class="qty" width="9%">14</td>...
<tr><td colspan="3"><hr class="dotted"></td></tr>
<tr><td colspan="3"><br class="h10"></td></tr>
...<td class="item-title" width="40%">Name of the second item</td>
<td align="center" class="qty" width="9%">12</td>...

From which I want to extract:

5HN04039SW052A35R

Name of the first item 14

Name of the second item 12

...

As you can see, varying lengths of unpredictable, non-unique characters, quotation marks, and angle brackets sit between the start string and the desired text. What is known is that the wanted values are always preceded somewhere by "Transaction ID", "item-title", and "qty", and that they are terminated by a closing parenthesis for the transaction ID and by a "</td>" tag for the item name and quantity.

Re^3: Extracting variable-length strings between delimiters
by ahmad (Hermit) on Feb 12, 2010 at 08:39 UTC

    Since what you have is HTML, I think it's better to use an HTML parser module for this job:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::TokeParser;

    my $Text;
    {
        local $/;
        $Text = <DATA>;
    }

    my $p = HTML::TokeParser->new( \$Text );

    while ( my $token = $p->get_tag('td') ) {
        my $txt = $p->get_trimmed_text("/td");
        print $txt,"\n";
    }

    __DATA__
    ...<tr><td colspan="3">
    <span><strong>SomeNameHere</strong></span> <span class="small">(Transaction ID #5HN04039SW052A35R)</span><br><br>
    </td></tr>
    <tr><td colspan="3"><hr class="dotted"></td></tr>
    <tr><td colspan="3"><br class="h10"></td></tr>
    ...<td class="item-title" width="40%">Name of the first item</td>
    <td align="center" class="qty" width="9%">14</td>...
    <tr><td colspan="3"><hr class="dotted"></td></tr>
    <tr><td colspan="3"><br class="h10"></td></tr>
    ...<td class="item-title" width="40%">Name of the second item</td>
    <td align="center" class="qty" width="9%">12</td>...

    Untested
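
    The loop above prints the text of every td cell. If only the three wanted fields are needed, one possible refinement is to look at the class attribute of each tag. The following is a rough sketch, not a definitive solution: the helper name extract_fields and the $sample variable are made up for illustration, and it assumes the markup always looks like the posted sample (transaction ID inside a span with class "small", each "item-title" cell followed by its "qty" cell). It needs Perl 5.10 or later for the // operator.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (not from the thread): returns the transaction ID
# followed by one "title qty" string per item, in document order.
sub extract_fields {
    my ($html) = @_;
    my $p = HTML::TokeParser->new( \$html )
        or die "Cannot create parser";
    my ( @out, $title );

    # Ask for span and td start tags together so order is preserved.
    while ( my $token = $p->get_tag( 'span', 'td' ) ) {
        my ( $tag, $attr ) = @$token;
        my $class = $attr->{class} // '';

        if ( $tag eq 'span' && $class eq 'small' ) {
            # Assumption: the ID is the word characters after "Transaction ID #".
            my $txt = $p->get_trimmed_text('/span');
            push @out, $1 if $txt =~ /Transaction ID #(\w+)/;
        }
        elsif ( $tag eq 'td' && $class eq 'item-title' ) {
            # Remember the title until its quantity cell shows up.
            $title = $p->get_trimmed_text('/td');
        }
        elsif ( $tag eq 'td' && $class eq 'qty' && defined $title ) {
            push @out, "$title " . $p->get_trimmed_text('/td');
            undef $title;
        }
    }
    return @out;
}

# A shortened copy of the posted sample.
my $sample = <<'HTML';
...<tr><td colspan="3">
<span><strong>SomeNameHere</strong></span> <span class="small">(Transaction ID #5HN04039SW052A35R)</span><br><br>
</td></tr>
<tr><td colspan="3"><hr class="dotted"></td></tr>
...<td class="item-title" width="40%">Name of the first item</td>
<td align="center" class="qty" width="9%">14</td>...
...<td class="item-title" width="40%">Name of the second item</td>
<td align="center" class="qty" width="9%">12</td>...
HTML

print "$_\n" for extract_fields($sample);
```

    Matching on the class attribute instead of on raw text keeps the unpredictable characters between the delimiters out of the picture; only the transaction ID still needs a small regex, because it is embedded in the span's text.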