in reply to Extracting variable-length strings between delimiters

Where's your data sample?


Re^2: Extracting variable-length strings between delimiters
by PMReader (Initiate) on Feb 12, 2010 at 06:31 UTC

    While I'm looking at the responses, here's the requested data sample (read in from a file with a consistent name):

    ...<tr><td colspan="3">
    <span><strong>SomeNameHere</strong></span>
    <span class="small">(Transaction ID #5HN04039SW052A35R)</span><br><br>
    </td></tr>
    <tr><td colspan="3"><hr class="dotted"></td></tr>
    <tr><td colspan="3"><br class="h10"></td></tr>
    ...<td class="item-title" width="40%">Name of the first item</td>
    <td align="center" class="qty" width="9%">14</td>...
    <tr><td colspan="3"><hr class="dotted"></td></tr>
    <tr><td colspan="3"><br class="h10"></td></tr>
    ...<td class="item-title" width="40%">Name of the second item</td>
    <td align="center" class="qty" width="9%">12</td>...

    From which I want to extract:

    5HN04039SW052A35R

    Name of the first item 14

    Name of the second item 12

    ...

    As you can see, there are varying lengths of unpredictable, non-unique characters, quotation marks and angle brackets between the start string and the desired text. What is known is that the values are always preceded somewhere by "Transaction ID", "item-title" and "qty", and are terminated by a closing parenthesis for the transaction ID and a "</td>" tag for the item and quantity.

      Since what you have is HTML, I think it's better to use an HTML parser module for this job:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use HTML::TokeParser;

      my $Text;
      {
          local $/;
          $Text = <DATA>;
      }

      my $p = HTML::TokeParser->new( \$Text );
      while ( my $token = $p->get_tag('td') ) {
          my $txt = $p->get_trimmed_text("/td");
          print $txt,"\n";
      }

      __DATA__
      ...<tr><td colspan="3">
      <span><strong>SomeNameHere</strong></span>
      <span class="small">(Transaction ID #5HN04039SW052A35R)</span><br><br>
      </td></tr>
      <tr><td colspan="3"><hr class="dotted"></td></tr>
      <tr><td colspan="3"><br class="h10"></td></tr>
      ...<td class="item-title" width="40%">Name of the first item</td>
      <td align="center" class="qty" width="9%">14</td>...
      <tr><td colspan="3"><hr class="dotted"></td></tr>
      <tr><td colspan="3"><br class="h10"></td></tr>
      ...<td class="item-title" width="40%">Name of the second item</td>
      <td align="center" class="qty" width="9%">12</td>...

      Untested
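
      The loop above prints the text of every <td>, including the empty spacer cells. To keep only the three wanted fields, one option is to look at the class attribute of each tag. The following is a rough sketch, not a definitive solution: the helper name extract_order is made up for illustration, and it assumes that the transaction ID always sits in a <span class="small">, that the real file contains the unbroken text "Transaction ID #", and that every "qty" cell follows its "item-title" cell. The sample is a trimmed copy of the data posted in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (not from the thread): returns the transaction ID
# (undef if none is found) followed by [title, quantity] pairs.
sub extract_order {
    my ($html) = @_;
    my $p = HTML::TokeParser->new( \$html )
        or die "Cannot create parser\n";
    my ( $id, @items );

    # Stop at every <span> and <td> start tag; other tags are skipped.
    while ( my $tag = $p->get_tag( 'span', 'td' ) ) {
        my ( $name, $attr ) = @$tag;
        my $class = defined $attr->{class} ? $attr->{class} : '';

        if ( $name eq 'span' && $class eq 'small' ) {
            # Assumption: the ID is the run of word characters
            # right after "Transaction ID #".
            my $txt = $p->get_trimmed_text('/span');
            $id = $1 if $txt =~ /Transaction ID #(\w+)/;
        }
        elsif ( $name eq 'td' && $class eq 'item-title' ) {
            push @items, [ $p->get_trimmed_text('/td') ];
        }
        elsif ( $name eq 'td' && $class eq 'qty' && @items ) {
            # Assumption: each quantity cell follows its item-title cell.
            $items[-1][1] = $p->get_trimmed_text('/td');
        }
    }
    return ( $id, @items );
}

# Trimmed copy of the sample data from this thread.
my $sample = <<'END_HTML';
<tr><td colspan="3"> <span><strong>SomeNameHere</strong></span>
<span class="small">(Transaction ID #5HN04039SW052A35R)</span><br><br> </td></tr>
<tr><td colspan="3"><hr class="dotted"></td></tr>
<td class="item-title" width="40%">Name of the first item</td>
<td align="center" class="qty" width="9%">14</td>
<td class="item-title" width="40%">Name of the second item</td>
<td align="center" class="qty" width="9%">12</td>
END_HTML

my ( $id, @items ) = extract_order($sample);
print "$id\n" if defined $id;
print "$_->[0] $_->[1]\n" for @items;
```

      Matching on the class attribute rather than on position means the spacer rows with <hr> and <br> can be ignored without special handling. If the real pages use other class names, the three comparisons are the only places that should need changing.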