Re: Extracting variable-length strings between delimiters


Re^2: Extracting variable-length strings between delimiters by PMReader (Initiate) on Feb 12, 2010 at 06:31 UTC
While I'm looking at the responses, here's the requested data sample (read in from a file with a consistent name):

    ...<tr><td colspan="3">
    <span><strong>SomeNameHere</strong></span>
    <span class="small">(Transaction ID #5HN04039SW052A35R)</span><br><br>
    </td></tr>
    <tr><td colspan="3"><hr class="dotted"></td></tr>
    <tr><td colspan="3"><br class="h10"></td></tr>
    ...<td class="item-title" width="40%">Name of the first item</td>
    <td align="center" class="qty" width="9%">14</td>...
    <tr><td colspan="3"><hr class="dotted"></td></tr>
    <tr><td colspan="3"><br class="h10"></td></tr>
    ...<td class="item-title" width="40%">Name of the second item</td>
    <td align="center" class="qty" width="9%">12</td>...

From which I want to extract:

    5HN04039SW052A35R
    Name of the first item
    14
    Name of the second item
    12
    ...

As you can see, there are varying lengths of unpredictable, non-unique characters, quotation marks and angle brackets between the start string and the desired text. What is known is that the desired strings are always preceded somewhere by "Transaction ID", "item-title" and "qty", and are terminated by a closing parenthesis for the transaction ID and by a "</td>" tag for the item and quantity.
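The delimiter rule described above can be sketched with two regular expressions in core Perl. This is only an illustration, assuming the markup is as regular as the posted sample (double-quoted attributes, no nested tags inside the wanted cells); `extract_with_regex` is a hypothetical helper name, not something from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the transaction ID followed by the item
# names and quantities, in the order they appear in the markup.
sub extract_with_regex {
    my ($html) = @_;
    my @fields;

    # Transaction ID: everything after "Transaction ID #" up to the ")".
    push @fields, $1 if $html =~ /Transaction ID #([^)]+)\)/;

    # Item title or quantity: a <td> whose class is "item-title" or "qty".
    # [^>]* skips the other attributes; ([^<]*) captures up to the next tag.
    while ( $html =~ /<td\b[^>]*\bclass="(?:item-title|qty)"[^>]*>([^<]*)</g ) {
        push @fields, $1;
    }
    return @fields;
}

# Sample taken from the post above.
my $sample = <<'HTML';
...<tr><td colspan="3">
<span><strong>SomeNameHere</strong></span>
<span class="small">(Transaction ID #5HN04039SW052A35R)</span><br><br>
</td></tr>
<tr><td colspan="3"><hr class="dotted"></td></tr>
...<td class="item-title" width="40%">Name of the first item</td>
<td align="center" class="qty" width="9%">14</td>...
...<td class="item-title" width="40%">Name of the second item</td>
<td align="center" class="qty" width="9%">12</td>...
HTML

print "$_\n" for extract_with_regex($sample);
```

On the sample this prints the transaction ID, then each item name followed by its quantity, one per line. It will break as soon as the real pages quote attributes differently or nest markup inside a cell.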
Re^3: Extracting variable-length strings between delimiters by ahmad (Hermit) on Feb 12, 2010 at 08:39 UTC
Since what you have is HTML, I think it's better to use an HTML parser module for this job:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::TokeParser;

    # Slurp the whole sample from the DATA section into one string.
    my $Text;
    {
        local $/;
        $Text = <DATA>;
    }

    my $p = HTML::TokeParser->new( \$Text );

    # Walk every <td> start tag and print the text up to its closing </td>.
    while ( my $token = $p->get_tag('td') ) {
        my $txt = $p->get_trimmed_text("/td");
        print $txt, "\n";
    }

    __DATA__
    ...<tr><td colspan="3">
    <span><strong>SomeNameHere</strong></span>
    <span class="small">(Transaction ID #5HN04039SW052A35R)</span><br><br>
    </td></tr>
    <tr><td colspan="3"><hr class="dotted"></td></tr>
    <tr><td colspan="3"><br class="h10"></td></tr>
    ...<td class="item-title" width="40%">Name of the first item</td>
    <td align="center" class="qty" width="9%">14</td>...
    <tr><td colspan="3"><hr class="dotted"></td></tr>
    <tr><td colspan="3"><br class="h10"></td></tr>
    ...<td class="item-title" width="40%">Name of the second item</td>
    <td align="center" class="qty" width="9%">12</td>...

Untested
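The loop above prints the text of every `<td>`, including the empty spacer cells and the name in front of the transaction ID. A possible refinement, sketched here under the assumption that the `class` values in the sample ("small", "item-title", "qty") are stable across pages, filters on those classes. `extract_with_parser` is a hypothetical helper name, and the `//` operator needs Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: returns the transaction ID followed by the item
# names and quantities, in document order.
sub extract_with_parser {
    my ($html) = @_;
    my $p = HTML::TokeParser->new( \$html );
    my @fields;

    # Stop only at <span> and <td> start tags. For a start tag the token is
    # [ $tag, \%attr, ... ], so the second element holds the attributes.
    while ( my $token = $p->get_tag( 'span', 'td' ) ) {
        my ( $tag, $attr ) = @$token;
        my $class = $attr->{class} // '';

        if ( $tag eq 'span' && $class eq 'small' ) {
            # Text looks like "(Transaction ID #5HN04039SW052A35R)".
            my $text = $p->get_trimmed_text('/span');
            push @fields, $1 if $text =~ /Transaction ID #(\w+)/;
        }
        elsif ( $tag eq 'td' && ( $class eq 'item-title' || $class eq 'qty' ) ) {
            push @fields, $p->get_trimmed_text('/td');
        }
    }
    return @fields;
}

# Shortened version of the sample from the thread.
my $sample = <<'HTML';
<tr><td colspan="3">
<span><strong>SomeNameHere</strong></span>
<span class="small">(Transaction ID #5HN04039SW052A35R)</span><br><br>
</td></tr>
<tr><td colspan="3"><hr class="dotted"></td></tr>
<td class="item-title" width="40%">Name of the first item</td>
<td align="center" class="qty" width="9%">14</td>
<td class="item-title" width="40%">Name of the second item</td>
<td align="center" class="qty" width="9%">12</td>
HTML

print "$_\n" for extract_with_parser($sample);
```

Because the parser handles attribute order and quoting itself, this version depends only on the class names, not on the exact characters between the delimiters.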