Before I get into anything else, have you considered that your search string uses double quotes and your '/' may need to be escaped?

If the table method is the only one available to you then I'll outline a possible approach below, but first may I suggest that you dynamically generate the table used on the page (assuming that this is possible).

Consider a file that looked like this:

FOO\twww.yahoo.com/yahoo\tsomething else\tand another
BAR\twww.altavista.com\tsomething else\

Obviously the tab character is a physical tab and not the literal '\t'.

Then your code can be something like:

while (<FILE>) {
    chomp;                    # drop the newline so it doesn't stick to the last field
    my @line = split /\t/;

    # do_something...
}

This would allow you to format the pages as you like (and pretty damned easily) and would minimize the opportunity for inconsistent coding...
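To make that concrete, here is a minimal sketch of generating the table rows from the tab-delimited data. The helper name row_to_html and the in-memory sample data are my own assumptions for illustration; a real script would open your actual data file and use whatever markup your page needs.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one tab-delimited line into an HTML table row.
sub row_to_html {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\t/, $line;

    # Escape the characters that are special in HTML.
    for (@fields) {
        s/&/&amp;/g;
        s/</&lt;/g;
        s/>/&gt;/g;
    }
    return '<tr>' . join('', map { "<td>$_</td>" } @fields) . '</tr>';
}

# Sample data held in a string so the sketch runs on its own;
# a real script would open the data file instead.
my $data = "FOO\twww.yahoo.com/yahoo\tsomething else\tand another\n"
         . "BAR\twww.altavista.com\tsomething else\n";

open(my $fh, '<', \$data) or die "Couldn't open in-memory data: $!";

print "<table>\n";
while (my $line = <$fh>) {
    print row_to_html($line), "\n";
}
print "</table>\n";

close $fh;
```

Since every row goes through the same sub, the markup stays consistent, and searching the plain data file becomes a simple split rather than an HTML-parsing problem.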

And now on to the approach...

According to what you've said, and factoring in what others have pointed out by way of problems, the only thing we can 'reliably' count on is a /<tr/i starting our table row and, thus, a new search term. I'd suggest avoiding /<tr>/ as it's entirely possible that someone will add bgcolor or valign or some other piece of amusing code...
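A quick sketch of the difference between the two patterns, using sample lines I made up for illustration (they are not from your page):

```perl
use strict;
use warnings;

# Hypothetical sample lines, not taken from the original page.
my @lines = (
    '<tr>',
    '<TR bgcolor="#ffffff" valign="top">',
    '<td>not a row start</td>',
);

# grep in scalar context returns the number of matching lines.
my $strict = grep { /<tr>/ } @lines;
my $loose  = grep { /<tr/i } @lines;

print "strict /<tr>/ matched $strict line(s)\n";
print "loose /<tr/i matched $loose line(s)\n";
```

The strict pattern misses the row that carries attributes (and the uppercase tag), while the loose one catches both. If you want to be a little tighter, /<tr\b/i adds a word boundary so that a tag which merely starts with "tr" is not mistaken for a row.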

So here's a shot based on what you already have. It does have the side effect of stripping out the <tr>/</tr> tags, and it assumes that these are on separate lines and that neither of the following occurs:

1. <tr><p>Some content</p> # will screw up because it skips to the next line
2. <p>Some content</p></tr> # will also screw up because it doesn't take the last line

Anyway...


open (FILE, "<$read") or die ("Couldn't open file to read: $!");

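while (my $line = <FILE>) {

   next unless ($line =~ /<tr/i);

   my $match = '';

   # Gather lines until the next <tr or </tr; the defined() check also
   # stops at end of file, where the old test would have looped forever.
   while (defined(my $additional_lines = <FILE>)) {

       last if $additional_lines =~ /<\/?tr/i;

       $match .= $additional_lines;

   }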
   print "MATCH: " . $match . "\n\n";
}

close FILE;

exit 0;

The best way around the limitations of the previous code would be to undef $/ and slurp your file into a scalar as follows (this is just pseudo code as I couldn't get it working quite the way I wanted):

undef $/;

my $read = 'test.html';

open (FILE, "<$read") or die ("Couldn't open file to read: $!");

my $file = <FILE>;

close FILE;

my @possible_search_terms = split /<tr[^>]*?>/i, $file;

foreach (@possible_search_terms) {

    next if $_ =~ /^\s*$/;

    print "WORKING: '" . $_ . "'\n";


}
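A variation on the slurp idea, offered as a sketch rather than a finished answer: instead of splitting on the opening tag, capture everything between each <tr ...> and its closing </tr> with one global match. The sub name table_rows and the sample HTML are my own inventions for illustration, and this assumes every row has a closing </tr> and that tables are not nested.

```perl
use strict;
use warnings;

# Hypothetical helper: return the contents of each <tr>...</tr> pair.
sub table_rows {
    my ($html) = @_;

    # /s lets . match newlines, /i ignores case, /g collects every row.
    my @rows = $html =~ m{<tr\b[^>]*>(.*?)</tr\s*>}gis;

    # Skip rows that contain nothing but whitespace.
    return grep { /\S/ } @rows;
}

# Made-up sample input so the sketch runs on its own.
my $html = <<'HTML';
<table>
<TR bgcolor="#cccccc"><td>FOO</td>
<td>www.yahoo.com/yahoo</td></tr>
<tr>
<td>BAR</td></tr>
</table>
HTML

print "ROW: '$_'\n" for table_rows($html);
```

This copes with content on the same line as the <tr> or </tr>, which the line-by-line version above cannot. It will still trip over missing closing tags or nested tables; if the pages are that messy, a real HTML parser from CPAN (HTML::Parser or HTML::TableExtract, for instance) is likely the safer route.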

In reply to Re: File/String search... by jreades
in thread File/String search... by Sharky_The_Dog
