Re: File/String search...

Before I get into anything else, have you considered that your search string uses double quotes and your '/' may need to be escaped?

If the table method is the only one available to you then I'll outline a possible approach below, but first may I suggest that you dynamically generate the table used on the page (assuming that this is possible).

Consider a file that looked like this:

FOO\twww.yahoo.com/yahoo\tsomething else\tand another
BAR\twww.altavista.com\tsomething else\

Obviously the tab character is a physical tab and not the literal '\t'.

Then your code can be something like:

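while (<FILE>) {
    chomp;                      # drop the newline so it doesn't stick to the last field
    my @line = split /\t/;      # one element per tab-separated column

    # do_something...
}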

This would allow you to format the pages as you liked (and pretty damned easily) and would minimize the opportunity for inconsistent coding...
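To make the "generate the table" idea a bit more concrete, here is a minimal sketch. The helper name row_to_html and the inline sample records are my own stand-ins (in real use the records would come from the data file), so treat it as one possible shape rather than the way to do it:

```perl
use strict;
use warnings;

# Hypothetical helper: turn one tab-delimited record into an HTML table row.
sub row_to_html {
    my ($record) = @_;
    chomp $record;
    my @fields = split /\t/, $record;

    # Escape the characters that are special in HTML text.
    for (@fields) {
        s/&/&amp;/g;
        s/</&lt;/g;
        s/>/&gt;/g;
    }
    return '<tr>' . join('', map { "<td>$_</td>" } @fields) . '</tr>';
}

# Sample records standing in for the data file; "\t" here is a real tab.
my @records = (
    "FOO\twww.yahoo.com/yahoo\tsomething else\tand another\n",
    "BAR\twww.altavista.com\tsomething else\n",
);

print "<table>\n";
print row_to_html($_), "\n" for @records;
print "</table>\n";
```

Because every row goes through the same sub, the markup stays consistent, which is the whole point of generating it.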

And now on to the approach...

According to what you've said, and factoring in what others have pointed out by way of problems, the only thing we can 'reliably' count on is a /<tr/i starting our table row and, thus, a new search term. I'd suggest avoiding /<tr>/ as it's entirely possible that someone will add bgcolor or valign or some other piece of amusing code...
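A quick way to see the difference between the two patterns is to run them over a few rows. The sample lines and the starts_row helper below are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the line opens (or contains) a table row tag.
sub starts_row {
    my ($line) = @_;
    return $line =~ /<tr/i ? 1 : 0;
}

# Made-up sample lines, not taken from the original page.
my @samples = (
    '<tr>',
    '<TR bgcolor="#ffffff">',
    '<tr valign="top">',
    '<td>cell</td>',
);

for my $line (@samples) {
    my $strict = $line =~ /<tr>/ ? 'yes' : 'no';
    my $loose  = starts_row($line) ? 'yes' : 'no';
    printf "%-26s  /<tr>/: %-3s  /<tr/i: %s\n", $line, $strict, $loose;
}
```

Only the first sample matches /<tr>/, while /<tr/i catches all three row tags. If tags such as <track> could ever show up in the page, /<tr\b/i would be a slightly tighter variant.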

So here's a shot based on what you already have. It does have the side effect of stripping out the <tr>/</tr> (assuming that these are on separate lines and none of the following occur):

1. <tr><p>Some content</p> # will screw up because it skips to the next line
2. <p>Some content</p></tr> # will also screw up because it doesn't take the last line

Anyway...


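open (FILE, "<$read") or die ("Couldn't open file to read: $!");

while (my $line = <FILE>) {

   # A line containing <tr starts a new row and, thus, a new search term.
   next unless ($line =~ /<tr/i);

   my $match = '';

   # Collect lines up to the next <tr or </tr. Checking defined() also stops
   # at end of file; an undefined read never matches and would loop forever.
   while (defined(my $additional_lines = <FILE>)) {

       last if $additional_lines =~ /<\/?tr/i;

       $match .= $additional_lines;

   }
   print "MATCH: " . $match . "\n\n";
}

close FILE;

exit 0;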

The best way around the limitations of the previous code would be to undef $/ and slurp your file into a scalar as follows (this is just pseudo code as I couldn't get it working quite the way I wanted):

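my $read = 'test.html';

open (FILE, "<$read") or die ("Couldn't open file to read: $!");

# Undefine $/ only inside this block, so reads elsewhere still go line by line.
my $file = do { local $/; <FILE> };

close FILE;

# Split on any <tr ...> tag, with or without attributes. The first chunk is
# whatever precedes the first row; each later chunk still ends with its </tr>.
my @possible_search_terms = split /<tr[^>]*?>/i, $file;

foreach (@possible_search_terms) {

    next if $_ =~ /^\s*$/;

    print "WORKING: '" . $_ . "'\n";

}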

Replies are listed 'Best First'.

RE: Re: File/String search...
by tye (Sage) on Sep 07, 2000 at 19:22 UTC

Before I get into anything else, have you considered that your search string uses double quotes and your '/' may need to be escaped?

You don't need to escape / inside double quotes. There are really very few characters that require escaping inside of (Perl) double quotes. Namely, \, @, $, and the delimiter character (which is " for "this" and . for qq.this., etc.).
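A small sketch of which characters do and don't need a backslash. The variable names and sample strings are only for illustration:

```perl
use strict;
use warnings;

my $host = 'www.yahoo.com';

# "/" is an ordinary character inside double quotes: no backslash needed.
my $url = "http://$host/yahoo";

# These do need a backslash when meant literally: \ @ $ and the delimiter.
my $literal = "C:\\temp costs \$5, mail user\@example.com, say \"hi\"";

# With qq and a different delimiter, that delimiter is what needs escaping.
my $dotted = qq.version 1\.0 at "$host"/index.;

# When the string is used as a search pattern, \Q...\E quotes the regex
# metacharacters (the "." in particular), so it matches literally.
my $needle = "www.yahoo.com/yahoo";
my $found  = "FOO\twww.yahoo.com/yahoo\tsomething else" =~ /\Q$needle\E/ ? 1 : 0;

print "$url\n$literal\n$dotted\nfound: $found\n";
```

Note that the slash in $needle causes no trouble inside /.../ either, since Perl locates the closing delimiter before it interpolates the variable.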

Having heard something like this twice in as many days, I felt it was important to comment on this.

- tye (but my friends call me "Tye")
