gvs_jagan has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I want to write CGI code to open a file and access its contents character by character (what fgetc does in C), and then simultaneously search for a pattern (say /blue/) and output the results into another file.

How do I access the file contents character by character? I have tried to do it with the "do" function and $/, but the problem is that done that way the pattern is only looked for once.

Here is the URL: http://83.138.185.127/cgi-bin/search3.cgi. I am pasting the code below.
Regards, jagan

----------------------------------------------------------

#!/usr/bin/perl -T
use CGI qw(:standard);
use CGI qw(warningsToBrowser fatalsToBrowser);

my $file1 = param("fil1");
my $key1  = param("phr1");
my $st1   = param("str1");

print header, start_html( -title => "Searching for a pattern in a file", -bgcolor => "#cccccc" );
h1("Searching for a pattern in a file");

if ( param() ) {
    my $file1 = param("fil1");
    my $key1  = param("phr1");
    my $st1   = param("str1");
    my $i     = 0;
    open( FILE, "$file1" ) or die("could not open the file $!\n");
    while (<FILE>) {
        my $st1 = do { local $/, <FILE> };
        if ( $st1 =~ m/$key1/ ) {
            $i++;
            print p("The pattern <b>$key1</b> has been found <b>$i</b> times.\n");
        }
        else {
            print p("The pattern <b>$key1</b> has not been found.\n");
        }
        print p "<a href=http://83.138.185.127/cgi-bin/search3.cgi>Try Again</a> ";
    }
}
else {
    print hr(), start_form();
    print p( "Enter the file name you wish to make a search into",
             popup_menu( "fil1", [ 'file1', 'file2', 'file3', 'file4' ] ) );
    print p( "Enter the key word you wish to search for ", textfield( "phr1", "" ) );
    print p( submit("go"), reset("clear") );
    print end_form(), hr();
}
end_html();

Re: accessing the contents of a file character by character
by Fletch (Bishop) on Jun 10, 2005 at 11:53 UTC
    my $i = 0;
    $i++ while $st1 =~ /$key1/g;

    See perldoc perlre for the /g flag. And in general if you're trying to access a string character-by-character you're not thinking Perl, you're thinking C (or the like).
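
    A minimal sketch of that counting approach, assuming the file is slurped into $st1 first (the file name and keyword are made up; the variable names follow the original post):

        #!/usr/bin/perl
        use strict;
        use warnings;

        my $file1 = 'file1';    # stand-in for param("fil1")
        my $key1  = 'blue';     # stand-in for param("phr1")

        open my $fh, '<', $file1 or die "could not open $file1: $!\n";
        my $st1 = do { local $/; <$fh> };   # slurp the whole file once
        close $fh;

        my $i = 0;
        $i++ while $st1 =~ /$key1/g;        # /g resumes after each match

        if ($i) {
            print "The pattern $key1 has been found $i times.\n";
        }
        else {
            print "The pattern $key1 has not been found.\n";
        }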

    --
    We're looking for people in ATL

Re: accessing the contents of a file character by character
by tlm (Prior) on Jun 10, 2005 at 12:22 UTC

    Why would you want to read a file character by character when you're looking for a pattern (like /blue/) that is longer than a single character? Not that it is impossible to do this, but it looks to me like you are setting up a "Blind Men and The Elephant"-type situation for yourself, for no good reason.

    Anyway, check out sysread.
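
    For what it's worth, a minimal sketch of a sysread loop that reads one byte per call, roughly what fgetc does in C (the file name is made up):

        #!/usr/bin/perl
        use strict;
        use warnings;

        open my $fh, '<', 'file1' or die "could not open file1: $!\n";

        # sysread with a length of 1 is the closest analogue to fgetc
        while ( sysread( $fh, my $char, 1 ) ) {
            print "read character: $char\n";
        }

        close $fh;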

    the lowliest monk

Re: accessing the contents of a file character by character
by Limbic~Region (Chancellor) on Jun 10, 2005 at 12:29 UTC
    gvs_jagan,
    Like others, it seems odd to me that you would want to read character by character if you are trying to match a pattern that spans multiple characters. With that said, it can certainly be done. If the pattern match is always fixed width, you can use a sliding buffer window. The buffer fills up to the width of the pattern you are looking for, and when you add a new character the oldest one falls off. To see how to read character by character:
    {
        local $/ = \1;
        while ( <FILE> ) {
            my $window = adjust_window( $_ );
            next if $window ne $search;
            # .... what to do if you find the match
        }
    }
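
    The adjust_window routine above is not shown; here is a self-contained sketch of how the sliding buffer might look for a fixed-width pattern (the file name, pattern, and the helper's internals are assumptions, not part of the original node):

        #!/usr/bin/perl
        use strict;
        use warnings;

        my $search = 'blue';    # fixed-width pattern being looked for
        my $buffer = '';

        # append the newest character and drop the oldest one once the
        # buffer has grown to the width of the pattern
        sub adjust_window {
            my ($char) = @_;
            $buffer .= $char;
            substr( $buffer, 0, 1, '' ) if length($buffer) > length($search);
            return $buffer;
        }

        open my $fh, '<', 'file1' or die "could not open file1: $!\n";
        {
            local $/ = \1;      # read one character per <$fh>
            while ( my $char = <$fh> ) {
                next if adjust_window($char) ne $search;
                print "found '$search'\n";
            }
        }
        close $fh;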

    Cheers - L~R

Re: accessing the contents of a file character by character
by jpeg (Chaplain) on Jun 10, 2005 at 12:43 UTC
    Splitting on an empty string will give you an array of chars which you can loop through, looking for the start of your pattern, then the next char of your pattern, and so on.
    It's kind of... unsophisticated, but so is examining every character when Perl has a perfectly good regex engine.
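
    To illustrate the mechanism, a small sketch of splitting on the empty string (the sample line is made up):

        #!/usr/bin/perl
        use strict;
        use warnings;

        my $line  = "the blue whale swam by";
        my @chars = split //, $line;    # splitting on // gives one character per element

        # walk the characters one at a time
        for my $i ( 0 .. $#chars ) {
            print "char $i: $chars[$i]\n";
        }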
    --
    jpg
Re: accessing the contents of a file character by character
by TedPride (Priest) on Jun 10, 2005 at 17:12 UTC
    How big are the files? How many searches will be done versus how many updates to the files? If there will be many searches and few updates, I'd personally run them through a parser first, eliminate non-word characters and dead space, then lowercase what's left and save the result as copies of the original files. The copies are what I'd then search for the keywords. If the files are under, say, 20 MB and there are going to be lots of searches, I could load each file into memory in its entirety instead of line by line, and assuming I only cared about whether the keywords were present in the file or not, I could use index (since I lowercased) rather than a regex, which is significantly slower with the i flag on.
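
    A minimal sketch of the slurp-and-index part of that idea (the file and keyword names are illustrative, and the lowercased, pre-parsed copy is assumed to exist already):

        #!/usr/bin/perl
        use strict;
        use warnings;

        my $copy = 'file1.lc';          # assumed lowercased, pre-parsed copy of the original
        my $key  = lc 'Blue';           # lowercase the keyword to match the copy

        open my $fh, '<', $copy or die "could not open $copy: $!\n";
        my $text = do { local $/; <$fh> };   # slurp the whole file into memory
        close $fh;

        # index is a plain substring search, cheaper than a regex with /i
        if ( index( $text, $key ) >= 0 ) {
            print "keyword '$key' is present\n";
        }
        else {
            print "keyword '$key' is not present\n";
        }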

    I don't know what you're doing, but it's the wrong approach.