myfrndjk has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I am here to seek your help. I have a Perl CGI script that, when called, has to crawl a site and print the crawled data both to the web page and to a text/html file. If I run the script from the command prompt it works fine; however, if I run it through the browser, the contents are displayed only in the web page and are not written to the external file. Thanks in advance.

#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;
use LWP::UserAgent;
use HTTP::Request;

open (OUT, '>:encoding(cp1252)', "/home/local/ANT/hemesh/Desktop/test.html");
my $URL = 'http://www.7dayshop.com/delivery-and-returns';
my $agent = LWP::UserAgent->new(agent => "Mozilla/5.0");
my $request = HTTP::Request->new(GET => $URL);
my $response = $agent->request($request);

# Check the outcome of the response
if ($response->is_success) {
    my $xp = HTML::TreeBuilder::XPath->new_from_url($URL);
    my $node = $xp->findnodes_as_string('//strong[contains(.,\'UK Mainland Standard\')]');
    print "Content-type: text/html\n\n";
    print OUT $node and print $node;
}
elsif ($response->is_error) {
    print "Error:$URL\n";
    print $response->error_as_HTML;
}

Expected result

Web page: my crawl data

File: test.html on the desktop

Re: print result in text/html file-perl cgi
by graff (Chancellor) on Jul 27, 2014 at 20:49 UTC
    When you say "execute through the browser", I presume you mean "send a request to a web server that happens to be running on the same machine" -- i.e. the machine you're using when you execute the script as a shell command. Or at least, you're using a network where both your shell and the web server are able to access the same directory path.

    If that's not true, then that's your problem. But if it is true, then the problem is probably due to permission settings on the directory (and on the file "test.html", if it happens to exist already when you "execute through the browser"): you probably own the directory (and the file), but the web server that executes your script "through the browser" is not a process owned by you, and that process owner would need to be granted write-access to the directory (and to the file, if it already exists).

    Bear in mind that there are risks involved when granting such permissions to processes that run via a web server. Assuming that your system is managed by one or more qualified sysadmins, they would (and should) have something to say about the code you are running on your web server, when this sort of access is involved. (For example, they might advise that the writing of data files by the server be restricted to a specific path that isn't in your own home directory.)
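
    One way to confirm the diagnosis above is to have the script report which account it is running as and whether that account can write to the target location. The following is a minimal sketch, not the original poster's code: the helper name write_access_report and the path /tmp/test.html are hypothetical placeholders, and it assumes a Unix-like system where getpwuid is available.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical helper: describe who the process runs as and whether
# that user may write to the given file and its directory.
sub write_access_report {
    my ($file) = @_;
    my $dir = dirname($file);

    # $> is the effective user ID. Under CGI this is normally the web
    # server's account, not your login account.
    my $user = getpwuid($>) // $>;

    my $dir_state  = (-w $dir) ? 'writable' : 'NOT writable';
    my $file_state = !(-e $file) ? 'does not exist yet'
                   : (-w $file)  ? 'exists and is writable'
                   :               'exists but is NOT writable';

    return "Running as: $user\n"
         . "Directory $dir is $dir_state\n"
         . "File $file $file_state\n";
}

# Placeholder path (an assumption); substitute the file your script writes.
print "Content-type: text/plain\n\n";
print write_access_report('/tmp/test.html');
```

    Running this once from the shell and once through the web server should show two different user names if the permission explanation applies.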

      Hi, thanks. That was due to a permission issue.

Re: print result in text/html file-perl cgi
by 1s44c (Scribe) on Jul 27, 2014 at 21:03 UTC

    That can't be all the code; it only prints to OUT, not to both OUT and STDOUT.

    Look at File::Tee for an easy way to output to two filehandles. Also, put an 'or die' after the open, or 'use autodie'. My guess is that the user running the CGI script can't open the output file.
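
    A minimal sketch of the 'or die' advice, combined with writing the same text to two filehandles. It deliberately uses a plain loop over the handles rather than File::Tee, so it does not rely on that module's interface. The helper name print_to_both and the temporary-directory path are assumptions made for illustration. Under CGI, the message from die usually ends up in the web server's error log rather than in the browser.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: write $text to the file at $path and to STDOUT,
# failing loudly (with the OS error in $!) if the file cannot be opened.
sub print_to_both {
    my ($path, $text) = @_;
    open(my $out, '>:encoding(cp1252)', $path)
        or die "Cannot open '$path' for writing: $!";
    print {$_} $text for ($out, \*STDOUT);
    close($out) or die "Cannot close '$path': $!";
    return;
}

# Placeholder output location (an assumption): the system temp directory.
my $path = File::Spec->catfile(File::Spec->tmpdir, 'test.html');

print "Content-type: text/html\n\n";
print_to_both($path, "<strong>example</strong>\n");
```

    With the check in place, a permission problem shows up as an explicit "Permission denied" message instead of a silently missing file.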

Re: print result in text/html file-perl cgi
by Anonymous Monk on Jul 27, 2014 at 20:51 UTC

    Is that the actual code you're having trouble with? Because I don't see where you print the data to standard output. You'll have to print the data to both standard output and to the file if you want it to appear in both places...

    If the file isn't being created, what user ID is your webserver executing as? Does that user have write access to your desktop? Also, do the error logs of your webserver say anything?

    Also, why do you fetch the $URL twice? (once with LWP::UserAgent and once with HTML::TreeBuilder's new_from_url?)
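
    On the double fetch: one possible way to avoid it is to parse the body of the response you already have, instead of asking HTML::TreeBuilder to download the page again. The following is only a sketch; the helper names extract_nodes and fetch_and_extract are hypothetical, and the URL is taken from the command line here rather than hard-coded.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::TreeBuilder::XPath;

# Hypothetical helper: parse an HTML string and return the nodes
# matching $xpath, serialized as a string.
sub extract_nodes {
    my ($html, $xpath) = @_;
    my $tree  = HTML::TreeBuilder::XPath->new_from_content($html);
    my $found = $tree->findnodes_as_string($xpath);
    $tree->delete;    # free the parse tree
    return $found;
}

# Hypothetical helper: fetch the page once, then parse that response.
sub fetch_and_extract {
    my ($url, $xpath) = @_;
    my $agent    = LWP::UserAgent->new(agent => 'Mozilla/5.0');
    my $response = $agent->get($url);
    die "Error fetching $url: " . $response->status_line . "\n"
        unless $response->is_success;
    return extract_nodes($response->decoded_content, $xpath);
}

# Only touches the network when a URL is given as the first argument.
if (@ARGV) {
    print fetch_and_extract($ARGV[0],
        q{//strong[contains(., 'UK Mainland Standard')]});
}
```

    Keeping the parsing step separate from the download also makes it possible to try the XPath expression against a saved copy of the page.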

Re: print result in text/html file-perl cgi
by Anonymous Monk on Jul 27, 2014 at 22:32 UTC

    That "and print $node" wasn't there before. Please mark your updates - see How do I change/delete my post?

    Did you try the other suggestions, such as adding the or die $!; to the open and checking the write permissions?

      Thanks, all, for your suggestions. It was due to a permission issue. I wrote the file inside "www" itself, and now it is working.