Well, this is a VERY localized script - but I thought it was cool, seems to be working quite well, and I am a palm user that does not surf with it. This script will grab the daily headlines from www2.alberta.com/news and (using cron) update my palm pilot with the file before I leave. I am ready to be critiqued :-)
#!/usr/bin/perl -w ########################################### # This will hopefully become a full # # fledged perl script that downloads # # news from alberta.com and converts # # to txt or some other format readable by # # palms the idea is to run this nightly # # using cron, and schedule syncs with the # # Palm Pilot # # - Slycer 10/19/2000 # ########################################### use strict; use LWP::Simple; my $content; #the content of the alberta.com ind +ex my $workingdir = "/home/slycer"; #the dir to write all the fi +les to my $url = "www2.alberta.com/news"; #the url of alberta.com news my $outlist= "abcom.htm"; #the outfile for the above my $outhtm= "news.htm"; #the outfile for all the urls below my (@urls, #the list of good urls @end_ref, #the ending line of the stories @start_ref,); #the starting line of the stories my %storieshash; #the hash that contains the text my $nfile="news.txt"; #the news file that we read/write t +o (plain #text) my $lines="83"; #the amount of CRAP between stories my $conprog="/usr/bin/txttopdb"; #the script to convert to pdb format my $final="stripped.txt"; #the final file to convert my $pdbfile="news.pdb"; #the name of the pdb file my $title="DailyNews"; #the Title (as displayed in the Pal +m) chdir $workingdir; #####First go get the index file from alberta.com###### open (OUT,">$outlist")||warn "Could not create alberta.com news"; unless (defined ($content = get "http:\/\/$url")) { die "Unable to access abcom website : $!"; } print OUT $content; close OUT; #####Make the URLS usable##### open(CLGR,"<$outlist"); while (<CLGR>){ if (/.*fs\.*/){ s/.+href=\"//i; s/\".+//; chomp ($_); $_= "\"http:\/\/$url\/$_\""; print "$_\n"; push (@urls => $_); } } close (CLGR); ###Done creating our urls - now go get the stuff - uses the subroutine +### &fetch; ###Convert it all to text - don't bother reinventing the wheel### my $text= `lynx -dump -nolist $outhtm`; open (NEWS => ">$nfile"); print NEWS $text; close (NEWS); ###Now that we have a plain text file - strip the CRAP out &strip; ### call the txttopdb script (from freshmeat) to convert to Palm forma +t #for some reason having troubles finding the files!! #so we will joing the dir and file name... $final=join '/',$workingdir,$final; $pdbfile=join '/',$workingdir,$pdbfile; `$conprog -t $title $final $pdbfile\n`; ###lastly - do some cleanup unlink ($outlist,$outhtm,$nfile,$final)|| warn "Could not erase the fi +les $!"; ###Subs from here on down### ####### #FETCH# ####### sub fetch{ open (HTM => ">$outhtm")||warn "wtf.. $!"; chomp (@urls); foreach (@urls){ print "getting $_\n"; unless (defined ($content = get $_)) { warn "could not get $_\n"; } print HTM $_; print HTM $content; } close (HTM); } ####### #STRIP# ####### sub strip{ ###This part finds out the ENDS of each story### $end_ref[0]=0; #set first story to begin @ line 0 open (NEWS,"<$nfile") #the news file in plain text.. ||warn "Could not open news file $!"; my $i=1; while (<NEWS>){ $storieshash{$i}=$_; if ($_ =~ /.*opyright*/){ #all of them end in "copyright" push (@end_ref=>$i); #a reference to the hash that marks t +he end #of the story } ++$i; } ###This next part needs to figure out how many lines between the END o +f ###story (marked by "copyright") and the beggining of the next ($lines ###l8r) Yes, this is aimed strictly at alberta.com foreach (@end_ref){push (@start_ref => ($_ + $lines))} shift @end_ref; #get rid of the zero pop @start_ref; #get rid of the last story open (FINAL,">$final"); foreach (@end_ref){ print FINAL "\n Story starts here\n"; for ($i = shift(@start_ref);$i <= $_;$i++){ print FINAL "$storieshash{$i}"; } } close FINAL; }

In reply to Alberta.com on the Palm by the_slycer

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.