Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

Re: Querying Saints in our Book (PL/Perl)

by erix (Prior)
on Jul 29, 2015 at 22:34 UTC ( [id://1136804]=note: print w/replies, xml ) Need Help??


in reply to Querying Saints in our Book

PostgreSQL has stored procedures in perl (PL/Perl - Procedural Language/Perl).

Here is one example that retrieves the saints-page and shows the columns (sorry; I just parsed the html with a regex):

It needs plperlu (the 'u' stands for 'untrusted' as it lifts certain security limitations. It can be installed with: "CREATE LANGUAGE plperlu;").

I guess security is out the window when you retrieve stuff from remote sites straight into your database -- but this is just a demonstration. Be careful out there.

create function saints_in_our_book() returns table ( rank integer , userid integer , username text , xp integer , userlevel text , writeups integer , usersince timestamp , lasthere text ) language plperlu as $plp$ use strict; use warnings; use LWP::Simple; use IO::String; my $url = "http://perlmonks.org/?node_id=3559"; my $io = IO::String->new(get($url)); while ( <$io> ) { chomp; if(m, ^ <tr(?:[ ]class="highlight")?> <td[ ]align="right">([0-9]+)</td> # + rank <td> <span[ ]class="user-([0-9]+)"> # + userid <a[ ]href="[?]node_id=[0-9]+"> ([^<]+) # + username </a> </span> </td> <td[ ]align="right">([0-9]+)</td> # + xp <td>([a-zA-Z]+\s+\([0-9]+\))</td> # + userlevel <td[ ]align="right"> <a[ ]href="[?]node_id=6364;usersearch=[^"]+"> ([0-9]+) # + writeups </a> </td> <td[ ]align="right">([^<]+)</td> # + usersince <td[ ]align="right">([^<]+)</td> # + lasthere </tr> $ ,x ) { return_next ( { rank => $1 , userid => $2 , username => $3 , xp => $4 , userlevel => $5 , writeups => $6 , usersince => $7 , lasthere => $8 } ); } } return undef; $plp$;

Now you can query that remote HTML page with SQL. It is slow, of course; I guess it only makes sense to do this if you save it into a table (see CREATE TABLE t AS ...) or a materialized view (see CREATE MATERIALIZED VIEW mv AS ...).

The SQL query below shows the top 50 rows:

$ echo " select rank , userid , username , xp , userlevel , writeups , usersince , lasthere from saints_in_our_book() where rank <= 50 ; " | psql -qX rank | userid | username | xp | userlevel | wr +iteups | usersince | lasthere ------+--------+----------------------+---------+-----------------+--- +-------+---------------------+---------------- 1 | 979 | vroom | 1007449 | Pope (28) | + 607 | 1999-11-12 05:53:00 | 37 weeks ago 2 | 171588 | BrowserUk | 156800 | Pope (28) | + 22020 | 2002-06-04 18:12:00 | 13 minutes ago 3 | 381608 | ikegami | 126952 | Pope (28) | + 19570 | 2004-08-10 14:55:00 | 15 hours ago 4 | 5348 | Corion | 117813 | Pope (28) | + 9369 | 2000-03-14 13:56:00 | 4 hours ago 5 | 9073 | merlyn | 71106 | Sage (25) | + 6322 | 2000-04-25 22:41:00 | 5 weeks ago 6 | 461912 | GrandFather | 70930 | Sage (25) | + 6450 | 2005-05-31 03:02:00 | 8 seconds ago 7 | 22609 | tye | 69775 | Cardinal (24) | + 7769 | 2000-07-14 22:45:00 | 13 hours ago 8 | 616540 | moritz | 65723 | Cardinal (24) | + 6415 | 2007-05-21 12:33:00 | 23 hours ago 9 | 17000 | Ovid | 60423 | Cardinal (24) | + 2990 | 2000-06-08 04:57:00 | 1 week ago 10 | 85580 | dragonchild | 58436 | Archbishop (23) | + 5988 | 2001-06-04 23:41:00 | 4 days ago 11 | 1382 | chromatic | 58205 | Archbishop (23) | + 5941 | 1999-12-24 11:10:00 | 5 weeks ago 12 | 26179 | tilly | 57174 | Archbishop (23) | + 5646 | 2000-08-04 16:59:00 | 1 year ago 13 | 131741 | zentara | 56223 | Archbishop (23) | + 7239 | 2001-12-14 00:14:00 | 40 weeks ago 14 | 281137 | davido | 55541 | Archbishop (23) | + 4698 | 2003-08-05 19:52:00 | 16 hours ago 15 | 82147 | Zaxo | 54786 | Archbishop (23) | + 3416 | 2001-05-22 05:34:00 | 6 years ago 16 | 352046 | ww | 45944 | Bishop (22) | + 3539 | 2004-05-10 13:43:00 | 13 hours ago 17 | 290905 | Old_Gray_Bear | 45468 | Bishop (22) | + 587 | 2003-09-12 02:59:00 | 11 hours ago 18 | 29008 | grinder | 43661 | Bishop (22) | + 1570 | 2000-08-22 17:56:00 | 6 years ago 19 | 198160 | CountZero | 42943 | Bishop (22) | + 4284 | 2002-09-16 05:38:00 | 3 hours ago 20 | 324763 | marto | 42094 | Bishop (22) | + 4470 | 2004-01-28 20:01:00 | 24 hours ago 21 | 622051 | toolic | 41484 | Bishop (22) | + 3483 | 2007-06-19 16:21:00 | 10 hours ago 22 | 169744 | Abigail-II | 40866 | Bishop (22) | + 4089 | 2002-05-28 11:53:00 | 6 years ago 23 | 18800 | jeffa | 40192 | Bishop (22) | + 2764 | 2000-06-19 19:14:00 | 16 hours ago 24 | 22308 | dws | 38846 | Chancellor (21) | + 2536 | 2000-07-13 05:09:00 | 35 weeks ago 25 | 180961 | Limbic~Region | 38012 | Chancellor (21) | + 2960 | 2002-07-11 06:53:00 | 1 week ago 26 | 104919 | perrin | 36242 | Chancellor (21) | + 4171 | 2001-08-15 04:13:00 | 2 years ago 27 | 114691 | Aristotle | 35627 | Chancellor (21) | + 5335 | 2001-09-26 01:49:00 | 20 weeks ago 28 | 2680 | davorg | 34015 | Chancellor (21) | + 2988 | 2000-02-02 01:01:00 | 4 weeks ago 29 | 58196 | Fletch | 33799 | Chancellor (21) | + 3608 | 2001-02-13 23:08:00 | 2 days ago 30 | 80749 | tachyon | 32350 | Chancellor (21) | + 3268 | 2001-05-16 03:26:00 | 10 years ago 31 | 186362 | adrianh | 32159 | Chancellor (21) | + 2205 | 2002-07-30 20:57:00 | 2 years ago 32 | 44715 | graff | 31661 | Chancellor (21) | + 3880 | 2000-12-04 02:04:00 | 2 weeks ago 33 | 108447 | demerphq | 31508 | Chancellor (21) | + 3269 | 2001-08-28 18:16:00 | 1 year ago 34 | 248054 | Your Mother | 30978 | Chancellor (21) | + 2536 | 2003-04-04 13:42:00 | 14 hours ago 35 | 534893 | Gavin | 30799 | Chancellor (21) | + 598 | 2006-03-07 12:15:00 | 2 hours ago 36 | 533863 | roboticus | 30531 | Chancellor (21) | + 2888 | 2006-03-02 12:31:00 | 7 hours ago 37 | 176576 | eyepopslikeamosquito | 30092 | Chancellor (21) | + 1295 | 2002-06-23 06:02:00 | 13 minutes ago 38 | 442602 | planetscape | 30055 | Chancellor (21) | + 1290 | 2005-03-27 09:28:00 | 2 days ago 39 | 524150 | syphilis | 29621 | Canon (20) | + 2714 | 2006-01-19 07:30:00 | 4 hours ago 40 | 157432 | Joost | 29453 | Canon (20) | + 3245 | 2002-04-08 13:32:00 | 3 years ago 41 | 708738 | LanX | 29231 | Canon (20) | + 4599 | 2008-09-03 12:42:00 | 3 hours ago 42 | 1936 | japhy | 29144 | Canon (20) | + 2347 | 2000-01-10 19:44:00 | 42 weeks ago 43 | 224409 | borisz | 27898 | Canon (20) | + 913 | 2003-01-05 14:23:00 | 3 weeks ago 44 | 190859 | bart | 27165 | Canon (20) | + 2611 | 2002-08-17 14:36:00 | 51 weeks ago 45 | 528646 | Khen1950fx | 26518 | Canon (20) | + 1899 | 2006-02-07 22:12:00 | 12 hours ago 46 | 510280 | shmem | 26470 | Canon (20) | + 2519 | 2005-11-20 20:58:00 | 19 hours ago 47 | 421114 | Tanktalus | 26170 | Canon (20) | + 2178 | 2005-01-10 22:07:00 | 1 day ago 48 | 123953 | strat | 26010 | Canon (20) | + 538 | 2001-11-08 03:38:00 | 2 years ago 49 | 832495 | choroba | 25807 | Canon (20) | + 3418 | 2010-04-02 15:13:00 | 16 minutes ago 50 | 194920 | diotalevi | 25716 | Canon (20) | + 3030 | 2002-09-03 22:06:00 | 3 years ago (50 rows)

UPDATE:Some lines are missing as I apparently made some error in the regex; I haven't the time to fix it now (maybe later).

UPDATE 2: fixed. (error was: while (<$io>) { and then $io->getline *facepalm*)

UPDATE 3: I added 'User Since' and 'Last Here' columns, and a computed column 'away' (=lasthere cast to postgres data type interval).

-- clear earlier stuff drop table if exists saints_in_our_book_20150801 cascade; -- create the current html page download into a table: create table saints_in_our_book_20150801 as select * from sa +ints_in_our_book(); -- create a view out of that table (and add the 'away' colum): create or replace view saints_in_our_book as select *, replac +e(lasthere, ' ago', '')::interval as away from saints_in_our_book_201 +50801; -- query the view: $ echo " select * from saints_in_our_book -- the view on top of the table where away < '4 hours' order by now() - away desc , xp desc ; " | psql -qX rank | userid | username | xp | userlevel | wr +iteups | usersince | lasthere | away ------+---------+----------------------+--------+-----------------+--- +-------+---------------------+----------------+---------- 284 | 399498 | erix | 6400 | Vicar (15) | + 362 | 2004-10-15 13:22:00 | 4 seconds ago | 00:00:04 6 | 461912 | GrandFather | 70930 | Sage (25) | + 6450 | 2005-05-31 03:02:00 | 5 seconds ago | 00:00:05 108 | 70929 | atcroft | 15133 | Monsignor (18) | + 633 | 2001-04-09 08:59:00 | 12 seconds ago | 00:00:12 80 | 401112 | johngg | 18402 | Abbot (19) | + 1682 | 2004-10-21 10:23:00 | 17 seconds ago | 00:00:17 72 | 634253 | AnomalousMonk | 20401 | Abbot (19) | + 2819 | 2007-08-22 06:27:00 | 52 seconds ago | 00:00:52 634 | 310300 | wjw | 3212 | Curate (13) | + 371 | 2003-11-26 16:34:00 | 1 minute ago | 00:01:00 2 | 171588 | BrowserUk | 156800 | Pope (28) | + 22020 | 2002-06-04 18:12:00 | 5 minutes ago | 00:05:00 49 | 832495 | choroba | 25807 | Canon (20) | + 3418 | 2010-04-02 15:13:00 | 10 minutes ago | 00:10:00 37 | 176576 | eyepopslikeamosquito | 30092 | Chancellor (21) | + 1295 | 2002-06-23 06:02:00 | 13 minutes ago | 00:13:00 287 | 1001958 | karlgoethebier | 6345 | Vicar (15) | + 925 | 2012-11-02 12:43:00 | 20 minutes ago | 00:20:00 561 | 905403 | tangent | 3558 | Curate (13) | + 363 | 2011-05-18 02:41:00 | 21 minutes ago | 00:21:00 254 | 491819 | ioannis | 6917 | Vicar (15) | + 131 | 2005-09-14 11:10:00 | 27 minutes ago | 00:27:00 302 | 885521 | hippo | 6076 | Vicar (15) | + 505 | 2011-02-01 15:37:00 | 27 minutes ago | 00:27:00 350 | 221725 | poj | 5228 | Priest (14) | + 623 | 2002-12-22 10:42:00 | 27 minutes ago | 00:27:00 325 | 749850 | VinsWorldcom | 5667 | Vicar (15) | + 326 | 2009-03-11 12:42:00 | 38 minutes ago | 00:38:00 267 | 436161 | davies | 6628 | Vicar (15) | + 446 | 2005-03-03 11:39:00 | 42 minutes ago | 00:42:00 132 | 992570 | Laurent_R | 12485 | Monsignor (18) | + 1850 | 2012-09-09 08:56:00 | 48 minutes ago | 00:48:00 110 | 200365 | Tux | 14738 | Monsignor (18) | + 1440 | 2002-09-24 13:49:00 | 1 hour ago | 01:00:00 295 | 919638 | RichardK | 6195 | Vicar (15) | + 396 | 2011-08-10 11:34:00 | 1 hour ago | 01:00:00 35 | 534893 | Gavin | 30799 | Chancellor (21) | + 598 | 2006-03-07 12:15:00 | 2 hours ago | 02:00:00 71 | 968231 | Athanasius | 20547 | Abbot (19) | + 1233 | 2012-05-01 11:13:00 | 2 hours ago | 02:00:00 147 | 747201 | afoken | 11033 | Prior (17) | + 1139 | 2009-02-28 20:01:00 | 2 hours ago | 02:00:00 184 | 341121 | dave_the_m | 8997 | Parson (16) | + 849 | 2004-03-30 22:38:00 | 2 hours ago | 02:00:00 331 | 1015412 | soonix | 5615 | Vicar (15) | + 290 | 2013-01-25 21:42:00 | 2 hours ago | 02:00:00 367 | 222702 | parv | 5011 | Priest (14) | + 690 | 2002-12-28 08:05:00 | 2 hours ago | 02:00:00 19 | 198160 | CountZero | 42943 | Bishop (22) | + 4284 | 2002-09-16 05:38:00 | 3 hours ago | 03:00:00 39 | 524150 | syphilis | 29621 | Canon (20) | + 2714 | 2006-01-19 07:30:00 | 3 hours ago | 03:00:00 41 | 708738 | LanX | 29231 | Canon (20) | + 4599 | 2008-09-03 12:42:00 | 3 hours ago | 03:00:00 74 | 295576 | ambrus | 19603 | Abbot (19) | + 2740 | 2003-10-01 11:47:00 | 3 hours ago | 03:00:00 88 | 446266 | salva | 17322 | Abbot (19) | + 2243 | 2005-04-09 17:47:00 | 3 hours ago | 03:00:00 576 | 1039428 | Eily | 3498 | Curate (13) | + 278 | 2013-06-17 17:59:00 | 3 hours ago | 03:00:00 (31 rows)

Now I am at the top of 'Saints in our book' ;)

Replies are listed 'Best First'.
Re^2: Querying Saints in our Book (PL/Perl)
by chacham (Prior) on Jul 30, 2015 at 14:14 UTC

    Good stuff. I forgot about external tables. Though, i'm not sure PM would appreciate that too much anyway. :)

    I guess security is out the window when you retrieve stuff from remote sites straight into your database

    Every once in a while i want to query or move data between multiple environments. Normally, this would require a database link which requires authorization by the DBAs and so on. So, i installed a local database with database links. Very convenient, at least. :)

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1136804]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (6)
As of 2024-03-28 12:12 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found