in reply to Re: Email - jquery
in thread Email - jquery

I'm not sure... I want to extract the email address from this page: http://silkeborgkommune.dk/Politik/Byraadet/Silkeborg-Byraad/Steen-Vindum

When I mouse over the text "Send E-mail" I can see the mailto: link, but I don't know how to scrape it. Can you give me any hints? Thanks :-)

Re^3: Email - jquery
by davido (Cardinal) on Feb 05, 2014 at 19:34 UTC

    A little further down in the source for the page you will find <script src="/static/js/main.js"></script>, and following that link to http://silkeborgkommune.dk/static/js/main.js you will find the definition of the SetEmailLink function:

    function SetEmailLink(id, linkCiphers, textChipers, key) {
        var link = "";
        var text = "";
        for (var i = 0; i < linkCiphers.length; i++) {
            var linkChar = linkCiphers[i] - key[i % key.length];
            link += String.fromCharCode(linkChar);
        }
        for (var j = 0; j < textChipers.length; j++) {
            var textChar = textChipers[j] - key[j % key.length];
            text += String.fromCharCode(textChar);
        }
        var element = jQuery("#" + id);
        if (element.is("a")) {
            if (text != "") {
                element.html(text);
            }
            if (link != "") {
                element.attr("href", link);
            }
        }
    }

    It looks like a pretty straightforward cypher to re-implement in Perl. Figure out what that is doing, and you will be a step closer to automating the deobfuscation of the email address, as well as closer to violating the presumed intent of hiding the email address in cypher form.
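    To see what the function is doing, it can help to run the cypher in both directions on made-up data. The sketch below is not taken from the site: `encode`, `decode`, the sample address, and the sample key are all hypothetical. It assumes the obscuring step simply adds a key value (cycling through the key) to each character code, which is the inverse of the subtraction in SetEmailLink.

```javascript
// Hypothetical helpers illustrating the additive repeating-key scheme
// that SetEmailLink undoes. Names and sample data are made up.
function encode(plain, key) {
  const ciphers = [];
  for (let i = 0; i < plain.length; i++) {
    // Add the key value for this position, wrapping around the key.
    ciphers.push(plain.charCodeAt(i) + key[i % key.length]);
  }
  return ciphers;
}

function decode(ciphers, key) {
  let plain = "";
  for (let i = 0; i < ciphers.length; i++) {
    // Subtract the same key value to recover the character code.
    plain += String.fromCharCode(ciphers[i] - key[i % key.length]);
  }
  return plain;
}

const sampleKey = [3, 1, 4];
const sampleCiphers = encode("mailto:user@example.com", sampleKey);
console.log(decode(sampleCiphers, sampleKey));
```

    Because the key ships with the page right next to the cypher text, anyone who can read the script can reverse it.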

    Here's the relevant part, rewritten in Perl:

    sub SetEmailLink {
        my ( undef, $linkCiphers, undef, $key ) = @_;
        my $link = q();
        for( my $i = 0; $i != @$linkCiphers; ++$i ) {
            my $linkChar = $linkCiphers->[$i] - $key->[$i % @$key];
            $link .= chr($linkChar);
        }
        return $link;
    };

    print SetEmailLink(
        'phmain_1_phrightcontent_0_lnkEmail',
        [113,104,125,164,118,187,87,149,128,146,184,114,53,138,161,112,
         176,146,143,76,160,188,112,114,121,154,113,190,132,80,112,152],
        [],
        [4,7,20,56,2,76,29,34,12,45,83]
    );

    Note, there's also an illegal character embedded between "146" and ",184" which you will need to deal with. After removing that, here is the (partially obscured) output:
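    One way to deal with a stray character like that is to throw away everything except digits and commas before splitting the list. This is only a minimal sketch: `parseCipherList` is a hypothetical helper, and the zero-width space in the sample is an assumption about what such a character might look like, not a claim about what is actually on the page.

```javascript
// Hypothetical helper: keep only digits and commas, then split into numbers.
function parseCipherList(raw) {
  return raw
    .replace(/[^0-9,]/g, "")        // Drop brackets, spaces, invisible characters.
    .split(",")
    .filter((part) => part !== "")  // An empty list "[]" yields no entries.
    .map(Number);
}

// Sample with an invisible zero-width space (U+200B) after "146".
console.log(parseCipherList("[128,146\u200B,184,114]"));
```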

    mailto:st###.vind##@silk#####.d#

    Which goes to show, attempting to obscure the target of mailto: links is an inferior approach compared to server-side alternatives.


    Dave

      Wow, thank you for your help and advice! Actually, I will contact the person responsible for the website to ask if it is okay for me to scrape their e-mail addresses. But there's no need to do that if I don't know how to scrape ;-)

        At the risk of upsetting the keepers of the "Thou shalt not parse web pages with regular expressions" commandment (who are almost always correct), here is a fragile solution that works both for the sample input you provided and for the complete web page:

        use strict;
        use warnings;
        use utf8;
        use feature qw/unicode_strings say/;

        my $doc = do{ local $/ = undef; <DATA>; };

        say GetEmailLink($doc) // 'Unable to parse document.';

        sub GetEmailLink {
            my $document = shift;
            my %component = fetch_obscured_email($document);
            return unless keys %component; # Detect and pass along failure to parse.
            my $link = q();
            for( my $i = 0; $i != @{$component{cypher}}; ++$i ) {
                my $linkChar = $component{cypher}[$i]
                             - $component{key}[$i % @{$component{key}}];
                $link .= chr($linkChar);
            }
            return $link;
        };

        sub fetch_obscured_email {
            my $data = shift;
            $data =~ m/
                SetEmailLink\s*\(\s*              # Function name and opening paren (anchor).
                [^,]*,                            # Unwanted first parameter.
                \s* \[ \s* ( [^]]+ ) \s* \] \s*,  # Wanted second parameter.
                [^,]*,                            # Unwanted third parameter.
                \s* \[ \s* ( [^]]+ ) \s* \] \s*   # Wanted fourth parameter.
                \s*\)                             # Closing paren.
            /x or return;                         # Condition: Failure to parse.
            my( $text_param, $key_param ) = ( $1, $2 );
            tr/0-9,//dc for $text_param, $key_param; # Keep only what we need.
            return(
                cypher => [ split /,/, $text_param ],
                key    => [ split /,/, $key_param ]
            );
        }

        __DATA__
        <script type="text/javascript">
        //<![CDATA[
        jQuery(function () {
        SetEmailLink('phmain_1_phrightcontent_0_lnkEmail', [113,104,125,164,118,187,87,149,128,146,184,114,53,138,161,112,176,146,143,76,160,188,112,114,121,154,113,190,132,80,112,152], [], [4,7,20,56,2,76,29,34,12,45,83]);
        });//]]>
        </script>
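        For comparison, the same extraction idea can be sketched in the page's own language. This is a hypothetical illustration, not a drop-in replacement: `extractCipherAndKey` is a made-up name, the sample call uses invented numbers, and the pattern assumes the third argument is always a bracketed list, which is slightly stricter than the Perl regex above. It is just as fragile against changes in how the page writes the call.

```javascript
// Hypothetical sketch: pull the 2nd and 4th arguments out of a SetEmailLink call.
function extractCipherAndKey(html) {
  const pattern =
    /SetEmailLink\s*\(\s*[^,]*,\s*\[([^\]]*)\]\s*,\s*\[[^\]]*\]\s*,\s*\[([^\]]*)\]\s*\)/;
  const match = pattern.exec(html);
  if (match === null) return null; // Call not found or shaped differently.
  // Keep only digits and commas, then convert each entry to a number.
  const toNumbers = (list) =>
    list.replace(/[^0-9,]/g, "").split(",").filter((s) => s !== "").map(Number);
  return { cipher: toNumbers(match[1]), key: toNumbers(match[2]) };
}

// Made-up call with invented values, not data from the real page.
const sample =
  "jQuery(function () { SetEmailLink('someId', [105,107], [], [1,2]); });";
console.log(extractCipherAndKey(sample));
```

        A null result plays the same role as the empty return in fetch_obscured_email: it signals that the call could not be found in the expected shape.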

        Dave