kevyt has asked for the wisdom of the Perl Monks concerning the following question:

Can I capture data such as the ID number below from a URL, or the ID number that is seen when you view the source of the web page?

I would like to capture the ID number below so I can see who is viewing my profile on a website. I can't use JavaScript but I can use Perl. I thought about having an image display on the page that would count the visitors, but I don't know if I can make something read the page to capture that data. There must be a way.

href="http://web_page.com/ID=72330785&MyToken=99999"

Replies are listed 'Best First'.
Re: capture data - html
by madbombX (Hermit) on Jun 29, 2006 at 02:50 UTC
    kevyt,

    Assuming you already have the URL that you want, it's a simple regex. This regex makes a few assumptions, primarily that the URL will always look like the one in your example.

    print "UID: $1\n" if ($url =~ /^http:.*\/ID=([^&]+)&MyToken=/);
    You could also push them onto an array and store all the values that you come across:
    push (@uid_list, $1) if ($url =~ /^http:.*\/ID=([^&]+)&MyToken=/);
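
    As a minimal sketch of the same idea wrapped in a reusable function: extract_id is a hypothetical helper name, and the second URL is invented only to show a non-matching case. The character class [^&]+ stops the capture at the first ampersand.

```perl
use strict;
use warnings;

# Hypothetical helper: return the ID from a URL shaped like the example
# in the question, or undef when the URL does not match that shape.
sub extract_id {
    my ($url) = @_;
    # [^&]+ stops at the first '&', so the capture cannot run past the ID.
    return $url =~ m{^https?://[^/]+/ID=([^&]+)&MyToken=} ? $1 : undef;
}

my @urls = (
    'http://web_page.com/ID=72330785&MyToken=99999',   # from the question
    'http://web_page.com/about.html',                  # invented, no ID
);

# Keep only the URLs that actually carried an ID.
my @uid_list = grep { defined } map { extract_id($_) } @urls;
print "UID: $_\n" for @uid_list;
```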
Re: capture data - html
by dorward (Curate) on Jun 29, 2006 at 08:55 UTC

    I'm going to try to clarify what I think the question is before attempting to answer it:

    There are four parties

    • You
    • A provider of webspace (X) that gives very limited access to what you can add to the page
    • A provider of webspace (Y) that lets you run server side Perl
    • A visitor

    You want to know when someone visits your page on site X.

    Since you have no access to the logs of X, cannot add JavaScript to X, cannot run server side scripting on X, then the only way to get the information is to cause the visitor to make a request to something on Y.

    About the only way to do this is to include an image on that page which is loaded off server Y. This isn't entirely reliable, as users can disable images, disable images from remote sites, and so on.

    Without putting a separate image URL on X for each ID, you can't reliably know the ID.

    HTTP provides a Referer header, which tells you the URL that referred the user agent to another URL (i.e. the URL of the page that linked to the image).

    The referer header is, however, optional and many personal firewall products munge it.

    There is no reliable way to do what you want.
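
    For what it is worth, the image-on-Y idea could be sketched roughly as below. This is only an illustration under assumptions: the script runs as a plain CGI program on Y, log_line and the visitors.log path are hypothetical names, and the ID is only recoverable when the browser sends a Referer that happens to contain it.

```perl
use strict;
use warnings;

# Hypothetical helper: build one tab-separated log line from a
# CGI-style environment hash (timestamp, client address, ID, Referer).
sub log_line {
    my ($env) = @_;
    my $referer = $env->{HTTP_REFERER} // '';   # optional header, may be absent
    my ($id) = $referer =~ /ID=([^&]+)/;        # undef when no ID is present
    return join "\t",
        scalar gmtime(),
        $env->{REMOTE_ADDR} // '-',
        defined($id) ? $id : '-',
        $referer eq '' ? '-' : $referer;
}

# A 43-byte transparent 1x1 GIF, written as hex to keep the source ASCII.
my $pixel = pack 'H*',
      '47494638396101000100800000' . '000000ffffff'
    . '21f9040100000000' . '2c000000000100010000'
    . '0202440100' . '3b';

# Web servers set GATEWAY_INTERFACE for CGI programs; only then do we
# log the request and answer with the image.
if ($ENV{GATEWAY_INTERFACE}) {
    my $log_file = 'visitors.log';              # hypothetical path
    if (open my $log, '>>', $log_file) {
        print {$log} log_line(\%ENV), "\n";
        close $log;
    }
    binmode STDOUT;
    print "Content-Type: image/gif\r\n\r\n", $pixel;
}
```

    The page on X would then reference this script's URL on Y as the image source. When the Referer is missing or rewritten, the ID column is simply logged as "-".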

      You make good points, but instead of images I would rather include a link to a CSS file residing on Y, which is more likely to be loaded. And then, for IE browsers there's a way to sneak JavaScript in via a CSS statement, e.g.

      css:
        a { behavior: url('http://example.com/sneak.htc'); }

      sneak.htc:
        <PUBLIC:ATTACH EVENT="onmouseover" ONEVENT="DoHover()" />
        function DoHover() {
          // JavaScript statements...
        }

      but as said, that only works with buggy browsers full of holes... ;-)

      But that could amount to cheating the provider and should not be undertaken lightly, and if they have a clue, they may block offsite CSS as well (and offsite images too).

      --shmem

      _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                    /\_¯/(q    /
      ----------------------------  \__(m.====·.(_("always off the crowd"))."·
      ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
        Thanks Shmem....

        I will try this later this afternoon :) I informed the provider that I was able to place a mortgage calculator that I created in Perl on their website, which I did not think should be permitted. I guess they don't check for Perl because most kids trying to hack sites don't know Perl. I would not want to hurt their site in any way. They are trying to make money and they are offering a free service. Now if it was Bill Gates' web site ... hmmmm :)

        Thanks,
        Kevin
      dorward,

      You are exactly right with your assumption... thanks
      ahhhh.... You are exactly right.... The site that I am trying to do this on is my-space. I added the "-" so someone won't find this posting on Google. I will give this a try :)

      I always try to find ways to do things when people say it can't be done. I guess I like a challenge :)

      Thanks
Re: capture data - html
by rsriram (Hermit) on Jun 29, 2006 at 07:08 UTC

    Hi, if you want to store the ID value in a variable, you can use the following (assuming that the content of the href is stored in $str):

    $str =~ /ID=([^&]+)&/

    With the above regex, $1 will contain the ID value. If you want to add the ID to an array, use

    if ($str =~ /ID=([^&]+)&/) {push (@idlist, $1)}
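
    If the string holds a whole page rather than a single href, one possible variation is to use /g in list context, which returns every capture at once. This is just a sketch: the sample markup is invented for illustration, and only the first link comes from the question.

```perl
use strict;
use warnings;

# Hypothetical sample markup; the second link is invented.
my $html = <<'HTML';
<a href="http://web_page.com/ID=72330785&MyToken=99999">one</a>
<a href="http://web_page.com/ID=12345&amp;MyToken=11111">two</a>
HTML

# In list context, a /g match returns all captured groups in order.
# The class also excludes '"' so a capture never runs past an attribute.
my @idlist = $html =~ /ID=([^&"]+)&/g;
print "$_\n" for @idlist;
```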

Re: capture data - html
by kevyt (Scribe) on Jun 29, 2006 at 03:48 UTC
    Thanks... I wish I could do that, but I don't have the URL.

    I have a profile on a website and I am trying to determine who looks at my profile. I only have access to edit a few text boxes and they block JavaScript.

    I have my own website which has Perl, so I wanted to place code in the text boxes on one site and save the logs on another.