Re: Perl Programming Logic
by VSarkiss (Monsignor) on Jul 01, 2002 at 18:49 UTC
This isn't really a Perl problem, it's a data problem.
In general, you can't tell how long someone spent reading a web page by looking at logs. The problem is that the log only records when the browser sent out a GET, POST, or other HTTP request, and (usually) when the web site responded. You can't tell from that how long someone "interacted" with a site. I can download a Java or Flash game from a site with one GET, then spend several hours playing with nothing getting logged. Similarly, I can retrieve a single page, close the browser, and retrieve another page several hours later. The log can't tell you that I didn't even have the browser open in between those two requests.
Re: Perl Programming Logic
by caedes (Pilgrim) on Jul 01, 2002 at 18:50 UTC
It seems that your understanding of the client-server communication for HTTP is a bit inaccurate. All the server sees when a user visits the site is "give me xxxxxx", possibly followed by "give me yyyyyyy". You might interpret that to mean the user spent the time between those two requests looking at the page, but that isn't necessarily the case. Another point is that it is impossible to tell when a user "closed the browser"; however, you can assume they have left your site once they haven't requested a document for a given length of time. The whole point here is that you have to make reasonable assumptions in order to get some idea of what might have happened.
As for solving your problem, I would probably split the log files up by IP address (which may or may not stay the same for a given user, but that is another discussion), then set a length of time that you consider too long to be viewing one page, say one hour. Whenever an hour goes by between requests from a single user, you interpret that to mean the person quit surfing.
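Something along these lines might work. This is just a sketch: it assumes a common-log-format access log, the one-hour cutoff is arbitrary, and it ignores the timezone offset in the timestamp for simplicity.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Time::Local;

    my $TIMEOUT = 3600;    # an hour of silence ends a "session"

    my %last_seen;         # ip => epoch time of the most recent request
    my %session_time;      # ip => total seconds attributed to browsing

    my %mon = (Jan=>0, Feb=>1, Mar=>2, Apr=>3, May=>4,  Jun=>5,
               Jul=>6, Aug=>7, Sep=>8, Oct=>9, Nov=>10, Dec=>11);

    while (my $line = <>) {
        # common log format: ip - - [01/Jul/2002:18:49:00 -0500] "GET /x HTTP/1.0" ...
        my ($ip, $d, $mo, $y, $h, $mi, $s) =
            $line =~ m{^(\S+) \S+ \S+ \[(\d+)/(\w+)/(\d+):(\d+):(\d+):(\d+)}
            or next;
        my $t = timegm($s, $mi, $h, $d, $mon{$mo}, $y);   # TZ offset ignored

        if (exists $last_seen{$ip} && $t - $last_seen{$ip} < $TIMEOUT) {
            $session_time{$ip} += $t - $last_seen{$ip};
        }
        $last_seen{$ip} = $t;
    }

    printf "%-15s %6d seconds\n", $_, $session_time{$_} || 0
        for sort keys %last_seen;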
I hope this helps you out some. ;-)
-caedes
By the way, I know this must be possible as I used to work for a company that used a product (relatively expensive, I might add) called "WebTrends for firewalls & VPN's" that would do exactly what I would like to do.
It is trivial to produce a number and claim it means something.
It is much harder to produce a number that really means what you have claimed.
The fact that a proprietary product claims to accomplish a goal is not always very good evidence that that goal is, in fact, technically accomplishable.
Re: Perl Programming Logic
by newrisedesigns (Curate) on Jul 01, 2002 at 19:57 UTC
You'll either need a better tracking system (something other than a log), or you'll have to get creative.
For tracking, might I suggest first pulling out all instances of each user and dumping them into one group. Put all your 192.168.1.214's into one group, and all your other IPs into their own groups. Then check all the URLs each person requested. To make it easier, you could strip out requests to known places like ads.x10.com, since such a request is most likely a pop-up and not an intended one.
Creativity steps in here. You will have to assume that a user who requests /index.html, /left.html, and /right.html within 5 seconds just loaded a frameset. Consider that one request. Now your user gets /index.html, then /blue.html, then /red.html, then /green.html over a 3-minute period. That's four requests which did not occur within a few seconds of one another. The user should be considered "surfing" for those 3 minutes, because those requests were more than likely made by a human.
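A rough sketch of that heuristic, assuming the requests for one visitor have already been pulled out and sorted by time; the five-second window and the ads.x10.com pattern are only illustrative:

    use strict;
    use warnings;

    # Each request is [ epoch_seconds, url ], already limited to a single
    # visitor and sorted by time.
    sub count_page_views {
        my ($requests, $window) = @_;
        $window ||= 5;                    # requests within 5s = one page view

        my @views;
        for my $r (@$requests) {
            next if $r->[1] =~ m{ads\.x10\.com};   # drop likely pop-up traffic

            if (@views && $r->[0] - $views[-1]{last} <= $window) {
                $views[-1]{last} = $r->[0];        # same frameset / page load
            }
            else {
                push @views, { first => $r->[0], last => $r->[0] };
            }
        }
        return scalar @views;
    }

    # e.g. index/left/right one second apart, then /blue.html much later:
    my $n = count_page_views([ [0,'/index.html'], [1,'/left.html'],
                               [2,'/right.html'], [200,'/blue.html'] ]);
    # $n is 2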
Bringing relative time into the situation can cause some headaches, too. When I surf, I usually have 3-10 windows open. You will need some way to distinguish a clicked link from a cold request, and a method to determine what kind of information is held on the requested page. You could do that with LWP: read through the file to see if there are tags for Shockwave games, streaming video, or large amounts of text (online books), and draw conclusions from those results and your request log.
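One way to do that kind of check with LWP; this is only a sketch, and the tags and the size cutoff it looks for are just examples:

    use strict;
    use warnings;
    use LWP::UserAgent;

    # Fetch a URL that showed up in the log and take a rough guess at
    # what kind of content it holds.
    sub classify_url {
        my ($url) = @_;
        my $ua  = LWP::UserAgent->new( timeout => 10 );
        my $res = $ua->get($url);
        return 'unreachable' unless $res->is_success;

        my $html = $res->content;
        return 'embedded media' if $html =~ /<(?:embed|object)\b/i;  # Flash, video, ...
        return 'long text'      if length($html) > 100_000;          # arbitrary cutoff
        return 'ordinary page';
    }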
It can be done, but it will be difficult without some other form of recorded information. If you have the resources, you could set up a web proxy that monitors clicked links and observe information that way. Or you could avoid Perl altogether, install BackOrifice on the machine you want to monitor, and watch what your users are viewing.
Anonymonk, might I suggest that you create a user name and stay awhile. :)
John J Reiser
newrisedesigns.com
Re: Perl Programming Logic
by perigeeV (Hermit) on Jul 01, 2002 at 19:08 UTC
To accurately track a person's click path you need more than logs. Log information cannot differentiate between users sharing a proxy, or between different users sequentially assigned the same communal IP address, like dialup ISP users.
You can assign a session ID to a user and track that ID number. Super Search for "maintaining state" or some such.
To really know how long someone is viewing a page you would have to use a client refresh at timed intervals. For instance, you could have some javascript that updates a dummy one-pixel image at regular intervals. Each refresh would include that user's unique ID, so you just sum the times between refreshes.
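The server-side half of that might look like the sketch below. It assumes you've already pulled (epoch time, session id) pairs for the dummy-image hits out of the log; the data layout is an assumption, not anything from the logs discussed above.

    use strict;
    use warnings;

    # Given [ epoch_seconds, session_id ] pairs for the dummy-image
    # refreshes, sum the gaps between consecutive refreshes per session.
    sub viewing_time {
        my @hits = @_;
        my (%last, %total);

        for my $hit (sort { $a->[0] <=> $b->[0] } @hits) {
            my ($t, $id) = @$hit;
            $total{$id} += $t - $last{$id} if exists $last{$id};
            $last{$id} = $t;
        }
        return \%total;    # session_id => seconds the page was open
    }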
Re: Perl Programming Logic
by Abigail-II (Bishop) on Jul 02, 2002 at 10:12 UTC
Your biggest problem is not the logic of the problem, but the logic of what you want to do.
You are data mining HTTP log files. HTTP is essentially a stateless, sessionless protocol, yet you want to measure the length of "sessions" somehow.
You're in for a failure. You want to measure something that isn't really there. And it isn't just "quitting browsing" that will spoil your day. When I hit "preview" in a minute, it will likely take a while before the request reaches perlmonks, perlmonks does what it wants to do, the response comes back, the ad has been fetched, and the page is displayed. I'll switch to IRC, p5p, or some actual work before I return my attention to the preview page. There might be 20 minutes between hitting 'preview' and 'submit' on the next page. Did I "browse" for 20 minutes? No, I probably won't even spend 20 seconds.
Oh, did I mention that the user name for the proxy I'm using is shared with a whole bunch of people, and that we're rotating between several proxies? That would really screw up your analysis, wouldn't it? ;-)
My suggestion: give up on the idea. It's utterly useless: the data you have can't measure what you want to measure, and what you want to measure doesn't have much connection to what you want to know anyway.
Abigail
Re: Perl Programming Logic
by grantm (Parson) on Jul 02, 2002 at 11:48 UTC
The author of Analog has written an article on how the web works and what can and can't be determined by analysing logs.
I have spent quite a lot of time working with both Analog and WebTrends. I recommend the former (with Report Magic) for people who want to understand site usage patterns, and the latter for people who have lots of money and a willingness to base business decisions on nicely presented but completely meaningless numbers. Even when I phrase it exactly like that, it's amazing how many clients go for the latter.
Re: Perl Programming Logic
by arc_of_descent (Hermit) on Jul 02, 2002 at 13:59 UTC
Hi,
If the time spent retrieving a particular page matches what you mean by "time spent viewing the page", then that value could be of use to you.
For example, the squid proxy server records in its logs the time spent, in milliseconds, retrieving each web object.
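Summing that column per client is then a one-liner-ish job; the field positions below assume squid's default native access.log format (time, elapsed ms, client address, ...), so adjust them if you use a custom logformat:

    use strict;
    use warnings;

    # squid native format: time elapsed-ms client code/status bytes method URL ...
    my %ms;

    while (my $line = <>) {
        my (undef, $elapsed, $client) = split ' ', $line;
        next unless defined $client and $elapsed =~ /^\d+$/;
        $ms{$client} += $elapsed;
    }

    printf "%-15s %10.1f s\n", $_, $ms{$_} / 1000 for sort keys %ms;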
--
arc_of_descent