Logfile parsing (Moved from Q&A)

Russ has asked for the wisdom of the Perl Monks concerning the following question:

This question moved from Categorized Questions and Answers by Q&AEditors

Hi, I need anyone's help. I need to pull information from web logs that contain the string
"GET /company/newsletter.html?id+"

I need to use regular expressions to pull the digits after the + sign. I have tried using /\+(\d+)/ so that it grabs the digits after the + sign, but that doesn't seem to work.

Does anyone know how to do this? You'll be a lifesaver.

Below is how the data looks in the file:

(All on one line. Broken here for horizontal conservation -- Editor)
141.85.128.29 - - [25/Jul/2000:05:48:16 -0700] "GET /company/newsletter.html?id+3541596
HTTP/1.0" 200 2579 www.xdrive.com
"http://us.f12.mail.yahoo.com/ym/ShowLetter?MsgId=8894_591984_4628_766_15951_0&YY=91113&inc=50&order=down&sort=date&pos=0&box=Inbox"
"Mozilla/4.7 [en] (Win98; I)" "141.85.128.29.11162963062035354"
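For what it's worth, the pattern from the question does capture the id when it is run against the sample entry rejoined into a single line, so the trouble may lie elsewhere in the script (how the file is opened or read, for instance). A minimal sketch, assuming the entry looks exactly as shown above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The sample log entry from the question, rejoined into one line
# (assumed to match the real log byte for byte).
my $line = '141.85.128.29 - - [25/Jul/2000:05:48:16 -0700] '
         . '"GET /company/newsletter.html?id+3541596 HTTP/1.0" 200 2579 www.xdrive.com '
         . '"http://us.f12.mail.yahoo.com/ym/ShowLetter?MsgId=8894_591984_4628_766_15951_0'
         . '&YY=91113&inc=50&order=down&sort=date&pos=0&box=Inbox" '
         . '"Mozilla/4.7 [en] (Win98; I)" "141.85.128.29.11162963062035354"';

# The regex from the question: a literal '+' followed by captured digits.
# The first '+' in this line is the one in 'id+', so $1 holds the id.
my $id;
if ($line =~ /\+(\d+)/) {
    $id = $1;
}

print defined $id ? "id: $id\n" : "no match\n";
```

Running it prints `id: 3541596`.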


Re: Logfile parsing (Moved from Q&A) by steveAZ98 (Monk) on Jul 27, 2000 at 06:16 UTC
If the id number is all you want, you can get it using this: `$s =~ s/.*\+(\d+)\s.*/$1/;` If you need the full line that contains that specific page, including all the information, you should be able to use this: `print $s if $s =~ /^.*GET\s\/company\/newsletter\.html\?id\+(\d+)\s.*$/;` Notice that I save the id number in $1 if you need it for further processing. HTH. Update: I just read Death to Dot Star by Ovid, so there are probably better ways to do this. If I have a chance, I'll see if I can figure something out.
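In the spirit of the update, here is one way the dot-stars could be dropped. For a single log line, an unanchored match already scans the whole string, so the leading `^.*` and trailing `.*$` add nothing; `m{}` delimiters also save escaping every `/` in the path. A sketch only; the contents of `$s` are assumed for illustration:

```perl
use strict;
use warnings;

# Hypothetical input: one request line in the format shown in the question.
my $s = '141.85.128.29 - - [25/Jul/2000:05:48:16 -0700] '
      . '"GET /company/newsletter.html?id+3541596 HTTP/1.0" 200 2579 www.xdrive.com';

# No .* needed: the match succeeds wherever the request appears in the line,
# and $s itself is left untouched (unlike the s/// version).
my $id;
if ($s =~ m{"GET /company/newsletter\.html\?id\+(\d+)\s}) {
    $id = $1;    # the digits following 'id+'
    print "id $id found in: $s\n";
}
```

Because this is a match rather than a substitution, you keep both the full line and the captured id.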
Re: Logfile parsing (Moved from Q&A) by chromatic (Archbishop) on Jul 27, 2000 at 04:10 UTC
Something like the following ought to do it: `my @ids; while (<LOG>) { if (/id\+(\d+)/) { push @ids, $1; } }` That's not substantially different from what you have, so you should probably post the code you're using so we might have more detail.
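A self-contained version of that loop, for trying it without a log file at hand. It opens a string as an in-memory filehandle (supported since Perl 5.8) in place of `LOG`; both sample lines, including the non-matching `/index.html` request, are made up for illustration. With a real file you would write something like `open my $log, '<', $path or die ...`, where `$path` is a placeholder.

```perl
use strict;
use warnings;

# Made-up log content standing in for the real file.
my $log_text = join '',
    qq{141.85.128.29 - - [25/Jul/2000:05:48:16 -0700] "GET /company/newsletter.html?id+3541596 HTTP/1.0" 200 2579\n},
    qq{141.85.128.29 - - [25/Jul/2000:05:49:02 -0700] "GET /index.html HTTP/1.0" 200 1024\n};

# Read the string as if it were a file.
open my $log, '<', \$log_text or die "Cannot open in-memory log: $!";

my @ids;
while (my $line = <$log>) {
    # Collect the digits following 'id+'; lines without it are skipped.
    push @ids, $1 if $line =~ /id\+(\d+)/;
}
close $log;

print "Found ids: @ids\n";
```

Running it prints `Found ids: 3541596`.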
Re: Logfile parsing (Moved from Q&A) by young perlhopper (Scribe) on Jul 27, 2000 at 04:18 UTC
When I'm having trouble with a regex, I try as many different correct variations as I can and see whether any of them work. This helps me rule out bad escaping and the like as possible causes. You might try the following: `/\?id\+(\d+)/`, `/.id.(\d+)/`, `/(\d{7})/`, `/.id.([^ ]+)/`, and whatever else you can think of. Try reducing it to the simplest regex you can think of that will work, even if you know it won't work in the general case or isn't as robust as you want it to be. If none of this helps, it's time to run the debugger on your code and make sure the string is being read in as you think it should be. Good Luck, Mark
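The try-several-variations approach is easy to script: put the candidate patterns in a list of `qr//` objects and run each one against the same input. A minimal sketch using the four patterns suggested above, assuming the request portion of the sample line as input:

```perl
use strict;
use warnings;

# The request portion of the sample line from the question.
my $line = '"GET /company/newsletter.html?id+3541596 HTTP/1.0" 200 2579';

# Candidate patterns, from most to least specific.
my @candidates = (
    qr/\?id\+(\d+)/,
    qr/.id.(\d+)/,
    qr/(\d{7})/,       # first run of seven digits; works here only by luck
    qr/.id.([^ ]+)/,
);

my @found;
for my $re (@candidates) {
    if ($line =~ $re) {
        push @found, $1;
        print "$re matched: $1\n";
    }
    else {
        push @found, undef;    # keep positions aligned with @candidates
        print "$re did not match\n";
    }
}
```

If every candidate matches a hand-built string but none matches what the script reads, the input rather than the regex is the likely culprit.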
Re: Logfile parsing (Moved from Q&A) by fgcr (Novice) on Jul 28, 2000 at 05:29 UTC
Thanks guys... I ended up using `/\/company\/newsletter\.html\?id.(\d+)/;` to extract the id. Your suggestions really helped. Thanks again. fiorela
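A quick sketch of the final regex run over a few hypothetical request strings. Because `.` matches any single character, the pattern is a little looser than `\?id\+`: it accepts `id+` but also `id=` and similar variants.

```perl
use strict;
use warnings;

# The regex from the reply above; '.' after 'id' matches any one character.
my $re = qr/\/company\/newsletter\.html\?id.(\d+)/;

# Hypothetical request strings for illustration.
my @requests = (
    'GET /company/newsletter.html?id+3541596 HTTP/1.0',
    'GET /company/newsletter.html?id=42 HTTP/1.0',
    'GET /index.html HTTP/1.0',
);

my @ids;
for my $req (@requests) {
    push @ids, $1 if $req =~ $re;
}

print join(',', @ids), "\n";    # prints 3541596,42
```

If only the `+` form should count, `\?id\+(\d+)` is the stricter choice.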