multiline regex

jcr has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I am trying to read a file consisting of HTTP GETs and POSTs. I've got the GETs working, but I am having problems getting the POSTs working. I want to construct a complete HTTP POST request from the info in the file (plus some other stuff).

Here is what the file I'm reading looks like:

get /servlet/folder/Create

post /servlet/folder/CreateAction
content-type: application/x-www-form-urlencoded
content-length: 38

FolderName=Data+Source+Tests&B1=Create
get /servlet/folder/List

get /pix/jsmenu.js

post /servlet/datacab/FileNewUploadAction
content-type: multipart/form-data; boundary=---------------------------7cf29e18894
content-length: 1632

-----------------------------7cf29e18894
content-disposition: form-data; name="UploadedFile"; filename="mylargefile"
content-type: text/plain

Index    Date Created    CCK    Std    suite    city    state    ZIP/Postal    Country    first    title    type of biz    distribution    bundle?     6 mo forecast    next 6 month

I want to isolate (1) the relative URLs, (2) the content length, and (3) the entire POST body.

How can I do this with a simple regex?

This is what I have so far, but it does not work!

open(FILE, $scriptfile) || die "Cannot Open $scriptfile\n$!\n";

undef $/;
$_ = <FILE>;

    if ($_ =~ /post\s(.+\n)(.+\n)content-length:\s(\d+\n\n)(.+\n)/s) {
        print "relURL: $1\nContent length: $3\npost_body: $4\n\n";
    }
    

close(FILE);

thanx,

jcr


Replies are listed 'Best First'.
RE: multiline regex by bastard (Hermit) on Oct 17, 2000 at 04:44 UTC
As I see it, you have 2 problems:

1. Parsing a text file for action information
2. Composing HTTP requests

I will try to address them in that order (though maybe not very well).

1. Parsing a text file. To start with, is there any need to do a multiline regex? Unless you've formatted the data file for our sake, one line at a time ought to work just fine.

foreach $line (@file) {
    chomp $line;    # chomp strips only the newline; chop would eat any last character
    if ($line =~ /^get (.+)$/) {
        push (@getrequest, $1);
    }
    if ($line =~ /^post (.+)$/) {
        $postrequest = $1;
        $dataread = "true";
    }
    if ($dataread eq "true") {
        # Request headers: only looked at before a multipart body has started
        if ($content_flag ne "true" && $line =~ /^content-type: (.+)/) {
            $post{$postrequest}{'content-type'} = $1;
            if ($post{$postrequest}{'content-type'} =~ /; boundary=(.+)/) {
                $boundary = $1;
            } else {
                $boundary = "";
            }
        }
        if ($content_flag ne "true" && $line =~ /^content-length: (.+)/) {
            $post{$postrequest}{'content-length'} = $1;
        }
        # No boundary: the body is the single line after the blank line
        if ($contentnextline eq "true") {
            $post{$postrequest}{content} = $line;
            $contentnextline = "false";
            $dataread = "false";
        } elsif ($line eq "" && $boundary eq "") {
            $contentnextline = "true";
        }
        # With a boundary: body lines in the file carry two extra leading dashes
        if ($boundary ne "" && $line eq "--$boundary" && $content_flag eq "true") {
            # second boundary line: the body is complete
            $content_flag = "false";
            $dataread = "false";
            $post{$postrequest}{content} = [@content];
            @content = ();
        } elsif ($boundary ne "" && $line eq "--$boundary") {
            $content_flag = "true";
        }
        if ($content_flag eq "true") {
            push (@content, "$line\n");
        }
    }
}

Now this should (I haven't tested it) create an array and a hash of hashes with all the data you need for part 2.

Update: Ack! This post made it through. My browser hung on the submit. I didn't think it made it, and I had to run out before finishing....

Now for part 2. You may not want to do it this way, but use the "LWP" module. It is going to give you the best, cleanest, and simplest way to do the HTTP submits. Its documentation should contain everything you need to create the HTTP requests. Just loop through the elements in the get array and the hash, and use the data to formulate the requests.

BTW- I have not tested any of the code. It would probably have to be debugged and tweaked to fit your needs.
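To make part 2 a bit more concrete, here is a minimal sketch, deliberately not using LWP, of how the parsed pieces could be put together as raw request text. The helper name `build_post_request` and the host `www.example.com` are made up for illustration (the thread never says which server the requests go to), and in practice LWP would build and send this for you.

```perl
use strict;
use warnings;

# Hypothetical helper: assemble raw HTTP/1.0 request text from the
# pieces pulled out of the script file. Not part of LWP.
sub build_post_request {
    my ($host, $url, $content_type, $body) = @_;
    my $crlf = "\015\012";    # HTTP header lines end in CR LF

    return join $crlf,
        "POST $url HTTP/1.0",
        "Host: $host",
        "Content-Type: $content_type",
        # Recomputed from the body rather than trusted from the file;
        # length() counts characters, which equals bytes for ASCII data.
        "Content-Length: " . length($body),
        "",                   # blank line separates headers from body
        $body;
}

# Values taken from the first POST in the question; the host is an assumption.
my $request = build_post_request(
    'www.example.com',
    '/servlet/folder/CreateAction',
    'application/x-www-form-urlencoded',
    'FolderName=Data+Source+Tests&B1=Create',
);
print $request, "\n";
```

Recomputing the length from the body is a design choice: if the body gets edited before sending, the header stays consistent with it.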
Re: multiline regex by chromatic (Archbishop) on Oct 17, 2000 at 04:26 UTC
You can use the /m or /s modifiers on your regex to match newlines. See perlre for more details. (Search for 'newline' and it'll put you right there.) The former makes the ^ and $ anchors match at the start and end of every line in your string, and the latter makes . match newlines. Subtly different, but very useful.
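A small sketch of the difference, using a made-up three-line string in the same shape as the file in the question:

```perl
use strict;
use warnings;

# Made-up sample text, shaped like the script file in the question.
my $text = "get /a\npost /b\ncontent-length: 38\n";

# Without /m, ^ matches only at the very start of the string.
my @no_m   = $text =~ /^(\w+) /g;     # finds "get" only
# With /m, ^ matches at the start of every line.
my @with_m = $text =~ /^(\w+) /mg;    # finds "get" and "post"

# Without /s, . stops at a newline, so (.+) captures the rest of one line.
my ($one_line) = $text =~ /post (.+)/;
# With /s, . also matches newlines, so a greedy (.+) runs to the end of the string.
my ($greedy)   = $text =~ /post (.+)/s;

print "no /m:   @no_m\n";
print "with /m: @with_m\n";
print "no /s:   [$one_line]\n";
print "with /s: [$greedy]\n";
```

The last case is likely what bites the regex in the question: under /s every greedy `(.+\n)` is free to swallow many lines, so the captures can stretch across several requests instead of stopping at the end of one line.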
Re: multiline regex by AgentM (Curate) on Oct 17, 2000 at 01:51 UTC
This sounds like a job for CGI.pm. It will give you a nifty hashref in the header() for this kind of stuff. Also, you may want to look into the UNIX `file(1)` command, which has a special place in its heart for MIME types. If you're dealing with images ONLY (it doesn't look like it, though), you should use Image::Magick, which will cough up the MIME type for images. In any case, there is no need to roll your own (and it's safer) when you can use a tried and true module or program (file) for your MIMEing needs.

Q: If Outlook automatically opens certain MIME types, and no one is around to see it, does the embedded virus remain dormant?

AgentM Systems or Nasca Enterprises is not responsible for the comments made by AgentM- anywhere.
Re: multiline regex by jcr (Initiate) on Oct 17, 2000 at 03:45 UTC
Actually, I'd rather not use CGI.pm to do this. It would be nice to know how to match multiple lines with a Perl regex. For example, how do you match this:

line 0=don't match me
line 1=match me!
line 2=don't match me either
line 3=match me too!
line 4=empty
line 5=match me three!
line 6=empty

Make all these matched lines belong to 'line 1', then match another line as well as its corresponding set of lines.
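One way to read "a line plus the set of lines that belong to it" is to cut the slurped text into records first, then run a small regex on each record. Below is a minimal sketch under some assumptions: the sub name `parse_script` is made up, every request starts on a line beginning with `get ` or `post `, no body line starts that way, and the file adds exactly one newline after each body. It has not been tried against the multipart upload entry.

```perl
use strict;
use warnings;

# Hypothetical helper: split slurped script text into requests.
sub parse_script {
    my ($data) = @_;
    my @requests;

    # /m lets ^ match at every line start; the lookahead splits in front of
    # each "get " or "post " line without consuming any text.
    for my $record (split /^(?=(?:get|post) )/m, $data) {
        if ($record =~ /\Aget (\S+)/) {
            push @requests, { method => 'get', url => $1 };
        }
        # /s lets . cross newlines; the lazy (.*?) stops at the first blank
        # line, so headers and body land in separate captures.
        elsif ($record =~ /\Apost (\S+)\n(.*?)\n\n(.*)\z/s) {
            my ($url, $headers, $body) = ($1, $2, $3);
            my ($length) = $headers =~ /^content-length:\s*(\d+)/m;
            $body =~ s/\n\z//;    # assumption: one newline follows the body
            push @requests,
                { method => 'post', url => $url, length => $length, body => $body };
        }
    }
    return @requests;
}

# The first few entries from the file in the question.
my $data = <<'END';
get /servlet/folder/Create

post /servlet/folder/CreateAction
content-type: application/x-www-form-urlencoded
content-length: 38

FolderName=Data+Source+Tests&B1=Create
get /servlet/folder/List

END

for my $req (parse_script($data)) {
    print "$req->{method} $req->{url}\n";
    print "  content length: $req->{length}\n  body: $req->{body}\n"
        if $req->{method} eq 'post';
}
```

Splitting first keeps each regex anchored to a single request, so a greedy or lazy match cannot wander into the next one. Taking the body by its stated content length would be more robust than relying on line starts, but it needs the file's byte counts to be exact.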
