The URLs I'm downloading the data from look like this:
http://www.domain.com/data/2005/sales/01012005.txt
http://www.domain.com/data/2005/sales-jan/01232005.txt
http://www.domain.com/data/2005/sales-local/01012005.txt
http://www.domain.com/data/2005/sales-outside-jan/01012005.txt
...
...
What I want to extract from this is:
sales
sales-jan
sales-local
sales-outside-jan
...
...
The regex I have come up with to extract the information is:
$dir = $1 if /\/(\w+(|-\w+|-\w+-\w+))\/\w+\.txt$/;
This regex appears to be working correctly. My question is: am I going about this the right way? Could I have shortened the regex somehow?
Thanks,
Mike
Update: Corrected typo in regex that GrandFather found.
Thanks GrandFather and ikegami for the suggestions. Yes, I should have used a different delimiter; the leaning toothpicks are confusing. I understand GrandFather's suggestion, but I'll have to study ikegami's suggestion a bit. Thanks!
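For reference, here is a minimal sketch of the same match written with m{...} delimiters, so the slashes need no escaping. The {0,2} quantifier is one possible way to shorten the alternation and is an assumption here, not necessarily what was suggested in the replies. The sample URLs are the ones listed above, and @urls and @dirs are names chosen only for this illustration.

```perl
use strict;
use warnings;

# Sample URLs taken from the question above.
my @urls = (
    'http://www.domain.com/data/2005/sales/01012005.txt',
    'http://www.domain.com/data/2005/sales-jan/01232005.txt',
    'http://www.domain.com/data/2005/sales-local/01012005.txt',
    'http://www.domain.com/data/2005/sales-outside-jan/01012005.txt',
);

my @dirs;
for (@urls) {
    # m{...} avoids escaping "/". (?:-\w+){0,2} allows zero, one or two
    # hyphenated suffixes, which covers the same cases as the
    # original (|-\w+|-\w+-\w+) alternation.
    push @dirs, $1 if m{/(\w+(?:-\w+){0,2})/\w+\.txt$};
}

print "$_\n" for @dirs;
```

If directory names with more than two hyphens can turn up, the {0,2} could be relaxed to *, at the cost of being less strict than the original pattern.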
Update II: Thanks to all for the great responses / ideas. Thanks to davidrw & YuckFoo for their suggestions on the split. Frankly I hadn't even thought of that. I became so wrapped up in the regex to get the directory, I hadn't even thought about the filename yet. Clearly a case of not seeing the forest for the trees.
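The split idea can be sketched roughly as follows. This is an illustration under assumptions, not the exact code davidrw or YuckFoo posted: it assumes the directory and filename are always the last two slash-separated pieces of the URL, and $url, $dir and $file are names chosen only for this example.

```perl
use strict;
use warnings;

# One of the sample URLs from the question.
my $url = 'http://www.domain.com/data/2005/sales-outside-jan/01012005.txt';

# Split on "/" and take the last two pieces: directory, then filename.
my ($dir, $file) = (split m{/}, $url)[-2, -1];

print "$dir $file\n";
```

This gets the directory and the filename in one step, but unlike the regex it does not check that the filename ends in .txt.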