in reply to Re^2: REGEX for url
in thread REGEX for url
use strict;
use warnings;

for (<DATA>) {
    print if s/.*a href="(.*)".*/$1/;
}

__DATA__
<td scope="row">9</td>
<td scope="row">SUBSIDIARIES OF THE REGISTRANT</td>
<td scope="row"><a href="/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0009.txt">0009.txt</a></td>
<td scope="row">EX-21.1</td>
Output:
C:\Users\James\Desktop\perlmonks>perlmonks.pl
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0009.txt
EDIT: It seems that $/ = "</html>"; changes the input record separator in such a way that it completely breaks the simple regex. Do you have any links to documentation on $/ = "</html>";?
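For reference, $/ is described in perlvar as the input record separator: it decides where each <FH> read ends. The following is only a minimal sketch of that effect, assuming a made-up HTML snippet and a hypothetical helper named read_records; it is not the EDGAR page or the code from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample input; not the actual EDGAR page from the thread.
my $html = qq{<html>\n<td><a href="/one.txt">one</a></td>\n<td>EX-21.1</td>\n</html>};

# Read every record from an in-memory filehandle using the given separator.
# read_records is an illustrative helper, not part of any library.
sub read_records {
    my ($text, $separator) = @_;
    local $/ = $separator;    # restored automatically when the sub returns
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @lines = read_records($html, "\n");         # default: one record per line
my @pages = read_records($html, "</html>");    # one record per page, newlines embedded

printf "line records: %d\n", scalar @lines;
printf "page records: %d\n", scalar @pages;
```

With "\n" as the separator each line arrives as its own record, so the substitution sees one line at a time. With "</html>" the whole page arrives as a single record. Because . does not match a newline without /s, the substitution then rewrites only the one line containing the match, and print emits the rest of the record unchanged.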
Re^4: REGEX for url
by wrkrbeee (Scribe) on Apr 25, 2016 at 21:28 UTC
Not sure if this helps, but the full text block, from <html> through </html> appears below. Just using $/ as a way to indicate the end of a record. I apologize for wasting your time.
by Marshall (Canon) on Apr 25, 2016 at 22:24 UTC
Re^4: REGEX for url
by wrkrbeee (Scribe) on Apr 25, 2016 at 21:09 UTC
by NetWallah (Canon) on Apr 25, 2016 at 21:19 UTC
It can also handle multiple URLs.
This is not an optical illusion, it just looks like one.
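The code this reply refers to is not preserved here. As an illustration only, one common way to collect several URLs from a single string is a global match in list context. The sketch below assumes double-quoted href attributes, made-up sample rows, and a hypothetical helper named extract_hrefs.

```perl
use strict;
use warnings;

# Return every double-quoted href value found in a string.
# extract_hrefs is an illustrative helper; call it in list context.
sub extract_hrefs {
    my ($text) = @_;
    # [^"]* stops at the first closing quote, so each capture is one URL.
    return $text =~ /<a\s+href="([^"]*)"/gi;
}

# Hypothetical sample rows, not data from the thread.
my $rows = qq{<td><a href="/a/0009.txt">0009.txt</a></td>\n}
         . qq{<td><a href="/a/0010.txt">0010.txt</a></td>\n};

print "$_\n" for extract_hrefs($rows);
```

A regex like this is fine for quick, regular input such as these table rows. For arbitrary HTML a real parser is generally the more robust choice.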
by ExReg (Priest) on Apr 25, 2016 at 22:07 UTC
I'm not able to check it on my machine, but wouldn't /s be helpful here, so that . can match across the newlines?
print if s/.*a href="(.*)".*/$1/s;
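A small sketch of what /s changes when the whole block is one record, assuming a made-up two-line record rather than the real page. With /s the greedy (.*) is free to cross newlines, so it runs to the last double quote in the record. A negated character class keeps the capture inside the href value.

```perl
use strict;
use warnings;

# Hypothetical two-line record, as it might look when read with $/ = "</html>".
my $record = qq{<td><a href="/a/0009.txt">0009.txt</a></td>\n<td scope="row">EX-21.1</td>\n};

# Greedy capture with /s: (.*) crosses the newline and ends at the last quote.
(my $greedy = $record) =~ s/.*a href="(.*)".*/$1/s;

# Negated character class: the capture stops at the first closing quote.
(my $bounded = $record) =~ s/.*a href="([^"]*)".*/$1/s;

print "greedy:  $greedy\n";
print "bounded: $bounded\n";
```

Note that the leading .* is greedy as well, so with several links in one record either form keeps only the last one.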