sri1230 has asked for the wisdom of the Perl Monks concerning the following question:
I have a bunch of div tags in a text file, as shown below. I am trying to parse out the different pieces inside each div. My output should be a hash that looks like the one under OUTPUT. What's the best way to do this? Can I use a regex, or HTML parsing of some sort? Thanks in advance for your help.
OUTPUT
    $VAR1 = {
        'http://xxx.com/Java-Architect-Technisource-Richmond-VA_0ad856f5dc92e277ee34526dc9d3b973.html' => [
            'Java Architect - Technisource - Richmond, VA',
            'Mon, 18 Jan 2010 02:17:16 GMT',
            's degree in Information Technology or related systems... technology standards Ability to keep up with industry trends, relevant system development technologies...'
        ],
        etc
    }
INPUT FILE
<div><a class="titlefield" title="Java Architect - Technisource - Richmond, VA" href="http://xxx.com/Java-Architect-Technisource-Richmond-VA_0ad856f5dc92e277ee34526dc9d3b973.html">Java Architect - Technisource - Richmond, VA</a> <br/><span class="datefield">Mon, 18 Jan 2010 02:17:16 GMT</span> <span class="labelfield">[Listed at Indeed.com]</span> <br/> s degree in Information Technology or related systems... technology standards Ability to keep up with industry trends, relevant system development technologies... From Dice - 18 Jan 2010 02:17:16 GMT - save job, email, more... </div>
<div><a class="titlefield" title="Senior Java Developer - Technisource - Richmond, VA" href="http://xxx.com/Senior-Java-Developer-Technisource-Richmond-VA_c21a277e3f459c5334eea3c70a364463.html">Senior Java Developer - Technisource - Richmond, VA</a> <br/><span class="datefield">Thu, 21 Jan 2010 02:06:39 GMT</span> <span class="labelfield">[Listed at Indeed.com]</span> <br/> s degree in Information Technology or related systems... architecture experience and Java enterprise knowledge (JEE) - Passion for web technologies and experience... From Technisource - 21 Jan 2010 02:06:39 GMT - save job, email, more... </div>
<div><a class="titlefield" title="Senior Java Developer - TEKsystems - Richmond, VA" href="http://xxx.com/Senior-Java-Developer-TEKsystems-Richmond-VA_4655ae7f56afc20d39daa75b1592767a.html">Senior Java Developer - TEKsystems - Richmond, VA</a> <br/><span class="datefield">Mon, 18 Jan 2010 06:13:02 GMT</span> <span class="labelfield">[Listed at Indeed.com]</span> <br/> Computer Science, Information Technology, or Business or related work experience/certification. 5+ years of relevant general Information Technology experience... From TEKSystems - 18 Jan 2010 06:13:02 GMT - save job, email, more... </div>
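For input as regular as the sample above, a regex pass may be enough. The following is only a sketch, assuming every div follows exactly the layout shown (one titlefield link, one datefield span, and the description after the last `<br/>`, ending in a "From ... - save job, email, more..." footer). The sub name `parse_listings` and the shortened sample URL are made up for illustration. For HTML you do not control, a real parser from CPAN such as HTML::TreeBuilder or HTML::TokeParser is likely the more robust route.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: turn job-listing <div> blocks into
#   { url => [ title, date, description ] }
# Assumes the fixed layout of the sample input; it is not a general HTML parser.
sub parse_listings {
    my ($html) = @_;
    my %jobs;

    # Walk over each <div>...</div> block (the sample has no nested divs).
    while ($html =~ m{<div>(.*?)</div>}gs) {
        my $block = $1;

        # URL and link text come from the <a class="titlefield"> element.
        my ($url, $title) =
            $block =~ m{<a\s+class="titlefield"[^>]*\bhref="([^"]+)"[^>]*>(.*?)</a>}s
            or next;

        # Date comes from the <span class="datefield"> element.
        my ($date) = $block =~ m{<span\s+class="datefield">(.*?)</span>}s;

        # Description: the text after the last <br/> in the block.
        my ($desc) = $block =~ m{.*<br\s*/>\s*(.*?)\s*\z}s;

        # Drop the trailing "From <source> - <date> GMT - save job, ..." footer.
        $desc =~ s/\s*From .+? - \d{1,2} \w{3} \d{4} [\d:]+ GMT - save job.*\z//s
            if defined $desc;

        $jobs{$url} = [ $title, $date, $desc ];
    }
    return \%jobs;
}

# Small demo on one block shaped like the sample (URL shortened for the demo).
my $sample = <<'HTML';
<div><a class="titlefield" title="Java Architect - Technisource - Richmond, VA" href="http://xxx.com/Java-Architect.html">Java Architect - Technisource - Richmond, VA</a> <br/><span class="datefield">Mon, 18 Jan 2010 02:17:16 GMT</span> <span class="labelfield">[Listed at Indeed.com]</span> <br/> s degree in Information Technology or related systems... From Dice - 18 Jan 2010 02:17:16 GMT - save job, email, more... </div>
HTML

print Dumper(parse_listings($sample));
```

To run it against the real file, read the whole file into one string first (for example by setting `local $/;` before reading from the filehandle) and pass that string to the sub. Note that two listings sharing the same URL would overwrite each other, since the URL is the hash key.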
Re: HTML Parsing /Regex Qstn
by bobr (Monk) on Jan 21, 2010 at 16:35 UTC
by sri1230 (Novice) on Jan 21, 2010 at 16:53 UTC
by planetscape (Chancellor) on Jan 22, 2010 at 01:02 UTC
by sri1230 (Novice) on Jan 21, 2010 at 17:06 UTC
by bobr (Monk) on Jan 21, 2010 at 17:38 UTC
by sri1230 (Novice) on Jan 21, 2010 at 21:47 UTC

Re: HTML Parsing /Regex Qstn
by Utilitarian (Vicar) on Jan 21, 2010 at 15:06 UTC

Re: HTML Parsing /Regex Qstn
by Anonymous Monk on Jan 21, 2010 at 14:44 UTC