I have a bunch of div tags in a text file, as shown below. What I am trying to do is parse out the different fields inside each div. My output should be a hash keyed by the link URL, where each value is an array holding the title, the date, and the description, like the OUTPUT sample below. What's the best way to do this? Can I use a regex, or HTML parsing of some sort? Thanks in advance for your help.
OUTPUT
VAR1 = {
  'http://xxx.com/Java-Architect-Technisource-Richmond-VA_0ad856f5dc92e277ee34526dc9d3b973.html' => [
    'Java Architect - Technisource - Richmond, VA',
    'Mon, 18 Jan 2010 02:17:16 GMT',
    's degree in Information Technology or related systems... technology standards Ability to keep up with industry trends, relevant system development technologies...'
  ],
  etc
}
INPUT FILE
<div><a class="titlefield" title="Java Architect - Technisource - Richmond, VA" href="http://xxx.com/Java-Architect-Technisource-Richmond-VA_0ad856f5dc92e277ee34526dc9d3b973.html">Java Architect - Technisource - Richmond, VA</a> <br/><span class="datefield">Mon, 18 Jan 2010 02:17:16 GMT</span> <span class="labelfield">[Listed at Indeed.com]</span> <br/> s degree in Information Technology or related systems... technology standards Ability to keep up with industry trends, relevant system development technologies... From Dice - 18 Jan 2010 02:17:16 GMT - save job, email, more... </div>
<div><a class="titlefield" title="Senior Java Developer - Technisource - Richmond, VA" href="http://xxx.com/Senior-Java-Developer-Technisource-Richmond-VA_c21a277e3f459c5334eea3c70a364463.html">Senior Java Developer - Technisource - Richmond, VA</a> <br/><span class="datefield">Thu, 21 Jan 2010 02:06:39 GMT</span> <span class="labelfield">[Listed at Indeed.com]</span> <br/> s degree in Information Technology or related systems... architecture experience and Java enterprise knowledge (JEE) - Passion for web technologies and experience... From Technisource - 21 Jan 2010 02:06:39 GMT - save job, email, more... </div>
<div><a class="titlefield" title="Senior Java Developer - TEKsystems - Richmond, VA" href="http://xxx.com/Senior-Java-Developer-TEKsystems-Richmond-VA_4655ae7f56afc20d39daa75b1592767a.html">Senior Java Developer - TEKsystems - Richmond, VA</a> <br/><span class="datefield">Mon, 18 Jan 2010 06:13:02 GMT</span> <span class="labelfield">[Listed at Indeed.com]</span> <br/> Computer Science, Information Technology, or Business or related work experience/certification. 5+ years of relevant general Information Technology experience... From TEKSystems - 18 Jan 2010 06:13:02 GMT - save job, email, more... </div>
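For input like the above, an HTML parser tends to be more robust than a regex, because it does not depend on attribute order or whitespace. Below is a minimal sketch of the idea in Python, using only the standard-library `html.parser` module; the same event-driven approach should carry over to an HTML parser in Perl. The names `JobListingParser` and `parse_jobs` are made up for this illustration, and the sketch assumes that everything from the last " From " onward in each div is a footer that should be dropped, as the desired output suggests.

```python
from html.parser import HTMLParser


class JobListingParser(HTMLParser):
    """Collects {href: [title, date, description]} from the job <div> blocks."""

    def __init__(self):
        super().__init__()
        self.jobs = {}
        self._href = None    # URL of the listing currently being read
        self._field = None   # "title", "date", "label", or None for loose text
        self._title, self._date, self._desc = [], [], []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        cls = attrs.get("class")
        if tag == "div":
            # A new listing starts: reset the per-listing buffers.
            self._href = None
            self._title, self._date, self._desc = [], [], []
        elif tag == "a" and cls == "titlefield":
            self._href = attrs.get("href")
            self._field = "title"
        elif tag == "span" and cls == "datefield":
            self._field = "date"
        elif tag == "span" and cls == "labelfield":
            self._field = "label"  # "[Listed at ...]" text is ignored

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None
        elif tag == "div" and self._href:
            # Collapse runs of whitespace in the loose text of the div.
            desc = " ".join("".join(self._desc).split())
            # Assumption: the trailing "From <source> - <date> - save job..."
            # footer is not part of the description.
            desc = desc.rsplit(" From ", 1)[0]
            self.jobs[self._href] = [
                "".join(self._title).strip(),
                "".join(self._date).strip(),
                desc,
            ]
            self._href = None

    def handle_data(self, data):
        if self._field == "title":
            self._title.append(data)
        elif self._field == "date":
            self._date.append(data)
        elif self._field is None and self._href:
            # Text outside the <a> and <span> tags is the description.
            self._desc.append(data)


def parse_jobs(html):
    """Return a dict mapping each listing URL to [title, date, description]."""
    parser = JobListingParser()
    parser.feed(html)
    parser.close()
    return parser.jobs


# First listing from the input file, used as sample data.
sample = (
    '<div><a class="titlefield" title="Java Architect - Technisource - Richmond, VA" '
    'href="http://xxx.com/Java-Architect-Technisource-Richmond-VA_0ad856f5dc92e277ee34526dc9d3b973.html">'
    'Java Architect - Technisource - Richmond, VA</a> <br/>'
    '<span class="datefield">Mon, 18 Jan 2010 02:17:16 GMT</span> '
    '<span class="labelfield">[Listed at Indeed.com]</span> <br/> '
    's degree in Information Technology or related systems... technology standards '
    'Ability to keep up with industry trends, relevant system development technologies... '
    'From Dice - 18 Jan 2010 02:17:16 GMT - save job, email, more... </div>'
)

for url, fields in parse_jobs(sample).items():
    print(url, fields)
```

To process the whole file, the file contents could be read into a string and passed to `parse_jobs` in one call; each div then becomes one entry in the returned dict. If the footer should be kept, the `rsplit` line can simply be removed.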
In reply to HTML Parsing /Regex Qstn by sri1230