in reply to Re: Help regarding regular expression
in thread Help regarding regular expression

Hi, I'm really sorry, but I am not that adept at using HTML::Parser. Also, the problem is that the text I want to extract contains letters, a colon and digits (e.g. GO:1234567). Since I am not adept at HTML or XML parsing, I was trying a regex. I had never used Perl before; I've only been using it for two days :(
Re^3: Help regarding regular expression
by ELISHEVA (Prior) on Aug 06, 2009 at 11:41 UTC

    I'm guessing from the material you posted that you will be doing a lot more parsing of gene data in XML format over the next few weeks (or months?), so it is *well* worth your while to learn the correct tools. It isn't as hard as you think, and there are *many* people here to help you, including some who are also doing gene research! The beauty of modules like XML::Twig is that you don't actually need to know how to parse XML yourself, since the module does the parsing for you. You just need to learn how to start the process and use the results.

    So I'd start instead by looking up XML::Twig, reading its documentation, and asking any questions you have, either here or in a new thread. If you decide to stay with this thread, you might want to update your original post to indicate the change of strategy. It would also be a good idea to change the title to something like "Using XML::Twig to parse gene data"; a title like that does a better job of attracting the right people to help you.

    If you decide to start a new node, be sure to update Help regarding regular expression with an explanation of your change in strategy and a link to the new node. In the new node, also link back to this node so that people understand the whole context of the discussion (you'll get much better advice that way). To link to nodes within PerlMonks, you can use [id://NNNN], where NNNN is the node id of the post (that's the number in the left column on Nodes You Wrote). The title of the node will be displayed automatically.

    I don't recommend asking your questions in reply to this node. People will be less likely to see a deeply nested node, so you won't get the widest help.

    If you have general questions about how to use CPAN or modules, you can also ask in the chatterbox (sidebar to the right). You can also get boatloads of information about XML::Twig (and any other module) using this link: cpan module search. It lets you find all of the PerlMonks nodes (questions, answers, tutorials) that discuss the module you are interested in learning how to use.

    Best, beth

Re^3: Help regarding regular expression
by ig (Vicar) on Aug 06, 2009 at 12:22 UTC

    I agree with what ELISHEVA says, and would add that unless your XML/HTML is extremely trivial, it will be much less effort to learn the appropriate modules than to write robust regular expressions. It is easy to start writing the regular expressions, but parsing XML and HTML is much more complex than it first appears (comments, CDATA sections, entities, attribute quoting and nesting all have to be handled), and regular expressions alone are not up to the task.
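    To make the parse-first approach concrete, here is a minimal sketch. It uses Python's standard xml.etree.ElementTree rather than XML::Twig, purely so the example is self-contained; the <gene>/<annotation> element names and the sample record are hypothetical, and GO IDs are assumed to be "GO:" followed by seven digits. A regex run over the raw markup also picks up an ID sitting inside an XML comment, while matching only the text of the parsed elements does not.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical gene record; element names are invented for illustration only.
SAMPLE = """<gene>
  <!-- obsolete: GO:0000001 -->
  <annotation>GO:1234567</annotation>
  <annotation>GO:7654321</annotation>
</gene>"""

# Assumption: a GO ID is "GO:" followed by exactly seven digits.
GO_ID = re.compile(r"\bGO:\d{7}\b")


def naive_regex_ids(xml_text):
    """Regex over the raw markup: it cannot tell a comment from real data."""
    return GO_ID.findall(xml_text)


def parsed_ids(xml_text, tag="annotation"):
    """Let the parser deal with the markup, then match only element text."""
    root = ET.fromstring(xml_text)  # the default parser discards comments
    ids = []
    for elem in root.iter(tag):
        ids.extend(GO_ID.findall(elem.text or ""))
    return ids


print(naive_regex_ids(SAMPLE))  # ['GO:0000001', 'GO:1234567', 'GO:7654321']
print(parsed_ids(SAMPLE))       # ['GO:1234567', 'GO:7654321']
```

    The same shape should carry over to XML::Twig: let the parser hand you each element, then apply the small GO-ID pattern to that element's text only, so the regex never has to understand the markup itself.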