in reply to Replacement and conversion of flat file text documents

Generally we (the community, not the Royal 'we') like to see some effort put into a post. You've outlined your problem, but you haven't mentioned any basic research or written about a programmatic approach to the problem.

If you'd put up some basic pseudo-code, that would have helped your node be more useful to readers. Please keep this in mind for your future posts.

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re^2: Replacement and conversion of flat file text documents
by imhotep (Novice) on Apr 17, 2005 at 20:49 UTC
    Thanks for your comments! I was in a bit of a hurry when I made the post (at work!), but a bigger factor in my admittedly poor post is my lack of programming knowledge. I have since had more time to look at it, and this is what I make of it.

    Firstly, I must place the headings in HTML heading elements ("h2" etc.). Could I use a pattern match, e.g. check whether $line is one or more blank lines, followed by a single line with letter characters in it (the heading usually comes in this form), followed by one or more blank lines? I would then save that line in a variable (say $saved) and print "h2 ($saved) h2".

    The same goes for words that are supposed to be in italics (marked by underscore characters in the text). I could again use a pattern match, this time with s/\_/(html italics element)/g. The problem with this approach is that it would convert every underscore into the opening italic element and never produce the closing element. Does that make sense? Basically, after the first underscore was changed, all the text after it would be in italics.

    Also, how would I place the whole text into html, head, body elements etc.? I just can't figure out how to have both opening and closing elements!

    I know I have not put the problem very clearly, but I hope you can get an idea of what I have to do. I am going to try to work on some pseudo-code to make it a bit clearer. Thanks!
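
    One way around the unclosed-italics problem is to match both underscores of a pair in a single substitution, so the opening and closing tags are always written together. This is only a minimal sketch: it assumes italic spans never cross a line break, and `italicize` is a hypothetical helper name, not something from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not from the original post):
# convert each _paired_ run of text to <i>...</i>.
sub italicize {
    my ($text) = @_;
    # Non-greedy (.+?) stops at the nearest closing underscore, so each
    # pair is handled separately. "." does not match a newline here,
    # so a pair must open and close on the same line.
    $text =~ s{_(.+?)_}{<i>$1</i>}g;
    return $text;
}

print italicize("She was _very_ happy, and _quite_ sure.\n");
# prints: She was <i>very</i> happy, and <i>quite</i> sure.
```

    A lone underscore with no partner on the same line is left untouched. Note that underscores inside a word (as in some_file_name) would also be treated as a pair, so real input may need a stricter pattern.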

      If you're new to programming, then you have a slightly longer journey .. but if I were going to distill programming into a little recipe, it would be something like this:

      • Figure out what you're trying to do. When in doubt, leave some stuff out .. you can add it in later.
      • Make a list of the various steps you're going to need to do. Just write it out on a piece of paper. Review that. Think about it. Drink a cup of coffee. Review it again.
      • Get the framework going .. which means, translate some of your 'list' into code, and get something really simple running.
      • Gradually add the elements on your list, testing as you go.
      • Resist the temptation to throw something together that 'mostly' works. Test as rigorously as possible.
      • Once you have a version that works, save it somewhere. I like to use rcs because it's dead simple .. but you can just copy your source code file to a file with the name 'foo.1' or 'foo.2005-04-18'. Once you've done that, you have the freedom to get hacking again, without any worry that you'll 'improve' the original and break it in some mysterious way .. without any way to go back to the version that worked.

      The three features you've listed are

      • Put heading elements like '3 -- Skippy has a Picnic' in 'h2' tags;
      • Wrap text like _this is italicized_ with italic tags; and
      • Wrap big blocks of text in paragraph tags, making one big HTML page from your input.
      Is this right?

      I'll check back in a bit.

      Alex / talexb / Toronto

        Thanks, Alex (also my middle name), for getting back to me!

        Yes that is what I have to do!

        I have this:

        #!/usr/bin/perl
        use English;
        use diagnostics;

        print "html body\n";
        while ($line = <>) {
            if ($line =~ m/Emma [[:punct:]]/) {
                print "h2 $line h2\n";
            }
            if ($line =~ m/CHAPTER (.*)/) {
                print "h2 $line h2\n";
            }
            if ($line =~ m/Jane Austen/) {
                print "h2 $line h2\n";
            }
            if ($line =~ m/VOLUME (.*)/) {
                print "h2 $line h2\n";
            }
            if ( not ( $line =~ m/Jane Austen/) ) {
                if ( not ( $line =~ m/Emma [[:punct:]]/)) {
                    if ( not ( $line =~ m/CHAPTER (.*)/)) {
                        if ( not ( $line =~ m/VOLUME (.*)/)) {
                            $line =~ s|\b_|<i>|g;
                            $line =~ s|_\b|</i>|g;
                            print $line;
                        }
                    }
                }
            }
        }
        print "body html\n";
        This should do everything except the paragraph elements! Any guidance would be great! I have left out the "<" brackets around the html, body and h2 tags so that the post will display properly.
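
        For the missing paragraph elements, one possible approach is to read the input a blank-line-separated block at a time (Perl's "paragraph mode", enabled by setting $/ to the empty string) instead of line by line. The sketch below is an illustration under assumptions, not a finished converter: `block_to_html` and `convert` are hypothetical names, the heading patterns are copied from the script above, and a block only counts as a heading if it is a single line. It reads from an in-memory sample rather than a real file and does not escape characters such as & or <.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn one blank-line-separated block into HTML.
sub block_to_html {
    my ($block) = @_;
    $block =~ s/\s+\z//;                  # drop the trailing newlines
    $block =~ s{_(.+?)_}{<i>$1</i>}gs;    # paired underscores become italics
    # Assumption: a heading is a single-line block matching one of the
    # patterns used in the script above.
    if ($block !~ /\n/
        and $block =~ /Emma [[:punct:]]|CHAPTER |VOLUME |Jane Austen/) {
        return "<h2>$block</h2>\n";
    }
    return "<p>$block</p>\n";
}

# Hypothetical helper: read every block from a filehandle and wrap the
# result in html and body elements.
sub convert {
    my ($fh) = @_;
    local $/ = "";    # paragraph mode: one record per block of text
    my $html = "<html>\n<body>\n";
    while (my $block = <$fh>) {
        $html .= block_to_html($block);
    }
    return $html . "</body>\n</html>\n";
}

# Demo on an in-memory sample instead of a real input file.
my $sample = "CHAPTER I\n\nEmma Woodhouse was _handsome_,\nclever, and rich.\n\n";
open my $in, '<', \$sample or die "Cannot open in-memory sample: $!";
print convert($in);
```

        Because each block arrives whole, the opening and closing tags can be written around it in one step, which also answers the question of how to get both the opening and closing html and body elements: print the opening ones before the loop and the closing ones after it.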

        Edit by tye: Add CODE tags