in reply to snakes and ladders

Help would be very much appreciated.

Two things will help you more than anything else.

The first is a decent understanding of algorithmic complexity, colloquially called "big O notation". The value of this is being able to analyze a piece of code and have a sense of how well or poorly it will scale. This is technically a science, but there's an art to it, and the basic rule is "How much work does it take to do things this way?" (The corollary is "How much data do I expect to process?" If that's small, a big big O doesn't matter very much.)
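
A quick illustration of the difference in practice (a toy example of my own, in Perl, not anything from your code):

    my @names = qw( alice bob carol alice );

    # O(n^2): compare every pair of names. Fine for four of them,
    # painful for 40,000.
    my @dupes;
    for my $i ( 0 .. $#names ) {
        for my $j ( $i + 1 .. $#names ) {
            push @dupes, $names[$i] if $names[$i] eq $names[$j];
        }
    }

    # O(n): a single pass, remembering what we've already seen in a hash.
    my %seen;
    my @dupes_again = grep { $seen{$_}++ } @names;

Both find the duplicates; only one of them stays cheap as the list grows.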

The second is how to combine an efficient tokenizer with a finite state machine. As I've mentioned before, this is an important concept covered in SICP and HOP. In short, you want to process your input document once, probably character by character (and you can make that more efficient if you want), to build an intermediary data structure which represents your document. You can do this even if you have parts of the code you can only evaluate fully after you've processed previous parts of the document (it's how Perl's eval works, after all).
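
Here's a minimal sketch of the idea, using a toy tag syntax of my own (your real grammar will differ): walk the input one character at a time, switch between a 'text' state and a 'tag' state, and emit tokens as you go.

    sub tokenize {
        my ($input) = @_;
        my $state   = 'text';
        my $buffer  = '';
        my @tokens;

        for my $char ( split //, $input ) {
            if ( $state eq 'text' ) {
                if ( $char eq '<' ) {
                    # leaving text mode; flush any accumulated text
                    push @tokens, { type => 'data', value => $buffer }
                        if length $buffer;
                    ( $state, $buffer ) = ( 'tag', '' );
                }
                else { $buffer .= $char }
            }
            else {    # inside a tag
                if ( $char eq '>' ) {
                    push @tokens, { type => 'tag', value => $buffer };
                    ( $state, $buffer ) = ( 'text', '' );
                }
                else { $buffer .= $char }
            }
        }
        push @tokens, { type => 'data', value => $buffer } if length $buffer;
        return @tokens;
    }

Fed '<use><qd>action</></>', that returns tag tokens for 'use' and 'qd', a data token for 'action', and two '/' tag tokens, which a later pass can walk with a stack.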

I won't promise you that this will make your code beloved by other hackers (what I've seen doesn't fit my needs from a human-factors perspective, but I admit I don't have the experience with it that you do), but I can promise you that this is the technique favored by compiler writers: reasonably straightforward, yet efficient and effective. What you're doing is, essentially, writing a compiler.

Re^2: snakes and ladders
by Logicus (Initiate) on Aug 25, 2011 at 05:03 UTC
    >Big O

    Yup, I get that; that's why the system will spit out "hello world" in like 0.03 seconds, but take 0.22 seconds to produce the 400-line output for the forum sections page, for instance. More data = more computing = slower result.

    >Efficient tokenizer with a finite state machine.

    So I have to treat it almost like a de-compression algorithm? Take the input letter by letter and inflate it up into something which can then be stored and processed quickly... I think I can see how that works.

    <use><qd>action</></> becomes something like:

        [TOKEN name="use"]
        [TOKEN name="qd"]
        [TOKEN value="action"]
        [ENDT]
        [ENDT]

    then process line by line? I can feel a stack is going to be needed. I think I need to have a play with getting results from the expected token format (written by hand) so I can be sure how the system will work post-tokenisation. (Then write the tokeniser.)
    >beloved by other hackers

    I'm not the sort of person who craves or needs the adoration or respect of peers. I'm actually quite a reserved and quiet person who tends to try and blend in where possible (believe it or not), so I won't be worrying about that very much... Besides, this system isn't for high-level hackers; it's for dummies like me, to make work/life easier, and measured by that standard it works very well!

      ... then process line by line?

      As a proper data structure, not plain text. A data structure has, well, structure. You need that structure to identify which types of tokens you have and what they mean.

      I tend to use objects for this, but an ad hoc hash will serve as well for your experiments.
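
      Something like this, say (the field names are just a suggestion, not a fixed format):

          my $token = { type => 'open_tag', name => 'qd', attrs => {} };

          # With a 'type' field, a dispatch table replaces a chain of ifs:
          my %handler = (
              open_tag  => sub { print "open <$_[0]{name}>\n" },
              data      => sub { print "data: $_[0]{value}\n" },
              close_tag => sub { print "close\n" },
          );
          $handler{ $token->{type} }->($token);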

        Something like what I posted above? (See my earlier comment, posted before I saw yours.)
      >Big O

      Yup, I get that; that's why the system will spit out "hello world" in like 0.03 seconds, but take 0.22 seconds to produce the 400-line output for the forum sections page, for instance. More data = more computing = slower result.

      I'm afraid you're not getting it at all.

      Complexity is not about "there's more input, so it will take longer"; it's an indication of how the code will scale. Does it scale linearly, quadratically, logarithmically, etc.?
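
      To put numbers on it (mine, purely for illustration): if the 400-line page takes 0.22 seconds under a linear algorithm, a 4,000-line page takes roughly 2.2 seconds, while under a quadratic algorithm it takes roughly 22 seconds, because ten times the input means a hundred times the work. Which curve you're on is what complexity analysis tells you.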


      So I just did a more complex example by hand, and now I need to figure out a way to iterate across such a complex data structure :/

      # <html lang="<qd>lang</>">
      # (sql mode="mask" table="users")
      # <query>
      # SELECT * FROM users
      # </>
      # <mask>
      # Profile : [link action="profile"
      #            username="<d>username</>"
      #           ]<d>username</>[/link]
      # <br/>
      # </>
      # (/)
      # </>
      #
      # TODO: translate the above into this.
      # Note: nested token lists inside attr must be array refs ([ ... ]),
      # not bare ( ... ) lists, or they flatten into the surrounding hash.

      my @tokens = (    # a named array, rather than clobbering @_
          { type => '<', value => 'html',
            attr => {
                lang => [
                    { type => '<',    value => 'qd' },
                    { type => 'data', value => 'lang' },
                    { type => '>' },
                ],
            },
          },
          { type => '(', value => 'sql',
            attr => { mode => 'mask', table => 'users' } },
          { type => '<',    value => 'query' },
          { type => 'data', value => 'SELECT * FROM users' },
          { type => '>' },
          { type => '<',    value => 'mask' },
          { type => 'd',    value => 'Profile : ' },
          { type => '[', value => 'link',
            attr => {
                action   => 'profile',
                username => [
                    { type => '<',    value => 'd' },
                    { type => 'data', value => 'username' },
                    { type => '>' },
                ],
            },
          },
          { type => '<',    value => 'd' },
          { type => 'data', value => 'username' },
          { type => '>' },
          { type => ']' },
          { type => 'data', value => '<br/>' },
          { type => '>' },
          { type => ')' },
          { type => '>' },
      );
      What's going to make life fun is the recursive nature of the thing: each line may have attrs, which may themselves take multiple lines to describe.
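
      A recursive walk over a structure like that might look something like this (a sketch, assuming as above that attr values which are themselves token lists are array refs):

          sub walk {
              my ( $depth, @toks ) = @_;
              for my $tok (@toks) {
                  # print this token, indented to its depth
                  print '    ' x $depth, $tok->{type},
                      ( defined $tok->{value} ? " $tok->{value}" : '' ), "\n";
                  next unless $tok->{attr};
                  # descend into attrs; recurse when a value is a token list
                  for my $name ( sort keys %{ $tok->{attr} } ) {
                      my $val = $tok->{attr}{$name};
                      if ( ref $val eq 'ARRAY' ) {
                          print '    ' x ( $depth + 1 ), "$name =>\n";
                          walk( $depth + 2, @$val );
                      }
                      else {
                          print '    ' x ( $depth + 1 ), "$name => $val\n";
                      }
                  }
              }
          }

          walk( 0, @tokens );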