Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery
 
PerlMonks  

Need help with filling a complicated data structure

by Lady_Aleena (Priest)
on Nov 16, 2013 at 06:06 UTC ( [id://1062854]=perlquestion: print w/replies, xml ) Need Help??

Lady_Aleena has asked for the wisdom of the Perl Monks concerning the following question:

Hello everyone. Before you start reading, have a Cookie and take a deep breath. Ready?

I am trying to make my data more universal by removing as many HTML tags out of my data and munging it later. For the most part I have succeeded. However, there are a few more hurdles I have yet to clear, and the following is one of them.

I would like to take this following data ...

# * is an unordered list # # is an ordered list # #3 a number indicates the value __DATA__ * list 1 unordered item 1 * list 1 unordered item 2 *# list 1 unordered item 2 ordered item 1 *# list 1 unordered item 2 ordered item 2 *# list 1 unordered item 2 ordered item 3 * list 1 unordered item 3 ** list 1 unordered item unordered item 1 ** list 1 unordered item unordered item 2 ** list 1 unordered item unordered item 3 **# list 1 unordered item unordered item 3 ordered item 1 **# list 1 unordered item unordered item 3 ordered item 2 **# list 1 unordered item unordered item 3 ordered item 3 # list 2 ordered item 1 #3 list 2 ordered item 2 # list 2 ordered item 3 #* list 2 ordered item 3 unordered item 1 #* list 2 ordered item 3 unordered item 2 #* list 2 ordered item 3 unordered item 3

... and feed it into this subroutine, which I got help for here, hopefully between lines 6 and 17 ...

sub story { my ($source, $doc_magic, $line_magic) = @_; my $inc = 0; my @sections; my @toc; while (my $line = <$source>) { chomp($line); next if !$line; if ($line =~ /^2/) { my ($number,$text) = split(/ /,$line,2); push @toc, anchor(textify($text), { href => '#'.idify($text) }); $inc++; } push @{$sections[$inc]}, $line; } my $tab = 3; $inc = 0; for my $section (@sections) { if ($section) { section($tab, sub { $tab++; for my $line (@{$section}) { my $line = convert_string($line, $line_magic); line($tab, $line), next if $line =~ /^</; line($tab, "<$line>"), next if $line =~ /^[bh]r$/; $doc_magic->{$1}->(), next if $line =~ /^&\s+(.*)/; blockquote($tab, $1), next if $line =~ /^bq\s(.*)/; item($tab + 1, $1), next if $line =~ /^\*\s(.*)/; item($tab + 1, $2, { value => $1 }), next if $line =~ /^\*(\ +d+)\s(.*)/; item($tab + 1, "<strong>$1</strong> $2"), next if $line =~ / +^\*s\s(.+\:)\s(.*)$/; heading($tab, $1, $2, { id => idify($2) }), next if $line + =~ /^([1-6])\s+(.*)/; paragraph($tab, $line, { class => 'author' }), next if $line + =~ /^by /; paragraph($tab, $line); } $tab--; }); } if ($inc == 0 && @toc > 3) { section($tab, sub { my $class = @toc > 25 ? @toc > 50 ? 'four' : 'three' : 'two'; my $style = @toc > 50 ? 'font-size:smaller' : undef; list($tab + 1, 'u', \@toc, { class => $class, style => $style +}); }, { class => 'contents'} ); } $inc++; } # paragraph($tab,"written by $root_user", { class => 'author' }); }

... which will hopefully feed the following data structures through lines 22 through 42 above. Take another deep breath, by the way ...

my $list1 = [ 'u', [ 'list 1 unordered item 1', [ 'list 1 unordered item 2', { 'inlist' => [ 'o', [ 'list 1 unordered item 2 ordered item 1', 'list 1 unordered item 2 ordered item 2', 'list 1 unordered item 2 ordered item 3' ] ] } ], [ 'list 1 unordered item 3', { 'inlist' => [ 'u', [ 'list 1 unordered item unordered item 1', 'list 1 unordered item unordered item 2', [ 'list 1 unordered item unordered item 3', { 'inlist' => [ 'o', [ 'list 1 unordered item unordered item 3 ordered item 1', 'list 1 unordered item unordered item 3 ordered item 2', 'list 1 unordered item unordered item 3 ordered item 3' ] ] }, ] ] ] }, ] ] ]; my $list2 = [ 'o', [ 'list 2 ordered item 1', ['list 2 ordered item 2', { value => '3' } ], [ 'list 2 ordered item 3', { inlist => [ 'u', [ 'list 2 ordered item 3 unordered item 1', 'list 2 ordered item 3 unordered item 2', 'list 2 ordered item 3 unordered item 3', ] ] } ] ] ];

... into ...

sub list { my ($tab,$type,$list,$opt) = @_; my $tag = $type.'l'; my $open = open_tag($tag,$opt,[@ics,@java]); line($tab,"<$open>"); for my $item (@$list) { if (ref($item) eq 'ARRAY') { item($tab + 1,$item->[0],$item->[1]); } else { item($tab + 1,$item); } } line($tab,"</$tag>"); }

which is dependent on...

sub item { my ($tab,$value,$opt) = @_; my $tag = 'li'; my $open = open_tag($tag, $opt, ['value', @ics, @java]); line($tab, "<$open>"); line($tab + 1, $value); if ($opt->{inlist}) { list($tab + 1, @{$opt->{inlist}}); } line($tab,"</$tag>"); }

... with instructions on how to use them somewhere in here.

I am tired, cranky, and moody. I can't figure out how to munge the lines. A nudge, a whisper, a gentle turning of the head is all I can ask for here. Please just don't ask me to rewrite my list or item subroutines. I use them elsewhere too.

After lists are handled, I still have the table and some inline HTML tags to remove by some sort of munging. Those are both a lot more complicated.

Thanks in advance!

No matter how hysterical I get, my problems are not time sensitive. So, relax, have a cookie, and a very nice day!
Lady Aleena

Replies are listed 'Best First'.
Re: Need help with filling a complicated data structure
by ig (Vicar) on Nov 17, 2013 at 05:04 UTC

    While you already have two good suggestions, for some reason I couldn't stop wondering how I would have done this, other than by doing it. I am intrigued by the difference in style. There are so many ways to do things...

    Compared to the others, this is verbose, and I am undecided whether this helps or hinders understanding. Perhaps it depends on what one knows and how one thinks. I wouldn't offer it, except that it does produce the 'inlist' attributes.

    use strict; use warnings; use Data::Dumper::Concise; # Lists in the input are prefixed by indications of their level # and type: # # * is an unordered list # # is an ordered list # #3 a number indicates the value # # Lists are contained in 'inlist' structures. # # An inlist structure is an array of two elements: Type and # Content. # # Type is 'o' for an ordered list or 'u' for an unordered list. # # Content is an array of items in the list. # # The elements of the Contents array are either text or array # refs. # # If the list item has no attributes, it is present as text. # # If the list item has attributes, it is present as reference to # an array of two elements: the text of the element and a # reference to a hash of attributes. # # # Lists may be nested to arbitrary depth. A nested list is contained i +n # the value of the 'inlist' attribute of its parent list item. # my $lists = [ new_inlist() ]; # Stack is an array of pointers to inlist structures. Each inlist # structure contains the elements of a list. The stack grows as # items are added to more deeply nested lists. # # # # Bottom of stack is the current level 1 list. # # Every other element of the stack is a pointer to the inlist # structure containing the current list at some level of nesting. # # The top of stack is a pointer to the inlist structure for the # most deeply nested, current list. # # my $stack = [ $lists->[-1] ]; while(my $line = <DATA>) { chomp($line); if($line =~ m/^\s*$/) { unless( @$stack == 1 and # level 1 @{$stack->[-1]->[1]} == 0 # with no contents ) { push(@$lists, new_inlist()); $stack = [ $lists->[-1] ]; } } else { parse_line_and_add_to_list($stack, $line); } } print Dumper($lists); exit(0); # new_inlist returns an empty inlist data structure # sub new_inlist { return([ undef, [] ]); } # parse_line_and_add_to_list parses an input line into level, # type, value and text, then adds an item to the appropriate list, # according to level, creating sub-lists as necessary. # sub parse_line_and_add_to_list { my ($stack, $line) = @_; my ($pre, $value, $text) = $line =~ m/^([*#]*)([^ ]*)? (.*)/; my $level = length($pre); my $type_marker = substr($pre, -1); my $type = { '*' => 'u', '#' => 'o', }->{$type_marker}; die "unknown list type marker $type_marker" unless($type); while($level < @$stack) { pop(@$stack); } while($level > @$stack) { start_sub_list($stack, $type); } my $item = length($value) ? [ $text, { value => $value } ] : $text; my $inlist = $stack->[-1]; $inlist->[0] = $type unless(defined($inlist->[0])); die "inconsistent type on list element" unless($inlist->[0] eq $type); push(@{$inlist->[1]}, $item); } # start_sub_list adds a sub-list to the last item in the list at # the top of the stack. sub start_sub_list { my ($stack, $type) = @_; # Top of stack points to the innermost inlist structure my (undef, $contents) = @{$stack->[-1]}; my $last_item = $contents->[-1]; $last_item = [ $last_item, {} ] unless(ref($last_item) eq 'ARRAY'); my $attributes = $last_item->[1]; die "Attempt to initialize sub-list on an item with a sub-list" if(exists($attributes->{inlist})); $attributes->{inlist} = [ $type, [] ]; $contents->[-1] = $last_item; push(@$stack, $attributes->{inlist}); } __DATA__ * list 1 unordered item 1 * list 1 unordered item 2 *# list 1 unordered item 2 ordered item 1 *# list 1 unordered item 2 ordered item 2 *# list 1 unordered item 2 ordered item 3 * list 1 unordered item 3 ** list 1 unordered item unordered item 1 ** list 1 unordered item unordered item 2 ** list 1 unordered item unordered item 3 **# list 1 unordered item unordered item 3 ordered item 1 **# list 1 unordered item unordered item 3 ordered item 2 **# list 1 unordered item unordered item 3 ordered item 3 # list 2 ordered item 1 #3 list 2 ordered item 2 # list 2 ordered item 3 #* list 2 ordered item 3 unordered item 1 #* list 2 ordered item 3 unordered item 2 #* list 2 ordered item 3 unordered item 3

      ig, this looks like a lot of work, but I am not sure how to incorporate it into the story subroutine. Please see Re^2: Need help with filling a complicated data structure for more, because I can't figure out how to incorporate any of these solves sadly. My weekend has been a wash when it comes to anything I write working.

      No matter how hysterical I get, my problems are not time sensitive. So, relax, have a cookie, and a very nice day!
      Lady Aleena

        As you iterate over the lines in a section, you recognize the lines which represent list items, but pass each, separately, to sub item, never producing the data structure you described and never calling sub list.

        What you need to do is detect the entire set of lines that represents a list and pass the set to one of the routines that have been suggested, then pass the resulting data structure to your list sub.

        A start might be to redo the loop over lines in a section to facilitate processing groups of lines and then process all the lines of a list together. Maybe something like the following, untested suggestion might work for you.

        for(my $lineno = 0; $lineno < @$section; $lineno++) { my $line = $section->[$lineno]; if($line =~ m/^\[*#]/) { # This line is the start of a list my $start = $lineno; # Where is the end? my $end = $lineno; $end++ while($section->[$end+1] =~ /^[\*#]/); # Get all the lines that are part of the list my @list_lines = @{$section}[$start..$end]; # Put one of the suggested answers to your initial # request into a subroutine and pass it the set of lines # that represents a single list, getting back the data # structure you requested in your original post. my $internal = parse_list(@list_lines); # Extract the list type and contents from the data # structure my ($type, $list) = @$internal; my $opt = '???'; # Where should $opt come from? # And pass these to the list sub to produce the list. list($tab, $type, $list, $opt); # All the lines of the list have been dealt with. Move # the line number (index into @$section) to the end of # the list then carry on to process the rest of the # section. $lineno = $end; } else { # This line is something other than a list line $line = convert_string($section->[$lineno], $line_magic); line($tab, $line), next if $line =~ /^</; line($tab, "<$line>"), next if $line =~ /^[bh]r$/; $doc_magic->{$1}->(), next if $line =~ /^&\s+(.*)/; blockquote($tab, $1), next if $line =~ /^bq\s(.*)/; row($tab + 1, $1, row_line($2)), next if $line =~ /^\|\s(.+)\s +\|\|(.+)$/; heading($tab, $1, $2, { id => idify($2) }), next if $line = +~ /^([1-6])\s+(.*)/; paragraph($tab, $line, { class => 'author' }), next if $line = +~ /^by /; paragraph($tab, $line); } }
Re: Need help with filling a complicated data structure
by Laurent_R (Canon) on Nov 16, 2013 at 11:05 UTC

    Hi Lady_Aleena

    I am wondering about something: if I understand correctly, your input data has a hierarchy with only 3 levels, and from there, you seem to be willing to build a nested structure with about nine levels of "nestedness". Are you sure you really need such a complicated data structure? It seems to be overkill to me. Wouldn't a simple 3-level nested data structure be more practical?

    Cheers

      Laurent_R, for 1 level it is rather easy to use list.

      list($tab, $type, [$value1, $value2]);

      Now if either of those have a list too, it starts to get complicated.

      list($tab, $type, [ [$value1, { inlist => [ $inlist1_type, [ $value1a, $value1b ] ], cla +ss => $value1class }], [$value2, { inlist => [ $inlist2_type, [ $value2a, $value2b ] ]}], ]);

      And if it goes deeper.

      list($tab, $type, [ [$value1, { inlist => [ $inlist1_type, [ $value1a, $value1b ] ], cla +ss => $value1class }], [$value2, { inlist => [ $inlist2_type, [ [$value2a, { inlist => [ $inlist2a_type, [$value2a1, $value2a2 ] ] +}], [$value2b, { style => 'font-size: small' }] ]]}], ]);

      A lot more than the inner lists goes into the hash refs: class, id, style, lang, various javascript attributes. For list items there is also value, so if $value2's inlist were an ordered list (but not starting at 1), I would do the following, and will have to for some lists...

      list($tab, $type, [ [$value1, { inlist => [ $inlist1_type, [ $value1a, $value1b ] ], cla +ss => $value1class }], [$value2, { inlist => [ $inlist2_type, [ [$value2a, { inlist => [ 'o', [ [$value2a1, { value => '2' }], $value2a2 ]]}], [$value2b, { style => 'font-size: small' }] ]]}], ]);

      For example, I own books 2-5 of Bio of a Space Tyrant by Piers Anthony, so I would like to write the data like...

      #2 Mercenary # Politican # Executive # Statesman

      For for another series of novels, it is even odder.

      #41 The Soldier's of Fear #47 Q-Space

      Writing the subroutines to make HTML lists was kind of fun, but it is a lot of work to use it.

      No matter how hysterical I get, my problems are not time sensitive. So, relax, have a cookie, and a very nice day!
      Lady Aleena
Re: Need help with filling a complicated data structure
by hdb (Monsignor) on Nov 16, 2013 at 17:14 UTC

    UPDATE: modified to deal with jumps larger than 1 back up the hierarchy.

    Very nice puzzle. Here is my proposal: (it does not do the { inlist => [... bits, but they can be inserted where indicated)

    use strict; use warnings; use Data::Dumper; my %type = ( '*' => 'u', '#' => 'o' ); sub type { $type{ substr shift, -1 } } my @lines = map { /([*#]*)(\d*)\s+(.*)/; $2?[$1,[$3,{value=>$2}]]:[$1, + $3] } <DATA>; my @lists = ( [] ); my $lasttype = ''; my $lastitem = $lists[-1]; my @stack; for (@lines) { my( $type, $item ) = @$_; if( not $type ) { push @lists, []; $lastitem = $lists[-1]; $lasttype = ''; @stack = (); next; } if( $type eq $lasttype ) { push @$lastitem, $item; next; } if( not $lasttype ) { push @$lastitem, type($type), [ $item ]; $lastitem = $lastitem->[-1]; $lasttype = $type; push @stack, [ $lastitem, $lasttype ]; next; } if( length( $type ) > length( $lasttype ) ) { # modify this block fo +r { inlist => [... push @stack, [ $lastitem, $lasttype ]; $lastitem->[-1] = [ $lastitem->[-1], [ type($type), [$item] ] ]; $lastitem = $lastitem->[-1]->[1]->[1]; $lasttype = $type; next; } for( 1..(length( $lasttype ) - length( $type )) ) { ( $lastitem, $lasttype ) = @{ pop @stack }; } push @$lastitem, $item; } print Dumper \@lists; __DATA__ * list 1 unordered item 1 * list 1 unordered item 2 *# list 1 unordered item 2 ordered item 1 *# list 1 unordered item 2 ordered item 2 *# list 1 unordered item 2 ordered item 3 * list 1 unordered item 3 ** list 1 unordered item unordered item 1 ** list 1 unordered item unordered item 2 ** list 1 unordered item unordered item 3 **# list 1 unordered item unordered item 3 ordered item 1 **# list 1 unordered item unordered item 3 ordered item 2 **# list 1 unordered item unordered item 3 ordered item 3 * list 1 unordered item 4 # list 2 ordered item 1 #3 list 2 ordered item 2 # list 2 ordered item 3 #* list 2 ordered item 3 unordered item 1 #* list 2 ordered item 3 unordered item 2 #* list 2 ordered item 3 unordered item 3 # list 2 ordered item 4
Re: Need help with filling a complicated data structure
by hdb (Monsignor) on Nov 16, 2013 at 20:47 UTC

    Or this:

    use strict; use warnings; use List::Util qw/max/; use Data::Dumper; my %type = ( '*' => 'u', '#' => 'o' ); sub type { $type{ substr shift, -1 } } my @lines = map { /([*#]*)(\d*)\s+(.*)/; $2?[$1,[$3,{value=>$2}]]:[$1, +$3] } <DATA>; my $maxlevel = max map { length $_->[0] } @lines; while( $maxlevel ) { my @indices = grep { $maxlevel == length $lines[$_]->[0] } 0..@lines +-1; while( @indices ) { my $end = pop @indices; my $start = $end; $start = pop @indices while @indices and $indices[-1]==$start-1; my $sublist = [ type($lines[$start]->[0]), [ map { $_->[1] } splic +e @lines, $start, $end-$start+1 ] ]; $lines[$start-1]->[1] = [ $lines[$start-1]->[1], $sublist ] if $ma +xlevel>1; splice @lines, $start, 0, $sublist if $maxlevel==1; } $maxlevel--; } @lines = grep { $_->[0] } @lines; print Dumper \@lines; __DATA__ * list 1 unordered item 1) * list 1 unordered item 2 *# list 1 unordered item 2 ordered item 1 *# list 1 unordered item 2 ordered item 2 *# list 1 unordered item 2 ordered item 3 * list 1 unordered item 3 ** list 1 unordered item unordered item 1 ** list 1 unordered item unordered item 2 ** list 1 unordered item unordered item 3 **# list 1 unordered item unordered item 3 ordered item 1 **# list 1 unordered item unordered item 3 ordered item 2 **# list 1 unordered item unordered item 3 ordered item 3 * list 1 unordered item 4 # list 2 ordered item 1 #3 list 2 ordered item 2 # list 2 ordered item 3 #* list 2 ordered item 3 unordered item 1 #* list 2 ordered item 3 unordered item 2 #* list 2 ordered item 3 unordered item 3 # list 2 ordered item 4

      Thanks hdb for trying to help, however, I can't see how to add your solves to the following.

      sub story { my ($source, $doc_magic, $line_magic) = @_; my $inc = 0; my @sections; my @toc; while (my $line = <$source>) { chomp($line); next if !$line; if ($line =~ /^2/) { my ($number,$text) = split(/ /,$line,2); push @toc, anchor(textify($text), { href => '#'.idify($text) }); $inc++; } push @{$sections[$inc]}, $line; } my $tab = 3; $inc = 0; for my $section (@sections) { if ($section) { section($tab, sub { $tab++; for my $line (@{$section}) { my $line = convert_string($line, $line_magic); line($tab, $line), next if $line =~ /^</; line($tab, "<$line>"), next if $line =~ /^[bh]r$/; $doc_magic->{$1}->(), next if $line =~ /^&\s+(.*)/; blockquote($tab, $1), next if $line =~ /^bq\s(.*)/; item($tab + 1, $1), next if $line =~ /^\*\s(.*)/; item($tab + 1, $2, { value => $1 }), next if $line =~ /^\*(\ +d+)\s(.*)/; item($tab + 1, "<strong>$1</strong> $2"), next if $line =~ / +^\*s\s(.+\:)\s(.*)$/; row($tab + 1, $1, row_line($2)), next if $line =~ /^\|\s(.+) +\s\|\|(.+)$/; heading($tab, $1, $2, { id => idify($2) }), next if $line + =~ /^([1-6])\s+(.*)/; paragraph($tab, $line, { class => 'author' }), next if $line + =~ /^by /; paragraph($tab, $line); } $tab--; }); } if ($inc == 0 && @toc > 3) { section($tab, sub { my $class = @toc > 25 ? @toc > 50 ? 'four' : 'three' : 'two'; my $style = @toc > 50 ? 'font-size:smaller' : undef; list($tab + 1, 'u', \@toc, { class => $class, style => $style +}); }, { class => 'contents'} ); } $inc++; } # paragraph($tab,"written by $root_user", { class => 'author' }); }

      The lists will more than likely always be mixed in with other things like:

      __DATA__ 2 This is a heading. by An Author This is just an opening paragraph, probably about the list. * list 1 unordered item 1 * list 1 unordered item 2 *# list 1 unordered item 2 ordered item 1 *# list 1 unordered item 2 ordered item 2 *# list 1 unordered item 2 ordered item 3 * list 1 unordered item 3 ** list 1 unordered item unordered item 1 ** list 1 unordered item unordered item 2 ** list 1 unordered item unordered item 3 **# list 1 unordered item unordered item 3 ordered item 1 **# list 1 unordered item unordered item 3 ordered item 2 **# list 1 unordered item unordered item 3 ordered item 3 * list 1 unordered item 4 hr This is another paragraph about the list below it. # list 2 ordered item 1 #3 list 2 ordered item 2 # list 2 ordered item 3 #* list 2 ordered item 3 unordered item 1 #* list 2 ordered item 3 unordered item 2 #* list 2 ordered item 3 unordered item 3 # list 2 ordered item 4 bq This is a quote by Quote Author Any maybe some final remarks. & a little magic

      (I'm having a bad weekend all around, nothing is sinking in.)

      No matter how hysterical I get, my problems are not time sensitive. So, relax, have a cookie, and a very nice day!
      Lady Aleena

        Similar to ig's proposal below, I would collect the lines belonging to a list and then call a sub that processes the lines of the list.

        ... $tab++; my @list; for my $line (@{$section}) { my $line = convert_string($line, $line_magic); if( $line =~ /^[*#]+ / ) { push @list, $line; next; } else { process_list( @list ) if @list; @list = (); } line($tab, $line), next if $line =~ /^</; ...
Re: Need help with filling a complicated data structure: Epilog
by Lady_Aleena (Priest) on Jan 19, 2015 at 07:02 UTC

    Dear hdb and ig, I know it is over a year since you did so much work to help me get my data to look the way I wanted it to look. About a month ago, I finally sat down with your work, and forced myself to get it incorporated come hell or high water. Below is the result of your hard work with very little done by me except to do some minor modifications in expanding the scope.

    hdb, I used your work in Re: Need help with filling a complicated data structure to make my lists.

    ig, I used your work in Re^3: Need help with filling a complicated data structure to group the lines together for my lists and tables. I also used it when it came time to parse the tables. Don't worry, I didn't include tables in my original post. I had not used for like you did before now. I had never taken it apart to understand it until your work made me look into it. So thank you for the push I needed to learn something new.

    So, at long last, here is the code which effectively solved my problem with your help. I am so very thankful for your work, sorry it comes so late.

    hdb and ig, you helped me a lot! Thank you again! (I just hope my comments are clear enough, and my code makes sense.)

    No matter how hysterical I get, my problems are not time sensitive. So, relax, have a cookie, and a very nice day!
    Lady Aleena

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1062854]
Approved by atcroft
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (6)
As of 2024-03-28 08:37 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found