Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I want to split a list of words on some separators, and I want to keep each separator outside the tag.

The separators are ,;—–

Input:

keyword1, keyword2, keyword3, keyword4

The output I need is:

<keyword>keyword1</keyword>, <keyword>keyword2</keyword>, <keyword>keyword3</keyword>, <keyword>keyword4</keyword>

What I have tried is this:

my @key=map(("<KEYWORD>".$_."<\/KEYWORD>") , split(/(?=([,;]|&mdash;|&ndash;|&hyphen;))/,$key));

Is this correct? I want only the keywords inside the tags; spaces and separators should be placed outside the tags.

Replies are listed 'Best First'.
Re: split keywords
by holli (Abbot) on Mar 19, 2005 at 10:01 UTC
    Like this?
    use strict;
    my $input = "keyword1, keyword2; keyword3&mdash; keyword4 &ndash; keyword5";
    my $output = "<keyword>" . join ("</keyword>, <keyword>", split /\s*(?:,|;|\&mdash;|&ndash;)\s*/, $input) . "</keyword>";
    print $output;
    Or do you want to preserve the original separator in your output?

    Update: separator preserving version:
    use strict;
    use warnings;
    my $output = "";
    my $input = "keyword1, keyword2; keyword3&mdash; keyword4 &ndash; keyword5";
    my @data = split /\s*(,|;|\&mdash;|&ndash;)\s*/, $input;
    for (my $i=0; $i<=$#data; $i+=2) {
        $output .= "<keyword>" . $data[$i] . "</keyword>" . ($data[$i+1] ? $data[$i+1] : "");
    }
    print $output;
    Alternative inspired by bart:
    while( my($val, $sep) = splice @data, 0, 2 ) {
        $sep = "" unless $sep;
        $output .= "<keyword>$val</keyword>$sep";
    }


    holli, /regexed monk/
Re: split keywords
by graff (Chancellor) on Mar 19, 2005 at 16:22 UTC
    Regarding the code that you tried: it looks like it will place "<keyword>" and "</keyword>" around the separators (commas, etc.) as well as around the keywords themselves. You may need to study "perldoc perlre" a little more regarding the "zero-width positive look-ahead assertion", because you're using it here in a way that doesn't help your task.
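
    To see what the look-ahead version actually hands to "map", here is a minimal sketch that runs the split pattern from the question on a shortened input. The variable names ($pattern, @parts, @tagged) and the three-keyword input are made up for illustration. Because the look-ahead is zero-width, nothing is consumed at the split point, so each later piece still begins with its separator and space; because the look-ahead also contains a capturing group, the separator is returned a second time as an item of its own.

```perl
use strict;
use warnings;

# The split pattern from the question: a zero-width look-ahead that
# also contains a capturing group.
my $pattern = qr/(?=([,;]|&mdash;|&ndash;|&hyphen;))/;

# Shortened, hypothetical input.
my $key   = "keyword1, keyword2, keyword3";
my @parts = split /$pattern/, $key;

# Bracket each piece so that leading commas and spaces are visible.
print "[$_]\n" for @parts;

# Wrapping every piece, as the original map does, tags the separators too.
my @tagged = map { "<KEYWORD>$_</KEYWORD>" } @parts;
print join("", @tagged), "\n";
```

    The pieces should come out as [keyword1], [,], [, keyword2], [,], [, keyword3], so every tag after the first either contains only a comma or starts with one.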

    You want "split" to return all the original input characters, and just put "keyword" tags immediately around each string which does not consist of separator characters. So add some logic to the "map" block, like this:

    my $input = "kw1,kw2; kw3 &mdash; kw4&hyphen;kw5";

    # separator is any string consisting of comma, semicolon,
    # &mdash;, &ndash; or &hyphen;, bounded by 0 or more whitespace:
    my $sep = qr{ \s* (?: , | ; | \&(?:[mn]dash|hyphen); ) \s* }x;

    # in the map block, add keyword tags to non-separator items
    my @out = map { /$sep/ ? $_ : "<keyword>$_</keyword>" } split /($sep)/, $input;

    print join "\n",@out,"";
    In this case, whitespace alone will not trigger a split; a single keyword item could contain multiple words separated by whitespace.
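
    A small sketch of that last point, assuming a made-up input in which two of the keywords contain a space. It reuses the same separator pattern, but picks out the separators by position rather than by a second regex match: with one capturing group, split returns keyword, separator, keyword, and so on, so the odd-numbered items are the separators. That is a design choice for this illustration, not a correction of the code above.

```perl
use strict;
use warnings;

# Same separator idea: comma, semicolon or one of the entities,
# together with any surrounding whitespace.
my $sep = qr{ \s* (?: , | ; | &(?:[mn]dash|hyphen); ) \s* }x;

# Hypothetical input in which two keywords contain a space.
my $input = "red apple, green pear; plum";

# Capturing the whole separator makes split return it as its own item,
# so items at odd indexes are separators and are copied unchanged.
my @items  = split /($sep)/, $input;
my $output = join "",
    map { $_ % 2 ? $items[$_] : "<keyword>$items[$_]</keyword>" } 0 .. $#items;

print "$output\n";
# prints: <keyword>red apple</keyword>, <keyword>green pear</keyword>; <keyword>plum</keyword>
```

    Since the whitespace is part of the captured separator, it stays outside the tags along with the comma or semicolon, while the space inside "red apple" is left alone.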
Re: split keywords
by Anonymous Monk on Mar 19, 2005 at 10:45 UTC

    holli, thanks for your reply. You have used a comma in the output by default, but what I want is to put back the same separator that we split on.