Klammer has asked for the wisdom of the Perl Monks concerning the following question:

Hi perlmonks

After writing a blog post and publishing it, WordPress transforms the given title to a permalink.

Title: "Test title"
Permalink: http://blogname.com/YEAR/MONTH/DAY/test-title

WordPress does some magic to transform the given title (strips out certain characters, replaces others, ...). I checked the WordPress source code, but all I could find was a function called sanitize_title in wp-includes/formatting.php (I'm not good at PHP):
    function sanitize_title($title, $fallback_title = '') {
        $title = strip_tags($title);
        $title = apply_filters('sanitize_title', $title);

        if ( '' === $title || false === $title )
            $title = $fallback_title;

        return $title;
    }

I want to write a script that supports me in writing a blog post offline by generating some html code which I could paste easily in the online editor later.

The thing is, I need the "sanitized title" within some URLs within the generated html code. So Perl has to sanitize the given title in the same way as WordPress to generate valid urls.

I've searched Google and Super Search but couldn't find anything useful. Has anybody already written a Perl version of "sanitize_title", or does anyone have a clue where WordPress does its magic to convert a user-given title to its final form?

My goal would be a function which I can pass my title to and which returns the sanitized title.

Cheers Klammer

Replies are listed 'Best First'.
Re: WordPress 'sanitize_title'
by moritz (Cardinal) on Sep 01, 2008 at 13:36 UTC
    You can easily find out what it does by feeding the PHP function a string and looking at the result. For example, you might put in something like this:
    perl -we 'for (1..127) { print "ABC", chr}; print $/'

    Let Wordpress handle that string, and split the result by ABC. Then you have a mapping from each ASCII character to the corresponding escape sequence.
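The probe idea above can be sketched in Perl roughly as follows. This is only an illustration: the names probe_string and mapping_from_result are made up here, and the split is case-insensitive on the assumption that the blog may lower-case the "ABC" marker along with everything else.

```perl
use strict;
use warnings;

# Build the probe: every ASCII character 1..127, each preceded by "ABC".
# Same string as the one-liner above, minus the trailing newline.
sub probe_string {
    return 'ABC' . join 'ABC', map { chr $_ } 1 .. 127;
}

# Split whatever came back on the marker and pair each piece with the
# character code it started from. Hypothetical helper, not part of WordPress.
sub mapping_from_result {
    my ($result) = @_;
    my @pieces = split /ABC/i, $result, -1;   # -1 keeps trailing empty pieces
    shift @pieces;                            # nothing precedes the first marker
    die 'expected 127 pieces, got ' . @pieces unless @pieces == 127;
    my %map;
    @map{ 1 .. 127 } = @pieces;
    return \%map;
}

# Stand-in for WordPress: here the "result" is just the lower-cased probe.
my $map = mapping_from_result(lc probe_string());
printf "%d => '%s'\n", 65, $map->{65};        # 65 => 'a'
```

With a real blog the result would have to be copied out of the preview or the published permalink by hand, and characters that WordPress deletes show up as empty strings in the map.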

      Hi moritz

      > Let Wordpress handle that string, ...
      I have to add, I don't host WordPress myself. We're talking about a blog on a hosted platform that uses WordPressMU (MultiUser).

      This means I don't have access to any PHP. Or in other words "I'm just a stupid user without any knowledge what happens behind the scenes". All I have is an online editor where I can paste my html code and publish the post.

      Nevertheless thanks for your answer :)

      Cheers Klammer
        You should be able to enter an URL, and then press the "preview" button; somewhere there you should see the sanitized URL (perhaps in the source code of one of those frames).
Re: WordPress 'sanitize_title'
by Anonymous Monk on Sep 01, 2008 at 14:53 UTC
    So you found sanitize_title (you're good enough at PHP), and we see
    $title = apply_filters('sanitize_title', $title);
    so where is apply_filters? How about you find it? Should be easy, since WordPress is open source.
      > So you found sanitize_title (you're good enough at PHP)

      Well, I didn't find it by checking the source but through a Google search. Someone pointed to the sanitize_title function when it comes to permalink creation.

      So I downloaded the WordPress source code and found it in formatting.php.

      To be honest I also found the apply_filters function in wp-includes/plugin.php when trying to understand what "sanitize_title" does.

      But neither apply_filters nor sanitize_title gives me a clue about what WordPress does to convert the title. I expected to find some PHP string manipulation, but apply_filters is all Greek to me.

      function apply_filters($tag, $value) {
          global $wp_filter, $merged_filters, $wp_current_filter;

          $args = array();
          $wp_current_filter[] = $tag;

          // Do 'all' actions first
          if ( isset($wp_filter['all']) ) {
              $args = func_get_args();
              _wp_call_all_hook($args);
          }

          if ( !isset($wp_filter[$tag]) ) {
              array_pop($wp_current_filter);
              return $value;
          }

          // Sort
          if ( !isset( $merged_filters[ $tag ] ) ) {
              ksort($wp_filter[$tag]);
              $merged_filters[ $tag ] = true;
          }

          reset( $wp_filter[ $tag ] );

          if ( empty($args) )
              $args = func_get_args();

          do {
              foreach( (array) current($wp_filter[$tag]) as $the_ )
                  if ( !is_null($the_['function']) ){
                      $args[1] = $value;
                      $value = call_user_func_array($the_['function'], array_slice($args, 1, (int) $the_['accepted_args']));
                  }
          } while ( next($wp_filter[$tag]) !== false );

          array_pop( $wp_current_filter );

          return $value;
      }
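For readers more at home in Perl, the core of what apply_filters does can be sketched as a small callback registry: functions are registered under a tag with a priority, and the value is passed through each of them in priority order. The sketch below is a simplified analogue, not a translation. It leaves out the 'all' hook, $wp_current_filter and accepted_args, and the two filters registered at the end are invented for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical Perl analogue of WordPress's $wp_filter:
# tag => { priority => [ callbacks ] }
my %filters;

sub add_filter {
    my ($tag, $callback, $priority) = @_;
    $priority = 10 unless defined $priority;      # same default as WordPress
    push @{ $filters{$tag}{$priority} }, $callback;
    return 1;
}

sub apply_filters {
    my ($tag, $value) = @_;
    return $value unless $filters{$tag};          # no filters: value unchanged

    # Lower priority numbers run first (the ksort() in the PHP version).
    for my $priority (sort { $a <=> $b } keys %{ $filters{$tag} }) {
        $value = $_->($value) for @{ $filters{$tag}{$priority} };
    }
    return $value;
}

# Two made-up filters, just to show the chaining.
add_filter('sanitize_title', sub { lc $_[0] });
add_filter('sanitize_title', sub { (my $t = $_[0]) =~ s/\s+/-/g; $t }, 20);

print apply_filters('sanitize_title', 'Test Title'), "\n";   # test-title
```

So apply_filters itself contains no string manipulation; the interesting part is whichever function has been registered for the 'sanitize_title' tag.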

      My hope, and the reason for my posting, was that someone had already written a module/function/... in Perl that accomplishes this. It seems this is not the case.

      Cheers Klammer
        Once again you just give up too easily. PHP is no harder than Perl. You're just being lazy.
        sanitize_title: applied to a post title by the sanitize_title function, after stripping out HTML tags.

        wordpress/wp-includes/taxonomy.php: $value = apply_filters("pre_term_$field", $value, $taxonomy);
        wordpress/wp-includes/default-filters.php:$filters = array('pre_term_slug');
        wordpress/wp-includes/plugin.php:function apply_filters($tag, $value) {
        wordpress/wp-includes/formatting.php:function sanitize_file_name( $name ) { // Like sanitize_title, but with periods
        wordpress/wp-includes/formatting.php:function sanitize_title($title, $fallback_title = '') {
        wordpress/wp-includes/formatting.php: $title = apply_filters('sanitize_title', $title);
        wordpress/wp-includes/taxonomy.php: $args['query_var'] = sanitize_title_with_dashes($args['query_var']);
        wordpress/wp-includes/taxonomy.php: $args['rewrite']['slug'] = sanitize_title_with_dashes($taxonomy);
        wordpress/wp-includes/taxonomy.php: $value = sanitize_title($value);
        wordpress/wp-includes/taxonomy.php: $slug = sanitize_title($slug);
        wordpress/wp-includes/taxonomy.php: if ( '' === $slug = sanitize_title($term) )
        wordpress/wp-includes/taxonomy.php: $slug = sanitize_title($name);
        wordpress/wp-includes/taxonomy.php: $slug = sanitize_title($slug, $term_id);
        wordpress/wp-includes/taxonomy.php: $slug = sanitize_title($name);
        wordpress/wp-includes/taxonomy.php: $slug = sanitize_title($name, $term_id);

        function add_filter($tag, $function_to_add, $priority = 10, $accepted_args = 1) {
            global $wp_filter, $merged_filters;

            $idx = _wp_filter_build_unique_id($tag, $function_to_add, $priority);
            $wp_filter[$tag][$priority][$idx] = array('function' => $function_to_add, 'accepted_args' => $accepted_args);
            unset( $merged_filters[ $tag ] );
            return true;
        }

        // Slugs
        $filters = array('pre_term_slug');
        foreach ( $filters as $filter ) {
            add_filter($filter, 'sanitize_title');
        }
        add_filter('sanitize_title', 'sanitize_title_with_dashes');

        function sanitize_title_with_dashes($title) {
            $title = strip_tags($title);
            // Preserve escaped octets.
            $title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
            // Remove percent signs that are not part of an octet.
            $title = str_replace('%', '', $title);
            // Restore octets.
            $title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);

            $title = remove_accents($title);
            if (seems_utf8($title)) {
                if (function_exists('mb_strtolower')) {
                    $title = mb_strtolower($title, 'UTF-8');
                }
                $title = utf8_uri_encode($title, 200);
            }

            $title = strtolower($title);
            $title = preg_replace('/&.+?;/', '', $title); // kill entities
            $title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
            $title = preg_replace('/\s+/', '-', $title);
            $title = preg_replace('|-+|', '-', $title);
            $title = trim($title, '-');

            return $title;
        }

        http://www.w3schools.com/php/func_string_strip_tags.asp
        http://www.php.net/preg_replace
        http://www.php.net/strip_tags
        http://www.php.net/trim
        http://www.php.net/strtolower
        WordPress › Support » How does WP remove accents from Polish characters?
        http://codex.wordpress.org/Function_Reference_2.0.x

        http://adambrown.info/p/wp_hooks/hook/sanitize_title?version=2.6&file=wp-includes/formatting.php
        http://postedpost.com/2008/06/22/hack-wordpress-sanitize-title-function/
        http://postedpost.com/2008/06/23/ultimate-wordpress-post-name-url-sanitize-solution/
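Since sanitize_title_with_dashes is mostly regular expressions, it carries over to Perl almost line by line. The following is a rough sketch of such a port, not a verified equivalent. Three parts are approximations and are marked as assumptions in the comments: strip_tags is replaced by a crude tag-stripping regex, remove_accents is approximated with Unicode decomposition (WordPress uses its own lookup table, so some characters will differ), and the 200-byte limit of utf8_uri_encode is not reproduced. The function expects a decoded character string.

```perl
use strict;
use warnings;
use feature 'unicode_strings';
use Unicode::Normalize qw(NFKD);

# Percent-encode one character as lower-case hex UTF-8 octets.
sub percent_encode {
    my ($char) = @_;
    utf8::encode($char);                       # characters -> UTF-8 bytes
    return join '', map { sprintf('%%%02x', ord $_) } split //, $char;
}

# Rough Perl rendering of the PHP sanitize_title_with_dashes() quoted above.
sub sanitize_title_with_dashes {
    my ($title) = @_;

    # ASSUMPTION: crude stand-in for PHP's strip_tags().
    $title =~ s/<[^>]*>//g;

    # Preserve escaped octets, remove stray percent signs, restore octets.
    $title =~ s/%([a-fA-F0-9][a-fA-F0-9])/---$1---/g;
    $title =~ s/%//g;
    $title =~ s/---([a-fA-F0-9][a-fA-F0-9])---/%$1/g;

    # ASSUMPTION: approximates remove_accents() by decomposing characters
    # and dropping the combining marks.
    $title = NFKD($title);
    $title =~ s/\p{Mn}//g;

    # Lower-case, then percent-encode whatever is still non-ASCII
    # (the length limit of utf8_uri_encode() is not reproduced).
    $title = lc $title;
    $title =~ s/([^\x00-\x7F])/percent_encode($1)/ge;

    $title =~ s/&.+?;//g;                      # kill entities
    $title =~ s/[^%a-z0-9 _-]//g;              # drop everything else
    $title =~ s/\s+/-/g;                       # whitespace -> dash
    $title =~ s/-+/-/g;                        # collapse runs of dashes
    $title =~ s/^-+|-+$//g;                    # trim($title, '-')

    return $title;
}

print sanitize_title_with_dashes('Test title'), "\n";   # test-title
```

This only mirrors the default filter shown above. If the host has registered other filters for 'sanitize_title', the real permalink can differ, so it is worth comparing the output against a few published posts.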
Re: WordPress 'sanitize_title'
by Anonymous Monk on Sep 01, 2008 at 15:09 UTC
    So Perl has to sanitize the given title in the same way as WordPress to generate valid urls.

    I don't see how that is possible without cooperation from your host, since that relies on WordPress configuration/plugins.

      The script is only executed locally on my pc. It should be a little helper to generate some html code for me during writing a blog posting offline. So I wouldn't need host cooperation.

      When writing it offline I already know which title it will have. So I just need a script that takes my title, sanitizes it the same way WordPress would, and generates URLs that include the Perl-sanitized title.

      Let's say I want to write a blog post with the title "Test title". Then Perl should take some internal date and time variables, sanitize the title the WordPress way and start building the URLs.
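The URL-building step described here could look something like the sketch below. The helper name permalink is hypothetical, and the zero-padded YEAR/MONTH/DAY layout is an assumption based on the example permalink in the original question; the slug is passed in already sanitized.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: builds BASE/YEAR/MONTH/DAY/SLUG.
# ASSUMPTION: month and day are zero-padded to two digits.
sub permalink {
    my ($base, $slug, $epoch) = @_;
    $epoch = time unless defined $epoch;       # default: publish "now"
    my $date = strftime('%Y/%m/%d', localtime $epoch);
    return "$base/$date/$slug";
}

# 'test-title' stands for the already sanitized form of "Test title".
print permalink('http://blogname.com', 'test-title'), "\n";
```

Passing an explicit epoch value as the third argument makes it possible to generate links for a post that will be published on a different day than it was written.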

      I hope I described it properly :)

      Cheers Klammer
        The host can have any number of custom ways to sanitize the title. You have no guarantee it's the default WordPress function.