/**
 * @article Improving WordPress Codex Search
 *
 * @since May 7, 2012
 * @package Wordpress
 *
 * @tags Codex, mediawiki, scrapers, search, WordPress
 */
So I got this bug in my head a while back that there really needed to be a better lookup function for WordPress code information. Maybe it’s true that everyone who knows how just greps the source, and the Codex search is only for beginners who don’t care if it takes a while to find what they want. I personally don’t believe that. Yes, looking through the source with ack-grep (or having your IDE do it for you) is going to be the quickest way to find what a given function does, or what order its arguments go in.
But there are a few weaknesses to depending on the inline documentation. First, as I already mentioned, there are beginners who may be perfectly capable of putting together a theme but are intimidated by searching through the source. Second, some classes in WordPress are so complex that, even with very good inline documentation, it’s much easier to follow a Codex entry than to read the source. Compare, for example, the Codex entry for WP_List_Table with the source.
The basic issue: so many people put so much effort into improving the Codex documentation, and so many questions asked and answered on support forums could be solved by simply searching the Codex, that I figured making the Codex more searchable would be a win for all involved.
So… the process
The first step was to get a list of the pages on the Codex where redirects would be useful. I looked at exporting the Codex to work with locally, as per Eric Mann’s answer here on Stack Exchange, but that seemed like too much overhead. I didn’t need page text or revision history; the page title was enough. So I went with a simpler scraper on ScraperWiki, which pulled the page titles of every public entry in the Codex.
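The scraper itself is not reproduced in this post, so here is a minimal sketch of the title-gathering step in PHP. It assumes each entry on a Special:AllPages listing is a link inside a table cell; `extract_page_links()` and the sample markup are made up for illustration and are not the scraper that actually ran on ScraperWiki.

```php
<?php
/**
 * Pull page titles and hrefs out of one Special:AllPages listing.
 * Hypothetical helper: the XPath below is an assumption about the markup.
 */
function extract_page_links( string $html ): array {
    if ( '' === trim( $html ) ) {
        return array();
    }

    $doc = new DOMDocument();
    // Keep libxml quiet about the imperfect HTML real pages tend to contain.
    libxml_use_internal_errors( true );
    $doc->loadHTML( $html );
    libxml_clear_errors();

    $xpath = new DOMXPath( $doc );
    $pages = array();
    // Assumption: every listed page is an <a> with href and title inside a <td>.
    foreach ( $xpath->query( '//td/a[@href][@title]' ) as $link ) {
        $pages[] = array(
            'title' => $link->getAttribute( 'title' ),
            'href'  => $link->getAttribute( 'href' ),
        );
    }
    return $pages;
}

// Made-up sample listing with one English and one translated entry.
$sample = '<table><tr>'
    . '<td><a href="/Function_Reference/wpautop" title="Function Reference/wpautop">Function Reference/wpautop</a></td>'
    . '<td><a href="/es:Function_Reference/wpautop" title="es:Function Reference/wpautop">es:Function Reference/wpautop</a></td>'
    . '</tr></table>';

foreach ( extract_page_links( $sample ) as $page ) {
    echo $page['title'] . "\t" . $page['href'] . "\n";
}
```

A real run would also have to follow the pagination links on the listing and store each row somewhere queryable, which this sketch leaves out.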
Running the scraper over http://codex.wordpress.org/index.php?title=Special:AllPages takes a few minutes, and gives me 4117 pages. Very nice. Now, a lot of these pages are in foreign languages, which I don’t want to mess with. The list is also not in any real order, so I decided not to try to handle everything with one function. It’s probably safer to deal with the “Function Reference” entries separately from the “Template Tags” or “Plugin API” entries. So I’m searching for titles that contain “Function Reference/” and don’t contain a colon, since translated pages carry a language prefix, as in es:Function Reference/wp_blahblahblah. Here’s my first stab at getting just the articles that need to be redirected:
Query: select * from `codex_pages` where title like "%Function Reference/%" and title not like "%:%"
That seems workable enough for my purposes. It’s possible that I’ve missed some, but this is really just a first pass… There is still a lot of hand-tuning necessary to fix the entries whose titles don’t translate well into MediaWiki’s naming formats (like __(), for one). But creating 1000 good redirects is still useful.
For those who are interested, here’s how I ran the bot. I used Wikimate, an unofficial PHP SDK for MediaWiki.
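For anyone working from a plain list of titles rather than a SQL datastore, the same filter can be sketched in PHP. `is_function_reference()` and the sample titles are hypothetical; the function is only meant to mirror the two LIKE conditions in the query.

```php
<?php
/**
 * Keep only untranslated "Function Reference/..." titles.
 * Mirrors: title like "%Function Reference/%" and title not like "%:%"
 */
function is_function_reference( string $title ): bool {
    // stripos() because SQLite's LIKE ignores case for ASCII by default.
    return false !== stripos( $title, 'Function Reference/' )
        && false === strpos( $title, ':' );
}

// Made-up sample titles covering the cases discussed above.
$titles = array(
    'Function Reference/wpautop',
    'es:Function Reference/wpautop',
    'Template Tags/bloginfo',
    'Function Reference',
);

$wanted = array_values( array_filter( $titles, 'is_function_reference' ) );
print_r( $wanted );
```

The colon test is blunt: it drops translated pages, but it would also drop any English title that happens to contain a colon, which is one way entries can get missed on a first pass.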
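<?php
// The opening lines of this script were lost when the post was published.
// Everything down to login() is a reconstruction: loading Wikimate and
// setting $api_url, $username and $password are assumed to happen before
// this point, and passing the API URL to the constructor is an assumption
// based on current Wikimate releases.
$wiki = new Wikimate( $api_url );
if ( ! $wiki->login( $username, $password ) ) {
    $error = $wiki->getError();
    var_dump( $error );
    exit;
}

// Fetch the filtered title list from the ScraperWiki datastore.
$x = file_get_contents( 'https://api.scraperwiki.com/api/1.0/datastore/sqlite?format=jsondict&name=wp_codex&query=select%20*%20from%20%60codex_pages%60%20where%20title%20like%20%22%25Function%20Reference%2F%25%22%20and%20title%20not%20like%20%22%25%3A%25%22%20' );
$pages = json_decode( $x );
if ( ! is_array( $pages ) ) {
    exit( "Could not read the page list\n" );
}

foreach ( $pages as $i => $page ) {
    $title = trim( $page->title, '/' );
    $href  = trim( $page->href, '/' );
    // end() takes its argument by reference, so split into variables first.
    $title_parts   = explode( '/', $title );
    $href_parts    = explode( '/', $href );
    $function_name = end( $title_parts );
    $function_href = end( $href_parts );
    // Skip top-level pages and titles containing "$".
    if ( $function_name === $title || strstr( $title, '$' ) ) {
        continue;
    }
    // check if the page exists or not
    $create_page = $wiki->getPage( strtolower( $function_name ) );
    if ( $create_page->exists() ) {
        echo "E\t$function_name\t$function_href\n";
        continue;
    }
    // Create redirect page
    if ( ! $create_page->setText( '#REDIRECT [[' . $page->title . ']]' ) ) {
        var_dump( $create_page->getError() );
    } else {
        echo "N\t$function_name\t$function_href > $href\n";
    }
}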
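Before pointing a bot at a live wiki, it can help to check the title handling on its own. The sketch below pulls that logic into a pure function that touches nothing. `plan_redirect()` is a hypothetical helper, not part of Wikimate, and the `review` flag rests on my understanding that MediaWiki treats underscores as spaces, trims them from the ends of a title, and rejects the characters # < > [ ] | { }.

```php
<?php
/**
 * Work out the redirect for one Codex title without touching the wiki.
 * Returns null when the bot would skip the title.
 */
function plan_redirect( string $title ): ?array {
    $title = trim( $title, '/' );
    $parts = explode( '/', $title );
    $name  = end( $parts );

    // Skip top-level pages and titles containing "$", as the bot does.
    if ( $name === $title || false !== strpos( $title, '$' ) ) {
        return null;
    }

    // Assumption: names that MediaWiki would rewrite or reject need a
    // manual look, e.g. leading underscores or reserved characters.
    $review = trim( $name, '_ ' ) !== $name
        || 1 === preg_match( '/[#<>\[\]|{}]/', $name );

    return array(
        'page'   => strtolower( $name ),
        'text'   => '#REDIRECT [[' . $title . ']]',
        'review' => $review,
    );
}

// Made-up sample titles: one ordinary, one needing review, two skipped.
$samples = array(
    'Function Reference/wpautop',
    'Function Reference/__',
    'Function Reference',
    'Function Reference/$wpdb',
);
foreach ( $samples as $sample ) {
    $plan = plan_redirect( $sample );
    echo $sample . ' => ' . ( $plan ? json_encode( $plan ) : 'skip' ) . "\n";
}
```

Anything that comes back with `review` set is a candidate for the hand-tuning mentioned earlier rather than an automatic redirect.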
So… does it work? Let’s see:
<form action="http://codex.wordpress.org/"> <input type="text" name="search" /> <input type="submit" value="Search" /> </form>
Try searching for a function name. You should be directed straight to the relevant Codex page, if one exists.