Fixing a small HTML entity encoding issue with HL Twitter

The tweets in the right-hand column of my blog are displayed by an excellent little utility called HL Twitter.

I found a small bug: HL Twitter doesn't seem to unescape HTML entities when displaying tweets, so a character like an em-dash shows up as a raw entity code instead of the character itself.

I made a minor edit to the plugin's functions.php file that appears to resolve the issue. Add the four entity-decoding statements at the top of the hl_twitter_show_tweet() function (the &amp; replacement, the decimal and hex conversions, and the html_entity_decode() call) to clean up each tweet before it is displayed. The rest of the function is included for context:

```php
/* Returns a tweet with all links, hashtags and usernames converted to links */
function hl_twitter_show_tweet($tweet) {
	// Added: collapse double-encoded ampersands, e.g. "&amp;#8212;" -> "&#8212;"
	$tweet = preg_replace('/&amp;/', '&', $tweet);
	// Added: decimal entities (&#8212;) -> UTF-8; a callback replaces the deprecated /e modifier
	$tweet = preg_replace_callback('/&#(\d+);/', function ($m) {
		return mb_convert_encoding('&#' . intval($m[1]) . ';', 'UTF-8', 'HTML-ENTITIES');
	}, $tweet);
	// Added: hex entities (&#x2014;) -> UTF-8
	$tweet = preg_replace_callback('/&#x([a-f0-9]+);/i', function ($m) {
		return mb_convert_encoding('&#' . hexdec($m[1]) . ';', 'UTF-8', 'HTML-ENTITIES');
	}, $tweet);
	// Added: decode the remaining named entities (&quot;, &lt;, ...)
	$tweet = html_entity_decode($tweet, ENT_QUOTES, 'UTF-8');
	$tweet = preg_replace("#(^|[\n ])([\w]+?://[\w]+[^ \"\n\r\t< ]*)#", "\\1<a href=\"\\2\">\\2</a>", $tweet);
	$tweet = preg_replace("#(^|[\n ])((www|ftp)\.[^ \"\t\n\r< ]*)#", "\\1<a href=\"http://\\2\">\\2</a>", $tweet);
	$tweet = preg_replace("/@(\w+)/", "<a href=\"\\1\">@\\1</a>", $tweet);
	$tweet = preg_replace("/#(\w+)/", "<a href=\"\\1\">#\\1</a>", $tweet);
	return $tweet;
} // end func: hl_twitter_show_tweet
```

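I used mb_convert_encoding() rather than chr() for the decimal and hex conversions (lines 262 and 263 of functions.php) because chr() only returns a single byte (values 0 to 255), so it can't produce multibyte Unicode characters such as the em-dash I was looking for.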
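To see why chr() falls short, here is a small standalone comparison. It assumes PHP 5.4 or later and uses html_entity_decode() in place of the post's mb_convert_encoding() call only so that it runs without the mbstring extension; both turn &#8212; into the same three UTF-8 bytes.

```php
<?php
// chr() produces exactly one byte; values above 255 are reduced modulo 256,
// so code point 8212 (U+2014, em-dash) collapses to the control byte 0x14.
$fromChr = chr(8212);

// Decoding the numeric entity as UTF-8 yields the full three-byte sequence E2 80 94.
$fromEntity = html_entity_decode('&#8212;', ENT_QUOTES, 'UTF-8');

var_dump(bin2hex($fromChr));    // string(2) "14"
var_dump(bin2hex($fromEntity)); // string(6) "e28094"
```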

As always, comments and suggestions are most welcome. It wouldn’t surprise me if there were some edge cases I didn’t catch.

Update 7/18: I got in touch with the developer, Luke. PHP is phasing out the /e modifier that lets preg_replace execute code (it is deprecated as of PHP 5.5), so he'll implement a different fix. Thanks! 🙂
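For reference, here is one /e-free alternative, assuming PHP 5.4 or later: with a UTF-8 charset, html_entity_decode() already decodes decimal and hex numeric entities, so the two regex passes can be folded into a single call. decode_tweet_entities() is a hypothetical helper for illustration only; it is not part of HL Twitter and not the fix Luke implemented.

```php
<?php
// Hypothetical helper (not part of HL Twitter): decodes tweet entities without preg_replace /e.
// Assumes PHP 5.4+, where html_entity_decode() handles numeric entities and ENT_HTML5 exists.
function decode_tweet_entities($tweet)
{
    // Undo one layer of double-encoding first, e.g. "&amp;#8212;" -> "&#8212;".
    $tweet = str_replace('&amp;', '&', $tweet);
    // One call then decodes named (&mdash;), decimal (&#8212;) and hex (&#x2014;) entities.
    return html_entity_decode($tweet, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo decode_tweet_entities('Tea &amp;amp; cake &amp;#8212; &#x2014; &mdash;'), "\n";
```

The call above should print "Tea & cake" followed by three em-dash characters, one for each entity form. Like the original fix, this also turns &lt; and &gt; into literal angle brackets, so a tweet containing them would end up in the page as markup.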