Fixing a small HTML entity encoding issue with HL Twitter
The tweets in the right-hand column of my blog are displayed by an excellent little plugin called HL Twitter.
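I found a small bug: HL Twitter doesn’t seem to unescape HTML entities before displaying tweets, so characters such as an em dash can show up as raw entity codes instead of the characters themselves.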
I made a minor edit to the plugin’s functions.php file that appears to resolve the issue. In the hl_twitter_show_tweet() function below, add the four lines at the top of the function body; they clean up each tweet before it is displayed. The rest of the function is shown for context:
```php
/* Returns a tweet with all links, hashtags and usernames converted to links */
function hl_twitter_show_tweet($tweet) {
    // The next four lines are the fix. Note: the /e modifier only works before PHP 7.0.
    $tweet = preg_replace("/&amp;/", "&", $tweet); # undo double escaping: &amp;#8212; -> &#8212;
    $tweet = preg_replace('/&#(\d+);/me', "mb_convert_encoding('&#' . intval(\\1) . ';', 'UTF-8', 'HTML-ENTITIES');", $tweet); # decimal notation
    $tweet = preg_replace('/&#x([a-f0-9]+);/mei', "mb_convert_encoding('&#' . intval(0x\\1) . ';', 'UTF-8', 'HTML-ENTITIES');", $tweet); # hex notation
    $tweet = html_entity_decode($tweet, ENT_QUOTES, 'UTF-8'); # named entities; explicit charset keeps the output UTF-8

    // Original plugin code: turn URLs, www/ftp hosts, @usernames and #hashtags into links
    $tweet = preg_replace("#(^|[\n ])([\w]+?://[\w]+[^ \"\n\r\t< ]*)#", "\\1<a href=\"\\2\">\\2</a>", $tweet);
    $tweet = preg_replace("#(^|[\n ])((www|ftp)\.[^ \"\t\n\r< ]*)#", "\\1<a href=\"http://\\2\">\\2</a>", $tweet);
    $tweet = preg_replace("/@(\w+)/", "<a href=\"http://twitter.com/\\1\">@\\1</a>", $tweet);
    $tweet = preg_replace("/#(\w+)/", "<a href=\"http://search.twitter.com/search?q=\\1\">#\\1</a>", $tweet);
    return $tweet;
} // end func: hl_twitter_show_tweet
```
I didn’t use chr() for the decimal and hex replacements (lines 262 and 263 of functions.php) because chr() returns a single byte, so it can’t produce multibyte Unicode characters such as the em dash I was after.
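To see why one chr() call falls short: in UTF-8, only code points below 128 fit in a single byte, while the em dash (U+2014, decimal 8212) needs three. The sketch below encodes a code point by hand; codepoint_to_utf8() is a hypothetical helper for illustration, not part of HL Twitter or PHP, and it cross-checks its result against PHP’s built-in html_entity_decode().

```php
<?php
// Hypothetical helper (not part of HL Twitter or PHP): encode one Unicode
// code point as UTF-8 by hand. Surrogates (U+D800 to U+DFFF) are not rejected.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0 || $cp > 0x10FFFF) {
        throw new InvalidArgumentException("Not a Unicode code point: $cp");
    }
    if ($cp < 0x80) {
        return chr($cp); // 1 byte: plain ASCII, the only case where one chr() byte is valid UTF-8
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F)); // 2 bytes, e.g. é
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F)); // 3 bytes, e.g. the em dash
    }
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F)); // 4 bytes, e.g. emoji
}

// The em dash is U+2014 (decimal 8212): three bytes in UTF-8.
echo bin2hex(codepoint_to_utf8(8212)), "\n"; // e28094

// Cross-check against PHP's own numeric-entity decoding.
var_dump(codepoint_to_utf8(8212) === html_entity_decode('&#8212;', ENT_QUOTES, 'UTF-8')); // bool(true)
```

Each extra byte carries six payload bits behind a 10 prefix, which is exactly the work mb_convert_encoding() does for the plugin in one call.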
As always, comments and suggestions are most welcome. It wouldn’t surprise me if there were some edge cases I didn’t catch.
Update 7/18: I got in touch with the developer, Luke. PHP is phasing out the /e modifier, which makes preg_replace() execute the replacement string as PHP code (it was deprecated in PHP 5.5 and removed in PHP 7.0), so he’ll implement a different fix in the plugin. Thanks! 🙂
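For anyone patching this by hand on a newer PHP, the usual replacement for /e is preg_replace_callback(), which calls a PHP function for each match instead of evaluating a string. The sketch below is one way to port the four-line fix under a few assumptions: hl_decode_tweet_entities() is a hypothetical name (not Luke’s actual fix), mb_chr() requires PHP 7.2+ with the mbstring extension, and the digit counts in the patterns are capped so the numbers always fit in an integer.

```php
<?php
// A sketch of the four-line fix without the /e modifier.
// hl_decode_tweet_entities() is a hypothetical name, not part of HL Twitter.
// mb_chr() needs PHP 7.2+ with the mbstring extension.
function hl_decode_tweet_entities(string $tweet): string
{
    // Undo the extra escaping so "&amp;#8212;" becomes "&#8212;".
    $tweet = str_replace('&amp;', '&', $tweet);

    // Turn a code point into a UTF-8 character; keep the original entity
    // text if the code point is invalid (mb_chr() returns false then).
    $toChar = function (int $cp, string $original): string {
        $char = mb_chr($cp, 'UTF-8');
        return $char === false ? $original : $char;
    };

    // Decimal notation, e.g. &#8212;
    $tweet = preg_replace_callback('/&#(\d{1,7});/', function (array $m) use ($toChar): string {
        return $toChar((int) $m[1], $m[0]);
    }, $tweet);

    // Hex notation, e.g. &#x2014; (case-insensitive)
    $tweet = preg_replace_callback('/&#x([0-9a-f]{1,6});/i', function (array $m) use ($toChar): string {
        return $toChar(hexdec($m[1]), $m[0]);
    }, $tweet);

    // Named entities such as &quot; and &mdash;
    return html_entity_decode($tweet, ENT_QUOTES, 'UTF-8');
}

echo hl_decode_tweet_entities('Wait &amp;#8212; what? &#x2014; &amp;quot;yes&amp;quot;'), "\n";
```

Inside hl_twitter_show_tweet(), the four added lines could then become a single `$tweet = hl_decode_tweet_entities($tweet);` before the link conversions.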