How do I remove HTML from a string?

Many applications let visitors enter content that is later displayed on screen. This can cause problems if the user enters HTML: the markup may break the layout of your site, or even give someone a way to steal information from other users on your site (a cross-site scripting attack). In general, you almost always want to remove or escape HTML in user input. ColdFusion provides a few ways to do that.

The simplest method is to use either htmlCodeFormat() or htmlEditFormat(). Both functions escape the characters that have special meaning in HTML. So if the user entered <b>TEXT</b>, the < and > characters are converted to &lt; and &gt;, and the browser displays the tag as plain text instead of rendering it. Normally htmlEditFormat() is the one to use, because htmlCodeFormat() also wraps the string in <pre> tags.
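
As a rough illustration of the escaping approach, the snippet below runs htmlEditFormat() over a sample string and outputs the result. The variable names userInput and safeStr are made up for this example:

<!--- userInput and safeStr are hypothetical names used only for this sketch --->
<cfset userInput = "<b>TEXT</b>">
<cfset safeStr = htmlEditFormat(userInput)>
<!--- safeStr now holds &lt;b&gt;TEXT&lt;/b&gt;, so the browser shows the tag as literal text --->
<cfoutput>#safeStr#</cfoutput>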

While this method escapes the HTML, you may prefer to remove it altogether. Luckily, this is also easy with ColdFusion's regular expression support. The following UDF (user-defined function) from CFLib does just that:

<cfscript>
function stripHTML(str) {
	// Remove every run of text that starts with < and ends at the next >
	return REReplaceNoCase(str,"<[^>]*>","","ALL");
}
</cfscript>

So to remove the HTML from a form value, you could do this:

<cfset cleanStr = stripHTML(form.input)>
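
The two approaches can also be combined. The regular expression only matches text that starts with < and ends with >, so a lone < or > with no partner is left in place. The sketch below strips the tags first and then escapes whatever remains. The sample string and the variable names are assumptions made for illustration:

<cfscript>
// Same UDF as above, repeated so this snippet runs on its own.
function stripHTML(str) {
	return REReplaceNoCase(str,"<[^>]*>","","ALL");
}

// Hypothetical input for this sketch; in practice this would be form.input.
sample = "<b>Hello</b> <i>world</i>";
cleanStr = stripHTML(sample);        // tags removed, leaving "Hello world"
safeStr = htmlEditFormat(cleanStr);  // escapes any stray <, > or & that survived
</cfscript>
<cfoutput>#safeStr#</cfoutput>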

This question was written by Raymond Camden
It was last updated on May 7, 2006.

Categories

Strings
