What's the most effective way to clean text pasted from Microsoft Word?
A common problem in forms arises when content is pasted from Microsoft Word. Word's typographic characters, such as curly quotes and long dashes, sometimes become corrupted and do not store well in the backend. One way to correct this is with a simple UDF (user-defined function) called deMoronize. This UDF cleans up the broken content by replacing those characters with safer ones.
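<!--- Load the file that defines the deMoronize() UDF --->
<cfinclude template="udf.cfm">
<!--- Clean the submitted form text before storing it --->
<cfset cleanText = deMoronize(form.text)>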
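To give a rough idea of what this kind of cleanup involves, here is a minimal sketch. It is not the actual deMoronize source: the function name cleanWordText and the list of replacements are assumptions made for illustration, and it assumes the form text reaches ColdFusion as Unicode, so that Word's characters show up as code points such as 8217.

```cfml
<cfscript>
// Hypothetical helper, not the real deMoronize(): swaps a few common
// Word punctuation characters for plain ASCII equivalents.
function cleanWordText(text) {
    var result = arguments.text;
    result = replace(result, chr(8216), "'", "all");   // left single quote
    result = replace(result, chr(8217), "'", "all");   // right single quote / apostrophe
    result = replace(result, chr(8220), '"', "all");   // left double quote
    result = replace(result, chr(8221), '"', "all");   // right double quote
    result = replace(result, chr(8211), "-", "all");   // en dash
    result = replace(result, chr(8212), "--", "all");  // em dash
    result = replace(result, chr(8230), "...", "all"); // ellipsis
    return result;
}
</cfscript>
<cfset cleanText = cleanWordText(form.text)>
```

Each replace() call handles one character, so the list is easy to extend if other Word characters turn up in your data.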
This question was written by Raymond Camden.
It was last updated on March 21, 2006 at 2:26:06 PM EST.
Comments
Comment made by PaulH on March 22, 2006 at 2:15 AM
better to recognize that it's windows-1252 encoding which is a superset of iso-8859-1 & act accordingly. what if those folks actually wanted those chars encoded just as they were?
Comment made by Raymond Camden on March 22, 2006 at 9:31 AM
Well, good point, but the entry does ask how to clean it. Obviously if you don't want to clean it, modify it, etc, you wouldn't do this.