How can I get user and Search Engine Friendly URLs?
Due to the dynamic nature of ColdFusion websites, some of your pages may end up with a long string of URL parameters that makes them less than intuitive and user friendly. Along with looking out for your human visitors, you may also want to optimize your dynamic URLs for more favorable search engine treatment. While this recipe is not the place to debate search engine optimization techniques, we are running under the assumption that search engines seem to prefer more "human readable" URLs over dynamic URL strings.
Let's take a look at some sample URLs:
http://www.coldfusioncookbook.com/index.cfm?event=faq
works much better as:
http://www.coldfusioncookbook.com/faq
and
http://www.coldfusioncookbook.com/index.cfm?event=showentry&id=1
works much better as:
http://www.coldfusioncookbook.com/entry/1/How-do-I-mail-the-contents-of-a-form?
So how can you make your URLs more readable? You have two options: 1) write some custom code using the cgi.path_info variable, or 2) have your web server do the work for you.
1) By using cgi.path_info and a little custom code, you can have ColdFusion accept a simpler URL and parse it into the values your application needs.
It's really two parts. First we have to recognize the "weird" URL form, and once we do, we parse it.
To begin with, whenever a URL comes in with the form http://host/filename.cfm/stuff/at/the/end, your web server will recognize that "filename.cfm" is the file you want. It will then take the "extra" stuff and store it in a CGI variable, path_info. Sometimes this CGI variable will also contain the filename. Luckily, Michael Dinowitz wrote a nice little article showing a sample regex to "clean" this value. I don't seem to see a "direct" link to his article, but it is on the House of Fusion website. (Look for the article "Search Engine Safe (SES) URLs.") In this article he has a full-blown UDF for dealing with the values, but I'm going to focus just on the regex. The example below shows it in action:
<cfset pathInfo = reReplaceNoCase(trim(cgi.path_info), '.+\.cfm/? *', '')>
<cfoutput>
cgi.path_info=#cgi.path_info#<br>
stripped: #pathInfo#
</cfoutput>
You don't have to worry too much about the regex; it basically just removes any potential filename from the CGI variable. I'm not seeing any filename on my Apache or IIS server, but I know I've seen it in the past.
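To see what the regex does without depending on how your web server fills in cgi.path_info, here is a small sketch that runs it against two sample strings. The sample values are made up for illustration: the first mimics a server that includes the filename, the second one that does not.

```cfml
<!--- Hypothetical sample values; the second one has no filename in it. --->
<cfset samples = "/index.cfm/stuff/at/the/end,/stuff/at/the/end">
<cfoutput>
<cfloop list="#samples#" index="raw">
	raw: #raw#<br>
	stripped: #reReplaceNoCase(trim(raw), '.+\.cfm/? *', '')#<br>
</cfloop>
</cfoutput>
```

The first value loses everything up to and including "index.cfm/", while the second is left alone because it contains no ".cfm". One result keeps its leading slash and the other does not, but that should not matter once you treat the value as a list, since ColdFusion's list functions skip empty elements by default.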
At this point we have a pathInfo variable that stores any information that was added to the end of our filename. How do we parse this? Obviously you have a ColdFusion list using the / character as a delimiter. In my example above, "http://host/filename.cfm/stuff/at/the/end", my pathInfo variable would have: "/stuff/at/the/end". How I parse that is up to the application. In BlogCFC, I check the length of the value (using listLen and / as the delimiter) to make sure the length is 4. The first three values refer to the date and the last item refers to the alias.
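As a rough sketch of that fixed-length style of parsing (this is not BlogCFC's actual code), the check could look something like the following. The variable names and the year/month/day order are assumptions made for the example.

```cfml
<cfscript>
// Sketch only: expects something like /2006/7/6/my-entry-alias after the filename.
// The variable names and the year/month/day order are assumptions.
pathInfo = reReplaceNoCase(trim(cgi.path_info), '.+\.cfm/? *', '');
isEntryRequest = false;
if (listLen(pathInfo, "/") eq 4) {
	entryYear  = listGetAt(pathInfo, 1, "/");
	entryMonth = listGetAt(pathInfo, 2, "/");
	entryDay   = listGetAt(pathInfo, 3, "/");
	entryAlias = listGetAt(pathInfo, 4, "/");
	// only treat it as an entry request if the date parts are numbers
	isEntryRequest = isNumeric(entryYear) and isNumeric(entryMonth) and isNumeric(entryDay);
}
</cfscript>
```

If isEntryRequest ends up false, the application can fall back to its normal behavior, for example showing the home page.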
You may want to use a format that is like typical URL variables. Something like: http://host/filename.cfm/product/323. In this form, the URL is simply another way of saying: http://host/filename.cfm?product=323. To parse this form, I would have to loop over the list and set URL variables. Here is a sample that will do that:
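<cfscript>
// The function name parseSES is assumed for illustration.
function parseSES() {
	// strip any leading filename (e.g. /index.cfm) from path_info
	var pathInfo = reReplaceNoCase(trim(cgi.path_info), '.+\.cfm/? *', '');
	var i = 1;
	var lastKey = "";
	var value = "";
	if(not len(pathInfo)) return;
	for(i=1; i lte listLen(pathInfo, "/"); i=i+1) {
		value = listGetAt(pathInfo, i, "/");
		// odd items are keys, even items are their values
		if(i mod 2 is 0) url[lastKey] = value;
		else lastKey = value;
	}
	//did we end with a "dangler?"
	if((i-1) mod 2 is 1) url[lastKey] = "";
	return;
}
</cfscript>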
What are we doing here? As I mentioned before, we begin by looking for stuff after the filename. If we find nothing, we exit the function. (Normally a UDF returns something. A return statement by itself just means to leave the function without returning anything at all.)
Next we treat the value as a list and loop over it. We want to do things in twos: the first item is a variable name, the second is its value. We simply check our list counter, i, and on odd numbers we store the item as "lastKey", and on even numbers we write to the URL scope. (UDFs should never directly access variables outside their own scope. Except when they should. ;) This code assumes an even number of values. So what happens if the number of items in pathInfo is odd? (Ex: /products/5/foo) We then treat the last item as an "empty" variable and create it in the URL scope with an empty string as its value. This could be used as a flag. So for example, /productid/5/short could mean set url.productid to 5, which is the database record to load, and "short" simply means show the shorthand version of the content.
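To give an idea of how this might be used in a page, here is a minimal sketch. It assumes the UDF above is named parseSES and saved in a file called ses.cfm, and that the page is requested as /product.cfm/productid/5/short. All of those names are hypothetical.

```cfml
<!--- product.cfm (hypothetical), requested as /product.cfm/productid/5/short --->
<!--- ses.cfm is an assumed file that holds the UDF, here called parseSES --->
<cfinclude template="ses.cfm">
<cfset parseSES()>

<!--- After the call, the values can be read like normal URL variables --->
<cfparam name="url.productid" default="0">
<cfset showShort = structKeyExists(url, "short")>

<cfoutput>
Loading product #val(url.productid)#
<cfif showShort>(short version)</cfif>
</cfoutput>
```

Calling the function once near the top of the request (in Application.cfm, for example) means the rest of the code can keep using the URL scope as if a normal query string had been passed. Using val() keeps a non-numeric productid from getting any further than this line.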
2) As far as having your web server do the work for you, the solution is to configure your web server so that it intercepts the readable, friendly URL and internally rewrites it to the actual dynamic URL.
Apache
Apache has rewrite capabilities through mod_rewrite. You can set the rewrite rules in a .htaccess file, which applies to whatever folder you place it in (and the folders beneath it). So by placing the following code in a .htaccess file located at the root of your website, you would accomplish the URL rewrites shown in the sample URLs above.
RewriteEngine On
RewriteRule faq /\?event=faq [PT]
RewriteRule entry/([0-9]+)/.* /\?event=showentry&id=$1 [PT]
IIS
IIS does not have built-in rewrite functionality, but you can add it with Ionic's free ISAPI Rewrite Filter: http://cheeso.members.winisp.net/IIRF.aspx
Credit Note: Raymond Camden helped write part of this entry.
This question was written by Jeremy Petersen.
It was last updated on July 6, 2006 at 11:03:44 AM EDT.
Comments
Comment made by Martin on July 25, 2006 at 9:02 AM
The Michael Dinowitz article you mentioned can be found at http://www.fusionauthority.com/Article1.cfm/ArticleID=4226
Comment made by Matt on July 27, 2006 at 7:21 AM
Am I right in assuming that using this URL (as opposed to one including the file extension) is only possible with web server rewrite?
http://www.coldfusioncookbook.com/entry/104/How-can-I-get-user-and-Search-Engine-Friendly-URLs?
With this rule: RewriteRule entry/([0-9]+)/.* /\?event=showentry&id=$1 [PT]
Apache effectively processes this as
http://www.coldfusioncookbook.com/?event=showentry&id=104
thereby calling index.cfm?
I wondered if it might be possible to use onRequestStart to do something...
session.requestString = cgi.script_name & cgi.PATH_INFO;
if (FindNoCase("ack/",session.requestString) NEQ 0) {
	getPageContext().forward("index.cfm");
}
I get 404 because I guess JRUN is checking the file exists or something?
Comment made by Raymond Camden on July 27, 2006 at 7:25 AM
As far as I know, yes, you have to use an outside tool to do "full" SES URLs like that.
Comment made by Jeff Howden on August 14, 2006 at 2:37 AM
FWIW, I've used ISAPIRewrite from Helicon with great success -- http://isapirewrite.com/.
Comment made by H Jaber on October 17, 2006 at 10:15 AM
Really nice explanation and excellent concept. Can you show an example of how to actually use this within an application? I am having trouble figuring out exactly what to change to adapt this udf to my app.
Thanks
Comment made by Raymond Camden on October 17, 2006 at 10:20 AM
That may be a bit hard, as every site would have its own form of SES URLs and its own way of parsing them.
Comment made by Sinuy on June 8, 2007 at 1:55 AM
Hi Ray, I don't get it.
Does that mean we'll have to rely on the server's default settings (.htaccess) for the rewrite capability on a shared-hosting server?
Comment made by Raymond Camden on June 8, 2007 at 8:46 AM
Sinuy - rewriting is really a web server job, not CF. So for "full" rewriting, you really need a web server to handle it, like Apache, or a plugin for IIS.
Comment made by Tobie on June 27, 2007 at 3:58 PM
Is this also possible on Unix using websphere?
Comment made by Raymond Camden on June 28, 2007 at 8:27 AM
Tobie - it should work. Just try it. ;)
Comment made by Duane Hardy on February 11, 2008 at 7:47 PM
I am reading from a webservice and need to add dynamic code to the end of a url link. For instance the webservice data may look like this:
<a href="www.url.com">Hello</a>
and I need "?code=#id#" added to the end:
<a href="www.url.com?code=#id#">Hello</a>
Is there a coding method to do this upon reading the webservice data?