Permanently Redirecting URLs using 301 Headers in Static and Dynamic Sites
Permanently redirecting traffic using a 301 header is the proper and fail-safe way to ensure that your visitors continue to find content that has permanently moved to a new location. Search engine ranking is crucial for any business, and changes in web infrastructure or in best practices can change the URI of content items. Because search engine indexes take time to update (10 days to 3 months), it is important to keep the old URLs working until all external referrers have been updated.
Why there may be a need to use new paths
Chances are you have learned more, and technologies have changed, since you last overhauled your website. You may now have a uniform naming system for your paths that, ideally, embeds keywords in your document paths to maximize your ranking advantage. This makes it necessary to create new URL paths for existing documents while keeping the old URLs active until everyone has updated their links.
How to implement
You cannot reliably depend on client technology to handle any task beyond displaying simple, standards-compliant markup. Therefore, implement 301 header redirects on the server, using one of the following methods:
Server-side Implementation
Windows IIS with Active Server Pages (ASP)
<%@ Language=VBScript %>
<%
Response.Status = "301 Moved Permanently"
Response.AddHeader "Location", "http://newpagelocation.com/path"
Response.End
%>
PHP
<?php
// Must run before any output has been sent to the browser.
header("HTTP/1.1 301 Moved Permanently");
header("Location: http://newpagelocation.com/path");
exit();
?>
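If you create many placeholder pages, the PHP lines above can be wrapped in a reusable function so that each page only names its destination. The following is a minimal sketch, not part of PHP or any CMS: `permanent_redirect_headers()` and `send_permanent_redirect()` are hypothetical names, and the header lines are built separately from sending them so that the logic can be checked without a web server.

```php
<?php
// Hypothetical helpers for placeholder pages whose content has permanently moved.

// Build the header lines for a 301 redirect to $target (an absolute URL).
// Building and sending are separate so the headers can be checked without a server.
function permanent_redirect_headers($target)
{
    // Remove CR/LF characters so the value stays a single header line.
    $target = str_replace(array("\r", "\n"), '', $target);
    return array(
        'HTTP/1.1 301 Moved Permanently',
        'Location: ' . $target,
    );
}

// Send the 301 response and stop, so nothing after the call is output.
// Must run before any output has been sent to the browser.
function send_permanent_redirect($target)
{
    foreach (permanent_redirect_headers($target) as $line) {
        header($line);
    }
    exit();
}
```

A placeholder page would then include the file that defines these helpers and call `send_permanent_redirect('http://newpagelocation.com/path');` before producing any output. Sending the status line before `Location` matters: PHP only turns a `Location` header into a 302 when no 3xx status has already been set.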
ColdFusion
<CFHEADER statuscode="301" statustext="Moved Permanently">
<CFHEADER name="Location" value="http://newpagelocation.com/path">
mod_rewrite (Apache Web Server)
# In the site's .htaccess file: permanently redirect requests for contact.php
RewriteEngine On
RewriteRule ^contact\.php$ http://newpagelocation.com/path [R=permanent,L]
Placing this code in a shared template and testing every requested URL to see whether the page has been redirected adds overhead that can slow down, or even cause errors in, your system. It is better to create a placeholder page at the URL of each removed page and put the 301 redirect code in it.
NB: If your CMS can schedule the publishing and archiving of pages, you can decide when to retire the redirect for a given page (allow about 6 months for all search engines to update their indexes).
For example, to implement this in Drupal:
- Create a regular node of any node type
- Select the PHP code input format
- Paste the PHP 301 redirect code from the section above into the body, changing the destination URL to the new path alias (so that search engines index the new name rather than the node ID)
- Create a path alias for this node using the URL of the OLD path
- Save and publish the node
How it works
Because the new node has the same path as the old page, a request for the old URL reaches the redirecting node, which sends a 301 redirect to the new location. This keeps your old URLs working while promoting your new paths, and tells search engines that your page has moved.
NB: Sending a 301 header redirect is the best and proper way to redirect permanently moved pages.
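To act on the earlier note about retiring a redirect after roughly six months, a placeholder page (or a scheduled clean-up job) can compare today's date with the date the page moved. This is a minimal sketch assuming PHP 5.2.2 or later (for comparing `DateTime` objects); `redirect_still_needed()` is a hypothetical name, and the six-month default simply mirrors this article's estimate rather than any search engine's published policy.

```php
<?php
// Hypothetical helper: is the placeholder redirect still worth keeping?
// $movedOn and $today are date strings such as '2006-01-15'. The 6-month
// default mirrors this article's estimate, not a search engine guarantee.
function redirect_still_needed($movedOn, $today, $months = 6)
{
    $retireOn = new DateTime($movedOn);
    $retireOn->modify('+' . (int) $months . ' months');
    // Keep redirecting until the retirement date is reached.
    return new DateTime($today) < $retireOn;
}
```

For example, a page moved on 2006-01-15 would still redirect on 2006-05-01 and could be retired from 2006-07-15 onwards.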
Client-side Redirection
As explained at the beginning of this document, the only reliable and proper way to redirect traffic from a page that no longer exists, and to signal its new location, is the 301 header redirect. If you have no access to server-side technologies, the next best option is client-side scripting or a META REFRESH tag. These methods do not signal a permanent change of URI, so bookmarks and index-driven agents such as search engines and referring links will not learn that your content has moved unless someone updates the link manually.
JavaScript
<script type="text/javascript">
// replace() keeps the old page out of the browser history, so Back does not loop
window.location.replace('http://newpagelocation.com/path');
</script>
META REFRESH
<html>
<head>
<meta http-equiv="refresh" content="0;url=http://newpagelocation.com/path">
</head>
<body>
This page has moved to <a href="http://newpagelocation.com/path">http://newpagelocation.com/path</a>.
</body>
</html>
Removing ?SESSID from Drupal URLs After Login and Search
?PHPSESSID= appears in indexed URLs, and in all path requests, after I submit a form (login or search) in Drupal 4.6.6.
Drupal 4.7 Comment
This issue seems to have been resolved in Drupal 4.7.
After upgrading from Drupal 4.6.6 to Drupal 4.7, the issue stopped occurring. It seems the problem was caused by the security fix that upgraded Drupal 4.6.5 to Drupal 4.6.6. If you are using version 4.6.6, please read the rest of this article for detailed coverage of the issue and the suggested solutions.
Description of Problem
The session ID appearing in Drupal URLs is both unsightly and potentially damaging to your SEO objectives and to any successes already made. I am writing this document because, after getting 300 of my web pages indexed by Google, it was impossible to ignore the impact of session IDs showing in my indexed URLs. I also began to get blank pages when I requested some URLs or browsed my website as a logged-in user, notably on login (/user) and search submissions.
I did some online reading to find out why this was happening, because it was undoing the benefits of using clean URLs in Drupal: it made the URLs look like query-string URLs from a dynamic application that paid no attention to SEO requirements for acceptable URL paths.
Attempted Resolution
After isolating the situations that caused the session ID to appear in the URL, and since there was no clearly documented configuration switch to get rid of it, I decided to search the many files that handle Drupal's path generation for a line referring to SESSID, since that portion of the URL never changed and was probably a hard-coded prefix for that variable. I downloaded the code and searched it with a desktop application, but did not find it.
Next, having read in the Drupal support forums that the session ID was embedded to pass state between pages for logged-in users, I suspected that .htaccess contained something that enabled session IDs. This, though, did not explain why pages indexed by Google carried the ID, since the spidering bot did not, and could not, log in.
Why Do Indexed URLs Have SESSID?
I concluded that there are three situations that cause SESSID to appear in the URL:
- Requesting the login or user information URL (.../user)
- Using the search form
- Browsing as a logged-in user (related to the first cause)
In my opinion, the only purpose this ID could serve is maintaining login sessions; for other visits (such as search bot indexing trips) it is not necessary.
I concluded that the ID appeared in indexed URLs because I had a login link at the top of the screen: once the bot requested that URL, a session ID was created and used in all the paths it indexed afterwards. I quickly removed the login link to curb the problem in the short term and ensure that all Google indexing from then on did not include the session ID.
Solution
To solve this problem, I knew I could look for the file that was creating the session ID and remove or change the line to exclude it. I searched the Drupal knowledge base and found at least two existing bug reports related to this: 'PHP Session ID in Google' and 'Some URLs get ?PHPSESSID added to them'. Neither 'solution' worked for me, either because of my hosting situation (Site5) or because the submitted patch was not compatible with Drupal 4.6.6. Modifying .htaccess consistently produced a 500 error.
Working Solution: Modifying Drupal's common.inc
I decided to edit Drupal's /includes/common.inc file to remove the code that writes the session ID into the URL.
In the drupal_goto() function (lines 170 to 189 in my copy), I commented out, using /* */, the code on lines 184 and 187 that appends the session ID, so that it is no longer added to any URL. A comment on an existing bug report on the Drupal website notes that this section is unnecessary and should be removed [http://drupal.org/node/4109#comment-38607]
---------------------
function drupal_goto($path = '', $query = NULL, $fragment = NULL) {
  if ($_REQUEST['destination']) {
    extract(parse_url($_REQUEST['destination']));
  }
  else if ($_REQUEST['edit']['destination']) {
    extract(parse_url($_REQUEST['edit']['destination']));
  }
  $url = url($path, $query, $fragment, TRUE);
  if (ini_get('session.use_trans_sid') && session_id() && !strstr($url, session_id())) {
    $sid = session_name() . '=' . session_id();
    if (strstr($url, '?') && !strstr($url, $sid)) {
      // Modified: originally $url = $url .'&'. $sid; (appended the session ID)
      $url = $url /* .'&'. $sid*/;
    }
    else {
      // Modified: originally $url = $url .'?'. $sid; (appended the session ID)
      $url = $url /*.'?'. $sid*/;
    }
  }
  // ... remainder of drupal_goto() unchanged ...
}
---------------------
This removed the session IDs from my URLs, although I still get occasional blank pages when searching or logging in. I currently attribute these to the theme I am using (Simple2), because I did not get blank pages when I tested with a different theme. Once I build a new theme, I will switch to it to eliminate the second problem. In the meantime, any pages that get indexed will not have the ugly session ID in the URL, and hopefully when Google re-indexes the old pages, the session IDs will disappear from its index.
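For URLs that were already indexed with the session ID, it helps to be able to compute their clean form, which is simply the reverse of what drupal_goto() did when it appended `?` or `&` plus `PHPSESSID=...`. The function below is a hypothetical sketch (not part of Drupal) that removes one named parameter from the query string and keeps everything else, including any `#fragment`; you could, for example, use it when listing which indexed pages still carry session IDs.

```php
<?php
// Hypothetical helper (not part of Drupal): remove one query-string parameter,
// PHPSESSID by default, from a URL while keeping other parameters and any #fragment.
function strip_session_param($url, $name = 'PHPSESSID')
{
    // Set the fragment aside so it is not mistaken for part of the query.
    $fragment = '';
    $hashPos = strpos($url, '#');
    if ($hashPos !== false) {
        $fragment = substr($url, $hashPos);
        $url = substr($url, 0, $hashPos);
    }

    $qPos = strpos($url, '?');
    if ($qPos === false) {
        return $url . $fragment; // no query string, nothing to strip
    }

    $base = substr($url, 0, $qPos);
    $kept = array();
    foreach (explode('&', substr($url, $qPos + 1)) as $pair) {
        $parts = explode('=', $pair, 2);
        // Skip the session pair and empty pieces left by "&&" or a trailing "&".
        if ($pair !== '' && $parts[0] !== $name) {
            $kept[] = $pair;
        }
    }

    $query = implode('&', $kept);
    return $base . ($query === '' ? '' : '?' . $query) . $fragment;
}
```

For example, `strip_session_param('http://example.com/search?keys=drupal&PHPSESSID=abc123')` returns `http://example.com/search?keys=drupal`.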
Recommended Solution
Because this is a PHP session issue, the recommended solution is to stop PHP from using session IDs in the URL to maintain state. All applications then have to use cookies instead to carry state from one page to another.
Add the following lines to the .htaccess file:
php_value session.use_only_cookies 1
php_value session.use_trans_sid 0
Common instructions say to place these lines directly in the .htaccess file. When I did that, I got a 500 error, and the error went away when I wrapped them in an <IfModule mod_php4.c> block:
<IfModule mod_php4.c>
php_value session.use_only_cookies 1
php_value session.use_trans_sid 0
</IfModule>
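Be aware that Apache applies directives inside <IfModule mod_php4.c> only when PHP is loaded as the mod_php4 module. A 500 error from php_value lines usually means PHP is running as a CGI binary instead, in which case the wrapped lines are silently skipped and have no effect.
NB: Drupal's settings.php includes some run-time initialisation settings. It is NOT recommended to repeat the same setting in multiple places, as this can create a conflict and lead to lengthy troubleshooting.
These are the settings in Drupal's settings.php that do the same thing as the .htaccess additions discussed above: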
ini_set('session.use_only_cookies', 1);
ini_set('session.use_trans_sid', 0);
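To confirm which values are actually in effect (for example, if you suspect the .htaccess lines are being ignored), you can pass the results of `ini_get()` to a small check. This is a hypothetical sketch; `session_config_is_cookie_only()` is an illustrative name, and it uses `filter_var()` with `FILTER_VALIDATE_BOOLEAN` (PHP 5.2+) so that values such as "1", "On", "0" or "Off" are all interpreted correctly.

```php
<?php
// Hypothetical check: do the two session settings match the recommended,
// cookie-only configuration (use_only_cookies on, use_trans_sid off)?
// ini_get() may return "1", "0", "On", "Off", "" or false, so each value is
// normalised with FILTER_VALIDATE_BOOLEAN (anything not truthy counts as off).
function session_config_is_cookie_only($useOnlyCookies, $useTransSid)
{
    $onlyCookies = filter_var($useOnlyCookies, FILTER_VALIDATE_BOOLEAN);
    $transSid    = filter_var($useTransSid, FILTER_VALIDATE_BOOLEAN);
    return $onlyCookies && !$transSid;
}
```

On a live site you would call it from a temporary test page, e.g. `var_dump(session_config_is_cookie_only(ini_get('session.use_only_cookies'), ini_get('session.use_trans_sid')));`, and delete the page afterwards.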
SEO within the Drupal CMS framework - Inbuilt Drupal Features that enhance Search Engine Optimization
Considerations for migrating to Drupal from a static website or other CMS
Web content management systems have unfairly been blamed for killing traffic by compromising search engine ranking through so-called search-unfriendly URLs. To properly migrate a well-indexed static website to Drupal, or any CMS, without compromising your ranking, you need to maintain the existing paths. In my opinion, a good CMS has so many advantages that the misplaced fear of machine-generated URLs (no longer an issue with mod_rewrite and ISAPI Rewrite) should not stop you from using one.
Advantages of using a CMS
- Findability of content in development and production - Even a meticulously organized and labeled static folder structure can leave you wondering where your content is located, and just because you know where your pages are does not mean your visitors know your structure.
- Automation of routine procedures - It's a life-saver when the CMS automatically publishes and archives content on schedule, especially time-sensitive content (events, announcements, press releases, etc.).
- Content reuse - Server-side includes (SSI) let you create one piece of content and reuse it in multiple places, but a Web CMS also lets you add logic to the locations where you would like the content to appear.
These and many smaller advantages that you discover while using a CMS can improve your SEO efforts (when properly used) by surfacing related content and links alongside certain pages. A CMS also makes it possible to track content access at a more granular level than regular server logs provide. With that in mind, here is what I propose when migrating a static website to Drupal:
- Clean URLs: This feature is indispensable for making sure that even legacy search engines can access and index your pages.
- GsiteMap: Google Sitemaps let site managers guide the way Google indexes the pages on a website by providing priority guidelines. In a WCMS, content is created and manipulated (published, archived, accessed) very dynamically, which produces a virtually limitless linking structure. That makes it very difficult for you, the site owner, to maintain an XML list of documents and submit it to Google on a regular basis. This module keeps Google informed of your (healthy) constantly updated website, of any new pages you publish, and of how to find them. In addition, the Watchdog module in Drupal can log every time Google visits and indexes a page.
- URL aliases: This Drupal feature lets you control the path names of your pages, so you can recreate already-indexed static paths in your new CMS and search results continue to point to your content in its new home.
- Click tracking: I constantly monitor traffic and want to know what's working and what's not. With this module I can track up to 10 channels to see which traffic sources bring in the most visitors, and by comparing these numbers with results from the Tracker and Browscap modules, you can establish what is causing a perceived increase or decrease in traffic instead of guessing.
- Even without the Click module, the Drupal Tracker logs referrers and will tell you who is bringing in the most traffic. Compare this with your past server logs and you will learn not only your traffic sources but also which pages are popular.
- PHPOpentracker: As tricky as it can be to install and configure properly, this application can show you exactly which paths your traffic takes, helping you decide which pages need help or updates.
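To show what a sitemap file actually contains, here is a minimal sketch that builds one from a list of pages in the sitemaps.org 0.9 format that Google accepts (early Google Sitemaps used a Google-specific namespace). The `$pages` array structure and the `build_sitemap()` name are assumptions for illustration; the GsiteMap module produces its own file from Drupal's data, so you would not normally write this by hand.

```php
<?php
// Minimal sketch of a sitemap in the sitemaps.org 0.9 format. Each entry in the
// hypothetical $pages array has 'loc' and, optionally, 'lastmod' and 'priority'.
function build_sitemap(array $pages)
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($pages as $page) {
        $xml .= "  <url>\n";
        // Escape &, <, > and quotes so URLs with query strings stay valid XML.
        $xml .= '    <loc>' . htmlspecialchars($page['loc'], ENT_QUOTES, 'UTF-8') . "</loc>\n";
        if (isset($page['lastmod'])) {
            // Expected in W3C date format, e.g. 2006-05-01.
            $xml .= '    <lastmod>' . $page['lastmod'] . "</lastmod>\n";
        }
        if (isset($page['priority'])) {
            // Relative priority within your own site, from 0.0 to 1.0.
            // %F is used instead of %f so the decimal point ignores the locale.
            $xml .= '    <priority>' . sprintf('%.1F', $page['priority']) . "</priority>\n";
        }
        $xml .= "  </url>\n";
    }
    return $xml . "</urlset>\n";
}
```

Priority only ranks your own pages against each other; it does not raise a site's standing relative to other sites.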
Error 1067 while installing Apache 1.3.34 and PHP 4.4.2 on Windows Server 2003/XP
This is a report of the challenges I faced with error message 1067 in Apache 1.3.34, and the solution I found, while trying to get PHP 4.4.2 to run as an Apache module.
Having installed WAMP using Apache 2.0.55, and now recognising that Apache 2.0.55 does not come with built-in SSL support (due to export restrictions of some governments and the pervasiveness of open source applications such as WAMP/LAMP), I decided to install the stable 1.3.x version, which has better SSL support. I had also experienced some problems and apparent instability running PHP and MySQL on Apache 2.0.55.
In this procedure, I decided to replicate the version combination used by Site5 (a popular web host), accepting slight version-number differences because Site5 runs LAMP and the exact same AMP versions are not compiled for Windows.
Error 1067
I was able to install and test Apache with no problems at all. Since I knew that PHP 4.4.2 worked well in my other WAMP installation, I made the lazy mistake of borrowing too much from that installation. I copied some of its files and its php.ini file, and in hindsight it seems that the php4ts.dll file I was trying to run was not fully compatible with the rest of the files. As usual, I ran the binary installer and then tried to bring in the extensions and includes from my other installation.
After struggling and failing, I decided to read some online documents that I have previously relied upon to help me find my oversights and mistakes:
- http://httpd.apache.org/docs/1.3/windows.html
- http://www.thesitewizard.com/archive/php4install.shtml
- http://www.neothermic.com/tutorials/ApacheTutorial.php (clear and concise)
Solution
The steps seemed clear and direct in all of them, but I kept getting the same error message 1067, for which there were no clear diagnostics from Apache or anyone else. In the details of the Windows error message (yes, Windows reported that Apache had failed and asked whether I wanted to report the error), there was a reference to php4ts.dll. At least I knew that Apache was looking for PHP, and based on my experience with error 1251, where having the wrong file in the wrong place caused me agony, I decided to start with fresh folders by downloading a new copy of PHP.
I renamed the existing folder, named the new folder c:\php, saved the old php.ini file in the right place and updated the folder references (I decided to keep the /sapi folder intact). I relaunched Apache and it worked like a charm.
I intend to downgrade my other WAMP installation to 1.3.34 to support SSL and to implement virtual hosts so that I can generate separate log files per site (something multi-site hosting of Drupal cannot provide in the Apache logs, since Drupal runs on a layer above the web server, Apache 1.3.34).
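After replacing the PHP folder, a quick way to confirm that Apache is loading the fresh copy, rather than files left over from another installation, is a temporary page that reports the PHP version, server API and extension directory. `phpinfo()` shows all of this and more; the function below is a hypothetical, shorter sketch that uses only long-standing PHP functions.

```php
<?php
// Hypothetical diagnostic for a temporary test page: which PHP build is Apache
// really running, and where does it look for extensions? Checking these after a
// reinstall helps confirm that the fresh copy, not an older one, is in use.
function php_install_report()
{
    return array(
        'version'       => PHP_VERSION,               // version of the loaded PHP core
        'sapi'          => php_sapi_name(),           // server API in use (module or CGI)
        'extension_dir' => ini_get('extension_dir'),  // folder PHP loads extensions from
        'extensions'    => get_loaded_extensions(),   // extensions that actually loaded
    );
}

// On the test page: print_r(php_install_report());
```

Remove the test page once you have checked the values, since it exposes details of your server setup.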


