
Removing ?PHPSESSID from Drupal URLs After Login and Search

?PHPSESSID= appears in indexed URLs and in all path requests after any form submission (login or search) in Drupal 4.6.6.

Drupal 4.7 Comment

This issue appears to be resolved in Drupal 4.7.
After I upgraded from Drupal 4.6.6 to Drupal 4.7, the problem stopped occurring. It seems to have been introduced by the security fix that took Drupal 4.6.5 to Drupal 4.6.6. If you are using version 4.6.6, please read on for detailed coverage of the issue and the suggested solutions.

Description of Problem

A session ID appearing in Drupal URLs is both unsightly and potentially damaging to your SEO objectives and to any gains already made. I am writing this document because, after getting 300 of my web pages indexed by Google, I could no longer ignore the session IDs showing in my indexed URLs. I also began to get blank pages when calling some URLs or surfing my website as a logged-in user, notably on login (/user) and search submissions.

I did some online reading to find out why this was happening. The session ID was undoing the benefits of using clean URLs in Drupal: it made the URLs look like query-string URLs from a dynamic application that paid no attention to SEO requirements for acceptable URL paths.

Attempted Resolution

After isolating the instances that caused the session ID to appear in the URL, and since there was no clearly documented configuration switch to get rid of it, I decided to look through the many files that handle Drupal path generation for a line referring to SESSID. This portion of the URL did not seem to change, so it was probably a hard-coded prefix of that variable. I downloaded the code and searched it with a desktop application, but did not find it.

Next, I suspected that .htaccess might contain something that called for session IDs, having read in the Drupal support forums that the session ID was being embedded to pass state between pages for users. This did not explain, however, why pages indexed by Google carried it, since the spidering bot did not and could not log in.

Why Do Indexed URLs Have SESSID?

I concluded that there are three instances that cause the SESSID to appear in the URL:

  • Calling the login or user information URL - .../user
  • Using the search form
  • Browsing as a logged-in user (related to the first cause)

In my opinion, the only purpose this ID could serve is to maintain login sessions; for other visits (such as search bot indexing trips) it is not necessary.

I concluded that it appeared in indexed URLs because I had a login link at the top of the screen: once the bot called that URL, a session ID was created and used in all the subsequently indexed paths. I quickly removed the login link to curb the problem in the short term and to ensure that pages indexed by Google from then on did not include the session ID.

Solution

To solve this problem, I knew that I could look for the file that was creating the session ID and remove or change the relevant line to exclude it. I searched the Drupal knowledgebase and found at least two existing bug reports that relate to this: 'PHP Session ID in Google' and 'Some URLs get ?PHPSESSID added to them'. These two 'solutions' did not work for me, either because of my hosting situation (Site5) or because the submitted patch was not compatible with Drupal 4.6.6. I persistently got a 500 error when modifying .htaccess.

Working Solution - Modifying Drupal's common.inc

I decided to edit the /includes/common.inc file in Drupal to remove or modify the code that writes the session ID into the URL:

Between line 170 and line 189, I commented out (with /* */) the section of code on line 184 and line 187 that appends the session ID, to eliminate it from every URL. An existing bug report on the Drupal website mentions that this section is unnecessary and should be removed [http://drupal.org/node/4109#comment-38607]

---------------------

function drupal_goto($path = '', $query = NULL, $fragment = NULL) {
  if ($_REQUEST['destination']) {
    extract(parse_url($_REQUEST['destination']));
  }
  else if ($_REQUEST['edit']['destination']) {
    extract(parse_url($_REQUEST['edit']['destination']));
  }

  $url = url($path, $query, $fragment, TRUE);

  if (ini_get('session.use_trans_sid') && session_id() && !strstr($url, session_id())) {
    $sid = session_name() . '=' . session_id();

    if (strstr($url, '?') && !strstr($url, $sid)) {
      // Session ID suffix commented out so it is no longer appended.
      $url = $url /* .'&'. $sid*/;
    }
    else {
      // Session ID suffix commented out so it is no longer appended.
      $url = $url /*.'?'. $sid*/;
    }
  }

  // The remainder of drupal_goto() is unchanged and not shown here.

---------------------

This removed session IDs from my URLs, although I still get occasional blank pages when searching or logging in. I currently attribute these to the theme that I am using (Simple2), because I previously tested with a different theme and did not get blank pages. Once I build a new theme, I will switch to it to eliminate this second problem. In the meantime, any pages that get indexed will not have the ugly session ID in the URL, and hopefully when Google re-indexes the old pages it will drop the session IDs.
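For auditing a list of already indexed URLs, it can help to see what each one looks like without the session parameter. The following is a minimal sketch, not part of Drupal: the function name strip_session_param() is hypothetical, and it assumes the parameter uses PHP's default session name, PHPSESSID.

```php
<?php
/**
 * Hypothetical helper (not a Drupal function): returns $url with the
 * session parameter removed from its query string.
 */
function strip_session_param($url, $session_name = 'PHPSESSID') {
  // Set the fragment aside so it can be re-attached untouched.
  $parts = explode('#', $url, 2);
  $fragment = isset($parts[1]) ? '#' . $parts[1] : '';

  // Split the path from the query string, if there is one.
  $pieces = explode('?', $parts[0], 2);
  if (!isset($pieces[1])) {
    return $url;
  }

  // Keep every name=value pair except the session parameter.
  $kept = array();
  foreach (explode('&', $pieces[1]) as $pair) {
    if ($pair === '') {
      continue;
    }
    $name_and_value = explode('=', $pair, 2);
    if ($name_and_value[0] !== $session_name) {
      $kept[] = $pair;
    }
  }

  $query = $kept ? '?' . implode('&', $kept) : '';
  return $pieces[0] . $query . $fragment;
}
```

The sketch only rewrites a string; it does not change what Drupal or PHP generate, so it complements rather than replaces the fixes described in this article.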

Recommended Solution

Since this is a PHP session issue, the recommended solution is to stop PHP from using session IDs in the URL to maintain state. All applications will then have to use cookies instead of URL session IDs to carry state from one page to another.

Add the following lines in the .htaccess file

php_value session.use_only_cookies 1
php_value session.use_trans_sid 0

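Common instructions say to place these lines directly in the .htaccess file. When I did that, I got a 500 error; the error went away once I wrapped the lines in an IfModule block, as follows. (Apache only understands php_value when PHP is loaded as an Apache module, and lines inside an IfModule block are silently skipped when the named module is absent, so confirm that the settings actually took effect.)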

<IfModule mod_php4.c>
php_value session.use_only_cookies 1
php_value session.use_trans_sid 0
</IfModule>

NB: Drupal's settings.php includes some run-time initialisation settings. It is NOT recommended to repeat the same setting in multiple places, as this could create a conflict and cause lengthy troubleshooting.

Here are the settings in Drupal's settings.php that do the same thing that the previously discussed additions to .htaccess will do.

ini_set('session.use_only_cookies', 1);
ini_set('session.use_trans_sid', 0);
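Whichever place the settings live in, it is worth confirming what PHP actually ends up using. The sketch below is one possible check, under the assumption that the two values recommended above are the only ones of interest; the function names ini_flag_is_on() and session_setting_mismatches() are hypothetical, while ini_get() is the standard PHP call.

```php
<?php
/**
 * Hypothetical helper: interprets an ini value as a boolean flag.
 * ini_get() may return '1', '0', '', 'On', 'Off' or FALSE.
 */
function ini_flag_is_on($value) {
  $normalised = strtolower(trim((string) $value));
  return in_array($normalised, array('1', 'on', 'yes', 'true'), true);
}

/**
 * Hypothetical helper: returns the names of the session settings whose
 * state differs from the values recommended in this article.
 */
function session_setting_mismatches(array $actual) {
  $recommended = array(
    'session.use_only_cookies' => true,
    'session.use_trans_sid' => false,
  );
  $mismatches = array();
  foreach ($recommended as $name => $should_be_on) {
    $value = isset($actual[$name]) ? $actual[$name] : false;
    if (ini_flag_is_on($value) !== $should_be_on) {
      $mismatches[] = $name;
    }
  }
  return $mismatches;
}

// Read the live values and report any that differ.
$live = array();
foreach (array('session.use_only_cookies', 'session.use_trans_sid') as $name) {
  $live[$name] = ini_get($name);
}
print_r(session_setting_mismatches($live));
```

An empty array means both settings match the recommendation. Run the check through the web server rather than the command line, since .htaccess values only apply to web requests.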


Drupal 6: installing, testing, troubleshooting and configuring the new version of the powerful CMS

Installing Drupal 6 from scratch

The next major version of the ever-popular CMS framework (yes, that is Drupal) is in your hands to install for testing or production deployment (depending on how many modules you want to use in your production environment).

Modules and add-ons

As you may have come to expect, module builders and themers have yet to churn out new modules, ported and updated versions of existing modules, and free themes. If you are already running your site on a previous version of Drupal (4.7.x or 5.x), you should weigh your options before replacing your existing production version with the new one. Personally, I run pre-release and release versions of Drupal on a secondary environment for 1 to 4 months (or until my critical modules, those that affect access to content and SEO, are ported) before I begin to roll out the newest version to the production server. This delay also lets me thoroughly learn the workings, features and other changes, which equips me to deploy the application quickly and to support my clients.

Step by Step Instructions

  • Download a fresh copy of Drupal version 6 from drupal.org.
  • Uncompress the source and upload it to your LAMP/WAMP server (Apache, MySQL, and PHP are requirements to exploit the many powerful features of Drupal)
  • Access your database management application and create a new database, or identify an existing database in which Drupal 6 will create its tables. Write down the database name, username (with ALL privileges on that database) and password; you will need them below
  • As usual, make sure that you also create a /files folder and make it writable by the web server. Because it is a directory, it needs the execute bit as well as read/write, so use something like CHMOD 775 or 777 rather than 666. This is the location where the application will write uploaded files. If you do not create this directory now, the Drupal 6 installer will warn you to do so after you begin the installation process
  • Open your browser (my skewed recommendation is to always use Firefox) and go to the URL of the location where you uploaded your Drupal 6 source files. You will see a familiar Drupal installation screen with a global progress list on the left and directions on the right
  • Assuming that you are installing in English, click on the first link to start. If you do not have a writable /files directory within your Drupal installation folder, you will see the following error message: "The directory files does not exist. To proceed with the installation, please ensure that the files directory exists and is writable by the installer. If you are unsure how to create this directory and modify its permissions, please consult the on-line handbook or INSTALL.txt."
  • Once you have created a 'files' directory and given it the correct permissions (writable by the web server, for example 775 or 777), try again and you will proceed to the next step
  • Using the database name, username and password that you created, or noted when you created or identified the database, fill in the three fields provided at this stage. You have the option to expand to the advanced options if you want to define the name of your database server (if it is not on the same IP/server as your Drupal web server), database port, and the table prefix (if you are using an existing database, or intend to share the database with other applications, then you should define a table prefix to help you identify the tables by application or installation version)
  • The installer will display a progress meter while it writes to the file system (settings.php), and creates database tables to enable Drupal 6 to run
  • When done, the next screen will require you to enter elementary account and site information. This is a departure from earlier versions of Drupal in the following ways
    1. In previous versions, generic information was used and this information needed to be added after installation
    2. Previously, if you did not create the first user (UID=1) as soon as possible, as well as the first node, someone else could potentially hijack your installation and site/server (could run PHP code) by calling the installation URL and maybe secretly creating a controlling account
    3. In Drupal 6, you can choose the name of the admin account, and a password strength meter tells you how strong your password is and whether the repeated password matches
    4. A configuration test tells you whether you can enable and use 'clean URLs', based on whether your server supports them (Apache mod_rewrite enabled)
    5. The last feature on this page is the ability to have Drupal 6 alert you whenever there are updates and upgrades to your current installation - this feature ensures that you are running the latest and greatest version (if you choose to install), or you are at least aware of the option
  • When you are done adding the requested information, you will see a message in the next screen informing you that your installation has been successful
Congratulations, you now have a working Drupal 6 installation.
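The files directory requirement from the steps above can be checked before running the installer. This is a minimal sketch, not installer code: the function name ensure_writable_directory() is hypothetical, the 0775 mode is an assumption that suits many shared hosts, and the path in the usage note is only an example.

```php
<?php
/**
 * Hypothetical pre-install check: creates $dir if it is missing and
 * reports whether the current PHP process can write to it.
 */
function ensure_writable_directory($dir) {
  // A directory needs the execute bit to be usable, so 0775 is used
  // here instead of 0666. The mode is an assumption; adjust as needed.
  if (!is_dir($dir) && !@mkdir($dir, 0775, true)) {
    return false;
  }
  return is_writable($dir);
}
```

For example, calling ensure_writable_directory(dirname(__FILE__) . '/files') from a script placed in the Drupal root returns TRUE when the installer should be able to use the directory. The result reflects the user that runs the script, so run it through the web server rather than from a shell.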

Drupal clean and friendly URL paths

An effective, flexible and useful Web Content Management System (WCMS) feature

Like many leading Web Content Management Systems, Drupal is especially flexible when it comes to letting administrators, content managers and publishers create and maintain custom, user-friendly paths and URLs/URIs. Despite the common assumption that a dynamic CMS cannot generate or accommodate custom paths, Drupal matches the flexibility and choice of a static, file-system-based path scheme while keeping the power of a Web Content Management System.

General Requirements of a WCMS friendly path system

  1. To enable content creators and optimizers to create and apply human readable URL paths that are relevant to the page content in keywords and inference.
  2. Create URLs that are short and easy to remember for marketing programs and printed materials
  3. Allow the creation of path aliases to enable traffic channels to be tracked by using different paths for different sources
  4. Facilitate the option to create paths that mimic the file-system based URLs delimited by "/". This is useful for the migration of already established physical paths that are logical, semantic and have high pageRank
  5. Make the selected paths independent of the destination GUID so that one established path can be pointed at any location, and changed as necessary
  6. Ability to create more than one alias per physical content item: Especially in the migration, there are many pages that are redundant, and for the first 6-12 months, we will need to maintain 301 and 302 header redirects from many paths to one single resource
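Requirement 6 above, several legacy paths redirecting to one resource, can be pictured as a lookup table. The sketch below is illustrative only: legacy_redirect() and the sample paths in the usage note are hypothetical, and it always answers with a 301 (permanent) status.

```php
<?php
/**
 * Hypothetical helper: looks up a requested path in a map of legacy
 * paths and returns the redirect to issue, or NULL when none applies.
 */
function legacy_redirect($requested_path, array $redirects) {
  // Compare paths without leading or trailing slashes.
  $key = trim($requested_path, '/');
  if (!isset($redirects[$key])) {
    return null;
  }
  return array(
    'status' => 301,
    'location' => '/' . ltrim($redirects[$key], '/'),
  );
}
```

With a map such as array('old/photos.html' => 'nature-photography', 'gallery/index.php' => 'nature-photography'), both legacy paths resolve to /nature-photography. The caller could then send the result with PHP's header('Location: ' . $redirect['location'], true, $redirect['status']).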

Basic features of a WCMS path alias management tool

  1. Integrate with the content creation work flow to enable content publishers to select a suitable path and verify that the path does not already exist - Paths by definition of URI have to be unique, and an effective path management system has to be able to alert the publisher when he/she attempts to create a duplicate path/URL
  2. The alias management tool itself will provide the option to edit and modify existing URL paths to enable the creation of paths without having to create a content object (e.g http://cmsproducer.com/nature-photography pointing to idonny.com), or to change the destination of an existing path/destination pair
  3. The friendly URL has to be able to mask characters that are reserved by the web server or operating system, so that characters such as "-" are not off-limits in friendly paths
  4. The paths do not need a file extension (*.aspx, *.asp, *.php, *.htm, *.html) unless the user chooses to add one, or another kind of extension, to mimic a legacy URL that is being replicated. Ideally, all paths should be folder-level, which removes the need for a technology-specific extension that becomes a liability when the CMS platform or technology changes. For instance, if a CMS requires an extension, it unnecessarily exposes the technology in use, and if the site owner later migrates to a different web-server technology, it will be unnecessarily difficult or impossible to keep the same URLs/paths without translating them
[Figures: specifying a friendly URL during content creation; listing of existing paths with the option to edit them or to create new ones; editing a selected path alias]
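The core behaviour described in this list (unique aliases, several aliases per destination, and re-pointing an existing alias) can be sketched with a small in-memory class. This is not the code of Drupal's Path module; the class PathAliasRegistry and its methods are hypothetical names chosen for illustration.

```php
<?php
/**
 * Hypothetical in-memory alias store illustrating the features above.
 */
class PathAliasRegistry {
  // Maps alias => destination. Array keys are unique by nature.
  private $aliases = array();

  // Registers a new alias; returns FALSE if the alias already exists.
  public function add($alias, $destination) {
    $alias = trim($alias, '/');
    if (isset($this->aliases[$alias])) {
      return false;
    }
    $this->aliases[$alias] = $destination;
    return true;
  }

  // Points an existing alias at a new destination.
  public function repoint($alias, $destination) {
    $alias = trim($alias, '/');
    if (!isset($this->aliases[$alias])) {
      return false;
    }
    $this->aliases[$alias] = $destination;
    return true;
  }

  // Returns the destination for an alias, or NULL if it is unknown.
  public function resolve($alias) {
    $alias = trim($alias, '/');
    return isset($this->aliases[$alias]) ? $this->aliases[$alias] : null;
  }
}
```

Because the alias is the array key, a duplicate is rejected at the moment the publisher tries to create it, while any number of aliases may share one destination.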

Creating a path alias from page title content - Urlify vs Pathauto Drupal Modules

There are overwhelming reasons NOT to use the dynamically generated paths that a Web Content Management System (WCMS), or any other ECM or document management system, produces by default. Whether for SEO or for human users who need to track and access content by URL path, it is always important to create short URL paths that relate to the subject of the content and are easy to manage.

Web server platforms offer various ways to translate between the dynamic, data-laden path and the desirable user-friendly path. In Drupal (running on Apache with mod_rewrite, or on IIS with ISAPI rewrite), the Path module enables the content creator and manager to define and control a path alias for every content item (node).

In addition to this basic facility to define and manage the alias, there are other modules and additions that make it easy and to an extent automate the creation of path aliases based on already existing descriptive information such as a node title, or descriptions.

Pathauto Module

The pathauto module generates a URL path from the title of a page, and it has been blamed for some performance issues in the earlier versions of Drupal 4.6. In addition, it generates very long aliases based on the entire node title, which makes the aliases difficult to handle and remember (although this may be an advantage when it comes to SEO).

Urlify Module

The Urlify module in Drupal generates a URL based on the words in the node title based on some rules that can be created in the module settings to exclude certain common words that rarely carry any unique meaning for a page (such as: and, or, not, maybe, etc).
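The idea of deriving an alias from a title while dropping common words can be sketched as follows. This is not Urlify's actual code: suggest_alias() is a hypothetical function, the default word list is taken from the examples above, and only ASCII letters and digits are kept.

```php
<?php
/**
 * Hypothetical sketch of title-to-alias generation with a stop-word list.
 */
function suggest_alias($title, array $stop_words = array('and', 'or', 'not', 'maybe')) {
  // Lower-case the title and split it on anything that is not a-z or 0-9.
  $words = preg_split('/[^a-z0-9]+/', strtolower($title), -1, PREG_SPLIT_NO_EMPTY);

  // Drop the common words, then join what is left with hyphens.
  $kept = array_diff($words, $stop_words);
  return implode('-', $kept);
}
```

For a title such as 'Clean URLs: Pathauto or Urlify?' the suggestion is 'clean-urls-pathauto-urlify', which a publisher could then shorten by hand before saving.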

Comparison of pathauto and urlify

The main difference between urlify and pathauto is that urlify generates the alias as a suggestion, pre-populated into the alias field, which the content publisher can edit before publishing the content. This is an advantage in that it allows control and oversight. Urlify handles aliases on a one-by-one basis (which may be considered slow), but it makes for cleaner and more responsible URL creation.

Pathauto automates the creation of path aliases, but this seemingly efficient approach leaves the entire process to software functions that do not regard the nuances and considerations of Search Engine Optimisation (SEO) that a human editor would take into account. Considering that URL path aliases are meant for human readers, it is recommended to use an approach that allows for page-by-page (node-by-node) determination of the path alias.
