Google Search Appliance: Report Duplicate DNS Names
Many Web sites can be reached using both a www and a non-www version of the host name in the URL. Sites can also be reached using either the secure (https) or the non-secure (http) protocol. This can lead to unnecessary document duplication in the Search Appliance index, because each combination of www/non-www and https/http is treated as a unique document. Crawling every variant also puts unnecessary load on your server. We can avoid both problems by notifying the Search Appliance of duplicate DNS host names.
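To make the duplication concrete, here is a minimal Python sketch that lists the four scheme and host combinations under which a single page might be crawled. The function name url_variants and the host www.example.com are illustrative assumptions, not part of the Search Appliance; the sketch also ignores any port number in the URL.

```python
from itertools import product
from urllib.parse import urlsplit, urlunsplit


def url_variants(url):
    """Return the http/https and www/non-www forms of a URL.

    Hypothetical helper for illustration only; any port number and
    fragment in the input are dropped.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Strip a leading "www." so both host forms can be rebuilt.
    bare = host[4:] if host.startswith("www.") else host
    variants = []
    for scheme, prefix in product(("http", "https"), ("", "www.")):
        variants.append(
            urlunsplit((scheme, prefix + bare, parts.path, parts.query, ""))
        )
    return variants


if __name__ == "__main__":
    for variant in url_variants("http://www.example.com/about"):
        print(variant)
```

Without a duplicate host declaration or a server-side redirect, each of the four URLs printed here could be indexed as a separate document.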
Ideally, a server redirects ("URL rewrite") the nonstandard versions to the preferred, canonical version. For example, this server responds to the URL "http://umn.edu/" by returning an HTTP 302 (Found) redirect status code and a redirect header "Location: http://twin-cities.umn.edu/". In this case, we do not need to notify the Search Appliance of duplicate hosts: the redirecting URL returns no document of its own, so only the canonical version is indexed.
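The redirect check described above can be sketched in Python as follows. This is only an illustration under stated assumptions: redirect_target is a hypothetical helper, it works on a status code and headers you have already obtained (for example from a browser's network panel or a command-line HTTP client), and it does not reproduce how the Search Appliance itself processes responses.

```python
from urllib.parse import urljoin

# HTTP status codes that signal a redirect with a Location header.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(request_url, status, headers):
    """Return the absolute redirect target, or None if not a redirect.

    Hypothetical helper: `headers` is a plain dict of response headers.
    """
    if status not in REDIRECT_CODES:
        return None
    # Header names are case-insensitive, so normalize before lookup.
    location = {k.lower(): v for k, v in headers.items()}.get("location")
    if location is None:
        return None
    # A Location value may be relative; resolve it against the request URL.
    return urljoin(request_url, location)


if __name__ == "__main__":
    target = redirect_target(
        "http://umn.edu/",
        302,
        {"Location": "http://twin-cities.umn.edu/"},
    )
    print(target)
```

If every nonstandard variant of a host yields the canonical URL from a check like this, the server is already consolidating the duplicates and no duplicate host entry is needed for it.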