Google Search Appliance: Understand How It Works
The Google Search Appliance (GSA) builds a searchable index of all University web pages. It includes a web crawler that goes from page to page, branching out by following links on the pages it finds, eventually indexing nearly all of our public web content. The Search Appliance indexes both web pages and other types of content like Microsoft Office and PDF files. There are approximately two million findable documents in the index. When users send search queries, the Search Appliance answers by looking up the search terms in the index and sorting the results by their relevance.
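The crawl, index, and lookup steps described above can be illustrated with a small sketch. This is a toy model only: the pages, the example.edu URLs, and the function names are hypothetical, and the scoring (a simple count of matching terms) is a stand-in, since the GSA's actual relevance ranking is not described here.

```python
from collections import Counter, defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "https://example.edu/": (
        "welcome to the university",
        ["https://example.edu/housing", "https://example.edu/news"],
    ),
    "https://example.edu/housing": (
        "residence halls and housing options",
        ["https://example.edu/"],
    ),
    "https://example.edu/news": ("university news housing update", []),
    # No page links here, so a link-following crawler never finds it.
    "https://example.edu/orphan": ("page with no inbound links", []),
}


def crawl(seed):
    """Breadth-first crawl: visit the seed, then every page reachable by links."""
    seen, queue, order = {seed}, deque([seed]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        _, links = PAGES[url]
        for link in links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return order


def build_index(urls):
    """Build an inverted index: term -> Counter of {url: occurrences}."""
    index = defaultdict(Counter)
    for url in urls:
        text, _ = PAGES[url]
        for term in text.lower().split():
            index[term][url] += 1
    return index


def search(index, query):
    """Rank pages by total occurrences of the query terms, highest first."""
    scores = Counter()
    for term in query.lower().split():
        scores.update(index.get(term, {}))
    # Ties are broken alphabetically by URL so the order is deterministic.
    return [url for url, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]
```

In this sketch the orphan page is never indexed because nothing links to it, which is one reason a link-following crawler reaches nearly all, rather than all, public content.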
If you run a web server that the Search Appliance crawls, your website’s server logs may show visits from the hostname googlewb.oit.umn.edu or google90.oit.umn.edu. If those logs record the user agent, the Search Appliance appears as "gsa-crawler".
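As a rough illustration, the following sketch picks Search Appliance visits out of an access log. It assumes the log uses the common or combined log format (client host first, user agent last in quotes); your server's format may differ, and the function name and the regular expression are illustrative. Logs contain hostnames only if the server resolves them, so matching on the user agent is included as a second check.

```python
import re

# Hostnames and user-agent substring named in this article.
GSA_HOSTS = {"googlewb.oit.umn.edu", "google90.oit.umn.edu"}
GSA_AGENT = "gsa-crawler"

# Assumed layout: host ident user [time] "request" status bytes
# optionally followed by "referer" "user-agent" (combined log format).
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+'
    r'(?: "[^"]*" "(?P<agent>[^"]*)")?'
)


def is_gsa_visit(line):
    """Return True if a log line matches a GSA hostname or user agent."""
    match = LINE_RE.match(line)
    if not match:
        return False
    agent = match.group("agent") or ""
    return match.group("host").lower() in GSA_HOSTS or GSA_AGENT in agent.lower()
```

For example, `sum(is_gsa_visit(line) for line in open("access.log"))` would count the crawler's requests in a log file named access.log (a hypothetical path).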
Benefits of a Google Search Appliance
There are several reasons why a GSA makes more sense for the University than sending our local searches through Google.com:
- Only University of Minnesota results show up, so visitors don’t see ads or results from other universities.
- A local search appliance allows us to include content from affiliate websites, such as gophersports.com, that would be excluded if only searching sites ending in “umn.edu.”
- Web visitors get search results tailored to the U of M. For instance, the Twin Cities campus uses the term "residence halls" for on-campus housing, while many people would instead search for "dorms." With the GSA, we can place a link to Housing & Residential Life at the top of the search results when someone searches for "dorms."
- We have the option to filter out unnecessary content. For example, it’s not necessary to index both a news article and its print or mobile version. Indexing only the main article helps that article rank higher and be found more easily in search results.
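The last two benefits, promoted links and content filtering, can be sketched in a few lines. This is plain Python for illustration, not GSA configuration syntax; the GSA is configured through its own administration interface. The URLs, the exclusion patterns, and the function names below are all hypothetical.

```python
import re

# Hypothetical promoted links: search term -> (title, URL).
PROMOTED = {
    "dorms": ("Housing & Residential Life", "https://housing.example.edu/"),
}

# Hypothetical URL patterns for duplicate renderings of an article
# (a print view under /print/ and a mobile site on an m. subdomain).
EXCLUDE_PATTERNS = [re.compile(r"/print/"), re.compile(r"^https?://m\.")]


def should_index(url):
    """Skip URLs that match any exclusion pattern."""
    return not any(pattern.search(url) for pattern in EXCLUDE_PATTERNS)


def results_for(query, organic_results):
    """Put any promoted link for the query terms ahead of the normal results."""
    promoted = [PROMOTED[term] for term in query.lower().split() if term in PROMOTED]
    return promoted + list(organic_results)
```

With these assumptions, a search for "dorms" returns the Housing & Residential Life link first, followed by whatever the index would normally return, and only the main version of each article is indexed.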