Google Search Appliance: Searchable and Excluded Content Types
Google is retiring the Google Search Appliance at the end of 2018. In anticipation, the University of Minnesota will adopt Google's Custom Search Engine. Please contact firstname.lastname@example.org with questions about this change.
Our Google Search Appliance (GSA) license allows us to index up to 3 million documents. We could quickly reach this limit if we did not exclude problematic web pages from the index.
Currently excluded content
Pages that have limited search value, for example, because they duplicate other pages.
Pages that recursively link to themselves.
Pages whose URLs contain session data.
Binary files, such as ZIP archives.
URLs containing a '?' character (that is, URLs with a query string).
If you cannot find your documents through the Search Appliance or in its Index Diagnostics, email their URLs to Search Appliance support at email@example.com.
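Before contacting support, it may help to check whether a URL matches one of the URL-based exclusion rules listed above. The following Python function is a minimal sketch of such a check, not the appliance's actual configuration. The function name, the list of binary extensions, and the session markers are all assumptions chosen for illustration; this page does not list the real patterns. Duplication and recursive linking cannot be detected from a URL alone, so the sketch does not cover them.

```python
from urllib.parse import urlsplit

# Illustrative values only (assumptions): the appliance's real patterns
# are not published on this page.
BINARY_EXTENSIONS = (".zip",)
SESSION_MARKERS = ("jsessionid", "phpsessid", "sessionid")


def likely_exclusion_reasons(url):
    """Return the URL-based exclusion rules that a URL appears to match."""
    reasons = []
    path = urlsplit(url).path.lower()

    # Rule: URLs containing a '?' character.
    if "?" in url:
        reasons.append("contains '?'")

    # Rule: binary files, such as ZIP archives.
    if path.endswith(BINARY_EXTENSIONS):
        reasons.append("binary file")

    # Rule: URLs that contain session data (marker names are guesses).
    if any(marker in url.lower() for marker in SESSION_MARKERS):
        reasons.append("session data in URL")

    return reasons


if __name__ == "__main__":
    for candidate in (
        "https://www.example.edu/about.html",
        "https://www.example.edu/search?q=maps",
    ):
        print(candidate, likely_exclusion_reasons(candidate))
```

An empty result only means that none of these simple checks matched; the page may still be excluded for another reason, in which case the URL can be sent to support as described above.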
Non-HTML document types included in the index
Microsoft Office: Word (.doc), PowerPoint (.ppt), Excel (.xls)