Technology Help Website: When will a retired page drop out of search results?

Searching, Crawling, and Indexing of Retired Pages

The page retirement process for our Technology Help website puts retired pages behind authentication, so the web server returns a 403 (Forbidden) error to search crawlers.
  • After a document has been indexed, a 403 error does not result in immediate removal from the search index.
  • Retired pages are likely to drop from the search index in less than 12 days.
A page will be dropped after the search crawlers receive a 403 error on four consecutive attempts to retrieve the page.
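
The four-consecutive-errors rule can be illustrated with a minimal sketch. The function name should_drop and the list of status codes are hypothetical and used only for illustration; this is not the crawler's actual implementation.

```python
def should_drop(status_codes, threshold=4):
    """Return True if the most recent `threshold` crawl attempts all got a 403.

    `status_codes` is a hypothetical list of HTTP status codes, oldest first,
    one per crawl attempt for a single page.
    """
    if len(status_codes) < threshold:
        return False
    # Only the latest `threshold` attempts count, so a successful
    # response in between resets the streak.
    return all(code == 403 for code in status_codes[-threshold:])


print(should_drop([200, 403, 403, 403, 403]))  # True
print(should_drop([403, 403, 200, 403]))       # False
```

Under this reading, a page that returns a 200 between errors starts the count over, which is why the rule is stated in terms of consecutive attempts.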
 
For the Google Search Appliance, which is used for IT@UMN search and University of Minnesota search, a page may take approximately one month to drop from the search index. Removal follows the pattern described in this Google Search Appliance documentation.
 
Scenario: The search appliance encounters an error during crawling that could be a server timeout error (500 error code) or forbidden (403 errors).

Recrawl Attempts:
  • First recrawl attempt: 1 day
  • Second recrawl attempt: 3 days
  • Third recrawl attempt: 1 week
  • Fourth recrawl attempt: 3 weeks

Document Removal from the Index: The document is removed if the search appliance encounters the error for the fourth time.
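
As a rough check on the "approximately one month" estimate, the recrawl intervals listed above can be added up. This is a sketch under one assumption: each interval is the wait since the previous attempt, not since the first error. The function name recrawl_schedule and the example start date are hypothetical.

```python
from datetime import date, timedelta

# Recrawl intervals in days: 1 day, 3 days, 1 week, 3 weeks.
# Assumption: each interval is measured from the previous attempt.
RECRAWL_INTERVALS_DAYS = [1, 3, 7, 21]


def recrawl_schedule(first_error):
    """Return the dates of the four recrawl attempts after the first error."""
    attempt_date = first_error
    schedule = []
    for days in RECRAWL_INTERVALS_DAYS:
        attempt_date += timedelta(days=days)
        schedule.append(attempt_date)
    return schedule


first_error = date(2024, 1, 1)  # example date only
schedule = recrawl_schedule(first_error)
days_until_removal = (schedule[-1] - first_error).days

for number, attempt in enumerate(schedule, start=1):
    print(f"Recrawl attempt {number}: {attempt.isoformat()}")
print(f"Days until removal: {days_until_removal}")  # 32
```

With this assumption the fourth attempt falls 32 days after the first error, which is consistent with the one-month estimate. If the intervals were instead measured from the first error, removal would come at about three weeks.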

 

 

TDX ID: 2985