How to remove a web page from Google's index
Google’s search index is a database of billions of web pages that Google’s search bots have crawled and gathered information from over time.
When you perform a search in Google, the search engine looks through its index and returns what its algorithm judges to be the most relevant results.
In most cases, website owners want their web pages to appear in Google’s search results to help drive traffic to their website. However, there are cases where you don’t want a URL to appear in search results or want to remove one that already does. I’ve shared some examples of why you’d do this below.
The process of removing an existing URL from Google’s index is known as deindexing. In this article I'll cover both how to remove an existing URL from Google’s index and how to avoid future URLs appearing, which is where most people go wrong.
I recently performed this task for our website and was surprised at how difficult it was to find the correct process. So, in good old marketing fashion, if you find a gap for content… fill it! I present to you a guide to deindexing URLs from Google’s index and stopping URLs that you don’t want in search results from being indexed in future.
Why would you need to deindex a URL or domain from Google?
There are many reasons why you wouldn’t want a web page to appear in Google. Here are just a few.
Duplicate content
You have two pages with very similar content. Canonical URLs can help in these cases, but if the page exists for, say, a Google Ads campaign that reuses existing site content, you may not want it to show in organic search results. It could (although it’s unlikely) be classed as duplicate content. In a lot of cases, your Google Ads landing pages will be focused heavily on generating conversions, and this may mean the styling differs from the rest of your website.
Development/testing versions of your URLs or domain
If the URL is for a preview or development version of your website, you do not want it appearing in Google alongside your live website.
Private web pages
Pages you only want a user to see once they are logged in shouldn’t be indexed by Google.
Outdated content
Maybe you have content that needs updating, but you don’t have time to do it just now. You could remove the page from your website or request that Google deindex the page temporarily.
How to remove a URL or domain from Google’s index
To remove a URL from Google’s search results, follow the step-by-step guide below. This process will remove the URL from Google’s index temporarily for 6 months. You must follow the next stage – stopping URLs or domains from appearing in Google’s index – to ensure that when the temporary deindex is lifted, Google doesn’t reindex the URL.
Step 1) Create a list of the URLs or domains you want to remove from Google’s index.
Step 2) Go to your Google Search Console account and sign in – https://search.google.com/search-console/
If you do not have a Google Search Console account, you can create one from the link above. It will ask you to add a property to your search console. For guidance, see how to add a Google Search Console property.
Step 3) Select the property that contains the URL you want to remove from Google’s index. You may have multiple properties, for example one for each subdomain or each version of your domain (https://www, http://www, and the same without www). The property you need is the one that matches the URL that you want to remove from Google. If you are unsure, open the URL in your browser and check which version of the domain it is on.
Step 4) Click Removals from the left-hand menu.
Step 5) Click the red button labeled New Request.
Step 6) Enter the full URL (including http/s and www if required – depending on the property you have chosen).
Step 7) If you only want to remove a single URL, choose Remove this URL only. If you want to remove the root domain and every URL within it, choose Remove all URLs with this prefix and enter only the root domain, not the full URL path of an internal page. For example, I’d enter https://www.contensis.com to remove all URLs on this domain, and https://www.contensis.com/product/features to remove just that one URL.
Step 8) Click Next.
Disclaimer: This request is temporary. Google will remove the URL from its search index now, but after 6 months it will be able to reindex the URL. Therefore it is vitally important you move straight to the next stage below.
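If you manage several properties, the matching described in Step 3 can be sketched in a few lines of Python. This is only an illustration of the idea, not anything Search Console provides: the function name matching_property and the list of properties are assumptions made up for the example, and it treats each property as a simple URL prefix.

```python
from urllib.parse import urlsplit


def matching_property(url, properties):
    """Return the URL-prefix property that contains `url`, or None.

    Hypothetical helper: a property matches when the protocol and host
    are identical and the URL path starts with the property path.
    """
    target = urlsplit(url)
    target_path = target.path or "/"
    best = None
    for prop in properties:
        parts = urlsplit(prop)
        prop_path = parts.path or "/"
        same_site = (parts.scheme, parts.netloc.lower()) == (
            target.scheme,
            target.netloc.lower(),
        )
        if same_site and target_path.startswith(prop_path):
            # Prefer the most specific (longest) matching property.
            if best is None or len(prop) > len(best):
                best = prop
    return best


# Assumed set of properties for illustration only.
PROPERTIES = [
    "https://www.contensis.com",
    "http://www.contensis.com",
    "https://contensis.com",
]

print(matching_property("https://www.contensis.com/product/features", PROPERTIES))
```

The point of the sketch is that the protocol and the www part both matter: a URL on http://www belongs to a different property than the same path on https://www.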
How to stop URLs or domains from appearing in Google’s search results
Before you do anything
Before you do anything else, you must check your robots.txt file to ensure that you are not already blocking Google from crawling the URL in question. If you are, you need to remove that instruction. This may sound odd, but if Google cannot crawl the page, it cannot find the new noindex tag that you will place in the <head> section of that page. This tag will tell Google not to index the page (more on this later).
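One way to run this check is with the robots.txt parser in Python’s standard library. The sketch below is a rough aid rather than a definitive test: the sample robots.txt content and the example.com URLs are made up for illustration, and Python’s parser does not necessarily interpret every rule the way Googlebot does (wildcard patterns in particular), so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text disallows `url` for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, not taken from any real website.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /preview/
"""

for page in (
    "https://www.example.com/preview/new-page",
    "https://www.example.com/product/features",
):
    print(page, "blocked" if is_crawl_blocked(SAMPLE_ROBOTS, page) else "crawlable")
```

If a page you plan to noindex comes back as blocked, the disallow rule needs to be lifted first, otherwise Google will never see the tag.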
I have a disallow instruction in my robots.txt so why has Google still indexed some of my URLs?
A common misconception is that you can instruct Google not to index your URLs by adding a disallow instruction in your robots.txt file. This does not always work. In fact, in most cases if this is the only place you have the instruction, you are likely to find you do have some URLs indexed by Google. In most cases you can find your robots.txt file by adding /robots.txt to the end of your domain. If you cannot find it here, ask your web developer to locate the file for you.
Example of disallow instructions in the BBC website.
There are two ways in which Google can find a URL on your website. The first is when you submit your sitemap to Google through Search Console, or request indexing of a particular URL. The second is when Google follows a link from another website to yours. In the latter case, a disallow instruction stops Google from crawling the page, but it does not stop Google from indexing the URL itself based on the links that point to it. This is the cause of most of the issues people face when they find URLs in Google’s index that shouldn’t be there.
To confirm this, the following statement is from the robots.txt FAQ section in Google’s Search Central guidelines and help documentation:
If I block Google from crawling a page using a robots.txt disallow directive, will it disappear from search results?
Blocking Google from crawling a page is likely to remove the page from Google's index.
However, robots.txt Disallow does not guarantee that a page will not appear in results: Google may still decide, based on external information such as incoming links, that it is relevant. If you wish to explicitly block a page from being indexed, you should instead use the noindex robots meta tag or X-Robots-Tag HTTP header. In this case, you should not disallow the page in robots.txt, because the page must be crawled in order for the tag to be seen and obeyed.
How to correctly request a noindex from Google
To correctly request a noindex from Google, or other search engines for that matter, you need to do two things. First, ensure that you are not already using a disallow instruction for this URL in your robots.txt file. Check carefully, because you may be disallowing a whole folder in which the URL is located. Second, add a noindex instruction using one of the following methods.
Add a meta robots tag
Place the tag <meta name="robots" content="noindex"> in the <head> section of the page you do not want Google to index.
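Once the tag is in place, it is worth confirming that it really appears in the page source. The following Python sketch shows one way to check a page’s HTML for a noindex robots meta tag using only the standard library. The class and function names are hypothetical, and the sample HTML is invented for the example; in practice you would pass in the HTML of your own page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for part in attr.get("content", "").split(","):
                part = part.strip().lower()
                if part:
                    self.directives.add(part)


def has_noindex_meta(html):
    """Return True if the HTML asks search engines not to index the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives


# Invented sample page for illustration.
SAMPLE_PAGE = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
print(has_noindex_meta(SAMPLE_PAGE))
```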
Add X-Robots-Tag HTTP header
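This is sent as part of the HTTP response headers for a given URL, for example X-Robots-Tag: noindex. It is useful for files that have no <head> section, such as PDFs or images.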
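How you add the header depends entirely on your web server or CMS, so ask your web developer if you are unsure. Purely as an illustration, here is a minimal sketch for a Python website that speaks WSGI. The wrapper name add_noindex_header, the /downloads/ path and the demo application are all assumptions invented for the example, not part of any particular product.

```python
def add_noindex_header(app, path_prefixes):
    """Wrap a WSGI app so responses under the given paths carry X-Robots-Tag: noindex."""

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "/")

        def patched_start_response(status, headers, exc_info=None):
            if any(path.startswith(prefix) for prefix in path_prefixes):
                # Replace any existing X-Robots-Tag header with noindex.
                headers = [h for h in headers if h[0].lower() != "x-robots-tag"]
                headers.append(("X-Robots-Tag", "noindex"))
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in application that pretends to serve a PDF (placeholder body)."""
    start_response("200 OK", [("Content-Type", "application/pdf")])
    return [b"%PDF-"]


# Hypothetical setup: everything under /downloads/ should stay out of the index.
application = add_noindex_header(demo_app, ["/downloads/"])
```

Responses for URLs outside the listed paths are passed through untouched, so the rest of the site can still be indexed as normal.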
For full and current instructions on how to add these tags and what to include, see Google’s Robots meta tag, data-nosnippet, and X-Robots-Tag specifications
Which method should I use?
The following is a quote from Google’s Search Central documentation:
- robots.txt: Use it if crawling of your content is causing issues on your server. For example, you may want to disallow crawling of infinite calendar scripts. You should not use the robots.txt to block private content (use server-side authentication instead), or handle canonicalization. If you must be certain that a URL is not indexed, use the robots meta tag or X-Robots-Tag HTTP header instead.
- robots meta tag: Use it if you need to control how an individual HTML page is shown in search results (or to make sure that it's not shown).
- X-Robots-Tag HTTP header: Use it if you need to control how non-HTML content is shown in search results (or to make sure that it's not shown).
If you have removed a disallow from your robots.txt in order for Google to see your new noindex tag, you may want to put it back afterward. As soon as you see Google has removed your URL from its search results (you may have to check this manually in Google search), put your disallow instruction back into your robots.txt file if required.