Looking at Website Indexing

What is Website Indexing?

Having your website indexed is a good thing. It means that search engines know what pages and content your website has, and can display it to their users via their search results pages. Search engines send out their spiders or web crawlers to find new content on the internet, and then index the content in their system. These spiders are particularly looking for new URLs, changes that have been made on a website, and links to and from other pages and websites.

Google and other search engines can take a while to index a new website, as well as any changes made to an existing one. You can prompt them to recrawl your site, but the truth is that they will do it when they can and want to. Sometimes you may receive a notification from Google about site indexing issues, asking you to correct the problem. Often, though, these are not actually problems you need to be concerned about or fix.

The CMS and our website builder platform have been built with SEO in mind, along with other factors. This means that we have set up 'rules' that we ask search engines to follow when indexing the sites built on our platform. For instance, checkout and shopping cart pages should not be indexed, as there is no need for them to be found by search engines and shown in search results. So, we block these two page types in robots.txt and do not include them in sitemaps. This might mean you receive a message from Google saying that there is an error with your site indexing, that pages are being blocked, and that you should fix them. However, this is not an issue that affects your website's search engine rankings. It is simply Google being overzealous about wanting every single page to be indexable by its spiders.
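To see how a robots.txt rule of this kind behaves, here is a minimal sketch using Python's standard urllib.robotparser module. The rules and the /cart and /checkout paths are assumptions made for illustration, not the actual robots.txt served by the platform.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: the real file on your site may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart
Disallow: /checkout
"""


def is_crawlable(path, user_agent="Googlebot"):
    """Return True if the sample rules allow user_agent to crawl path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/cart", "/checkout", "/about-us"):
        status = "allowed" if is_crawlable(path) else "blocked"
        print(f"{path}: {status}")
```

A crawler that respects these rules skips the cart and checkout pages and crawls everything else, which is why a "blocked by robots.txt" notice for those pages is expected behaviour.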

Another thing we do is offer website owners the ability to make individual pages on their websites hidden or blocked:

  • hidden (searchable) - a page is hidden from the menu and is not added to the sitemap, but can be accessed if you share the hidden URL with anyone or through a link on your website. It can be found through your onsite search box and in sitemaps for external search engines to include.
  • hidden (but accessible) - these pages are password protected and hidden from the menu. They are accessible by password only, but can be reached by those who know the URL. If you link to a "hidden but accessible" page from your public pages, Google may raise a reasonable warning that you have linked to a page that you have also stated is not to be indexed.
  • blocked - a strong recommendation to Google not to index this page, although Google can override this suggestion. The page cannot be seen by anyone, not even robots, so the content is not indexed, but it can still be edited in the CMS.

Any blog post that is hidden, or that is outside its future publish date or expiration date, will also get a noindex status and be excluded from sitemaps. Links can also have an instruction (nofollow) added to them to tell Google not to follow the link. This is typically added to our cart and checkout links to stop Google's robots going there.
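If you want to check which of these instructions a page carries, the sketch below scans a page's HTML for a robots meta tag containing noindex and for links marked rel="nofollow". It uses only Python's standard html.parser module. The sample markup is hypothetical, and the exact tags the CMS emits may differ.

```python
from html.parser import HTMLParser


class RobotsHintParser(HTMLParser):
    """Collects noindex and nofollow hints found in one HTML page."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.nofollow_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # content looks like "noindex, nofollow"
            content = (attrs.get("content") or "").lower()
            directives = [d.strip() for d in content.split(",")]
            if "noindex" in directives:
                self.noindex = True
        elif tag == "a":
            rel_values = (attrs.get("rel") or "").lower().split()
            if "nofollow" in rel_values and attrs.get("href"):
                self.nofollow_links.append(attrs["href"])


# Hypothetical page markup, for illustration only.
SAMPLE_PAGE = """
<html><head><meta name="robots" content="noindex"></head>
<body><a href="/cart" rel="nofollow">View cart</a> <a href="/about">About</a></body></html>
"""


def scan(html):
    """Return (page_is_noindex, list_of_nofollow_hrefs) for the given HTML."""
    parser = RobotsHintParser()
    parser.feed(html)
    return parser.noindex, parser.nofollow_links


if __name__ == "__main__":
    print(scan(SAMPLE_PAGE))
```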

These things are done for many different valid reasons, but once again it will affect the ability of a search engine to index a website, possibly resulting in a notification to correct the site index issues.

Common Website Indexing Terms

It can be very concerning to receive such an email from Google, so let us explain some of the common issues and terms you may come across:

  • Blocked by robots.txt - Google has found your page, but the robots.txt file instructs it not to crawl that page, such as a shopping cart
  • 404 and other 4xx messages - there is an error and the page cannot be shown, usually because it has been deleted, which returns a 404
  • Nofollow - this is an instruction on a link from one page to another to not use the link for page ranking in search results
  • Noindex - an instruction to stop Google from indexing a page in web results, such as a shopping cart
  • Crawl errors - when a search engine bot cannot read your content or access a page, such as due to a 404 or missing page
  • Duplicate content - having identical content on pages within your own website is okay, though not ideal. Having identical content on your website and on a completely different domain name is seriously frowned upon by Google
  • Sitemap - a file that is created automatically by the CMS for your website which lists details about the pages, videos, files and content, along with the relationship between them all. A sitemap is a good thing that helps search engines index your website, and you should have one.
  • Broken links - these are links from one page to another within your site or another site that are not working, likely because the link URL is incorrect or no longer exists
  • Redirects - this is an instruction to a search engine to stop showing or visiting a given URL, and instead to show and visit another. Often used when a page has been deleted or a product is no longer available and customers are redirected to a replacement
  • Google penalty - this is a ranking punishment applied by Google to a website because there is a conflict between Google's webmaster guidelines and what has been happening with the website. Things that will result in a Google penalty include excessive reciprocal linking between sites, cloaking (showing different content to different users), keyword stuffing and hidden text.
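To make the sitemap term more concrete, here is a rough sketch of how a minimal XML sitemap could be built so that pages marked as not indexable are left out. The CMS generates your sitemap automatically, so this is only an illustration. The build_sitemap function and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, indexable) pairs, skipping non-indexable pages."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, indexable in pages:
        if not indexable:
            continue  # e.g. cart, checkout, or noindex pages
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical pages, for illustration only.
    pages = [
        ("https://example.com/", True),
        ("https://example.com/products", True),
        ("https://example.com/cart", False),
    ]
    print(build_sitemap(pages))
```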

Solving Website Index Issues

Before you do anything, first check whether what Google has flagged are actually issues you need to solve. You will likely find that most are not something you need to be concerned about. There are a few you should act on and fix, such as:

  • 404 errors - this means a page is missing from your website, likely deleted at some stage. Under the SEO tab in the CMS, you will need to add a redirect from the missing page's URL to a suitable live page, such as your home page.
  • Broken links - swap out the old hyperlink with a live one, or remove the link altogether. Avoid renaming SEO filenames. It's great to set a filename initially, but if you later change the filename to something better, then the old filename becomes a broken link. We will automatically fix your sitemap and menu structure to use any new filenames, but anywhere you have linked to these pages using the link wizard will need to be relinked so that the new filename is used. You can optionally add redirects from old URLs to new URLs. If you require help fixing your broken links, contact support, who can repair any linking issues for our minimum support fee.
  • Duplicate content - all content on your site should be 100% original to your website only, so rewrite if necessary. 
  • Google penalty - something major has been picked up by Google and they have dropped your rankings for keywords or the entire site. This is something you should contact an SEO professional about and ask them to investigate for you.
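The relationship between broken links and redirects described above can be sketched as follows. A link counts as broken when it points to neither a live page nor an old URL that redirects to a live page. The page paths, the redirect table and the find_broken_links function are all hypothetical, and this is not how the CMS itself checks links.

```python
def find_broken_links(links_by_page, live_urls, redirects):
    """Return (source_page, link) pairs whose target is neither live nor redirected to a live page."""
    broken = []
    for page, links in links_by_page.items():
        for link in links:
            # Follow a single redirect if one is configured for this URL.
            target = redirects.get(link, link)
            if target not in live_urls:
                broken.append((page, link))
    return broken


# Hypothetical site data, for illustration only.
LIVE_URLS = {"/", "/products", "/contact"}
REDIRECTS = {"/old-products": "/products"}  # renamed page with a redirect in place
LINKS_BY_PAGE = {
    "/": ["/products", "/old-products", "/summer-sale"],
    "/contact": ["/"],
}

if __name__ == "__main__":
    for page, link in find_broken_links(LINKS_BY_PAGE, LIVE_URLS, REDIRECTS):
        print(f"Broken link on {page}: {link}")
```

In this sample data, /old-products is safe because a redirect covers it, while /summer-sale would need either a redirect or a corrected link.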

For further information, we recommend viewing our SEO resources or contacting a website expert for paid assistance. 

Tags: seo  

Posted: Friday 9 September 2022