Turn off indexing of certain web pages

If you don't want some of your web pages to be indexed, you can tell our crawler by adding a rule for the user agent sdbot to your robots.txt, as described in The Robots Exclusion Protocol:

    User-agent: sdbot
    Disallow: /folder/
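
To confirm that a rule like the one above blocks what you expect before you deploy it, you can test it locally. The following is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the host intranet.example are made up for illustration, and Python's parser may not match the crawler's own rule handling in every edge case.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the FAQ entry above.
ROBOTS_TXT = """\
User-agent: sdbot
Disallow: /folder/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text.

    Hypothetical helper for local testing; it is not part of the crawler.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # intranet.example is a placeholder host name.
    for url in (
        "http://intranet.example/folder/page.html",
        "http://intranet.example/other/page.html",
    ):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, "sdbot", url) else "blocked"
        print(f"{url}: {verdict}")
```

Under this rule set, pages below /folder/ are reported as blocked for sdbot, while other paths stay allowed.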

Getting an “Out of disk space” error despite having some free space

In addition to storing crawled data, the ES needs room for log files and temporary data during indexing. For this reason, crawling will stop with an "Out of disk space" error if you have less than 4 GB of free disk space on the /boithoData partition.

Please see the Increasing the size of the virtual disk in VMware section for an example of how to increase the size of a virtual disk.
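
To see how close a partition is to the 4 GB limit, you can check its free space from a script. This is a minimal sketch using Python's standard shutil.disk_usage. The helper names are invented for illustration, and treating "4 GB" as 4 × 1024³ bytes is an assumption, since the exact unit the ES uses is not stated here.

```python
import os
import shutil

# Threshold from the FAQ text. Binary gigabytes are an assumption.
MIN_FREE_BYTES = 4 * 1024**3


def free_bytes_on(path: str) -> int:
    """Return the number of free bytes on the file system containing path."""
    return shutil.disk_usage(path).free


def has_enough_space(free_bytes: int, minimum: int = MIN_FREE_BYTES) -> bool:
    """Return True if free_bytes is at or above the minimum needed to crawl."""
    return free_bytes >= minimum


if __name__ == "__main__":
    # Fall back to the current directory when run outside the ES.
    path = "/boithoData" if os.path.isdir("/boithoData") else "."
    free = free_bytes_on(path)
    status = "OK" if has_enough_space(free) else "below the 4 GB limit"
    print(f"{path}: {free / 1024**3:.1f} GB free ({status})")
```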

What is the robot name of the Intranet crawler?

It is "sdbot".

Correct syntax to allow the crawler to index "folder".

    User-agent: sdbot
    Allow: /folder/


Copyright © Searchdaimon AS. All rights reserved.