VMware Player installation
This is a guide to installing the free version of Searchdaimon ES on the free VMware® Player.
Step 1: Check system requirements
Searchdaimon ES runs on a 64-bit operating system. To run 64-bit operating systems under VMware, a 64-bit CPU with virtualization enabled is required (Intel VT or AMD-V). Some PC providers require you to enable virtualization in the BIOS. Some older 64-bit processors don't support virtualization.
Step 2: Short download and installation instructions
Next: setup and configuration of Searchdaimon ES.
Step 2: Download and installation with screenshots
Next: setup and configuration of Searchdaimon ES.
From console
How to manually configure your IP-address from the console
Log in as user 'setup'. You don't need a password.
Select "Network configuration".
Select eth1.
Uncheck dhcp and enter your IP-address, netmask and gateway.
Demo
See the demo section for more examples.
Increasing the size of the virtual disk in VMware
You may increase the size of the ES virtual data disk. If you have ES 2.3 or newer, all you have to do is increase the size of the second hard drive in VMware. See VMware KB article http://kb.vmware.com/kb/1004047 for how this is done on your platform. Then reboot, and ES will detect and handle the rest automatically. If you don't see two disks, please contact support for instructions.
Exchange with Outlook 2008 or 2010 in Windows
When you click on a url from an Exchange connector in ES's search result page, the email should open in Outlook. This is done by crafting a special url in the format "outlook:000000003eb852348...". Sometimes Outlook doesn't register the "outlook:" url handler correctly. If this is the case, nothing will happen when you click on the link (or the browser will try to open a webpage). We have created a small program that fixes this and does nothing else. It can be downloaded below. Please read the README.txt for instructions.
Download fix for url scheme in Windows: Outlook2007Scheme.zip
The program creates and sets the registry key HKEY_CLASSES_ROOT\outlook\shell\open\command to the correct path for outlook.exe. Full source code is included.
Microsoft Exchange
Preparing the Exchange server
To crawl Exchange you normally have to change two things on the Exchange server.
Setting up the ES part
Go to "Add manually" in the Collections/Resources menu and select the Exchange connector.
FAQ (Frequently Asked Questions)
A list of frequently asked questions is available at /documentation/faq/.
Overview
Collections/Resources -> Overview
Overview is where you check status and configure your collections. Collections are grouped by type of crawler (SMB, Exchange etc.).
To recrawl a collection, select Crawl now. The collection will immediately start to recrawl if the Collection manager is allowed to crawl in this time interval (see Collection manager). To configure the collection, select Manage. You can also see some statistics there. Test collections are managed from Crawler extensions (under Connectors). Remotely installed crawlers generate Pushed collections, and should be managed from the remote host.
Manage
Edit collection
In the Edit collection tab you can edit your collection settings in the same way you configured them when creating the collection. Details here.
Advanced management
Under Advanced management you'll find the following functions:
Result customization
Statistics
Shows how many documents are crawled every second.
VMware ESX installation
This is a guide to downloading and installing the free version of Searchdaimon ES on a VMware® ESX Server. There is also a video tutorial showing the complete installation process.
Step 1: Check system requirements
This version of Searchdaimon ES comes as a Virtual Appliance, and has been tested on VMware ESX Server. It may also work on other virtualization platforms supporting the Open Virtualization Format. VMware ESX Server and VMware Infrastructure Client can be downloaded from www.vmware.com.
Searchdaimon ES runs on a 64-bit operating system. To run 64-bit operating systems under VMware, a 64-bit CPU with virtualization enabled is required (Intel VT or AMD-V). Some PC providers require you to enable virtualization in the BIOS. Some older 64-bit processors don't support virtualization.
Step 2: Download and installation
Next: setup and configuration of Searchdaimon ES.
Find the IP-address
Tip: If the console screen is black, the screensaver has been activated. Click your mouse pointer in the middle of the screen, and move it around. Press Ctrl-Alt to exit the console.
The ES console visible from a VirtualBox session. The IP-address is visible on the third line.
Opensearch v1.1
Method search
Example: http://demo.searchdaimon.com/webclient2/api/opensearch/1.1/search?query=test&start=1
Username: demo
Tips:
You can let your users subscribe to search results as rss feeds by adding a link to api/opensearch/1.1/search?query=test to the head section of your result page.
<link rel="alternate"
      type="application/rss+xml"
      title="test - Searchdaimon search"
      href="api/opensearch/1.1/search?query=test" />
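The search url and the feed link can also be generated instead of typed by hand. The following is a minimal Perl sketch that assumes only the url layout shown in the examples above. The helper names (escape_query_value, opensearch_url, escape_attr, rss_link_tag) are made up for illustration and are not part of ES.

```perl
use strict;
use warnings;

# Percent-encode a query value, leaving only RFC 3986 unreserved characters.
sub escape_query_value {
    my ($value) = @_;
    utf8::encode($value) if utf8::is_utf8($value);    # work on UTF-8 bytes
    $value =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $value;
}

# Build a search url such as api/opensearch/1.1/search?query=test&start=1.
# $base is the API prefix (assumed to end in a slash); $start is optional.
sub opensearch_url {
    my ($base, $query, $start) = @_;
    my $url = $base . 'search?query=' . escape_query_value($query);
    $url .= '&start=' . int($start) if defined $start;
    return $url;
}

# Escape text for use inside a double-quoted HTML attribute.
sub escape_attr {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Build the <link> element that advertises the result feed for a query.
sub rss_link_tag {
    my ($query) = @_;
    my $href  = escape_attr(opensearch_url('api/opensearch/1.1/', $query));
    my $title = escape_attr("$query - Searchdaimon search");
    return qq{<link rel="alternate" type="application/rss+xml" title="$title" href="$href" />};
}

print rss_link_tag('test'), "\n";
```

Escaping the query keeps the link valid when users search for phrases that contain spaces or ampersands.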
Getting started
To log on, enter the IP-address of Searchdaimon ES (http://{IP-address}/) in your favorite browser. The address can be obtained from your system administrator.
Your username and password should be the same as when logging on to your computer.
Tip: Adding this IP to trusted sites in Internet Explorer enables you to open MS Office documents in MS Office when clicking on them in your browser.
Introduction
Searchdaimon ES is a free enterprise search solution, suitable for corporate use, adding search to webpages, OEM or all of the above. If you haven’t tested ES yet, you should try the demo at /pages/demo/ and watch a short video tutorial to get a feel for what it is about: /pages/demo/introduction_video/.
Introduction
It is very easy to write your own server-side connector. One of the strengths of the ES is the ability to write your own connectors in Perl, which run directly on the ES server. These connectors only need to download the data from the source; all data conversion is then handled by the ES.
Scan
Collections/Resources -> Scan
SMB supports scanning for Windows shared folders. This can be a good alternative if you don't want to add your shares manually. Fill in the IP address of the server to be scanned, or a range to scan more than one computer (ex: "192.168.1.0/24"). Ping-scanning is much faster if you are scanning a network, but some computers may not answer these types of ping requests.
After the scan has completed, it's easy to add new collections.
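To get a feel for how large a range such as "192.168.1.0/24" is before starting a scan, the notation can be expanded by hand. The sketch below is plain Perl and only illustrates the address arithmetic. The function names are hypothetical, and it says nothing about which of these addresses the ES scanner actually probes.

```perl
use strict;
use warnings;

# Convert a 32-bit integer back to dotted-quad notation.
sub to_dotted {
    my ($n) = @_;
    return join '.', unpack('C4', pack('N', $n));
}

# Return (first address, last address, address count) for "a.b.c.d/n".
sub cidr_range {
    my ($cidr) = @_;
    my ($ip, $bits) = $cidr =~ m{^(\d+\.\d+\.\d+\.\d+)/(\d+)$}
        or die "Expected a.b.c.d/n, got '$cidr'\n";
    my @octets = split /\./, $ip;
    die "Octet out of range in '$cidr'\n" if grep { $_ > 255 } @octets;
    die "Prefix length out of range in '$cidr'\n" if $bits > 32;

    my $addr  = unpack('N', pack('C4', @octets));
    # The first $bits bits are fixed; the remaining bits vary within the range.
    my $mask  = $bits == 0 ? 0 : (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF;
    my $first = $addr & $mask;
    my $last  = $first | (~$mask & 0xFFFFFFFF);
    return (to_dotted($first), to_dotted($last), $last - $first + 1);
}

my ($first, $last, $count) = cidr_range('192.168.1.0/24');
print "$first - $last ($count addresses)\n";
# prints "192.168.1.0 - 192.168.1.255 (256 addresses)"
```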
VirtualBox installation
This is a guide to installing the free version of Searchdaimon ES on the free Oracle VirtualBox®.
Step 1: Check system requirements
Searchdaimon ES is a 64-bit system. If you have a 64-bit CPU you must enable virtualization (Intel VT or AMD-V).
Most PC providers require you to enable virtualization in the BIOS.
Some older 64-bit processors don't support virtualization.
Step 2: Short download and installation instructions
Next: setup and configuration of Searchdaimon ES.
Step 2: Download and installation with screenshots
Next: setup and configuration of Searchdaimon ES.
From webadmin
Configuration -> Settings -> Network configuration
You can change the server's network settings from Configuration -> Settings -> Network configuration.
Searchdaimon v2.1
Method search
Example: http://demo.searchdaimon.com/webclient2/api/sd/2.1/search?query=test&start=1
Username: demo
Searching
Searching is of course the most important aspect of ES. Start by writing one or more words that define what you are looking for. You can then drill down with filters and sorting.
Filtering and sorting
You can restrict your search to type of document, data source, date and meta-information such as contacts, customers, sales and projects. You can also sort on date or relevancy.
In the above picture, you can see the results of the query "enterprise search". The search has been further broken down to only include documents from the "Sales" collection. You can also filter the search to only include documents from a file type like Excel or PowerPoint, or from a date interval like this year or older than two years.
Collections
Collections are sources of documents. This might be shared files, your e-mail, or a CRM-system. Collections will appear as tabs in the search result. Clicking on a tab will filter out all other collections.
Suggest
Searchdaimon ES suggests query words while you are writing. The words proposed are fetched from documents the user has access to, so that domain and product names, which you can't find in traditional dictionaries, are included.
Spell checking
The ES can propose correctly spelled words if you have misspelled a word. As for Suggest, the dictionary is built from indexed documents.
Inflections and stemming
Searching for "car" also shows documents containing "cars", etc.
Next: Also see the demo for more examples.
Different versions, virtual appliance
ES comes in four different versions: one for installing on VMware ESX server, one for installing on the stand-alone VMware Player, one for installing on Sun/Oracle VirtualBox and one general version for all OVF compatible platforms. The VMware versions require a 64-bit CPU with virtualization enabled. The VirtualBox version can be run on a 32-bit CPU. All are available from the download section: Download ES
Quickstart & administrator guide in Norwegian (Pdf)
Opening files from network shares in Internet Explorer, FireFox or Google Chrome
The security model in many browsers prevents opening files stored on Windows shares by default. To enable this, you typically have to find a way to allow the browser to open file:// urls.
The ES crawler API
The ES connector API requires you to make a Perl package that exports at least the subroutine crawl_update(). crawl_update() is called at regular intervals to see if any new data is available. It inspects its source data and determines whether new data has arrived. If so, it uses add_document() to add it to the search index. The data added to the ES is always referred to as a “document”, regardless of type and source.
Open the administration panel
Use a web browser to navigate to your administration panel at http://{IP-address}/admin . Username is 'admin', and password 'water66'. Follow the instructions from the First time wizard. This will allow you to change network settings and add the primary user system.
Add manually
Collections/Resources -> Add manually
Add new collection.
When adding a new collection, you need to specify which crawler to use. Crawlers included in the standard version:
Specifying detailed information
The details needed can vary between crawlers. For instance the SMB crawler:
When everything is filled in, click "Add collection". The collection will now appear in Overview, and will immediately start to fetch documents if we are allowed to crawl at this time of day (see Collection manager).
Add primary usersystem
The ES can be integrated with Microsoft Active Directory for authenticating and authorizing end users. We currently do not support any other user systems, or the possibility to add users to the ES directly. But you can use the ES without a user system to make public search functions, like search on a website. For legacy reasons you must set up an Active Directory as the primary user system, even if you don't have an Active Directory. If you don't have an Active Directory, please see below how to get through this step.
Plain v1.0
Method suggest
Example: http://demo.searchdaimon.com/webclient2/api/plain/1.0/suggest?prefix=s
Username: demo
Method cache
Returns a local copy of the requested document.
Example: http://demo.searchdaimon.com/webclient2/api/plain/1.0/cache/Administration2/2195/?signature=580553829&time=1247055462
Where Administration2 is the collection parameter, and 2195 is the document parameter.
Searching without logging in (anonymous / public search)
Searching without logging in (anonymous user)
In the administration panel, go to Overview. Select "Manage" on the collection you want to be publicly available. Go to the "Advanced management" tab, and scroll down to the "Edit settings for Administration" section. Under "Anonymous collection", select "Enable anonymous search", and click "Submit changes". The changes will take effect immediately. To access the public interface, add "/public" to your url: http://<ip-address>/public
32-bit cpu issues
Searchdaimon ES is a 64-bit system. If you have a 64-bit cpu you must enable virtualization (Intel VT or AMD-V). Most PC providers require you to enable virtualization in BIOS. Some older 64-bit processors don’t support virtualization. If you only have a 32-bit cpu your only option is to use VirtualBox.
Example: A Twitter connector using the Twitter json API and Perl
This example will show you how to make a custom connector for the ES. We will be crawling Twitter, a public data source, so we don’t have to worry about authentication and data permissions. Twitter has an http api where you can see the latest tweets for a user. This is done by crafting a special url in the format http://twitter.com/statuses/user_timeline/{USER}.{FORMAT}. For example, CNN Breaking News has the Twitter page http://twitter.com/cnnbrk, making Rss and Json available from the following urls.
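Filling in the {USER}.{FORMAT} pattern for the cnnbrk example can be sketched in a few lines of Perl. The function name timeline_url is made up for illustration, and the "rss" and "json" format names are assumed from the sentence above.

```perl
use strict;
use warnings;

# Fill in the {USER}.{FORMAT} pattern described above.
sub timeline_url {
    my ($user, $format) = @_;
    return "http://twitter.com/statuses/user_timeline/$user.$format";
}

# Print the feed url for each format mentioned in the text.
print timeline_url('cnnbrk', $_), "\n" for qw(rss json);
```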
Getting started
Start by selecting the “Connectors” section in the ES admin. Then create a new connector by clicking on the “Create a new connector” button. The new connector will be issued a default name, so our first step is to change this to something reasonable. At the settings and parameters tab, set name to “MyTwitter” and click the “Update” button.
To make this connector as general as possible, we are going to take the Twitter screen name to index as a parameter. To do so we must first go to the settings and parameters tab and add a parameter called “screen name”.
At the configure test collection tab, set screen name to the Twitter screen name you want to crawl, in this case “cnnbrk”.
The code
Then go to the edit source tab, where we will write the actual source code. The ES will have filled in some example code, but we don’t need that now. So start by removing all source code in crawl_update() so you get a clean routine like this.
sub crawl_update {
    my (undef, $self, $opt) = @_;
};
The $opt variable is a hash reference containing all input options. For example, the screen name we configured above will be at $opt->{'screen name'} . You can see the content of $opt by adding the following line to crawl_update().
warn "Options received: ", Dumper($opt), "\n";
At this point it's smart to test that the framework is working as expected. Update crawl_update() so you get:
sub crawl_update {
    my (undef, $self, $opt) = @_;
    warn "Options received: ", Dumper($opt), "\n";
};
Then click the save and run button below the code window.
The errors about mysql and bbdn can safely be ignored. You are not using threads or a persistent bbdn connection.
Implementing
Back at the edit source window we can start to implement the Twitter connector. We will be using the CPAN modules JSON::XS, Date::Parse and LWP::Simple in this connector. So first we add references to them at the top of the source, just below the other "use" and "our" statements. We get:
use Crawler;
our @ISA = qw(Crawler);
use LWP::Simple qw(get);
use JSON::XS qw(from_json);
use Date::Parse;
Then we will modify crawl_update() to crawl Twitter. We build the url to the json feed, then use get() and from_json() to download and decode it.
my $jurl = "http://twitter.com/statuses/user_timeline/" . $opt->{'screen name'} . ".json";
my $t = from_json(get($jurl));
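LWP::Simple's get() returns undef when the download fails, and the two lines above hand that result straight to the decoder. A more defensive variant could look like the sketch below. It is only an illustration: decode_timeline is a hypothetical helper, not part of the ES API, and it uses the core module JSON::PP as a stand-in decoder so that it runs without extra modules.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical helper: turn a downloaded body into an array reference of
# status entries, or an empty array reference if nothing usable arrived.
sub decode_timeline {
    my ($body) = @_;
    return [] unless defined($body) && length($body);    # download failed

    my $data = eval { decode_json($body) };
    if ($@) {
        warn "Could not decode JSON: $@";
        return [];
    }
    # Anything other than a list of entries (e.g. an error object) is skipped.
    return ref($data) eq 'ARRAY' ? $data : [];
}

# In crawl_update() this could replace the direct call:
#   my $t = decode_timeline(get($jurl));
my $entries = decode_timeline('[{"text":"hello"},{"text":"world"}]');
print scalar(@$entries), " entries\n";
```

Returning an empty list lets the loop that follows run zero times instead of aborting the whole crawl.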
Finally we loop through the json data, format it correctly, and submit it to the ES.
for my $usr (@{$t}) {
    my $content = $usr->{text};
    my $url = "http://twitter.com/" . "$usr->{user}{screen_name}/statuses/$usr->{id}";
    next if $self->document_exists($url, 0);
    my $substr = substr($content, 0, 50);
    my $title = "$usr->{user}{name}: $substr ..";
    my $created_at = str2time($usr->{created_at});
    warn "Adding $title";
    $self->add_document((
        content => $content,
        title => $title,
        url => $url,
        type => "tapp",
        acl_allow => "Everyone",
        last_modified => $created_at,
    ));
}
Click Save and Run. Hopefully you will see something like this.
Finally, all we have to do is enable anonymous search of this collection. Go to Settings and parameters and select accesslevel as an input field. Then, at the Configure test collection tab, set accesslevel to "Anonymous". Click on the Public search page button in the top left corner and you will see the search page. Search for something.
Full code
package Perlcrawl;
use Carp;
use Data::Dumper;
use strict;
use warnings;
use Crawler;
our @ISA = qw(Crawler);
use LWP::Simple qw(get);
use JSON::XS qw(from_json);
use Date::Parse;

##
# Main loop for a crawl update.
# This is where a resource is crawled, and documents added.
sub crawl_update {
    my (undef, $self, $opt) = @_;
    warn "Options received: ", Dumper($opt), "\n";

    # Build the url to the json feed, then download and decode it.
    my $jurl = "http://twitter.com/statuses/user_timeline/" . $opt->{'screen name'} . ".json";
    my $t = from_json(get($jurl));

    for my $usr (@{$t}) {
        my $content = $usr->{text};
        my $url = "http://twitter.com/" . "$usr->{user}{screen_name}/statuses/$usr->{id}";

        # Skip documents that are already indexed.
        next if $self->document_exists($url, 0);

        my $substr = substr($content, 0, 50);
        my $title = "$usr->{user}{name}: $substr ..";
        my $created_at = str2time($usr->{created_at});
        print "Adding $title\n";
        $self->add_document((
            content => $content,
            title => $title,
            url => $url,
            type => "tapp",
            acl_allow => "Everyone",
            last_modified => $created_at,
        ));
    }
};

sub path_access {
    my (undef, $self, $opt) = @_;
    # During a user search, `path access' is called against the search results
    # before they are shown to the user. This is to check if the user still has
    # access to the results.
    #
    # If this is irrelevant to you, just return 1.
    # You'll want to return 0 when:
    # * The document doesn't exist anymore
    # * The user has lost privileges to read the document
    # * .. when you want the document to be filtered from a user search in general.
    return 1;
}

1;
Download the full source code at: http://www.searchdaimon.com/files/code%20examples/Simple%20Twitter%20connector.txt
End users
Users -> End users
To make it possible to search at all, you have to activate those users who should have access. Under End users, all users in the primary user system are listed, so it is easy to click on those users you want to activate. Just remember to click "Update user access" when done.
User system: Setup Microsoft Active Directory
Tips: The ES needs a user account that can access Microsoft Active Directory and the resources you want to crawl. We recommend that you set up a separate user account for the ES. You can then tighten security later by giving this account only "read only" access to the different systems.
The ES uses Ldap to connect to Microsoft Active Directory. Ldap is enabled by default in Windows server. If you have a standard setup of Active Directory you will only need to specify:
Example
Verify the user system
If you are using Microsoft Active Directory, go to “End users” and verify that you can list users. If you can’t, you may have to go over the settings for your AD again. You will find these settings as “User systems” in the main menu.
Enable users to login
Select End users, and select which users should have search access.
You have to activate those users who should have access. Under End users, all users in the primary user system are listed. Click on those users you want to activate, and remember to click "Update user access" when done.
User systems
Users -> User systems
To run Searchdaimon ES, you need at least a primary user system. Many companies already run Microsoft Active Directory. It is also possible to install secondary user systems, and map their users to the primary system.
User system: Don’t use a user system or single sign-on
If you don’t have a Microsoft Active Directory you can just name it "Fake ad" and use the following values to get through this step.
Using these values, no end users can log in, but you can allow everyone search access, as described in the faq at Searching without logging in (anonymous user).
Settings
Configuration -> Settings
Main settings
In the Main settings tab, you can change license or administrator password. It is recommended to run the production version of Searchdaimon ES, but it is also possible to run the testing version. The development version is available by agreement only.
Collection manager
Crawling and recrawling can take up a lot of resources. Every collection is scanned for new content. This means that every file on your server is checked for changes, new e-mail gets downloaded, and other content is scanned for updates. This can slow down performance of your content servers while crawling is active. If this is an issue for you, we recommend limiting crawling to nighttime.
Advanced settings
These are values used internally in the search engine. Do not make changes here unless you know what you are doing.
Network configuration
In this tab you can change the server's network settings.
Statistics & logs
System -> Statistics & Logs
You can see which users are the most active, what the most popular queries are, and how many searches are performed every day. These are log files for running processes, and are meant for debugging. The query log shows the last 50 searches.
Add collections
You can add collections manually, or by scanning.
Tips: If your Active Directory is sdtest.local and your username is sdes, most servers need your username to be sdtest\sdes . Note that it is "\", not "/".
Phone home
Help -> Phone home
If you need help, our job may become easier if you activate Phone home. Then it will be possible for us to log in and perform maintenance if necessary. Contact us before you activate it.
Try the first search
Use a web browser to log in as a normal user at http://{IP-address}/, or at http://{IP-address}/public if you have enabled public search in any collections.