Stuff we can't be without - Searchdaimon AS

Google Mini gets discontinued

Runar Buvik — 2012-07-05T20:59:46+00:00

As part of Google's 2012 spring cleaning, the Google Mini will be discontinued beginning July 31, 2012. The Mini was Google's entry-level search appliance. The only remaining options will be the full Google Search Appliance, which starts at approximately $30,000, or Google Site Search (pricing from $100), a low-end hosted search with much less functionality.

More info is available at the official Google blog.

This also raises the question of what Google will be doing in enterprise search in the future. There used to be a virtual edition of the Google Search Appliance, but it was discontinued for unknown reasons in 2008.

Now only the full Google Search Appliance is left, and in many ways it lacks the features and flexibility that competitors offer at much lower prices.

Conrad Wolfram at TED. Teaching kids real math with computers

Runar Buvik — 2012-06-30T17:57:26+00:00

Here at Searchdaimon we do a lot of math. Unfortunately, it doesn't look much like the math we learned at school.

The math we do isn't done by hand, with a pencil on a piece of paper. The questions aren't simple problems where the only skill needed is to break the question down into a format that can be inserted into one of the formulas in the formula book.

Conrad Wolfram, director at Wolfram Research and brother of Wolfram Alpha founder Stephen Wolfram, has some great points on why we should reform the teaching of mathematics.

Great article: The Six Commandments of Search Implementation

Runar Buvik — 2012-06-30T17:30:18+00:00

Your search engine isn't an SQL database or a content repository. Search engines normally don't have joins, ACID (atomicity, consistency, isolation, durability), locking, two-phase commit, or transaction journals.

Paul Nelson, CTO at Search Technologies, has written a great article about what you shouldn't try to force your search engine to do: http://www.searchtechnologies.com/searchchronicles/six-search-engine-commandments.html .

Free servers!

Runar Buvik — 2012-06-29T10:56:35+00:00

Searchdaimon and our sister companies are moving offices. If anyone needs some old servers, hit us up. (Local pickup in Oslo only. Sorry, no shipping.)

Tip: Skip HAL and Kudzu with Amazon EC2 images

Runar Buvik — 2012-06-05T17:05:36+00:00

Don’t start the HAL daemon and Kudzu at startup when making Amazon EC2 AMIs. Both are installed by yum group install base on Red Hat/CentOS/Fedora, but they are not needed and may make booting slow.

Starting HAL daemon: [FAILED]

The haldaemon may spend up to 5 minutes trying to start, and then complains about a bus error.

Starting Kudzu: [ OK ]

For some reason, Kudzu makes the network unusable on first startup.

A beautiful filename, 251 characters long

Runar Buvik — 2011-10-21T12:48:39+00:00

A customer of ours recently complained that he couldn't find a specific file, even when searching for words that he knew were in it. This isn't a totally uncommon question. Sometimes the user doesn't have permission to the file, the location isn't indexed yet, or there is some other problem. So our CTO Runar Buvik asked what the name of the file was, so he could take a look.

- It's Protocol_Amending_the_Agreements_Conventions_and_Protocols_on_Narcotic_Drugs_concluded_at_The_Hague_on_23_January_1912_at_Geneva_on_11_February_1925_and_19_February_1925_and_13_July_1931_at_Bangkok_on_27_November_1931_and_at_Geneva_on_26_June_1936.doc.
- Protocol_Amending_the_Agree...

Yes, the name was in fact "Protocol_Amending_the_Agreements_Conventions_and_Protocols_on_Narcotic_Drugs_concluded_at_The_Hague_on_23_January_1912_at_Geneva_on_11_February_1925_and_19_February_1925_and_13_July_1931_at_Bangkok_on_27_November_1931_and_at_Geneva_on_26_June_1936.doc". That's 251 characters long! After some investigation it turned out that the underlying filesystem, NTFS, allows filenames as long as 255 characters, but Windows refused to serve this file over SMB. Instead we got a "No such file or directory" error, even when opening the folder as a network share in Windows Explorer and clicking on the file.

There actually is a treaty with this name according to Wikipedia, but that doesn't mean that the file needs to be named the same. Please keep your filenames below 128 characters, people, or you will be in trouble sooner or later!

Btw, the ES supports filenames up to 1024 characters. Longer than that, and it's probably just noise anyway.
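If you want to catch such names before they cause trouble, a small check is enough. The following is only a sketch: the function names and the SAFE_NAME_MAX constant are made up for this example, and it is not code from the ES. It counts bytes with strlen, which equals the character count for plain ASCII names like the one above, while NTFS itself counts UTF-16 code units.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limit for this sketch: the "below 128 characters"
 * rule of thumb from the post. */
#define SAFE_NAME_MAX 128

/* Return a pointer to the last path component, i.e. the part after
 * the final '/' or '\\', or the whole string if there is none. */
const char *base_name(const char *path)
{
    const char *name = path;
    const char *p;

    for (p = path; *p != '\0'; p++) {
        if (*p == '/' || *p == '\\')
            name = p + 1;
    }
    return name;
}

/* True if the file name part of path is shorter than SAFE_NAME_MAX
 * bytes. Only the name is measured, not the directories before it. */
bool name_is_safe(const char *path)
{
    return strlen(base_name(path)) < SAFE_NAME_MAX;
}
```

Running every path through name_is_safe() while walking a share would flag the treaty file above, since its 251 bytes are well over the limit.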

OpenMP, automatic threading

Runar Buvik — 2010-10-28T14:27:50+00:00

Tired of creating threads and writing code to manage deadlocks and work queues? Search is CPU intensive, and we use a lot of threads. For example, indexes are sorted in parallel, and the pages that go on the result page are fetched from disk and processed in parallel. We started out creating threads manually, but that is slow going in C. We have now almost entirely changed to OpenMP, and haven't looked back since.

Initializing a large array in parallel is as easy as this.

int main(int argc, char *argv[]) {
        const int N = 100000;
        int i, a[N];

        /* Split the loop iterations across the available threads. */
        #pragma omp parallel for
        for (i = 0; i < N; i++)
                a[i] = 2 * i;

        return 0;
}

Example from Wikipedia.

OpenMP will decide how many threads to use.
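Loops that combine their results need a little more care than the array example. Here is a minimal sketch of a parallel sum using the reduction clause. The function names sum_doubles and max_threads are invented for this example and are not from our code base. The _OPENMP guards let the file build without OpenMP too, in which case the pragma is ignored and the loop runs on one thread.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sum 2*i for i in [0, n). The reduction clause gives every thread a
 * private partial sum and adds them together when the loop ends, so
 * no locking is needed around sum. */
long sum_doubles(int n)
{
    long sum = 0;
    int i;

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += 2L * i;

    return sum;
}

/* Upper bound on the number of threads OpenMP would use for a
 * parallel region, or 1 when compiled without OpenMP support. */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

With GCC, OpenMP is enabled with the -fopenmp flag. The thread count can be capped from outside the program with the OMP_NUM_THREADS environment variable, or per loop with a num_threads clause on the pragma.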

The Regex Coach, interactive regex testing

Runar Buvik — 2010-10-28T11:47:29+00:00

We use a lot of regular expressions here at Searchdaimon. Regexes are used through Lex and Yacc to parse queries, parse HTML and to make the snippets on the result page. They are also heavily used to extract and validate data, tags and entities in the crawlers.

Here I am testing out a regex to extract email addresses and names from documents. The names and email addresses could then be added as attributes to the document, to enable filtering in the search results. Constructing regexes like this using only a text editor and relying on trial and error won't be easy.

Link: http://www.weitz.de/regex-coach/
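Once a pattern works in the Regex Coach, it still has to be put to use in code. Below is a rough sketch of email extraction in C with the POSIX regex.h functions. The pattern is a deliberately simple stand-in, not the one we tested above, and first_email is a name made up for this example.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Simplified pattern (an assumption for this sketch): local part,
 * '@', domain, a dot, and a top-level domain of two or more letters. */
#define EMAIL_PATTERN "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"

/* Copy the first email address found in text into out, truncated to
 * out_size - 1 bytes. Returns 1 on a match, 0 if there is none and
 * -1 on error. */
int first_email(const char *text, char *out, size_t out_size)
{
    regex_t re;
    regmatch_t m[1];
    size_t len;
    int rc;

    if (out_size == 0 || regcomp(&re, EMAIL_PATTERN, REG_EXTENDED) != 0)
        return -1;

    rc = regexec(&re, text, 1, m, 0);
    regfree(&re);
    if (rc == REG_NOMATCH)
        return 0;
    if (rc != 0)
        return -1;

    /* m[0] holds the byte offsets of the whole match within text. */
    len = (size_t)(m[0].rm_eo - m[0].rm_so);
    if (len >= out_size)
        len = out_size - 1;
    memcpy(out, text + m[0].rm_so, len);
    out[len] = '\0';
    return 1;
}
```

Compiling the pattern on every call keeps the sketch short. A crawler that processes many documents would compile it once and reuse the regex_t.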

Blekko, internet search engine

Runar Buvik — 2010-10-28T10:52:14+00:00

Start-up company Blekko has made some revolutionary innovations in the field of internet search. Using their invention “slashtags” you can easily filter and sort your results. For example, a search for “Apple Computers” gives you the results you would get in Google. But you can also add slashtags to filter the results:

Apple Computers /shop – Gives prices and shopping opportunities
Apple Computers /history – Gives pages about the history of Apple
Apple Computers /finance – Hits from forbes.com, businessweek.com
Apple Computers /date – Newest pages first
http://www.apple.com/ /seo – Graphs showing SEO and link info
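To get a feel for the syntax, here is a small sketch of how a query like the ones above could be split into search terms and a trailing slashtag. This is our own illustration, not Blekko's implementation. The function name split_slashtag is made up, and the sketch assumes a single slashtag at the end of the query, separated from the terms by a space.

```c
#include <stddef.h>
#include <string.h>

/* Split "terms /tag" into the search terms and the tag (without the
 * leading '/'). A slash only starts a tag when it follows the last
 * space, so the slashes inside a URL are left alone.
 * Returns 1 if a slashtag was found, 0 if not, -1 on a zero-sized
 * buffer. Without a tag, query receives the whole input. */
int split_slashtag(const char *input, char *query, size_t query_size,
                   char *tag, size_t tag_size)
{
    const char *space = strrchr(input, ' ');
    size_t qlen = strlen(input);
    int found = 0;

    if (query_size == 0 || tag_size == 0)
        return -1;

    tag[0] = '\0';
    if (space != NULL && space[1] == '/' && space[2] != '\0') {
        strncpy(tag, space + 2, tag_size - 1);
        tag[tag_size - 1] = '\0';
        qlen = (size_t)(space - input);
        found = 1;
    }

    /* Drop trailing spaces from the search terms. */
    while (qlen > 0 && input[qlen - 1] == ' ')
        qlen--;
    if (qlen >= query_size)
        qlen = query_size - 1;
    memcpy(query, input, qlen);
    query[qlen] = '\0';
    return found;
}
```

For "Apple Computers /shop" this yields the terms "Apple Computers" and the tag "shop", while "http://www.apple.com/ /seo" keeps the full URL as the terms.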

We think slashtags are a great idea, and may change the way we use the web. Read more at http://searchengineland.com/blekko-a-new-search-engine-that-lets-you-spin-the-web-47215 or get a beta invite at http://blekko.com/.

There’s No Such Thing As A Google Killer, but both Blekko and Wolfram|Alpha have made great leaps forward in the field of search technology, and may help to thin out Google’s dominance.