Things We Like

While our main blog is all about Searchdaimon, here we also offer our rough looks, thoughts and raves about the search industry, computer programming, and tips and tricks in general. From time to time we also come across companies and websites that make our lives easier, or are just freaking awesome.



A beautiful filename, 251 characters long

A customer of ours recently complained that he couldn't find a specific file, even when searching for a word that he knew was in it. This isn't an uncommon question at all. Sometimes the user doesn't have permission to the file, or the location isn't indexed yet, or there is some other problem. So our CTO, Runar Buvik, asked what the name of the file was, so he could take a look.

- It's Protocol_Amending_the_Agreements_Conventions_and_Protocols_on_Narcotic_Drugs_concluded_at_The_Hague_on_23_January_1912_at_Geneva_on_11_February_1925_and_19_February_1925_and_13_July_1931_at_Bangkok_on_27_November_1931_and_at_Geneva_on_26_June_1936.doc.
- Protocol_Amending_the_Agree...

Yes, the name was in fact "Protocol_Amending_the_Agreements_Conventions_and_Protocols_on_Narcotic_Drugs_concluded_at_The_Hague_on_23_January_1912_at_Geneva_on_11_February_1925_and_19_February_1925_and_13_July_1931_at_Bangkok_on_27_November_1931_and_at_Geneva_on_26_June_1936.doc". That's 251 characters long! After some investigation it turned out that the underlying filesystem, NTFS, allows filenames as long as 255 characters, but Windows refused to serve this file over SMB. Instead we got a "No such file or directory" error, even when opening the folder as a network share in Windows Explorer and clicking on the file.

There actually is a treaty with that name according to Wikipedia, but that doesn't mean the file needs to be named the same. Please keep your filenames below 128 characters, people, or you will be in trouble sooner or later!

By the way, the ES supports filenames up to 1024 characters. Longer than that, and it's probably just noise anyway.


OpenMP, automatic threading

Tired of creating threads and writing code to manage deadlocks and work queues? Search is CPU intensive, and we use a lot of threads. For example, indexes are sorted in parallel, and the pages that go on the result page are fetched from disk and processed in parallel. We started out creating threads manually, but that is slow going in C. We have now almost entirely changed to OpenMP, and haven't looked back since.

Initializing a large array in parallel is as easy as this.

int main(int argc, char *argv[]) {
        const int N = 100000;
        int i, a[N];

        /* Split the loop iterations across a team of threads */
        #pragma omp parallel for
        for (i = 0; i < N; i++)
                a[i] = 2 * i;

        return 0;
}
Example from Wikipedia.

OpenMP will decide how many threads to use.


The Regex Coach, interactive regex testing

We use a lot of regular expressions here at Searchdaimon. Regexes are used through Lex and Yacc to parse queries, parse HTML and to make the snippets on the result page. They are also heavily used to extract and validate data, tags and entropies in the crawlers.

Here I am testing out a regex to extract email addresses and names from documents. The names and email addresses could then be added as attributes to the document, to enable filtering in the search results. Constructing regexes like this using only a text editor and relying on trial and error isn't easy.

Link: http://www.weitz.de/regex-coach/


Blekko, internet search engine

Start-up company Blekko has made some revolutionary innovations in the field of internet search. Using their invention, “slashtags”, you can easily filter and sort your results. For example, a search for “Apple Computers” gives you the results you would get in Google. But you can also add slashtags to filter the results:

  • Apple Computers /shop – Gives prices and shopping opportunities
  • Apple Computers /history – Gives pages about the history of Apple
  • Apple Computers /finance – Hits from forbes.com, businessweek.com
  • Apple Computers /date – Newest pages first
  • http://www.apple.com/ /seo – Graphs showing SEO and link info

We think slashtags are a great idea, and that they may change the way we use the web. Read more at http://searchengineland.com/blekko-a-new-search-engine-that-lets-you-spin-the-web-47215 or get a beta invite at http://blekko.com/.

There’s No Such Thing As A Google Killer, but both Blekko and Wolfram|Alpha have made great leaps forward in the field of search technology, and may help to thin out Google’s dominance.


 
   

Copyright © Searchdaimon AS. All rights reserved.