Archive for the 'Performance' Category
I had a Football (American Football, that is, not soccer) coach who always used to drill into our heads what happens when one assumes something about our opponent for that week; he’d get all worked up, hoist up his coaching shorts (you know the ones; they should be banned…), puff out his chest, give you [...]
September 22nd, 2009 | Posted in Apache, Lucene, Performance, Solr | No Comments
Andrzej Bialecki is giving a webinar for Lucid on Apache Lucene performance on Thursday. More info is available at:
Lucid Imagination » Understanding Lucene Performance – Free online workshop.
September 1st, 2009 | Posted in Lucene, Performance | No Comments
Copying TREC is the Wrong Track for the Enterprise | The Noisy Channel.
Daniel Tunkelang has written up an interesting post on the new Open Relevance Project, which a few other Lucene people and I are starting up, and I thought I would respond here:
Little late to the conversation, but I think maybe we should back [...]
May 18th, 2009 | Posted in Apache, Lucene, Mahout, Open Relevance, Performance, Solr, machine learning, relevance | 2 Comments
Apache Solr 1.3.0 has been released. This version contains many, many improvements and bug fixes. High on my list are things like a good first step on distributed search support, integrated spell checking, support for Lucene’s “More Like This”, and the much needed Data Import Handler. Of course, one can’t forget about the numerous performance [...]
September 17th, 2008 | Posted in Apache, Lucene, Performance, Solr | No Comments
Text Processing: Why Servers Choke : Beyond Search
If you’ve been wondering how slow Lucene is, this paper gives you some metrics. The data seem to suggest that Lucene is a very slow horse in a slow race.
Are we reading the same paper? This hardly says Lucene is a slow horse in the race. What it [...]
September 7th, 2008 | Posted in Performance | 1 Comment
Apache Hadoop Wins Terabyte Sort Benchmark (Hadoop and Distributed Computing at Yahoo!)
Congrats to the Hadoop team! Score one for Open Source!
July 3rd, 2008 | Posted in Apache, Hadoop, Java, Map Reduce, Performance | 1 Comment
Jeff’s Search Engine Caffè
Copyright and distribution issues
Let’s say for a minute that a web search track is interesting. A major barrier to improvements in academic and open source web search is the lack of large-scale (hundreds of millions or even billions of pages) test collections that evolve over time. GOV2 is a static crawl of [...]
May 22nd, 2008 | Posted in Lucene, Performance, Search, TREC, queries, relevance | No Comments
For a while now, I have been trying to get my hands on TREC data for the Lucene project. For those who aren’t familiar, TREC is an annual competition for search engines that provides a common set of documents to index, queries to execute and judgments to check your answers to see how good an [...]
May 18th, 2008 | Posted in Apache, Java, Lucene, Nutch, Performance, Search, Solr, TREC, relevance | 9 Comments
FeatherCast » Blog Archive » Episode 43: Lucene
I did a FeatherCast today with Rich Bowen. Dang, he is quick at editing…
February 21st, 2008 | Posted in Apache, ApacheCon, Hadoop, Java, Lucene, Mahout, Nutch, Performance, Search, Tika, feathercast, machine learning | No Comments
Yahoo Search Wants to Be More Like Google, Embraces Hadoop
Hadoop is an open-source implementation of Google’s MapReduce software and file system. It takes all the links on the Web found by a search engine’s crawlers and “reduces” them to a map of the Web so that ranking algorithms can be run against them.
Ahem, Hadoop [...]
February 20th, 2008 | Posted in Apache, Hadoop, Java, Mahout, Map Reduce, Performance, machine learning | No Comments