Archive for the 'machine learning' Category

Introducing Apache Mahout Article posted on IBM developerWorks

My first article on Apache Mahout was just published on IBM developerWorks. It’s targeted at people just getting started with machine learning and Mahout. You can read the article at Introducing Apache Mahout. Feedback is welcome.

Mippin: Powered by Apache Mahout

Sean Owen, an Apache Mahout committer, has put up a brief post stating that Mippin is using Apache Mahout for its recommendation system.
Read it: Mippin Blog: May We Recommend….

Natural Language Processing Virtual Reading Group | Google Groups

A few people in the NLP LinkedIn group have decided to start up a virtual reading group.  All are welcome.

SF Bay Area Lucene/Solr Meetup

Just wanted to follow up on last night’s Lucene/Solr Meetup in San Francisco.
First off, special thanks to all the speakers (Jason Rutherglen, Michael Busch, Erik Hatcher, and everyone who gave a lightning talk). We had a lot of excellent talks, ranging from low-level Lucene details on payloads and real-time search to high-level discussions on [...]

Copying TREC is the Wrong Track for the Enterprise | The Noisy Channel

Daniel Tunkelang has written up an interesting post on the new Open Relevance Project that a few other Lucene people and I are starting up, and I thought I would respond here:
Little late to the conversation, but I think maybe we should back [...]

Interview with Mike Klaas | Lucid Imagination

Looks like my interview with Solr committer Mike Klaas is up. Check it out: there is some interesting discussion of Worio’s use of Solr, and also of machine learning in the context of search.

Speeding up K-means Clustering with Algebra and Sparse Vectors « LingPipe Blog

k-means and other EM-like algorithms are trivial to parallelize because all the heavy computations in the inner loops are independent.
via Speeding up K-means Clustering with Algebra and Sparse Vectors « LingPipe Blog.
This is exactly what Apache Mahout does. We have parallelized versions of a bunch of clustering algorithms, including k-means.
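To make the quoted point concrete, here is a minimal sketch of one k-means iteration split into independent pieces. It is plain Python and is not Mahout's actual code; the function names (nearest, map_partition, reduce_partials, kmeans_step) are made up for illustration. Each chunk of points is processed on its own and produces only per-cluster sums and counts, so the chunks could be handed to separate workers and merged afterwards.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_partition(points, centroids):
    """'Map' step: per-cluster partial sums and counts for one chunk of points.

    Chunks do not depend on each other, so this can run in parallel.
    """
    dim = len(centroids[0])
    partial = defaultdict(lambda: [[0.0] * dim, 0])
    for point in points:
        acc = partial[nearest(point, centroids)]
        acc[0] = [s + p for s, p in zip(acc[0], point)]
        acc[1] += 1
    return partial


def reduce_partials(partials, centroids):
    """'Reduce' step: merge the partial sums and compute the new centroids."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for partial in partials:
        for k, (vec, n) in partial.items():
            sums[k] = [a + b for a, b in zip(sums[k], vec)]
            counts[k] += n
    # A cluster that received no points keeps its old centroid.
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(centroids[k])
        for k in range(len(centroids))
    ]


def kmeans_step(chunks, centroids):
    """One full iteration; the list comprehension is the parallelizable part."""
    partials = [map_partition(chunk, centroids) for chunk in chunks]
    return reduce_partials(partials, centroids)
```

In a real deployment the list comprehension in kmeans_step would be replaced by whatever distributes the chunks (Hadoop map tasks, a process pool, and so on); the result does not depend on how the points are chunked.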

Hadoop, Analytical Software, Finds Uses Beyond Search – NYTimes.com

Nice writeup on Hadoop in the NYT today. Of course, Hadoop is often used to power machine learning, too, which is the premise behind using it in Apache Mahout.

Lucid Imagination » Add our Lucene Ecosystem Search Engine to Firefox

Mark Miller shows how to add Lucid’s Lucene ecosystem search as a Firefox plugin. Now you can search all the Lucene project (and subproject) archives, websites, and wikis from the comfort of your browser.

GSOC 2009 at the ASF: Looking for students interested in Lucene

SummerOfCode2009 – General Wiki
It’s that time of year again.  Time for students to sign up for Google Summer of Code.  Gist of it:  Get paid to work in Open Source for the summer.
I’ve signed up to mentor for Apache Mahout.  We are looking for students interested in implementing cutting-edge machine learning algorithms, optionally using Hadoop [...]