I've been involved in machine learning research for the past three years. I've had the opportunity to work with some exceptional people and attend conferences like ICML.
Here is a list of my personal machine learning projects, which I've made publicly available.
- Fast k-means
An implementation of the paper "Using the Triangle Inequality to Accelerate k-Means" by Charles Elkan, presented at ICML 2003.
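To give a flavor of the trick, here is a minimal standalone C sketch of the pruning rule at the heart of Elkan's algorithm (just the triangle-inequality test, not the full upper/lower-bound bookkeeping; none of the names below come from the actual code):

    /* Sketch: skip a candidate center when the triangle inequality already
     * rules it out.  If d(c_cur, c_l) >= 2 * d(x, c_cur), then
     * d(x, c_l) >= d(x, c_cur), so c_l cannot be closer and its distance
     * to x is never computed. */
    #include <math.h>
    #include <stdio.h>

    static double dist(const double *a, const double *b, int d) {
        double s = 0.0;
        for (int i = 0; i < d; i++) { double t = a[i] - b[i]; s += t * t; }
        return sqrt(s);
    }

    /* Reassign one point.  cc[j*k + l] holds the precomputed distance
     * between centers j and l. */
    static int assign_point(const double *x, const double *centers,
                            const double *cc, int k, int d, int cur) {
        double dcur = dist(x, centers + cur * d, d);
        int best = cur;
        for (int l = 0; l < k; l++) {
            if (l == best) continue;
            if (cc[best * k + l] >= 2.0 * dcur) continue;  /* pruned */
            double dl = dist(x, centers + l * d, d);
            if (dl < dcur) { dcur = dl; best = l; }
        }
        return best;
    }

    int main(void) {
        double centers[3][2] = { {0, 0}, {10, 0}, {0.5, 0.5} };
        double cc[9];
        for (int j = 0; j < 3; j++)
            for (int l = 0; l < 3; l++)
                cc[j * 3 + l] = dist(centers[j], centers[l], 2);
        double x[2] = { 0.4, 0.3 };
        printf("assigned to center %d\n",
               assign_point(x, &centers[0][0], cc, 3, 2, 0));
        return 0;
    }

The payoff is that centers far from a point's current center are eliminated with a single comparison against precomputed center-to-center distances, so most point-to-center distances are never evaluated.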
- Image Segmentation based on Union-Find
A simple but fast image segmentation program that takes a user-supplied comparison function and segments an image based on pixel equality. It needs only one pass to segment the image, scanning left to right, top to bottom.
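The core of the approach looks roughly like this: a disjoint-set (union-find) forest with path compression, driven by a single left-to-right, top-to-bottom pass that merges each pixel with its left and upper neighbors whenever the comparison function reports equality. This is a simplified sketch, not the actual program, and the helper names are made up:

    /* Single-pass union-find segmentation sketch on a grayscale image. */
    #include <stdio.h>

    static int find(int *parent, int i) {      /* root, with path compression */
        while (parent[i] != i) { parent[i] = parent[parent[i]]; i = parent[i]; }
        return i;
    }

    static void unite(int *parent, int a, int b) {
        parent[find(parent, a)] = find(parent, b);
    }

    typedef int (*eq_fn)(unsigned char a, unsigned char b);

    /* One pass over a w x h image; labels[] ends up holding a root id per pixel. */
    static void segment(const unsigned char *img, int w, int h, eq_fn eq, int *labels) {
        for (int i = 0; i < w * h; i++) labels[i] = i;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                int i = y * w + x;
                if (x > 0 && eq(img[i], img[i - 1])) unite(labels, i, i - 1);
                if (y > 0 && eq(img[i], img[i - w])) unite(labels, i, i - w);
            }
        for (int i = 0; i < w * h; i++) labels[i] = find(labels, i);  /* flatten */
    }

    static int pixels_equal(unsigned char a, unsigned char b) { return a == b; }

    int main(void) {
        unsigned char img[2 * 4] = { 1, 1, 2, 2,
                                     1, 2, 2, 2 };
        int labels[2 * 4];
        segment(img, 4, 2, pixels_equal, labels);
        for (int i = 0; i < 8; i++) printf("%d ", labels[i]);
        printf("\n");
        return 0;
    }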
- A Kernel Mixture of Gaussians clustering algorithm
An implementation of the paper "Kernel Trick Embedded Gaussian Mixture Model" by Jingdong Wang, Jianguo Lee, and Changshui Zhang. A mistake in the paper's update equations was corrected during the implementation. The code is provided as Matlab files and includes a simple GUI for exploring the algorithm. This is a simple implementation that uses a direct eigenvalue decomposition of the kernel matrix and, as such, scales as O(n^3) with the data size. There are optimizations that reduce this running time; see the paper for details.
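As a rough picture of where the cubic cost comes from (written in my own notation, not necessarily the paper's): the n x n kernel matrix is eigendecomposed and the data is mapped into the resulting coordinates,

    K_{ij} = k(x_i, x_j), \qquad K = V \Lambda V^{\top}, \qquad \phi_i = \Lambda^{1/2} V^{\top} e_i

and a Gaussian mixture is then fit to the embedded points \phi_i. Forming V and \Lambda for an n x n matrix is the O(n^3) step.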
- Fast one-dimensional k-means clustering
This is a small C implementation of the k-means algorithm specialized for clustering one-dimensional (scalar) data. It was originally created for segmenting images, but it almost certainly has other uses. It runs in (roughly) O(k log n) time per iteration, versus O(kn) for a conventional k-means, and it avoids recomputing the full means during each iteration. This code can cluster a 2000x1250 image in 1.6 seconds (converging in 33 iterations) on a 3.06 GHz Xeon.
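A sketch of why the one-dimensional case can be so much cheaper (this illustrates the idea with made-up names; it is not the project code): once the data is sorted, each cluster occupies a contiguous run, so an iteration only needs to binary-search the k-1 boundaries at the midpoints between adjacent centers and rebuild the means from prefix sums.

    /* 1-D k-means sketch: sort once, keep prefix sums, then each iteration
     * costs roughly O(k log n) for the boundary searches plus O(k) for the
     * means, with no further passes over the data. */
    #include <stdio.h>
    #include <stdlib.h>

    static int cmp_double(const void *a, const void *b) {
        double x = *(const double *)a, y = *(const double *)b;
        return (x > y) - (x < y);
    }

    /* First index in x[0..n) with x[i] >= v (x sorted ascending). */
    static int lower_bound(const double *x, int n, double v) {
        int lo = 0, hi = n;
        while (lo < hi) { int mid = (lo + hi) / 2; if (x[mid] < v) lo = mid + 1; else hi = mid; }
        return lo;
    }

    /* One Lloyd iteration on sorted scalar data; centers must be sorted. */
    static void kmeans1d_iterate(const double *x, const double *prefix, int n,
                                 double *centers, int k) {
        int start = 0;
        for (int j = 0; j < k; j++) {
            /* Points belong to center j up to the midpoint with center j+1. */
            int end = (j + 1 < k)
                      ? lower_bound(x, n, 0.5 * (centers[j] + centers[j + 1]))
                      : n;
            if (end > start)
                centers[j] = (prefix[end] - prefix[start]) / (end - start);
            start = end;
        }
    }

    int main(void) {
        double x[] = { 0.1, 0.2, 0.25, 5.0, 5.1, 9.8, 10.0, 10.3 };
        int n = 8, k = 3;
        qsort(x, n, sizeof x[0], cmp_double);
        double prefix[9] = { 0.0 };                 /* prefix[i] = sum of x[0..i) */
        for (int i = 0; i < n; i++) prefix[i + 1] = prefix[i] + x[i];
        double centers[3] = { 0.0, 5.0, 10.0 };     /* initial guesses, sorted */
        for (int it = 0; it < 10; it++) kmeans1d_iterate(x, prefix, n, centers, k);
        printf("%.3f %.3f %.3f\n", centers[0], centers[1], centers[2]);
        return 0;
    }

The binary searches account for the O(k log n) per-iteration cost, and the prefix sums are the kind of bookkeeping that lets the means be updated without revisiting every point.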
- O(1) Solution to Least Common Ancestor and Range Minimum Query problems
This is a direct C implementation of the algorithm presented by Michael Bender and Martin Farach-Colton in their paper "The LCA Problem Revisited". This code was written to be as clear and concise as possible. It is heavily commented and can serve as a basis for further specialization.
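For a sense of the RMQ machinery involved, here is a minimal sparse-table RMQ in C with O(n log n) preprocessing and O(1) per query. The full Bender/Farach-Colton construction layers the Euler-tour reduction and block decomposition on top of this to reach linear preprocessing; the names below are illustrative and not taken from my implementation.

    /* Sparse-table RMQ: table[j][i] holds the index of the minimum of
     * a[i .. i + 2^j - 1].  A query covers [lo, hi] with two overlapping
     * power-of-two blocks, so it costs one comparison. */
    #include <stdio.h>

    #define MAXN 1024
    #define LOGN 11

    static int table[LOGN][MAXN];
    static int flog[MAXN + 1];      /* floor(log2(x)) */

    static void rmq_build(const int *a, int n) {
        flog[1] = 0;
        for (int i = 2; i <= n; i++) flog[i] = flog[i / 2] + 1;
        for (int i = 0; i < n; i++) table[0][i] = i;
        for (int j = 1; (1 << j) <= n; j++)
            for (int i = 0; i + (1 << j) <= n; i++) {
                int left = table[j - 1][i];
                int right = table[j - 1][i + (1 << (j - 1))];
                table[j][i] = (a[left] <= a[right]) ? left : right;
            }
    }

    /* Index of the minimum of a[lo..hi], inclusive. */
    static int rmq_query(const int *a, int lo, int hi) {
        int j = flog[hi - lo + 1];
        int left = table[j][lo];
        int right = table[j][hi - (1 << j) + 1];
        return (a[left] <= a[right]) ? left : right;
    }

    int main(void) {
        int a[] = { 7, 2, 9, 4, 4, 8, 1, 6 };
        rmq_build(a, 8);
        printf("min index in [2,5] = %d\n", rmq_query(a, 2, 5));   /* expect 3 */
        printf("min index in [0,7] = %d\n", rmq_query(a, 0, 7));   /* expect 6 */
        return 0;
    }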