Alex Ksikes, Ph.D.


CAREER GOAL My goal is to work with passionate, like-minded people and to build great products that have a positive impact on the world.

INTERESTS Text Mining, Information Retrieval, Exploratory Search, Applied Machine Learning, Big Data, Web Development, User Interfaces


EDUCATION
Ph.D. in Computer Science, University of Cambridge, Jan 2014
Towards Exploratory Faceted Search Systems
Supervisor: Zoubin Ghahramani

M.Eng. in Computer Science, Cornell University, June 2003
GPA = 3.92 / 4.0

B.Sc. in Computer Science, Cornell University, June 2002
Dean's list Spring 1999 and Fall 2002


EXPERIENCE
Dec 2015 - present
Elite Quantitative Research Consulting
Machine Learning Engineer, focusing on high-frequency trading and machine learning on big data.

Nov 2013 - Dec 2015
Elastic (formerly Elasticsearch)
Machine Learning Engineer. Led the machine learning effort at Elastic, developing novel solutions and writing several plugins for exploratory search and classification. Also taught the Elasticsearch Core training in Paris.

Jan 2009 - Dec 2013
University of Cambridge
Ph.D. candidate on a full scholarship from the Isaac Newton Trust and the EPSRC.

June 2009 - Sept 2009
Microsoft Research, Redmond
Intern at Microsoft Bing, working on learning to rank web search results. Applied ensemble methods to a large library of models on datasets of millions of documents. Our method showed a significant improvement in NDCG and was tested for production use.

Jan 2008 - May 2011
Chiefmall, Newport Beach
Co-founder, CTO and sole developer. The goal of Chiefmall was to build a social network for home improvement service providers. I designed and built the whole product (see the project section for more details). The startup received angel funding in September 2010.

May 2006 - Sept 2006
University of California, Berkeley
Visiting Researcher, working on text mining and search interfaces.
Published "BioText Search Engine: beyond abstract search" in Bioinformatics.

June 2005 - Oct 2005
University of California, Irvine
Institute for Genomics and Bioinformatics
Visiting Researcher, working on kernel methods for the analysis of chemical information.
Published "One- to Four-Dimensional Kernels for Small Molecules and Predictive Regression of Physical, Chemical, and Biological Properties" in JCIM.

June 2004 - Sept 2004
Xerox Research Centre Europe, Grenoble
Intern.
Here is a presentation of Ensemble Selection given at Xerox.

Jan 2003 - June 2003

Sept 2002 - Dec 2002
Teaching Assistant for CS 100: Intro to Computer Programming at Cornell


SKILLS
Mathematics: Multivariable Calculus, Analysis, Linear Algebra, Abstract Algebra, Algorithms.

Programming: Python (preferred); I also enjoy programming in C, Java and Scheme. Fluent in Unix tools. Also experienced with HTML, CSS, PHP and SQL.

Language: French (native) and English (fluent)

PROJECTS

research related

Cloud Mining - [source code]

Cloud Mining automatically builds faceted search interfaces for your data. Simply put your data in the expected format, provide a custom look and feel for each search result, and Cloud Mining does the rest.

See it in action on IMDb (1M movies), DBLP (1.2M computer science references) or the whole of MEDLINE (20M articles with abstracts).

fSphinx - [source code]

fSphinx is a layer on top of Sphinx for faceted search. It powers the backend of Cloud Mining and works with SimSearch (see below) to combine full-text search with item-based search.

Here is a presentation of fSphinx given at the Sphinx Search Day conference, 2012.

SimSearch - [source code]

SimSearch is an item-based retrieval engine built on Bayesian Sets. Bayesian Sets is a framework for information retrieval in which a query consists of a set of items that are examples of some concept; the result is a set of items that attempts to capture the concept exemplified by the query.
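To make the query-by-example idea concrete, here is a minimal sketch of the Bayesian Sets score for binary feature vectors, using the closed-form Beta-Bernoulli expression from the Bayesian Sets paper by Ghahramani and Heller. It is an illustration, not SimSearch's code: the function name `bayesian_sets_scores`, the prior strength `kappa = 2.0` and the `eps` clamp on feature means are assumptions chosen for the example.

```python
import math
from typing import List, Sequence


def bayesian_sets_scores(
    data: Sequence[Sequence[int]],
    query: Sequence[int],
    kappa: float = 2.0,
    eps: float = 1e-6,
) -> List[float]:
    """Return the Bayesian Sets log score of every item for a query set.

    data  -- binary item-by-feature matrix (rows of 0/1 values)
    query -- row indices of the example items that make up the query
    kappa -- assumed prior strength: alpha_j = kappa * m_j, beta_j = kappa * (1 - m_j)
    eps   -- assumed clamp on feature means so no prior parameter is exactly zero
    """
    n_items = len(data)
    n_feats = len(data[0])
    n_query = len(query)

    c = 0.0
    q = []
    for j in range(n_feats):
        # Empirical mean of feature j over the whole collection sets the Beta prior.
        m = sum(row[j] for row in data) / n_items
        m = min(max(m, eps), 1.0 - eps)
        alpha, beta = kappa * m, kappa * (1.0 - m)

        # Posterior Beta parameters after observing the query items.
        s = sum(data[i][j] for i in query)
        alpha_t = alpha + s
        beta_t = beta + n_query - s

        # Constant term and per-feature weight of the linear log score.
        c += (math.log(alpha + beta) - math.log(alpha + beta + n_query)
              + math.log(beta_t) - math.log(beta))
        q.append(math.log(alpha_t) - math.log(alpha)
                 - math.log(beta_t) + math.log(beta))

    # log score(x) = c + q . x; rows are dense lists here for simplicity.
    return [c + sum(qj * xj for qj, xj in zip(q, row)) for row in data]


if __name__ == "__main__":
    items = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [1, 0, 0, 0],
    ]
    scores = bayesian_sets_scores(items, query=[0])
    ranking = sorted(range(len(items)), key=lambda i: scores[i], reverse=True)
    print(ranking)  # prints [0, 1, 3, 2]
```

Because the log score is linear in an item's features, ranking a whole collection amounts to one sparse matrix-vector product, which is what makes this style of item-based search practical on large datasets.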

Biomed Search

Biomed Search was the largest search engine dedicated to images in the biomedical domain. The system searches within image captions and the text that refers to each image. Over one million images were indexed. The project was built in three months. Lucene was used for retrieval and for the web framework. All the back-end work was programmed in Python.

Here is a presentation of Biomed Search given at the University of Cambridge.

BioText Search Engine

I also contributed to UC Berkeley's BioText Search Engine, which supports search within the captions of biomedical images and article abstracts. More information is available in this publication.

web related

MLSS Admin - [source code]

The admin interface we used at Cambridge for the summer school. It let us browse through the vast number of candidates, comment on each applicant and assign scores. Try a demo, read how to use it, or get the source code on GitHub.

The admin has been successfully used to organize the Machine Learning Summer Schools of 2009, 2010, 2011 and 2012.


Chiefmall

Chiefmall started as a project to index all contractors in the USA, forming the basis of a social network for service professionals. All active licensed contractors in California and eight other states were listed. Contractors could claim their listing, thereby building up their own website. Some screenshots: the account, company description, service locations, portfolio, adding a caption, getting jobs and getting your license verified.

The website also featured an admin interface for managing the state categories and what they refer to, approving or banning companies, reviewing search logs, and handling support with a knowledge base.

WikiTrivia - [source code]

Automatically generated random quizzes using Wikipedia. An experiment released before the word "mashup" was coined, which received over 50,000 hits in its first three days. WikiTrivia is now open source.


COURSES
Relevant machine learning courses I took at Cornell:

COMS 478: Machine Learning with Golan Yona (A-)
COMS 482: Theory of Algorithms with Jon Kleinberg (A)
COMS 578: Empirical Methods in Machine Learning with Rich Caruana (A+)
COMS 678: Advanced Topics in Machine Learning with Thorsten Joachims (A)
COMS 790: Independent Research with Rich Caruana (A+)
COMS 778: Topics in Machine Learning with Rich Caruana (A-)
COMS 750: Evolutionary Computation and Design Automation with Hod Lipson (A-)

Linear Algebra with Anil Nerode (A+), Multivariable Calculus (A), Groups and Geometry with Birgit Speh (A+), Mathematical Physics with Kusse (A-)


PUBLICATIONS
Alex Ksikes. Towards Exploratory Faceted Search Systems, Ph.D. Thesis, University of Cambridge, U.K., 2013. [dspace] [pdf]

Marti A. Hearst, Anna Divoli, Harendra Guturu, Alex Ksikes, Preslav Nakov, Michael A. Wooldridge, and Jerry Ye. BioText Search Engine: beyond abstract search, Bioinformatics, 2007. [scholar] [pdf]

Chloé-Agathe Azencott, Alex Ksikes, S. Joshua Swamidass, Jonathan Chen, Liva Ralaivola, and Pierre Baldi. One- to Four-Dimensional Kernels for Small Molecules and Predictive Regression of Physical, Chemical, and Biological Properties, Journal of Chemical Information and Modeling (JCIM), 2006. [scholar] [pdf]

Rich Caruana, Alexandru Niculescu-Mizil, Geoff Crew and Alex Ksikes. Ensemble Selection from Libraries of Models, Proceedings of the 21st International Conference on Machine Learning (ICML), 2004. [scholar] [ps] [pdf]


Implementations of some machine learning algorithms, with source code:

A Java implementation of a neural net.
Solving the 8 queens problem with a genetic algorithm (from CS478).
An implementation of the k-means algorithm using MDL for model selection.
An experiment with evolving artificial neural networks.
An implementation of the Gibbs sampling algorithm to detect motifs in sequences (from CS478).
Source code of Shotgun, aka Ensemble Selection.
An implementation of greedy agglomerative clustering with basic graphing.
A quick and dirty k-nearest neighbor classifier.

More projects with source code are on my GitHub page.

Transcripts and references available upon request