| ||||
| CAREER GOAL | My goal is to work with passionate, like-minded people and to build great products that positively impact the world. | |||
| INTERESTS | Text Mining, Information Retrieval, Applied Machine Learning, Web Development, User Interfaces | |||
| EDUCATION |
University of Cambridge, Department of Engineering, Oct 2005 - in completion 2013
Master of Engineering in Computer Science, Cornell University, June 2003
B.Sc. in Computer Science with Minor in Applied Math, Cornell University, June 2002
Math Sup/Spé, Lycée Polyvalent Ort, Strasbourg, France, Sept 1997 - Jan 1999
Scientific Baccalaureate with a Mention, Lycée International, Strasbourg, France, July 1997
Graduated with highest Honors, Mayfair High School, Los Angeles, June 1995 |
|||
| EXPERIENCE |
June 2009 - Sept 2009
Jan 2008 - May 2011
May 2006 - Sept 2006
June 2005 - Oct 2005
June 2004 - Sept 2004
Sept 2002 - Dec 2002 |
|||
| SKILLS |
Mathematics: Multivariable Calculus, Analysis, Linear Algebra, Abstract Algebra, Algorithms.
Programming: Preferred programming language is Python. I also like to program in C, Java and Scheme. Fluent in Unix tools. Other skills in HTML, CSS, PHP and SQL.
Languages: French (native) and English (fluent)
Hobbies: Tennis, Squash, Golf, Sailing, French Literature, Art, History, Traveling |
|||
|
PROJECTS research related |
Cloud Mining automatically builds cool faceted search interfaces for your data. Simply get your data into a specific format, provide a custom look and feel for each search result, and Cloud Mining does the rest. See it in action on IMDb (1M movies), on DBLP (1.2M computer science references), or on the whole of MEDLINE (20M articles with abstracts).
fSphinx - [source code]
fSphinx is a layer on top of Sphinx to perform faceted search. fSphinx powers the backend of Cloud Mining and works with SimSearch (see project below) to combine full-text search with item-based search. Here is a presentation of fSphinx given at the Sphinx Search Day conference, 2012.
SimSearch - [source code]
SimSearch is an item-based retrieval engine built on Bayesian Sets. Bayesian Sets is a new framework for information retrieval in which a query consists of a set of items that are examples of some concept. The result is a set of items that attempts to capture the concept exemplified by the query.
Biomed Search was the largest search engine for looking up images specifically in the biomedical domain. The system searched within image captions and the text referring to the images. Over one million images were indexed. The project was built in three months. Lucene was used for retrieval and web.py for the web framework. All the back-end work was programmed in Python. Here is a presentation of Biomed Search given at the University of Cambridge.
I also contributed to UC Berkeley's BioText Search Engine, which searches within captions and abstracts of biomedical images. More information in this publication. | |||
|
PROJECTS web related |
The admin we used at Cambridge for the summer school. It let us browse through the vast number of candidates, comment on each applicant and assign scores. Try a demo, see how to use it, or get the source code on GitHub. The admin has been successfully used to organize the Machine Learning Summer School 2009, 2010, 2011 and 2012 so far.
Chiefmall started as a project to index all contractors in the USA. The idea was to form the basis of a social network for service professionals. All active licensed contractors in California and in 8 other states were listed. Contractors would be able to claim their listing, thereby building up their own website. Some screenshots: the account, company description, service locations, portfolio, adding a caption, getting jobs, or getting your license verified. The website also featured an admin with the state categories and what they refer to, approving or banning companies, search logs, and support with a knowledge base.
Google Modules was an iGoogle gadget directory which was released before Google's official one. The site still receives about 25 gadget submissions per day and has over 1700 del.icio.us bookmarks. Google Modules was co-produced with my friend Philipp Lenssen.
Random automatic quizzes using Wikipedia. An interesting experiment released before the word mashup was first coined, and which received over 50,000 hits in the first three days after it launched. WikiTrivia is now open source. |
|||
| COURSES |
Relevant machine learning courses I took at Cornell:
COMS 478: Machine Learning with Golan Yona (A-)
COMS 482: Theory of Algorithms with Jon Kleinberg (A)
COMS 578: Empirical Method in Machine Learning with Rich Caruana (A+)
COMS 678: Advanced Topics in Machine Learning with Thorsten Joachims (A)
COMS 790: Independent Research with Rich Caruana (A+)
COMS 778: Topics in Machine Learning with Rich Caruana (A-)
COMS 750: Evolutionary Computation and Design Automation with Hod Lipson (A-)
Linear Algebra with Anil Nerode (A+), Multivariable Calculus (A), Groups and Geometry with Birgit Speh (A+), Mathematical Physics with Kusse (A-) |
|||
| PUBLICATIONS |
Marti A. Hearst, Anna Divoli, Harendra Guturu, Alex Ksikes, Preslav Nakov, Michael A. Wooldridge, and Jerry Ye. BioText Search Engine: beyond abstract search, Bioinformatics, 2007.
Chloé-Agathe Azencott, Alex Ksikes, S. Joshua Swamidass, Jonathan Chen, Liva Ralaivola, and Pierre Baldi. One- to Four- Dimensional Kernels for Small Molecules and Predictive Regression of Physical, Chemical, and Biological Properties, Journal of Chemical Information and Modeling (JCIM), 2006.
Rich Caruana, Alex Niculescu, Geoff Crew, and Alex Ksikes. Ensemble Selection from Libraries of Models, Proceedings of the 21st International Conference on Machine Learning (ICML), 2004. |
|||
| SOURCE CODE |
Implementation of some Machine Learning algorithms with source code ...
A Java implementation of a neural net.
Solving the 8 queens problem with a genetic algorithm (from CS478).
Implementation of the k-means algorithm using MDL for model selection.
An experiment with evolving artificial neural networks.
An implementation of the Gibbs sampling algorithm to detect motifs in sequences (from CS478).
Source code of Shotgun, aka Ensemble Selection.
An implementation of greedy agglomerative clustering with basic graphing.
Quick and dirty k-nearest neighbor.
Even more projects with source code on my GitHub page |
|||
| Transcripts and references available upon request | ||||