|CAREER GOAL||My goal is to work with passionate, like-minded people and to build great products that positively impact the world.|
|INTERESTS||Text Mining, Information Retrieval, Exploratory Search, Applied Machine Learning, Big Data, Web Development, User Interfaces|
M.Eng. in Computer Science,
Cornell University, June 2003
B.Sc. in Computer Science,
Cornell University, June 2002
Dec 2015 - present
Nov 2013 - Dec 2015
Jan 2009 - Dec 2013
June 2009 - Sept 2009
Jan 2008 - May 2011
May 2006 - Sept 2006
June 2005 - Oct 2005
June 2004 - Sept 2004
Sept 2002 - Dec 2002
Mathematics: Multivariable Calculus, Analysis, Linear Algebra, Abstract Algebra, Algorithms.
Programming: Preferred programming language is Python. I also like to program in C, Java and Scheme. Fluent in Unix tools. Other skills in HTML, CSS, PHP and SQL.
Languages: French (native) and English (fluent)
Cloud Mining automatically builds cool faceted search interfaces for your data. Simply get your data in a specific format, provide a custom look and feel for each search result and Cloud Mining does the rest.
fSphinx is a layer on top of Sphinx to perform faceted search. fSphinx powers the underlying backend of Cloud Mining and works with SimSearch (see project below) in order to combine full-text search with item-based search.
SimSearch is an item-based retrieval engine built on Bayesian Sets. Bayesian Sets is a new framework for information retrieval in which a query consists of a set of items that are examples of some concept. The result is a set of items that attempts to capture the concept exemplified by the query.
Biomed Search was the largest search engine for looking up images specifically in the biomedical domain. The system searches within the captions of images and the texts that refer to them. Over one million images were indexed. The project was created in 3 months of work. Lucene was used for retrieval and web.py for the web framework. All the back-end work was programmed in Python.
Here is a presentation of Biomed Search given at the University of Cambridge.
I also contributed to UC Berkeley's Biotext Search Engine, which searches within the captions and abstracts of biomedical images. More information is available in this publication.
The admin interface we used at Cambridge for the summer school. It let us browse through the vast number of candidates, comment on each applicant and assign scores. Try a demo, see how to use it, or get the source code on GitHub.
Chiefmall started as a project to index all contractors in the USA. The idea was to form the basis of a social network for service professionals. All active licensed contractors in California and in 8 other states were listed. Contractors would be able to claim their listing thereby building up their own website. Some screenshots of the account, company description, service locations, portfolio, adding a caption, get jobs or getting your license verified.
Random automatic quizzes using Wikipedia. An interesting experiment released before the word mashup was first coined, and which received over 50,000 hits in the first three days after it launched. WikiTrivia is now open source.
Relevant machine learning courses I took at Cornell:
COMS 478: Machine Learning with Golan Yona (A-)
COMS 482: Theory of Algorithms with Jon Kleinberg (A)
COMS 578: Empirical Methods in Machine Learning with Rich Caruana (A+)
COMS 678: Advanced Topics in Machine Learning with Thorsten Joachims (A)
COMS 790: Independent Research with Rich Caruana (A+)
COMS 778: Topics in Machine Learning with Rich Caruana (A-)
COMS 750: Evolutionary Computation and Design Automation with Hod Lipson (A-)
Linear Algebra with Anil Nerode (A+), Multivariable Calculus (A), Groups and Geometry with Birgit Speh (A+), Mathematical Physics with Kusse (A-)
Chloé-Agathe Azencott, Alex Ksikes, S. Joshua Swamidass, Jonathan Chen, Liva Ralaivola, and Pierre Baldi. One- to Four- Dimensional Kernels for Small Molecules and Predictive Regression of Physical, Chemical, and Biological Properties, Journal of Chemical Information and Modeling (JCIM), 2006. [scholar] [pdf]
Rich Caruana, Alex Niculescu, Geoff Crew and Alex Ksikes, Ensemble Selection from Libraries of Models, Proceedings of the 21st International Conference on Machine Learning (ICML), 2004. [scholar] [ps] [pdf]
Implementation of some Machine Learning algorithms with source code ...
A Java implementation of a neural net.
Solving the 8 queens problem with a genetic algorithm (from CS478).
Implementation of the k-means algorithm using MDL for model selection.
An experiment with evolving artificial neural networks.
An implementation of the Gibbs sampling algorithm to detect motifs in sequences (from CS478).
Source code of Shotgun, aka Ensemble Selection.
An implementation of greedy agglomerative clustering with basic graphing.
Quick and dirty k-nearest neighbor.
Even more projects with source code on my GitHub page.
|Transcripts and references available upon request|