10. December 2015

Hamiltonian Monte Carlo

There have been a number of papers at NIPS that use Hamiltonian Monte Carlo, and I thought I’d share a JavaScript implementation of the algorithm that I wrote a couple of years ago. It can be a fairly opaque algorithm when described mathematically, and I found it really useful to see it working. It also turns out to be kind of mesmerizing.
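For a rough feel of the mechanics, here is a minimal Python sketch of one common formulation of HMC: draw a random momentum, simulate the dynamics with a leapfrog integrator, then accept or reject the endpoint with a Metropolis test. This is not the JavaScript implementation mentioned above. The standard-normal target, the function names, and the step size and trajectory length are all assumptions chosen for illustration.

```python
import math
import random


def potential(q):
    # U(q) = -log density, up to a constant.
    # Assumed target for illustration: a standard normal in any dimension.
    return 0.5 * sum(x * x for x in q)


def grad_potential(q):
    # Gradient of U(q) for the assumed standard-normal target.
    return list(q)


def kinetic(p):
    # Kinetic energy with an identity mass matrix.
    return 0.5 * sum(x * x for x in p)


def leapfrog(q, p, step_size, n_steps):
    # Half step for momentum, alternating full steps, final half step.
    q, p = list(q), list(p)
    grad = grad_potential(q)
    p = [pi - 0.5 * step_size * gi for pi, gi in zip(p, grad)]
    for i in range(n_steps):
        q = [qi + step_size * pi for qi, pi in zip(q, p)]
        grad = grad_potential(q)
        if i < n_steps - 1:
            p = [pi - step_size * gi for pi, gi in zip(p, grad)]
    p = [pi - 0.5 * step_size * gi for pi, gi in zip(p, grad)]
    return q, p


def hmc_step(q, step_size, n_steps, rng):
    # Resample momentum, simulate, then accept or reject the proposal.
    p = [rng.gauss(0.0, 1.0) for _ in q]
    h_old = potential(q) + kinetic(p)
    q_new, p_new = leapfrog(q, p, step_size, n_steps)
    h_new = potential(q_new) + kinetic(p_new)
    # Accept with probability min(1, exp(h_old - h_new)).
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return q_new
    return q


def sample(n_samples, dim=2, step_size=0.2, n_steps=20, seed=0):
    # Hypothetical driver: runs the chain from the origin.
    rng = random.Random(seed)
    q = [0.0] * dim
    samples = []
    for _ in range(n_samples):
        q = hmc_step(q, step_size, n_steps, rng)
        samples.append(q)
    return samples
```

Calling `sample(2000)` returns a list of 2000 two-dimensional points. The leapfrog integrator nearly conserves the total energy, so most proposals are accepted even though each one travels far from its starting point.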

24. June 2015

Get out on the ship!

When considering data analysis questions, I often think of this passage from “The Wizard War” by R.V. Jones, head of British scientific intelligence during World War II.

22. June 2015

A graduation speech

Since Cornell is such a big place, departments have individual graduation ceremonies where we can give students more individual recognition. I was recently invited by the Information Science students to give the faculty address for our department. Here’s a lightly edited transcript.

23. February 2015

Mallet: past, present, and future

There was a conversation on Twitter about the current state of Mallet. My goal for Mallet is that it should do a few things very well. Future development will focus on making the process of using machine learning easier and more informative. Also, be sure to use the current GitHub version.

17. February 2015

Using phrases in Mallet topic models

Bag-of-words models are surprisingly powerful, but there are often cases where several words are really a single semantic unit. How we handle these terms can have a major impact on how well we can model a text corpus. Several years ago, while working on a project involving NIH grants and associated papers, I implemented some tools for combining multiple tokens into single tokens as a preprocessing step. In this post I’ll demonstrate how I identify and use multi-word terms in Mallet.
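To make the general idea of merging tokens concrete, here is a small Python sketch of phrase merging as a preprocessing step. It is not Mallet code and not the tooling described in the post. The frequency-threshold rule, the underscore join, and the names `find_phrases` and `merge_phrases` are assumptions made for illustration.

```python
from collections import Counter


def find_phrases(docs, min_count=2):
    # Count adjacent token pairs across all documents and keep the
    # pairs seen at least min_count times (an assumed, simple criterion).
    bigrams = Counter()
    for tokens in docs:
        bigrams.update(zip(tokens, tokens[1:]))
    return {pair for pair, count in bigrams.items() if count >= min_count}


def merge_phrases(tokens, phrases):
    # Scan left to right, replacing each known pair with one joined token.
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrases:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


docs = [
    ["topic", "model", "inference"],
    ["a", "topic", "model"],
    ["model", "topic"],
]
phrases = find_phrases(docs)
merged_docs = [merge_phrases(tokens, phrases) for tokens in docs]
```

After merging, a bag-of-words model sees `topic_model` as a single vocabulary item rather than two unrelated words. A raw count threshold is crude, and a real pipeline would likely use a statistical association score to decide which pairs to keep.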
