Is thinking like a search engine possible? (Yandex search personalisation –...
About a year ago, I participated in the Yandex search personalisation Kaggle competition. I started off as a solo competitor, and then added a few Kaggle newbies to the team as part of a program I was...
Learning to rank for personalised search (Yandex Search Personalisation –...
This is the second and last post summarising my team’s solution for the Yandex search personalisation Kaggle competition. See the first post for a summary of the dataset, evaluation approach, and some...
First steps in data science: author-aware sentiment analysis
People often ask me what’s the best way of becoming a data scientist. The way I got there was by first becoming a software engineer and then doing a PhD in what was essentially data science (before it...
Hopping on the deep learning bandwagon
I’ve been meaning to get into deep learning for the last few years. Now, the stars have finally aligned and I have the time and motivation to work on a small project that will hopefully improve my...
Learning about deep learning through album cover classification
In the past month, I’ve spent some time on my album cover classification project. The goal of this project is for me to learn about deep learning by working on an actual problem. This post covers my...
The wonderful world of recommender systems
I recently gave a talk about recommender systems at the Data Science Sydney meetup (the slides are available here). This post roughly follows the outline of the talk, expanding on some of the key...
Miscommunicating science: Simplistic models, nutritionism, and the art of...
I recently finished reading the book In Defense of Food: An Eater’s Manifesto by Michael Pollan. The book criticises nutritionism – the idea that one should eat according to the sum of measured...
The hardest parts of data science
Contrary to common belief, the hardest part of data science isn’t building an accurate model or obtaining good, clean data. It is much harder to define feasible problems and come up with reasonable...
The joys of offline data collection
Many modern data scientists don’t get to experience data collection in the offline world. Recently, I spent a month sailing down the northern Great Barrier Reef, collecting data for the Reef Life...
Why you should stop worrying about deep learning and deepen your...
Everywhere you go these days, you hear about deep learning’s impressive advancements. New deep learning libraries, tools, and products get announced on a regular basis, making the average data...
Diving deeper into causality: Pearl, Kleinberg, Hill, and untested assumptions
Background: I have previously written about the need for real insights that address the why behind events, not only the what and how. This was followed by a fairly popular post on causality, which was...
Customer lifetime value and the proliferation of misinformation on the internet
Suppose you work for a business that has paying customers. You want to know how much money your customers are likely to spend to inform decisions on customer acquisition and retention budgets. You’ve...