data science

"Readings in Database Systems": wisdom from Michael Stonebraker

and two other guys--updated and free online.

As I tweeted last July, I always learn so much about both the past and future of database computing from recent Turing Award winner Michael Stonebraker. I recently learned that the latest edition of Readings in Database Systems, also known as the “Red Book,” is available for free online under a Creative Commons license—or at least the introductions to the readings are. With most of these being by Stonebraker, and quite up-to-date, I consider these 43 pages required reading for anyone…

Data wrangling, feature engineering, and dada

And surrealism, and impressionism...

In my data science glossary, the entry for data wrangling gives this example: “If you have 900,000 birthYear values of the format yyyy-mm-dd and 100,000 of the format mm/dd/yyyy and you write a Perl script to convert the latter to look like the former so that you can use them all together, you’re doing data wrangling.” Data wrangling isn’t always cleanup of messy data; it can also be more creative, downright fun work that qualifies as what machine learning people call…
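The date conversion in that glossary example can be sketched in a few lines. This is only an illustration, written in Python rather than the Perl the glossary mentions; the function name `normalize_date` and the sample values are made up for the example and do not come from the original post.

```python
from datetime import datetime


def normalize_date(value: str) -> str:
    """Return a date string as yyyy-mm-dd.

    Assumes the input is in one of the two formats from the glossary
    example: yyyy-mm-dd (returned in that form) or mm/dd/yyyy (converted).
    """
    value = value.strip()
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"unrecognized date format: {value!r}")


# Hypothetical sample values, one in each format.
mixed = ["1975-03-09", "07/04/1982"]
cleaned = [normalize_date(v) for v in mixed]
print(cleaned)  # ['1975-03-09', '1982-07-04']
```

Parsing with `strptime` instead of rearranging the pieces with a regular expression means that an impossible date such as 13/45/1990 raises an error instead of being silently rewritten.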

My data science glossary

Complete with a dot org domain name.

Lately I’ve been studying up on the math and technology associated with data science because there are so many interesting things going on. Despite taking many notes, I found myself learning certain important terms, seeing them again later, and then thinking “What was that again? P-values? Huh?”