Sunday, February 22, 2015

Machine Learning In Your Browser (Sort Of)

This month, Microsoft made Azure Machine Learning, its entry into the cloud machine learning space, generally available. For those who have done data mining in Analysis Services, both the workflow and the mining models should be familiar.

In A Breath

If you haven’t already been briefed, machine learning is often used as a way to do predictive analysis based on historical data.

The workflow typically looks like:

  1. Get a data set where many traits may (or may not) relate to an outcome.
  2. Apply a genre of mining algorithm / model (e.g. Bayesian inference, neural nets).
  3. Train your model with some of your data (say 80%).
  4. QA your model with the rest of your data (e.g. remaining 20%).
  5. If you’re happy with #4, you can now query your model and ask questions like “given these inputs, what is the predicted output”.
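The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not what Azure Machine Learning does internally: the data is a synthetic two-trait toy set, the "model" is a simple nearest-centroid classifier standing in for whatever algorithm you pick in step 2, and every function name here is made up for the example.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle rows and split them into training and held-out test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def train_centroids(train_rows):
    """Step 3: 'train' by averaging the traits seen for each outcome."""
    sums, counts = {}, {}
    for traits, outcome in train_rows:
        acc = sums.setdefault(outcome, [0.0] * len(traits))
        for i, value in enumerate(traits):
            acc[i] += value
        counts[outcome] = counts.get(outcome, 0) + 1
    return {o: [s / counts[o] for s in sums[o]] for o in sums}

def predict(model, traits):
    """Step 5: return the outcome whose centroid is closest to the traits."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(traits, centroid))
    return min(model, key=lambda o: sq_dist(model[o]))

def accuracy(model, test_rows):
    """Step 4: fraction of held-out rows the model labels correctly."""
    hits = sum(1 for traits, outcome in test_rows
               if predict(model, traits) == outcome)
    return hits / len(test_rows)

def make_toy_rows(n=200, seed=1):
    """Step 1 stand-in: synthetic rows, NOT real patient data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        outcome = rng.choice(["benign", "malignant"])
        center = 2.0 if outcome == "benign" else 8.0
        traits = [rng.gauss(center, 1.0), rng.gauss(center, 1.0)]
        rows.append((traits, outcome))
    return rows

rows = make_toy_rows()
train_rows, test_rows = train_test_split(rows)   # 80% / 20%
model = train_centroids(train_rows)
print(f"held-out accuracy: {accuracy(model, test_rows):.2f}")
print(predict(model, [7.5, 8.2]))
```

The point of holding back the 20% is that the accuracy number comes from rows the model never saw during training, which is what makes it a fair QA check.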

Here’s a concrete example: in the given tutorial, the author is able to train a neural network model to detect malignant vs. benign tumors with an accuracy of 96%. It does so by looking at the traits of each sample in a data set (Clump Thickness, Uniformity of Size/Shape, Bland Chromatin, Bare Nuclei, etc.) from one of the datasets hosted by UC Irvine specifically for machine learning.

That’s extremely accurate as far as predictive analysis goes. It’s worth mentioning that because of the data that was used to train the model, you could more accurately describe the model as “able to predict malignant vs. benign tumors for patients in Wisconsin during the late 1980s to early 1990s with an accuracy of 96%”.

That blurb about causation vs. correlation aside, the model is still extremely accurate, especially if you continue to retrain it with current data.

Ubiquitous Machine Specialization

Speakers like Marco Annunziata are quick to point out that today’s machines that benefit from good analytics “aren’t just intelligent, they are brilliant.”

The mining model that we talked about earlier benefits from the experience of an oncology lab technician who’s seen tens of thousands of samples. It’s also incredibly accessible (it can be exposed over HTTP), can scale near linearly (unlike our technician), and can be retrained on new data in a matter of hours.

Even if you don’t trust the computer to act as a specialist, it makes a great validation component, offering prompts to users and letting them know when they’re stepping outside of the norm (e.g. “this is normally malignant, are you sure it’s benign?”).
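As a rough sketch of that validation idea, the check can be as small as comparing what the user entered against what the model predicted. The function name, the confidence score, and the 0.9 threshold below are all assumptions made up for illustration, not part of any particular service.

```python
def validation_prompt(predicted, entered, confidence, threshold=0.9):
    """Return a warning when the user's label disagrees with a
    confident model prediction; return None otherwise."""
    if entered != predicted and confidence >= threshold:
        return f"This is normally {predicted}, are you sure it's {entered}?"
    return None

# The user is still the one making the call; the model only nudges.
message = validation_prompt("malignant", "benign", confidence=0.96)
if message:
    print(message)
```

Keeping the threshold high means the prompt only fires when the model is fairly sure, so users aren’t nagged on borderline cases.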

Benefits for the Business

It usually takes a while for these kinds of innovations to permeate the office, and it’s worth mentioning that other forms of accessible cloud machine learning, like PredictionIO, have been around since 2013.

But these technologies have never been more accessible. They’re now even more approachable with:

  1. No install footprint locally (no database engines, BI IDEs, etc.). You can also read that as “no IT involvement required”.
  2. No capital investment in infrastructure or licensing (why buy a BI stack when you can rent one?). Licenses for Analysis Services used to be quite spendy; if you have less than 10GB of data, it’s now free.
  3. Little to no knowledge in either statistics or programming required (although a little knowledge of both will help you go further).

These advances definitely benefit startups and small project efforts too. Capabilities that used to have huge price tags now have their costs tied only to their usage, allowing both software and product developers to experiment and prototype with machine learning to see where it can benefit their users.

If you haven’t already, walk through a machine learning tutorial to at least get a feel for what types of capabilities are present.

Best,
Tyler