Azure Worldwide Data Centers as of July 30, 2017

On August 17th, I presented Azure ML: From Novice to Predicting the Future in 1 Hour to the IndySA User Group. I usually try to post before giving a talk, but time didn't allow for that, so I'm instead posting this information after the fact. As I promised in my talk, I'm providing a bunch of links and regurgitating some of the information from my talk here.

I'm also giving this blog post a second purpose: it contains answers to some of the dumb questions I wish I had known the answers to before I got started. If you want me to add anything else, just let me know in the comments and I'll gladly add more to the FAQ below.

Azure ML: From Novice to Predicting the Future in 1 Hour

Summary:

Artificial Intelligence, Machine Learning, Deep Learning, Data Science. These are all buzzwords that most developers are aware of and even understand at some level. However, they are also areas in which most developers aren't very knowledgeable and with which they have little hands-on experience.

Join us this month as we cover an incredibly brief primer on these before jumping into the deep end with Azure ML, the Platform-as-a-Service Machine Learning offering in Microsoft Azure. This presentation assumes some basic web and database knowledge, but does not assume any prior Azure or Machine Learning experience.

This talk should give you everything you need to know to go off on your own and create your own Machine Learning models in Azure quickly and with impressive levels of accuracy. We will not, however, get deep into Data Science at all. In fact, you will probably leave this talk with more questions than you had coming into the talk! Ah, such is technology as you learn more and more...

Key Info to Getting Started Quickly with Easy Answers

Here's some info to help get you started. Whether you want to call this an FAQ or Answers to Dumb Questions, hopefully this will save you some of the many hours I spent learning these things the hard way!

Q: How much does Azure ML Cost?

Well, that depends. You can spend over $10k per month, but the good news is that you can get started for free, and the cheapest paid offering costs ~$110 per month. While you can find full pricing details here, there are two main things that you pay for:

  1. Azure ML Studio - this is the web app that you create your Azure ML experiments in and where you can deploy them from. This is free for the free tier or $10/mo for the paid tier.
  2. Azure ML Hosting - this is where your trained models live, and you need it if you want to actually consume your trained models. It has a free tier with no SLA, and the paid tiers start at $100 per month.

Q: What are Workspaces, Experiments, and Projects?

Don't worry too much about the nuances of these things initially. To get started, create a Machine Learning Workspace, which will have you create the prerequisite Storage Account and Web Service Plan. The Workspace itself is free, but these other two cost money: the Storage Account is dirt cheap (less than a nickel per gig), while the Web Service Plan is the thing that can cost $100/mo. You'll see this when you create it. Once these resources are created, you mostly won't care about them beyond organizational purposes; instead, you'll care about your Experiments, which will live inside this one Workspace and be hosted on this one Web Service Plan.

An Experiment in Azure ML is essentially the workflow that you build and create. Each ML workflow is a single Experiment. You can easily create copies of Experiments if you want to try some crazy changes while still keeping a prior copy.

Q: What are all of those widgets you used?

The main widgets I was using were:

  • Import Data
  • Split Data
  • Boosted Decision Tree Regression
  • Train Model or Tune Model Hyperparameters (the latter trains the model with multiple configurations to find the best configuration)
  • Score Model
  • Evaluate Model
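If you're more comfortable in code than in a drag-and-drop designer, the widget chain above has a rough local analogue. The sketch below uses scikit-learn as a stand-in (this is not Azure ML code, and the synthetic dataset is just an assumption for illustration) to show the same conceptual steps: import data, split it, pick a boosted-tree regressor, train, score, and evaluate.

```python
# A rough local analogue of the Azure ML widget chain, sketched with
# scikit-learn -- same conceptual steps, not the Azure ML API.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

# "Import Data" -- a synthetic regression dataset stands in here.
X, y = make_regression(n_samples=500, n_features=5, noise=0.1, random_state=42)

# "Split Data" -- hold out 30% of the rows for scoring.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# "Boosted Decision Tree Regression" + "Train Model"
model = GradientBoostingRegressor(random_state=42).fit(X_train, y_train)

# "Score Model" + "Evaluate Model"
predictions = model.predict(X_test)
print(f"R^2 on held-out data: {r2_score(y_test, predictions):.3f}")
```

The Tune Model Hyperparameters widget plays a role similar to scikit-learn's grid/random search utilities: it trains the same algorithm under multiple configurations and keeps the best one.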

Q: What algorithms should I use?

Always start out with either Boosted Decision Tree Regression (if you're trying to predict values) or Boosted Decision Tree Classification (if you're trying to identify classifications based on parameters). Why? Because they're very good, but they're also very fast. Models using these algorithms can train over a few million records in a matter of minutes, while some of the slower algorithms can take hours to train on much less data.

Q: What should my model look like?

A minimal model might look like this (click for high-res):
http://res.cloudinary.com/jax/image/upload/v1503337028/MinimalAzureMLStudioModel_zey3xo.png
Azure ML Experiment

A model that would compare two different algorithms, utilizing the Tune Model Hyperparameters widget, might look like this (click for high-res):
Azure ML Experiment

Q: The free tier is limited to 100 modules per experiment, but the diagrams you've shared are so much smaller. 100 sounds crazy large. What gives?

Well, I did a lot of things with data to prepare it before I showed the demo in order to minimize the workflow but also to shorten the length of my talk, since it was already longer than I wanted. Additionally, I baked a lot of data customization into my SQL query in my Import Data module.

However, sometimes these aren't possible or aren't enough. You might need to bring in other widgets for more data customization. Consider some of the following scenarios where you might want to bring in several modules to tweak data:

  • Some records have missing values. You might then decide to take an average over the entire set of data, or a targeted subset, and use that average as a value for this particular record. This could use a few modules to accomplish.
  • Some records may have inconsistent data (Indiana, IN, IND, etc). You might then use a few modules to normalize the data by using specific lookups.
  • All of your data might not come from the same data source. You might want to pull flight data in from one database, some weather data from another database, some special events calendars from an Excel file, etc., and ultimately clean and join all of this data together before it ever reaches your model. As this grows, your Experiments can become quite large pretty quickly.
  • Data post-processing using either logic or other things to further enrich your data.
  • You want to setup additional web service inputs/outputs for things like automated retraining.
  • Custom modules for anything from custom R scripts tweaking data to custom algorithms, because you're just that amazing (at which point, I'm honored that you're even reading this!).
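The first two scenarios above can each be done with a few modules in the designer, or baked into your own preprocessing. As a pure-Python sketch (the state lookup table and helper names are just hypothetical examples, not Azure ML modules):

```python
# Hypothetical data-prep helpers mirroring two scenarios above:
# (1) filling missing values with the column average, and
# (2) normalizing inconsistent values via a lookup table.

STATE_LOOKUP = {"indiana": "IN", "in": "IN", "ind": "IN"}

def fill_missing_with_mean(values):
    """Replace None entries with the mean of the values that are present."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]

def normalize_state(raw):
    """Map inconsistent spellings (Indiana, IN, IND) to one canonical code."""
    return STATE_LOOKUP.get(raw.strip().lower(), raw)

print(fill_missing_with_mean([10.0, None, 20.0]))  # [10.0, 15.0, 20.0]
print(normalize_state("IND"))                      # IN
```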

Q: I created experiments but I can't find them now!

You probably have multiple Workspaces and are in the wrong one. From the Studio portal, there's a Workspace selector near the top-right of the screen where you can select other Workspaces by Region. Unfortunately, you have to pick the correct region first. I wish they'd just list them all here (perhaps only if you have fewer than 10 Workspaces, but that would cover the vast majority of people).

Q: Okay, I trained a model. Now what?

  1. Get your experiment to the point where it has only one Algorithm (don't use the "dual architecture" comparing two algorithms).
  2. Use the Visualize feature off of your Score Model widget to make sure the training seems good at a glance. Optionally use the Visualize feature of the Evaluate Model module to view your metrics.
  3. Under the Set Up Web Service menu item near the bottom of the screen, select Predictive Web Service [Recommended]. This will transform your experiment and create a second tab in your workflow called "Predictive experiment".
  4. Run this newly-transformed experiment.
  5. Under the Deploy Web Service menu item near the bottom of the screen, select Deploy Web Service [New] Preview, and ensure your browser doesn't block a popup window. NOTE: This step and following steps might differ slightly for the Free tier. I'm documenting the paid tier. You should be able to figure it out, though.
  6. Give the new web service a name and select your Web Service Plan you created with your Workspace. Click Deploy.
  7. On the resulting page, click on the Test Web Service link to see the simple Web UI that allows you to throw data at the trained model, or click on the Use Web Service link to get your web service's endpoint information, sample code, and API Key.
  8. Profit.
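Once the service is deployed, step 7's "Use Web Service" page gives you an endpoint URL, an API key, and sample code. A minimal sketch of calling the request/response API from Python might look like the following; the URL, key, and column names are placeholders you'd replace with the real values from that page, and the exact request shape may differ slightly between service versions:

```python
# Minimal sketch of calling a deployed Azure ML request/response web service.
# API_URL, API_KEY, and the column names below are placeholders -- copy the
# real values from the "Use Web Service" page of your deployed service.
import json
import urllib.request

API_URL = "https://<region>.services.azureml.net/workspaces/<ws>/services/<id>/execute?api-version=2.0"
API_KEY = "<your-api-key>"

def build_payload(column_names, rows):
    """Shape input rows into a request body for the web service."""
    return {
        "Inputs": {
            "input1": {
                "ColumnNames": column_names,
                "Values": rows,
            }
        },
        "GlobalParameters": {},
    }

def score(column_names, rows):
    """POST the rows to the service and return the parsed JSON response."""
    body = json.dumps(build_payload(column_names, rows)).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_KEY,
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())

# Example payload for a hypothetical two-feature model:
print(json.dumps(build_payload(["feature1", "feature2"], [["1", "2"]]), indent=2))
```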

Resources