The installer creates an Azure ML workspace containing three experiments:

  1. The first experiment retrains and evaluates the predictive model.
  2. The second experiment scores opportunities using the trained predictive model.
  3. The third experiment produces prescriptive insights using feature ablation on top of the trained predictive model.


The input to the predictive model consists of data from the opportunity, as well as related information from the original lead, the associated customer account, and the products.

We handle the data differently depending on its type (e.g. numeric, categorical, or text). First, we clean the data to fill in reasonable defaults for any missing values. For example:

  • When quantities are missing, we replace them with zeros.
  • When categories are missing, we replace them with a special "missing" category.
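The two cleaning rules above can be sketched as follows. This is a minimal illustration, not the code used in the experiments: the field names (`quantity`, `estimated_value`, `industry`, `lead_source`) are hypothetical, and we assume a missing value shows up as `None` or an empty string.

```python
from typing import Any, Dict, Optional

# Hypothetical field names; the real opportunity schema is not listed here.
NUMERIC_FIELDS = ["quantity", "estimated_value"]
CATEGORICAL_FIELDS = ["industry", "lead_source"]
MISSING_CATEGORY = "missing"


def clean_record(record: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Return a copy of the record with defaults filled in for missing values."""
    cleaned = dict(record)
    # Missing quantities become zero.
    for field in NUMERIC_FIELDS:
        if cleaned.get(field) is None:
            cleaned[field] = 0.0
    # Missing categories become a dedicated "missing" category.
    for field in CATEGORICAL_FIELDS:
        value = cleaned.get(field)
        if value is None or value == "":
            cleaned[field] = MISSING_CATEGORY
    return cleaned


raw = {"quantity": None, "estimated_value": 1200.0, "industry": "", "lead_source": "web"}
cleaned = clean_record(raw)
print(cleaned)
```

Keeping "missing" as its own category, rather than guessing a value, lets the model learn whether the absence of a field is itself informative.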

We then featurize the data in the following manner:

  • We encode categorical variables with one-hot encoding.
  • We encode text variables as a bag of words, and then hash the words into a fixed number of feature buckets.

The "hashing trick" caps the feature dimension regardless of vocabulary size, which reduces overfitting and makes the problem more tractable for the learning algorithm.
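As a rough sketch of these two encodings, assuming whitespace tokenization, an MD5-based hash, and 16 buckets (all illustrative choices, not the settings used in the experiments):

```python
import hashlib
from typing import List


def one_hot(value: str, vocabulary: List[str]) -> List[float]:
    """One slot per known category; exactly one slot is 1.0 for a known value."""
    return [1.0 if value == category else 0.0 for category in vocabulary]


def hashed_bag_of_words(text: str, n_buckets: int = 16) -> List[float]:
    """Count words, but index them by a hash so the vector length stays fixed."""
    vector = [0.0] * n_buckets
    for token in text.lower().split():
        # A stable hash (unlike Python's built-in hash) gives the same bucket on every run.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % n_buckets
        vector[bucket] += 1.0
    return vector


# Hypothetical category vocabulary and text field.
industry_vocab = ["retail", "finance", "missing"]
encoded_industry = one_hot("finance", industry_vocab)
encoded_text = hashed_bag_of_words("renewal of annual support contract")
features = encoded_industry + encoded_text
print(len(features))
```

Different words can collide in the same bucket. That is the trade-off the hashing trick makes in exchange for a bounded feature dimension.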

Predictive Model Algorithm

We train a two-class boosted decision tree on the input features to predict whether an opportunity will be won or lost.

Prescriptive Model Algorithm

An algorithm called "feature ablation" runs on top of the predictive model to give further insight into the factors that influenced the score.
Feature ablation means selectively removing sets of features and measuring the impact on the score.

  1. First, we score the opportunity using the entire set of features, creating a base score.
  2. Next, the algorithm iterates through each pre-defined group of features.
  3. For each feature group, the algorithm creates a copy of the original opportunity in which every feature in the group is replaced with its default value.
  4. We then score this new opportunity using the same original model, producing a new score.
  5. We then compare the new score to the base score.
    • If the new score is lower than the base score, the removed features were contributing positively to the score, by the magnitude of the difference between the two scores.
    • Conversely, if the new score is higher than the base score, the removed features were negative contributors.
  6. Once all feature groups have been tested independently, we rank the positive and negative influences based on the magnitude of the score differences.
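The steps above can be sketched as follows. This is an illustration under stated assumptions rather than the experiment's actual code: the feature names, groups, and defaults are hypothetical, and `toy_score` is a small logistic function standing in for the trained boosted decision tree.

```python
import math
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
Ranked = List[Tuple[str, float]]


def ablate(
    score_fn: Callable[[Features], float],
    features: Features,
    feature_groups: Dict[str, List[str]],
    defaults: Features,
) -> Tuple[float, Ranked, Ranked]:
    """Return the base score plus positive and negative groups ranked by magnitude."""
    base_score = score_fn(features)                      # step 1
    contributions: Dict[str, float] = {}
    for group_name, names in feature_groups.items():     # step 2
        ablated = dict(features)                         # step 3: copy, then reset the group
        for name in names:
            ablated[name] = defaults[name]
        new_score = score_fn(ablated)                    # step 4
        contributions[group_name] = base_score - new_score  # step 5
    # Step 6: rank each side by the size of the score difference.
    positive = sorted(((g, d) for g, d in contributions.items() if d > 0), key=lambda x: -x[1])
    negative = sorted(((g, d) for g, d in contributions.items() if d < 0), key=lambda x: x[1])
    return base_score, positive, negative


# Stand-in model (an assumption): a logistic score over hypothetical features.
WEIGHTS = {"account_revenue": 0.8, "account_age": 0.2, "discount": -1.0, "num_products": 0.5}


def toy_score(features: Features) -> float:
    z = sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


opportunity = {"account_revenue": 1.0, "account_age": 2.0, "discount": 1.5, "num_products": 1.0}
groups = {
    "account": ["account_revenue", "account_age"],
    "pricing": ["discount"],
    "products": ["num_products"],
}
defaults = {name: 0.0 for name in opportunity}

base, positive, negative = ablate(toy_score, opportunity, groups, defaults)
print(base, positive, negative)
```

A positive difference (base score minus new score) means the group was raising the score, and a negative difference means it was lowering it. Because each group is ablated independently, interactions between groups are not captured by this ranking.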


Dynamics CRM captures a wide range of information about each opportunity and all related entities (accounts, salespeople, partners, etc.). The challenge was to distill this data enough to predict within a limited compute budget, while preserving enough information to maintain accuracy.

DNN Model for Vast Data

To build the initial version of this model, researchers and developers from Microsoft Research (MSR) built a deep neural network (DNN) model using state-of-the-art modeling tools and techniques. Deep neural networks are well suited to consuming large amounts of diverse data and automatically distilling features that capture relationships between related entities. We then trained this network on Microsoft's own sales data and deployed it internally as a proof of concept, where we demonstrated the utility of opportunity scoring with our own sales organization.

Simplification and Operationalization in AML

Following the internal pilot, MSR partnered with Dynamics CRM and Azure ML to scale the model for use by external customers. In doing so, data scientists from Azure ML filtered the diverse input data to find the most salient feature set. Their techniques included ablation metrics as described above, as well as metrics computed over individual features and subsets of features. Once the feature set was reduced, the team was able to use a boosted tree model and tune it to achieve performance similar to the original DNN. The boosted tree was chosen for the final model because it retrains faster.

Last edited Jun 3, 2016 at 6:59 PM by prashdesh, version 15