Overview
In today's smartphone-driven world, mobile games attract large audiences, which has led a large number of game studios to develop mobile games.

Because players have such a wide variety of games to choose from, churn rates are higher and studios lose revenue.
Problem Statement
Predict whether a particular user will abandon the game or not.
Our Approach
To solve the problem, we use IntelliHub. Let's see how IntelliHub works:

- First, the user uploads the train and test data.
- The data is stored in the cloud.
- Data from the cloud then goes to PhoenixML (a Machine Learning module in IntelliHub). In PhoenixML, you have four ML libraries (Weka, H2O, TensorFlow and Scikit-learn) to choose from.
- Using PhoenixML, you train a model.
- You can evaluate the model using RMSE, accuracy and other metrics.
- You can then use the trained model for prediction.
Data Definition
The data set contains information about 2000 players. The variables in the data set are:
- User_id: Player unique identification number
- Achievements: Number of times a player was rewarded for successfully completing a level while collecting all possible coins
- Challenges: Number of challenges completed
- No_of_sessions: Number of times a particular player opened the game
- Session_length: Total time a player played the game (in minutes)
- Coin_earned: Coins collected while playing a game
- Coin_spent: Coins spent either to buy something or get extra life
- Player_activity: Whether the player has churned or is active (Label)
Studio gives the flexibility of choosing up to 20 columns as features.
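Before uploading a file, it can help to confirm that it has the columns listed above. The following is a minimal sketch using only the Python standard library. The helper name `missing_columns` and the two sample rows are hypothetical, and the label column is assumed to be named `Player_activity`.

```python
import csv
import io

# Column names taken from the data definition above
# (the label name Player_activity is an assumption).
EXPECTED_COLUMNS = [
    "User_id", "Achievements", "Challenges", "No_of_sessions",
    "Session_length", "Coin_earned", "Coin_spent", "Player_activity",
]

def missing_columns(csv_text, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    present = {name.strip() for name in header}
    return [name for name in expected if name not in present]

# Hypothetical two-row sample, for illustration only.
sample = (
    "User_id,Achievements,Challenges,No_of_sessions,Session_length,"
    "Coin_earned,Coin_spent,Player_activity\n"
    "1,6,44,50,98,43,32,Active\n"
    "2,0,1,3,5,2,0,Churned\n"
)
print(missing_columns(sample))  # [] when every expected column is present
```

An empty list means the header is complete; any names returned are the columns to add or rename before uploading.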
Model Building
- We want to create a model that can predict whether a particular player will churn in the near future.
- This can help us run a player retention campaign for that particular player.
- We will build classification models using two algorithms, NaiveBayesMultinomial and RandomForest, on IntelliHub.
We will evaluate and compare the models and choose the best one for our test dataset.
While modeling with PhoenixML, we will use the following terminology:
- Training Data: The data you use for training your model. It contains all the information you have collected about the problem statement.
- Test Data: The data you use for testing the model. You can make predictions on this data.
- Features: The columns in your dataset that you use for training your model.
- Class/Label: The column that holds the class of each record, and the one you want to predict.
- Accuracy: The percentage of records correctly predicted when you apply the model to your data.
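To make the accuracy definition concrete, here is a small sketch that compares predicted labels with actual labels. The function name `accuracy` and the five sample labels are hypothetical and are not part of the IntelliHub SDK.

```python
def accuracy(actual, predicted):
    """Percentage of records whose predicted label equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one record is required")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return 100.0 * correct / len(actual)

# Hypothetical labels for five players.
actual = ["Active", "Churned", "Active", "Active", "Churned"]
predicted = ["Active", "Churned", "Churned", "Active", "Churned"]
print(accuracy(actual, predicted))  # 4 of 5 correct -> 80.0
```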
First we need to create an App from the console and enable the API for PhoenixML. In Studio, we can build a model in two ways:
- Using SDK (For Developer)
- Using Interface (For Non-Developer)
Using SDK
If you want to develop a model using the SDK, just copy the API key from the IntelliHub Console.

Connect To Studio
Description
You can access the services by enabling the API for PhoenixML. IntelliHub provides StudioClient, to which you pass your API key as an argument.
Code
import studio
c = studio.StudioClient("YOUR API KEY")
import com.spotflock.StudioClient;
StudioClient c = new StudioClient("YOUR API KEY");
Upload Train And Test Files
Description
As Studio is a cloud platform, it stores the train and test files remotely. The file upload API returns the file's storage location in Cloud Storage in its response.
Upload Train File
train_file_store_response = c.store("path/to/train/file")
train_data = train_file_store_response["fileUrl"]
JSONObject trainFileStoreResponse = c.store("path/to/train/file");
String trainData = trainFileStoreResponse.getString("fileUrl");
System.out.println(trainData);
train_data file url
'/spotflock-studio-prod/xxxxx@xxxxxxxx.com/1551936734455-player_train.csv'
Upload Test File
test_file_store_response = c.store("path/to/test/file")
test_data = test_file_store_response["fileUrl"]
JSONObject testFileStoreResponse = c.store("path/to/test/file");
String testData = testFileStoreResponse.getString("fileUrl");
System.out.println(testData);
test_data file url
'/spotflock-studio-prod/xxxxx@xxxxxxx.com/1551936725437-player_test.csv'
Classification Model
Description
This API enables you to train a classification model. Training takes some time, so the job status has to be checked. Once the job is complete, the job output API gives you the model info.
Arguments
lib | Library used for training the model. Currently the spotflock and weka libraries are supported. |
service | Valid parameter values are classification and regression. |
model_name | Name under which the model will be saved. |
algorithm | Algorithm used to train the model. |
dataset_url | Location of the train dataset file in Spotflock storage. |
label | Name of the label column in the train dataset file. |
train_percentage | Percentage of the data used for training; the model is tested against the remaining data. |
features | List of column names used to train the model. |
save_model | If True, the model will be saved. |
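To illustrate what `train_percentage` means, the sketch below splits a list of records into a training part and a held-out part. How the platform actually selects the rows is not described here, so the seeded random shuffle is an assumption, and `split_rows` is a hypothetical helper rather than an SDK function.

```python
import random

def split_rows(rows, train_percentage, seed=0):
    """Shuffle rows and split them into (train, validation) by train_percentage."""
    if not 0 < train_percentage < 100:
        raise ValueError("train_percentage must be between 0 and 100")
    shuffled = list(rows)
    # A seeded shuffle keeps the split reproducible (assumed strategy).
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percentage // 100
    return shuffled[:cut], shuffled[cut:]

rows = list(range(2000))  # stand-in for the 2000 player records
train, validation = split_rows(rows, 80)
print(len(train), len(validation))  # 1600 400
```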
Code
train_response = c.train("classification", "RandomForest", train_data, "player_activity",
["Achievements","Challenges","Session_length","No_of_sessions","Coin_earned","Coin_spent"],
"Player Churn Model - RandomForest","weka", 80, True)
train_response
{
'code': 200,
'data': {'jobId': 395,
'appId': 1555904646616,
'name': 'weka_classification_train',
'library': 'weka',
'service': 'Classification',
'task': 'TRAIN',
'state': 'RUN',
'startTime': '2019-04-25T11:26:33.864+0000',
'endTime': None,
'request': {'library': 'weka',
'config': {'datasetUrl': '/spotflock-studio/xxxxxxx@gmail.com/1556191223344-Player_Log.csv',
'algorithm': 'RandomForest',
'saveModel': True,
'label': 'player_activity',
'features': ['Achievements',
'Challenges',
'Session_length',
'No_of_sessions',
'Coin_earned',
'Coin_spent'],
'name': 'Player Churn Model - RandomForest',
'trainPercentage': 70,
'params': {}}}}}
Similarly, we will build another model using NaiveBayesMultinomial algorithm.
Code
train_response_NBM = c.train("classification", "NaiveBayesMultinomial", train_data, "Player_activity",
["Achievements","Challenges","Session_length","No_of_sessions","Coin_earned",
"Coin_spent"],"Player Churn Model - NaiveBayesMultinomial","weka", 80, True)
train_response_NBM
{
'code': 200,
'data': {'jobId': 396,
'appId': 1555904646616,
'name': 'weka_classification_train',
'library': 'weka',
'service': 'Classification',
'task': 'TRAIN',
'state': 'RUN',
'startTime': '2019-04-25T11:26:59.094+0000',
'endTime': None,
'request': {'library': 'weka',
'config': {'datasetUrl': '/spotflock-studio/xxxxxxx@gmail.com/1556191223344-Player_Log.csv',
'algorithm': 'NaiveBayesMultinomial',
'saveModel': True,
'label': 'player_activity',
'features': ['Achievements',
'Challenges',
'Session_length',
'No_of_sessions',
'Coin_earned',
'Coin_spent'],
'name': 'Player Churn Model - NaiveBayesMultinomial',
'trainPercentage': 80,
'params': {}}}}}
lib | Library used for training the model. Currently the spotflock and weka libraries are supported. |
service | Valid parameter values are classification and regression. |
modelName | Name under which the model will be saved. |
algorithm | Algorithm used to train the model. |
datasetUrl | Location of the train dataset file in Spotflock storage. |
label | Name of the label column in the train dataset file. |
trainPercentage | Percentage of the data used for training; the model is tested against the remaining data. |
features | List of column names used to train the model. |
saveModel | If true, the model will be saved. |
Code
JSONArray features = new JSONArray();
features.put("Achievements");
features.put("Challenges");
features.put("Session_length");
features.put("No_of_sessions");
features.put("Coin_earned");
features.put("Coin_spent");
JSONObject params = new JSONObject();
params.put("lib","weka");
params.put("saveModel",true);
params.put("trainPercentage",80);
params.put("modelName","Player Churn Model - RandomForest");
String response = c.train("classification","RandomForest", trainData,
"Player_activity", features, params);
JSONObject trainResponse = new JSONObject(response);
trainResponse
{
"code": 200,
"data":
{
"jobId": 444,
"appId": 1555944250593,
"name": "weka_classification_train",
"library": "weka",
"service": "Classification",
"task": "TRAIN",
"state": "RUN",
"startTime": "2019-04-29T06:52:47.976+0000",
"endTime": null,
"request": {
"library": "weka",
"config": {
"datasetUrl": "/spotflock-studio/xxxxxx@xxxxxx.com/1556520753513-Train_Loan.csv",
"algorithm": "Logistic",
"saveModel": true,
"label": "Player_activity",
"features": ["Achievements",
"Challenges",
"Session_length",
"No_of_sessions",
"Coin_earned",
"Coin_spent"],
"name": "Player Churn Model - RandomForest",
"trainPercentage": 80,
"params": {}
}
}
}
}
Similarly, we will build another model using NaiveBayesMultinomial algorithm.
Code
JSONArray features = new JSONArray();
features.put("Achievements");
features.put("Challenges");
features.put("Session_length");
features.put("No_of_sessions");
features.put("Coin_earned");
features.put("Coin_spent");
JSONObject params = new JSONObject();
params.put("lib","weka");
params.put("saveModel",true);
params.put("trainPercentage",80);
params.put("modelName","Player Churn Model - NaiveBayesMultinomial");
String response = c.train("classification","NaiveBayesMultinomial", trainData,
"Player_activity", features, params);
JSONObject trainResponseNBM = new JSONObject(response);
trainResponseNBM
{
"code": 200,
"data":
{
"jobId": 445,
"appId": 1555944250593,
"name": "weka_classification_train",
"library": "weka",
"service": "Classification",
"task": "TRAIN",
"state": "RUN",
"startTime": "2019-04-29T06:52:47.976+0000",
"endTime": null,
"request": {
"library": "weka",
"config": {
"datasetUrl": "/spotflock-studio/xxxxxx@xxxxxx.com/1556520753513-Train_Loan.csv",
"algorithm": "NaiveBayesMultinomial",
"saveModel": true,
"label": "Player_activity",
"features": ["Achievements",
"Challenges",
"Session_length",
"No_of_sessions",
"Coin_earned",
"Coin_spent"],
"name": "Player Churn Model - NaiveBayesMultinomial",
"trainPercentage": 80,
"params": {}
}
}
}
}
Get Train Job Status
Description
The train/predict jobs take some time to complete, so their status can be checked with this API.
Code
train_job_status_response = c.job_status(train_response["data"]["jobId"])
train_job_status_response
{
"jobId":438,
"appId":1555904646616,
"Name":"weka_classification_train",
"Library":"weka",
"service":"Classification",
"task":"TRAIN",
"state":"FINISH",
"startTime":"2019-04-25T11:26:33.864+0000",
"endTime":"2019-04-25T11:26:38.689+0000"
}
train_job_status_response_NBM = c.job_status(train_response_NBM["data"]["jobId"])
train_job_status_response_NBM
{
"jobId":439,
"appId":1555944250593,
"name":"weka_classification_train",
"library":"weka",
"service":"classification",
"task":"TRAIN",
"state":"FINISH",
"startTime":"2019-04-29T05:55:00.499+0000",
"endTime":"2019-04-29T05:55:02.234+0000"
}
JSONObject trainJobStatusResponse = c.jobStatus(trainResponse.getJSONObject("data")
.get("jobId"));
System.out.println(trainJobStatusResponse.toString());
trainJobStatusResponse
{
"jobId":438,
"appId":1555944250593,
"name":"weka_classification_train",
"library":"weka",
"service":"classification",
"task":"TRAIN",
"state":"FINISH",
"startTime":"2019-04-29T05:55:00.499+0000",
"endTime":"2019-04-29T05:55:02.234+0000"
}
JSONObject trainJobStatusResponseNBM = c.jobStatus(trainResponseNBM.getJSONObject("data")
.get("jobId"));
System.out.println(trainJobStatusResponseNBM.toString());
trainJobStatusResponseNBM
{
"jobId":438,
"appId":1555944250593,
"name":"weka_classification_train",
"library":"weka",
"service":"classification",
"task":"TRAIN",
"state":"FINISH",
"startTime":"2019-04-29T05:55:00.499+0000",
"endTime":"2019-04-29T05:55:02.234+0000"
}
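Because a job stays in the RUN state for a while, the status call is usually repeated until the state changes. The following is a minimal polling sketch. `wait_for_job` is a hypothetical helper, not part of the SDK. It assumes the status response is a dict with a top-level `state` key, as in the sample responses above. Only the RUN and FINISH states appear in this document, so any state other than RUN is treated as final.

```python
import time

def wait_for_job(get_status, job_id, poll_seconds=2.0, timeout_seconds=300.0):
    """Poll get_status(job_id) until the job leaves the 'RUN' state.

    get_status is any callable returning a dict with a 'state' key,
    for example the client's job_status method shown above (assumption).
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status(job_id)
        if status.get("state") != "RUN":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout_seconds}s")
        time.sleep(poll_seconds)

# Stand-in for the real client, used only to demonstrate the helper.
_states = iter(["RUN", "RUN", "FINISH"])

def fake_status(job_id):
    return {"jobId": job_id, "state": next(_states)}

result = wait_for_job(fake_status, 395, poll_seconds=0.01)
print(result["state"])  # FINISH
```

With the real client, the call would look something like `wait_for_job(c.job_status, train_response["data"]["jobId"])`, assuming `job_status` returns the dict shown above.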
Get Train Job Output
Description
Once the job state is FINISH, the job output can be retrieved with this API.
Code
train_job_output_response = c.job_output(train_response["data"]["jobId"])
train_job_output_response
{'id': 343,
'jobId': 395,
'output': {'eval': {'kappa': 0.9013287725299303,
'recall': {'Active': 0.9935344827586207, 'Churned': 0.875},
'correct': 580.0,
'accuracy': 96.66666666666667,
'rocCurve': {'values': [[1.0, 1.0],
[0.9926, 1.0],
[0.9853, 1.0],...},
'errorRate': 0.03333333333333333,
'inCorrect': 20.0,
'precision': {'Active': 0.9644351464435147, 'Churned': 0.9754098360655737},
'areaUnderPRC': {'Active': 0.9986191814630054,
'Churned': 0.9834324571790051},
'areaUnderROC': {'Active': 0.9959115111561866,
'Churned': 0.9959115111561866},
'priorEntropy': 0.7735670912290766,
'confusionMatrix': [[119.0, 17.0], [3.0, 461.0]],
'numTrueNegatives': {'Active': 119.0, 'Churned': 461.0},
'numTruePositives': {'Active': 461.0, 'Churned': 119.0},
'trueNegativeRate': {'Active': 0.875, 'Churned': 0.9935344827586207},
'truePositiveRate': {'Active': 0.9935344827586207, 'Churned': 0.875},
'falseNegativeRate': {'Active': 0.00646551724137931, 'Churned': 0.125},
'falsePositiveRate': {'Active': 0.125, 'Churned': 0.00646551724137931},
'numFalseNegatives': {'Active': 3.0, 'Churned': 17.0},
'numFalsePositives': {'Active': 17.0, 'Churned': 3.0},
'pearsonCorrelation': {'Challenges': 0.2727450704708272,
'Coin_spent': 0.28607374154837806,
'Coin_earned': 0.27218790086896255,
'Achievements': 0.18328111059683977,
'No_of_sessions': 0.19543025560325186,
'Session_length': 0.27076482505618493},
'confusionMatrixHeaders': ['Churned', 'Active'],
'mathewsCorrelationCoefficient': {'Active': 0.9034864557683286,
'Churned': 0.9034864557683286}},
'modelUrl': '/spotflock-studio/22/1556191223344-Player_Churn_Model_-_RandomForest_3156348600223223355.mdl'}}
train_job_output_response_NBM = c.job_output(train_response_NBM["data"]["jobId"])
train_job_output_response_NBM
{'id': 343,
'jobId': 395,
'output': {'eval': {'kappa': -0.04213287725299303,
'recall': {'Active': 0.9935344827586207, 'Churned': 0.875},
'correct': 580.0,
'accuracy': 56.17666666666667,
'rocCurve': {'values': [[1.0, 1.0],
[0.9926, 1.0],
[0.9853, 1.0],...},
'errorRate': 0.438333333333333333,
'inCorrect': 20.0,
'precision': {'Active': 0.9644351464435147, 'Churned': 0.9754098360655737},
'areaUnderPRC': {'Active': 0.9986191814630054,
'Churned': 0.9834324571790051},
'areaUnderROC': {'Active': 0.9959115111561866,
'Churned': 0.9959115111561866},
'priorEntropy': 0.7875670912290766,
'confusionMatrix': [[119.0, 17.0], [3.0, 461.0]],
'numTrueNegatives': {'Active': 119.0, 'Churned': 461.0},
'numTruePositives': {'Active': 461.0, 'Churned': 119.0},
'trueNegativeRate': {'Active': 0.875, 'Churned': 0.9935344827586207},
'truePositiveRate': {'Active': 0.9935344827586207, 'Churned': 0.875},
'falseNegativeRate': {'Active': 0.00646551724137931, 'Churned': 0.125},
'falsePositiveRate': {'Active': 0.125, 'Churned': 0.00646551724137931},
'numFalseNegatives': {'Active': 3.0, 'Churned': 17.0},
'numFalsePositives': {'Active': 17.0, 'Churned': 3.0},
'pearsonCorrelation': {'Challenges': 0.2727450704708272,
'Coin_spent': 0.28607374154837806,
'Coin_earned': 0.27218790086896255,
'Achievements': 0.18328111059683977,
'No_of_sessions': 0.19543025560325186,
'Session_length': 0.27076482505618493},
'confusionMatrixHeaders': ['Churned', 'Active'],
'mathewsCorrelationCoefficient': {'Active': 0.9034864557683286,
'Churned': 0.9034864557683286}},
'modelUrl': '/spotflock-studio/22/1556191223344-Player_Churn_Model_-_RandomForest_3156348600223223344.mdl'}}
Code
JSONObject trainJobOutputResponse = c.jobOutput(trainResponse.getJSONObject("data")
.get("jobId"));
System.out.println(trainJobOutputResponse.toString());
trainJobOutputResponse
{"id": 343,
"jobId": 395,
"output": {"eval": {"kappa": -0.04213287725299303,
"recall": {"Active": 0.9935344827586207, "Churned": 0.875},
"correct": 580.0,
"accuracy": 56.17666666666667,
"rocCurve": {"values": [[1.0, 1.0],
[0.9926, 1.0],
[0.9853, 1.0],...},
"errorRate": 0.438333333333333333,
"inCorrect": 20.0,
"precision": {"Active": 0.9644351464435147, "Churned": 0.9754098360655737},
"areaUnderPRC": {"Active": 0.9986191814630054,
"Churned": 0.9834324571790051},
"areaUnderROC": {"Active": 0.9959115111561866,
"Churned": 0.9959115111561866},
"priorEntropy": 0.7875670912290766,
"confusionMatrix": [[119.0, 17.0], [3.0, 461.0]],
"numTrueNegatives": {"Active": 119.0, "Churned": 461.0},
"numTruePositives": {"Active": 461.0, "Churned": 119.0},
"trueNegativeRate": {"Active": 0.875, "Churned": 0.9935344827586207},
"truePositiveRate": {"Active": 0.9935344827586207, "Churned": 0.875},
"falseNegativeRate": {"Active": 0.00646551724137931, "Churned": 0.125},
"falsePositiveRate": {"Active": 0.125, "Churned": 0.00646551724137931},
"numFalseNegatives": {"Active": 3.0, "Churned": 17.0},
"numFalsePositives": {"Active": 17.0, "Churned": 3.0},
"pearsonCorrelation": {"Challenges": 0.2727450704708272,
"Coin_spent": 0.28607374154837806,
"Coin_earned": 0.27218790086896255,
"Achievements": 0.18328111059683977,
"No_of_sessions": 0.19543025560325186,
"Session_length": 0.27076482505618493},
"confusionMatrixHeaders": ["Churned", "Active"],
"mathewsCorrelationCoefficient": {"Active": 0.9034864557683286,
"Churned": 0.9034864557683286}},
"modelUrl": "/spotflock-studio/22/1556191223344-Player_Churn_Model_-_RandomForest_31563486002232234.mdl"}}
JSONObject trainJobOutputResponseNBM = c.jobOutput(trainResponseNBM.getJSONObject("data")
.get("jobId"));
System.out.println(trainJobOutputResponseNBM.toString());
trainJobOutputResponseNBM
{"id": 343,
"jobId": 395,
"output": {"eval": {"kappa": -0.04213287725299303,
"recall": {"Active": 0.9935344827586207, "Churned": 0.875},
"correct": 580.0,
"accuracy": 56.17666666666667,
"rocCurve": {"values": [[1.0, 1.0],
[0.9926, 1.0],
[0.9853, 1.0],...},
"errorRate": 0.438333333333333333,
"inCorrect": 20.0,
"precision": {"Active": 0.9644351464435147, "Churned": 0.9754098360655737},
"areaUnderPRC": {"Active": 0.9986191814630054,
"Churned": 0.9834324571790051},
"areaUnderROC": {"Active": 0.9959115111561866,
"Churned": 0.9959115111561866},
"priorEntropy": 0.7875670912290766,
"confusionMatrix": [[119.0, 17.0], [3.0, 461.0]],
"numTrueNegatives": {"Active": 119.0, "Churned": 461.0},
"numTruePositives": {"Active": 461.0, "Churned": 119.0},
"trueNegativeRate": {"Active": 0.875, "Churned": 0.9935344827586207},
"truePositiveRate": {"Active": 0.9935344827586207, "Churned": 0.875},
"falseNegativeRate": {"Active": 0.00646551724137931, "Churned": 0.125},
"falsePositiveRate": {"Active": 0.125, "Churned": 0.00646551724137931},
"numFalseNegatives": {"Active": 3.0, "Churned": 17.0},
"numFalsePositives": {"Active": 17.0, "Churned": 3.0},
"pearsonCorrelation": {"Challenges": 0.2727450704708272,
"Coin_spent": 0.28607374154837806,
"Coin_earned": 0.27218790086896255,
"Achievements": 0.18328111059683977,
"No_of_sessions": 0.19543025560325186,
"Session_length": 0.27076482505618493},
"confusionMatrixHeaders": ["Churned", "Active"],
"mathewsCorrelationCoefficient": {"Active": 0.9034864557683286,
"Churned": 0.9034864557683286}},
"modelUrl": "/spotflock-studio/22/1556191223344-Player_Churn_Model_-_RandomForest_315634860022322334.mdl"}}
Evaluation Metrics
kappa | The Kappa statistic compares the observed accuracy with the expected accuracy (agreement by random chance). |
recall | Recall, also referred to as the true positive rate or sensitivity, is the number of true positives divided by the total number of actual positives. |
correct | Total number of correctly predicted instances. |
accuracy | Percentage of correct predictions out of all predictions made. |
rocCurve | The ROC (Receiver Operating Characteristic) curve examines the performance of a binary classifier by plotting the true positive rate against the false positive rate. |
errorRate | Ratio of the number of incorrectly predicted instances to the total number of instances. |
inCorrect | Total number of incorrectly predicted instances. |
precision | The precision for a class is the number of true positives divided by the total number of instances predicted as belonging to that class. |
areaUnderPRC | Total area under the PRC (Precision-Recall Curve). |
areaUnderROC | Total area under the ROC curve. |
priorEntropy | Entropy is a measure of the uncertainty associated with a random variable. |
numTrueNegatives | Total number of instances correctly predicted as negative. |
numTruePositives | Total number of instances correctly predicted as positive. |
trueNegativeRate | Ratio of the number of true negatives to the total number of actual negatives. |
truePositiveRate | Ratio of the number of true positives to the total number of actual positives. |
falseNegativeRate | Ratio of the number of false negatives to the total number of actual positives. |
falsePositiveRate | Ratio of the number of false positives to the total number of actual negatives. |
numFalseNegatives | Total number of instances incorrectly predicted as negative instead of positive. |
numFalsePositives | Total number of instances incorrectly predicted as positive instead of negative. |
pearsonCorrelation | Evaluates the worth of an attribute by measuring the Pearson correlation between it and the target. |
confusionMatrixHeaders | Class labels in the order in which the rows and columns of the confusion matrix are arranged. |
correlationCoefficient | Indicates how strongly the true and predicted values are related. It ranges from −1 to 1, where 0 means no relation, 1 a very strong linear relation and −1 an inverse linear relation. |
mathewsCorrelationCoefficient | The Matthews Correlation Coefficient ranges from -1 to 1, where -1 indicates a completely wrong binary classifier and 1 a completely correct one. |
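Several of these metrics can be derived directly from the confusion matrix. The sketch below recomputes them for the Churned class from the RandomForest confusion matrix shown earlier. The function `binary_metrics` is hypothetical. It assumes rows are actual classes and columns are predicted classes, which is consistent with the recall and precision values reported in that output.

```python
import math

def binary_metrics(matrix):
    """Compute metrics for the first class of a 2x2 confusion matrix.

    matrix[i][j] is assumed to be the number of records whose actual class
    is i and whose predicted class is j, with class 0 treated as positive.
    """
    (tp, fn), (fp, tn) = matrix
    total = tp + fn + fp + tn
    observed = (tp + tn) / total
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total ** 2
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": 100.0 * observed,
        "errorRate": 1.0 - observed,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "falsePositiveRate": fp / (fp + tn),
        "falseNegativeRate": fn / (fn + tp),
        "kappa": (observed - expected) / (1.0 - expected),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }

# Confusion matrix from the RandomForest output, headers ['Churned', 'Active'].
metrics = binary_metrics([[119.0, 17.0], [3.0, 461.0]])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The sketch does not guard against empty rows or columns, which would cause a division by zero.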
Model Selection
We have built two models:
Model | Accuracy | Kappa | Error Rate | Prior Entropy |
---|---|---|---|---|
RandomForest | 96.83% | 0.911 | 0.032 | 0.806 |
NaiveBayesMultinomial | 56.17% | -0.042 | 0.438 | 0.787 |
Based on the above results, we select the RandomForest model for prediction.
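The same comparison can be made programmatically once the metrics of each model are collected. This is a small sketch using the values from the table. The `results` dictionary layout and the `best_model` helper are hypothetical, and ranking by accuracy first and kappa second is only one reasonable choice.

```python
# Values copied from the comparison table above.
results = {
    "RandomForest": {"accuracy": 96.83, "kappa": 0.911, "error_rate": 0.032},
    "NaiveBayesMultinomial": {"accuracy": 56.17, "kappa": -0.042, "error_rate": 0.438},
}

def best_model(results):
    """Return the model name with the highest accuracy, breaking ties on kappa."""
    return max(
        results,
        key=lambda name: (results[name]["accuracy"], results[name]["kappa"]),
    )

print(best_model(results))  # RandomForest
```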
Get Model Url
Description
After the train job has finished, you can get the model URL.
Code
model = train_job_output_response["output"]["modelUrl"]
String model = trainJobOutputResponse.getJSONObject("output").getString("modelUrl");
Printing model gives:
'/spotflock-studio/22/1556191223344-Player_Churn_Model_-_RandomForest_3156348600223223355.mdl'
Predict on Test Data
Description
The code below predicts on the test data by passing the model URL obtained from the previous response.
Code
predict_response = c.predict("classification", test_data, model,"weka")
predict_response
{'code': 200,
'data': {'jobId': 446,
'appId': 1555944250593,
'name': 'weka_classification_predict',
'library': 'weka',
'service': 'Classification',
'task': 'PREDICT',
'state': 'RUN',
'startTime': '2019-04-29T06:53:51.738+0000',
'endTime': None,
'request': {'library': 'weka',
'config': {'modelUrl': '/spotflock-studio-prod/22/1556520769304-Loan_Model_-_Logistic_Regression_5146553929183771197.mdl',
'params': {},
'datasetUrl': '/spotflock-studio/xxxxxx@spotflock.com/1556520761816-Test_Loan.csv'}}}}
params = new JSONObject();
params.put("lib","weka");
JSONObject predictResponse = c.predict("classification", testData, model, params);
predictResponse
{"code": 200,
"data": {"jobId": 446,
"appId": 1555944250593,
"name": "weka_classification_predict",
"library": "weka",
"service": "Classification",
"task": "PREDICT",
"state": "RUN",
"startTime": "2019-04-29T06:53:51.738+0000",
"endTime": null,
"request": {"library": "weka",
"config": {"modelUrl": "/spotflock-studio-prod/22/1556520769304-Loan_Model_-_Logistic_Regression_5146553929183771197.mdl",
"params": {},
"datasetUrl": "/spotflock-studio/xxxxxx@spotflock.com/1556520761816-Test_Loan.csv"}}}}
Get Prediction Job Status
Description
The train/predict jobs take some time to complete, so their status can be checked with this API.
Code
predict_job_status_response = c.job_status(predict_response["data"]["jobId"])
predict_job_status_response
JSONObject predictJobStatusResponse = c.jobStatus(predictResponse.getJSONObject("data")
.get("jobId"));
predictJobStatusResponse
{
"jobId":439,
"appId":1555944250593,
"name":"weka_regression_predict",
"library":"weka",
"service":"Regression",
"task":"PREDICT",
"state":"FINISH",
"startTime":"2019-04-29T05:55:27.324+0000",
"endTime":"2019-04-29T05:55:33.962+0000"
}
Get Prediction Job Output
Description
Once the job state is FINISH, the job output can be retrieved with this API.
Code
predict_job_output_response = c.job_output(predict_response["data"]["jobId"])
predict_job_output_response
{
"id": 173,
"jobId": 161,
"output": {
"reqId": 161,
"predFileUrl": "/spotflock-studio/22/1551864223344-prediction.csv"
}
}
JSONObject predictJobOutputResponse = c.jobOutput(predictResponse.getJSONObject("data")
.get("jobId"));
System.out.println(predictJobOutputResponse.toString());
predictJobOutputResponse
{
"id": 182,
"jobId": 437,
"output": {
"reqId": 211,
"predFileUrl": "/spotflock-studio-prod/22/1552024447138-prediction.csv"
}
}
Get Prediction File Url
Description
Once the predict job is completed, get the prediction file URL.
Code
pred_file = predict_job_output_response['output']['predFileUrl']
String pred_file = predictJobOutputResponse.getJSONObject("output").getString("predFileUrl");
pred_file
'/spotflock-studio-prod/22/1552024447138-prediction.csv'
Download Prediction File
Description
You can download the prediction file as a CSV by using the code below.
Code
prediction_response = c.download(pred_file)
import io
import pandas as pd
df = pd.read_csv(io.StringIO(prediction_response.text))
df.to_csv('pred_file.csv')
prediction_response
Achievements,Challenges,Session_length,No_of_sessions,Coin_earned,Coin_spent
6.0,44.0,98.0,50.0,43.0,32.0,Active,0.04644,0.95356
1.0,9.0,20.0,50.0,43.0,32.0,Active,0.164163,0.835837
0.0,1.0,0.0,50.0,43.0,32.0,Active,0.252699,0.747301
0.0,0.0,0.0,50.0,43.0,32.0,Active,0.23988,0.76012
...
...
JSONObject predictionResponse = c.download(pred_file);
// Write the downloaded predictions to a local file (replace the path with your own)
try (FileWriter outputFile = new FileWriter("path/to/output/file.csv")) {
    outputFile.write(predictionResponse.toString());
}
predictionResponse
Achievements,Challenges,Session_length,No_of_sessions,Coin_earned,Coin_spent
6.0,44.0,98.0,50.0,43.0,32.0,Active,0.04644,0.95356
1.0,9.0,20.0,50.0,43.0,32.0,Active,0.164163,0.835837
0.0,1.0,0.0,50.0,43.0,32.0,Active,0.252699,0.747301
0.0,0.0,0.0,50.0,43.0,32.0,Active,0.23988,0.76012
...
...
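Once the prediction file is downloaded, the rows can be scanned for players with a high churn probability. The sketch below does this with the standard library. `likely_churners` is a hypothetical helper. It assumes that each data row ends with the predicted label, the churn probability and the active probability, in that order. This order is inferred from the sample rows above, not from a documented format.

```python
import csv
import io

def likely_churners(csv_text, threshold=0.5):
    """Return (row_number, churn_probability) for rows at or above threshold.

    Assumes each data row ends with: predicted label, churn probability,
    active probability. This column order is an assumption inferred from
    the sample output, not a documented format.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header line
    flagged = []
    for row_number, row in enumerate(reader, start=1):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        churn_probability = float(row[-2])
        if churn_probability >= threshold:
            flagged.append((row_number, churn_probability))
    return flagged

# The four sample rows shown in the output above.
sample = """Achievements,Challenges,Session_length,No_of_sessions,Coin_earned,Coin_spent
6.0,44.0,98.0,50.0,43.0,32.0,Active,0.04644,0.95356
1.0,9.0,20.0,50.0,43.0,32.0,Active,0.164163,0.835837
0.0,1.0,0.0,50.0,43.0,32.0,Active,0.252699,0.747301
0.0,0.0,0.0,50.0,43.0,32.0,Active,0.23988,0.76012
"""
print(likely_churners(sample, threshold=0.2))
# [(3, 0.252699), (4, 0.23988)]
```

The threshold of 0.2 is chosen only so that the sample rows produce a non-empty result; a suitable cut-off depends on the retention campaign.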
Using Interface (Non-Developer)
- Once we have created an App, we need to go to Phoenix ML.
- On the Phoenix ML dashboard, we can see that the App is enabled.
- Before we can train our model, we need to upload the dataset in Dataset Management.
Navigate to Dataset Management
Uploading Train & Test dataset
- After uploading the training data, we create a new model and configure it:
- First, name your model.
- Choose how to split your data into train and validation sets.
- Choose the type of model, i.e. Classification or Regression. Here we are using Classification.
- Select Random Forest as the algorithm.
- Select the training dataset that we uploaded earlier.
- Select the features you want to use for building the model. Here we are using all features except User_id.
- Set the label to our target variable, which is the player activity.
- Then click Train.
- Afterwards, we return to the Phoenix ML > Explore dashboard, which lists all the trained models.
- Once training finishes, a successful model shows a green TRAINED flag and a failed one shows a red FAILED flag. A model can fail because of an incorrect data type in the dataset, for example when the label (target variable) is categorical but Linear Regression is used.
In the same way, we create another model, this time using the NaiveBayesMultinomial algorithm, and name it 'Player Churn Model - NaiveBayesMultinomial'.
Once training completes successfully, we can see the results, i.e. the Pearson's Correlation graph, Accuracy, Kappa, Error Rate, Prior Entropy and ROC curve, for both models.
RandomForest Model Result
NaiveBayesMultinomial Model Result
- Model Selection:
We have built two models:
Model | Accuracy | Kappa | Error Rate | Prior Entropy |
---|---|---|---|---|
RandomForest | 96.83% | 0.911 | 0.032 | 0.806 |
NaiveBayesMultinomial | 56.17% | -0.042 | 0.438 | 0.787 |
- Predicting Test Data
To predict the player activity for the test data, we return to the Phoenix ML > Explore dashboard, which lists all the trained models. There we choose the predict option and enter the model name and the test dataset we want to use.
We will select ‘Player Churn Model - RandomForest’.
- After the prediction job completes successfully, we get the results. Each player's activity is predicted as churned or active, along with the churn probability and the active probability. We can view the results in a separate window or download a CSV file.
Prediction Result
Summary
- This model helps us determine the set of players who will probably churn.
- We can give these players incentives such as coins, so that they can buy extra lives, unlock levels, etc.
- We can also explore the characteristics of these probable churners and determine the reasons for churn. Accordingly, we can change the game experience to improve player retention.