Overview


Bank loans carry risk for both the bank and the borrower. Banks and other financial institutions suffer heavy losses from non-performing assets.

As a result, banks and other lenders have started to tighten loan underwriting.


Problem Statement


Given a set of parameters describing a loan and its borrower, predict whether the borrower will default on the loan.


Our Approach


To solve the problem, we use INTELLIHUB. Let's see how INTELLIHUB works:

  • First, the user uploads the train and test data.
  • The data is stored in the cloud.
  • Data from the cloud then goes to IntelliHubML (a Machine Learning module in INTELLIHUB). In IntelliHubML, you have 3 ML libraries (Weka, H2O & Scikit) to choose from.
  • Using IntelliHubML, you train a model.
  • You can evaluate the model using RMSE, accuracy and other metrics.
  • You can then use the trained model for prediction.


Data Definition


The data set contains information about 4000 loans. The variables in the data set are:

  • Loan_id: Unique Loan Id
  • Customer_Id: Unique Customer Id
  • CurrentLoanAmount: Loan amount requested
  • Credit_Score: Credit score of the customer
  • Annual_Income: Annual income of the customer
  • Monthly_Debt: Monthly debt that has to be paid
  • Yr_Credit_His: Years of credit history
  • Current_Cr.Bal: Current credit balance
  • Loan_Status: Loan Status (Label)

Intellihub lets you choose up to 20 columns as features.
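The 20-column limit and the column names listed above can be checked locally before a training request is sent. The sketch below is not part of the Intellihub SDK: the helper name `validate_features` and the one-row sample CSV are assumptions made for illustration.

```python
import csv
import io

MAX_FEATURES = 20  # limit stated in the text above


def validate_features(csv_text, features, label):
    """Check the chosen feature and label columns against the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    if len(features) > MAX_FEATURES:
        raise ValueError(f"at most {MAX_FEATURES} feature columns are allowed")
    if label in features:
        raise ValueError("the label column must not be used as a feature")
    missing = [name for name in features + [label] if name not in header]
    if missing:
        raise ValueError(f"columns not found in dataset: {missing}")
    return features


# Hypothetical sample built from the column names in the data definition.
sample = (
    "Loan_id,Customer_Id,CurrentLoanAmount,Credit_Score,Annual_Income,"
    "Monthly_Debt,Yr_Credit_His,Current_Cr.Bal,Loan_Status\n"
    "L1,C1,12000,710,54000,800,12,3000,Fully Paid\n"
)
selected = validate_features(
    sample, ["CurrentLoanAmount", "Credit_Score", "Monthly_Debt"], "Loan_Status"
)
print(selected)
```

A check like this catches misspelled column names before a training job is submitted.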



Model Building


  • We want a model that predicts whether a particular customer will default on the loan.
  • This helps us decide whether to underwrite that customer's loan application.
  • We will build classification models using two algorithms: NaiveBayesMultinomial and Logistic.
  • We will evaluate and compare the models and choose the best one for our test dataset.

While modeling with IntelliHubML, we will use the following terms:

  • Training Data: The data you use for training your model. It contains all the information you have collected about the problem statement.
  • Test Data: The data you use for testing the model. You can make predictions on this data.
  • Features: The columns in your dataset that you use for training your model.
  • Class/Label: The column you want to predict; it holds the known outcome for each record.
  • Accuracy: The percentage of records predicted correctly when you apply the model to your data.

First we need to create an App from the console and enable the API for IntelliHubML. In Intellihub, we can build a model:

  • Using SDK (For Developer)


Using SDK


If you want to develop a model using the SDK, copy the API key from the INTELLIHUB Console.




Connect To IntelliHub

Description

You can access the services once the API for IntelliHubML is enabled. INTELLIHUB provides IntellihubClient, which takes your API key as an argument.

Code


import intellihub

c = intellihub.IntellihubClient("YOUR API KEY")
 

import com.spotflock.IntellihubClient;

IntellihubClient c = new IntellihubClient("YOUR API KEY");
 

Upload Train And Test Files

Description

As Intellihub is a cloud platform, it stores the train and test files remotely. The file upload API returns the file's location in cloud storage in its response.

Upload Train File


train_file_store_response = c.store("path/to/train/file")

train_data = train_file_store_response["fileUrl"]
 

// Upload the train file and read its storage location from the response.
JSONObject trainFileStoreResponse = c.store("path/to/train/file");
String trainData = trainFileStoreResponse.getString("fileUrl");
System.out.println(trainData);
 

train_data file url

'/spotflock-studio-prod/xxxxx@xxxxxxxx.com/1551936734455-loan_train.csv'

Upload Test File


test_file_store_response = c.store("path/to/test/file")

test_data = test_file_store_response["fileUrl"]
 

// Upload the test file and read its storage location from the response.
JSONObject testFileStoreResponse = c.store("path/to/test/file");
String testData = testFileStoreResponse.getString("fileUrl");
System.out.println(testData);
 

test_data file url

'/spotflock-studio-prod/xxxxx@xxxxxxx.com/1551936725437-loan_test.csv'

Classification Model

Description

This API lets you train a classification model. Training takes some time, so the job status has to be checked. Once the job is completed, the job output API gives you the model info.

Arguments

  • lib: Library used for training the model. Currently the spotflock and weka libraries are supported.
  • service: Valid values are classification and regression.
  • model_name: Name under which the model will be saved.
  • algorithm: Algorithm used to train the model.
  • dataset_url: Location of the train dataset file in Spotflock storage.
  • label: Name of the label column in the train dataset file.
  • train_percentage: Percentage of the data used for training; the model is tested against the remaining data.
  • features: List of column names used to train the model.
  • save_model: If True, the model will be saved.
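To make train_percentage concrete: with a value of 80, about 80% of the rows are used for fitting and the rest are held out for evaluation. The sketch below illustrates the idea locally with a shuffled split. How IntelliHubML itself partitions the data (for example, whether it shuffles or stratifies) is not stated here, so the helper `split_rows` is only an assumption-based illustration.

```python
import random


def split_rows(rows, train_percentage, seed=0):
    """Shuffle the rows and split them into train and hold-out parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percentage // 100
    return shuffled[:cut], shuffled[cut:]


# 4000 placeholder rows, matching the size of the loan data set.
rows = list(range(4000))
train_rows, holdout_rows = split_rows(rows, 80)
print(len(train_rows), len(holdout_rows))  # 3200 800
```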

Code


train_response = c.train("classification", "Logistic", train_data, "Loan_Status",
                         ["CurrentLoanAmount", "Credit_Score", "MonthlyDebt", "Yr_Credit_His"],
                         "Loan Model - Logistic Regression", "weka", 80, True)

train_response

{
    'code': 200,
    'data': {
        'jobId': 444,
        'appId': 1555944250593,
        'name': 'weka_classification_train',
        'library': 'weka',
        'service': 'Classification',
        'task': 'TRAIN',
        'state': 'RUN',
        'startTime': '2019-04-29T06:52:47.976+0000',
        'endTime': None,
        'request': {
            'library': 'weka',
            'config': {
                'datasetUrl': '/spotflock-studio/xxxxxx@xxxxxx.com/1556520753513-Train_Loan.csv',
                'algorithm': 'Logistic',
                'saveModel': True,
                'label': 'Loan Status',
                'features': ['CurrentLoanAmount', 'Credit_Score', 'MonthlyDebt', 'Yr_Credit_His'],
                'name': 'Loan Model - Logistic Regression',
                'trainPercentage': 80,
                'params': {}
            }
        }
    }
}

Similarly, we build a second model using the NaiveBayesMultinomial algorithm.

Code


train_response_NBM = c.train("classification", "NaiveBayesMultinomial", train_data, "Loan_Status",
                             ["CurrentLoanAmount", "Credit_Score", "MonthlyDebt", "Yr_Credit_His"],
                             "Loan Model - NaiveBayesMultinomial", "weka", 80, True)

train_response_NBM

{
    'code': 200,
    'data': {
        'jobId': 445,
        'appId': 1555944250593,
        'name': 'weka_classification_train',
        'library': 'weka',
        'service': 'Classification',
        'task': 'TRAIN',
        'state': 'RUN',
        'startTime': '2019-04-29T06:52:47.976+0000',
        'endTime': None,
        'request': {
            'library': 'weka',
            'config': {
                'datasetUrl': '/spotflock-studio/xxxxxx@xxxxxx.com/1556520753513-Train_Loan.csv',
                'algorithm': 'NaiveBayesMultinomial',
                'saveModel': True,
                'label': 'Loan Status',
                'features': ['CurrentLoanAmount', 'Credit_Score', 'MonthlyDebt', 'Yr_Credit_His'],
                'name': 'Loan Model - NaiveBayesMultinomial',
                'trainPercentage': 80,
                'params': {}
            }
        }
    }
}

Arguments

  • lib: Library used for training the model. Currently the spotflock and weka libraries are supported.
  • service: Valid values are classification and regression.
  • modelName: Name under which the model will be saved.
  • algorithm: Algorithm used to train the model.
  • datasetUrl: Location of the train dataset file in Spotflock storage.
  • label: Name of the label column in the train dataset file.
  • trainPercentage: Percentage of the data used for training; the model is tested against the remaining data.
  • features: List of column names used to train the model.
  • saveModel: If true, the model will be saved.

Code


JSONArray features = new JSONArray();
features.put("CurrentLoanAmount");
features.put("Credit_Score");
features.put("MonthlyDebt");
features.put("Yr_Credit_His");

JSONObject params = new JSONObject();
params.put("lib","weka");
params.put("saveModel",true);
params.put("trainPercentage",80);
params.put("modelName","Loan Model - Logistic Regression");

String response = c.train("classification", "Logistic", trainData,
        "Loan_Status", features, params);
JSONObject trainResponse = new JSONObject(response);

trainResponse


{
    "code": 200,
    "data": {
        "jobId": 444,
        "appId": 1555944250593,
        "name": "weka_classification_train",
        "library": "weka",
        "service": "Classification",
        "task": "TRAIN",
        "state": "RUN",
        "startTime": "2019-04-29T06:52:47.976+0000",
        "endTime": null,
        "request": {
            "library": "weka",
            "config": {
                "datasetUrl": "/spotflock-studio/xxxxxx@xxxxxx.com/1556520753513-Train_Loan.csv",
                "algorithm": "Logistic",
                "saveModel": true,
                "label": "Loan_Status",
                "features": ["CurrentLoanAmount", "Credit_Score", "MonthlyDebt", "Yr_Credit_His"],
                "name": "Loan Model - Logistic Regression",
                "trainPercentage": 80,
                "params": {}
            }
        }
    }
}

Similarly, we build a second model using the NaiveBayesMultinomial algorithm.

Code


JSONArray features = new JSONArray();
features.put("CurrentLoanAmount");
features.put("Credit_Score");
features.put("MonthlyDebt");
features.put("Yr_Credit_His");

JSONObject params = new JSONObject();
params.put("lib","weka");
params.put("saveModel",true);
params.put("trainPercentage",80);
params.put("modelName","Loan Model - NaiveBayesMultinomial");

String response = c.train("classification", "NaiveBayesMultinomial", trainData,
        "Loan_Status", features, params);
JSONObject trainResponseNBM = new JSONObject(response);

trainResponseNBM


{
    "code": 200,
    "data": {
        "jobId": 445,
        "appId": 1555944250593,
        "name": "weka_classification_train",
        "library": "weka",
        "service": "Classification",
        "task": "TRAIN",
        "state": "RUN",
        "startTime": "2019-04-29T06:52:47.976+0000",
        "endTime": null,
        "request": {
            "library": "weka",
            "config": {
                "datasetUrl": "/spotflock-studio/xxxxxx@xxxxxx.com/1556520753513-Train_Loan.csv",
                "algorithm": "NaiveBayesMultinomial",
                "saveModel": true,
                "label": "Loan_Status",
                "features": ["CurrentLoanAmount", "Credit_Score", "MonthlyDebt", "Yr_Credit_His"],
                "name": "Loan Model - NaiveBayesMultinomial",
                "trainPercentage": 80,
                "params": {}
            }
        }
    }
}

Get Train Job Status

Description

Train and predict jobs take some time to complete, so their status can be checked with this API.

Code


train_job_status_response = c.job_status(train_response["data"]["jobId"])
 

train_job_status_response


{
  "jobId":438,
  "appId":1555944250593,
  "name":"weka_classification_train",
  "library":"weka",
  "service":"classification",
  "task":"TRAIN",
  "state":"FINISH",
  "startTime":"2019-04-29T05:55:00.499+0000",
  "endTime":"2019-04-29T05:55:02.234+0000"
}

train_job_status_response_NBM = c.job_status(train_response_NBM["data"]["jobId"])
 

train_job_status_response_NBM


{
  "jobId":439,
  "appId":1555944250593,
  "name":"weka_classification_train",
  "library":"weka",
  "service":"classification",
  "task":"TRAIN",
  "state":"FINISH",
  "startTime":"2019-04-29T05:55:00.499+0000",
  "endTime":"2019-04-29T05:55:02.234+0000"
}

JSONObject trainJobStatusResponse = c.jobStatus(
        trainResponse.getJSONObject("data").get("jobId"));
System.out.println(trainJobStatusResponse.toString());

trainJobStatusResponse


{
  "jobId":438,
  "appId":1555944250593,
  "name":"weka_classification_train",
  "library":"weka",
  "service":"classification",
  "task":"TRAIN",
  "state":"FINISH",
  "startTime":"2019-04-29T05:55:00.499+0000",
  "endTime":"2019-04-29T05:55:02.234+0000"
}

JSONObject trainJobStatusResponseNBM = c.jobStatus(
        trainResponseNBM.getJSONObject("data").get("jobId"));
System.out.println(trainJobStatusResponseNBM.toString());

trainJobStatusResponseNBM


{
  "jobId":438,
  "appId":1555944250593,
  "name":"weka_classification_train",
  "library":"weka",
  "service":"classification",
  "task":"TRAIN",
  "state":"FINISH",
  "startTime":"2019-04-29T05:55:00.499+0000",
  "endTime":"2019-04-29T05:55:02.234+0000"
}
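Because a job starts in state RUN and only later reaches FINISH, the status call usually has to be repeated. The following is a minimal polling sketch, assuming the status response is a dict with a top-level "state" key as in the sample output above. The helper `wait_for_job`, its parameters, and the stand-in `fake_job_status` are hypothetical and not part of the SDK. Only the RUN and FINISH states appear in this walkthrough, so the loop relies on a timeout rather than guessing at failure states.

```python
import time


def wait_for_job(fetch_status, job_id, poll_seconds=2.0, timeout_seconds=300.0):
    """Poll fetch_status(job_id) until the job reports state FINISH."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status(job_id)
        if status["state"] == "FINISH":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still in state {status['state']!r}")
        time.sleep(poll_seconds)


# Stand-in for c.job_status so the sketch runs without the service.
_states = iter(["RUN", "RUN", "FINISH"])


def fake_job_status(job_id):
    return {"jobId": job_id, "state": next(_states)}


final = wait_for_job(fake_job_status, 444, poll_seconds=0.01)
print(final["state"])  # FINISH
```

With the real client, the call would presumably look like wait_for_job(c.job_status, train_response["data"]["jobId"]).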

Get Train Job Output

Description

Once the job has completed, the job output can be retrieved with this API.

Code


train_job_output_response = c.job_output(train_response["data"]["jobId"])
 

train_job_output_response


{'id': 391,
 'jobId': 444,
 'output': {'eval': {'kappa': 0.27758932679036955,
   'recall': {'Fully Paid': 1.0, 'Charged Off': 0.2753623188405797},
   'correct': 512.0,
   'accuracy': 69.66011235955057,
   'rocCurve': {'values': [[1.0, 1.0],
     [0.9977, 1.0],
     [0.9954, 1.0],..}
   'errorRate': 0.3008988764044944,
   'inCorrect': 200.0,
   'precision': {'Fully Paid': 0.6855345911949685, 'Charged Off': 1.0},
   'areaUnderPRC': {'Fully Paid': 0.7558018683863663,
    'Charged Off': 0.6505613270802746},
   'areaUnderROC': {'Fully Paid': 0.6686278420422817,
    'Charged Off': 0.6686278420422817},
   'priorEntropy': 0.9733890610721709,
   'confusionMatrix': [[436.0, 0.0], [200.0, 76.0]],
   'numTrueNegatives': {'Fully Paid': 76.0, 'Charged Off': 436.0},
   'numTruePositives': {'Fully Paid': 436.0, 'Charged Off': 76.0},
   'trueNegativeRate': {'Fully Paid': 0.2753623188405797, 'Charged Off': 1.0},
   'truePositiveRate': {'Fully Paid': 1.0, 'Charged Off': 0.2753623188405797},
   'falseNegativeRate': {'Fully Paid': 0.0, 'Charged Off': 0.7246376811594203},
   'falsePositiveRate': {'Fully Paid': 0.7246376811594203, 'Charged Off': 0.0},
   'numFalseNegatives': {'Fully Paid': 0.0, 'Charged Off': 200.0},
   'numFalsePositives': {'Fully Paid': 200.0, 'Charged Off': 0.0},
   'pearsonCorrelation': {'Loan Status': 0.0,
    'Current Loan Amount': 0.2807766059248401},
   'confusionMatrixHeaders': ['Fully Paid', 'Charged Off'],
   'mathewsCorrelationCoefficient': {'Fully Paid': 0.4344771509261165,
    'Charged Off': 0.4344771509261165}},
  'modelUrl': '/spotflock-studio-prod/22/1556520769304-Loan_Model_-_Logistic_Regression_5146553929183771197.mdl'}}

                            

train_job_output_response_NBM = c.job_output(train_response_NBM["data"]["jobId"])
 

train_job_output_response_NBM


{'id': 391,
 'jobId': 444,
 'output': {'eval': {'kappa': 0.13858932679036955,
   'recall': {'Fully Paid': 1.0, 'Charged Off': 0.2753623188405797},
   'correct': 512.0,
   'accuracy': 48.46011235955057,
   'rocCurve': {'values': [[1.0, 1.0],
     [0.9977, 1.0],
     [0.9954, 1.0],..}
   'errorRate': 0.5208988764044944,
   'inCorrect': 200.0,
   'precision': {'Fully Paid': 0.6855345911949685, 'Charged Off': 1.0},
   'areaUnderPRC': {'Fully Paid': 0.7558018683863663,
    'Charged Off': 0.6505613270802746},
   'areaUnderROC': {'Fully Paid': 0.6686278420422817,
    'Charged Off': 0.6686278420422817},
   'priorEntropy': 0.9503890610721709,
   'confusionMatrix': [[436.0, 0.0], [200.0, 76.0]],
   'numTrueNegatives': {'Fully Paid': 76.0, 'Charged Off': 436.0},
   'numTruePositives': {'Fully Paid': 436.0, 'Charged Off': 76.0},
   'trueNegativeRate': {'Fully Paid': 0.2753623188405797, 'Charged Off': 1.0},
   'truePositiveRate': {'Fully Paid': 1.0, 'Charged Off': 0.2753623188405797},
   'falseNegativeRate': {'Fully Paid': 0.0, 'Charged Off': 0.7246376811594203},
   'falsePositiveRate': {'Fully Paid': 0.7246376811594203, 'Charged Off': 0.0},
   'numFalseNegatives': {'Fully Paid': 0.0, 'Charged Off': 200.0},
   'numFalsePositives': {'Fully Paid': 200.0, 'Charged Off': 0.0},
   'pearsonCorrelation': {'Loan Status': 0.0,
    'Current Loan Amount': 0.2807766059248401},
   'confusionMatrixHeaders': ['Fully Paid', 'Charged Off'],
   'mathewsCorrelationCoefficient': {'Fully Paid': 0.4344771509261165,
    'Charged Off': 0.4344771509261165}},
  'modelUrl': '/spotflock-studio-prod/22/1556520769304-Loan_Model_-_Logistic_Regression_5146553929183771197.mdl'}}

                            

Code


JSONObject trainJobOutputResponse = c.jobOutput(
        trainResponse.getJSONObject("data").get("jobId"));
System.out.println(trainJobOutputResponse.toString());

trainJobOutputResponse


{"id": 391,
 "jobId": 444,
 "output": {"eval": {"kappa": 0.27758932679036955,
   "recall": {"Fully Paid": 1.0, "Charged Off": 0.2753623188405797},
   "correct": 512.0,
   "accuracy": 69.66011235955057,
   "rocCurve": {"values": [[1.0, 1.0],
     [0.9977, 1.0],
     [0.9954, 1.0],..}
   "errorRate": 0.3008988764044944,
   "inCorrect": 200.0,
   "precision": {"Fully Paid": 0.6855345911949685, "Charged Off": 1.0},
   "areaUnderPRC": {"Fully Paid": 0.7558018683863663,
    "Charged Off": 0.6505613270802746},
   "areaUnderROC": {"Fully Paid": 0.6686278420422817,
    "Charged Off": 0.6686278420422817},
   "priorEntropy": 0.9733890610721709,
   "confusionMatrix": [[436.0, 0.0], [200.0, 76.0]],
   "numTrueNegatives": {"Fully Paid": 76.0, "Charged Off": 436.0},
   "numTruePositives": {"Fully Paid": 436.0, "Charged Off": 76.0},
   "trueNegativeRate": {"Fully Paid": 0.2753623188405797, "Charged Off": 1.0},
   "truePositiveRate": {"Fully Paid": 1.0, "Charged Off": 0.2753623188405797},
   "falseNegativeRate": {"Fully Paid": 0.0, "Charged Off": 0.7246376811594203},
   "falsePositiveRate": {"Fully Paid": 0.7246376811594203, "Charged Off": 0.0},
   "numFalseNegatives": {"Fully Paid": 0.0, "Charged Off": 200.0},
   "numFalsePositives": {"Fully Paid": 200.0, "Charged Off": 0.0},
   "pearsonCorrelation": {"Loan Status": 0.0,
    "Current Loan Amount": 0.2807766059248401},
   "confusionMatrixHeaders": ["Fully Paid", "Charged Off"],
   "mathewsCorrelationCoefficient": {"Fully Paid": 0.4344771509261165,
    "Charged Off": 0.4344771509261165}},
  "modelUrl": "/spotflock-studio-prod/22/1556520769304-Loan_Model_-_Logistic_Regression_5146553929183771197.mdl"}}
                            

JSONObject trainJobOutputResponseNBM = c.jobOutput(
        trainResponseNBM.getJSONObject("data").get("jobId"));
System.out.println(trainJobOutputResponseNBM.toString());

trainJobOutputResponseNBM


{"id": 391,
 "jobId": 444,
 "output": {"eval": {"kappa": 0.13858932679036955,
   "recall": {"Fully Paid": 1.0, "Charged Off": 0.2753623188405797},
   "correct": 512.0,
   "accuracy": 48.46011235955057,
   "rocCurve": {"values": [[1.0, 1.0],
     [0.9977, 1.0],
     [0.9954, 1.0],..}
   "errorRate": 0.5208988764044944,
   "inCorrect": 200.0,
   "precision": {"Fully Paid": 0.6855345911949685, "Charged Off": 1.0},
   "areaUnderPRC": {"Fully Paid": 0.7558018683863663,
    "Charged Off": 0.6505613270802746},
   "areaUnderROC": {"Fully Paid": 0.6686278420422817,
    "Charged Off": 0.6686278420422817},
   "priorEntropy": 0.9503890610721709,
   "confusionMatrix": [[436.0, 0.0], [200.0, 76.0]],
   "numTrueNegatives": {"Fully Paid": 76.0, "Charged Off": 436.0},
   "numTruePositives": {"Fully Paid": 436.0, "Charged Off": 76.0},
   "trueNegativeRate": {"Fully Paid": 0.2753623188405797, "Charged Off": 1.0},
   "truePositiveRate": {"Fully Paid": 1.0, "Charged Off": 0.2753623188405797},
   "falseNegativeRate": {"Fully Paid": 0.0, "Charged Off": 0.7246376811594203},
   "falsePositiveRate": {"Fully Paid": 0.7246376811594203, "Charged Off": 0.0},
   "numFalseNegatives": {"Fully Paid": 0.0, "Charged Off": 200.0},
   "numFalsePositives": {"Fully Paid": 200.0, "Charged Off": 0.0},
   "pearsonCorrelation": {"Loan Status": 0.0,
    "Current Loan Amount": 0.2807766059248401},
   "confusionMatrixHeaders": ["Fully Paid", "Charged Off"],
   "mathewsCorrelationCoefficient": {"Fully Paid": 0.4344771509261165,
    "Charged Off": 0.4344771509261165}},
  "modelUrl": "/spotflock-studio-prod/22/1556520769304-Loan_Model_-_Logistic_Regression_5146553929183771197.mdl"}}
                            

Evaluation Metrics

  • kappa: The Kappa statistic compares the observed accuracy with the expected accuracy (random chance).
  • recall: The ratio of true positives to all actual positives of a class. Also called the true positive rate or sensitivity.
  • correct: Total number of correctly predicted instances.
  • accuracy: The proportion of correct predictions out of all predictions made.
  • rocCurve: The ROC (Receiver Operating Characteristic) curve examines the performance of a binary classifier by plotting the true positive rate against the false positive rate.
  • errorRate: The ratio of incorrectly predicted instances to the total number of instances.
  • inCorrect: Total number of incorrectly predicted instances.
  • precision: The number of true positives for a class divided by the total number of instances predicted as belonging to that class.
  • areaUnderPRC: Total area under the PRC (Precision-Recall Curve).
  • areaUnderROC: Total area under the ROC curve.
  • priorEntropy: Entropy is a measure of the uncertainty associated with a random variable.
  • numTrueNegatives: Total number of instances correctly predicted as negative.
  • numTruePositives: Total number of instances correctly predicted as positive.
  • trueNegativeRate: Ratio of true negatives to the total number of actual negatives.
  • truePositiveRate: Ratio of true positives to the total number of actual positives.
  • falseNegativeRate: Ratio of false negatives to the total number of actual positives.
  • falsePositiveRate: Ratio of false positives to the total number of actual negatives.
  • numFalseNegatives: Total number of positive instances incorrectly predicted as negative.
  • numFalsePositives: Total number of negative instances incorrectly predicted as positive.
  • pearsonCorrelation: Evaluates the worth of an attribute by measuring the Pearson correlation between it and the target.
  • confusionMatrixHeaders: The class labels, in the order used for the rows and columns of the confusion matrix.
  • correlationCoefficient: Tells you how strongly the true and predicted values are related. It ranges from −1 to 1, where 0 is no relation, 1 is a very strong linear relation and −1 is an inverse linear relation.
  • mathewsCorrelationCoefficient: The Matthews Correlation Coefficient ranges from −1 to 1, where −1 indicates a completely wrong binary classifier and 1 indicates a completely correct binary classifier.
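Several of the per-class metrics can be recomputed from the confusion matrix in the sample output. The sketch below does this for precision, recall, the false negative and false positive rates, and the Matthews correlation coefficient. It assumes that rows of the matrix are actual classes and columns are predicted classes, in the order given by confusionMatrixHeaders; the helper `per_class_metrics` is hypothetical.

```python
import math

headers = ["Fully Paid", "Charged Off"]
# Assumed layout: rows are actual classes, columns are predicted classes.
matrix = [[436.0, 0.0], [200.0, 76.0]]


def per_class_metrics(matrix, headers):
    """Treat each class in turn as the positive class and derive its metrics."""
    total = sum(sum(row) for row in matrix)
    metrics = {}
    for i, name in enumerate(headers):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                 # actual positives predicted negative
        fp = sum(row[i] for row in matrix) - tp  # actual negatives predicted positive
        tn = total - tp - fn - fp
        metrics[name] = {
            "precision": tp / (tp + fp),
            "recall": tp / (tp + fn),
            "falseNegativeRate": fn / (fn + tp),
            "falsePositiveRate": fp / (fp + tn),
            "mcc": (tp * tn - fp * fn)
            / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        }
    return metrics


metrics = per_class_metrics(matrix, headers)
for name, values in metrics.items():
    print(name, values)
```

Under the assumed layout, these values agree with the precision, recall, falseNegativeRate, falsePositiveRate and mathewsCorrelationCoefficient entries in the sample output.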

Model Selection

We have built two models:

Model                    Accuracy   Kappa   Error Rate   Prior Entropy
Logistic                 69.66%     0.277   0.30         0.973
NaiveBayesMultinomial    48.46%     0.138   0.52         0.950

The Logistic model has the higher accuracy and kappa and the lower error rate, so we select it for prediction.
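The same comparison can be made in code. This is a small sketch that hard-codes the figures from the table; with live responses the values would presumably be read from the "eval" section of each train job output. The dictionary layout and the tie-breaking rule (accuracy first, then kappa) are assumptions.

```python
# Figures copied from the comparison table above.
results = {
    "Logistic": {"accuracy": 69.66, "kappa": 0.277, "error_rate": 0.30},
    "NaiveBayesMultinomial": {"accuracy": 48.46, "kappa": 0.138, "error_rate": 0.52},
}

# Rank by accuracy, using kappa to break ties.
best = max(results, key=lambda name: (results[name]["accuracy"], results[name]["kappa"]))
print(best)  # Logistic
```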

Get Model Url

Description

After the train job has finished, you can get the model URL from the job output.

Code


model = train_job_output_response["output"]["modelUrl"]
 

String model = trainJobOutputResponse.getJSONObject("output").getString("modelUrl");
 

Printing model gives:

'/spotflock-studio-prod/22/1552024436737-Loan_Data_Model_6647825745227784853.mdl'

Predict on Test Data

Description

The code below predicts on the test data by passing the model URL obtained from the previous response.

Code


predict_response = c.predict("classification", test_data, model,"weka")
 

predict_response

{'code': 200,
 'data': {'jobId': 446,
  'appId': 1555944250593,
  'name': 'weka_classification_predict',
  'library': 'weka',
  'service': 'Classification',
  'task': 'PREDICT',
  'state': 'RUN',
  'startTime': '2019-04-29T06:53:51.738+0000',
  'endTime': None,
  'request': {'library': 'weka',
   'config': {'modelUrl': '/spotflock-studio-prod/22/1556520769304-Loan_Model_-_Logistic_Regression_5146553929183771197.mdl',
    'params': {},
    'datasetUrl': '/spotflock-studio/xxxxxx@spotflock.com/1556520761816-Test_Loan.csv'}}}}


params = new JSONObject();
params.put("lib","weka");
JSONObject predictResponse = c.predict("classification", testData, model, params);
 

predictResponse


{"code": 200,
 "data": {"jobId": 446,
  "appId": 1555944250593,
  "name": "weka_classification_predict",
  "library": "weka",
  "service": "Classification",
  "task": "PREDICT",
  "state": "RUN",
  "startTime": "2019-04-29T06:53:51.738+0000",
  "endTime": null,
  "request": {"library": "weka",
   "config": {"modelUrl": "/spotflock-studio-prod/22/1556520769304-Loan_Model_-_Logistic_Regression_5146553929183771197.mdl",
    "params": {},
    "datasetUrl": "/spotflock-studio/xxxxxx@spotflock.com/1556520761816-Test_Loan.csv"}}}}
                                               

Get Prediction Job Status

Description

Train and predict jobs take some time to complete, so their status can be checked with this API.

Code


predict_job_status_response = c.job_status(predict_response["data"]["jobId"])
 

predict_job_status_response


JSONObject predictJobStatusResponse = c.jobStatus(predictResponse.getJSONObject("data")
.get("jobId"));

predictJobStatusResponse


{
  "jobId":439,
  "appId":1555944250593,
  "name":"weka_classification_predict",
  "library":"weka",
  "service":"Classification",
  "task":"PREDICT",
  "state":"FINISH",
  "startTime":"2019-04-29T05:55:27.324+0000",
  "endTime":"2019-04-29T05:55:33.962+0000"
}

Get Prediction Job Output

Description

Once the job has completed, the job output can be retrieved with this API.

Code


predict_job_output_response = c.job_output(predict_response["data"]["jobId"])
 

predict_job_output_response

{
    'id': 182,
    'jobId': 437,
    'output': {
        'reqId': 211,
        'predFileUrl': '/spotflock-studio-prod/22/1552024447138-prediction.csv'
    }
}

JSONObject predictJobOutputResponse = c.jobOutput(
        predictResponse.getJSONObject("data").get("jobId"));
System.out.println(predictJobOutputResponse.toString());

predictJobOutputResponse

{
    "id": 182,
    "jobId": 437,
    "output": {
        "reqId": 211,
        "predFileUrl": "/spotflock-studio-prod/22/1552024447138-prediction.csv"
    }
}

Get Prediction File Url

Description

Once the predict job has completed, get the prediction file URL from the job output.

Code


pred_file = predict_job_output_response['output']['predFileUrl']
 

String pred_file = predictJobOutputResponse.getJSONObject("output").getString("predFileUrl");
 

pred_file

'/spotflock-studio-prod/22/1552024447138-prediction.csv'

Download Prediction File

Description

You can download the prediction file as CSV using the code below.

Code


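import io

import pandas as pd

# Download the prediction file and parse its CSV text into a DataFrame.
prediction_response = c.download(pred_file)
df = pd.read_csv(io.StringIO(prediction_response.text))
# Save the predictions to a local CSV file.
df.to_csv('pred_file.csv')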

JSONObject predictionResponse = c.download(pred_file);
// Replace the placeholder with your output file path.
FileWriter outputFile = new FileWriter("ENTER YOUR OUTPUT FILE PATH");
// Write the downloaded content to the local file.
outputFile.write(predictionResponse.toString());
outputFile.close();
 


Summary


  • This model helps us determine whether a customer will default on a loan.
  • Financial companies that offer personal loans face risk from customers who default. Predictive analytics can reduce that risk by predicting the outcome of these loans.