Assume that you are employed to help a credit card company detect potential fraud cases, so that customers can be sure they won't be charged for items they did not purchase. You are given a dataset containing transactions between people, together with labels indicating whether each transaction is fraudulent or not, and you are asked to distinguish between the two. This is the case we are going to deal with. Our ultimate goal is to tackle this situation by building classification models to classify and distinguish fraudulent transactions.

Why Classification?

Classification is the process of predicting discrete variables (binary, yes/no, etc.). Given our case, deploying a classification model is a better fit than any other type of model.

The steps we are going to follow:

- Importing the required packages into our python environment.
- Processing the data to our needs and Exploratory Data Analysis.
- Building six types of classification models.
- Evaluating the created classification models using the evaluation metrics.

We are using python for this project because it makes a wide range of methods effortless to use, has an extensive collection of packages for machine learning, and is easy to learn. Demand for python in today's job market is higher than for any other programming language, and companies like Netflix use python for data science and many other applications. With that, let's dive into the coding part.

Importing the Packages

For this project, our primary packages are going to be Pandas to work with data, NumPy to work with arrays, scikit-learn for splitting the data and for building and evaluating the classification models, and finally the xgboost package for the XGBoost classifier algorithm. Let's import all of our primary packages into our python environment.

In this section we build six different types of classification models, from the Decision tree model to the XGBoost model.

Starting with the decision tree, we have used the 'DecisionTreeClassifier' algorithm to build the model. Inside the algorithm, we have set 'max_depth' to 4, which means we allow the tree to split at most four levels deep, and 'criterion' to 'entropy', which specifies the function used to measure the quality of each split. Finally, we have fitted the model and stored the predicted values in the 'tree_yhat' variable.

Next, we have built the K-Nearest Neighbors model using the 'KNeighborsClassifier' algorithm with 'n_neighbors' set to 5. The value of 'n_neighbors' is chosen arbitrarily here, but it can be tuned by iterating over a range of values. We then fitted the model and stored the predicted values in the 'knn_yhat' variable.

There is not much to explain about the code for Logistic regression, as we kept the model simple by using the 'LogisticRegression' algorithm with its default settings and, as usual, fitted it and stored the predicted values in the 'lr_yhat' variable.

We built the Support Vector Machine model using the 'SVC' algorithm without passing any arguments, so it falls back on the default kernel, the 'rbf' kernel. After fitting the model, we stored the predicted values in 'svm_yhat'.

The next model is the Random forest model, which we built using the 'RandomForestClassifier' algorithm, setting 'max_depth' to 4 just as we did for the decision tree model, then fitting it and storing the predicted values in 'rf_yhat'. Remember that the main difference between a decision tree and a random forest is that a decision tree is a single model built from the entire dataset, whereas a random forest builds multiple trees on randomly selected subsets of the features and combines them. That is why a random forest is generally preferred over a single decision tree.

Finally, we built the XGBoost model using the 'XGBClassifier' algorithm provided by the xgboost package, again setting 'max_depth' to 4, and fitted it and stored the predicted values in 'xgb_yhat'.

With that, we have successfully built our six types of classification models and walked through the code for easy understanding. Our next step is to evaluate each of the models and find which one is the most suitable for our case.

Evaluation

As mentioned before, in this step we evaluate the models we built using the evaluation metrics provided by the scikit-learn package. Our main objective in this process is to find the best model for our given case. The evaluation metrics we are going to use are the accuracy score, the F1 score, and finally the confusion matrix.

Accuracy score

The accuracy score is one of the most basic evaluation metrics and is widely used to evaluate classification models. It is calculated simply by dividing the number of correct predictions made by the model by the total number of predictions (the result can be multiplied by 100 to express it as a percentage).
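Since the article's code is shown only as screenshots, here is a minimal sketch of the imports and the train/test split described above. The tiny synthetic DataFrame (and its column names) is an assumption standing in for the real transactions CSV, which the article would load with `pd.read_csv(...)`:

```python
import numpy as np                                   # arrays
import pandas as pd                                  # data handling
from sklearn.model_selection import train_test_split

# A tiny synthetic frame standing in for the real transactions dataset;
# 'Class' plays the role of the fraud label (1 = fraud, 0 = genuine).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)),
                  columns=['V1', 'V2', 'V3', 'Amount'])
df['Class'] = rng.integers(0, 2, size=100)

# Separate features from the label, then hold out 20% for evaluation.
X = df.drop('Class', axis=1).values
y = df['Class'].values
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```

The `random_state` argument fixes the shuffle so the split is reproducible across runs.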
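The six models described above can be sketched as follows, using the parameters the article mentions ('max_depth' of 4, 'entropy' criterion, 'n_neighbors' of 5, default 'rbf' kernel, and the article's `*_yhat` variable names). The `make_classification` data is a stand-in for the real credit card dataset, and the xgboost import is guarded in case the package is not installed:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (the article works on the credit card dataset).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 1. Decision tree: max_depth=4, criterion='entropy'
tree = DecisionTreeClassifier(max_depth=4, criterion='entropy')
tree.fit(X_train, y_train)
tree_yhat = tree.predict(X_test)

# 2. K-Nearest Neighbors: n_neighbors=5
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
knn_yhat = knn.predict(X_test)

# 3. Logistic regression with default settings
lr = LogisticRegression()
lr.fit(X_train, y_train)
lr_yhat = lr.predict(X_test)

# 4. Support Vector Machine: default 'rbf' kernel
svm = SVC()
svm.fit(X_train, y_train)
svm_yhat = svm.predict(X_test)

# 5. Random forest: max_depth=4
rf = RandomForestClassifier(max_depth=4, random_state=0)
rf.fit(X_train, y_train)
rf_yhat = rf.predict(X_test)

# 6. XGBoost: max_depth=4 (requires the xgboost package)
try:
    from xgboost import XGBClassifier
    xgb = XGBClassifier(max_depth=4)
    xgb.fit(X_train, y_train)
    xgb_yhat = xgb.predict(X_test)
except ImportError:
    xgb_yhat = None  # xgboost not installed in this environment
```

Each `*_yhat` array holds that model's predicted class labels for the held-out test set, ready to be compared against `y_test` in the evaluation step.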
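The three evaluation metrics can be sketched like this, with toy label arrays standing in for `y_test` and a model's predictions:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix

# Toy true labels and predictions standing in for y_test and a *_yhat array.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_pred = np.array([0, 1, 1, 1, 0, 0, 0, 1])

# Accuracy = correct predictions / total predictions.
acc = accuracy_score(y_true, y_pred)   # 6 correct out of 8 -> 0.75
assert acc == np.mean(y_true == y_pred)

# F1 score: harmonic mean of precision and recall; more informative than
# accuracy on imbalanced data such as fraud detection.
f1 = f1_score(y_true, y_pred)

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
```

Running each model's `*_yhat` through these three functions and comparing the results is how the most suitable model would be picked.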