Data Analysis For Machine Learning with Python (part-4)

05/11/19   15 minutes read     423 Naren Allam

In this article, you will learn about the types of machine learning and how to build a complete model in practice, including training, testing, and model evaluation, on the mobiles data set.

Earlier articles in this data analysis series:
Data Analysis For Machine Learning with Python (part-1)
Data Analysis For Machine Learning with Python (part-2)
Data Analysis For Machine Learning with Python (part-3)

Machine Learning :

Machine learning is a class of software that can improve itself with exposure to useful data. It is the basis of artificial intelligence, in which machines develop models on their own to process data and make predictions.

The following are common types of machine learning :
  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning

    1.Supervised Learning :
    In supervised learning, a machine learning model learns from past input data that comes with known outputs (labels) and then predicts the output for new inputs. In other words, supervised learning means predicting the label, or target variable, from the input values, using labelled training data.

    Types of Supervised Learning :

    1.Classification : Classification problems normally have a categorical output such as ‘yes’ or ‘no’, ‘1’ or ‘0’, ‘True’ or ‘False’.

    2.Regression : Regression is a predictive modelling technique that estimates the relationship between a dependent (target) variable and one or more independent (predictor) variables. It is used for forecasting, time-series modelling, and finding causal relationships between variables.

    What is an Algorithm : An algorithm is a series of steps for solving a problem, executing a task, or performing a calculation. In machine learning it is, in essence, a mathematical procedure that processes information and produces a desired result.

    Algorithms in Classification-Supervised Learning:
    1.Linear Classifiers: Logistic Regression, Naive Bayes Classifier.
    2.Nearest Neighbor
    3.Support Vector Machines
    4.Decision Trees
    5.Boosted Trees
    6.Random Forest
    7.Neural Networks

    Algorithms in Regression-Supervised Learning:
    There are many regression analysis techniques, but three widely used regression models are:
    1.Linear Regression
    2.Logistic Regression (despite its name, it predicts a categorical outcome and is usually applied to classification)
    3.Polynomial Regression
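
As a minimal sketch of the difference between the two task types, the snippet below fits a regression model to a numeric target and a classifier to a binary label. The toy data and all variable names are assumptions made for illustration and are not part of the mobiles data set; scikit-learn's LogisticRegression is used here as the classifier.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy inputs (assumed for illustration): ten values of a single feature.
X = np.arange(10).reshape(-1, 1)

# Regression: the target is a continuous number (here exactly y = 2x + 1).
y_numeric = 2 * X.ravel() + 1
reg = LinearRegression().fit(X, y_numeric)
slope, intercept = reg.coef_[0], reg.intercept_

# Classification: the target is a category (0 for x < 5, 1 otherwise).
y_label = (X.ravel() >= 5).astype(int)
clf = LogisticRegression().fit(X, y_label)
predicted_classes = clf.predict(np.array([[0], [9]]))

print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("predicted classes for x=0 and x=9:", predicted_classes)
```

The regression model returns a number for any input, while the classifier returns one of the two class labels.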

    2.Unsupervised Learning :
    In unsupervised learning, the model is given unlabeled input data and the algorithm is left to find structure in that data without guidance.
    The most common unsupervised learning method is cluster analysis, which is used in exploratory data analysis to find hidden patterns or groupings in data.

    Clustering : The task of grouping related data points together without labelling them. An example is grouping patient records with similar symptoms without knowing what the symptoms indicate.

    Algorithms in Unsupervised Learning:
    Some of the algorithms used in unsupervised learning are:
    1.K-Means Clustering
    2.Hierarchical Clustering
    3.Hidden Markov model
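
The clustering idea can be sketched with K-Means on a handful of points. This is only an illustration under stated assumptions: the six 2-D points are made up, and the number of clusters (2) is chosen by hand.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated groups of 2-D points (toy data assumed for illustration).
points = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # group near (1, 1)
    [8.0, 8.0], [8.1, 7.9], [7.8, 8.2],   # group near (8, 8)
])

# Ask for two clusters; no labels are supplied, only the points themselves.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

print("cluster label per point:", labels)
print("cluster centres:", kmeans.cluster_centers_)
```

The cluster numbers themselves carry no meaning; what matters is which points end up sharing a label.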

    3.Reinforcement Learning :
    Reinforcement learning teaches the machine to make decisions for itself by using a system of rewards. It is a computational approach to understanding and automating goal-directed learning and decision-making.
    It is distinguished from other computational approaches by its emphasis on an agent learning from direct interaction with its environment, without relying on a predefined labelled dataset.

    Elements of Reinforcement Learning :
    Besides the agent and the environment, a reinforcement learning system has four sub-elements :
    1.Policy: It defines the learning agent’s way of behaving at a given time.
    2.Reward function: It defines the goal in a reinforcement learning problem.
    3.Value function: It specifies what is good in the long run.
    4.Model of the environment (optional): Models are used for planning, by which we mean any way of deciding on a course of action by considering possible future situations before they are actually experienced.

    How does it work?
    Reinforcement learning is about finding the optimal way of making decisions (actions) so that the reward R is maximised. The reward is a feedback signal that shows how well the agent is doing at a given time step.

    The action A that the agent takes at every time step is a function of both the reward and the state S, which is a description of the environment the agent is in. The mapping from environment states to actions is the policy P. The policy defines the agent’s way of behaving at a certain time, given a certain situation.

    We also have a value function V, which measures how good each state is. It differs from the reward in that the reward signal indicates what is good in the immediate sense, while the value function indicates how good it is to be in a state in the long run.

    Finally, we have a model M, which is the agent’s representation of the environment, that is, how the agent thinks the environment is going to behave.
    The whole reinforcement learning environment can be described with an MDP.

    Markov Decision Processes (MDPs) :
    Markov decision processes are mathematical frameworks for describing an environment in reinforcement learning, and almost all RL problems can be formalized as MDPs. An MDP consists of a finite set of environment states S, a set of possible actions A(s) in each state, a real-valued reward function R(s), and a transition model P(s’ | s, a). However, in real-world environments prior knowledge of the environment dynamics is often unavailable. Model-free RL methods come in handy in such cases.
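
To make the ingredients of an MDP concrete, here is a small sketch that writes the states, actions, reward function and transition model as plain Python dictionaries and solves it with value iteration. The two-state example, the discount factor gamma and every name in the code are assumptions chosen for illustration. Value iteration needs the transition model to be known, so it applies only when the environment dynamics are available.

```python
# A tiny MDP with hypothetical states, actions and rewards (all assumed).
states = ["low", "high"]
actions = {"low": ["wait", "charge"], "high": ["wait", "work"]}

# Reward function R(s): reward received for being in state s.
reward = {"low": 0.0, "high": 1.0}

# Transition model P(s' | s, a): probability of each next state s'.
transition = {
    ("low", "wait"):   {"low": 1.0},
    ("low", "charge"): {"high": 0.9, "low": 0.1},
    ("high", "wait"):  {"high": 1.0},
    ("high", "work"):  {"high": 0.5, "low": 0.5},
}

def expected_value(values, s, a):
    """Expected value of the next state after taking action a in state s."""
    return sum(p * values[s2] for s2, p in transition[(s, a)].items())

def value_iteration(gamma=0.9, tolerance=1e-8):
    """Return state values V(s) and a greedy policy for the MDP above."""
    values = {s: 0.0 for s in states}
    while True:
        # Bellman optimality backup for every state.
        new_values = {
            s: reward[s] + gamma * max(expected_value(values, s, a) for a in actions[s])
            for s in states
        }
        change = max(abs(new_values[s] - values[s]) for s in states)
        values = new_values
        if change < tolerance:
            break
    # The greedy policy picks the action with the best expected next value.
    policy = {
        s: max(actions[s], key=lambda a: expected_value(values, s, a))
        for s in states
    }
    return values, policy

values, policy = value_iteration()
print(values)
print(policy)
```

In this toy problem the greedy policy is to charge when in the "low" state and to wait when in the "high" state.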

    Reinforcement Learning algorithms :
    2.SARSA (State-Action-Reward-State-Action)
    3.Deep Q-Networks
    4.DDPG(Deep Deterministic Policy Gradient)
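
As an illustration of a model-free method, the sketch below applies a single tabular SARSA update, Q(s, a) ← Q(s, a) + α [r + γ Q(s’, a’) − Q(s, a)]. The state and action names, the learning rate alpha and the discount factor gamma are hypothetical values picked for the example.

```python
from collections import defaultdict

def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.5, gamma=0.9):
    """Apply one SARSA step to the action-value table q in place."""
    # The target uses the action actually chosen in the next state (on-policy).
    target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

# Action values default to 0.0 for unseen (state, action) pairs.
q_table = defaultdict(float)

# One hypothetical transition: in state "s0" the agent took "right",
# received reward 1.0, landed in "s1" and chose "right" again.
new_value = sarsa_update(q_table, "s0", "right", 1.0, "s1", "right")
print(new_value)
```

No transition model appears anywhere in the update: the agent only needs the state, action, reward, next state and next action it actually experienced, which is where the name SARSA comes from.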

    Building a model on the mobiles data set

    Import the packages required to build the model.

    PYTHON
    # Required packages.
    # StringIO comes from the standard library; sklearn.externals.six
    # is no longer available in recent scikit-learn releases.
    from io import StringIO
    from sklearn.tree import DecisionTreeClassifier
    from sklearn import tree
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import confusion_matrix
    from sklearn.tree import export_graphviz
    from IPython.display import Image
    from pydot import graph_from_dot_data
    import pandas as pd
    import numpy as np
    import pydotplus

    Select predictor and response variables.

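    PYTHON
    # df_final is assumed to be the prepared DataFrame from the earlier parts of this series.
    # Predictors: every column except the target.
    X = df_final.loc[:, df_final.columns != 'target_rating']
    # Response: kept as a one-column DataFrame so that y_train.target_rating works later.
    y = df_final.loc[:, df_final.columns == 'target_rating']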

    Splitting the dataset into train and test datasets

    PYTHON
    # Split the dataset into train (80%) and test (20%) datasets.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
    # Fit a decision tree classifier on the training data.
    dt = DecisionTreeClassifier()
    dt = dt.fit(X_train, y_train)
    # Predict on both the training and the test data.
    y_pred_train = dt.predict(X_train)
    y_pred_test = dt.predict(X_test)
    # y_pred_train, y_pred_test

    Model Evaluation

    PYTHON
    # Confusion matrix on the training dataset
    confusion_matrix(y_true=list(y_train.target_rating), y_pred=list(y_pred_train))
    # Accuracy on training dataset
    # (unpacking four values from .ravel() assumes a binary target)
    tn_train, fp_train, fn_train, tp_train = confusion_matrix(y_true=list(y_train.target_rating), y_pred=list(y_pred_train)).ravel()
    tn_train, fp_train, fn_train, tp_train
    accuracy_train = (tn_train+tp_train)/(tn_train+fp_train+fn_train+tp_train)*100
    "The accuracy on training data is {} %".format(accuracy_train)
    # 'The accuracy on training data is 100.0 %'
    # Accuracy on test dataset
    tn_test, fp_test, fn_test, tp_test = confusion_matrix(y_true=list(y_test.target_rating), y_pred=list(y_pred_test)).ravel()
    tn_test, fp_test, fn_test, tp_test
    # (17, 2, 1, 44)

    The accuracy on test data

    PYTHON
    accuracy_test = (tp_test+tn_test)/(tn_test+fp_test+fn_test+tp_test)*100
    "The accuracy on test data is {} %".format(accuracy_test)

    'The accuracy on test data is 95.3125 %'
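
The arithmetic behind the 95.3125 % figure can be checked by hand from the test-set counts (17, 2, 1, 44) shown earlier. The short sketch below repeats that calculation; precision and recall are not computed in the original article and are included here only as an assumed illustration of what else the same four counts can tell us.

```python
# Counts reported for the test set: (tn, fp, fn, tp) = (17, 2, 1, 44).
tn, fp, fn, tp = 17, 2, 1, 44

total = tn + fp + fn + tp                # 64 test samples in all
accuracy = (tp + tn) / total * 100       # share of all predictions that are correct
precision = tp / (tp + fp) * 100         # share of predicted positives that are correct
recall = tp / (tp + fn) * 100            # share of actual positives that are found

print("accuracy: {:.4f} %".format(accuracy))
print("precision: {:.2f} %".format(precision))
print("recall: {:.2f} %".format(recall))
```

With 61 correct predictions out of 64, the accuracy works out to 61 / 64 = 0.953125, which matches the value printed above.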

    In this article, we saw how to build a complete model in practice, with training, testing, and model evaluation, on the mobiles data set.