BIG DATA ANALYTICS MACHINE LEARNING ALGORITHMS: A SURVEY

¹M. Lija, ²A. Aloysius, ³S. Banumathi

¹M.Phil Scholar, ²,³Assistant Professor,

¹,²,³Department of Computer Science,

¹,²St Joseph’s College, ³Holy Cross College, Trichy-2.

ABSTRACT

Big data analytics is the process of examining large and varied data sets (big data) to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful information that can help organizations make more informed business decisions. Using different algorithms to provide comparisons can yield some remarkable results about the data being used. Making these comparisons gives a manager more insight into business problems and solutions.

Predictive analytics is one of the types of big data analytics; it learns from existing knowledge to predict future performance or patterns. This paper presents a study of machine learning based evolutionary algorithms to help identify the most accurate algorithm applicable to a given problem.

Index Terms— Big Data, Big Data prediction, Big Data Analytics, Evolutionary algorithms, Machine Learning

I. INTRODUCTION

Big data is a collection of data sets that are so large and complex that traditional data processing application software is inadequate to deal with them. A data set is a collection of data. Big data challenges include data capture, data storage, data analysis, search, transfer, visualization, querying, updating and information privacy. Big data has three characteristics, known as Volume, Variety and Velocity. The term “big data” was coined to describe the collection of huge amounts of data in structured, semi-structured, or unstructured formats in large databases, file systems, or other types of repositories, and the processing of this data to produce an analysis and correlation of trends and actions in real time or near-real time. Of these types of data, unstructured data needs more immediate analysis and holds more valuable information to be uncovered, providing a more in-depth understanding of the subject being studied. Unstructured data also poses more challenges in collecting, storing, organizing, classifying, analyzing and managing.

In addition to big data computing capability, rapid advances in intelligent data analytics techniques drawn from the emerging areas of artificial intelligence (AI) and machine learning (ML) provide the ability to process the very large amounts of diverse unstructured data now being generated daily and to extract valuable, actionable information. Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Proper information extraction from this variety of sources requires data mining, machine learning and natural language processing techniques. There are four types of analytics, namely prescriptive, predictive, diagnostic, and descriptive. According to Gartner, predictive analytics had been used by more organizations than the other types.

Generally, machine learning algorithms can be separated into categories according to their use; the main categories are the following:

· Supervised Learning

· Unsupervised Learning

· Semi-supervised Learning

· Reinforcement Learning

· Evolutionary Learning

· Deep Learning

Fig. 1 Machine learning techniques (supervised learning: regression and classification; unsupervised learning: clustering)

Algorithm                    Learning Task

Supervised Learning
  Nearest Neighbor           Classification
  Naive Bayes                Classification
  Decision Trees             Classification
  Linear Regression          Regression
  Support Vector Machine     Dual Use (classification and regression)
  Neural Network             Dual Use (classification and regression)

Unsupervised Learning
  K-Means Clustering         Clustering
  Association Rules          Pattern Detection

Table 1. Machine Learning Algorithms with Learning Task

II. MACHINE LEARNING

Traditional machine learning (ML) techniques have been developed and used to extract useful information from data through training and validation on labeled datasets [6]. Machine learning, a sub-field of artificial intelligence (AI), focuses on enabling computational systems to learn from data how to perform a desired task automatically [11]. The goal of machine learning is to develop methods that can automatically identify patterns in data and then use those patterns to make predictions. Machine learning tasks involve numerical and probabilistic methods [9]; a learning algorithm uses training data to produce a predictive model. Data-adaptive machine learning methods are now recognized throughout the scientific world [8]. Machine learning has many applications, including decision making and prediction, and it is a key enabling technology for data mining and big data techniques in fields as varied as healthcare, science, engineering, business and finance [5]. Fig. 1 illustrates the techniques of machine learning. The tasks can be divided into the following major types:

III. SUPERVISED LEARNING

According to the nature of the available data, there are two main categories of learning tasks: supervised and unsupervised learning. In supervised learning, both the inputs and their desired outputs (labels) are known, and the system learns to map inputs to outputs. Classification and regression are examples of supervised learning: in classification the outputs take discrete values (class labels), while in regression the outputs are continuous. Examples of classification algorithms are k-nearest neighbour, logistic regression, and Support Vector Machine (SVM), while regression examples include Support Vector Regression (SVR), linear regression, and polynomial regression. Some algorithms, such as neural networks, can be used for both classification and regression. Table 1 illustrates the supervised learning algorithms and their learning tasks [10].

a. Nearest Neighbor

The nearest neighbor (NN) technique is very simple, highly efficient and effective in fields such as pattern recognition, text categorization and object recognition. Its simplicity is its main advantage, but its disadvantages cannot be ignored: its memory requirement and computational complexity also matter. Many techniques have been developed to overcome these limitations [13].
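To make the memory and computation trade-off concrete, the following is a minimal k-nearest-neighbor classifier in plain Python. It is an illustrative sketch, not the method of [13]: the Euclidean distance, the majority vote, k = 3 and the toy data are all assumptions. Every query scans the whole training set, which is exactly the cost mentioned in the paragraph above.

```python
from collections import Counter
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training examples.

    The whole training set must be kept in memory and scanned for every
    query, which is the main cost of the nearest neighbor technique.
    """
    # Rank all training points by distance to the query.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    nearest_labels = [label for _, label in ranked[:k]]
    # Majority vote; on a tie, Counter keeps the label encountered first.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-class toy data.
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(X, y, (1.2, 1.4)))  # A
print(knn_predict(X, y, (6.8, 6.1)))  # B
```

The techniques that address these limitations (for example, indexing structures that avoid the full scan) would replace the `sorted` call; the voting logic stays the same.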

b. Naive Bayes

Naive Bayes classifiers are based on Bayes’ theorem together with the assumption of independence among features given a class. All attributes are analysed independently, giving all of them equal importance [18]. A notable feature of naive Bayes is that it runs extremely fast on large and sparse data sets. It has been widely used for Internet traffic classification.
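A minimal sketch of a categorical naive Bayes classifier follows, assuming discrete feature values and Laplace (add-one) smoothing, neither of which is prescribed by the text. Each feature contributes its own likelihood term, which is the independence assumption described above. The flow features and the "web"/"p2p" labels are hypothetical stand-ins for Internet traffic data.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, labels):
    """Count class frequencies and, per class, the frequency of each feature value."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    feature_values = defaultdict(set)      # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values


def predict_naive_bayes(model, row, alpha=1.0):
    """Return the class with the highest log posterior score.

    Each feature adds an independent log-likelihood term (the "naive"
    assumption); alpha is the Laplace smoothing constant.
    """
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior P(class)
        for i, value in enumerate(row):
            numerator = feature_counts[(i, label)][value] + alpha
            denominator = count + alpha * len(feature_values[i])
            score += math.log(numerator / denominator)  # log P(value | class)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical flow records: (protocol, packet size, port class) -> traffic class.
rows = [
    ("tcp", "large", "well_known"),
    ("tcp", "large", "well_known"),
    ("tcp", "small", "well_known"),
    ("udp", "small", "ephemeral"),
    ("udp", "large", "ephemeral"),
    ("tcp", "small", "ephemeral"),
]
labels = ["web", "web", "web", "p2p", "p2p", "p2p"]

model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("tcp", "large", "well_known")))  # web
print(predict_naive_bayes(model, ("udp", "small", "ephemeral")))   # p2p
```

Training is a single counting pass and prediction is a short sum of logarithms, which is why the method scales well to large data sets.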

c. Decision Trees (DT)

Decision trees are a widely used, intuitive method for learning and predicting target features, for both quantitative (numeric) and nominal target attributes. A decision tree is a directed tree with a root node that has no incoming edges; every other node has exactly one incoming edge. Nodes with outgoing edges are internal (test) nodes, and nodes without outgoing edges are leaves, also known as decision nodes [12].
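The sketch below builds a small decision tree with the ID3 rule (split on the attribute with the highest information gain). This is a swapped-in illustration: C4.5, the algorithm implemented in [12], refines ID3 with the gain ratio and support for numeric attributes, which are not shown. Internal nodes are dicts, leaves are class labels, and the weather-style data set is hypothetical.

```python
from collections import Counter
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, features):
    """Recursively build an ID3-style tree.

    Internal nodes: {"feature": index, "children": {value: subtree}}.
    Leaves (decision nodes): plain class labels.
    """
    if len(set(labels)) == 1:
        return labels[0]  # pure subset -> leaf
    if not features:
        return Counter(labels).most_common(1)[0][0]  # nothing left to split on -> majority leaf
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    children = {}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = build_tree(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [f for f in features if f != best],
        )
    return {"feature": best, "children": children}


def classify(tree, row):
    """Follow the unique path from the root to a leaf."""
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["feature"]]]
    return tree


# Hypothetical data: (outlook, windy) -> play?
rows = [
    ("sunny", "no"), ("sunny", "yes"),
    ("rainy", "no"), ("rainy", "yes"),
    ("overcast", "no"), ("overcast", "yes"),
]
labels = ["yes", "no", "yes", "no", "yes", "yes"]

tree = build_tree(rows, labels, features=[0, 1])
print(tree["feature"])                      # 1 ("windy" has the larger gain)
print(classify(tree, ("sunny", "yes")))     # no
print(classify(tree, ("overcast", "yes")))  # yes
```

As written, `classify` raises `KeyError` for an attribute value never seen in training; a fuller implementation would fall back to the majority class at that node.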

d. Linear Regression

Regression analysis mainly focuses on finding the association between a dependent variable and one or more independent variables, in order to predict the value of the dependent variable from the independent variables. Regression models are basically divided into univariate and multivariate models, and each is further divided into linear and nonlinear models [15].
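For the univariate linear case, the least-squares fit has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the two means. A minimal sketch, assuming ordinary least squares (the text does not name a fitting method) and hypothetical data:

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x with one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the common 1/n factor cancels.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x  # line passes through (mean_x, mean_y)
    return b0, b1


def predict_linear(b0, b1, x):
    """Predicted value of the dependent variable for input x."""
    return b0 + b1 * x


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]  # hypothetical data lying exactly on y = 1 + 2x
b0, b1 = fit_simple_linear_regression(xs, ys)
print(b0, b1)                       # 1.0 2.0
print(predict_linear(b0, b1, 6.0))  # 13.0
```

The multivariate linear case replaces the two sums with the normal equations over a design matrix; nonlinear models need iterative fitting instead.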

e. Support Vector Machines

Support Vector Machines (SVM) are an extensively used supervised learning technique that is notable for being both practical and theoretically sound. The SVM approach is rooted in statistical learning theory and is systematic: e.g., training an SVM has a unique solution (since it involves optimization of a concave function) [11]. A support vector machine is primarily a classification method. An SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. The support vector machine has been developed as a powerful tool for classification and regression in noisy, complex domains. The two key features of support vector machines are generalization theory, which leads to a principled way to choose a hypothesis, and kernel functions, which introduce non-linearity in the hypothesis space without explicitly requiring a non-linear algorithm [14].
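The sketch below trains a linear soft-margin SVM by full-batch sub-gradient descent on the regularized hinge loss, which rewards a wide gap between the two classes. It is an illustrative assumption, not the method of [11] or [14]: many SVM solvers instead optimize the dual problem, and kernel functions are omitted here, so only a linear boundary is learned. The regularization weight `lam`, learning rate `lr`, epoch count and toy data are hypothetical values chosen for the example.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM trained by full-batch sub-gradient descent.

    Minimizes  lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * (w . x_i + b)),
    where every label y_i is +1 or -1.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge term is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    """Non-probabilistic binary decision: the sign of w . x + b."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical, linearly separable toy data with labels +1 / -1.
X = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
print([svm_predict(w, b, xi) for xi in X])  # [1, 1, 1, -1, -1, -1]
print(svm_predict(w, b, (4.0, 1.0)))        # 1
```

Because the objective is convex, sub-gradient descent with a suitable step size approaches the same optimum regardless of starting point; a kernelized version would replace the dot products with kernel evaluations in the dual form.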

IV. UNSUPERVISED LEARNING

Clustering is the basic method in unsupervised learning. In clustering, the learning task is to group examples into “clusters” on the basis of perceived relationships, without requiring a labeled training set. Clustering is used to find groups of inputs that have similar characteristics. Intuitively, clustering is similar to unsupervised classification: whereas classification in supervised learning assumes the availability of a correctly labeled training set, the unsupervised task of clustering seeks to discover the structure of the input data directly [8]. Recommendation services that meet users' requirements, and real-time technologies for clustering-based analysis, are growing in importance along with big data analysis technologies [16]. Table 1 lists the unsupervised learning algorithms and their learning tasks.
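K-means clustering, listed in Table 1, is a common instance of the clustering task described above. The following minimal sketch of Lloyd's algorithm alternates an assignment step and a centroid-update step until the centroids stop moving. Seeding with the first k points, the iteration cap and the data are simplifying assumptions; practical implementations typically use random or k-means++ initialization.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's k-means: alternate assignment and centroid-update steps.

    Initial centroids are simply the first k points (a simplifying
    assumption). Uses math.dist, available in Python 3.8+.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its member points.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assignment


# Hypothetical unlabeled data with two visible groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (2.0, 1.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

No labels are supplied to `kmeans`; the grouping comes only from distances between the inputs, which is what distinguishes it from the supervised classifiers in Section III.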

V. CONCLUSION

This paper presents an overview of big data analytics machine learning algorithms, particularly supervised and unsupervised algorithms for big data. The quantity of data has been rising, and analyzing data sets has become more competitive. Machine learning analytics is the combination of analytics techniques and decision optimization. This review should support researchers in this area, as it provides a wide collection of previous research. This review is theoretical in nature. In future work, SVM-based algorithms in machine learning could be enhanced.

REFERENCES

[1] “Big Data Tutorial”, https://intellipaat.com/blog/big-data-tutorial-for-beginners/

[2] “Big Data”, https://en.wikipedia.org/wiki/Big_data

[3] “Big Data”, http://searchcloudcomputing.techtarget.com/definition/big-data-Big-Data

[4] Bogdan Ionescu, Dan Ionescu, Cristian Gadea, Bogdan Solomon and Mircea Trifan, “An Architecture and Methods for Big Data Analysis”, vol. 356, pp. 491-514, Springer, 2014.

[5] M. Fatima and Pasha, “Survey of Machine Learning Algorithms for Disease Diagnostic”, Journal of Intelligent Learning Systems and Applications, vol. 9, pp. 1-16, March 2017.

[6] Shan Suthaharan, “Big Data Classification: Problems and Challenges in Network Intrusion Prediction with Machine Learning”, Performance Evaluation Review, vol. 41, no. 4, March 2014.

[7] Neha Khan, Mohd Shahid Husain, Mohd Rizwan Beg, “Big data classification using evolutionary techniques: a survey”, IEEE Conference on Engineering and Technology (ICETECH), 2015.

[8] M. I. Jordan, T. M. Mitchell, “Machine learning: Trends, perspectives, and prospects”, Science (sciencemag.org), vol. 349, issue 6245, 2015.

[9] G. Vaitheeswaran, L. Arockiam, “Machine Learning Based Approach to Enhance the Accuracy of Sentiment Analysis on Tweets”, International Journal of Advanced Research in Computer Science and Management Studies, vol. 4, issue 5, 2016.

[10] Alexandra L’Heureux, Katarina Grolinger, Hany F. ElYamany, Miriam A. M. Capretz, “Machine Learning with Big Data: Challenges and Approaches”, IEEE Access, 2017, DOI 10.1109/ACCESS.2017.2696365.

[11] S. Banumathi, A. Aloysius, “Big data prediction using evolutionary techniques: a survey”, Journal of Emerging Technologies and Innovative Research (JETIR), Sep. 2016.

[12] Wei Dai, Wei Ji, “A MapReduce Implementation of C4.5 Decision Tree Algorithm”, International Journal of Database Theory and Applications, vol. 7, no. 1, 2014.

[13] Farhad Soleimanian Gharehchopogh, Seyyed Reza Khaze, Isa Maleki, “A New Approach in Bloggers Classification with Hybrid of K-Nearest Neighbor and Artificial Neural Network Algorithms”, Indian Journal of Science and Technology, vol. 8(3), pp. 237-246, February 2015.

[14] P. Sheela Rani, S. Shalini, J. Rukmani, A. Shanthini, “Energy efficient scheduling of MapReduce for evolving big data applications”, International Journal of Advanced Research in Computer and Communication Engineering, vol. 5, issue 2, 2016.

[15] Ramya MG, Chetan Balaji, Girish L, “Environment change prediction to adapt climate smart agriculture using big data analytics”, IJARCT.ORG, 2015.

[16] Se-Hoon Jung, Jong-Chan Kim, Chun-Bo Sim, “Prediction Data Processing Scheme using an Artificial Neural Network and Data Clustering for Big Data”, IEEE Xplore, 2015.

[17] Charles W. Anderson, Minwoo Lee and Daniel L. Elliott, “Faster Reinforcement Learning After Pretraining Deep Networks to Predict State Dynamics”, IEEE Xplore, 2015.

[18] Amir Gandomi, Murtaza Haider, “Beyond the hype: big data concepts, methods, and analytics”, International Journal of Information Management, 2015.