BIG DATA ANALYTICS MACHINE LEARNING ALGORITHMS: A SURVEY
2 A. Aloysius, 3 S. Banumathi
Scholar, 2,3 Assistant Professor,
1,2,3 Department of Computer Science,
1,2 St Joseph’s College, 3 Holy Cross College, Trichy-2.
Big data analytics is the process of examining large and varied data sets to
uncover hidden patterns, unknown correlations, market trends, customer
preferences and other useful information that can help organizations make more
informed business decisions. Comparing the results of different algorithms on
the same data can reveal a good deal about that data and give a manager more
insight into a business problem. Predictive analytics is one type of big data
analytics: it learns from existing knowledge and anticipates future performance
or patterns. This paper presents a study of machine learning based evolutionary
algorithms, with the aim of identifying the most applicable and accurate
algorithm for a given problem.
Index Terms— Big
Data, Big Data Prediction, Big Data Analytics, Evolutionary Algorithms, Machine Learning
Big data is a collection of data sets so large and complex that conventional
data processing software cannot handle them. A data set is simply a collection
of data. Big data challenges include data capture, storage, analysis, search,
transfer, visualization, querying, updating and information privacy. Big data
has three defining characteristics: Volume, Variety and Velocity. The term
“big data” was coined to describe the collection of huge amounts of data in
structured, semi-structured or unstructured formats in large databases, file
systems or other types of repositories, together with the processing of this
data to produce an analysis and synthesis of trends and actions in real time or
near real time. Of these kinds of data, unstructured data needs the most
immediate analysis and holds the most valuable information still to be
uncovered, offering a more in-depth understanding of the subject under study.
Unstructured data also poses the greatest challenges in collecting, storing,
organizing, classifying, analyzing and managing. In addition to big data
computing capacity, rapid advances in intelligent data analytics techniques
drawn from the emerging areas of artificial intelligence (AI) and machine
learning (ML) make it possible to process the very large amounts of diverse
unstructured data now generated daily and to extract valuable, actionable
information. Machine learning explores the study and construction of algorithms
that can learn from data and make predictions on it. Extracting the right
information from such a variety of sources requires data mining, machine
learning and natural language processing techniques. There are four types of
analytics: prescriptive, predictive, diagnostic and descriptive. According to
Gartner, most organizations have used predictive analytics more than the other
types.
Generally, machine learning algorithms can be separated into categories
according to their use; the main categories are shown in Fig. 1.
Fig. 1. Machine learning techniques
Table 1. Machine learning algorithms with learning task
Machine Learning (ML) techniques have been developed and used for extracting
useful information from data through training and validation using labeled
datasets [6]. Machine learning, a sub-field of artificial intelligence (AI),
focuses on enabling computational systems to learn from data how to perform a
desired task automatically [11]. The goal of machine learning is to develop
methods that can automatically identify patterns in data and then use those
patterns to make predictions. Machine learning tasks involve statistical and
probabilistic methods [9]: a learning algorithm processes training data and
produces a predictive model. Data-adaptive machine learning methods can be
found throughout science [8]. Machine learning has many applications, including
decision making and prediction, and it is a key enabling technology in the
deployment of data mining and big data techniques in the varied fields of
healthcare, science, engineering, business and finance [5]. Fig. 1 illustrates
the techniques of machine learning. The tasks can be grouped into the
following major types:
According to the nature of the available data, there are two main categories
of learning tasks. The first is supervised learning, in which both the inputs
and their required outputs (labels) are known and the system learns to map
inputs to outputs. Classification and regression are examples of supervised
learning: in classification the outputs take discrete values (class labels),
while in regression the outputs are continuous. Examples of classification
algorithms are k-nearest neighbour, logistic regression and Support Vector
Machine (SVM), while regression examples include Support Vector Regression
(SVR), linear regression and polynomial regression. Some algorithms, such as
neural networks, can be used for both classification and regression. Table 1
lists the supervised learning algorithms and their learning tasks.
a. Nearest Neighbor
The nearest neighbor (NN) technique is very simple, highly efficient and
successful in fields such as pattern recognition, text categorization and
object recognition. Its simplicity is its main benefit, but its disadvantages
cannot be ignored: the memory requirement and the computational complexity
also matter. Many techniques have been developed to overcome these limitations.
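As a minimal sketch of the nearest neighbor idea, the following Python code classifies a query point by majority vote among its k closest training points. The function name `knn_predict`, the choice of Euclidean distance and the toy data are illustrative assumptions and are not taken from the surveyed work.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs (hypothetical layout).
    """
    # Rank all training examples by Euclidean distance to the query point.
    neighbours = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    # Count the labels of the k closest examples and return the most frequent.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy two-class data set, made up for illustration.
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]

print(knn_predict(train, (1.1, 1.0)))  # a
print(knn_predict(train, (5.1, 5.0)))  # b
```

Every call scans and sorts the whole training set, which makes the memory and computation costs of the plain technique visible: nothing is learned in advance, so all examples must be stored and compared at prediction time.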
b. Naive Bayes
Naive Bayes classifiers are based on Bayes’ theorem and assume independence
among the features given the class. All attributes are analysed independently
and given equal importance [18]. A notable feature of Naive Bayes is that it
runs extremely fast on large, sparse data sets. It has been widely used, for
example, for the classification of Internet traffic.
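The independence assumption can be sketched in a few lines: the score of a class is its prior probability multiplied by the likelihood of each feature value given that class, computed here in log space. The flow records, feature values, class names and the add-one (Laplace) smoothing below are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
import math

def train_nb(rows, labels):
    """Collect the counts needed by a categorical Naive Bayes classifier."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab

def predict_nb(model, row):
    """Return the class with the highest log posterior score for `row`."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior of the class
        for i, value in enumerate(row):
            # Each feature contributes independently (the "naive" assumption);
            # add-one smoothing avoids zero probabilities for unseen values.
            numerator = value_counts[(i, label)][value] + 1
            denominator = n_c + len(vocab[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical traffic flows described by (protocol, port range).
rows = [("tcp", "low"), ("tcp", "low"), ("udp", "high"),
        ("udp", "high"), ("tcp", "high"), ("udp", "low")]
labels = ["web", "web", "p2p", "p2p", "web", "p2p"]

model = train_nb(rows, labels)
print(predict_nb(model, ("tcp", "low")))   # web
print(predict_nb(model, ("udp", "high")))  # p2p
```

Training is a single counting pass over the data, which is consistent with the speed noted above.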
c. Decision Trees (DT)
Decision trees are a widely used, intuitive method for learning and predicting
a target feature, for both quantitative and nominal target attributes. A
decision tree is a directed tree with a root node that has no incoming edges;
all other nodes have exactly one incoming edge and are known as decision
nodes [12].
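The text above does not say how the attribute tested at a node is chosen. One common criterion, assumed here, is information gain (the reduction in label entropy after a split), as used by ID3-style trees. The sketch below only selects the best attribute for a single node; the data and function names are hypothetical.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on the attribute at index `attr`."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels):
    """Index of the attribute with the highest information gain."""
    return max(range(len(rows[0])),
               key=lambda a: information_gain(rows, labels, a))

# Toy nominal data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "hot"), ("rain", "mild")]
labels = ["no", "no", "yes", "yes"]

print(best_attribute(rows, labels))  # 0, because outlook separates the classes
```

A full tree would apply the same selection recursively to each subset until the labels in a subset are pure or no attributes remain.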
d. Linear Regression
Regression analysis focuses on finding the association between a dependent
variable and one or more independent variables, and on predicting the value of
the dependent variable from those independent variables. Regression models are
broadly divided into univariate and multivariate models, each of which is
further divided into linear and nonlinear.
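For the univariate linear case, the best-fitting line can be computed in closed form. The sketch below assumes ordinary least squares as the fitting criterion; the function name `fit_line` and the sample points are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Points generated from y = 2x + 1, so the fit should recover that line.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)

# Predict the dependent variable for a new value of the independent variable.
print(slope * 5.0 + intercept)
```

The multivariate case replaces the two sums with a matrix equation, and nonlinear models generally require iterative optimization instead of a closed form.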
e. Support Vector Machines
Support Vector Machines (SVM) are an extensively used supervised learning
technique, remarkable for being both practical and theoretically sound. The
SVM approach is rooted in statistical learning theory and is systematic: for
example, training an SVM has a unique solution, since it involves the
optimization of a convex function [11]. A support vector machine is a
classification method. An SVM training algorithm builds a model that assigns
new examples to one category or the other, making it a non-probabilistic
binary linear classifier. An SVM model represents the examples as points in
space, mapped so that the examples of the separate categories are divided by a
clear gap that is as wide as possible. The support vector machine has been
developed as a robust tool for classification and regression in noisy, complex
domains. Its two key features are generalization theory, which leads to a
principled way to choose a hypothesis, and kernel functions, which introduce
non-linearity into the hypothesis space without explicitly requiring a
non-linear algorithm [14].
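The role of kernel functions can be illustrated with a small check. A kernel returns the dot product that two points would have in a higher-dimensional feature space without ever building that space. The degree-2 polynomial kernel and the explicit feature map below are a standard textbook pairing chosen here as an assumption; the source does not name a particular kernel.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2

def poly2_features(x):
    """Explicit feature map for 2-D inputs that corresponds to poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, z = (1.0, 2.0), (3.0, 4.0)

implicit = poly2_kernel(x, z)                          # computed in input space
explicit = dot(poly2_features(x), poly2_features(z))   # computed in feature space

# The two values agree up to floating-point rounding.
print(implicit, explicit)
```

Because a linear SVM depends on the data only through dot products, replacing each dot product with a kernel evaluation yields a non-linear decision boundary while the training algorithm itself stays linear.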
IV. UNSUPERVISED LEARNING
Clustering is the basic method in unsupervised learning. In clustering, the
learning task is to group examples into “clusters” on the basis of perceived
similarity, without requiring a labeled training set. Clustering is used to
find groups of inputs that have similar characteristics. Intuitively,
clustering is akin to unsupervised classification: while classification in
supervised learning assumes the availability of a correctly labeled training
set, the unsupervised task of clustering seeks to identify the structure of
the input data directly [8]. Recommendation services that meet users’
requirements and real-time technology for clustering analysis are growing in
importance, along with big data analysis technologies [16]. Table 1 lists the
unsupervised learning algorithms and their learning tasks.
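No specific clustering algorithm is named above. As one widely used example, assumed here for illustration, k-means alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The initial centroids and the data are hypothetical, and the starting centroids are passed in explicitly to keep the sketch deterministic.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Basic k-means: returns the final centroids and the last cluster assignment."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Unlabeled toy data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
initial = [(1.0, 1.0), (8.0, 8.0)]

centroids, clusters = kmeans(points, initial)
print(centroids)
print(clusters)
```

No labels are used at any point: the grouping emerges from distances between the inputs alone, which is the sense in which clustering identifies the structure of the data directly.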
This paper presents an overview of machine learning algorithms for big data
analytics, particularly supervised and unsupervised algorithms. The quantity
of data keeps rising and the analysis of data sets has become more
competitive. Machine learning analytics combines analytics techniques with
decision optimization. This review should support researchers in this area, as
it brings together a wide collection of previous research. The review is
theoretical in nature; in future work, an enhanced SVM-based algorithm can be
implemented in machine learning.
“Big Data Tutorial”, https://intellipaat.com/blog/big-data-tutorial-for-beginners/
2 “Big Data”, https://en.wikipedia.org/wiki/Big_data
Ionescu, Dan Ionescu, Cristian Gadea, Bogdan Solomon and Mircea Trifan, “An Architecture and Methods for Big Data Analysis”, vol. 356, pp. 491-514, Springer
5 Fatima, M. and Pasha, “Survey of Machine Learning Algorithms for Disease Diagnostic”, Journal of Intelligent Learning Systems and Applications, 9, 1-16, March 2017.
Suthaharan, “Big Data Classification: Problems and Challenges in Network Intrusion Prediction with Machine Learning”, Performance Evaluation Review, Vol. 41, No. 4, March 2014.
khan, Mohd Shahid Husain, Mohd Rizwan Beg, “Big data classification using evolutionary techniques: a survey”, IEEE conference on Engineering and Technology (ICETECH), 2015.
8 M. I. Jordan, T. M. Mitchell, “Machine learning: Trends, perspectives, and prospects”, sciencemag.org, vol. 349, Issue 6245, 2015.
L. Arockiam, “Machine Learning Based Approach to Enhance the Accuracy of Sentiment Analysis on Tweets”, International Journal of Advanced Research in Computer Science and Management Studies, Volume 4, Issue 5, 2016.
L’Heureux, Katarina Grolinger, Hany F. ElYamany, Miriam A. M. Capretz, Learning with Big Data: Challenges and Approaches”, DOI 10.1109/ACCESS.2017.2696365, IEEE Access, 2017.
11 S. Banumathi, A. Aloysius, “Big data prediction using evolutionary techniques: a survey”, Journal of Emerging Technologies and Innovative Research (JETIR), Sep 2016.
Dai, Wei Ji, “A Map reduce implementation of C4.5 Decision Tree algorithm”, International Journal of Database Theory and Applications, Vol. 7, No. 1, 2014.
13 Farhad Soleimanian Gharehchopogh, Seyyed Reza Khaze, Isa Maleki, “A New Approach in Bloggers Classification with Hybrid of K-Nearest Neighbor and Artificial Neural Network Algorithms”, Indian Journal of Science and Technology, Vol 8(3), 237–246, February 2015.
Rani, S. Shalini, J. Rukmani, A. Shanthini, “Energy efficient scheduling of map reduce for evolving big data applications”, International Journal of Advanced Research in Computer and Communication Engineering, vol. 5, issue 2, 2016.
MG, Chetan Balaji, Girish L, “Environment change prediction to adapt climate smart agriculture using big data analytics”, IJARCT.ORG, 2015.
16 Se-Hoon Jung, Jong-Chan Kim, Chun-Bo Sim, “Prediction Data Processing Scheme using an Artificial Neural Network and Data Clustering for Big Data”, IEEE Explorer,
W. Anderson, Minwoo Lee and Daniel L. Elliott, “Faster Reinforcement Learning After Pretraining Deep Networks to Predict State Dynamics”, IEEE Explorer, 2015.
18 Amir Gandomi, Murtaza Haider, “Beyond the hype: big data concepts, methods, and analytics”, International Journal of Information Management, 2015.