Paper Title
Cost-Efficient Clustering Techniques on Large Data Using R Environment
Abstract
Data mining is the statistical computation used to explore patterns in large datasets. Huge amounts of data can be
stored in information systems. Depending on the category of pattern sought in the data, data mining tasks can
be classified into predictive and descriptive analytics. Descriptive analytics analyzes past occurrences in the data and gives
us insight into how to proceed in the future. Descriptive analytics can be subdivided into association, summarization, and
clustering. Clustering performs exploratory data analysis. For statistical computation in data mining, there are various
tools. Among these, Weka is an open-source toolkit that executes data mining algorithms. Weka has some disadvantages:
it cannot handle large data, it cannot import various data formats, and it is implemented only in the Java programming language. In
this context, we use the R programming environment for data analytics. The main advantage of R is that it can handle big
data. In this paper, we compare various clustering algorithms using R to identify which algorithm is
most feasible for handling large data.
Keywords - Data mining, Descriptive analytics, Clustering, R programming