A survey of clustering algorithms for big data: Taxonomy and empirical analysis

  • Authors: Vv.Aa.
  • IEEE Transactions on Emerging Topics in Computing
  • 2014
  • DOI: 10.1109/TETC.2014.2330519


Clustering algorithms have emerged as a powerful alternative meta-learning tool for accurately analyzing the massive volume of data generated by modern applications. Their main goal is to categorize data into clusters such that objects are grouped in the same cluster when they are similar according to specific metrics. There is a vast body of knowledge in the area of clustering, and there have been attempts to analyze and categorize these algorithms for a large number of applications. However, one of the major issues in using clustering algorithms for big data, and a source of confusion amongst practitioners, is the lack of consensus on the definition of their properties, together with the lack of a formal categorization. To alleviate these problems, this paper introduces concepts and algorithms related to clustering, presents a concise survey of existing clustering algorithms, and compares them from both a theoretical and an empirical perspective. From a theoretical perspective, we developed a categorizing framework based on the main properties pointed out in previous studies. Empirically, we conducted extensive experiments in which we compared the most representative algorithm from each category using a large number of real (big) data sets. The effectiveness of the candidate clustering algorithms is measured through a number of internal and external validity metrics, as well as stability, runtime, and scalability tests. In addition, we highlight the set of clustering algorithms that perform best for big data.
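As a rough illustration of the ideas in the abstract, the sketch below groups points by a similarity metric and then scores the result with one internal and one external validity measure. Everything in it is an assumption made for illustration: k-means with Euclidean distance stands in for "a clustering algorithm", the sum of squared errors and purity stand in for the internal and external metrics, and the function names and toy data are hypothetical. The abstract does not specify any of these choices.

```python
from math import dist  # Euclidean distance, Python 3.8+


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points]


def update(points, labels, centroids):
    """Move each centroid to the mean of its members; leave empty clusters unchanged."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the labels stop changing."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


def sse(points, labels, centroids):
    """Internal validity (illustrative): sum of squared distances to the assigned centroid."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def purity(labels, truth):
    """External validity (illustrative): share of points in their cluster's majority class."""
    matched = 0
    for j in set(labels):
        classes = [t for lab, t in zip(labels, truth) if lab == j]
        matched += max(classes.count(c) for c in set(classes))
    return matched / len(labels)


# Hypothetical toy data: two well-separated groups with known class labels.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
truth = ["a", "a", "a", "b", "b", "b"]

labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels, sse(points, labels, centroids), purity(labels, truth))
```

An internal metric such as the sum of squared errors needs only the data and the clustering, whereas an external metric such as purity requires ground-truth labels, which is why evaluations of this kind typically report both.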


