International Journal of Innovative Research in Computer and Communication Engineering

ISSN Approved Journal | Impact factor: 8.771 | ESTD: 2013 | Follows UGC CARE Journal Norms and Guidelines

Monthly, Peer-Reviewed, Refereed, Scholarly, Multidisciplinary and Open Access Journal | Impact Factor 8.771 (calculated by Google Scholar and Semantic Scholar)


TITLE An Advanced Data Clustering Framework for Mining Large-Scale Data Sets
ABSTRACT The rapid expansion of big data across diverse application domains has introduced substantial challenges in extracting useful knowledge from massive datasets. Clustering is a key unsupervised data mining technique that facilitates the identification of hidden structures by grouping similar data objects. However, conventional clustering methods such as K-Means and DBSCAN exhibit several drawbacks, including limited scalability, dependence on initial parameter settings, and degraded performance in high-dimensional and noisy data environments. This paper presents an advanced hybrid clustering framework that combines metaheuristic optimization, distributed computing, and federated learning to address these issues. The proposed methodology utilizes Variable Neighborhood Search (VNS) and Lévy Flights to enhance centroid optimization, Apache Spark for efficient parallel computation, and federated learning to maintain data privacy. Experimental validation on standard benchmark datasets demonstrates that the proposed approach delivers improved clustering accuracy, enhanced robustness, and superior scalability when compared with traditional clustering techniques.
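As an illustration of the kind of centroid refinement the abstract describes, the following is a minimal single-machine Python sketch, not the authors' implementation. It assumes a plain K-Means (Lloyd) local search, a VNS-style loop in which the neighborhood size is the number of centroids perturbed at once, and Lévy-distributed steps drawn with Mantegna's algorithm. The function names (`levy_step`, `lloyd`, `vns_levy_kmeans`) and the parameters `beta`, `scale`, `k_max`, and `rounds` are hypothetical choices made for this example. The Apache Spark and federated learning components are not covered.

```python
import math
import numpy as np

def levy_step(rng, shape, beta=1.5):
    """Heavy-tailed steps via Mantegna's algorithm (approximates a Levy-stable law)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, size=shape)
    v = rng.normal(0.0, 1.0, size=shape)
    return u / np.abs(v) ** (1 / beta)

def sse(X, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def lloyd(X, centroids, iters=20):
    """Standard K-Means updates, used here as the local search step."""
    centroids = centroids.copy()
    for _ in range(iters):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return centroids

def vns_levy_kmeans(X, k, rounds=30, k_max=3, scale=0.1, seed=0):
    """VNS-style search: shake n centroids with Levy steps, re-run Lloyd, keep improvements."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = lloyd(X, X[rng.choice(len(X), size=k, replace=False)])
    best_cost = sse(X, best)
    spread = X.std(axis=0)  # per-feature scale for the perturbation
    for _ in range(rounds):
        n = 1
        while n <= k_max:
            cand = best.copy()
            idx = rng.choice(k, size=min(n, k), replace=False)
            cand[idx] += scale * spread * levy_step(rng, (len(idx), X.shape[1]))
            cand = lloyd(X, cand)
            cost = sse(X, cand)
            if cost < best_cost - 1e-12:
                best, best_cost = cand, cost
                n = 1   # improvement: return to the smallest neighborhood
            else:
                n += 1  # no improvement: widen the neighborhood
    return best, best_cost
```

Calling `vns_levy_kmeans(X, k)` on a two-dimensional NumPy array returns the best centroids found and their within-cluster sum of squared errors. Because a candidate is accepted only when it lowers that cost, the result is never worse than the initial Lloyd solution from the same seed.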
AUTHOR DR. SAURABH SHARMA Department of Computer Science and Engineering, Baderia Global Institute of Engineering & Management, Jabalpur, Madhya Pradesh, India
VOLUME 180
DOI 10.15680/IJIRCCE.2026.1401041
PDF pdf/41_An Advanced Data Clustering Framework for Mining Large-Scale Data Sets.pdf
KEYWORDS
Copyright © IJIRCCE 2020. All rights reserved.