International Journal of Innovative Research in Computer and Communication Engineering

ISSN Approved Journal | Impact factor: 8.771 | ESTD: 2013 | Follows UGC CARE Journal Norms and Guidelines



TITLE A New Data Clustering Approach for Data Mining in Large Data Sets
ABSTRACT The exponential growth of big data has created significant challenges in extracting meaningful knowledge from large-scale datasets. Clustering is a fundamental unsupervised data mining technique used to discover hidden patterns and group similar data objects. However, traditional clustering algorithms such as K-Means and DBSCAN suffer from limitations related to scalability, sensitivity to initialization, and poor performance in high-dimensional and noisy environments. This paper proposes a novel hybrid data clustering framework that integrates metaheuristic optimization, distributed computing, and federated learning to overcome these challenges. The proposed approach employs Variable Neighborhood Search (VNS) and Lévy Flights for intelligent centroid optimization, Apache Spark for scalable parallel processing, and federated learning to ensure privacy preservation. Experimental evaluation on benchmark real-world datasets demonstrates that the proposed method achieves superior clustering accuracy, robustness, and scalability compared to conventional clustering techniques.
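The abstract names Variable Neighborhood Search (VNS) and Lévy flights as the centroid optimizer but does not give the procedure. The following is a minimal sketch of one way such a combination could work, not the authors' implementation. It assumes that a "neighborhood" of size n means perturbing n centroids at once, that step lengths are drawn with Mantegna's algorithm, and that a few Lloyd (K-Means) iterations serve as the local search. All function names and parameters (`levy_step`, `vns_levy_kmeans`, `scale`, `rounds`) are hypothetical, and the Apache Spark and federated learning components are left out.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step length (Mantegna's algorithm, assumed here)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def lloyd(points, centroids, steps=5):
    """Local search: a few standard K-Means (Lloyd) iterations."""
    for _ in range(steps):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        # Keep the old centroid when a cluster becomes empty.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def vns_levy_kmeans(points, k, rounds=30, scale=0.5, seed=0):
    """Hypothetical VNS loop: shake n centroids with Levy steps, then refine."""
    rng = random.Random(seed)
    best = lloyd(points, [list(p) for p in rng.sample(points, k)])
    best_cost = sse(points, best)
    for _ in range(rounds):
        n = 1  # neighborhood size: number of centroids shaken at once
        while n <= k:
            candidate = [list(c) for c in best]
            for j in rng.sample(range(k), n):
                candidate[j] = [x + scale * levy_step(rng) for x in candidate[j]]
            candidate = lloyd(points, candidate)
            cost = sse(points, candidate)
            if cost < best_cost:
                best, best_cost = candidate, cost
                n = 1  # improvement found: return to the smallest neighborhood
            else:
                n += 1  # no improvement: widen the neighborhood
    return best, best_cost
```

Because a candidate is accepted only when it lowers the sum of squared errors, the result is never worse than plain K-Means started from the same initial centroids. The heavy-tailed steps occasionally move a centroid far enough to escape a poor local optimum, which is the usual motivation for pairing Lévy flights with a local search.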
AUTHOR JANHVI BAIGA, DR. SAURABH SHARMA, PROF. SAURABH VERMA; Department of Computer Science and Engineering, Baderia Global Institute of Engineering & Management, Jabalpur, Madhya Pradesh, India
VOLUME 180
DOI 10.15680/IJIRCCE.2026.1401037
PDF pdf/37_A New Data Clustering Approach for Data Mining in Large Data Sets.pdf
KEYWORDS
Copyright © IJIRCCE 2020. All rights reserved.