Integrating OSINT with Hadoop Analytics for Rapid Person Identification in Smart-City Events
DOI:
https://doi.org/10.46793/AlfaTech1.2.39P
Keywords:
Big Data, Apache Hadoop, OSINT, HDFS, HiveQL, OpenCV
Abstract
Apache Hadoop is a platform for storing, processing, and analyzing large amounts of data. In this paper, data are ingested into HDFS and queried using HiveQL, while MapReduce is applied for large-corpus text processing. For image analysis, convolutional neural networks (CNNs) with OpenCV are used for object and face detection and matching. OSINT (Open-Source Intelligence) techniques collect images, videos, and text from publicly available sources and fuse them with camera streams to accelerate person identification at crowded events. We evaluate the system by measuring precision and recall, processing time, and overall throughput. We also note legal and ethical safeguards (public sources only, data minimization, audit logging). This article is an invited, extended version of our AlfaTech 2025 conference paper [1].
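As a rough illustration of the evaluation metrics named in the abstract, the following minimal sketch computes precision, recall, and throughput from a set of predicted identity matches and a ground-truth set. The function name `evaluate_matches`, the `(frame_id, person_id)` pair format, and the sample values are assumptions made for illustration only; they are not taken from the system described in the paper.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    precision: float       # share of predicted matches that are correct
    recall: float          # share of true matches that were found
    throughput_fps: float  # frames processed per second


def evaluate_matches(predicted, actual, frames_processed, elapsed_seconds):
    """Score predicted matches against ground truth.

    predicted, actual: sets of (frame_id, person_id) pairs (hypothetical format).
    """
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    throughput = frames_processed / elapsed_seconds if elapsed_seconds > 0 else 0.0
    return EvalResult(precision, recall, throughput)


if __name__ == "__main__":
    # Illustrative data only: 4 predicted matches, 5 true matches, 2 in common.
    predicted = {(1, "A"), (2, "B"), (3, "C"), (4, "A")}
    actual = {(1, "A"), (2, "B"), (3, "D"), (5, "A"), (6, "B")}
    print(evaluate_matches(predicted, actual, frames_processed=600, elapsed_seconds=20.0))
```

Representing matches as sets keeps the true-positive count to a single intersection. A real evaluation would also need a rule for deciding when a detection counts as a match (for example, a similarity threshold), which this sketch leaves out.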