I'm a data scientist specializing in machine learning, big data analytics, and AI model development. My journey began with a machine learning course during my bachelor's studies and has since evolved through diverse projects spanning facial verification systems, recommendation engines, and predictive analytics. Currently conducting research at Inria/LISN, I focus on 3D interactive ambient displays and spatial audio analytics, combining deep learning architectures with IoT systems and real-time data visualization. My experience spans academia, research, and industry across six countries: France, Spain, Belgium, Sweden, South Korea, and Kazakhstan. This international background has strengthened my collaborative approach and ability to work effectively in multicultural environments. I hold a master's degree in Big Data Management and Analytics and have expertise in Python, machine learning frameworks, data visualization, and statistical analysis. When not immersed in data, I enjoy painting, hiking, and exploring nutrition science.
Constructed advanced ML pipeline to predict chess puzzle difficulty ratings using 4.5M+ training instances. Architected custom PyTorch Transformer with specialized feature embeddings and designed hybrid Tree+Neural model combining LightGBM with deep neural networks.
Developed face verification system using VAE and AE as feature extractors, achieving 86.65% accuracy on LFW benchmark dataset. Engineered deep learning architectures including ResNet18 with ArcFace loss processing 494k+ facial images.
Developed 3D interactive prototype using Three.js and WebGL with particle-based waveform rendering and spatial sound localization. Engineered dual-microphone ESP32 IoT system for real-time meeting room analytics with ML pipeline.
Engineered ETL pipeline using PySpark and managed data with Delta Lake. Developed personalized recommendation algorithms via Apache Spark and implemented real-time data stream processing with Apache Kafka.
Architected predictive analytics pipeline using Python to identify high-potential Uzbek buyers for Korean exporters, enhancing B2B lead generation accuracy by 35%. Engineered 9+ interactive Tableau dashboards for event analytics.
Orchestrated data pipeline for sociological research on Asian politics processing 10k+ datasets using R and Python. Applied NLP and social network analysis to study political discourse and elite relationships in East Asian societies.
This project demonstrates the implementation of TPC-DS benchmarking on a MySQL database using Python scripts to automate the process. The primary goal is to evaluate the performance and scalability of the MySQL database under different conditions, as defined by the TPC-DS benchmark.
Developed a multi-modal sentiment analysis system combining text, audio, and visual features using transformer architectures. Implemented BERT for text processing, CNN for image analysis, and LSTM for audio features, achieving 94% accuracy on multimodal datasets.
Implemented Graph Convolutional Networks (GCN) and GraphSAGE for molecular property prediction in drug discovery. Built molecular graph representations and achieved state-of-the-art performance on BACE, BBBP, and Tox21 benchmark datasets using PyTorch Geometric.
Created an AI-powered code review assistant using fine-tuned CodeBERT and GPT-3.5 models. Implemented automated bug detection, code quality assessment, and suggestion generation, reducing manual review time by 60% across 500+ repositories.
Developed a real-time multi-object tracking system using YOLOv8 for detection and DeepSORT for tracking. Implemented Kalman filtering and the Hungarian algorithm for object association, achieving 95% tracking accuracy at 30 FPS on surveillance footage.
Built a comprehensive financial prediction platform using LSTM, ARIMA, and Prophet models for cryptocurrency and stock price forecasting. Integrated technical indicators, sentiment analysis from news data, and real-time trading signals with 78% accuracy.
Knowledge discovery for data streaming requires online feature selection to reduce the complexity of real-world datasets and significantly improve the learning process. This paper presents a comprehensive survey of feature selection (FS) algorithms for both static and dynamic environments, providing a detailed taxonomy that categorizes these methods based on search strategy, evaluation process, and feature structure.