Dilbar Isakova
"Some lessons have to be experienced before they can be understood"

Data Science Intern

I'm a data scientist specializing in machine learning, big data analytics, and AI model development. My journey began with a machine learning course during my bachelor's studies and has since evolved through diverse projects spanning facial verification systems, recommendation engines, and predictive analytics. Currently conducting research at Inria/LISN, I focus on 3D interactive ambient displays and spatial audio analytics, combining deep learning architectures with IoT systems and real-time data visualization. My experience spans academia, research, and industry across six countries: France, Spain, Belgium, Sweden, South Korea, and Kazakhstan. This international background has strengthened my collaborative approach and ability to work effectively in multicultural environments. I hold a Big Data Management and Analytics master's degree and have expertise in Python, machine learning frameworks, data visualization, and statistical analysis. When not immersed in data, I enjoy painting, hiking, and exploring nutrition science.

12+ ML Projects
5 Countries
3+ Years Experience

EDUCATION

Master Thesis

Apr 2025 - Present
Inria/LISN
Fourth Semester in the Big Data Management and Analytics Master's program - BDMA Erasmus Mundus

Master Thesis Topic:

"Intelligent Ambient Data Visualization on Non-Planar Displays: A Multi-Modal Machine Learning Approach for Spatial Audio-Visual Analytics"

Master of Science - Computer Science & Artificial Intelligence

Sep 2024 - Present
CentraleSupélec - Université Paris-Saclay
Third Semester in the Big Data Management and Analytics Master's program - BDMA Erasmus Mundus

Key Coursework:

  • Massive Graph Management and Analytics
  • Advanced Machine Learning
  • Visual Analytics
  • Decision Modelling
  • Deep Learning
  • Research Project
  • Law and Intellectual Property
  • French A2

Master of Technology - Big Data Management

Feb 2024 - Aug 2024
Universitat Politècnica de Catalunya
Second Semester in the Big Data Management and Analytics Master's program - BDMA Erasmus Mundus

Key Coursework:

  • Big Data Management
  • Machine Learning
  • Viability of Business Projects
  • Semantic Data Management
  • Big Data Seminar
  • Ethics in Big Data
  • Spanish A1

Master of Science - Business Intelligence Fundamentals

Sep 2023 - Jan 2024
Université libre de Bruxelles
First Semester in the Big Data Management and Analytics Master's program - BDMA Erasmus Mundus

Key Coursework:

  • Data Warehouses
  • Database System Architectures
  • Management of Business and Data Science Workflows
  • Advanced Databases
  • Data Mining
  • French A1

Bachelor of Information Systems

Aug 2018 - Jun 2022
KIMEP University
Grade: Cum Laude GPA: 4.14/4.33 (Top 5%)

Major & Achievements:

  • Major: Computer Information Technologies
  • Thesis Topic: Forex Trading Using Machine Learning Algorithms
  • Scholarship for Citizens of Central Asian Countries, 2018-2022

ERASMUS ICM Program with Scholarship

Jan 2021 - Jun 2021
Uppsala University
Grade: Pass with Distinction

Completed Courses:

  • Business Intelligence Applications
  • Big Data Management and Analysis
  • Business Analytics
  • Digital Business

Global Korean Scholarship - 3D Modeling

Jun 2019 - Aug 2019
Kyungwoon University
Grade: Pass with Distinction

Achievements & Courses:

  • Airfoil Project awarded 2nd place for research and project development
  • Calculus
  • Advanced Programming
  • Information Systems and Networking
  • Physics
  • Drone Control

EXPERIENCES

Apr 2025 - Present
Research Intern - Data Science and Visualization
Inria/LISN
Gif-sur-Yvette, France

Responsibilities & Achievements:

  • Developed a 3D interactive prototype of a non-planar (spherical) ambient display using Three.js and WebGL, featuring particle-based waveform rendering, spatial sound localization, and dynamic meeting state visualization, achieving 60+ FPS with real-time processing of 100+ audio frames per second.
  • Engineered a dual-microphone ESP32 IoT system for real-time meeting room analytics, implementing FFT-based audio processing and synchronized stereo capture, with low-latency data streaming via WebSocket to support spatial audio visualization.
  • Implementing an ML pipeline that combines a Spatial Audio Transformer (SAT) architecture with stereo-aware attention mechanisms for multi-task learning: speaker counting (3-class), meeting classification (5-class), and continuous engagement prediction.
  • Creating a data augmentation framework that generates 50+ hours of synthetic training data from limited recordings through room impulse response simulation, voice conversion, and meeting scenario synthesis, reducing data collection time by 80%.
  • Optimizing the SAT deep learning models for edge deployment on an ESP32 microcontroller through knowledge distillation and 8-bit quantization, achieving <50ms inference time while maintaining 90%+ accuracy for real-time IoT implementation.

Technologies:

Python, Three.js, WebGL, PyTorch, ESP32, Machine Learning, Audio Processing, WebSocket

Dec 2022 - Aug 2023
Data Science Intern
KOTRA (Korea Trade-Investment Promotion Agency)
Tashkent, Uzbekistan

Responsibilities & Achievements:

  • Architected a predictive analytics pipeline in Python (Pandas, Scikit-Learn, NumPy) to identify high-potential Uzbek buyers for Korean exporters, integrating CRM data, sales data, market intelligence from industry reports, and headquarters datasets to improve B2B lead generation accuracy by 35%.
  • Engineered 9+ interactive Tableau dashboards for event and export analytics, enabling data-driven planning of regional Korean-Uzbek trade conferences and exhibitions while improving attendee targeting precision and post-event ROI measurement.
  • Optimized end-to-end ML workflows, including feature engineering, hyperparameter tuning, and model validation, using logistic regression and ensemble methods (Random Forest, Gradient Boosting) for buyer classification tasks.

Technologies:

Python, Pandas, Scikit-Learn, NumPy, Tableau, Machine Learning, Data Analytics, Data Visualization

Jan 2022 - Sep 2022
Research Assistant
KIMEP University
Almaty, Kazakhstan

Responsibilities & Achievements:

  • Orchestrated a data pipeline for sociological research projects on Asian politics, including North Korean elite networks and regional political structures, processing 10k+ datasets in R (dplyr, ggplot2) and Excel for data entry, data cleaning, statistical modeling, and visualization.
  • Applied statistical analysis, natural language processing (NLP), and social network analysis (SNA) to study political discourse and elite relationships in East Asian societies using Python and R.
  • Managed data collection and SQL database queries to extract relevant subsets for comparative political studies across Asian regions.

Technologies:

R, Python, SQL, Excel, ggplot2, NLP, Social Network Analysis, Statistical Analysis

Jan 2022 - Apr 2022
iOS Developer Intern
ICE MEDIA & Tech Group
Almaty, Kazakhstan

Responsibilities & Achievements:

  • Learned the basics of functional programming
  • Completed SwiftUI framework integration successfully
  • Refined project management skills by developing mobile application designs and user flows
  • Developed a weather app from scratch (code is available on GitHub)
  • Applied modern iOS development practices and design patterns
  • Collaborated with cross-functional teams to deliver mobile solutions

Technologies:

Swift, SwiftUI, iOS Development, Xcode, Git, GitHub

Sep 2021 - Dec 2021
Data Science Intern
Data Science Lab
Almaty, Kazakhstan

Responsibilities & Achievements:

  • Acquired basic knowledge in Machine Learning fundamentals and methodologies
  • Learned how to collect, clean, and preprocess large datasets including handling missing values, normalizing data, and transforming data for analysis using Python with libraries such as pandas and NumPy
  • Developed skills in exploratory data analysis and data visualization techniques
  • Gained hands-on experience with data preprocessing workflows and best practices
  • Applied statistical methods for data validation and quality assessment

Technologies:

Python, Pandas, NumPy, Machine Learning, Data Analysis, Data Visualization

Jan 2020 - Sep 2020
Information Technology Assistant
KIMEP University
Almaty, Kazakhstan

Responsibilities & Achievements:

  • Supervised and monitored the work of administrative staff in the Center for Entrepreneurship and Innovation
  • Prepared the official report on an IT and educational project for the Bang College of Business
  • Completed IT, social media, and educational projects for KIMEP University
  • Provided technical support for educational initiatives and digital transformation projects
  • Collaborated with academic staff to improve IT infrastructure and processes

Technologies:

Microsoft Office, System Administration, Technical Support, Documentation

PROJECTS

COMPETITION PROJECT

Chess Puzzle Difficulty Prediction

Apr 2025 - Present

Built an ML pipeline to predict chess puzzle difficulty ratings using 4.5M+ training instances. Architected a custom PyTorch Transformer with specialized feature embeddings and designed a hybrid tree-plus-neural model combining LightGBM with deep neural networks.

RESEARCH PROJECT

Variational Autoencoder for Facial Verification

Sep 2024 - Feb 2025

Developed a face verification system using a VAE and an AE as feature extractors, achieving 86.65% accuracy on the LFW benchmark dataset. Engineered deep learning architectures, including ResNet18 with ArcFace loss, processing 494k+ facial images.

RESEARCH PROJECT

3D Interactive Ambient Display System

Apr 2025 - Present

Developed a 3D interactive prototype using Three.js and WebGL with particle-based waveform rendering and spatial sound localization. Engineered a dual-microphone ESP32 IoT system for real-time meeting room analytics with an ML pipeline.

ACADEMIC PROJECT

AmiGo Social App

Jan 2024 - Jun 2024

Engineered an ETL pipeline using PySpark and managed data with Delta Lake. Developed personalized recommendation algorithms with Apache Spark and implemented real-time data stream processing with Apache Kafka.

RESEARCH PROJECT

Predictive Analytics for B2B Lead Generation

Feb 2023 - Sep 2023

Architected a predictive analytics pipeline using Python to identify high-potential Uzbek buyers for Korean exporters, improving B2B lead generation accuracy by 35%. Engineered 9+ interactive Tableau dashboards for event analytics.

RESEARCH PROJECT

Political Discourse Analysis System

Jan 2022 - Sep 2022

Orchestrated a data pipeline for sociological research on Asian politics, processing 10k+ datasets using R and Python. Applied NLP and social network analysis to study political discourse and elite relationships in East Asian societies.

MASTER COURSE PROJECT

Benchmarking the MySQL DBMS on TPC-DS Benchmark

Sep 2023 - Dec 2023

This project implements TPC-DS benchmarking on a MySQL database, using Python scripts to automate the process. The primary goal is to evaluate the performance and scalability of MySQL under the different conditions defined by the TPC-DS benchmark.

MACHINE LEARNING PROJECT

Multi-Modal Sentiment Analysis

Mar 2024 - Jun 2024

Developed a multi-modal sentiment analysis system combining text, audio, and visual features using transformer architectures. Implemented BERT for text processing, CNN for image analysis, and LSTM for audio features, achieving 94% accuracy on multimodal datasets.

DEEP LEARNING PROJECT

Graph Neural Networks for Drug Discovery

Oct 2024 - Jan 2025

Implemented Graph Convolutional Networks (GCN) and GraphSAGE for molecular property prediction in drug discovery. Built molecular graph representations and achieved state-of-the-art performance on BACE, BBBP, and Tox21 benchmark datasets using PyTorch Geometric.

NLP PROJECT

Automated Code Review Assistant

Nov 2023 - Feb 2024

Created an AI-powered code review assistant using fine-tuned CodeBERT and GPT-3.5 models. Implemented automated bug detection, code quality assessment, and suggestion generation, reducing manual review time by 60% across 500+ repositories.

COMPUTER VISION PROJECT

Real-Time Object Tracking System

Aug 2023 - Nov 2023

Developed a real-time multi-object tracking system using YOLOv8 for detection and DeepSORT for tracking. Implemented Kalman filtering and the Hungarian algorithm for object association, achieving 95% tracking accuracy at 30 FPS on surveillance footage.

TIME SERIES PROJECT

Financial Market Prediction Platform

Jun 2022 - Aug 2022

Built a comprehensive financial prediction platform using LSTM, ARIMA, and Prophet models for cryptocurrency and stock price forecasting. Integrated technical indicators, sentiment analysis from news data, and real-time trading signals with 78% accuracy.

SKILLS

Programming Languages

Python, R, SQL, Swift, JavaScript, HTML/CSS

Data Science & AI

PyTorch, TensorFlow, Pandas, Scikit-learn, Computer Vision, NLP

Big Data Technologies

Apache Spark, Hadoop, Kafka, Delta Lake, Databricks

Databases

PostgreSQL, MongoDB, Neo4j

Data Visualization

Tableau, ggplot2, Dashboards

Web Technologies

Three.js, WebGL, WebSocket

Research & Analysis

Statistical Analysis, ETL Pipelines, A/B Testing, Academic Research

Specialized Tools

Transformers, ResNet18, VAE/AE, LightGBM, IoT (ESP32)

PUBLICATIONS

Research Paper
2024

Streaming Feature Selection

Dilbar Isakova, Linhan Wang
Research paper presented at the Twelfth European Big Data Management & Analytics Summer School (eBISS 2024), University of Padua, Padova, Italy

Knowledge discovery for data streaming requires online feature selection to reduce the complexity of real-world datasets and significantly improve the learning process. This paper presents a comprehensive survey of feature selection (FS) algorithms for both static and dynamic environments, providing a detailed taxonomy that categorizes these methods based on search strategy, evaluation process, and feature structure.

Streaming Feature Selection, Big Data, Dimensionality Reduction, Machine Learning, Online Learning

Contact Me

Location
Paris, France
Email
dilbar.isakova@student-cs.fr
eesaack@gmail.com
Phone
+33 7 68 26 12 66