UIDAI Aadhaar Data Analysis & Visualization Platform

Author: Shuvam Banerji Seal’s Team

A comprehensive, production-grade data analysis pipeline for Aadhaar enrollment, demographic, and biometric datasets with ML-powered insights and interactive web visualization.


🌟 Highlights


πŸ“Š Dataset Summary

| Dataset     | Records   | Unique States | Unique Districts | File Size |
|-------------|-----------|---------------|------------------|-----------|
| Biometric   | 3,531,034 | 36            | 948              | 603 MB    |
| Demographic | 1,597,301 | 36            | 952              | 265 MB    |
| Enrollment  | 982,502   | 36            | 963              | 164 MB    |
| **Total**   | **6,110,837** | **36**    | **~960**         | **1.03 GB** |
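The per-dataset figures above can be reproduced with a short pandas pass over each file. This is a sketch: the column names `state` and `district` are assumptions, not the datasets' actual schema.

```python
import pandas as pd

def summarize(df: pd.DataFrame) -> dict:
    """Record count and unique state/district counts for one dataset."""
    return {
        "records": len(df),
        "unique_states": df["state"].nunique(),
        "unique_districts": df["district"].nunique(),
    }

# Tiny illustrative frame standing in for a real Aadhaar CSV.
demo = pd.DataFrame({
    "state": ["Bihar", "Bihar", "Kerala"],
    "district": ["Patna", "Gaya", "Ernakulam"],
})
print(summarize(demo))  # {'records': 3, 'unique_states': 2, 'unique_districts': 3}
```

For the real files you would call `summarize(pd.read_csv(path))` once per dataset and tabulate the results.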

πŸ—οΈ Project Structure

UIDAI_hackathon/
β”œβ”€β”€ analysis/
β”‚   └── codes/
β”‚       β”œβ”€β”€ utils/              # Utility modules
β”‚       β”‚   β”œβ”€β”€ parallel.py     # Multiprocessing utilities
β”‚       β”‚   β”œβ”€β”€ io_utils.py     # File I/O operations
β”‚       β”‚   β”œβ”€β”€ validators.py   # Data validation
β”‚       β”‚   └── progress.py     # Progress tracking
β”‚       β”œβ”€β”€ time_series/        # Time series analysis
β”‚       β”œβ”€β”€ geographic/         # Geographic analysis
β”‚       β”œβ”€β”€ demographic/        # Demographic analysis
β”‚       β”œβ”€β”€ statistical/        # Statistical analysis
β”‚       β”œβ”€β”€ ml_models/          # ML training & inference
β”‚       β”œβ”€β”€ api_clients/        # External API clients
β”‚       └── run_all_analyses.py # Master runner
β”œβ”€β”€ web/
β”‚   └── frontend/               # React/TypeScript dashboard
β”‚       β”œβ”€β”€ src/
β”‚       β”‚   β”œβ”€β”€ components/     # Reusable UI components
β”‚       β”‚   β”œβ”€β”€ pages/          # Page components
β”‚       β”‚   β”œβ”€β”€ hooks/          # Custom React hooks
β”‚       β”‚   └── utils/          # Utility functions
β”‚       └── public/data/        # Analysis results (JSON)
β”œβ”€β”€ Dataset/                    # Data files
β”‚   β”œβ”€β”€ api_data_aadhar_biometric/
β”‚   β”œβ”€β”€ api_data_aadhar_demographic/
β”‚   β”œβ”€β”€ api_data_aadhar_enrolment/
β”‚   └── corrected_dataset/      # Augmented datasets
└── docker/                     # Docker configuration

πŸš€ Quick Start

Prerequisites

Step 1: Clone & Setup

# Clone the repository
git clone https://github.com/Shuvam-Banerji-Seal/UIDAI_hackathon.git
cd UIDAI_hackathon

# If using Git LFS, pull large files
git lfs pull

# Create Python virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install Python dependencies
pip install pandas numpy scipy scikit-learn matplotlib tqdm

Step 2: Run Analysis

# Navigate to analysis codes
cd analysis/codes

# Run all analyses with sample data (fast, ~5 seconds)
python run_all_analyses.py --sample 5000

# Run all analyses with full dataset (slower, ~2-5 minutes)
python run_all_analyses.py

# Run with sequential processing (for debugging)
python run_all_analyses.py --sample 5000 --sequential

Analysis outputs are saved under the results/ directory and to web/frontend/public/data/ for the dashboard (see Output Files below).

Step 3: View Web Dashboard

# Navigate to frontend directory
cd ../../web/frontend

# Install npm dependencies
npm install

# Start development server
npm run dev

Open http://localhost:5173 in your browser.

Build for production:

npm run build    # Creates dist/ folder
npm run preview  # Preview production build

πŸ“ˆ Analysis Modules

Time Series Analysis

Geographic Analysis

Demographic Analysis

Statistical Analysis


πŸ–₯️ CLI Reference

Main Analysis Runner

# Show help
python run_all_analyses.py --help

# Run with sample data
python run_all_analyses.py --sample 10000

# Run specific analyses only
python run_all_analyses.py --analyses time_series,geographic

# Custom output directory
python run_all_analyses.py --output-dir ./my_results

# Sequential processing (useful for debugging)
python run_all_analyses.py --sequential
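The flags above map onto a straightforward argparse setup. The parser below is a plausible reconstruction for reference, not the repo's actual code; defaults are assumptions.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Sketch of the CLI flags documented above."""
    p = argparse.ArgumentParser(prog="run_all_analyses.py")
    p.add_argument("--sample", type=int, default=None,
                   help="Row cap per dataset; omit to use the full data")
    p.add_argument("--analyses", type=lambda s: s.split(","),
                   default=["time_series", "geographic", "demographic", "statistical"],
                   help="Comma-separated subset of analyses to run")
    p.add_argument("--output-dir", default="./results",
                   help="Where result JSON/CSV files are written")
    p.add_argument("--sequential", action="store_true",
                   help="Disable multiprocessing for easier debugging")
    return p

args = build_parser().parse_args(["--sample", "10000",
                                  "--analyses", "time_series,geographic"])
print(args.sample, args.analyses, args.sequential)
# 10000 ['time_series', 'geographic'] False
```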

Individual Analyzers

# Time Series Analysis
from time_series.analyzer import run_time_series_analysis
results = run_time_series_analysis(sample_rows=5000)

# Geographic Analysis
from geographic.analyzer import run_geographic_analysis
results = run_geographic_analysis(sample_rows=5000)

# Demographic Analysis
from demographic.analyzer import run_demographic_analysis
results = run_demographic_analysis(sample_rows=5000)

# Statistical Analysis
from statistical.analyzer import run_statistical_analysis
results = run_statistical_analysis(sample_rows=5000)
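The four analyzers can also be driven together and their results combined into a single summary document, the way analysis_summary.json is produced. The analyzers below are stubs standing in for the real imports above; the real return shapes will differ.

```python
import json

# Stand-ins for the four analyzers; each returns a JSON-serializable dict.
def run_time_series_analysis(sample_rows=None):  return {"trend": "up"}
def run_geographic_analysis(sample_rows=None):   return {"states": 36}
def run_demographic_analysis(sample_rows=None):  return {"age_bands": 5}
def run_statistical_analysis(sample_rows=None):  return {"tests": 12}

# Run each analyzer on the same sample and merge into one summary payload.
summary = {
    "time_series": run_time_series_analysis(sample_rows=5000),
    "geographic":  run_geographic_analysis(sample_rows=5000),
    "demographic": run_demographic_analysis(sample_rows=5000),
    "statistical": run_statistical_analysis(sample_rows=5000),
}
print(json.dumps(summary, indent=2))
```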

🐳 Docker Support

# Build and run analysis
docker-compose up analysis

# Run frontend development server
docker-compose up web-dev

# Build frontend for production
docker-compose up web-build

# Run all services
docker-compose up

πŸ“ Output Files

Results Directory Structure

results/
β”œβ”€β”€ analysis/
β”‚   β”œβ”€β”€ time_series/     # Time series JSON files
β”‚   β”œβ”€β”€ geographic/      # Geographic JSON files
β”‚   β”œβ”€β”€ demographic/     # Demographic JSON files
β”‚   └── statistical/     # Statistical JSON files
β”œβ”€β”€ exports/
β”‚   β”œβ”€β”€ json/           # Combined JSON exports
β”‚   └── csv/            # CSV exports
└── analysis_summary.json

Web Data Files

web/frontend/public/data/
β”œβ”€β”€ analysis_summary.json
β”œβ”€β”€ time_series.json
β”œβ”€β”€ geographic.json
β”œβ”€β”€ demographic.json
β”œβ”€β”€ statistical.json
└── ml_results.json
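Getting the analysis JSON into the frontend's public/data/ folder is a simple copy step. The helper below is a hypothetical sketch (`publish_results` is not from the repo); it is demonstrated against throwaway directories, where the real arguments would be results/ and web/frontend/public/data/.

```python
import shutil, tempfile
from pathlib import Path

def publish_results(results_dir: Path, web_data_dir: Path) -> list:
    """Copy every JSON under results_dir into the frontend data folder."""
    web_data_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(results_dir.rglob("*.json")):
        shutil.copy2(src, web_data_dir / src.name)
        copied.append(src.name)
    return copied

# Demo against temporary directories.
root = Path(tempfile.mkdtemp())
(root / "analysis").mkdir()
(root / "analysis" / "geographic.json").write_text("{}")
print(publish_results(root / "analysis", root / "public_data"))
# ['geographic.json']
```

Note that files sharing a name in different subdirectories would overwrite each other; the flat names listed above make that a non-issue here.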

πŸ”§ Configuration

Analysis Configuration

Edit analysis/codes/config.py to customize the analysis pipeline settings.

Frontend Configuration

Edit web/frontend/vite.config.ts to adjust the dev server and build options.


✨ Features

Data Processing

Data Augmentation

Web Dashboard


πŸ“Š Data Quality Metrics

| Metric              | Biometric | Demographic | Enrollment |
|---------------------|-----------|-------------|------------|
| Duplicates Removed  | 1,700     | 364,958     | 1,119      |
| Census Coverage     | 100%      | 100%        | 100%       |
| State Normalization | 36/36     | 36/36       | 36/36      |
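The "Duplicates Removed" counts come from exact-duplicate elimination, which in pandas is a one-liner. A minimal sketch, shown on a synthetic frame (the real pipeline's key columns are not specified here):

```python
import pandas as pd

def dedupe(df: pd.DataFrame):
    """Drop exact duplicate rows and report how many were removed."""
    clean = df.drop_duplicates()
    return clean, len(df) - len(clean)

demo = pd.DataFrame({"state": ["Goa", "Goa", "Goa"], "count": [10, 10, 7]})
clean, removed = dedupe(demo)
print(removed)  # 1
```

Passing `subset=[...]` to `drop_duplicates` would instead dedupe on chosen key columns rather than whole rows.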

πŸ› οΈ Technology Stack

Backend (Python)

Frontend (TypeScript)


πŸ› Troubleshooting

Common Issues

1. β€œModule not found” errors

# Ensure you're in the correct directory
cd analysis/codes

# Or add to Python path
export PYTHONPATH="${PYTHONPATH}:/path/to/UIDAI_hackathon/analysis/codes"
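If exporting PYTHONPATH is inconvenient, the same fix can be applied at the top of a script. The path below is a placeholder, exactly as in the shell version; point it at your local checkout.

```python
import sys
from pathlib import Path

# Placeholder: point this at your checkout's analysis/codes directory.
CODES_DIR = Path("/path/to/UIDAI_hackathon/analysis/codes")
if str(CODES_DIR) not in sys.path:
    sys.path.insert(0, str(CODES_DIR))
```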

2. Memory errors with large datasets

# Use sample data
python run_all_analyses.py --sample 10000

3. Frontend build errors

# Clear node_modules and reinstall
rm -rf node_modules package-lock.json
npm install

4. Dataset not found

# Check if augmented datasets exist
ls -la Dataset/corrected_dataset/biometric/
ls -la Dataset/corrected_dataset/demographic/
ls -la Dataset/corrected_dataset/enrollement/

πŸ“ License

MIT License - See LICENSE file for details.


πŸ‘₯ Authors

Shuvam Banerji Seal’s Team


Made with ❀️ for UIDAI Hackathon