Author: Shuvam Banerji Seal's Team
A comprehensive, production-grade data analysis pipeline for Aadhaar enrollment, demographic, and biometric datasets with ML-powered insights and interactive web visualization.
| Dataset | Records | Unique States | Unique Districts | File Size |
|---|---|---|---|---|
| Biometric | 3,531,034 | 36 | 948 | 603 MB |
| Demographic | 1,597,301 | 36 | 952 | 265 MB |
| Enrollment | 982,502 | 36 | 963 | 164 MB |
| Total | 6,110,837 | 36 | ~960 | 1.03 GB |
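To sanity-check these figures locally, a minimal pandas sketch could recount rows and unique regions per dataset. The paths and the state/district column names are assumptions about the augmented CSV layout, not the project's confirmed schema:

```python
# Hedged sketch: recount records and unique states/districts per dataset.
# Directory names and the "state"/"district" columns are assumptions.
import glob
import pandas as pd

for name in ("biometric", "demographic", "enrollment"):
    # The actual directory spelling may differ (see the Troubleshooting section).
    files = sorted(glob.glob(f"Dataset/corrected_dataset/{name}/*.csv"))
    if not files:
        continue
    df = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
    print(f"{name}: {len(df):,} rows, "
          f"{df['state'].nunique()} states, {df['district'].nunique()} districts")
```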
UIDAI_hackathon/
├── analysis/
│   └── codes/
│       ├── utils/                 # Utility modules
│       │   ├── parallel.py        # Multiprocessing utilities
│       │   ├── io_utils.py        # File I/O operations
│       │   ├── validators.py      # Data validation
│       │   └── progress.py        # Progress tracking
│       ├── time_series/           # Time series analysis
│       ├── geographic/            # Geographic analysis
│       ├── demographic/           # Demographic analysis
│       ├── statistical/           # Statistical analysis
│       ├── ml_models/             # ML training & inference
│       ├── api_clients/           # External API clients
│       └── run_all_analyses.py    # Master runner
├── web/
│   └── frontend/                  # React/TypeScript dashboard
│       ├── src/
│       │   ├── components/        # Reusable UI components
│       │   ├── pages/             # Page components
│       │   ├── hooks/             # Custom React hooks
│       │   └── utils/             # Utility functions
│       └── public/data/           # Analysis results (JSON)
├── Dataset/                       # Data files
│   ├── api_data_aadhar_biometric/
│   ├── api_data_aadhar_demographic/
│   ├── api_data_aadhar_enrolment/
│   └── corrected_dataset/         # Augmented datasets
└── docker/                        # Docker configuration
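The `utils/` helpers are only named above. As a loose illustration, a `parallel.py`-style helper might look like the sketch below; the function name `parallel_map` and its signature are assumptions, not the module's documented API:

```python
# Hypothetical sketch of a utils/parallel.py-style helper (Python 3.9+).
# The real module's API is not documented in this README.
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")

def parallel_map(fn: Callable[[T], R], items: Iterable[T],
                 workers: int = 4) -> list[R]:
    """Apply fn to items across worker processes, preserving input order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))
```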
# Clone the repository
git clone https://github.com/Shuvam-Banerji-Seal/UIDAI_hackathon.git
cd UIDAI_hackathon
# If using Git LFS, pull large files
git lfs pull
# Create Python virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install Python dependencies
pip install pandas numpy scipy scikit-learn matplotlib tqdm
# Navigate to analysis codes
cd analysis/codes
# Run all analyses with sample data (fast, ~5 seconds)
python run_all_analyses.py --sample 5000
# Run all analyses with full dataset (slower, ~2-5 minutes)
python run_all_analyses.py
# Run with sequential processing (for debugging)
python run_all_analyses.py --sample 5000 --sequential
Analysis outputs are saved to:
- results/ - Detailed JSON/CSV files
- web/frontend/public/data/ - JSON for the web dashboard

# Navigate to frontend directory
cd ../../web/frontend
# Install npm dependencies
npm install
# Start development server
npm run dev
Open http://localhost:5173 in your browser.
Build for production:
npm run build # Creates dist/ folder
npm run preview # Preview production build
# Show help
python run_all_analyses.py --help
# Run with sample data
python run_all_analyses.py --sample 10000
# Run specific analyses only
python run_all_analyses.py --analyses time_series,geographic
# Custom output directory
python run_all_analyses.py --output-dir ./my_results
# Sequential processing (useful for debugging)
python run_all_analyses.py --sequential
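Taken together, the flags above imply a CLI surface roughly like this argparse sketch. This is an illustrative reconstruction; the actual option handling in run_all_analyses.py may differ:

```python
# Illustrative sketch of the CLI implied by the commands above,
# not the project's actual parser.
import argparse

parser = argparse.ArgumentParser(description="Run all Aadhaar analyses")
parser.add_argument("--sample", type=int, default=None,
                    help="limit each dataset to N sampled rows")
parser.add_argument("--analyses", type=lambda s: s.split(","), default=None,
                    help="comma-separated subset of analyses to run")
parser.add_argument("--output-dir", default="results",
                    help="directory for JSON/CSV results")
parser.add_argument("--sequential", action="store_true",
                    help="disable multiprocessing (useful for debugging)")
args = parser.parse_args()
print(args)
```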
# Time Series Analysis
from time_series.analyzer import run_time_series_analysis
results = run_time_series_analysis(sample_rows=5000)
# Geographic Analysis
from geographic.analyzer import run_geographic_analysis
results = run_geographic_analysis(sample_rows=5000)
# Demographic Analysis
from demographic.analyzer import run_demographic_analysis
results = run_demographic_analysis(sample_rows=5000)
# Statistical Analysis
from statistical.analyzer import run_statistical_analysis
results = run_statistical_analysis(sample_rows=5000)
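Since each run_*_analysis call returns its results directly, the four analyses can be combined into a single export. A minimal sketch, assuming each function returns a JSON-serializable dict (an assumption, not a documented guarantee):

```python
# Sketch: run all four analyzers on the same sample and merge the results.
# Assumes each run_*_analysis function returns a JSON-serializable dict.
import json

from time_series.analyzer import run_time_series_analysis
from geographic.analyzer import run_geographic_analysis
from demographic.analyzer import run_demographic_analysis
from statistical.analyzer import run_statistical_analysis

summary = {
    "time_series": run_time_series_analysis(sample_rows=5000),
    "geographic": run_geographic_analysis(sample_rows=5000),
    "demographic": run_demographic_analysis(sample_rows=5000),
    "statistical": run_statistical_analysis(sample_rows=5000),
}

with open("combined_results.json", "w") as fh:
    json.dump(summary, fh, indent=2)
```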
# Build and run analysis
docker-compose up analysis
# Run frontend development server
docker-compose up web-dev
# Build frontend for production
docker-compose up web-build
# Run all services
docker-compose up
results/
├── analysis/
│   ├── time_series/      # Time series JSON files
│   ├── geographic/       # Geographic JSON files
│   ├── demographic/      # Demographic JSON files
│   └── statistical/      # Statistical JSON files
├── exports/
│   ├── json/             # Combined JSON exports
│   └── csv/              # CSV exports
└── analysis_summary.json

web/frontend/public/data/
├── analysis_summary.json
├── time_series.json
├── geographic.json
├── demographic.json
├── statistical.json
└── ml_results.json
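Downstream scripts can consume these exports directly. A minimal sketch that inspects the combined summary, assuming analysis_summary.json is a plain JSON object (which the dashboard's usage suggests but this README does not guarantee):

```python
# Sketch: load the combined summary the dashboard reads.
import json
from pathlib import Path

summary_path = Path("web/frontend/public/data/analysis_summary.json")
with summary_path.open() as fh:
    summary = json.load(fh)

print(sorted(summary.keys()))  # inspect which analyses were exported
```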
Edit analysis/codes/config.py to customize the analysis pipeline settings.
Edit web/frontend/vite.config.ts to adjust the Vite dev server and build options.
| Metric | Biometric | Demographic | Enrollment |
|---|---|---|---|
| Duplicates Removed | 1,700 | 364,958 | 1,119 |
| Census Coverage | 100% | 100% | 100% |
| State Normalization | 36/36 | 36/36 | 36/36 |
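The Duplicates Removed figures correspond to an exact-duplicate drop. In pandas terms the operation looks roughly like the sketch below; the filename is hypothetical, and the real pipeline may deduplicate on a specific column subset:

```python
# Sketch of the exact-duplicate removal behind "Duplicates Removed".
import pandas as pd

df = pd.read_csv("Dataset/corrected_dataset/demographic/example.csv")  # hypothetical file
before = len(df)
deduped = df.drop_duplicates()
print(f"Duplicates removed: {before - len(deduped):,}")
```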
1. "Module not found" errors
# Ensure you're in the correct directory
cd analysis/codes
# Or add to Python path
export PYTHONPATH="${PYTHONPATH}:/path/to/UIDAI_hackathon/analysis/codes"
2. Memory errors with large datasets
# Use sample data
python run_all_analyses.py --sample 10000
3. Frontend build errors
# Clear node_modules and reinstall
rm -rf node_modules package-lock.json
npm install
4. Dataset not found
# Check if augmented datasets exist
ls -la Dataset/corrected_dataset/biometric/
ls -la Dataset/corrected_dataset/demographic/
ls -la Dataset/corrected_dataset/enrollement/
MIT License - See LICENSE file for details.
Shuvam Banerji Seal's Team
Made with ❤️ for UIDAI Hackathon