NHANES Age Prediction
In July 2025, I participated in the Summer Analytics Hackathon hosted by IIT Guwahati on AI Planet. The challenge was to predict whether a person belonged to the Adult (below 60) or Senior (60+) age group based on health and nutritional survey data from the NHANES dataset.
Hackathon Highlights
- Hackathon Name: Summer Analytics 2025
- Organized By: Consulting and Analytics Club, IIT Guwahati
- Dataset: NHANES (National Health and Nutrition Examination Survey)
- Goal: Classify individuals into Adult or Senior age groups.
- Leaderboard Ranks:
  - Private leaderboard: 176 / 220
  - Public leaderboard: 160 / 236
My Approach
Data Preprocessing
- Imputed missing feature values with the column mean.
- Converted categorical columns (e.g., Gender, Diabetes indicator) to numerical form.
- Removed rows where the target variable was missing.
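The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the code used in the hackathon: the column names (`Gender`, `BMI`, `Glucose`, `age_group`), the toy values, and the integer encodings are all assumptions.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the survey data; column names and values are hypothetical.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female"],
    "BMI": [22.5, np.nan, 31.0, 27.5, 24.0],
    "Glucose": [90.0, 110.0, np.nan, 100.0, 95.0],
    "age_group": ["Adult", "Senior", "Adult", None, "Adult"],
})

def preprocess(frame: pd.DataFrame) -> pd.DataFrame:
    # 1. Drop rows with no label; they cannot be used for supervised training.
    out = frame.dropna(subset=["age_group"]).copy()
    # 2. Encode categorical columns as integers (assumed encodings).
    out["Gender"] = out["Gender"].map({"Male": 0, "Female": 1})
    out["age_group"] = out["age_group"].map({"Adult": 0, "Senior": 1})
    # 3. Fill remaining gaps in numeric features with the column mean.
    numeric_cols = ["BMI", "Glucose"]
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].mean())
    return out

clean = preprocess(df)
print(clean)
```

In this sketch the means are computed after the unlabeled rows are dropped. When a validation split is involved, computing the means on the training portion only is usually the safer choice.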
Feature Engineering
- Created interaction features like:
  - BMI x Glucose: to capture relationships between obesity and glucose levels.
  - BMI x Insulin: to model insulin resistance tendencies.
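As a rough sketch, interaction features of this kind are element-wise products of two columns. The column names and values below are hypothetical, and `add_interactions` is an illustrative helper rather than anything from the original project.

```python
import pandas as pd

# Hypothetical column names and values, for illustration only.
df = pd.DataFrame({
    "BMI": [22.0, 30.0, 27.5],
    "Glucose": [90.0, 120.0, 100.0],
    "Insulin": [5.0, 15.0, 8.0],
})

def add_interactions(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with two product features appended."""
    out = frame.copy()
    out["BMI_x_Glucose"] = out["BMI"] * out["Glucose"]
    out["BMI_x_Insulin"] = out["BMI"] * out["Insulin"]
    return out

features = add_interactions(df)
print(features)
```

Tree-based models can learn such interactions on their own, but an explicit product column gives a linear baseline like Logistic Regression access to the same signal.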
Models Tried
- Baseline: Logistic Regression.
- Advanced Models:
  - Random Forest (F1 ~30%)
  - XGBoost (F1 ~35%)
  - Stacking Ensemble (F1 ~37%)
The stacking model combined XGBoost, Random Forest, and Logistic Regression to improve overall performance.
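A stacking setup along these lines might look like the sketch below. Several things here are assumptions: the data is synthetic rather than NHANES, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the example needs only one library, and the Logistic Regression meta-learner is a guess, since the write-up does not say which final estimator was used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (roughly 85% / 15%); not NHANES.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4,
    weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

base_models = [
    # Stand-in for XGBoost; xgboost.XGBClassifier could take this slot.
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("rf", RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0)),
    ("lr", LogisticRegression(max_iter=1000, class_weight="balanced")),
]

# The meta-learner is trained on out-of-fold predictions of the base models.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),  # assumed meta-learner
    cv=5,
)
stack.fit(X_train, y_train)

test_f1 = f1_score(y_test, stack.predict(X_test))
print(f"Hold-out F1: {test_f1:.3f}")
```

The `cv=5` argument matters: the final estimator sees only cross-validated predictions from the base models, which limits leakage from the training labels.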
Results
| Model | F1 Score |
|---|---|
| Logistic Regression | 18% |
| Random Forest | 30% |
| XGBoost | 35% |
| Stacking Ensemble | 37% |
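For reference, F1 is the harmonic mean of precision and recall. The sketch below computes it from confusion counts; the counts are made up for illustration and are not the hackathon's. It also assumes binary F1 on the Senior class, since the table does not say how the score was averaged.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall)."""
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical confusion counts for the Senior class.
score = f1_from_counts(tp=30, fp=50, fn=50)
print(f"{score:.3f}")
```

Unlike accuracy, F1 ignores true negatives, so a model that labels everyone as Adult scores zero regardless of how large the Adult majority is.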
Top features (via XGBoost importance):
- BMI
- Glucose level
- Insulin level
- Interaction features like BMI x Glucose.
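A ranking like this can be read from a fitted model's `feature_importances_` attribute. The sketch below is an assumption-laden illustration: it uses synthetic data with hypothetical feature names attached, and scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost, whose `XGBClassifier` exposes an attribute of the same name.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical feature names attached to synthetic data; not NHANES values.
names = ["BMI", "Glucose", "Insulin", "BMI_x_Glucose"]
X, y = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1,
    random_state=0,
)

# Stand-in for XGBoost: both expose a fitted `feature_importances_` array.
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Pair each importance with its feature name and sort, largest first.
ranking = (
    pd.Series(model.feature_importances_, index=names)
    .sort_values(ascending=False)
)
print(ranking)
```

Because the data here is random, the resulting order says nothing about the real features; only the mechanics carry over.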
Key Learnings
- Handling imbalanced classes using class_weight and scale_pos_weight.
- Importance of feature engineering in tabular data.
- How stacking can combine diverse base models into a stronger predictor.
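To make the class imbalance point concrete, here is a small sketch of how the two weights are commonly derived from label counts. The label vector is hypothetical. The `class_weight` formula matches what scikit-learn documents for `class_weight="balanced"`, and the negatives-to-positives ratio is the usual heuristic for XGBoost's `scale_pos_weight`.

```python
from collections import Counter

# Hypothetical label vector: 0 = Adult, 1 = Senior (16% positives).
y = [0] * 84 + [1] * 16

counts = Counter(y)
n_samples, n_classes = len(y), len(counts)

# Same formula scikit-learn uses for class_weight="balanced":
# n_samples / (n_classes * count_of_class).
class_weight = {
    label: n_samples / (n_classes * count) for label, count in counts.items()
}

# Common heuristic for XGBoost's scale_pos_weight: negatives / positives.
scale_pos_weight = counts[0] / counts[1]

print(class_weight)
print(scale_pos_weight)
```

These values could then be passed as `LogisticRegression(class_weight=class_weight)` or `XGBClassifier(scale_pos_weight=scale_pos_weight)`, so that errors on the rarer Senior class cost more during training.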
Takeaways
This hackathon was a great hands-on experience in healthcare analytics. It taught me how to handle real-world datasets with missing values and class imbalance, and it strengthened my skills in ensemble learning.
"Built an XGBoost-based age prediction model with feature engineering and stacking ensemble techniques in a national-level hackathon (public leaderboard rank: 160 / 236)."