NHANES Age Prediction: Summer Analytics 2025 Hackathon Experience


NHANES Age Prediction 🧠📊

In July 2025, I participated in the Summer Analytics Hackathon hosted by IIT Guwahati on AI Planet. The challenge was to predict whether a person belonged to the Adult (below 60) or Senior (60+) age group based on health and nutritional survey data from the NHANES dataset.


🏆 Hackathon Highlights

  • Hackathon Name: Summer Analytics 2025
  • Organized By: Consulting and Analytics Club, IIT Guwahati
  • Dataset: NHANES (National Health and Nutrition Examination Survey)
  • Goal: Classify individuals into Adult or Senior age groups.
  • Leaderboard Ranks:
    • Private leaderboard: 176 / 220
    • Public leaderboard: 160 / 236

🧑‍💻 My Approach

Data Preprocessing

  • Filled missing values using mean imputation.
  • Converted categorical columns (e.g., Gender, Diabetes indicator) to numerical form.
  • Removed rows where the target variable was missing.
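The three steps above could look roughly like the following pandas sketch. The column names (`gender`, `diabetic`, `bmi`, `glucose`, `age_group`) and the toy rows are assumptions made for illustration, not the competition's actual schema.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the survey data; all column names are hypothetical.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male"],
    "diabetic": ["No", "Yes", "No", "No"],
    "bmi": [22.0, np.nan, 30.0, 26.0],
    "glucose": [90.0, 110.0, np.nan, 100.0],
    "age_group": ["Adult", "Senior", None, "Adult"],
})

# 1. Drop rows that have no target label.
df = df.dropna(subset=["age_group"]).copy()

# 2. Encode categorical columns (and the target) as integers.
df["gender"] = df["gender"].map({"Male": 0, "Female": 1})
df["diabetic"] = df["diabetic"].map({"No": 0, "Yes": 1})
df["age_group"] = df["age_group"].map({"Adult": 0, "Senior": 1})

# 3. Fill the remaining numeric gaps with each column's mean.
numeric_cols = ["bmi", "glucose"]
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())
```

In a real pipeline the means would normally be computed on the training split only, so that no information from the validation or test rows leaks into the imputed values.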

🛠 Feature Engineering

  • Created interaction features like:
    • BMI x Glucose → to capture relationships between obesity & glucose levels.
    • BMI x Insulin → to model insulin resistance tendencies.
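As a minimal sketch, the interaction features are plain element-wise products of two columns. The column names and values here are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical column names; the real dataset may label these differently.
features = pd.DataFrame({
    "bmi": [22.0, 31.5, 27.0],
    "glucose": [90.0, 140.0, 105.0],
    "insulin": [6.0, 20.0, 11.0],
})

# Element-wise products expose the joint effect of two inputs as one feature.
features["bmi_x_glucose"] = features["bmi"] * features["glucose"]
features["bmi_x_insulin"] = features["bmi"] * features["insulin"]
```

Tree-based models can often pick up such interactions on their own, so explicit product features tend to matter most for linear models like Logistic Regression.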

🤖 Models Tried

  • Baseline: Logistic Regression.
  • Advanced Models:
    • Random Forest (F1 ~30%)
    • XGBoost (F1 ~35%)
    • Stacking Ensemble (F1 ~37%)

🚀 The stacking ensemble combined XGBoost, Random Forest, and Logistic Regression, lifting F1 from ~35% (XGBoost alone) to ~37%.
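A stacking setup along these lines can be sketched with scikit-learn's `StackingClassifier`. Several things here are assumptions rather than the original setup: the data is synthetic, `GradientBoostingClassifier` is swapped in for XGBoost so the example needs only scikit-learn, Logistic Regression is assumed as the meta-learner (the write-up does not say which model played that role), and all hyperparameters are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (roughly 85% / 15%) standing in for the survey features.
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.85, 0.15], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners; GradientBoostingClassifier is a stand-in for XGBoost here.
base_learners = [
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("rf", RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0)),
    ("lr", LogisticRegression(max_iter=1000, class_weight="balanced")),
]

# The meta-learner is fit on out-of-fold predictions from the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

# F1 for the minority (positive) class on the held-out split.
test_f1 = f1_score(y_test, stack.predict(X_test))
```

Because the meta-learner only sees cross-validated predictions, it is less likely to simply trust whichever base model overfits the training data most.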


📈 Results

Model                  F1 Score
Logistic Regression    18%
Random Forest          30%
XGBoost                35%
Stacking Ensemble      37%
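For reference, F1 is the harmonic mean of precision and recall. The write-up does not say which class was treated as positive or how the score was averaged, so the small sketch below assumes Senior is the positive class and uses made-up counts.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    if tp == 0:
        # No true positives means precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up counts: 30 Seniors found, 50 false alarms, 70 Seniors missed.
example_f1 = f1_from_counts(tp=30, fp=50, fn=70)  # about 0.33
```

A score in this range means the model both misses many Seniors and raises many false alarms, which is typical when the positive class is rare.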

📊 Top features (via XGBoost importance):

  • BMI
  • Glucose level
  • Insulin level
  • Interaction features like BMI x Glucose.

💡 Key Learnings

✅ Handling imbalanced classes using class_weight and scale_pos_weight.
✅ Importance of feature engineering in tabular data.
✅ How stacking can combine diverse base models into a stronger predictor.
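For the class-imbalance point, here is a minimal sketch of how the two weighting values are commonly derived. The label counts are made up, and the negatives-over-positives ratio is the usual heuristic for `scale_pos_weight`, not necessarily the value used in the competition.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Made-up labels: 0 = Adult (majority), 1 = Senior (minority).
y = np.array([0] * 84 + [1] * 16)

# scikit-learn's "balanced" heuristic: n_samples / (n_classes * count_per_class).
classes = np.array([0, 1])
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
class_weight = dict(zip(classes.tolist(), weights.tolist()))

# Common choice for XGBoost's scale_pos_weight: negatives divided by positives.
n_pos = int((y == 1).sum())
n_neg = int((y == 0).sum())
scale_pos_weight = n_neg / n_pos
```

The dictionary can be passed as `class_weight=` to scikit-learn estimators such as `LogisticRegression` or `RandomForestClassifier`, and the ratio as `scale_pos_weight=` to `xgboost.XGBClassifier`.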


📜 Takeaways

This hackathon was a great hands-on experience in healthcare analytics. It taught me how to handle real-world datasets with missing values and class imbalance, and it strengthened my skills in ensemble learning.

🌟 β€œAchieved top 50% rank in a national-level hackathon by building an XGBoost-based age prediction model with advanced feature engineering and stacking ensemble techniques.”