Juanda Airport Visitor Forecasting

Random Forest Regressor produced the strongest notebook result with R2 0.869 and MAE around 48k visitors on the 80-20 split.

Context

A time-series forecasting case study predicting monthly Juanda Airport visitor demand from 2014-2023 data.

My role

Data science practitioner

Evidence

120 monthly observations from 2014-2023

Context

Airport visitor demand varies over time with seasonality, travel patterns, and external operational factors. This academic forecasting case study used monthly Juanda Airport visitor data from 2014-2023 to compare a linear baseline, an ensemble model, and a hybrid deep learning model.

Problem

The core problem was to forecast future visitor counts from historical time-series data while keeping the comparison understandable and measurable. The data also contained a sharp mobility drop during the pandemic period, so preprocessing and model comparison mattered as much as the final score.

My Role

I handled data preparation, descriptive analysis, outlier handling, normalization, model setup, baseline comparison, and forecasting evaluation.

Evidence

Dashboard showing Juanda Airport visitor forecasting data, monthly trend, yearly average, and project metrics
Forecasting overview: 120 monthly observations, historical demand movement, and summary statistics for the airport visitor dataset.

Approach

  • Cleaned the monthly visitor dataset and checked missing values, duplicates, and descriptive statistics.
  • Handled low outliers using a quantile-based rule, then normalized values with Min-Max Scaling.
  • Compared Linear Regression, Random Forest Regressor, and CNN-LSTM across multiple train-test split scenarios.
  • Evaluated model performance using R2, MAE, MSE, and MAPE.
  • Selected the strongest notebook result based on both explanatory power and error size.
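The outlier and scaling steps above can be sketched in plain Python. The notebook's own code is not reproduced here, so this is only an illustration: the 10th-percentile cutoff, the choice to clip low values rather than drop them, the function names, and the sample numbers are all assumptions.

```python
from statistics import quantiles


def clip_low_outliers(values, lower_pct):
    """Raise every value below the lower_pct-th percentile up to that percentile.

    Assumption: "handling" low outliers means clipping; the notebook may have
    replaced or removed them differently.
    """
    if not 1 <= lower_pct <= 99:
        raise ValueError("lower_pct must be between 1 and 99")
    # quantiles(n=100) returns the 99 cut points P1..P99.
    cut_points = quantiles(values, n=100, method="inclusive")
    floor = cut_points[lower_pct - 1]
    return [max(v, floor) for v in values]


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / (hi - lo) for v in values]


# Made-up monthly counts with one pandemic-style collapse (not the real data).
monthly_visitors = [1_500_000, 1_620_000, 1_580_000, 1_710_000, 90_000,
                    1_450_000, 1_690_000, 1_540_000, 1_660_000, 1_600_000]

cleaned = clip_low_outliers(monthly_visitors, lower_pct=10)
scaled = min_max_scale(cleaned)
```

In a forecasting setting, the percentile cutoff and the minimum and maximum would normally be computed from the training portion only, then reused on the test portion, so that no information from later months leaks into the scaling.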

Key Decisions

Model selection followed the evidence. CNN-LSTM was included as the more complex sequence model, but Random Forest produced the strongest result in the notebook and was easier to explain as a model of non-linear demand behavior.

Model comparison for Juanda forecasting showing R2 and MAE across Linear Regression, Random Forest, and CNN-LSTM
Model comparison: Random Forest reached R2 0.869 with MAE around 48k visitors on the 80-20 split.

Result

Random Forest Regressor was the strongest model in the notebook, reaching R2 0.869 and MAE around 48k visitors on the 80-20 split. The result suggests that non-linear ensemble modeling was a better fit for this dataset than a simple linear baseline or the tested CNN-LSTM setup.
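For reference, the four metrics listed in the Approach section (R2, MAE, MSE, and MAPE) follow their standard definitions. The sketch below is a dependency-free illustration; the function name and the toy numbers are hypothetical and are not taken from the notebook.

```python
def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MSE, and MAPE (in percent) for two equal-length sequences.

    Assumes y_true contains no zeros (MAPE divides by the actual values)
    and is not constant (R2 divides by its total variance).
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mape = 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total variation
    r2 = 1 - ss_res / ss_tot

    return {"r2": r2, "mae": mae, "mse": mse, "mape": mape}


# Toy values for illustration only.
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 390]
metrics = regression_metrics(actual, predicted)
```

Because MAE is expressed in the same unit as the target, the reported MAE of around 48k reads directly as visitors per month, while R2 is unitless and describes the share of variance explained.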

What I’d Improve

I would replace random train-test splitting with a chronological split or walk-forward validation, because a random split lets the model train on months that come after the ones it is tested on. I would also add seasonality features such as month and holiday periods, compare against ARIMA/SARIMA or boosting models, and visualize forecast intervals.
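A minimal sketch of what walk-forward validation could look like for a series of this length. It uses an expanding training window; the 96-month initial window (80% of 120) and the 6-month test horizon are assumptions chosen for illustration, and the function name is hypothetical.

```python
def walk_forward_splits(n_obs, initial_train, horizon):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Each fold trains on everything before the test window, then the
    window moves forward by `horizon` observations (expanding window).
    """
    if initial_train <= 0 or horizon <= 0:
        raise ValueError("initial_train and horizon must be positive")
    start = initial_train
    while start + horizon <= n_obs:
        yield range(0, start), range(start, start + horizon)
        start += horizon


# 120 monthly observations, first fold trained on the first 96 months.
splits = list(walk_forward_splits(n_obs=120, initial_train=96, horizon=6))

for train_idx, test_idx in splits:
    # Every training month precedes every test month in the same fold.
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

Each model would be refit on the training indices of a fold and scored on the test indices, and the metrics would then be averaged across folds instead of relying on a single split.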
