Chapter 5: Assessing Model Performance
The models' classification rates are hard to interpret in isolation. For example, is correctly predicting 76% of the 2021 tournament games impressive, or merely better than the other models? To assess model performance, I used the selection committee's seeding as a baseline, comparing each model's accuracy to the accuracy of always picking the higher-seeded team (known as a "Chalk Bracket").
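As a minimal sketch, the chalk baseline can be scored with a few lines of pandas. The column names here (`team_seed`, `opp_seed`, `team_won`) are hypothetical stand-ins for whatever the tournament results table actually uses:

```python
import pandas as pd

def chalk_accuracy(games: pd.DataFrame) -> float:
    """Fraction of games won by the higher-seeded (numerically lower) team.

    Assumes hypothetical columns: 'team_seed', 'opp_seed', and 'team_won'
    (1 if the first-listed team won). Equal-seed matchups are excluded,
    since a chalk bracket gives no pick there.
    """
    # Keep only games where one team is strictly higher-seeded.
    picks = games[games["team_seed"] != games["opp_seed"]].copy()
    # Chalk picks the team with the numerically lower seed (a 1 beats a 16).
    chalk_pick = picks["team_seed"] < picks["opp_seed"]
    # A pick is correct when the chalk-favored team actually won.
    correct = chalk_pick == picks["team_won"].astype(bool)
    return correct.mean()
```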
The classification rates for the "Chalk" method and the NN/RF/XGBoost ensemble are shown in the table below. Picking the higher seed was fairly effective for the 2019 tournament, yielding 72% accuracy; even so, the ensemble model beat the "Chalk" method by about 6% that year. The "Chalk" method was far less accurate for the 2021 tournament, with only a 61% classification rate, and the ensemble model outperformed it by roughly 17%.
| | Chalk Accuracy | Model Accuracy | Percent Difference |
|---|---|---|---|
| 2019 Tournament | 71.64% | 76.12% | 6.06% |
| 2021 Tournament | 60.61% | 71.21% | 16.85% |
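The "Percent Difference" column appears to be the symmetric percent difference of the two accuracies; that formula is an assumption on my part, but it reproduces the 2019 row:

$$ \text{Percent Difference} = \frac{|a_{\text{model}} - a_{\text{chalk}}|}{(a_{\text{model}} + a_{\text{chalk}})/2} \times 100\% $$

For 2019: $\frac{|76.12 - 71.64|}{(76.12 + 71.64)/2} \times 100\% = \frac{4.48}{73.88} \times 100\% \approx 6.06\%$.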
The ensemble model beat the higher-seed baseline in two very different tournament scenarios while maintaining accuracy above 70%. Combining the three models' predictions improved the model's generalizability while preserving predictive accuracy.
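The exact combination rule is not restated here, so the sketch below assumes soft voting, a common way to combine three classifiers: average each model's predicted win probability and pick the favored side. The function and variable names are illustrative:

```python
import numpy as np

def soft_vote(p_nn: np.ndarray, p_rf: np.ndarray, p_xgb: np.ndarray) -> np.ndarray:
    """Average three models' win probabilities and pick the favorite.

    Each input is an array of P(team A wins) for the same list of games.
    Returns 1 where the averaged probability favors team A, else 0.
    """
    p_avg = (p_nn + p_rf + p_xgb) / 3.0
    return (p_avg >= 0.5).astype(int)

# Hypothetical usage with three fitted scikit-learn-style classifiers:
# preds = soft_vote(nn.predict_proba(X)[:, 1],
#                   rf.predict_proba(X)[:, 1],
#                   xgb_clf.predict_proba(X)[:, 1])
```

Averaging probabilities rather than majority-voting hard picks lets a confident model outweigh two lukewarm ones, which tends to help when the three base models err on different games.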