Chapter 5 Assessing Model Performance

The classification rates of the models are hard to interpret in isolation. For example, is correctly predicting 76% of the 2021 tournament games impressive, or merely better than the other models? To assess model performance, I used the selection committee’s seeding as a baseline, comparing each model’s accuracy to the accuracy of simply picking the higher-seeded team (known as a “Chalk Bracket”).
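
A minimal sketch of how such a chalk baseline can be scored (the game records, seed values, and field layout below are hypothetical illustrations, not the thesis data):

```python
# Chalk baseline sketch: predict that the higher-seeded team
# (i.e., the team with the numerically lower seed) wins each game.
# The games listed here are hypothetical examples.

games = [
    # (team_a_seed, team_b_seed, winner), winner is "a" or "b"
    (1, 16, "a"),
    (8, 9, "b"),
    (2, 15, "b"),  # an upset: the 15 seed wins
]

def chalk_accuracy(games):
    """Fraction of games in which the higher seed (lower number) won.

    Games between equally seeded teams (possible in later rounds)
    would need a tiebreak rule; none is applied in this sketch.
    """
    correct = 0
    for seed_a, seed_b, winner in games:
        chalk_pick = "a" if seed_a < seed_b else "b"
        correct += (chalk_pick == winner)
    return correct / len(games)

print(f"Chalk accuracy: {chalk_accuracy(games):.2%}")
```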

The classification rates for the “Chalk” method and the NN/RF/XG ensemble method are shown in Table 5.1 below. Just picking the higher seed was fairly effective for the 2019 tournament, yielding an accuracy of roughly 72%. Even so, the ensemble model outperformed the “Chalk” method by about 6% in 2019. The “Chalk” method was much less accurate for the 2021 tournament, with a classification rate of only about 61%. The ensemble model far outperformed the “Chalk” method that year, with an accuracy score roughly 17% higher.

Table 5.1: NN/RF/XG Assessment

Tournament         Chalk Accuracy   Model Accuracy   Percent Difference
2019 Tournament    71.64%           76.12%            6.06%
2021 Tournament    60.61%           71.21%           16.85%
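
The text does not state how the percent-difference column was computed; one common definition is the symmetric percent difference, which reproduces the 2019 figure. A minimal sketch, assuming that definition:

```python
def percent_difference(chalk_acc, model_acc):
    """Symmetric percent difference between two accuracy scores.

    Assumes the definition |a - b| / ((a + b) / 2) * 100; the
    formula choice is an assumption, not stated in the text.
    """
    return abs(model_acc - chalk_acc) / ((model_acc + chalk_acc) / 2) * 100

# 2019 tournament figures from Table 5.1
print(f"{percent_difference(71.64, 76.12):.2f}%")  # 6.06%
```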

The ensemble model outperformed picking the higher seeds in two very different tournament scenarios while maintaining an accuracy above 70% in both years. Combining the three models’ predictions increased the generalizability of the ensemble while preserving accurate predictions.