We present the results of a study that explored the emotions students experienced while interacting with an educational math game (Heroes of Math Island). Drawing on emotion frameworks from affective computing and education, we considered a larger set of emotions than related research has. For emotion labeling, we started from a standard methodology in which trained judges report emotions over 20-s intervals; however, instead of having judges choose only one emotion per interval, as is standard practice, we asked them to report all the emotions they observed. This variation allows us to discuss whether this interval is appropriate for emotion labeling. We present a detailed analysis of interrater reliability, both aggregated and for individual students, that considers agreement among judges not only on the type of emotion but also on the number of emotions detected. We also report an analysis of in-depth one-to-one interviews with the judges, which gives insight into the challenges they encountered when labeling emotions.
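When judges may report several emotions per 20-s interval, agreement can be measured along the two dimensions mentioned above: which emotions were reported and how many. The Python sketch below illustrates one way to do this for two judges. It is not necessarily the statistic or procedure used in the study. It assumes Cohen's kappa as the agreement measure, computes it once per emotion on presence/absence and once on the per-interval emotion count, and uses hypothetical names (`cohen_kappa`, the judge lists, the emotion labels).

```python
from collections import Counter
from typing import Hashable, List, Sequence, Set


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labeled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items where both raters gave the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


def count_kappa(j1: List[Set[str]], j2: List[Set[str]]) -> float:
    """Agreement on HOW MANY emotions each judge reported per interval."""
    return cohen_kappa([len(s) for s in j1], [len(s) for s in j2])


def emotion_kappa(j1: List[Set[str]], j2: List[Set[str]], emotion: str) -> float:
    """Agreement on WHETHER a given emotion was reported in each interval."""
    return cohen_kappa([emotion in s for s in j1], [emotion in s for s in j2])


# Hypothetical labels for four consecutive 20-s intervals of one student.
judge1 = [{"engaged"}, {"confused", "frustrated"}, {"engaged"}, {"bored"}]
judge2 = [{"engaged"}, {"confused"}, {"engaged", "confused"}, {"bored"}]

print(count_kappa(judge1, judge2))
print(emotion_kappa(judge1, judge2, "engaged"))
print(emotion_kappa(judge1, judge2, "confused"))
```

In this toy data the judges agree perfectly on when "engaged" occurs, yet their emotion counts agree less often than chance would predict (negative kappa). This shows why agreement on emotion type and agreement on number of emotions can diverge and are worth reporting separately.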