The well-established, gold-standard approach to finding out what works in education research is to run a randomized controlled trial (RCT) using a standard pre-test and post-test design. RCTs have been used in the intelligent tutoring community for decades to determine which questions and tutorial feedback work best. In practice, however, creators of intelligent tutoring systems (ITS) must decide what content to deploy without the luxury of running an RCT. Moreover, most log data produced by an ITS is not in a form that can be evaluated for learning effectiveness with traditional methods. As a result, tutoring systems produce a great deal of data that we, as education researchers, would like to learn from but currently cannot. In prior work we introduced one approach to this problem: a method derived from Bayesian knowledge tracing that analyzes the log data of a tutoring system to determine which items, among a set of items of the same skill, are most effective for learning. That method was validated through simulations. In the current work we evaluate the method further and, for comparison, introduce a second method based on learning gain analysis. Both methods were applied to 11 experimental datasets that investigated the effectiveness of various forms of tutorial help in a web-based math tutoring system. The tutorial help that the Bayesian method identified as having the highest rate of learning agreed with the learning gain analysis in 10 of the 11 experiments. We also present a simulation study comparing the statistical power of each method across different sample sizes. The practical impact of this work is that a wealth of knowledge about what works can now be learned from the thousands of experimental designs inherent in the datasets of tutoring systems that assign items or feedback conditions in an individually randomized order.
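To make the idea of a Bayesian knowledge tracing (BKT) derived item analysis more concrete, the sketch below shows the standard BKT update with one change: the learning (transition) probability is looked up per item, so each item can carry its own learning rate. This is an illustrative sketch under stated assumptions, not the authors' implementation. The function name `bkt_trace`, the dictionary `learn` keyed by item ID, and the convention that the learning transition after a response is credited to the item just answered are all hypothetical choices. Fitting these parameters to log data (for example, with expectation maximization) is not shown.

```python
from typing import Dict, List, Tuple


def bkt_trace(
    responses: List[Tuple[str, int]],
    prior: float,
    learn: Dict[str, float],
    guess: float,
    slip: float,
) -> List[float]:
    """Return P(known) after each response, using a per-item learn rate.

    Assumptions (illustrative only):
      responses: (item_id, correct) pairs in the order the student saw them,
                 where correct is 1 or 0.
      learn:     hypothetical per-item transition probability P(T | item);
                 the transition after a response is credited to that item.
    """
    p_known = prior
    trace: List[float] = []
    for item, correct in responses:
        # Bayesian update of P(known) given the observed answer.
        if correct:
            num = p_known * (1 - slip)
            den = num + (1 - p_known) * guess
        else:
            num = p_known * slip
            den = num + (1 - p_known) * (1 - guess)
        posterior = num / den
        # Learning transition, attributed to the item just answered.
        p_known = posterior + (1 - posterior) * learn[item]
        trace.append(p_known)
    return trace


if __name__ == "__main__":
    # Two hypothetical items of the same skill with different learn rates.
    print(bkt_trace([("A", 0), ("B", 1)], prior=0.3,
                    learn={"A": 0.1, "B": 0.3}, guess=0.2, slip=0.1))
```

Under this parameterization, an item whose fitted learn rate is higher than the others in its skill would be read as the more effective one, which is the kind of comparison the abstract describes.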