Research Report

Kane, T. J., & Staiger, D. O. (2008). Estimating Teacher Impacts on Student Achievement: An Experimental Evaluation.

The authors used a random-assignment experiment in Los Angeles Unified School District to evaluate various non-experimental methods for estimating teacher effects on student test scores. Having estimated teacher effects during a pre-experimental period, they used these estimates to predict student achievement following random assignment of teachers to classrooms. While all of the teacher effect estimates considered were significant predictors of student achievement under random assignment, those that controlled for prior student test scores yielded unbiased predictions, and those that further controlled for mean classroom characteristics yielded the best prediction accuracy. In both the experimental and non-experimental data, the authors found that teacher effects faded out by roughly 50 percent per year in the two years following teacher assignment.
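As a rough numerical illustration of the fade-out rate quoted above (the geometric-decay structure and the 0.10 SD starting value are our assumptions for illustration, not figures from the paper):

```python
# Hypothetical illustration: a teacher effect that fades by roughly 50% per year.
initial_effect_sd = 0.10  # assumed initial teacher effect, in student-level SD units
fade_rate = 0.50          # roughly 50% fade-out per year, per the abstract

for year in range(3):  # year 0 = year of assignment, plus the two following years
    effect = initial_effect_sd * (1 - fade_rate) ** year
    print(f"Year {year}: {effect:.3f} SD")
# Year 0: 0.100 SD; Year 1: 0.050 SD; Year 2: 0.025 SD
```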

McGinn, D., Kelcey, B., Hill, H., & Chin, M. (Working Paper). Using Item Response Theory to Learn about Observational Instruments.

As many states are slated to soon use scores derived from classroom observation instruments in high-stakes decisions, developers must cultivate methods for improving the functioning of these instruments. We show how multidimensional, multilevel item response theory models can yield information critical for improving the performance of observational instruments.

West, M. R., Kraft, M. A., Finn, A. S., Duckworth, A. L., Gabrieli, C. F. O., & Gabrieli, J. D. E. (2014). Promise and Paradox: Measuring Students' Non-cognitive Skills and the Impact of Schooling.

The authors used self-report surveys to gather information on a broad set of non-cognitive skills from 1,368 eighth-grade students attending Boston Public Schools and linked this information to administrative data on their demographics and test scores. At the student level, scales measuring conscientiousness, self-control, grit, and growth mindset are positively correlated with attendance, behavior, and test-score gains between fourth and eighth grade. Conscientiousness, self-control, and grit are unrelated to test-score gains at the school level, however, and students attending over-subscribed charter schools with higher average test-score gains score lower on these scales than do students attending district schools. Exploiting charter school admissions lotteries, the authors replicate previous findings indicating positive impacts of charter school attendance on math achievement, but find negative impacts on these non-cognitive skills. The authors provide suggestive evidence that these paradoxical results are driven by reference bias, or the tendency for survey responses to be influenced by social context. The results therefore highlight the importance of improved measurement of non-cognitive skills in order to capitalize on their promise as a tool to inform education practice and policy.

Abdulkadiroglu, A., Angrist, J., Cohodes, S., Dynarski, S., Fullerton, J., Kane, T., & Pathak, P. (2009). Informing the Debate: Comparing Boston's Charter, Pilot, and Traditional Schools.

Whether using the randomized lotteries or statistical controls for measured background characteristics, we generally find large positive effects for Charter Schools, at both the middle school and high school levels. For each year of attendance in middle school, we estimate that Charter Schools raise student achievement by .09 to .17 standard deviations in English Language Arts and .18 to .54 standard deviations in math relative to students attending traditional schools in the Boston Public Schools. The estimated impact on math achievement for Charter middle schools is extraordinarily large. Increasing performance by .5 standard deviations is the same as moving from the 50th to the 69th percentile in student performance. This is roughly half the size of the black-white achievement gap. In high school, the estimated gains are somewhat smaller than in middle school: .16 to .19 standard deviations in English Language Arts; .16 to .19 in mathematics; .2 to .28 in writing topic development; and .13 to .17 in writing composition with the lottery-based results. The estimated impacts of middle school and high school Charters are similar in both the “observational” and “lottery-based” results.
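The percentile conversion cited in this abstract follows directly from the standard normal distribution; a minimal check of that arithmetic (our illustration, using scipy, not part of the original report):

```python
from scipy.stats import norm

# A student at the 50th percentile sits at z = 0 on a standard normal scale.
# Moving up by 0.5 SD puts the student at z = 0.5; the implied percentile is the normal CDF.
new_percentile = norm.cdf(0.5)  # ~0.69
print(f"0.5 SD above the median is roughly the {100 * new_percentile:.0f}th percentile")
```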

Angrist, J. D., Cohodes, S. R., Dynarski, S. M., Fullerton, J. B., Kane, T. J., Pathak, P. A., & Walters, C. R. (2011). Student Achievement in Massachusetts' Charter Schools.

Researchers from the Harvard Graduate School of Education, MIT, and the University of Michigan have released the results of a new study that suggests that urban charter schools in Massachusetts have large positive effects on student achievement at both the middle and high school levels. Results for nonurban charter schools were less clear; some analyses indicated positive effects on student achievement at the high school level, while results for middle school students were much less encouraging.

Cascio, E. U., & Staiger, D. O. (2012). Knowledge, Tests, and Fadeout in Educational Interventions.

Educational interventions are often evaluated and compared on the basis of their impacts on test scores. Decades of research have produced two empirical regularities: interventions in later grades tend to have smaller effects than the same interventions in earlier grades, and the test score impacts of early educational interventions almost universally “fade out” over time. This paper explores whether these empirical regularities are an artifact of the common practice of rescaling test scores in terms of a student’s position in a widening distribution of knowledge. If a standard deviation in test scores in later grades translates into a larger difference in knowledge, an intervention’s effect on normalized test scores may fall even as its effect on knowledge does not. We evaluate this hypothesis by fitting a model of education production to correlations in test scores across grades and with college-going using both administrative and survey data. Our results imply that the variance in knowledge does indeed rise as children progress through school, but not enough for test score normalization to fully explain these empirical regularities.
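The rescaling mechanism described in this abstract can be made concrete with a small sketch (the knowledge gain and standard deviations below are invented for illustration, not estimates from the paper): if an intervention's effect is fixed in knowledge units while the spread of knowledge widens across grades, the effect expressed in standard-deviation units shrinks even though the knowledge gain does not.

```python
# Hypothetical illustration of "fadeout" arising purely from rescaling.
knowledge_gain = 0.30                            # assumed constant effect in knowledge units
sd_of_knowledge = {1: 1.00, 3: 1.25, 5: 1.50}    # assumed widening distribution by grade

for grade, sd in sd_of_knowledge.items():
    standardized_effect = knowledge_gain / sd
    print(f"Grade {grade}: effect = {standardized_effect:.2f} SD")
# The standardized effect falls (0.30 -> 0.24 -> 0.20) while the knowledge gain is unchanged.
```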

Chin, M., Hill, H., McGinn, D., Staiger, D., & Buckley, K. (2013). Using Validity Criteria to Enable Model Selection: An Exploratory Analysis. Association for Public Policy Analysis and Management Fall Research Conference.

In this paper, the authors propose that an important determinant of value-added model choice should be alignment with alternative indicators of teacher and teaching quality. Such alignment makes sense from a theoretical perspective because better alignment is thought to indicate more valid systems. To provide initial evidence on this issue, they first calculated value-added scores for all fourth- and fifth-grade teachers within four districts, then extracted scores for 160 intensively studied teachers. Initial analyses using a subset of alternative indicators suggest that alignment between value-added scores and alternative indicators differs by model, though not significantly.

Cantrell, S., Fullerton, J., Kane, T. J., & Staiger, D. O. (2008). National Board Certification and Teacher Effectiveness: Evidence from a Random Assignment Experiment.

The National Board for Professional Teaching Standards (NBPTS) assesses teaching practice based on videos and essays submitted by teachers. For this study, the authors compared the performance of classrooms of elementary students in Los Angeles randomly assigned to NBPTS applicants and to comparison teachers. The authors conclude that students assigned to highly rated applicants outperformed those in comparison classrooms by a larger margin than did students assigned to poorly rated applicants. Moreover, the estimates with and without random assignment were similar.

Kraft, M. A., & Papay, J. P. (2014). Can Professional Environments in Schools Promote Teacher Development? Explaining Heterogeneity in Returns to Teaching Experience. Educational Evaluation and Policy Analysis, 36(4), 476-500.

Although wide variation in teacher effectiveness is well established, much less is known about differences in teacher improvement over time. We document that average returns to teaching experience mask large variation across individual teachers and across groups of teachers working in different schools. We examine the role of school context in explaining these differences using a measure of the professional environment constructed from teachers’ responses to state-wide surveys. Our analyses show that teachers working in more supportive professional environments improve their effectiveness more over time than teachers working in less supportive contexts. On average, teachers working in schools at the 75th percentile of professional environment ratings improved 38% more over ten years than teachers in schools at the 25th percentile.

Bacher-Hicks, A., Chin, M., Hill, H., & Staiger, D. (Working Paper). Explaining Teacher Effects on Achievement Using Measures from Multiple Research Traditions.

Researchers have identified many characteristics of teachers and teaching that contribute to student outcomes. However, most studies investigate only a small number of these characteristics, likely underestimating the overall contribution. In this paper, we use a set of 28 teacher-level predictors drawn from multiple research traditions to explain teacher-level variation in student outcomes. These predictors collectively explain 28% of the teacher-level variability in state standardized math test scores and 40% in a predictor-aligned math test. In addition, each individual predictor explains only a small, relatively unique portion of the total teacher-level variability. The first finding highlights the importance of choosing predictors and outcomes that are well aligned, and the second suggests that the phenomena underlying teacher effects are multidimensional.
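A minimal sketch of how a "share of teacher-level variability explained" figure can be computed is given below; it uses simulated data and an ordinary least-squares R², and is not the authors' estimation procedure (which works with estimated teacher effects and a specific set of 28 predictors).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Simulated data: 500 teachers and 28 teacher-level predictors (placeholder values).
n_teachers, n_predictors = 500, 28
X = rng.normal(size=(n_teachers, n_predictors))
true_coefs = rng.normal(scale=0.1, size=n_predictors)
teacher_effects = X @ true_coefs + rng.normal(scale=0.5, size=n_teachers)

# Share of teacher-level variability explained by the full predictor set (R^2).
model = LinearRegression().fit(X, teacher_effects)
print(f"Variance explained: {model.score(X, teacher_effects):.0%}")
```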

Kelcey, B., McGinn, D., Hill, H. C., & Charalambous, C. (Working Paper). The Generalizability of Item Parameters Across Lessons.

The purpose of this study is to investigate three aspects of construct validity for the Mathematical Quality of Instruction classroom observation instrument: (1) the dimensionality of scores, (2) the generalizability of these scores across districts, and (3) the predictive validity of these scores in terms of student achievement.

Kane, T. J., Taylor, E., Tyler, J., & Wooten, A. (2011). Identifying Effective Classroom Practices Using Student Achievement Data. The Journal of Human Resources, 46(3), 587-613.

This paper combines information from classroom-based observations and measures of teachers’ ability to improve student achievement as a step toward addressing the challenge of identifying effective teachers and teaching practices. The authors find that classroom-based measures of teaching effectiveness are related in substantial ways to student achievement growth. The authors conclude that the results point to the promise of teacher evaluation systems that would use information from both classroom observations and student test scores to identify effective teachers. Information on the types of practices that are most effective at raising achievement is also highlighted.

Lynch, K., Chin, M., & Blazar, D. (2013). How Well Do Teacher Observations Predict Value-Added? Exploring Variability Across Districts. In Association for Public Policy Analysis & Management Fall Research Conference. Washington, DC.

In this study we ask: Do observational instruments predict teachers' value-added equally well across different state tests and district/state contexts? And, to what extent are differences in these correlations a function of the match between the observation instrument and tested content? We use data from the Gates Foundation-funded Measures of Effective Teaching (MET) Project (N=1,333) study of elementary and middle school teachers from six large public school districts, and from a smaller (N=250) study of fourth- and fifth-grade math teachers from four large public school districts. Early results indicate that estimates of the relationship between teachers' value-added scores and their observed classroom instructional quality differ considerably by district.
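A hedged sketch of the district-by-district comparison described here, using a hypothetical teacher-level file (the data and column names are placeholders, and raw correlations stand in for the study's model-based estimates):

```python
import pandas as pd

# Hypothetical file: one row per teacher, with a district label,
# an observation-instrument score, and a value-added estimate.
teachers = pd.DataFrame({
    "district":    ["A", "A", "A", "B", "B", "B"],
    "observation": [2.1, 2.8, 3.4, 2.5, 3.0, 3.6],
    "value_added": [-0.05, 0.02, 0.10, 0.00, 0.01, 0.04],
})

# Correlation between observation scores and value-added, computed within each district.
for district, group in teachers.groupby("district"):
    r = group["observation"].corr(group["value_added"])
    print(f"District {district}: r = {r:.2f}")
```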

Hill, H. C., Gogolen, C., Litke, E., Humez, A., Blazar, D., Corey, D., Barmore, J., et al. (2013). Examining High and Low Value-Added Mathematics: Can Expert Observers Tell the Difference? In Association for Public Policy Analysis & Management Fall Research Conference. Washington, DC.

In this study, we use value-added scores and video data to mount an exploratory study of high- and low-VAM teachers' instruction. Specifically, we seek to answer two research questions: First, can expert observers of mathematics instruction distinguish between high- and low-VAM teachers solely by observing their instruction? Second, what instructional practices, if any, consistently characterize high-VAM but not low-VAM teachers' classrooms? To answer these questions, we use data generated by 250 fourth- and fifth-grade math teachers and their students in four large public school districts. Preliminary analyses indicate that a teacher's value-added rank was often not obvious to this team of expert observers.
