National Center for Teacher Effectiveness

Project Status: Current
Focus Area: Teacher Effectiveness
Location: Massachusetts, Georgia & Washington, D.C.

How are multiple measures used in teacher evaluation related to one another and to student learning?

In July 2009, NCTE commenced a six-year effort to join disparate strands of education research, and develop a deeper and more comprehensive understanding of how to measure teacher and teaching effectiveness. NCTE is developing valid measures of effective mathematics teaching to be shared with practitioners, policymakers, and researchers. The measures may help target and plan teacher training, and improve teacher observation and feedback processes. 

There are three key strands of the work:

  • The core study Developing Measures of Effective Mathematics Teaching, which included an extensive data collection effort and the development of valid and reliable tools for the field of education. Read the research overview.
  • Supplementary studies that aim to be responsive to the needs of education practitioners and policymakers. These studies investigate professional environments, teacher effects, teacher evaluation systems, and item response theory.
  • National leadership activities such as conferences and webinars. You can learn more about these topics by accessing our resources below. 

The project is led by Harvard Graduate School of Education Professors Thomas J. Kane and Heather Hill, Dartmouth College Professor Douglas O. Staiger, and Project Director Corinne Herlihy.

Bacher-Hicks, A., Chin, M., Hill, H., & Staiger, D. (Working Paper, April 3, 2014). Explaining Teacher Effects on Achievement Using Measures from Multiple Research Traditions.

Researchers have identified many characteristics of teachers and teaching that contribute to student outcomes. However, most studies investigate only a small number of these characteristics, likely underestimating the overall contribution. In this paper, we use a set of 28 teacher-level predictors drawn from multiple research traditions to explain teacher-level variation in student outcomes. These predictors collectively explain 28% of teacher-level variability in state standardized math test scores and 40% in a predictor-aligned math test. In addition, each individual predictor explains only a small, relatively unique portion of the total teacher-level variability. The first finding highlights the importance of choosing predictors and outcomes that are well aligned, and the second suggests that the phenomena underlying teacher effects are multidimensional.

Kelcey, B., McGinn, D., Hill, H. C., & Charalambous, C. (Working Paper, April 2, 2014). The Generalizability of Item Parameters Across Lessons.

The purpose of this study is to investigate three aspects of construct validity for the Mathematical Quality of Instruction classroom observation instrument: (1) the dimensionality of scores, (2) the generalizability of these scores across districts, and (3) the predictive validity of these scores in terms of student achievement.

Blazar, D., Litke, E., Barmore, J., & Gogolen, C. (Working Paper). What Does It Mean to be Ranked a "High" or "Low" Value-Added Teacher? Observing Differences in Instructional Quality Across Districts.

Education agencies are evaluating teachers using student achievement data. However, very little is known about the comparability of test-based or "value-added" metrics across districts and the extent to which they capture variability in classroom practices. Drawing on data from four urban districts, we find that teachers are categorized differently when compared within versus across districts. In addition, analyses of scores from two observation instruments, as well as qualitative viewing of lesson videos, identify stark differences in instructional practices across districts among teachers who receive similar within-district value-added rankings. Exploratory analyses suggest that these patterns are not explained by observable background characteristics of teachers and that factors beyond labor market sorting likely play a key role.

McGinn, D., Kelcey, B., Hill, H., & Chin, M. (Working Paper). Using Item Response Theory to Learn about Observational Instruments.

As many states are slated to soon use scores derived from classroom observation instruments in high-stakes decisions, developers must cultivate methods for improving the functioning of these instruments. We show how multidimensional, multilevel item response theory models can yield information critical for improving the performance of observational instruments.

Blazar, D., Braslow, D., Charalambous, C., & Hill, H. C. (2015). Attending to General and Content-Specific Dimensions of Teaching: Exploring Factors Across Two Observation Instruments.

New observation instruments used in research and evaluation settings assess teachers along multiple domains of teaching practice, both general and content-specific. However, this work infrequently explores the relationship between these domains. In this study, we use exploratory and confirmatory factor analyses of two observation instruments - the Classroom Assessment Scoring System (CLASS) and the Mathematical Quality of Instruction (MQI) - to explore the extent to which we might integrate both general and content-specific views of teaching. Importantly, bi-factor analyses that account for instrument-specific variation enable more robust conclusions than in existing literature. Findings indicate that there is some overlap between instruments, but that the best factor structures include both general and content-specific practices. This suggests new approaches to measuring mathematics instruction for the purposes of evaluation and professional development.

Blazar, D. (2015). Effective teaching in elementary mathematics: Identifying classroom practices that support student achievement. Economics of Education Review, 48, 16-29.

Recent investigations into the education production function have moved beyond traditional teacher inputs, such as education, certification, and salary, focusing instead on observational measures of teaching practice. However, challenges to identification mean that this work has yet to coalesce around specific instructional dimensions that increase student achievement. I build on this discussion by exploiting within-school, between-grade, and cross-cohort variation in scores from two observation instruments; further, I condition on a uniquely rich set of teacher characteristics, practices, and skills. Findings indicate that inquiry-oriented instruction positively predicts student achievement. Content errors and imprecisions are negatively related, though these estimates are sensitive to the set of covariates included in the model. Two other dimensions of instruction, classroom emotional support and classroom organization, are not related to this outcome. Findings can inform recruitment and development efforts aimed at improving the quality of the teacher workforce. 

Education Agencies

With the goal of positioning ourselves as a national resource on teacher effectiveness research, we have partnered with districts in Massachusetts, Georgia and Washington, D.C. to conduct rigorous research, develop tools, and share best practices and lessons learned in teacher evaluation and professional development.


The National Center for Teacher Effectiveness is supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305C090023 to the Center for Education Policy Research at Harvard University.