National Center for Teacher Effectiveness

Blazar, D., Braslow, D., Charalambous, C., & Hill, H. C. (2015). Attending to General and Content-Specific Dimensions of Teaching: Exploring Factors Across Two Observation Instruments.Abstract

New observation instruments used in research and evaluation settings assess teachers along multiple domains of teaching practice, both general and content-specific. However, this work infrequently explores the relationship between these domains. In this study, we use exploratory and confirmatory factor analyses of two observation instruments - the Classroom Assessment Scoring System (CLASS) and the Mathematical Quality of Instruction (MQI) - to explore the extent to which we might integrate both general and content-specific view of teaching. Importantly, bi-factor analyses that account for instrument-specific variation enable more robust conclusions than in existing literature. Findings indicate that there is some overlap between instruments, but that the best factor structures include both general and content-specific practices. This suggests new approaches to measuring mathematics instruction for the purposes of evaluation and professional development. 

Research Overview

In the fall of 2010, NCTE commenced three years of data collection and observations of math instruction across approximately 50 schools and 300 classrooms, respectively. Data was collected from several sources, including multiple classroom observations, student assessment data, and teacher surveys to accurately capture how teachers impact student achievement.

Learn more about the findings in the teacher...

Read more about Research Overview
Hill, H. C., Gogolen, C., Litke, E., Humez, A., Blazar, D., Corey, D., Barmore, J., et al. (2013). Examining High and Low Value-Added Mathematics: Can Expert Observers Tell the Difference? In Association for Public Policy Analysis & Management Fall Research Conference . Washington, DC.Abstract

In this study, we use value-added scores and video data in order to mount an exploratory study of high- and low-VAM teachers' instruction. Specifically, we seek to answer two research questions: First, can expert observers of mathematics instruction distinguish between high- and low-VAM teachers solely by observing their instruction? Second, what instructional practices, if any, consistently characterize high but not low-VAM teacher classrooms? To answer these questions, we use data generated by 250 fourth- and fifth-grade math teachers and their students in four large public school districts.Preliminary analyses indicate that a teacher's value-added rank was often not obvious to this team of expert observers.

Chin, M., Hill, H., McGinn, D., Staiger, D., & Buckley, K. (2013). Using Validity Criteria to Enable Model Selection: An Exploratory Analysis. Association for Public Policy Analysis and Management Fall Research Conference.Abstract

In this paper, the authors propose that an important determinant of value-added model choice should be alignment with alternative indicators of teacher and teaching quality. Such alignment makes sense from a theoretical perspective because better alignment is thought to indicate more valid systems. To provide initial evidence on this issue, they first calculated value-added scores for all fourth and fifth grade teachers within four districts, then extracted scores for 160 intensively studied teachers.Initial analyses using a subset of alternative indicators suggest that alignment between value-added scores and alternative indicators differ by model, though not significantly.

Kraft, M. A., & Papay, J. P. (2014). Can Professional Environments in Schools Promote Teacher Development? Explaining Heterogeneity in Returns to Teaching Experience. Educational Evaluation and Policy Analysis , 36 (4), 476-500. Publisher's VersionAbstract

Although wide variation in teacher effectiveness is well established, much less is known about differences in teacher improvement over time. We document that average returns to teaching experience mask large variation across individual teachers, and across groups of teachers working in different schools. We examine the role of school context in explaining these differences using a measure of the professional environment constructed from teachers’ responses to state-wide surveys. Our analyses show that teachers working in more supportive professional environments improve their effectiveness more over time than teachers working in less supportive contexts. On average, teachers working in schools at the 75th percentile of professional environment ratings improved 38% more than teachers in schools at the 25th percentile after ten years.

Bacher-Hicks, A., Chin, M., Hill, H., & Staiger, D. (Working Paper). Explaining Teacher Effects on Achievement Using Measures from Multiple Research Traditions.Abstract

Researchers have identified many characteristics of teachers and teaching that contribute to student outcomes. However, most studies investigate only a small number of these characteristics, likely underestimating the overall contribution. In this paper, we use a set of 28 teacher-level predictors drawn from multiple research traditions to explain teacher-level variation in student outcomes. These predictors collectively explain 28% of teacher-level variability in state standardized math test scores and 40% in a predictor-aligned math test. In addition, each individual predictor explains only a small, relatively unique portion of the total teacher-level variability. This first finding highlights the importance of choosing predictors and outcomes that are well aligned, and the second suggests that the phenomena underlying teacher effects is multidimensional. 

Hill, H. C., & Grossman, P. (2013). Learning from Teacher Observations: Challenges and Opportunities Posed by New Teacher Evaluation Systems. Harvard Educational Review.Abstract

In this article, Heather Hill and Pam Grossman discuss the current focus on using teacher observation instruments as part of new teacher evaluation systems being considered and implemented by states and districts. They argue that if these teacher observation instruments are to achieve the goal of supporting teachers in improving instructional practice, they must be subject-specific, involve content experts in the process of observation, and provide information that is both accurate and useful for teachers. They discuss the instruments themselves, raters and system design, and timing of and feedback from the observations. They conclude by outlining the challenges that policy makers face in designing observation systems that will work to improve instructional practice at scale.

Lynch, K., Chin, M., & Blazar, D. (2013). How Well Do Teacher Observations Predict Value-Added? Exploring Variability Across Districts. In Association for Public Policy Analysis & Management Fall Research Conference . Washington, DC.Abstract

In this study we ask: Do observational instruments predict teachers' value-added equally well across different state tests and district/state contexts? And, to what extent are differences in these correlations a function of the match between the observation instrument and tested content? We use data from the Gates Foundation-funded Measures of Effective Teaching (MET) Project(N=1,333) study of elementary and middle school teachers from six large public school districts,and from a smaller (N=250) study of fourth- and fifth-grade math teachers from four large public school districts. Early results indicate that estimates of the relationship between teachers' value-added scores and their observed classroom instructional quality differ considerably by district.

Kelcey, B., McGinn, D., Hill, H. C., & Charalambous, C. (Working Paper). The Generalizability of Item Parameters Across Lessons.Abstract

The purpose of this study is to investigate three aspects of construct validity for the Mathematical Quality of Instruction classroom observation instrument: (1) the dimensionality of scores, (2) the generalizability of these scores across districts, and (3) the predictive validity of these scores in terms of student achievement.

Hill, H. C., Charalambous, C. Y., Blazar, D., McGinn, D., Kraft, M. A., Beisiegel, M., Humez, A., et al. (2012). Validating Arguments for Observational Instruments: Attending to Multiple Sources of Variation. Educational Assessment , 17, 1-19.Abstract

Measurement scholars have recently constructed validity arguments in support of a variety of educational assessments, including classroom observation instruments. In this article, we note that users must examine the robustness of validity arguments to variation in the implementation of these instruments. We illustrate how such an analysis might be used to assess a validity argument constructed for the Mathematical Quality of Instruction instrument, focusing in particular on the effects of varying the rater pool, subject matter content, observation procedure, and district context. Variation in the subject matter content of lessons did not affect rater agreement with master scores, but the evaluation of other portions of the validity argument varied according to the composition of the rater pool, observation procedure, and district context. These results demonstrate the need for conducting such analyses, especially for classroom observation instruments that are subject to multiple sources of variation

Taylor, E. S., & Tyler, J. H. (2011). The Effect of Evaluation on Performance: Evidence from Longitudinal Student Achievement Data of Mid-career Teachers. Publisher's VersionAbstract

The effect of evaluation on employee performance is traditionally studied in the context of the principal-agent problem. Evaluation can, however, also be characterized as an investment in the evaluated employee’s human capital. We study a sample of mid-career public school teachers where we can consider these two types of evaluation effect separately. Employee evaluation is a particularly salient topic in public schools where teacher effectiveness varies substantially and where teacher evaluation itself is increasingly a focus of public policy proposals. We find evidence that a quality classroom-observation-based evaluation and performance measures can improve mid-career teacher performance both during the period of evaluation, consistent with the traditional predictions; and in subsequent years, consistent with human capital investment. However the estimated improvements during evaluation are less precise. Additionally, the effects sizes represent a substantial gain in welfare given the program’s costs.

Blazar, D., Litke, E., Barmore, J., & Gogolen, C. (Working Paper). What Does It Mean to be Ranked a "High" or "Low" Value-Added Teacher? Observing Differences in Instructional Quality Across Districts.Abstract

Education agencies are evaluating teachers using student achievement data. However, very little is known about the comparability of test-based or "value-added" metrics across districts and the extent to which they capture variability in classroom practices. Drawing on data from four urban districts, we find that teachers are categorized differently when compared within versus across districts. In addition, analyses of scores from two observation instruments, as well qualitative viewing of lesson videos identify stark differences in instructional practices across districts among teachers who receive similar within-district value-added rankings. Exploratory analyses suggest that these patterns are not explained by observable background characteristics of teachers and that factors beyond labor market sorting likely play a key role.