Teacher Effectiveness

Working Paper
Chin, M., & Goldhaber, D. (Working Paper). Exploring Explanations for the "Weak" Relationship Between Value Added and Observation-Based Measures of Teacher Performance. Read the working paper
Bacher-Hicks, A., Chin, M., Hill, H., & Staiger, D. (Working Paper). Explaining Teacher Effects on Achievement Using Measures from Multiple Research Traditions. Abstract

Researchers have identified many characteristics of teachers and teaching that contribute to student outcomes. However, most studies investigate only a small number of these characteristics, likely underestimating the overall contribution. In this paper, we use a set of 28 teacher-level predictors drawn from multiple research traditions to explain teacher-level variation in student outcomes. These predictors collectively explain 28% of teacher-level variability in state standardized math test scores and 40% in a predictor-aligned math test. In addition, each individual predictor explains only a small, relatively unique portion of the total teacher-level variability. The first finding highlights the importance of choosing predictors and outcomes that are well aligned, and the second suggests that the phenomena underlying teacher effects are multidimensional.

Read the working paper
Kelcey, B., McGinn, D., Hill, H. C., & Charalambous, C. (Working Paper). The Generalizability of Item Parameters Across Lessons. Abstract

The purpose of this study is to investigate three aspects of construct validity for the Mathematical Quality of Instruction classroom observation instrument: (1) the dimensionality of scores, (2) the generalizability of these scores across districts, and (3) the predictive validity of these scores in terms of student achievement.

Read the paper summary
McGinn, D., Kelcey, B., Hill, H., & Chin, M. (Working Paper). Using Item Response Theory to Learn about Observational Instruments. Abstract

As many states will soon use scores derived from classroom observation instruments in high-stakes decisions, developers must cultivate methods for improving the functioning of these instruments. We show how multidimensional, multilevel item response theory models can yield information critical for improving the performance of observational instruments.

Read the paper summary
Blazar, D., Litke, E., Barmore, J., & Gogolen, C. (Working Paper). What Does It Mean to be Ranked a "High" or "Low" Value-Added Teacher? Observing Differences in Instructional Quality Across Districts. Abstract

Education agencies are evaluating teachers using student achievement data. However, very little is known about the comparability of test-based or "value-added" metrics across districts and the extent to which they capture variability in classroom practices. Drawing on data from four urban districts, we find that teachers are categorized differently when compared within versus across districts. In addition, analyses of scores from two observation instruments, as well as qualitative viewing of lesson videos, identify stark differences in instructional practices across districts among teachers who receive similar within-district value-added rankings. Exploratory analyses suggest that these patterns are not explained by observable background characteristics of teachers and that factors beyond labor market sorting likely play a key role.

Read the working paper
2016
Hill, H. C., Kraft, M. A., & Herlihy, C. (2016). Developing Common Core Classrooms Through Rubric-Based Coaching. Center for Education Policy Research at Harvard University. Abstract

The project team is still awaiting student test data to complete the evaluation, but this brief provides a short update on survey results. Students of MQI-coached teachers report that their teachers ask more substantive questions and require more use of mathematical vocabulary, as compared to students of control teachers. Students in MQI-coached classrooms also report more student talk in class. Teachers who received MQI coaching tended to find their professional development significantly more useful than did control teachers, and were also more likely to report that their mathematics instruction improved over the course of the year.

Read the early research findings report.
(2016). Findings from a National Study on Research Use Among School and District Leaders (Technical Report No. 1). National Center for Research in Policy and Practice. Read the report (NCRPP Website)
Kane, T. J., Owens, A. M., Marinell, W. H., Thal, D. R. C., & Staiger, D. O. (2016). Teaching Higher: Educators' Perspectives on Common Core Implementation. Read the report Read the report abstract
2015
Kane, T. J., Gehlbach, H., Greenberg, M., Quinn, D., & Thal, D. (2015). The Best Foot Forward Project: Substituting Teacher-Collected Video for In-Person Classroom Observations. Read the report
Quinn, D. M., Kane, T. J., Greenberg, M., & Thal, D. (2015). Effects of a Video-Based Teacher Observation Program on the De-privatization of Instruction: Evidence from a Randomized Experiment. l2a_de-privatization-of-instruction.pdf
Blazar, D. (2015). Effective teaching in elementary mathematics: Identifying classroom practices that support student achievement. Economics of Education Review, 48, 16-29. Publisher's Version Abstract

Recent investigations into the education production function have moved beyond traditional teacher inputs, such as education, certification, and salary, focusing instead on observational measures of teaching practice. However, challenges to identification mean that this work has yet to coalesce around specific instructional dimensions that increase student achievement. I build on this discussion by exploiting within-school, between-grade, and cross-cohort variation in scores from two observation instruments; further, I condition on a uniquely rich set of teacher characteristics, practices, and skills. Findings indicate that inquiry-oriented instruction positively predicts student achievement. Content errors and imprecisions are negatively related, though these estimates are sensitive to the set of covariates included in the model. Two other dimensions of instruction, classroom emotional support and classroom organization, are not related to this outcome. Findings can inform recruitment and development efforts aimed at improving the quality of the teacher workforce.

blazar_2015_effective_teaching_in_elementary_mathematics_eer.pdf
Greenberg, M. (2015). Best Foot Forward Project: Research Findings from Year 1. Read the report
Hill, H. C., Charalambous, C. Y., & Chin, M. (2015). Teacher Characteristics and Student Learning: Toward a More Comprehensive Examination of the Association. Read the working paper Teacher Characteristics Table
Hill, H. C., Chin, M., & Blazar, D. (2015). Teachers' Knowledge of Students: Defining a Domain. Read the working paper
Blazar, D., Braslow, D., Charalambous, C., & Hill, H. C. (2015). Attending to General and Content-Specific Dimensions of Teaching: Exploring Factors Across Two Observation Instruments. Abstract

New observation instruments used in research and evaluation settings assess teachers along multiple domains of teaching practice, both general and content-specific. However, this work infrequently explores the relationship between these domains. In this study, we use exploratory and confirmatory factor analyses of two observation instruments - the Classroom Assessment Scoring System (CLASS) and the Mathematical Quality of Instruction (MQI) - to explore the extent to which we might integrate both general and content-specific views of teaching. Importantly, bi-factor analyses that account for instrument-specific variation enable more robust conclusions than those in the existing literature. Findings indicate that there is some overlap between instruments, but that the best factor structures include both general and content-specific practices. This suggests new approaches to measuring mathematics instruction for the purposes of evaluation and professional development.

Read the working paper
(2015). SDP Educator Diagnostic for Delaware Department of Education. Strategic Data Project. Abstract

The Strategic Data Project (SDP) collaborated with the state of Delaware to illuminate patterns related to three critical areas of policy focus for the state: the recruitment, placement, and success of new and early career teachers; teacher impact on student learning; and teacher retention and the stability of the state’s teacher workforce.

Download full report
(2015). SDP Human Capital Diagnostic for Colorado. Strategic Data Project. Abstract

The Strategic Data Project (SDP) collaborated with the Colorado Department of Education (CDE) and the Colorado Education Initiative (CEI) to conduct SDP’s Human Capital Diagnostic—a series of high leverage, policy-relevant analyses related to the state’s educator workforce. SDP’s Human Capital Diagnostic investigates questions on five critical topics related to teachers and teacher effectiveness: recruitment, placement, development, evaluation, and retention.

Download full report
(2015). SDP Key Findings Report for Colorado: Mathematics Teacher Placement Patterns. Strategic Data Project. Abstract

The Strategic Data Project (SDP) partnered with the Colorado Department of Education (CDE) and the Colorado Education Initiative (CEI) to investigate whether Colorado public school students who are academically behind their peers are disproportionately placed with novice teachers.

Download full report
2014
Kelcey, B., Hill, H. C., & McGinn, D. (2014). Approximate measurement invariance in cross-classified rater-mediated assessments. Frontiers in Psychology, 5(1469). Publisher's Version Abstract

An important assumption underlying meaningful comparisons of scores in rater-mediated assessments is that measurement is commensurate across raters. When raters differentially apply the standards established by an instrument, scores from different raters are on fundamentally different scales and no longer preserve a common meaning and basis for comparison. In this study, we developed a method to accommodate measurement noninvariance across raters when measurements are cross-classified within two distinct hierarchical units. We conceptualized cross-classified graded response models with random item effects and used random discrimination and threshold effects to test, calibrate, and account for measurement noninvariance among raters. By leveraging empirical estimates of rater-specific deviations in the discrimination and threshold parameters, the proposed method allows us to identify noninvariant items and empirically estimate and directly adjust for this noninvariance within a cross-classified framework. Within the context of teaching evaluations, the results of a case study suggested substantial noninvariance across raters and that establishing an approximately invariant scale through random item effects improves model fit and predictive validity.

Download the full article
Hill, H. C., & Chin, M. (2014). Year-to-Year Stability in Measures of Teachers and Teaching. Read the working paper