Resources

Resources by Type

Research Report

Blazar, D., Braslow, D., Charalambous, C., & Hill, H. C. (2015). Attending to General and Content-Specific Dimensions of Teaching: Exploring Factors Across Two Observation Instruments.

New observation instruments used in research and evaluation settings assess teachers along multiple domains of teaching practice, both general and content-specific. However, this work infrequently explores the relationship between these domains. In this study, we use exploratory and confirmatory factor analyses of two observation instruments, the Classroom Assessment Scoring System (CLASS) and the Mathematical Quality of Instruction (MQI), to explore the extent to which we might integrate both general and content-specific views of teaching. Importantly, bi-factor analyses that account for instrument-specific variation enable more robust conclusions than those in the existing literature. Findings indicate that there is some overlap between instruments, but that the best factor structures include both general and content-specific practices. This suggests new approaches to measuring mathematics instruction for the purposes of evaluation and professional development.
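
As an illustrative sketch (the notation here is ours, not necessarily the authors' exact specification), a bi-factor model decomposes the score y_{ij} of teacher i on item j into a general teaching factor plus an orthogonal specific factor for the item's cluster:

    y_{ij} = \lambda_j^{G} \theta_i^{G} + \lambda_j^{S} \theta_i^{S(j)} + \varepsilon_{ij}, \qquad \mathrm{Cov}(\theta_i^{G}, \theta_i^{S(j)}) = 0

Because every item loads on the general factor while only its own cluster's specific factor absorbs instrument-specific variation, genuine overlap between CLASS and MQI shows up in the general loadings rather than inflating the specific factors.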

Kelcey, B., Hill, H. C., & McGinn, D. (2014). Approximate measurement invariance in cross-classified rater-mediated assessments. Frontiers in Psychology, 5(1469).

An important assumption underlying meaningful comparisons of scores in rater-mediated assessments is that measurement is commensurate across raters. When raters differentially apply the standards established by an instrument, scores from different raters are on fundamentally different scales and no longer preserve a common meaning and basis for comparison. In this study, we developed a method to accommodate measurement noninvariance across raters when measurements are cross-classified within two distinct hierarchical units. We conceptualized random item effects within cross-classified graded response models and used random discrimination and threshold effects to test, calibrate, and account for measurement noninvariance among raters. By leveraging empirical estimates of rater-specific deviations in the discrimination and threshold parameters, the proposed method allows us to identify noninvariant items and empirically estimate and directly adjust for this noninvariance within a cross-classified framework. Within the context of teaching evaluations, the results of a case study suggested substantial noninvariance across raters, and that establishing an approximately invariant scale through random item effects improves model fit and predictive validity.
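
A rough formalization, in our own notation rather than the paper's: in a graded response model with random item-by-rater effects, the probability that rater r assigns teacher i a score of at least k on item j is

    P(Y_{ijr} \ge k \mid \theta_i) = \mathrm{logit}^{-1}[ a_{jr} (\theta_i - b_{jrk}) ], \qquad a_{jr} = a_j + u_{jr}^{(a)}, \quad b_{jrk} = b_{jk} + u_{jr}^{(b)}

Measurement invariance across raters corresponds to the rater-specific deviations u being approximately zero; estimating them directly is what allows the model to flag noninvariant items and adjust scores accordingly.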

West, M. R., Kraft, M. A., Finn, A. S., Duckworth, A. L., Gabrieli, C. F. O., & Gabrieli, J. D. E. (2014). Promise and Paradox: Measuring Students' Non-cognitive Skills and the Impact of Schooling.

The authors used self-report surveys to gather information on a broad set of non-cognitive skills from 1,368 eighth-grade students attending Boston Public Schools and linked this information to administrative data on their demographics and test scores. At the student level, scales measuring conscientiousness, self-control, grit, and growth mindset are positively correlated with attendance, behavior, and test-score gains between fourth and eighth grade. Conscientiousness, self-control, and grit are unrelated to test-score gains at the school level, however, and students attending over-subscribed charter schools with higher average test-score gains score lower on these scales than do students attending district schools. Exploiting charter school admissions lotteries, the authors replicate previous findings indicating positive impacts of charter school attendance on math achievement, but find negative impacts on these non-cognitive skills. The authors provide suggestive evidence that these paradoxical results are driven by reference bias, or the tendency for survey responses to be influenced by social context. The results therefore highlight the importance of improved measurement of non-cognitive skills in order to capitalize on their promise as a tool to inform education practice and policy.
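
One way to formalize the reference-bias account (our sketch, not the authors' model): if a student's self-report is anchored to a school-specific standard, then

    \tilde{S}_{is} = S_{is} - \kappa \bar{S}_{s} + e_{is}, \qquad \kappa > 0

where S_{is} is student i's true skill and \bar{S}_{s} is the prevailing skill level among schoolmates. Attending a school with stronger norms (higher \bar{S}_{s}) then depresses reported scores even when true skills rise, which can reproduce the paradoxical negative charter-school effects on self-reported skills.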

Kraft, M. A., & Papay, J. P. (2014). Can Professional Environments in Schools Promote Teacher Development? Explaining Heterogeneity in Returns to Teaching Experience. Educational Evaluation and Policy Analysis, 36(4), 476-500.

Although wide variation in teacher effectiveness is well established, much less is known about differences in teacher improvement over time. We document that average returns to teaching experience mask large variation across individual teachers, and across groups of teachers working in different schools. We examine the role of school context in explaining these differences using a measure of the professional environment constructed from teachers' responses to statewide surveys. Our analyses show that teachers working in more supportive professional environments improve their effectiveness more over time than teachers working in less supportive contexts. On average, teachers working in schools at the 75th percentile of professional environment ratings improved 38% more over ten years than teachers in schools at the 25th percentile.
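
A stylized version of the estimating idea (illustrative notation; the paper's specification may differ): let returns to experience vary with the school's professional environment,

    A_{ijst} = f(\mathrm{Exp}_{jt}) + \delta (\mathrm{Exp}_{jt} \times \mathrm{PE}_{s}) + X_{it} \beta + \mu_j + \varepsilon_{ijst}

where A is student achievement, \mathrm{Exp}_{jt} is teacher j's experience in year t, \mathrm{PE}_{s} is the school's professional environment rating, and \mu_j is a teacher fixed effect. A positive \delta implies teachers improve faster in more supportive environments.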

Blazar, D., Gogolen, C., Hill, H. C., Humez, A., & Lynch, K. (2014). Predictors of Teachers' Instructional Practices.

We extend prior research by investigating teacher career and background characteristics, personal resources, and school and district resources that predict an array of instructional practices identified on a mathematics-specific observational instrument, the MQI, and a general instrument, the CLASS. To understand these relationships, we use correlation and regression analyses. For a subset of teachers for whom we have data from multiple school years, we exploit within-teacher, cross-year variation to examine the relationship between class composition and instructional quality that is not confounded with the sorting of "better" students to "better" teachers. We conclude that multiple teacher- and school-level characteristics, rather than a single factor, are related to teachers' classroom practices.
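
The within-teacher design can be sketched as follows (our notation, for illustration only):

    Q_{jt} = \mathrm{Comp}_{jt} \gamma + \tau_j + \pi_t + \varepsilon_{jt}

where Q_{jt} is teacher j's observed instructional quality in year t and \mathrm{Comp}_{jt} measures class composition. The teacher fixed effect \tau_j absorbs any time-invariant sorting of stronger students to stronger teachers, so \gamma is identified from year-to-year changes within the same teacher.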

Hill, H. C., & Grossman, P. (2013). Learning from Teacher Observations: Challenges and Opportunities Posed by New Teacher Evaluation Systems. Harvard Educational Review.

In this article, Heather Hill and Pam Grossman discuss the current focus on using teacher observation instruments as part of new teacher evaluation systems being considered and implemented by states and districts. They argue that if these teacher observation instruments are to achieve the goal of supporting teachers in improving instructional practice, they must be subject-specific, involve content experts in the process of observation, and provide information that is both accurate and useful for teachers. They discuss the instruments themselves, raters and system design, and timing of and feedback from the observations. They conclude by outlining the challenges that policy makers face in designing observation systems that will work to improve instructional practice at scale.

Chin, M., Hill, H., McGinn, D., Staiger, D., & Buckley, K. (2013). Using Validity Criteria to Enable Model Selection: An Exploratory Analysis. Association for Public Policy Analysis and Management Fall Research Conference.

In this paper, the authors propose that an important determinant of value-added model choice should be alignment with alternative indicators of teacher and teaching quality. Such alignment makes sense from a theoretical perspective because better alignment is thought to indicate more valid systems. To provide initial evidence on this issue, they first calculated value-added scores for all fourth- and fifth-grade teachers within four districts, then extracted scores for 160 intensively studied teachers. Initial analyses using a subset of alternative indicators suggest that alignment between value-added scores and alternative indicators differs by model, though not significantly.
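
For orientation, a generic value-added specification of the kind being compared (a sketch under common conventions, not the exact models in the paper) is

    A_{it} = \lambda A_{i,t-1} + X_{it} \beta + \mu_{j(i,t)} + \varepsilon_{it}

where A_{it} is student i's test score in year t and \mu_{j(i,t)} is the value-added of the student's teacher that year. Candidate models differ in which controls enter X, whether school or peer effects are included, and how the \mu estimates are shrunk, which is precisely what alignment with alternative indicators could help adjudicate.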

Hill, H. C., Gogolen, C., Litke, E., Humez, A., Blazar, D., Corey, D., Barmore, J., et al. (2013). Examining High and Low Value-Added Mathematics: Can Expert Observers Tell the Difference? In Association for Public Policy Analysis & Management Fall Research Conference. Washington, DC.

In this study, we use value-added scores and video data to mount an exploratory study of high- and low-VAM teachers' instruction. Specifically, we seek to answer two research questions: First, can expert observers of mathematics instruction distinguish between high- and low-VAM teachers solely by observing their instruction? Second, what instructional practices, if any, consistently characterize high- but not low-VAM teachers' classrooms? To answer these questions, we use data generated by 250 fourth- and fifth-grade math teachers and their students in four large public school districts. Preliminary analyses indicate that a teacher's value-added rank was often not obvious to this team of expert observers.

SDP Partner Diagnostic