# Research Report

Chin, M., Kane, T., Kozakowski, W., Schueler, B., & Staiger, D. (Working Paper). School District Reform in Newark: Within- and Between-School Changes in Achievement Growth. NBER Working Paper 23922.

In 2011–12, Newark launched a set of educational reforms supported by a $200 million gift. Using data from 2009 through 2016, we evaluate the change in Newark students’ achievement growth relative to similar students and schools elsewhere in New Jersey. We measure achievement growth using a “value-added” model, controlling for prior achievement, demographics, and peer characteristics. By the fifth year of reform, Newark saw statistically significant gains in English and no significant change in math achievement growth. Perhaps due to the disruptive nature of the reforms, growth declined initially before rebounding in recent years. Aided by the closure of low value-added schools, much of the improvement was due to shifting enrollment from lower- to higher-growth district and charter schools. Shifting enrollment accounted for 62 percent of the improvement in English. In math, such shifts offset what would have been a decline in achievement growth.

Chin, M., Kane, T., Kozakowski, W., Schueler, B., & Staiger, D. (2017). Assessing the Impact of the Newark Education Reforms. Center for Education Policy Research at Harvard University.

Aided by $200 million in private philanthropy, city and state leaders launched a major school reform effort in Newark, New Jersey, starting in the 2011–2012 school year. In a coinciding National Bureau of Economic Research (NBER) working paper, we assessed the impact of those reforms on student achievement growth, comparing students in Newark Public Schools (NPS) district and charter schools to students with similar prior achievement, similar demographics, and similar peers elsewhere in New Jersey. This report includes key findings.
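The “value-added” growth model described in the abstract above can be sketched in generic form; the notation here is our own illustration, and the paper’s exact specification may differ:

```latex
% A_{ist}: test score of student i in school s in year t
% X_i: student demographics; \bar{X}_{st}: peer (school-average) characteristics
A_{ist} = \beta\, A_{i,t-1} + \gamma' X_i + \delta' \bar{X}_{st} + \mu_{st} + \varepsilon_{ist}
% \mu_{st}: the school-by-year "value added" to achievement growth
% \varepsilon_{ist}: idiosyncratic student-level error
```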
Hill, H. C., Kraft, M. A., & Herlihy, C. (2016). Developing Common Core Classrooms Through Rubric-Based Coaching. Center for Education Policy Research at Harvard University.

The project team is still awaiting student test data to complete the evaluation, but this brief provides a short update on survey results. Students of MQI-coached teachers report that their teachers ask more substantive questions and require more use of mathematical vocabulary than students of control teachers report. Students in MQI-coached classrooms also report more student talk in class. Teachers who received MQI coaching tended to find their professional development significantly more useful than control teachers did, and were also more likely to report that their mathematics instruction improved over the course of the year.

Kane, T. J. (2016). Let the Numbers Have Their Say: Evidence on Massachusetts' Charter Schools. Center for Education Policy Research at Harvard University.

In Massachusetts, the charter school debate has centered on four concerns:

• that the achievement of the high-scoring charter schools is due to selective admission and retention policies and not the education that the charter schools provide,
• that charter schools are underserving English language learners and special education students,
• that charter schools are disciplining students at higher rates in order to drive troublesome students back to traditional schools, and
• that charter schools are undermining traditional public schools financially.

This report summarizes the evidence pertaining to these four concerns.

West, M. R., Morton, B. A., & Herlihy, C. M. (2016). Achievement Network’s Investing in Innovation Expansion: Impacts on Educator Practice and Student Achievement.

Achievement Network (ANet) was founded in 2005 as a school-level intervention to support the use of academic content standards and assessments to improve teaching and learning. Initially developed within the Boston charter school sector, it has expanded to serve over 500 schools in nine geographic networks across the United States. The program is based on the belief that if teachers are provided with timely data on student performance from interim assessments tied to state standards, if school leaders provide support and create structures that help them use that data to identify student weaknesses, and if teachers have knowledge of how to improve the performance of students who are falling behind, then they will become more effective at identifying and addressing gaps in student learning. This will, in turn, improve student performance, particularly for high-need students.

In 2010, ANet received a development grant from the U.S. Department of Education’s Investing in Innovation (i3) Program. The grant funded both the expansion of the program to serve up to 60 additional schools in five school districts, as well as an external evaluation of the expansion. The Center for Education Policy Research (CEPR) at Harvard University partnered with ANet to design a matched-pair, school-randomized evaluation of their program’s impact on educator practice and student achievement in schools participating in its i3-funded expansion.
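A matched-pair, school-randomized design of the kind described above can be sketched in a few lines: schools are ordered by a matching covariate, paired with their nearest neighbor, and one school per pair is randomly assigned to treatment. The function and field names here are hypothetical, not CEPR’s actual procedure.

```python
import random

def matched_pair_randomize(schools, pair_key, seed=0):
    """Pair schools that are adjacent on a matching covariate, then
    randomly assign one school in each pair to treatment.
    Illustrative sketch only, not the evaluation's actual code."""
    rng = random.Random(seed)
    # Sort by the matching covariate so adjacent schools are most similar
    ordered = sorted(schools, key=pair_key)
    assignment = {}
    for i in range(0, len(ordered) - 1, 2):
        a, b = ordered[i], ordered[i + 1]
        # Coin flip decides which member of the pair is treated
        treated, control = (a, b) if rng.random() < 0.5 else (b, a)
        assignment[treated["name"]] = "treatment"
        assignment[control["name"]] = "control"
    return assignment
```

Because assignment is randomized within pairs of similar schools, treatment and control groups are balanced on the matching covariate by construction, which tightens the impact estimates relative to simple randomization.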

Hurwitz, M., Mbekeani, P. P., Nipson, M., & Page, L. C. (2016). Surprising Ripple Effects: How Changing the SAT Score-Sending Policy for Low-Income Students Impacts College Access and Success. Educational Evaluation and Policy Analysis.

Subtle policy adjustments can induce relatively large “ripple effects.” We evaluate a College Board initiative that increased the number of free SAT score reports available to low-income students and changed the time horizon for using these score reports. Using a difference-in-differences analytic strategy, we estimate that targeted students were roughly 10 percentage points more likely to send eight or more reports. The policy improved on-time college attendance and 6-year bachelor’s completion by about 2 percentage points. Impacts were realized primarily by students who were competitive candidates for 4-year college admission. The bachelor’s completion impacts are larger than would be expected based on the number of students driven by the policy change to enroll in college and to shift into more selective colleges. The unexplained portion of the completion effects may result from improvements in nonacademic fit between students and the postsecondary institutions in which they enroll.
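The difference-in-differences strategy mentioned above can be written generically; this is a textbook sketch, not the authors’ exact model:

```latex
% Y_{it}: outcome (e.g., score reports sent, enrollment) for student i in cohort t
Y_{it} = \alpha + \gamma\,\mathrm{Treated}_i + \delta\,\mathrm{Post}_t
       + \beta\,(\mathrm{Treated}_i \times \mathrm{Post}_t) + \varepsilon_{it}
% Treated_i = 1 for the targeted low-income students; Post_t = 1 for cohorts
% after the policy change; \beta is the difference-in-differences estimate.
```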

Blazar, D. (2015). Effective teaching in elementary mathematics: Identifying classroom practices that support student achievement. Economics of Education Review, 48, 16–29.

Recent investigations into the education production function have moved beyond traditional teacher inputs, such as education, certification, and salary, focusing instead on observational measures of teaching practice. However, challenges to identification mean that this work has yet to coalesce around specific instructional dimensions that increase student achievement. I build on this discussion by exploiting within-school, between-grade, and cross-cohort variation in scores from two observation instruments; further, I condition on a uniquely rich set of teacher characteristics, practices, and skills. Findings indicate that inquiry-oriented instruction positively predicts student achievement. Content errors and imprecisions are negatively related, though these estimates are sensitive to the set of covariates included in the model. Two other dimensions of instruction, classroom emotional support and classroom organization, are not related to this outcome. Findings can inform recruitment and development efforts aimed at improving the quality of the teacher workforce.

Kelcey, B., Hill, H. C., & McGinn, D. (2014). Approximate measurement invariance in cross-classified rater-mediated assessments. Frontiers in Psychology, 5(1469).

An important assumption underlying meaningful comparisons of scores in rater-mediated assessments is that measurement is commensurate across raters. When raters differentially apply the standards established by an instrument, scores from different raters are on fundamentally different scales and no longer preserve a common meaning and basis for comparison. In this study, we developed a method to accommodate measurement noninvariance across raters when measurements are cross-classified within two distinct hierarchical units. We conceptualized random item effects cross-classified graded response models and used random discrimination and threshold effects to test, calibrate, and account for measurement noninvariance among raters. By leveraging empirical estimates of rater-specific deviations in the discrimination and threshold parameters, the proposed method allows us to identify noninvariant items and empirically estimate and directly adjust for this noninvariance within a cross-classified framework. Within the context of teaching evaluations, the results of a case study suggested substantial noninvariance across raters and that establishing an approximately invariant scale through random item effects improves model fit and predictive validity.
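A graded response model with rater-specific random item effects of the kind described can be sketched as follows; the symbols are our illustration, not the authors’ exact parameterization:

```latex
% Probability that rater r assigns score category k or higher on item j
% for a teacher with latent quality \theta_i:
\Pr(Y_{ijr} \ge k \mid \theta_i) = \operatorname{logit}^{-1}\!\big(a_{jr}(\theta_i - b_{jkr})\big),
\qquad a_{jr} = a_j + u_{jr}, \quad b_{jkr} = b_{jk} + v_{jr}
% u_{jr}, v_{jr}: rater-specific random deviations in item j's discrimination
% and thresholds; nonzero variance in these effects signals noninvariance,
% and the empirical estimates allow direct adjustment for it.
```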