At a glance
- Program: HMH Into Reading®
- Subject: Literacy Curriculum
- Report Type: Efficacy Study, Study Conducted by Third Party
- Grade Level: Elementary
- Region: Southwest
- District Urbanicity: Suburban
- District Size: Large, Medium
- Implementation Model: Core Instruction
To determine the impact of HMH Into Reading® on K–3 students’ reading outcomes, Cobblestone Applied Research & Evaluation, Inc. (Cobblestone) conducted a retrospective quasi-experimental design (QED) study comparing K–3 Acadience Reading® achievement data from a carefully matched sample of students from two Arizona school districts: one district that used HMH Into Reading and one district that did not use the program during the study year (2021–2022). The analytic sample included 1,350 (treatment) students who used HMH Into Reading and 1,350 well-matched (control) students. Propensity score matching was used to match students based on similar characteristics: grade level, Acadience beginning of year score, gender, free/reduced-price lunch (FRPL) status, English learner (EL) status, and disability status.
Two research questions addressed the reading outcomes of K–3 students using HMH Into Reading during the 2021–2022 school year: performance over time (Research Question 1) and a comparison between students who used the program and those who did not (Research Question 2). To address each research question, Acadience Reading assessment scores were analyzed separately for each grade level. The first analysis compared students’ performance using HMH Into Reading at three time points (beginning, middle, and end of year). Results indicate that students in all grade levels (K, 1, 2, and 3) showed significant improvement in their reading performance. The second analysis compared students using HMH Into Reading with carefully matched control students. Results indicate that HMH Into Reading students in kindergarten and Grade 3 significantly outperformed control students at the end of the year, and students in Grades 1 and 2 performed similarly in both study groups. These findings suggest that HMH Into Reading significantly improved students’ reading skills in kindergarten and Grade 3, with differences not likely attributable to other factors. Based on the study design and study results, this study meets the criteria for Tier 2 ESSA Moderate Evidence.
According to the National Assessment of Educational Progress (NAEP), also known as the Nation's Report Card, only one-third of students in the United States demonstrate reading proficiency (National Assessment of Educational Progress, 2022). This trend has remained consistent across different grade levels (i.e., Grades 4, 8, and 12) and over the past three decades. From 2011 to 2019, there was a positive trend indicating a gradual increase in the proportion of students achieving reading proficiency, reaching a record high of 37% in 2017. However, a recent study analyzing data from over five million public school students in the United States found that the COVID-19 pandemic has caused an unprecedented decline in academic development, particularly in reading achievement for elementary school students, with the largest impact observed in Grades 3–5 (Kuhfeld et al., 2023).
Considering these findings—and given the foundational nature of reading to general educational outcomes—there is a need for high-quality educational programs to support the literacy development of young students. To help provide quality education for students in the United States, the federal Every Student Succeeds Act (ESSA) identifies and promotes educational programs with proven success.
The four ESSA tiers of evidence (Institute of Education Sciences, n.d.) are:
- Tier 1 Strong Evidence
- Tier 2 Moderate Evidence
- Tier 3 Promising Evidence
- Tier 4 Demonstrates a Rationale
A previous ESSA Tier 3 (Promising Evidence) study examining HMH Into Reading in Texas elementary schools found reading achievement for Grade 4 students using the program significantly improved compared to those using other programs (Eddy et al., 2023). The current study aims to expand on previous evidence of the program’s effectiveness in developing elementary students’ literacy skills by examining its impact on students’ literacy skills in Arizona and earlier grade levels (K–3).
HMH Into Reading
HMH Into Reading is a research-based, evidence-informed, comprehensive English language arts program built to support teachers in delivering explicit and systematic instruction across all literacy strands—phonological awareness, phonics, fluency, vocabulary, comprehension, and writing—using a structured literacy approach. The program engages students in building knowledge and skills through culturally relevant texts organized around science, social studies, and arts topics. HMH Into Reading provides standards-aligned K–5 content and assessments, data insights, and differentiated resources to meet diverse classroom needs during whole- and small-group instruction.
In addition, to support the delivery of effective instruction, HMH Into Reading features research-based approaches to professional learning that support teachers in becoming developers of high-impact learning experiences for their students. Comprehensive professional learning solutions are data and evidence driven, mapped to instructional goals, and centered on students, and they build educators’ collective capacity. HMH allows teachers to achieve agency in their professional growth through effective instructional strategies, embedded teacher support, and ongoing professional learning relevant to everyday teaching.
Analytic Sample
Cobblestone requested reading achievement data, along with demographic information, for the (HMH Into Reading) treatment group and the control group directly from the participating district research offices. Cobblestone received data for a total of 1,860 K–3 students from the treatment group district. Of those, 1,350 students (72.6%) had all necessary data for statistical analysis (i.e., all demographic information and three Acadience Reading scores at the beginning of year, middle of year, and end of year). In addition, data was received for 5,421 K–3 students from the control group district, and 1,350 control students were successfully matched to treatment students using a propensity score match procedure, forming the analytical sample (N = 2,700). (See Table 1 for demographic characteristics of the analytical sample. See Appendix, Table 2 in the Full Report for the demographic characteristics by grade level.)
Propensity Score Matching
A propensity score match procedure was conducted to ensure that any differences in student reading outcomes between the treatment and the control group could be more confidently attributed to the program itself, rather than to pre-existing differences between the groups. Students from the treatment district (n = 1,350) were matched with students from the control district (n = 1,350) based on similar characteristics: grade level, Acadience beginning of year score, gender, socioeconomic status (i.e., eligibility for free/reduced-price lunch), English learner status, disability or special needs status, and ethnicity. A total of 2,700 students were successfully matched across both groups, forming the analytical sample. (See Appendix in the Full Report for additional details related to the propensity score matching procedure.)
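The matching step can be illustrated with a minimal sketch. It assumes propensity scores have already been estimated (for example, by a logistic regression on the covariates listed above) and pairs each treatment student with the nearest unused control student within a caliper. The greedy 1:1 nearest-neighbor strategy, the caliper value, the function name, and all scores are illustrative assumptions; the procedure the study actually used is documented in the Appendix of the Full Report.

```python
def greedy_match(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: dicts mapping a student id to a propensity score.
    caliper: maximum allowed score difference (hypothetical value).
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)  # controls not yet used in a pair
    pairs = []
    # Process treated students in a fixed order so results are reproducible.
    for t_id, t_score in sorted(treated.items()):
        if not available:
            break
        # Nearest remaining control by absolute score distance.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


# Made-up propensity scores, for illustration only.
treated = {"T1": 0.62, "T2": 0.35, "T3": 0.90}
controls = {"C1": 0.60, "C2": 0.33, "C3": 0.36, "C4": 0.10}
pairs = greedy_match(treated, controls)
print(pairs)
```

In this toy example, T3 stays unmatched because no remaining control falls within the caliper. This is why a larger control pool is useful for matching, as with the 5,421 control-district students available for the 1,350 treatment students here.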

Meeting ESSA Tier 2 Moderate Evidence Criteria
To meet ESSA’s Tier 2 level of evidence (Moderate Evidence), the current QED study must meet What Works Clearinghouse (WWC) standards with or without reservations. To assess the study’s eligibility to be reviewed by WWC and whether it is likely to meet its criteria with reservations, WWC guidelines for baseline equivalence in QEDs were used.
Standardized effect sizes (i.e., Hedges’ g) for the differences between the treatment and control groups’ Acadience beginning of year score by grade ranged from 0.010 to 0.248 (see Appendix, Table 3 in the Full Report). Although the effect size was smaller than 0.050 for Grades K and 2, Acadience beginning of year scores were controlled for in all grades during statistical analyses comparing the study groups. Of note, appropriate tests (i.e., independent-samples t-test or a Mann-Whitney test) found no significant difference in Acadience beginning of year score between the study groups for Grades K, 2, and 3 but did find a significant difference in Acadience beginning of year score for Grade 1. Given that all effect sizes were smaller than 0.250 and Acadience beginning of year scores were accounted for, the study is expected to meet the baseline equivalence requirement.
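The baseline equivalence check can be sketched as follows. The function computes Hedges' g from group means, standard deviations, and sample sizes, then classifies it against the 0.05 and 0.25 thresholds discussed above. The input values and function names are hypothetical and are not taken from the study's Table 3.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference with the small-sample correction."""
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )
    d = (mean_t - mean_c) / pooled_sd
    correction = 1 - 3 / (4 * (n_t + n_c) - 9)  # small-sample adjustment
    return d * correction


def baseline_status(g):
    """Classify |g| against the 0.05 and 0.25 baseline thresholds."""
    size = abs(g)
    if size <= 0.05:
        return "equivalent"
    if size <= 0.25:
        return "equivalent with statistical adjustment"
    return "not equivalent"


# Hypothetical beginning-of-year scores, for illustration only.
g = hedges_g(mean_t=100.0, sd_t=20.0, n_t=300, mean_c=97.0, sd_c=20.0, n_c=300)
print(round(g, 3), baseline_status(g))
```

Applied to the range reported above, a value of 0.010 falls in the first band and 0.248 in the second, which is why the beginning of year score is included as a covariate in the group comparisons.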
Acadience Reading
K–3 student reading achievement was assessed using Acadience Reading assessment scores at different timepoints over the 2021–2022 school year (i.e., beginning of year, middle of year, and end of year). Acadience Reading is a universal screening assessment that measures the acquisition of early literacy and reading skills from kindergarten through sixth grade. Acadience Reading consists of six brief measures that function as indicators of the essential skills that every child must master to become a proficient reader. Preliminary benchmarks are available for all measures and grades (Good & Kaminski, 2020).
Research Question 1 Analysis
To address the first research question (What is the impact of HMH Into Reading on students’ literacy skills over the school year?), Acadience Reading assessment data from K–3 students in the treatment group (i.e., schools that used HMH Into Reading during the 2021–2022 school year) was used. For students in Grades K through 2, a series of chi-square goodness-of-fit tests was conducted for each grade level to determine if the distribution across three Acadience Reading assessment benchmark statuses (i.e., At or Above Benchmark, Below Benchmark, and Well Below Benchmark) was significantly different at the beginning, middle, and end of the year.
The Acadience Reading assessment statuses “At Benchmark” and “Above Benchmark” were combined into a single status, “At or Above Benchmark,” to align with the data format provided by the control district. Since the Acadience Reading assessment composite score is not comparable across different times of year for Grades K–2, the percent of students at different benchmark status levels was used instead for the K–2 analysis. For Grade 3 students, a repeated measures ANOVA was used to assess if there were significant differences in students’ composite scores on the Acadience Reading assessment at the three timepoints.
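A chi-square goodness-of-fit test on three benchmark statuses has 2 degrees of freedom, and for that case the p-value has a simple closed form. The sketch below computes the statistic for one timepoint against the proportions observed at an earlier timepoint. Using the earlier timepoint as the expected distribution is an assumption about how the comparison was set up, and the counts are invented for illustration.

```python
import math


def chi_square_gof(observed, expected_props):
    """Chi-square goodness-of-fit statistic for observed category counts."""
    total = sum(observed)
    return sum(
        (obs - total * p) ** 2 / (total * p)
        for obs, p in zip(observed, expected_props)
    )


def p_value_df2(chi2):
    """Upper-tail p-value of a chi-square statistic with 2 degrees of freedom.

    With df = 2 the chi-square survival function reduces to exp(-x / 2).
    """
    return math.exp(-chi2 / 2)


# Hypothetical counts: At or Above, Below, and Well Below Benchmark.
earlier_props = [0.5, 0.3, 0.2]   # assumed earlier-timepoint proportions
later_counts = [120, 50, 30]      # assumed later-timepoint counts
stat = chi_square_gof(later_counts, earlier_props)
print(round(stat, 2), round(p_value_df2(stat), 3))
```

The same closed form can be used to sanity-check reported results: for example, `p_value_df2(1.73)` rounds to .421, matching the Grade 2 beginning-to-middle comparison in the Findings.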
Findings
Kindergarten
There were significant differences in the distribution across the three Acadience Reading assessment benchmark statuses for treatment group kindergarten students over the 2021–2022 school year. The distributions at the beginning of year and the middle of year were significantly different, χ²(2) = 178.21, p < .001, as were those at the middle of year and end of year, χ²(2) = 20.47, p < .001 (see Figure 1).
At each timepoint, there was an increase in the percentage of students who were “At or Above Benchmark” compared to the previous assessment. In alignment with these results, the distribution at the end of year was significantly different from the beginning of year distribution, χ²(2) = 325.94, p < .001, showing the same trend. These results suggest that in schools using HMH Into Reading, kindergarten students’ literacy skills strengthen over the course of the school year.

Grade 1
There were significant differences in the distribution across the three Acadience Reading assessment benchmark statuses for treatment group Grade 1 students over the 2021–2022 school year. The distributions at the beginning of year and the middle of year were significantly different, χ²(2) = 12.91, p = .002, as were those at the middle of year and end of year, χ²(2) = 11.81, p = .003 (see Figure 2).
At each timepoint, there was an increase in the percentage of students who were “At or Above Benchmark” compared to the previous assessment. The end of year distribution across benchmarks was significantly different from the beginning of year distribution, χ²(2) = 31.68, p < .001, showing the same trend. These results suggest that in schools using HMH Into Reading, Grade 1 students’ literacy skills strengthen throughout the school year.

Grade 2
Significant differences in the distribution across the three Acadience Reading assessment benchmark statuses over the 2021–2022 school year were also found for Grade 2 students from the treatment group. Although the beginning of year and middle of year distributions were not significantly different, χ²(2) = 1.73, p = .421, the distribution at the end of the year was significantly different from both the beginning of year distribution, χ²(2) = 10.62, p = .005, and the middle of year distribution, χ²(2) = 7.19, p = .027 (see Figure 3).
There was an overall increase in the percentage of students who were “At or Above Benchmark” over the school year, with a significant rise occurring from the middle of the year to the end of the year. These results suggest that in schools using HMH Into Reading, Grade 2 students' literacy skills strengthen throughout the school year, particularly from mid-year to end of year.

Grade 3
There was a significant effect of time of measurement of the Acadience Reading assessment (i.e., beginning, middle, and end of year) on Grade 3 students’ scores, F(1.80, 609.34) = 533.23, p < .001, with time of measurement accounting for 61% of the variance in scores, η²p = .61. Mauchly’s test indicated that the assumption of sphericity was violated, χ²(2) = .88, p < .001. Therefore, a Huynh-Feldt correction was applied. (In addition, test results should be interpreted with caution since Acadience Reading assessment scores were not entirely normally distributed at the three timepoints.)
Post-hoc comparisons were conducted using the Bonferroni correction. Post-hoc tests showed that there was a significant difference between students’ Acadience beginning-of-year scores (M = 194.71; SD = 122.71) and middle-of-year scores (M = 249.41; SD = 130.14), M_MOY − M_BOY = 54.70, p < .001, as well as between their middle-of-year scores and end-of-year scores (M = 311.37; SD = 149.16), M_EOY − M_MOY = 61.96, p < .001 (see Figure 4).
These results suggest that Grade 3 students using HMH Into Reading performed significantly better on the Acadience Reading assessment over the school year, with performance improving from the beginning of the year to the middle of the year and further improving from the middle of the year to the end of the year.

Research Question 2 Findings
Kindergarten
There was a significant effect of study group on kindergarten students’ Acadience end of year scores, F(1, 698) = 30.01, p < .001 (see Figure 5). Taking students’ beginning of year Acadience scores and demographic characteristics into account, kindergarten students using HMH Into Reading had significantly higher Acadience scores (M = 126.7; SD = 48.5) compared to students not using the program (M = 110.5; SD = 54.5) at the end of the school year. The effect size was small to medium, η²p = .04, suggesting that HMH Into Reading had a small to moderate impact on kindergarten students’ literacy skills.

Grade 1
There was no significant effect of study group on Grade 1 students’ Acadience end of year scores, F(1, 610) = 0.94, p = .332 (see Figure 6). Taking students’ beginning of year Acadience scores and demographic characteristics into account, end of year Acadience scores of Grade 1 students using HMH Into Reading (M = 135.6; SD = 101.9) were similar to those of students not using the program (M = 145.1; SD = 93.8).

Grade 2
There was no significant effect of study group on Grade 2 students’ Acadience end of year scores, F(1, 688) = 0.22, p = .640 (see Figure 7). Taking students’ beginning of year Acadience scores and demographic characteristics into account, end of year Acadience scores of Grade 2 students using HMH Into Reading (M = 215.0; SD = 115.7) were similar to those of students not using the program (M = 214.5; SD = 109.3).

Grade 3
There was a significant effect of study group on Grade 3 students’ Acadience end of year scores, F(1, 672) = 10.59, p < .001 (see Figure 8). (These results should be interpreted with caution since the Acadience end of year scores were not entirely normally distributed in either study group.)
Taking students’ beginning of year Acadience scores and demographic characteristics into account, Grade 3 students using HMH Into Reading had significantly higher Acadience scores (M = 311.4; SD = 149.2) compared to students not using the program (M = 301.8; SD = 140.4) at the end of the school year. A small effect size, η²p = .02, suggests that HMH Into Reading had a slight impact on Grade 3 students’ literacy skills.
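The reported effect sizes can be cross-checked against the reported F statistics. The sketch below uses the standard identity η²p = (F × df_effect) / (F × df_effect + df_error). The function name is hypothetical, and this check is offered as a reading aid rather than as part of the study's own analysis.

```python
def partial_eta_squared(f_stat, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)


# F statistics and degrees of freedom as reported in the Findings.
reported = {
    "Kindergarten, study group": (30.01, 1, 698),
    "Grade 3, study group": (10.59, 1, 672),
    "Grade 3, time of measurement": (533.23, 1.80, 609.34),
}
for label, (f_stat, df_effect, df_error) in reported.items():
    value = partial_eta_squared(f_stat, df_effect, df_error)
    print(f"{label}: {value:.2f}")
```

The three values round to .04, .02, and .61, in agreement with the effect sizes stated in the text.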

This HMH Into Reading QED study was designed to determine the potential impact of the HMH Into Reading program on K–3 students’ reading outcomes. The Cobblestone research team designed a study that aimed to meet the ESSA Tier 2 moderate level of evidence (i.e., a well-designed QED study that meets the WWC standards with reservations and demonstrates a statistically significant positive effect with no previous negative findings, with at least 350 participants, and is conducted in more than one district or school). Given that the study included a carefully matched sample of students from two school districts in Arizona (an intervention district and a control district) and met the WWC baseline equivalence requirement, the study is expected to qualify for ESSA Tier 2 Moderate Evidence.
Two research questions addressed the reading outcomes of K–3 students using HMH Into Reading during the 2021–2022 school year: performance over time and a comparison between students who used the program and those who did not. To address each research question, Cobblestone analyzed Acadience Reading assessment scores separately for each grade level. The first analysis compared students’ performance using HMH Into Reading at three points (beginning, middle, and end of the year). Results indicate that students in all grade levels (K, 1, 2, and 3) showed significant improvement in their reading performance from the beginning to the end of the year. These results are not surprising, as one would expect students’ reading to improve as a natural effect of maturation and presence in school over time, and it is unlikely that HMH Into Reading was solely responsible for this increase. The second analysis compared students using HMH Into Reading with carefully matched control students. Results indicate that treatment students in kindergarten and Grade 3 significantly outperformed control students at the end of the year, and students in Grades 1 and 2 performed similarly in both study groups. These findings suggest that HMH Into Reading significantly improved students’ reading skills in kindergarten and Grade 3, with differences not likely attributable to other factors.
A recent study by Relyea and colleagues (2023) exploring the impact of COVID-19 on student reading achievement indicates that younger students may have been more sensitive to the effects of the pandemic compared to older students, supporting results observed in the current study. In addition, it is noteworthy that the 2021–2022 school year was the first year since the COVID-19 pandemic in which no students were out of the classroom for a significant period of time, but it is likely that there was still a lingering impact on students’ reading achievement.
In other words, given previous research showing negative consequences for changes in schooling that occurred during the COVID-19 pandemic, and since the data for the current study was collected during the first “typical” school year following the pandemic, the results should be interpreted with these factors in mind. Specifically, participating kindergarten students entered school in 2021–2022 without any previous interruption to their formal schooling. Similarly, Grade 3 students in the current study had a minimal amount of developmental disruption since they had likely started acquiring reading skills before the pandemic occurred. However, second-grade students likely had the highest level of interruption in their schooling since the spring of their kindergarten year was interrupted by the pandemic. First-grade students had a similar level of interruption since their first experience with school was in the middle of the pandemic in the fall of 2020.
Hence, Cobblestone posits that the comparisons of kindergarten and Grade 3 students were truer tests of the effectiveness of the HMH Into Reading curriculum than the comparisons of Grades 1 and 2 students. Although these ideas are speculative given the dearth of research on how the timing of the pandemic relative to specific developmental stages affected subsequent reading performance, they are worth noting. In addition, as with so many co-occurrences during the pandemic, it is unknown how individual students engaged in activities that either facilitated or undermined their reading skills; we can only assume that such behaviors were distributed equally across our study groups. Future research should continue to investigate how the timing of the pandemic interruptions specifically affected students’ acquisition of reading skills during critical developmental phases, as well as how such findings affect subsequent comparisons of reading curricula.
Austin, P. C. (2010). Optimal caliper widths for propensity-score matching when estimating differences in means and differences in proportions in observational studies. Pharmaceutical Statistics, 10, 150–161. https://doi.org/10.1002/pst.433
Eddy, R. M., Alchehayed, A., Zuker, S., & Mendelsohn, D. A. (2023, August). HMH Into Reading Texas Study Report. Cobblestone Applied Research & Evaluation.
Good, R. H., & Kaminski, R. A. (2020). Acadience Reading K–6 assessment manual. Acadience.
Institute of Education Sciences. (n.d.). ESSA tiers of evidence: What you need to know. https://ies.ed.gov/ncee/edlabs/regions/midwest/pdf/blogs/RELMW-ESSA-Tiers-Video-Handout-508.pdf
Kuhfeld, M., Lewis, K., & Peltier, T. (2023). Reading achievement declines during the COVID-19 pandemic: Evidence from 5 million U.S. students in grades 3–8. Reading and Writing, 36(2), 245–261. https://doi.org/10.1007/s11145-022-10345-8
National Assessment of Educational Progress. (2022). NAEP report card: Reading: National student group scores and score gaps. https://www.nationsreportcard.gov/highlights/reading/2022/
Relyea, J. E., Rich, P., Kim, J. S., & Gilbert, J. (2023). The COVID-19 impact on reading achievement growth of grade 3–5 students in a U.S. urban school district: Variation across student characteristics and instructional modalities. Reading and Writing, 36(2), 317–346. https://doi.org/10.1007/s11145-022-10387-y
Rosenbaum, P. R., & Rubin, D. B. (1985). Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. The American Statistician, 39, 33–38. https://doi.org/10.2307/2683903
Stuart, E. A., Lee, B. K., & Leacy, F. P. (2013). Prognostic score-based balance measures can be a useful diagnostic for propensity score methods in comparative effectiveness research. Journal of Clinical Epidemiology, 66(8), 84–90. https://doi.org/10.1016/j.jclinepi.2013.01.013
Download the Full Report to view the Appendix for the HMH Into Reading K–3 QED study. The Appendix includes the demographic characteristics of the study participants by grade and study group (Table 2), the beginning of year Acadience scores by grade and study group (Table 3), and the propensity score matching procedures used to establish the study groups, with the related tables (Tables 4, 5, 6, and 7).