Test–Retest Reliability of the One-Repetition Maximum (1RM) Strength Assessment: a Systematic Review

Background The test–retest reliability of the one-repetition maximum (1RM) test varies across different studies. Given the inconsistent findings, it is unclear what the true reliability of the 1RM test is, and to what extent it is affected by measurement-related factors, such as exercise selection for the test, the number of familiarization trials and resistance training experience.
Objectives The aim of this paper was to review studies that investigated the reliability of the 1RM test of muscular strength and summarize their findings.
Methods The PRISMA guidelines were followed for this systematic review. Searches for studies were conducted through eight databases. Studies that investigated test–retest reliability of the 1RM test and presented intra-class correlation coefficient (ICC) and/or coefficient of variation (CV) were included. The COSMIN checklist was used for the assessment of the methodological quality of the included studies.
Results After reviewing 1024 search records, 32 studies (pooled n = 1595) on test–retest reliability of 1RM assessment were found. All the studies were of moderate or excellent methodological quality. Test–retest ICCs ranged from 0.64 to 0.99 (median ICC = 0.97), where 92% of ICCs were ≥ 0.90, and 97% of ICCs were ≥ 0.80. The CVs ranged from 0.5 to 12.1% (median CV = 4.2%). ICCs were generally high (≥ 0.90), and most CVs were low (< 10%) for 1RM tests: (1) among those without and for those with some resistance training experience, (2) conducted with or without familiarization sessions, (3) with single-joint or multi-joint exercises, (4) for upper- and lower-body strength assessment, (5) among females and males, and (6) among young to middle-aged adults and among older adults. Most studies did not find systematic changes in test results between the trials.
Conclusions Based on the results of this review, it can be concluded that the 1RM test generally has good to excellent test–retest reliability, regardless of resistance training experience, number of familiarization sessions, exercise selection, part of the body assessed (upper vs. lower body), and sex or age of participants. Researchers and practitioners, therefore, can use the 1RM test as a reliable test of muscular strength.


Introduction
Muscular strength can be defined as "the ability to exert a force on an external object or resistance" [1]. Higher levels of muscular strength may result in better performance in a range of sport-specific tasks and decrease the risk of injuries in athletes [1]. An adequate level of muscular strength is also needed for a range of activities of daily life. In older adults, for example, greater strength improves physical functioning and quality of life and reduces the risk of falls [2][3][4]. Higher muscular strength is also associated with a reduced risk of premature mortality [5]. Taking these factors into account, it is not surprising that organizations such as the American College of Sports Medicine (ACSM) and the World Health Organization (WHO) recommend participating in muscular-strengthening activities on a regular basis [6,7]. Investigating aspects of strength as a muscular quality in relation to performance in different exercise tasks is important from a sports performance perspective. Studying associations of strength with health outcomes, such as mortality risk, chronic disease, and quality of life, is important to advance public health.
Resistance training is the most commonly used exercise intervention for increasing muscular strength [6]. Resistance training can be performed using isometric muscle actions (i.e., with no net change in muscle length), isokinetic muscle actions (i.e., with a constant rate of movement), and, the most commonly selected, dynamic muscle actions (i.e., coupled eccentric and concentric actions) [6]. To determine the efficacy of a given resistance training program, it is paramount to measure the level of strength as accurately as possible. Furthermore, studies that explore the acute effects of resistance exercise on physiological parameters, such as muscle protein synthesis, hormonal responses, muscle soreness, electromyography outcomes, as well as studies on ergogenic effects of supplements, also use muscle strength testing as a basis for their respective exercise protocols [8][9][10][11][12][13]. Additionally, exercise prescription for repetition ranges in resistance training is also often based on a given percentage of maximal strength values [6], which further highlights the need for an accurate method of testing strength.
In laboratory-based settings, muscular strength is most commonly assessed using isokinetic dynamometers [14]. However, a disadvantage of such tests is the cost of the necessary equipment [14]. Another limitation of isokinetic dynamometers is that they are generally only single-joint-based tests of strength. A commonly used field-based test of strength is the one-repetition maximum (1RM) test [15]. As suggested by the name, the 1RM is defined as the maximal weight that can be lifted once, while maintaining the correct lifting technique [15]. The 1RM test has several distinct advantages over a laboratory-based test. In the 1RM test, eccentric actions are usually coupled with concentric actions, which is more reflective of dynamic muscle actions that are most commonly used in resistance training and of natural movement in most activities of sport and daily living.
The 1RM test allows for assessing strength in multi-joint exercises. Given that it does not require expensive equipment, it is highly cost-effective. In trained individuals, the 1RM test is also commonly performed using the same exercises as in the training sessions, which might reduce the need for prior familiarization with the test. In addition to these advantages over isokinetic dynamometers, the 1RM test has been shown to be safe across different populations, even among children, older adults, and clinical individuals [16][17][18]. Even though the 1RM test can be time-consuming when strength is assessed in a large number of participants, many researchers consider it the "gold standard" test of dynamic strength [15].
Test-retest reliability represents the consistency of results in a given test across repeated measurements [19,20]. Reliability of strength tests may be influenced by a number of measurement-related factors, as well as by biological and technical variation in performing a given exercise [20]. Low reliability may reduce statistical power and thus increase the probability of a type II error [20]. In the sport and exercise science area, reliability is commonly expressed using the intra-class correlation coefficient (ICC) and the coefficient of variation (CV). A detailed description of ICC and CV as measures of reliability can be found elsewhere [19,20].
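As an illustration of the two coefficients, the following minimal sketch computes a CV and an ICC from paired test and retest scores. The included studies used various formulas, and the review does not prescribe one, so the choices here are assumptions made only for illustration: the CV is derived from the typical error (standard deviation of the differences divided by √2, relative to the grand mean), and the ICC is the two-way, consistency, single-measurement form, often labelled ICC(3,1). The function names and the 1RM values (in kg) are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def typical_error_cv(test, retest):
    """CV (%) based on the typical error (assumed formula, for illustration)."""
    diffs = [b - a for a, b in zip(test, retest)]
    typical_error = stdev(diffs) / sqrt(2)
    grand_mean = mean(list(test) + list(retest))
    return 100 * typical_error / grand_mean


def icc_3_1(test, retest):
    """Two-way, consistency, single-measurement ICC for two trials."""
    n, k = len(test), 2
    rows = list(zip(test, retest))
    grand = mean(list(test) + list(retest))
    row_means = [mean(r) for r in rows]
    col_means = [mean(test), mean(retest)]
    # Partition the total sum of squares into subjects, trials, and error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical 1RM values (kg) for five participants on two testing days.
day1 = [100, 80, 120, 90, 110]
day2 = [103, 78, 121, 93, 108]
print(round(typical_error_cv(day1, day2), 2), round(icc_3_1(day1, day2), 3))
```

Because this ICC form ignores a constant shift between trials, a uniform increase in all scores at retest would leave it unchanged; other ICC forms that assess absolute agreement would penalize such a shift.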
The test-retest reliability of the 1RM test varies significantly across different studies [16,18,. For example, in one study [48], ICC was 0.64, while in another [26], it was 0.99. Similarly, in the Seo et al. [46] study, CV was 0.5%, while in the Ribeiro et al. [40] study, it was 12.1%. Given the inconsistent findings, it is unclear what the true reliability of the 1RM test is and to what extent it is affected by measurement-related factors, such as exercise selection for the test, number of familiarization trials, and resistance training experience. No previous systematic review has summarized evidence on the test-retest reliability of the 1RM dynamic strength assessment. Therefore, this paper aimed to investigate the reliability of the 1RM test reported in individual studies and summarize their findings.

Search Strategy
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed for this systematic review [51]. English-language literature searches of PubMed/MEDLINE, Scopus, Academic Search Elite, CINAHL, MasterFILE Premier, PsycINFO, and SPORTDiscus databases were conducted on January 5th 2020 using the following search syntax: (1RM OR "1 RM" OR 1-RM OR "1 repetition maximum" OR "one repetition maximum") AND (reliability OR repeatability OR reproducibility). To minimize the study selection bias, the searches were performed independently by two authors (JG and BL) of the review.

Inclusion Criteria
To be included in the review, studies were required to meet the following criteria: (1) published in English and in a peer-reviewed journal, (2) investigated test-retest reliability of the 1RM test, and (3) presented ICC and/or CV values. As suggested by Koo and Li [52], ICC values were deemed to indicate poor (less than 0.50), moderate (0.50 to 0.75), good (0.75 to 0.90), and excellent (> 0.90) reliability. Even though there are no universally accepted thresholds for classifying CV, values lower than 10% are generally deemed acceptable [53].
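The ICC categories above can be expressed as a simple lookup. The sketch below is illustrative only: the function name is hypothetical, and because the stated ranges share their boundary values, the assignment of exactly 0.75 to "good" and exactly 0.90 to "good" is an assumption.

```python
def classify_icc(icc):
    """Map an ICC value to the reliability categories described in the text."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:  # boundary handling at 0.75 and 0.90 is an assumption
        return "good"
    return "excellent"


# The lowest, median, and highest ICCs reported in this review.
for value in (0.64, 0.97, 0.99):
    print(value, classify_icc(value))
```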

Data Extraction
Two authors (JG and BL) of the review independently extracted the following data to an Excel spreadsheet: (1) details regarding the sample (including sample size, age, and resistance training experience), (2) protocol used for the 1RM test (including the warm-up protocol, number of days between the assessments, and rest between attempts), (3) ICC and/or CV values, and (4) any adverse events associated with the 1RM test. Coding files were checked between the authors, and all discrepancies were resolved through discussion and consensus.

Methodological Quality
To assess the methodological quality of the included studies, Form B of the validated COSMIN checklist was used [54], which is designed for reliability studies. This form has 11 items that refer to reporting of missing items, adequacy of the sample size, number of measurements, measurement administration, time interval between the assessments, similarity of conditions for both measurements, important flaws in the study design, and the reporting of ICCs. Additional details about the form can be found elsewhere [54]. In all of the questions (besides question ten), the answer "yes" corresponds to one point. Question 10 is as follows: "Were there any important flaws in the design or methods of the study?" In this question, the answer "no" corresponds to a point. The maximal score on the checklist is 11. Studies scoring 10 to 11 points were considered as being of "excellent" methodological quality. Studies scoring 7 to 9 points were considered as being of "moderate" quality, while studies that scored less than 7 points were considered as being of "poor" methodological quality. Studies were rated independently by two reviewers (JG and BL). Any observed differences in the assessment between the reviewers were resolved through discussion and mutual agreement. Study quality was not an inclusion/exclusion criterion in this review.
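The scoring rule described above can be sketched as follows. The function names and the dictionary layout (item number mapped to "yes" or "no") are hypothetical conveniences; only the point allocation and the quality thresholds are taken from the text.

```python
def cosmin_score(answers):
    """Sum the points for the 11 checklist items.

    `answers` maps item numbers 1-11 to "yes" or "no". A "yes" earns a point
    on every item except item 10 (important design flaws), where "no" does.
    """
    points = 0
    for item in range(1, 12):
        rewarded_answer = "no" if item == 10 else "yes"
        if answers[item] == rewarded_answer:
            points += 1
    return points


def classify_quality(score):
    """Apply the thresholds used in this review."""
    if score >= 10:
        return "excellent"
    if score >= 7:
        return "moderate"
    return "poor"


# Hypothetical study: "yes" on every item except item 10 and item 11.
example = {item: "yes" for item in range(1, 12)}
example[10] = "no"   # no important flaws, so the point is awarded
example[11] = "no"   # item 11 answered "no", so no point
print(cosmin_score(example), classify_quality(cosmin_score(example)))
```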

Search Results
The searches through the databases yielded 1024 search results (Fig. 1). Of these, 955 documents were excluded based on their titles and abstracts, while 69 papers were read in full. After assessing the full texts, 37 additional studies were excluded as they did not meet the inclusion criteria. The study selection process, therefore, resulted in the inclusion of 32 studies in this review [16,18,.

Study Characteristics
The pooled number of participants from all included studies was 1595 (median = 35; range = 10-376). Most of the studies were conducted among apparently healthy individuals with two studies examining the reliability of the 1RM test in clinical populations (individuals with Parkinson's disease and older adults with chronic heart failure [16,29], respectively). Fourteen studies were conducted among individuals with some resistance training experience, while 22 studies included individuals without any previous resistance training experience (note that four studies included both groups). The period between 1RM test and retest varied between 1 and 10 days. Out of fourteen studies that included familiarization sessions, nine studies used one session, four studies used two sessions, and one study used three familiarization sessions. All but one study presented ICCs, while 15 studies reported CVs (14 studies presented both ICCs and CVs). Table 1 summarizes relevant information pertaining to the included studies.

1RM Test Protocols
Out of the studies that detailed their respective warm-up protocols, 16 studies used one submaximal set, 10 studies used two or three submaximal sets, and 2 studies used five submaximal sets for the warm-up (Table 1). Submaximal sets were most commonly performed with loads ranging from 40 to 80% of estimated 1RM. The repetition range in the submaximal sets generally ranged from 1 to 10 repetitions. Eleven studies also incorporated some form of light aerobic exercise during the warm-up (e.g., 5 min of cycling; Table 1). The number of 1RM attempts per testing session ranged from 3 to 8, with 1 to 5 min of rest between attempts.

Methodological Quality
Based on the COSMIN checklist, all studies were classified as either having excellent (17 studies) or moderate (15 studies) methodological quality. The mean ± standard deviation values of the checklist were 9 ± 1 points (range = 8 to 11 points). The results of the quality assessment can be found in Table 2.

Overall Reliability of 1RM Test
Test-retest reliability of 1RM assessment is summarized in Table 3 and Fig. 2. When considering all available studies, ICCs ranged from 0.64 to 0.99 (median ICC = 0.97), where 92% of ICCs were ≥ 0.90, and 97% of ICCs were ≥ 0.80. The range of reported CVs was from 0.5 to 12.1% (median CV = 4.2%).
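For readers who wish to reproduce this type of summary from extracted data, a minimal sketch is given below. The function name and the five ICC values are hypothetical and are not the values extracted in this review; the code only illustrates how the median and the percentage of ICCs at or above a threshold can be obtained.

```python
from statistics import median


def summarize_iccs(iccs, thresholds=(0.90, 0.80)):
    """Return the range, median, and share (%) of ICCs at or above each threshold."""
    summary = {
        "min": min(iccs),
        "max": max(iccs),
        "median": median(iccs),
    }
    for threshold in thresholds:
        share = 100 * sum(icc >= threshold for icc in iccs) / len(iccs)
        summary[f"pct_ge_{threshold:.2f}"] = share
    return summary


# Hypothetical ICC values, used only to show the calculation.
print(summarize_iccs([0.64, 0.91, 0.97, 0.98, 0.99]))
```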

Reliability in Relation to Training Status and Familiarization
Twenty-two studies included untrained individuals. ICCs for 1RM tests among untrained individuals ranged from 0.64 to 0.99 (median ICC = 0.97), where 92% of ICCs were ≥ 0.90, and 99% of ICCs were ≥ 0.80. The range of reported CVs was from 1.0 to 12.1% (median CV = 5.5%). Fourteen studies included individuals with some previous resistance training experience. ICCs for 1RM tests among individuals with previous resistance training experience ranged from 0.64 to 0.99 (median ICC = 0.98), where 93% of ICCs were ≥ 0.90, and 96% of ICCs were ≥ 0.80. The range of reported CVs was from 0.5 to 7.8% (median CV = 3.3%).
Eighteen studies did not include a familiarization session. ICCs in these studies ranged from 0.64 to 0.99 (median ICC = 0.96), where 90% of ICCs were ≥ 0.90, and 96% of ICCs were ≥ 0.80. The range of reported CVs was from 1.0 to 9.0% (median CV = 5.3%). Fourteen studies included one or more familiarization sessions. In these studies, ICCs ranged from 0.64 to 0.99 (median ICC = 0.98), where 90% of ICCs were ≥ 0.90, and 93% of ICCs were ≥ 0.80. The range of reported CVs was from 0.5 to 12.1% (median CV = 3.8%).

Reliability in Relation to Exercise Selection and Body Region
Seventeen studies used single-joint exercises. ICCs for 1RM tests using single-joint exercises ranged from 0.74 to 0.99 (median ICC = 0.97), where 93% of ICCs were ≥ 0.90, and 96% of ICCs were ≥ 0.80. The range of reported CVs was from 0.5 to 9.0% (median CV = 4.1%).
[...] ICCs were ≥ 0.80. The range of reported CVs was from 0.5 to 12.1% (median CV = 4.0%).
Twelve studies included older adult participants. ICCs for 1RM tests among older adults ranged from 0.80 to 0.99 (median ICC = 0.97), where 93% of all ICCs were ≥ 0.90. The range of reported CVs was from 1.0 to 9.0% (median CV = 5.4%). Twenty-two studies included young to middle-aged adult participants. ICCs for 1RM tests among young and middle-aged adults ranged from 0.64 to 0.99 (median ICC = 0.98), where 91% of all ICCs were ≥ 0.90, and 97% of ICCs were ≥ 0.80. The range of reported CVs was from 0.5 to 12.1% (median CV = 3.5%).

Systematic Changes in Results Between Repeated Measurements
In 66% of the analyses that assessed potential systematic changes in 1RM test results between the repeated measurements, no significant changes were found. The remaining analyses found higher 1RM values in the retest condition. For lower-body exercises, the reported increases in 1RM ranged from 1.1 to 17.5 kg (median = 5.5 kg). For upper-body exercises, the reported increases in 1RM ranged from 0.5 to 4.9 kg (median = 1.8 kg).

Main Findings of the Review
The main finding of this systematic review is that the 1RM test generally has good to excellent test-retest reliability, regardless of the previous resistance training experience, sex, and age of the participants; whether or not the testing procedure includes familiarization sessions; whether the exercises are classified as single- or multi-joint movements; and whether the testing is conducted for upper- or lower-body musculature. This finding is based on 32 included studies that showed either excellent or moderate methodological quality.

Reliability in Relation to Training Status and Familiarization
Research has established that the response to resistance exercise varies between resistance-trained and untrained individuals [55,56]. For example, studies have reported differential molecular and epigenetic responses between trained and untrained individuals following an acute bout of resistance exercise [55,56]. Duez et al. [57] also reported larger action potentials and electric activity of motor units in resistance-trained participants, compared with untrained participants. Accordingly, some authors [58] speculated that 1RM test reliability may be different between resistance-trained and untrained individuals. However, when we grouped the ICCs and CVs according to training status, the data showed similar reliability for individuals with and without resistance training experience. These results suggest that resistance training experience might not be as important for the 1RM test as previously thought [58]. From a practical perspective, the results suggest that exercise practitioners may consider using the 1RM test as a reliable test of strength even among untrained participants. Furthermore, the 1RM test seems to be generally safe, as the studies reported very few adverse events associated with the measurement. Most commonly, only muscle soreness was reported (Table 1).
In the Ploutz-Snyder and Giamis study [59], the authors reported that untrained individuals needed as many as eight familiarization sessions with the 1RM test to obtain a reliable measurement. Specifically, these authors reported an average increase in the 1RM test of 13 kg from the first to the final testing session (~1.6 kg per session). They employed a protocol in which the 1RM test was conducted every two days over a period of 2 to 3 weeks. The included participants were required to return to testing if their 1RM in one session exceeded their 1RM in the previous session by 1 kg. Such a strict familiarization procedure might be inefficient and could potentially lead to an increase in the dropout rates of participants. Also, such a testing design might even result in an unwanted training effect, as studies show that merely practicing the 1RM test can produce strength gains similar to those of a high-volume resistance training routine [60]. Studies that did not include any familiarization and studies that included at least one familiarization session showed very high and similar ICC values (at least 90% of ICCs were ≥ 0.90). While the results would suggest that a familiarization session is likely not required for a reliable 1RM assessment, there may be cases when some familiarization with the exercise is needed, e.g., when a practitioner estimates that the participant's skill in a given exercise is not sufficient and that, therefore, performing the test without further familiarization may increase the risk of injury. To avoid the abovementioned potential issues, in such cases, familiarization can be incorporated into the first testing session, as done by Benton and colleagues [24,25].

Reliability in Relation to Exercise Selection and Body Region
Besides training experience, variables such as exercise complexity have been suggested to play an important role in the reliability of the 1RM test [48]. For example, one study used the squat and knee extension exercises for the 1RM test [48]. For the squat, which is the more complicated exercise to perform, the ICC was 0.64, while for the knee extension exercise, the ICC was 0.90. However, when examining the whole body of literature, the data for single- and multi-joint exercises showed that the reliability of the 1RM test is high regardless of the resistance exercise selection. Indeed, even studies that assessed the 1RM test using very complex exercises, such as the power clean, reported ICCs of 0.98 and 0.99 [28,30], although these findings are specific to young athletes. Similar results, indicating no substantial differences in reliability, were seen in the subgroup analyses for upper- and lower-body exercises.

Reliability in Relation to Sex and Age of Participants
Even though there are physiological differences between men and women, especially in muscle contractile properties, fiber type proportion, and perfusion [61], we found no clear indication of a difference in 1RM test reliability between sexes. Research has also established physiological differences in voluntary muscle activation by age, with younger adults having higher muscle activation than their older counterparts [62]. However, we found no clear indication that age affects the test-retest reliability of the 1RM test. It should be noted that making direct comparisons between sex and age groups across different studies is challenging, given that exercise selection and other elements of the testing protocol vary. The evidence base would benefit from more studies that include analyses stratified by sex and age groups within a single study. Nevertheless, the currently available evidence suggests that the 1RM test is a reliable test of muscle strength among both sexes and different age groups.

Systematic Changes in Results Between Repeated Measurements
Most studies did not find systematic changes in results between the repeated measurements. In those that did, the observed changes were generally small. Their size was well below the average increases in strength commonly found in strength training interventions [63][64][65][66][67]. This is important to consider given that the most common application of the 1RM test is for evaluating changes in strength following a given training program.

Methodological Quality of Included Studies
The included studies were classified as having excellent or moderate methodological quality based on the COSMIN checklist. While 31 studies presented ICC values and thus received a point on item 11, one study presented only CV values (Table 3). Therefore, future studies should consider presenting ICC coupled with the CV values as both can provide valuable information about reliability. Detailed reasoning for presenting both of the reliability coefficients is available in the paper by Atkinson and Nevill [20]. Despite the moderate-to-excellent quality of the included studies, one limitation needs to be highlighted. Namely, not all studies presented the type of ICC used in the analysis. There are ten different types of ICCs that provide different estimates of reliability [52]. When calculated from the same data, one study demonstrated that six different types of ICC ranged from 0.51 to 0.87 [68]. This issue is not limited to the studies included herein as recent reviews that focused on the test-retest reliability of the Yo-Yo test and the 30-15 Intermittent Fitness Test (30-15 IFT) also highlighted this as a limitation [69,70]. Even though not all studies reported the specific type of ICC they used, 92% of all ICCs were still ≥ 0.90, suggesting that this limitation might not have had a profound impact on the findings of this review. Nevertheless, future studies conducted on this topic should clearly state which ICC was used for the analysis, to allow for better-informed comparisons of results between studies.

Recommendations for Future Research
Evidence on the reliability of the 1RM test in clinical populations is scarce, as our search revealed only two such studies. Buckley and Hass [16] included 46 individuals with Parkinson's disease and explored the reliability of 1RM test assessment of four resistance training exercises. The authors reported ICC values ranging from 0.91 to 0.97. Ellis et al. [29] included individuals with chronic heart failure and reported excellent reliability of the 1RM test for the leg press (ICC = 0.97). These findings would suggest that the 1RM test is a highly reliable test of strength even among clinical individuals. However, the evident lack of studies that explored specific clinical populations highlights the need for future research.
The included studies generally focused on test-retest reliability. However, four studies [23,29,47,48] also provided data for inter-rater reliability. The respective ICCs ranged from 0.85 to 0.98, where 83% of all ICCs were higher than 0.90. Although it seems that the inter-rater reliability of the 1RM test is also high, given that the number of studies was relatively small, this topic should be further explored in future research.
The warm-up protocols varied across the included studies. For example, the studies used between one and five sets with submaximal loads for the warm-up (Table 1). Additionally, some studies also incorporated light aerobic exercise into the warm-up (Table 1). The number of 1RM attempts in some studies was limited (usually to a maximum of three to five attempts), whereas others used progressive increases in the load until the participant could no longer perform a successful 1RM attempt (Table 1). Despite the differences in the warm-up and testing protocols, the reliability of the 1RM test was generally high across all studies. However, future studies may consider exploring the influence of different warm-up strategies and testing protocols on the reliability of the 1RM test.

Limitations of the Review
There are some limitations that need to be considered when interpreting the findings of this review. While there are different statistical measures to express test-retest reliability, the current review focused only on ICC and CV as the two most commonly used reliability coefficients in this research area. Twelve included studies additionally used Bland-Altman plots [18, 21, 23, 28-30, 34, 40, 42, 44, 47, 48] and found relatively narrow 95% limits of agreement (LoA). For example, 95% LoA for the bench press, power clean, leg press, and squat were ± 3-5 kg, ± 5-8 kg, ± 8-13 kg, and ± 10-15 kg, respectively [23, 28-30, 40, 44, 48], which further indicates a high reliability of the 1RM test. However, given the small number of studies that used Bland-Altman plots, future research may also consider using this statistic to provide further insights into LoA for other resistance exercises used for the 1RM test.
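For reference, the 95% LoA in a Bland-Altman analysis are conventionally calculated as the mean difference between trials (bias) ± 1.96 times the standard deviation of the differences. The sketch below illustrates this calculation under the usual assumption that the differences are approximately normally distributed; the function name and the 1RM values (in kg) are hypothetical and do not come from any included study.

```python
from statistics import mean, stdev


def limits_of_agreement(test, retest):
    """Return (bias, lower limit, upper limit) of the 95% limits of agreement."""
    diffs = [b - a for a, b in zip(test, retest)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # assumes roughly normal differences
    return bias, bias - half_width, bias + half_width


# Hypothetical 1RM values (kg) for five participants on two testing days.
day1 = [100, 80, 120, 90, 110]
day2 = [103, 78, 121, 93, 108]
bias, lower, upper = limits_of_agreement(day1, day2)
print(f"bias = {bias:.1f} kg, 95% LoA = {lower:.1f} to {upper:.1f} kg")
```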

Conclusion
Accurate assessment of strength is the foundation upon which optimal resistance training programs for dynamic strength gains can be developed and evaluated. Based on the results of this review, it can be concluded that the 1RM test generally has good-to-excellent test-retest reliability. The reliability of the 1RM test tends to be excellent regardless of resistance training experience, number of familiarization sessions, exercise selection, part of the body assessed (upper vs. lower body), and sex or age of participants. No or only small systematic changes in 1RM are expected between repeated measurements. Researchers and practitioners can, therefore, use the 1RM test as a reliable test for assessing maximal dynamic muscular strength.