The main result of the present study was that the six investigated indicators of mechanical efficiency formed three separate groups by rank correlation: the first group (group I) was formed by GE, NE, and T; the second (group II) by DE and WEe; and the third (group III) by WEm. Somers’ D yields an identical grouping (not shown). Correlations within the groups were strong, whereas correlations between the groups were at most moderate. As the six indicators fall into different groups, they can be interpreted as measuring fundamentally different aspects of mechanical efficiency.
One speculative reason for the observed grouping might be that baseline subtraction is altogether an erroneous way to approach efficiency: this paper shows that E0,e does not differ from Erest, that the confidence interval for DE is too wide to be reliable, and that WEm gives values that are too large for work efficiency. These findings raise the question of whether DE and WE measure what they are supposed to measure. Previous criticism of NE, DE, and WE has been mostly theoretical [8, 12, 13, 25]; the present study is one of the few methodological studies to address this question.
In theory, as shown in the “Methods” section, every indicator of mechanical efficiency would approach DE if external work could increase without bound, because the internal energy expenditure E0 becomes negligible compared with the total energy expenditure Etot. The efficiency indices formed different groups because one cannot pedal at sufficiently high intensities aerobically; theoretically, DE and GE do not come within 0.5 percentage points of each other until an intensity of 1350 W. This is in some contrast to the case study of a world-class champion in [40], where GE and DE were found to be within 0.1 percentage points of each other already at 300–400 W power output. Based on the estimates of the present study, such agreement between GE and DE at so low a power level is highly exceptional.
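The convergence argument can be illustrated with a minimal sketch. It assumes the linear model Etot = E0 + Wext/DE implied by the regression line, with GE = Wext/Etot. The function names are hypothetical, and the inputs (DE = 22.6%, E0 = 6.9 kJ/min, i.e., 115 W) are borrowed from the regression estimates reported in this paper purely for illustration, so the result is not expected to reproduce the 1350 W estimate exactly.

```python
def gross_efficiency(w_ext, e0, de):
    """GE = Wext / Etot under the assumed linear model Etot = E0 + Wext / DE.

    w_ext and e0 are in watts; de is a fraction (0.226 means 22.6%).
    """
    return w_ext / (e0 + w_ext / de)


def power_for_gap(gap, e0, de):
    """External power (W) at which DE - GE has shrunk to `gap`.

    Under the same model, DE - GE = DE * E0 / (E0 + Wext / DE);
    solving this for Wext gives the expression below.
    """
    return de * e0 * (de / gap - 1.0)


# Illustrative inputs (assumptions, see the lead-in).
E0_W = 6.9 * 1000 / 60  # 6.9 kJ/min converted to watts
DE = 0.226

for w in (100, 200, 300, 400):
    ge = gross_efficiency(w, E0_W, DE)
    print(f"Wext = {w:4d} W  GE = {100 * ge:.1f}%  DE - GE = {100 * (DE - ge):.1f} %-points")

w_needed = power_for_gap(0.005, E0_W, DE)
print(f"DE - GE falls to 0.5 %-points at about {w_needed:.0f} W")
```

With these illustrative inputs, GE stays more than one percentage point below DE at 300–400 W, and the gap closes to 0.5 percentage points only above 1000 W, far beyond aerobic intensities.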
The measured efficiency indices of the present study seem to be in line with the literature. According to a review article [8], GE has mostly been around 18–20% at 150 W ([8], Fig. 2), and the mean ± SD for DE from 14 studies was 23.8 ± 2.6% ([8], Table 1). These values agree well with those of the present study, GE = 20.0 ± 0.8% and DE = 23.8 ± 1.9%.
Efficiency Groups
In general, group I can be interpreted as describing the mechanical efficiency of the whole body in cycling work. GE and T understandably belong to the same group, as the former is a refined version of the latter. The fact that NE belongs to this group indicates that there are no great differences or adaptations in resting energy expenditure between individuals. In theory, on the other hand, groups II and III attempt to capture the efficiency of an isolated musculoskeletal system in cycling work by subtracting, in one form or another, the zero-load energy expenditure. Hence, it seems that E0 plays a role, and a bigger one than Erest, in explaining why the efficiency indices fall into different groups. The importance of E0 is in line with a previous study [41], which argued that differences in zero-load cycling between individuals explain some of the observed interindividual variation in GE. Another possible reason for the differences between the groups lies in the difficulty and uncertainty of determining WEe and DE from the Wext-Etot regression line.
DE and WEe belong to the same group, as both are calculated from the same Wext-Etot regression line. Furthermore, at each observation point, WEe can be seen as the inverse of the slope of the line through that observation point and E0,e, so that WEe can be interpreted, more or less, as a local delta efficiency. If all the observation points fell on the same straight line, WEe and DE would coincide.
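The relation between WEe and DE can be checked numerically. The sketch below assumes the usual definitions, DE as the inverse slope of the Wext-Etot regression line and WEe = Wext/(Etot − E0,e) with E0,e as the intercept of that line. The data points and function names are hypothetical, and both quantities are expressed in watts.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def delta_efficiency(w_ext, e_tot):
    """DE: inverse of the slope of the Wext-Etot regression line."""
    _, slope = fit_line(w_ext, e_tot)
    return 1.0 / slope


def work_efficiency_e(w_ext, e_tot):
    """WEe at each point: Wext / (Etot - E0,e), E0,e being the intercept."""
    e0_e, _ = fit_line(w_ext, e_tot)
    return [w / (e - e0_e) for w, e in zip(w_ext, e_tot)]


# Hypothetical, perfectly collinear observations (W): Etot = 120 + Wext / 0.23
w_ext = [50.0, 100.0, 150.0, 200.0]
e_tot = [120.0 + w / 0.23 for w in w_ext]
print(delta_efficiency(w_ext, e_tot), work_efficiency_e(w_ext, e_tot))

# Moving one point off the line makes WEe vary from point to point.
e_tot_off = list(e_tot)
e_tot_off[2] += 20.0
print(delta_efficiency(w_ext, e_tot_off), work_efficiency_e(w_ext, e_tot_off))
```

For the collinear data, every WEe value equals DE; once a single point is displaced, the WEe values scatter around DE, which is the sense in which WEe acts as a local delta efficiency.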
Indirect support for this grouping can be found in the literature. Studies of correlations between physiological factors and indicators of mechanical efficiency have demonstrated that physiological factors affect indicators from different groups differently. For example, a measure from group I has been reported to be significantly affected, while a measure from group II has not, by skin temperature [32], VO2max [10], and body weight [26]. In addition, group I has been reported to be affected, while group III has not, by training [41]. Thus, based on correlations with physiological factors, the literature suggests some grouping of mechanical efficiency indices, which indirectly supports our grouping.
Accuracy of Wext-Etot Regression Line and DE
It has been observed that the repeatability of DE is significantly weaker than that of GE [10], but this phenomenon has eluded explanation. Here, we argue that it can be explained by the poor accuracy of the Wext-Etot regression line, which is caused mainly by using too few observation points, typically 3 [37, 38], 4 [39], or at most 6 [32, 42]. In the present study, we replicated the usual way of calculating efficiency indices, which is why only 3–5 points were included in our Wext-Etot regression line. As the value (95% CI) for DE was 22.6% (19.2–26.1%) and for E0,e 6.9 kJ/min (−16.6 to 30.4 kJ/min), it is clear that more observation points would be required for a reliable Wext-Etot regression line and, hence, for reliable DE and E0,e estimates. Notably, the coefficient of determination, R2, is unable to detect this problem, as the R2 value for the Wext-Etot regression line in our study was 0.996 ± 0.004. This means that R2 is far from a sufficient test of the accuracy of the Wext-Etot regression line when there are too few observation points; after all, R2 with two observation points is always 1.00, although such an estimate contains a huge potential error. Besides the number of observation points, another factor affecting the reliability of the Wext-Etot regression line is cadence. Cadence affects energy expenditure, and by applying the linear regression from [8] we can estimate, purely theoretically, that with four observation points DE can change by as much as 1.1 percentage points when cadence deviates from 80 rpm by only ±1 rpm. As keeping cadence within ±1 rpm of a target cadence during a test is very challenging, there is clearly considerable built-in potential for imprecision in DE measurements. In contrast, keeping cadence at 80 ± 1 rpm theoretically affects GE by only 0.1 percentage points.
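The effect of the number of observation points on the confidence intervals, and the inability of R2 to reveal it, can be sketched as follows. This is an illustration under standard ordinary least-squares assumptions, not the analysis code of the present study: the data are hypothetical (a line with intercept 115 W and inverse slope 0.226, plus an alternating ±5 W disturbance), the function name is an assumption, and the t critical values are standard two-sided 95% table values.

```python
import math

# Two-sided 95% Student t critical values, keyed by degrees of freedom.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776,
         5: 2.571, 6: 2.447, 7: 2.365, 8: 2.306}


def ols_with_ci(x, y):
    """OLS fit with R^2 and 95% CI half-widths for slope and intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    fit = {"slope": slope, "intercept": intercept, "r2": 1.0 - sse / sst}
    df = n - 2
    if df < 1:
        # Two points: the line passes through both, so no uncertainty
        # can be estimated and the intervals are unbounded.
        fit["slope_hw"] = fit["intercept_hw"] = math.inf
        return fit
    s2 = sse / df  # residual variance
    fit["slope_hw"] = T_975[df] * math.sqrt(s2 / sxx)
    fit["intercept_hw"] = T_975[df] * math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    return fit


def toy_data(w_ext):
    """Hypothetical Etot (W): 115 + Wext / 0.226 with alternating +/-5 W."""
    return [115.0 + w / 0.226 + (5.0 if i % 2 == 0 else -5.0)
            for i, w in enumerate(w_ext)]


w4 = [50.0, 100.0, 150.0, 200.0]
w10 = [20.0 * k for k in range(1, 11)]
fit2 = ols_with_ci(w4[:2], toy_data(w4)[:2])
fit4 = ols_with_ci(w4, toy_data(w4))
fit10 = ols_with_ci(w10, toy_data(w10))

for name, fit in (("2 points", fit2), ("4 points", fit4), ("10 points", fit10)):
    print(f"{name}: R2 = {fit['r2']:.4f}, "
          f"E0,e = {fit['intercept']:.1f} +/- {fit['intercept_hw']:.1f} W")
```

In this toy data set, R2 exceeds 0.99 in every fit, yet the intercept interval of the four-point fit is several times wider than that of the ten-point fit, and the two-point fit has R2 = 1 with no usable interval at all.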
An obvious way to improve the accuracy of the Wext-Etot regression line would be to use more data points. For example, Medbø et al. [43] suggested using at least 10 observation points when estimating the Wext-Etot regression line. Another way to improve the estimate would be to include only aerobic intensities. For example, some efficiency studies have included 270 W loads for women [42] and 300 W loads for men [32] when calculating DE. However, without measuring blood lactate, the amount of anaerobic energy expenditure at these intensities cannot be estimated accurately. In addition, the slow component of VO2, which can be present already when the intensity exceeds 50% of VO2max [18, 19], may distort the linearity of the Wext-Etot relation. A final proposal for obtaining a more precise DE would be to monitor the cadence accurately.
It should be kept in mind that the accuracy of the Wext-Etot regression line matters beyond the determination of DE and E0,e, as the line is also used, e.g., to extrapolate theoretical energy (or oxygen) consumption at high work intensities [43, 44].
WEe vs. WEm
It has been widely recognized that E0,m is much greater than E0,e, the difference ranging from 20 to 350% [16, 17] and being 140% in the present study. This means that they cannot both accurately describe E0, which they are supposed to represent. Above, we have argued that the accuracy of E0,e is quite weak based on its CI. Another, often ignored, objection to E0,e is that it does not differ from Erest (p = 0.60, Fig. 2); half of the subjects in the present study had a smaller E0,e than Erest, which seems implausible. Similar values can be seen, e.g., in [27]. One explanation might be that E0,e does not actually represent the energy expenditure it has been thought to represent. For example, it has been observed in [15] that internal work is neither constant nor independent of external work. This can be interpreted to mean that, even if the Wext-Etot relation were linear, the energy expenditure at zero load is not found at the intersection with the y-axis, as we do not know how internal work is related to total energy expenditure at different loads. One could also try to argue that Erest may truly be higher than E0, as starting to exercise against zero load could in principle increase the workload of the heart and legs while reducing even more the workload of other parts of the body, e.g., the digestive system and internal organs, but this is highly speculative.
On the other hand, E0,m also has many problems. Firstly, there are the theoretical arguments, presented in the “Background” section, for why E0,e offers a better approximation of E0 than E0,m does. Moreover, in the literature, E0,m (and thus WEm) has been discarded because of its excessively high values [7, 8]. On theoretical grounds, the mechanical efficiency of an isolated muscle has been argued to be at most 30% [12, 13]. In the present study, WEm was 32.0 ± 2.9% (range 28.0–38.3%) and hence too high for the mechanical efficiency of an isolated musculoskeletal system in cycling work, where the use of elastic energy is minimal [45]. All in all, E0,e seems too small to be the true energy expenditure at zero load and E0,m too large; hence, both of them (and thus WEe and WEm) seem to contain unresolved methodological problems. More specifically, as reported above, WEe and DE are closely parallel measures of mechanical efficiency; therefore, if the problems related to WEe cannot be solved, doubt is also cast on DE, even though its theoretical basis would otherwise be firm enough.
Methodological Doubts on Baseline Subtractions
To conclude the discussion, we have seen that DE, WEe, and WEm all contain methodological problems that cast serious doubt on the sensibility of baseline subtractions. The previous doubts about NE, WE, and DE are essentially theoretical considerations based on the facts that energy expenditure cannot be divided into separate components and that the baseline subtractions are not invariant across work intensities [8, 12, 13, 25]. The doubts raised by the present study are based on methodological outcomes: essentially, that WEm is too large, that E0,e is too similar to Erest, and that the 95% CIs of DE and E0,e are too wide. Based on these findings, NE would be the only methodologically sound mechanical efficiency index with a baseline subtraction. However, NE and GE both belong to group I. Thus, one can argue that GE carries basically the same information as NE, but without the additional inconvenience and possible source of error of having to measure Erest. In this way, the present study suggests on methodological grounds that the need for NE is also questionable.
Limits of the Study
Although the outcome of our study is quite distinct, with three separate groups of mechanical efficiency indices, some weaknesses could affect this conclusion. Letting each participant choose their own natural cadence could have influenced the outcome, as cadence is known to affect the efficiency indices [8]. Although we acknowledged this, we chose not to impose the same cadence on everyone, as we were interested in individual differences and in dividing individuals into different classes based on their natural cycling patterns. We felt that imposing an unnatural cadence on the subjects could interfere with that aim. It should also be mentioned that we did not record cadence from one pedal revolution to the next, which means that there might be a small load-to-load variation in cadence for each participant. Such variation mostly affects the Wext-Etot regression line and, hence, the values of DE and WEe.
In this study, both male and female subjects were included, as our main interest was to compare different indices of mechanical efficiency in subjects with broad backgrounds. We acknowledge that there is a mild gender difference in GE, E0,m, and E0,e, but it can be explained mostly by the difference in lean leg volume [17]. As interindividual variation in GE and E0 can in general be explained mostly by body mass and especially by leg mass [11, 14, 26], we felt that including both genders did not unduly restrict our approach: by allowing female subjects to take part, we mainly extended our study to lighter body masses. It should be mentioned that the results are unaltered when analyzed with men only (data not shown).