The main result of the present study was that the six investigated indicators of mechanical efficiency formed three separate groups by rank correlation: the first group (group I) was formed by GE, NE, and *T*; the second (group II) by DE and WE_{e}; and the third (group III) by WE_{m}. An identical grouping would also be obtained with Somers’ D (not shown). Correlations within the groups were strong, whereas correlations between the groups were at most moderate. As the six indicators fall into different groups, they can be interpreted as measuring fundamentally different aspects of mechanical efficiency.

One speculative reason for the observed grouping might be that baseline subtraction is altogether an erroneous way to approach efficiency: in this paper, it is shown that *E*_{0,e} does not differ from *E*_{rest}, that the confidence interval for DE is too wide to be reliable, and that WE_{m} yields values that are too large for work efficiency. These findings raise the question of whether DE and WE measure what they are supposed to measure. Previously, criticism of NE, DE, and WE has been mostly theoretical [8, 12, 13, 25]; the present study is one of the rare methodological studies to address this question.

In theory, as shown in the “Methods” section, every indicator of mechanical efficiency would approach DE if the pedaled external work could increase without bound, as the internal energy expenditure *E*_{0} becomes negligible compared to the total energy expenditure *E*_{tot}. The efficiency indices nevertheless formed different groups because one cannot pedal at high enough intensities aerobically; theoretically, DE and GE do not come within 0.5 percentage points of each other until an intensity of 1350 W. This is in some contrast to the case study of a world-class champion in [40], where GE and DE were found to be within 0.1 percentage points of each other already at a power output of 300–400 W. Based on the estimates of the present study, such agreement between GE and DE at so low a power level is highly exceptional.
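The convergence argument can be illustrated numerically. The following is a minimal sketch, assuming the linear model *E*_{tot} = *E*_{0} + *W*_{ext}/DE with all quantities expressed in watts. It plugs in the group-level point estimates quoted in this discussion (DE = 22.6%, *E*_{0,e} = 6.9 kJ/min), so it is not expected to reproduce the 1350 W figure exactly, which may rest on different inputs. The function names are hypothetical.

```python
def gross_efficiency(w_ext, e0, de):
    """GE = W_ext / E_tot, assuming E_tot = E_0 + W_ext / DE (all in watts)."""
    return w_ext / (e0 + w_ext / de)


def power_for_gap(gap, e0, de):
    """External power at which DE - GE has shrunk to `gap` (a fraction).

    Under the assumed linear model, DE - GE = DE * E_0 / (E_0 + W_ext / DE);
    solving for W_ext gives the expression below.
    """
    return de * e0 * (de / gap - 1.0)


# Assumed inputs: point estimates quoted in the text, converted to watts.
E0_W = 6.9e3 / 60.0   # 6.9 kJ/min = 115 W
DE = 0.226            # delta efficiency as a fraction

for w in (150, 300, 600, 1200):
    ge = gross_efficiency(w, E0_W, DE)
    print(f"W_ext = {w:5d} W  GE = {100 * ge:.1f}%  DE - GE = {100 * (DE - ge):.1f} %-points")

w_half = power_for_gap(0.005, E0_W, DE)
print(f"DE - GE reaches 0.5 %-points at about {w_half:.0f} W")
```

Under these assumptions the gap closes only at power outputs far beyond sustainable aerobic cycling, which is the qualitative point made above.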

The measured efficiency indices of the present study seem to be in line with the literature. According to a review article [8], GE has mostly been around 18–20% at 150 W ([8], Fig. 2), and the mean ± SD for DE from 14 studies was 23.8 ± 2.6% ([8], Table 1). These values agree well with the present study, with GE = 20.0 ± 0.8% and DE = 23.8 ± 1.9%.

### Efficiency Groups

In general, group I can be interpreted as describing the mechanical efficiency of the whole body in cycling work. Understandably, GE and *T* belong to the same group, as the former is a refined version of the latter. The fact that NE belongs to this group indicates that there are no great differences or adaptations in resting energy expenditure between individuals. On the other hand, groups II and III in theory try to capture the efficiency of an isolated musculoskeletal system in cycling work by subtracting, in one form or another, the zero-load energy expenditure. Hence, it seems that *E*_{0} plays a role, and a bigger one than *E*_{rest}, in explaining why the efficiency indices fall into different groups. The importance of *E*_{0} is well in line with a previous study [41], in which it was argued that differences in zero-load cycling between individuals explain some of the observed interindividual variation in GE. Another possible reason for the differences between the groups lies in the difficulty and uncertainty of determining WE_{e} and DE from the *W*_{ext}-*E*_{tot} regression line.

DE and WE_{e} belong to the same group, as both are calculated from the same *W*_{ext}-*E*_{tot} regression line. Furthermore, at each observation point, WE_{e} can be seen as the inverse of the slope of the line through that observation point and *E*_{0,e}, so that WE_{e} can be interpreted, more or less, as a local delta efficiency. If all the observation points fell on the same straight line, WE_{e} and DE would coincide.
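The relation between WE_{e} and DE can be checked with a short sketch. It assumes that DE is the inverse of the slope of an ordinary least-squares line through the (*W*_{ext}, *E*_{tot}) points, that *E*_{0,e} is the intercept of that line, and that WE_{e} = *W*_{ext}/(*E*_{tot} − *E*_{0,e}) at each point. The data are hypothetical and the function name is illustrative only.

```python
import numpy as np


def delta_and_work_efficiency(w_ext, e_tot):
    """Return DE, the extrapolated intercept E_0,e, and per-point WE_e."""
    w = np.asarray(w_ext, dtype=float)
    e = np.asarray(e_tot, dtype=float)
    slope, e0_e = np.polyfit(w, e, 1)   # least-squares line E_tot = e0_e + slope * W_ext
    de = 1.0 / slope                    # delta efficiency
    we_e = w / (e - e0_e)               # inverse slope of the line from (0, e0_e) to each point
    return de, e0_e, we_e


# Hypothetical loads (W) lying exactly on the line E_tot = 115 + W_ext / 0.226.
W = np.array([50.0, 100.0, 150.0, 200.0])
E_LINE = 115.0 + W / 0.226
de_line, _, we_line = delta_and_work_efficiency(W, E_LINE)
print("collinear:", de_line, we_line)   # every WE_e equals DE up to rounding

# Moving one point off the line makes WE_e vary from point to point.
E_OFF = E_LINE.copy()
E_OFF[1] += 10.0
de_off, _, we_off = delta_and_work_efficiency(W, E_OFF)
print("perturbed:", de_off, we_off)
```

With collinear points every WE_{e} matches DE; once a single point deviates, the per-point values scatter around DE, which is why WE_{e} behaves like a local delta efficiency.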

Indirect support for this grouping can be found in the literature. Studies of correlations between physiological factors and indicators of mechanical efficiency have demonstrated that physiological factors affect indicators from different groups differently. For example, a measure from group I has been reported to be significantly affected, while a measure from group II has not, by skin temperature [32], VO_{2max} [10], and body weight [26]. In addition, group I has been reported to be affected, while group III has not, by training [41]. Thus, judging by correlations with physiological factors, the literature suggests groupings of mechanical efficiency indices that indirectly support ours.

### Accuracy of *W*_{ext}-*E*_{tot} Regression Line and DE

It has been observed that the repeatability of DE is significantly weaker than that of GE [10], but this phenomenon has eluded explanation. Here, we argue that it can be explained by the poor accuracy of the *W*_{ext}-*E*_{tot} regression line, which is caused mainly by using too few observation points, typically 3 [37, 38], 4 [39], or at most 6 [32, 42]. In the present study, we replicated the usual way of calculating efficiency indices, which is why only 3–5 points were included in our *W*_{ext}-*E*_{tot} regression line. As the value (95% CI) for DE was 22.6% (19.2–26.1%) and for *E*_{0,e} 6.9 kJ/min (−16.6 to 30.4 kJ/min), it is clear that more observation points would be required for a reliable *W*_{ext}-*E*_{tot} regression line and, hence, for reliable DE and *E*_{0,e} estimates. Notably, the coefficient of determination, *R*^{2}, is unable to reveal this problem, as the *R*^{2} value for the *W*_{ext}-*E*_{tot} regression line in our study was 0.996 ± 0.004. This means that *R*^{2} is far from a sufficient test of the accuracy of the *W*_{ext}-*E*_{tot} regression line when there are too few observation points; after all, *R*^{2} with two observation points is always 1.00, although such an estimate contains a huge potential error. Another factor, besides the number of observation points, affecting the reliability of the *W*_{ext}-*E*_{tot} regression line is cadence. Cadence affects energy expenditure, and by applying the linear regression from [8] we can estimate, purely theoretically, that with four observation points DE can change by as much as 1.1 percentage points when cadence deviates from 80 rpm by only ±1 rpm. As keeping cadence within ±1 rpm of a target cadence during the test is very challenging, it becomes clear that DE measurements have considerable built-in potential for imprecision. In contrast, keeping cadence at 80 ± 1 rpm theoretically affects GE by only 0.1 percentage points.
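The gap between a high *R*^{2} and a wide confidence interval can be illustrated with a small sketch. It assumes ordinary least squares with the textbook standard errors for slope and intercept and hard-coded Student's *t* critical values; the four data points are hypothetical and are not taken from the present study, and the function name is illustrative only.

```python
import math
import numpy as np

# Two-sided 95% critical values of Student's t by degrees of freedom
# (standard table values, hard-coded to keep the sketch dependency-free).
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776,
             5: 2.571, 6: 2.447, 7: 2.365, 8: 2.306}


def fit_with_ci(w_ext, e_tot):
    """OLS fit of E_tot on W_ext: R^2 plus 95% CIs for slope and intercept."""
    w = np.asarray(w_ext, dtype=float)
    e = np.asarray(e_tot, dtype=float)
    n = w.size
    dof = n - 2                                   # two fitted parameters
    if dof not in T_CRIT_95:
        raise ValueError("this sketch supports 3 to 10 observation points")
    slope, intercept = np.polyfit(w, e, 1)
    resid = e - (intercept + slope * w)
    ss_res = float(np.sum(resid ** 2))
    ss_tot = float(np.sum((e - e.mean()) ** 2))
    sxx = float(np.sum((w - w.mean()) ** 2))
    s2 = ss_res / dof                             # residual variance
    t = T_CRIT_95[dof]
    half_slope = t * math.sqrt(s2 / sxx)
    half_icpt = t * math.sqrt(s2 * (1.0 / n + w.mean() ** 2 / sxx))
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "slope_ci": (slope - half_slope, slope + half_slope),
        "intercept_ci": (intercept - half_icpt, intercept + half_icpt),
    }


# Hypothetical four-point test: line E = 115 + W / 0.226 (watts) plus small deviations.
W_DEMO = np.array([50.0, 100.0, 150.0, 200.0])
E_DEMO = 115.0 + W_DEMO / 0.226 + np.array([8.0, -10.0, -6.0, 8.0])

result = fit_with_ci(W_DEMO, E_DEMO)
slope_lo, slope_hi = result["slope_ci"]
de_ci = (1.0 / slope_hi, 1.0 / slope_lo)          # DE = 1/slope; valid here since slope_lo > 0
print(f"R^2 = {result['r2']:.4f}")
print(f"DE 95% CI: {100 * de_ci[0]:.1f}% to {100 * de_ci[1]:.1f}%")
print(f"E_0,e 95% CI: {result['intercept_ci'][0]:.0f} to {result['intercept_ci'][1]:.0f} W")
```

With these hypothetical inputs, *R*^{2} stays above 0.99 while the interval for DE spans several percentage points, because with only two residual degrees of freedom the *t* multiplier is large.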

A clear proposal for improving the accuracy of the *W*_{ext}-*E*_{tot} regression line is to use more data points. For example, Medbø et al. [43] suggested using at least 10 observation points when estimating the *W*_{ext}-*E*_{tot} regression line. Another way to improve the estimation would be to include only aerobic intensities. For example, some efficiency studies have included 270 W loads for women [42] and 300 W loads for men [32] when calculating DE. However, without measuring blood lactate, the amount of anaerobic energy expenditure cannot be accurately estimated for these intensities. In addition, the slow component of VO_{2}, which can be present already when the intensity exceeds 50% of VO_{2max} [18, 19], may skew the linearity of the *W*_{ext}-*E*_{tot} relation. A final proposal for obtaining a more precise DE would be to monitor cadence accurately.

It should be kept in mind that the accuracy of the *W*_{ext}-*E*_{tot} regression line matters beyond determining DE and *E*_{0,e}, as the line is also used, e.g., to extrapolate theoretical energy (or oxygen) consumption at high work intensities [43, 44].

### WE_{e} vs. WE_{m}

It has been widely recognized that *E*_{0,m} is much greater than *E*_{0,e}, the difference ranging from 20 to 350% [16, 17] and being 140% in the present study. This means that they cannot both accurately describe *E*_{0}, which they are supposed to represent. Above, we have argued, based on the CI, that the accuracy of *E*_{0,e} is quite weak. Another, often ignored, objection to *E*_{0,e} is that it does not differ from *E*_{rest} (*p* = 0.60, Fig. 2), with half of the subjects in the present study having a smaller *E*_{0,e} than *E*_{rest}, which seems implausible. Similar values can be seen, e.g., in [27]. One explanation might be that *E*_{0,e} does not actually represent the energy expenditure it has been thought to represent. For example, it has been observed in [15] that internal work is neither constant nor independent of external work. This can be interpreted to mean that, even if the *W*_{ext}-*E*_{tot} relation were linear, the energy expenditure of zero load is not found at the intersection with the *y*-axis, as we do not know how the internal work is related to the total energy expenditure at different loads. One could also try to explain how *E*_{rest} might truly be higher than *E*_{0}: starting an exercise against zero load could in principle increase the workload of the heart and legs but at the same time reduce even more the workload of other parts of the body, e.g., the digestive system and internal organs. This, however, is highly speculative.

On the other hand, *E*_{0,m} also has many problems. Firstly, there are all the theoretical arguments, presented in the “Background” section, for why *E*_{0,e} offers a better approximation of *E*_{0} than *E*_{0,m}. Moreover, in the literature, *E*_{0,m} (and thus WE_{m}) has been discarded because of its too high values [7, 8]. Theoretical considerations suggest that an isolated muscle has a mechanical efficiency of at most 30% [12, 13]. In the present study, WE_{m} was 32.0 ± 2.9% (range 28.0–38.3%) and hence too high for the mechanical efficiency of an isolated musculoskeletal system in cycling work, where the use of elastic energy is minimal [45]. All in all, *E*_{0,e} seems too small to be the true energy expenditure at zero load and *E*_{0,m} too large; hence, both of them (and thus WE_{e} and WE_{m}) seem to contain unresolved methodological problems. More specifically, as reported above, WE_{e} and DE are closely parallel measures of mechanical efficiency; if the problems related to WE_{e} cannot be solved, this also casts doubt on DE, even though its theoretical basis would otherwise be firm enough.

### Methodological Doubts on Baseline Subtractions

To conclude the discussion, we have seen that DE, WE_{e}, and WE_{m} all contain methodological problems that cast serious doubt on the sensibility of baseline subtractions. The previous objections to NE, WE, and DE are essentially theoretical considerations based on the facts that energy expenditure cannot be divided into separate components and that the baseline subtractions are not invariant across work intensities [8, 12, 13, 25]. The doubts raised by the present study are based on methodological outcomes: essentially, that WE_{m} is too large, that *E*_{0,e} is too similar to *E*_{rest}, and that the 95% CIs of DE and *E*_{0,e} are too wide. Based on these findings, NE would be the only methodologically sound mechanical efficiency index with a baseline subtraction. However, both NE and GE belong to group I. Thus, one can argue that GE carries basically the same information as NE, but without the additional inconvenience and possible source of error of having to measure *E*_{rest}. In this way, the present study suggests on methodological grounds that the need for NE is also questionable.

### Limits of the Study

Although the outcome of our study is quite distinct, with three separate groups of mechanical efficiency indices, some weaknesses could affect this conclusion. Letting each participant choose their own natural cadence could have influenced the outcome, as cadence is known to affect the efficiency indices [8]. Although we acknowledged this, we chose not to impose the same cadence on everyone, as we were interested in individual differences and in dividing individuals into different classes based on their natural cycling patterns. We felt that imposing an unnatural cadence on subjects could interfere with that aim. It should also be mentioned that we did not record cadence from one pedal revolution to the next, which means there might be a small load-to-load variation in cadence for each participant. This deviation mostly affects the *W*_{ext}-*E*_{tot} regression line and, hence, the values of DE and WE_{e}.

In this study, both male and female subjects were included, as our main interest was to compare different indices of mechanical efficiency in subjects with broad backgrounds. We acknowledge that there is a mild gender difference in GE, *E*_{0,m}, and *E*_{0,e}, but these differences can be explained mostly by the difference in lean leg volume [17]. As interindividual variation in GE and *E*_{0} can in general be explained mostly by body mass and especially by leg mass [11, 14, 26], we felt that including both genders did not overly restrict our approach: by allowing female subjects to take part, we mainly expanded our study to include lighter body masses. It should be mentioned that the results are unaltered when analyzed with men only (data not shown).