Table 2 Final list of items ranked by level; corresponding definitions are also included

From: Consensus on measurement properties and feasibility of performance tests for the exercise and sport sciences: a Delphi study

 

Item: Definition

Level 1

Re-test reliability: The consistency of performer(s) results over repeated rounds of testing, typically conducted over a period of days or weeks. This reflects the change in a participant’s results between repeated tests due to both systematic and random error, rather than true changes in performance [27, 36, 46]

Intra-rater: The agreement (consistency) among two or more trials administered or scored by the same rater [4, 47]

Inter-rater: The level of agreement (consistency) between assessments of the same performance when undertaken by two or more raters [4, 46, 47]

Content validity: How well a specific test measures that which it intends to measure [4, 27]

Discriminant validity: The extent to which results from a test relate to results on another test that measures a different construct (i.e., the ability to discriminate between dissimilar constructs) [42, 48, 49]

Responsiveness/sensitivity to change: The ability of a test to detect worthwhile and ‘real’ improvements over time (e.g., between an initial bout of testing and subsequent rounds) [42, 50–54]

MID/SWC: The smallest change or difference in a test result that is considered practically meaningful or important [55–58]

Interpretability: The degree to which practical meaning can be assigned to a test result or a change in result [25, 28]

Familiarity required: The need to undertake a test familiarisation session with all participants prior to the main testing in order to reduce or eliminate learning or reactivity effects [4]

Duration: Expected and/or actual duration of the testing protocol [59, 60]

Level 2

Stability: The consistency of performer(s) results over repeated rounds of testing conducted over a period of months or years [40, 42, 61, 62]

Internal consistency: The degree of inter-relatedness among test components that intend to measure the same construct/characteristic [28]

Convergent validity: The extent to which results from tests that theoretically should be related to each other are, in fact, related to each other [42, 49]

Concurrent validity: The extent to which a test relates to an alternative, previously validated measure of the same construct administered at the same time [42, 63]

Predictive validity: The extent to which a test relates to a previously validated measure of a theoretically similar construct, administered at a future point in time [42, 63]

Floor and ceiling effects: The ability of a test to distinguish between individuals at the lower and upper extremes of performance (i.e., the ability to distinguish between high results (ceiling effect) and low results (floor effect)) [28, 64]

Scoring complexity: The ease with which a test can be conducted and scored in a practical setting by the test administrator [65, 66]

Completion complexity: The ease with which a test can be completed by a participant [65–67]

Cost: The total amount of resources required for test administration, including equipment, time, and administrator expertise/experience [25]

  1. Reference support for each definition has also been provided
  2. MID = minimum important difference; SWC = smallest worthwhile change
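Two of the items above, re-test reliability and the MID/SWC, are routinely quantified from simple test–retest data. As a minimal sketch, assuming hypothetical jump-height scores and two common conventions not stated in the table itself (the typical error as a re-test reliability statistic, and 0.2 × the between-subject standard deviation as the smallest worthwhile change):

```python
import math
import statistics as stats

# Hypothetical test-retest results (e.g., jump height in cm) for six
# athletes measured on two occasions about a week apart.
trial1 = [38.2, 41.5, 35.9, 44.1, 39.8, 42.6]
trial2 = [39.0, 40.8, 36.5, 44.9, 39.1, 43.2]

# Smallest worthwhile change: one common convention is
# 0.2 x the between-subject SD of the baseline trial.
swc = 0.2 * stats.stdev(trial1)

# Typical error (re-test reliability): SD of the individual
# difference scores divided by sqrt(2).
diffs = [b - a for a, b in zip(trial1, trial2)]
typical_error = stats.stdev(diffs) / math.sqrt(2)

print(f"SWC ~ {swc:.2f} cm, typical error ~ {typical_error:.2f} cm")
```

In this illustration the typical error is smaller than the smallest worthwhile change, which is the condition under which a test can plausibly detect a practically meaningful change in an individual.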