| Level | Item | Definition |
|---|---|---|
Level 1 | Re-test reliability | The consistency of performer(s) results over repeated rounds of testing, typically conducted over a period of days or weeks. This represents the change in a participant’s results between repeated tests due to both systematic and random error, rather than true changes in performance [27, 36, 46] |
 | Intra-rater | The agreement (consistency) among two or more trials administered or scored by the same rater [4, 47] |
 | Inter-rater | The level of agreement (consistency) between assessments of the same performance when undertaken by two or more raters [4, 46, 47] |
 | Content validity | How well a specific test measures that which it intends to measure [4, 27] |
 | Discriminant validity | The extent to which results from a test relate to results on another test which measures a different construct (i.e., the ability to discriminate between dissimilar constructs) [42, 48, 49] |
 | Responsiveness/sensitivity to change | The ability of a test to detect worthwhile and ‘real’ improvements over time (e.g., between an initial bout of testing and subsequent rounds) [42, 50–54] |
 | MID/SWC (minimal important difference/smallest worthwhile change) | The smallest change or difference in a test result that is considered practically meaningful or important [55–58] |
 | Interpretability | The degree to which practical meaning can be assigned to a test result or change in result [25, 28] |
 | Familiarity required | The need to undertake a test familiarisation session with all participants prior to main testing to reduce or eliminate learning or reactivity effects [4] |
 | Duration | Expected and/or actual duration of the testing protocol [59, 60] |
Level 2 | Stability | The consistency of performer(s) results over repeated rounds of testing conducted over a period of months or years [40, 42, 61, 62] |
 | Internal consistency | The degree of inter-relatedness among test components that intend to measure the same construct/characteristic [28] |
 | Convergent validity | The extent to which results from tests that theoretically should be related to each other are, in fact, related to each other [42, 49] |
 | Concurrent validity | The extent to which the test relates to an alternate, previously validated measure of the same construct administered at the same time [42, 63] |
 | Predictive validity | The extent to which the test relates to a previously validated measure of a theoretically similar construct, administered at a future point in time [42, 63] |
 | Floor and ceiling effects | The ability of a test to distinguish between individuals at the lower and upper extremes of performance (i.e., ability to distinguish between high results (ceiling effect) and low results (floor effect)) [28, 64] |
 | Scoring complexity | The ease with which a test can be conducted and scored in a practical setting by the test administrator [65, 66] |
 | Completion complexity | The ease with which a test can be completed by a participant [65–67] |
 | Cost | The total amount of resources required for test administration including equipment, time, and administrator expertise/experience [25] |
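Two of the quantities above (re-test reliability and the SWC) are commonly estimated from repeated test data. The following is a minimal sketch using hypothetical jump-height data and one widely used convention from the sports-science literature: the typical error of measurement is the standard deviation of the difference scores divided by √2, and the SWC is taken as 0.2 × the between-subject standard deviation at baseline. The data values and variable names are illustrative only, not drawn from the source.

```python
import math
import statistics

# Hypothetical test-retest data (illustrative only): countermovement-jump
# heights (cm) for eight participants across two testing days.
trial_1 = [38.2, 41.5, 35.9, 44.0, 39.7, 36.8, 42.3, 40.1]
trial_2 = [39.0, 41.1, 36.5, 43.2, 40.4, 37.5, 41.8, 40.6]

# Typical error of measurement: SD of the difference scores / sqrt(2).
# This captures combined systematic and random error between repeated tests.
diffs = [b - a for a, b in zip(trial_1, trial_2)]
typical_error = statistics.stdev(diffs) / math.sqrt(2)

# Smallest worthwhile change: by one common convention, 0.2 x the
# between-subject SD of baseline scores (a small standardised effect).
swc = 0.2 * statistics.stdev(trial_1)

print(f"Typical error: {typical_error:.2f} cm")
print(f"SWC: {swc:.2f} cm")
```

An observed change larger than the SWC, judged against the typical error, is one way practitioners decide whether an improvement is 'real' and practically meaningful rather than measurement noise.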