Randomized Response Technique
Because the doping issue is sensitive and admitting to use can be compromising, the survey used the randomized response technique (RRT) for questions on doping and use of medication. The rationale for this was twofold. First, the RRT has been shown to generate more reliable responses than direct questioning [2, 12,13,14,15]. Second, using RRT ensures comparability of the results with other RRT-based doping surveys in recreational sports [16,17,18] and in elite sports [2, 19,20,21]. The primary reason why RRT generates more reliable results is that the method reduces social desirability bias, i.e., the tendency of survey respondents to answer questions in a manner that will be viewed favorably by others [22,23,24,25]. In the present study, this bias would result in underreporting of “bad” or undesirable behavior, such as the use of prohibited medicine for performance enhancement. RRT does not remove social desirability bias altogether, but if respondents follow the instructions provided, the method fully protects against unwanted exposure of potentially undesirable behavior, thereby reducing respondents’ inclination to be influenced by social desirability.

The method works through an additional instruction given to respondents when answering a sensitive question. In our case, the respondent was first instructed to select one of five randomly generated 5-digit numbers. The numbers were generated so that the digits 0 to 9 appeared with equal probability in all five positions. This was to prevent sequence effects, which could otherwise have led to biased estimates (see below). The respondents were instructed to use the same 5-digit number throughout the survey. Second, the respondent was given the option either to write down the number (or copy-paste it to their device) or to have the questionnaire software save the list of numbers for them. This was to eliminate suspicions that the researchers could trace the respondent’s choice of random number. Third, the following instruction was given: “If the last digit of your random number is 1 or 2, please answer the question to the right. If the last digit of your random number is 3, 4 or 5, please answer the question to the left. Otherwise, please answer the question in the middle.” The question to the right was: “Does a week have 7 days?” The question to the left was: “Does a week have 9 days?” And the question in the middle was the sensitive question, for instance: “In 2019, did you knowingly use prohibited substances or methods to enhance your sporting performance?” Thus, depending on the last digit of the random number, respondents answer either the sensitive question or a corresponding harmless question. A respondent who complies with the instructions will always answer “yes” to the question on the right and always “no” to the question on the left.
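For illustration, the following short R sketch simulates the forced-response mechanism described above. It is not the study’s survey software; the true prevalence, sample size, and seed are arbitrary assumptions chosen for the simulation, and the branching probabilities follow the example instruction (last digit 1–2: forced “yes”, p = 0.2; last digit 3–5: forced “no”, p = 0.3; otherwise the sensitive question, p = 0.5).

set.seed(42)
n          <- 10000
true_prev  <- 0.10                              # hypothetical prevalence of doping
last_digit <- sample(0:9, n, replace = TRUE)    # last digit of the random 5-digit number
is_user    <- runif(n) < true_prev              # hypothetical true status of each respondent

answer <- ifelse(last_digit %in% 1:2, "yes",    # "Does a week have 7 days?" -> always "yes"
          ifelse(last_digit %in% 3:5, "no",     # "Does a week have 9 days?" -> always "no"
                 ifelse(is_user, "yes", "no"))) # sensitive question, answered honestly

mean(answer == "yes")   # observed yes rate, approx. 0.2 + 0.5 * true_prev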
The process is illustrated in Fig. 2. A respondent who understands the instructions will also realize that, if the instructions are followed, a certain proportion of respondents will always reply “yes,” so an honest “yes” will not attract attention. Since the researchers do not know the random number generated for the respondent, they cannot make any inference from a “yes” answer about the respondent’s actual behavior. A “yes” may be the result of the respondent answering the question to the right (“Does a week have 7 days?”) or the sensitive question in the middle. The researchers will never know.
However, because the researchers know the distribution from which the random numbers are generated, they can derive the probability that a respondent is directed to the sensitive question. From this, together with the observed proportion of “yes” answers, the proportion of people in the population exhibiting the sensitive characteristic (here, respondents who intentionally used prohibited substances) can be estimated.
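To make the logic concrete: with the example instruction above, the expected share of “yes” answers is p_y + p_s × π, where p_y = 0.2 is the forced-“yes” probability, p_s = 0.5 is the probability of being directed to the sensitive question, and π is the true prevalence. Solving for π gives the basic moment estimator sketched below (illustrative only; the study’s actual estimation procedure is described in the following paragraphs).

# Moment estimator implied by the single-sample forced-response design;
# the default probabilities follow the example instruction above.
estimate_prevalence <- function(yes_rate, p_forced_yes = 0.2, p_sensitive = 0.5) {
  (yes_rate - p_forced_yes) / p_sensitive
}
estimate_prevalence(mean(answer == "yes"))  # approx. recovers true_prev from the simulation above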
Despite the instructions, some respondents still do not comply with the procedure [12, 26,27,28]. They are “Instruction-Non-Compliant” (INC). They may be deliberately INC, they may not understand the instructions, or they may simply make errors. Regardless of the reason, the occurrence of INC responses reduces the accuracy of the estimate. To control for such biases, the “INC detection model” has been developed [29, 30]. INC detection assumes that the population shares to be estimated are independent of the probability with which respondents are directed to the harmless questions or the sensitive question. To detect INC, the sample is randomly split into two (normally equally sized) subsamples with different probabilities (Fig. 3). With these two subsamples, researchers can estimate three population proportions, namely (1) the rate of honest-yes responders, (2) the rate of honest-no responders, and (3) the rate of INC responders. In this study, the probabilities for forced yes and forced no answers were p1y = 0.1 and p1n = 0.2 in the first subsample, and p2y = 0.3 and p2n = 0.2 in the second. These probabilities were selected to maximize the share of honest responders and thereby the statistical efficiency of the estimators for honest-yes and honest-no responses; the cost is an increased variance of the INC estimator [30].
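The following sketch illustrates the two-subsample logic with a simple method-of-moments solution. This is a didactic stand-in, not the maximum likelihood procedure of Feth et al. [30] actually used in the study, and it assumes, as in the no-INC model discussed below, that INC responders always answer “no”.

# Expected yes rate in subsample i: lambda_i = (p_is + p_iy) * honest_yes + p_iy * honest_no,
# where p_is = 1 - p_iy - p_in is the probability of being directed to the sensitive question.
solve_shares <- function(lambda1, lambda2,
                         p1y = 0.1, p1n = 0.2, p2y = 0.3, p2n = 0.2) {
  p1s <- 1 - p1y - p1n   # sensitive-question probability, subsample 1
  p2s <- 1 - p2y - p2n   # sensitive-question probability, subsample 2
  A <- matrix(c(p1s + p1y, p1y,
                p2s + p2y, p2y), nrow = 2, byrow = TRUE)
  shares <- solve(A, c(lambda1, lambda2))
  c(honest_yes = shares[1], honest_no = shares[2],
    INC = 1 - sum(shares))   # may fall outside [0, 1]; see the boundary handling below
}
solve_shares(0.165, 0.335)   # example input: recovers shares 0.10, 0.85, 0.05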
The best estimates for the three population shares are obtained by maximum likelihood estimation. There are, however, cases in which the unconstrained estimate violates the boundary conditions that none of the shares can be above 100% or below zero. For these boundary cases, the remaining estimators and the likelihood of the data are recalculated under the condition that one or two of the estimated parameters equal zero, and the solution with the highest likelihood is selected. The procedure is explained in detail by Feth et al. [30] (for the R code used in the present study, refer to Additional file 3). Because of this truncation of the estimators, the distribution of estimated yes shares is typically heavily skewed, as negative solutions from the maximum likelihood estimation are replaced with 0. For the INC and honest-no shares, the replacement of “illegal” solutions with 0 also produces skewed distributions, with an additional effect: since “no” answers can only be honest no or INC, the “no” answers are in these cases fully assigned to the parameter that has not been restricted to 0 (these effects can be seen in the distributions of the parameter estimates in Additional file 4). Therefore, we used nonparametric bootstrapping to estimate confidence intervals and for hypothesis testing [31].
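As an illustration of the bootstrap step, the sketch below resamples the two subsamples, re-estimates the shares, and takes percentile confidence intervals. The response vectors are placeholders, and the crude truncation to legal shares stands in for the constrained re-estimation described above.

# Hypothetical yes/no response vectors for the two subsamples (placeholders only):
answers1 <- runif(1500) < 0.165   # TRUE = "yes", subsample 1
answers2 <- runif(1500) < 0.335   # TRUE = "yes", subsample 2

boot_estimates <- replicate(2000, {
  l1 <- mean(sample(answers1, replace = TRUE))   # resampled yes rate, subsample 1
  l2 <- mean(sample(answers2, replace = TRUE))   # resampled yes rate, subsample 2
  est <- solve_shares(l1, l2)
  pmax(est, 0) / sum(pmax(est, 0))               # crude truncation to legal shares
})
quantile(boot_estimates["honest_yes", ], c(0.025, 0.975))  # percentile 95% CI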
The RRT method was chosen after discussing its advantages and disadvantages compared with direct questioning. The reduction in social desirability bias when using RRT has been substantiated in several studies since the method’s introduction in 1965 [for an overview, see 13, 32]. Yet, some scholars have been skeptical about the method, pointing out that the intended effect of the RRT depends on (1) the sensitivity of the question under study [33], (2) the respondents’ level of education, needed to understand the instructions [12, 15], and (3) the respondents’ trust in the protection of their anonymity through the RRT [34].
Evidently, indirect questioning methods add a cognitive load for respondents, which might lead to mistakes when answering [15, 33, 35]. Such mistakes (e.g., misunderstanding the randomization instruction) would produce random false answers, both yes and no. For no answers, this was accounted for by the no-INC-detection method, which measures the share of non-compliant no answers whether they are deliberate or due to mistakes. For false yes answers, such detection was not possible because of the sample size.
After carefully weighing these advantages and disadvantages, we decided to use RRT for questions on doping, image enhancement, and use of medication, as we assessed these issues to be sensitive for recreational athletes. Additionally, other studies on doping and medicine use in recreational and elite sport have used similar methods, which facilitates comparison. Finally, indirect questioning was recommended by the WADA working group on doping prevalence [2, 9, 16,17,18, 21, 36].
Survey Questions and Dissemination
The original idea was to measure the point prevalence of doping in recreational sports in Europe in the autumn of 2020. However, as most sports were shut down during the COVID-19 pandemic, we could not ask respondents about their current behavior. After postponing for some months, with still no sign of a forthcoming general European reopening of sports and societies, we decided to run the survey in the spring of 2021 and ask respondents about their behavior in 2019. Obviously, this entails a risk of recall bias, but given the time limitations of the study period, we had to accept this.
Language and Translation
To represent northern, central, and southern Europe in the sample, eight European countries were included in the survey: Norway, Denmark, the UK, Germany, Spain, Italy, Greece, and Cyprus. To assist with language issues and troubleshooting, an academic contact person was assigned to each country. The authors covered Denmark, Germany, and Italy, and four European colleagues were invited to cover the remaining five countries (one covering both Greece and Cyprus).
We worked from an English-language template in which questions, formulations, and single words were discussed multiple times to find the best possible phrasing. The survey was then translated from English into six other languages (Danish, Norwegian, German, Italian, Spanish, and Greek, the latter for both Greece and Cyprus). After translation, the academic partners checked the survey for comprehensibility, compared their language version with the English template, and ran small pilots with peers and students.
The questionnaire in the different languages is available at https://fp.socioeconomy.eu/index.php.
Survey Dissemination
The survey was disseminated to recreational athletes aged 15 years and older, primarily via snowball sampling on social media platforms. We engaged student assistants to disseminate the survey in each country (the Greek student covering both Greece and Cyprus). Each student assistant was in direct contact with their academic partner. The student assistants established a network in which they could share dissemination tactics, experiences, problems, and concerns. Still, the students’ success, in terms of how many survey responses they generated, varied greatly (see results below). During the active 12-week survey period, weekly meetings were held with the academic partners to update each other on the progress of the survey.
Data Quality Control
After data collection, the researchers screened the dataset for untrustworthy records, which were then deleted. Examples include records in which the respondent reported being born in 1929 but still attending school, or being born in 2004 but holding a doctorate as the highest level of education. Additionally, the time taken to answer the RRT questions was checked. If a respondent spent less than 15 s on the first RRT question, the response was deemed untrustworthy. Likewise, data were deleted if the respondent spent two seconds or less on any of the subsequent RRT questions (see Ref. [37] for further information on data quality control).
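A sketch of such a response-time screening step in R is shown below. The data frame and column names (raw_data, rrt_time_q1, rrt_time_q2, …) are hypothetical; the actual screening rules are documented in Ref. [37].

library(dplyr)
# Keep only records meeting the response-time thresholds described above;
# implausible biographical combinations would be screened analogously.
clean <- raw_data %>%
  filter(rrt_time_q1 >= 15,                            # first RRT question: at least 15 s
         if_all(starts_with("rrt_time_q"), ~ .x > 2))  # subsequent RRT questions: more than 2 s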
Weighting Procedures
To calculate the results, we applied weighted statistics. Weights for individuals were calculated to correct for the skewed distribution of records per country, gender, and age in our dataset. Weights were calculated separately for each question to account for different levels of question or item nonresponse.
For the data included, weights were selected to approximate the overall population of recreational athletes in the eight participating countries, as estimated from Eurostat population data, the most recent Eurobarometer survey on sport and physical activity, and, for Norway, the Norwegian national statistical bureau [38,39,40] (for details of the weighting procedure, see Ref. [37]).
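For illustration, a minimal post-stratification sketch in R is given below. It assumes a hypothetical data frame pop containing the target population share (share_pop) per country × gender × age-group cell, derived from the sources above; all names are illustrative, and the actual weighting procedure is detailed in Ref. [37].

library(dplyr)
# Cell weight = target population share / realized sample share.
weights <- clean %>%
  count(country, gender, age_group, name = "n_sample") %>%
  mutate(share_sample = n_sample / sum(n_sample)) %>%
  left_join(pop, by = c("country", "gender", "age_group")) %>%
  mutate(weight = share_pop / share_sample)

clean <- left_join(clean, weights, by = c("country", "gender", "age_group"))
weighted.mean(clean$answered_yes, clean$weight, na.rm = TRUE)  # weighted yes rate for one item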