A brief history
While editing the book Clean Language Interviewing,[1] I decided to document the origin and development of the ‘cleanness rating’. Below is the story so far (dates are publication dates):

July 2010.[2] I produced an Expert Analysis Report for the Work-Life Balance (WLB) project. It included the statement:

These functions [of Clean language] allied with the requirement for minimal introduction of the questioner’s metaphors enables each of the interviewer’s questions/statements to be assessed for their ‘cleanness’.

Cleanness is a continuum. It would be possible to devise a scale (of say 1-5) that assessed the cleanness of each of the interviewers questions or statements and thereby produce a quantitative analysis of the overall cleanness of an interview. I have not had the time to do this. Instead I have read through the transcripts paying attention to questions that deviate from the standard clean questions. Having done this for many kind of analyses over the last 15 years I am highly attuned to ‘unclean’ questions.

For this research to meet its objectives it was vital that the interviews were conducted using ‘pure’ Clean Language. While the format of the most commonly asked clean questions was specified by David Grove, in a real-life interview sometimes the questioner has to modify, adjust or create a clean question depending on the interviewee’s wording and logic. …

My overall qualitative assessment is that the interviews were authentic examples of Clean Language. While there were a few questions which were not spotless, they were rare.

October 2010.[3] The report More than a Balancing Act?: ‘Clean Language’ as an innovative method for exploring work-life balance was jointly published by the University of Surrey and the Clean Change Company. Its authors included Margaret Meyer, Rupert Meese, Wendy Sullivan, Paul Tosey and myself. This report incorporated my expert analysis.

May 2011.[4] Paul Tosey presented a paper, ‘“Symbolic Modelling” as an innovative phenomenological method in HRD research: the work-life balance project’, at the 12th International HRD Conference.

July 2011.[5] Almost exactly one year after I wrote the first Expert Analysis Report, I added ‘Appendix B: Analysis of Questions asked during Six WLB Interviews’. It analysed the 242 questions asked during the face-to-face interviews and concluded:

  • Fewer than 1% of questions were content-leading.
  • 99% of questions met the criteria of ‘clean’. Of these:
    • 88% were ‘standard’ clean questions
    • 4% were predetermined WLB research questions
    • 7% were non-standard but still clean questions.

The cleanness rating categories are clearly foreshadowed here; however, at this stage they had not been named. That said, Wendy Sullivan has noted:[6]

Actually ‘Contextually Clean’ was born in 2009, a year before the WLB project, when Di Tunney invited me to observe her + a fellow researcher interviewing for a market research project on sweetcorn – the project we are writing up for the CLI book, as it happens – and asked me to work out how they could clean up the questions they were asking. It very quickly became clear that there were some not-Clean questions that I couldn’t clean up and that did need to be asked, given the research aims. Broadly speaking the non-clean words used in those questions related to concepts that had a close relationship to that context and so would be expected, and we came up with the label of ‘contextually Clean’ for those questions.

September 2021.[7] Penny Tompkins and I devised activities in which participants on Module 8 of Wendy Sullivan’s Clean Change Company training in Symbolic Modelling assessed the cleanness of two interviews: one conducted by an experienced Clean Language interviewer and the other a publicly available interview. Participants were instructed to:

a. Read both interviews.

b. Evaluate each interview for its clean-ness on a scale of 0-10 where, given the context, 0 is highly leading and 10 is squeaky clean.

c. Mark your two scores on the flip-chart.

d. In threes:

– Identify questions that illustrate your criteria for assessing clean-ness of an interview, given the context.

– Agree what you would tell someone who had to assess an interview for its ‘cleanness’, e.g. What would you tell them to assess for? What criteria should they use?

January 2014.[8] The WLB research was formally published by Paul Tosey, Rupert Meese and myself in the British Journal of Management as ‘Eliciting Metaphor through Clean Language: an Innovation in Qualitative Research’:

For the purposes of this study it was important to establish that the interviews were authentic examples of the Clean Language questioning technique. In the judgement of the expert analyst, the face-to-face interviews constituted an authentic application both at a micro level (through appropriate use of Clean Language questions) and as a process of modelling. Analysis of the six initial interviews revealed that 242 questions were asked in total, ranging from 31 to 53 questions per interview, with an average of 40. Of these, 99% met the criterion of being ‘clean’. The 11 basic Clean Language questions given in Table 1 accounted for 85% of all questions during the interview proper. Indeed the interviewer was considered to have set a benchmark that any future research using this method could seek to emulate. (p. 639)

August 2014.[9] As part of a research project undertaken with Susie Linder-Pelz, I produced a Protocol for Validating ‘Cleanness’ of an Interview. Unfortunately I can locate only v2, a slightly edited version produced a week later, which defined the four categories of ‘The Cleanness Rating’ as Classically Clean, Contextually Clean, Mildly Leading and Strongly Leading.

September 2014.[10] Susie and I published v3 of the Protocol for Validating ‘Cleanness’ of an Interview. It defined how to allocate each of the interviewer’s contributions to one of six ‘cleanness rating’ categories:

  • Classically clean question
  • Clean statement
  • Contextually clean question
  • Clean research question
  • Mildly/Potentially leading
  • Strongly leading

October 2014.[11] As part of this research project I organised a cleanness rating day which, to my knowledge, was the first time multiple raters (nine) assessed interview transcripts. It used v3 of the Protocol for Validating ‘Cleanness’ of an Interview.

September 2015.[12] Susie and I made passing reference to the above protocol when our research was published in the International Coaching Psychology Review. As far as I can tell, this was the first publication to name the categories of the cleanness rating:
To check the ‘cleanness’ of the interviews, we invited nine experienced Clean Language practitioners and researchers, working in teams, to give each of the interviewer’s questions and statements a ‘clean-ness rating’ (classically clean, contextually clean, mildly leading or strongly leading). The tabulated results were used to arrive at an overall assessment for each interview. The reviewers found that, on average, the interviewer contributed 50 questions or statements in each interview, 40 of which were classically or contextually clean, eight mildly leading and two strongly leading. While this was not quite as ‘clean’ as other Clean Language-based research (Tosey et. al., 2014), the reviewers concluded that the interviews substantially adhered to the CLI protocol and were, therefore, fit for the purpose of this research.

1 June 2016.[13] Our follow-up article, published in Coaching: An International Journal of Theory, Research and Practice, was the first publication to define the categories. It explicitly referenced the 2014 protocol.

To check how closely the interviews adhered to the CLI protocol, a team of experienced Clean Language practitioners and researchers (not involved in this study) allocated one of four ‘cleanness’ ratings to every interviewer question or statement (Lawley & Linder-Pelz, 2014):

  • Classically clean: Drawn from the original Clean Language question set using only the interviewee’s words.
  • Contextually clean: Introduces only ‘neutral’ words based on the context of the research or logic inherent in the interviewee’s information.
  • Mildly leading: Introduces words with the potential to lead but with no discernible affect on the interviewee’s answers.
  • Strongly leading: Introduces words (especially metaphors), presuppositions, frames or opinions that cast doubt on the authorship of interviewee answers.

The tabulated results were used to arrive at a summary assessment for each interview. The reviewers concluded that the interviews adhered substantially to the CLI protocol and were appropriate for the purpose of this research.

2017.[14] Jan Nehyba and Petr Svojanovský’s chapter in Becoming a Teacher: The dance between tacit and explicit knowledge described the use of CLI in a Masaryk University research project involving 44 in-depth interviews. They used the cleanness rating to analyse four interviews, each conducted by a different interviewer, and compared the results with the amount of CLI training each interviewer had undertaken. They concluded that significant training is required to achieve high cleanness ratings.

2017.[15] I wrote a chapter in the same book which included a section on the ‘Need for a “cleanness” rating’. It reported the results of applying the rating to 15 Clean Language interviews conducted by three experienced interviewers across three published research projects. These showed that experienced Clean Language interviewers could consistently achieve ratings close to 90% clean (with less than 2% ‘strongly leading’).

November 2017.[16] Heather Cairns-Lee used CLI to interview 30 business leaders for her PhD. Three of these interviews were rated by an external expert rater and Heather rated the other 27 herself, making this the largest use of the rating to date. Heather’s analysis confirmed that a rating of 90% clean with less than 2% strongly leading could be maintained over a large number of interviews.
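The benchmark above is a simple proportion over the categories allocated to each interviewer contribution. As a minimal sketch, the tally could look like this in Python. The function names, the category labels as strings, and reading “close to 90% clean” as a threshold of at least 90% are illustrative assumptions, not part of any published protocol.

```python
from collections import Counter

# Four-category labels (2014 onwards); both "classically clean" and
# "contextually clean" count towards the overall clean share.
CLEAN_CATEGORIES = {"classically clean", "contextually clean"}


def category_shares(ratings):
    """Return the share of interviewer contributions allocated to each category."""
    if not ratings:
        raise ValueError("no rated contributions")
    total = len(ratings)
    return {category: count / total for category, count in Counter(ratings).items()}


def meets_benchmark(ratings, clean_target=0.90, strong_max=0.02):
    """Check a rated interview against an assumed benchmark: at least
    clean_target clean and less than strong_max strongly leading."""
    shares = category_shares(ratings)
    clean_share = sum(shares.get(c, 0.0) for c in CLEAN_CATEGORIES)
    strong_share = shares.get("strongly leading", 0.0)
    return clean_share >= clean_target and strong_share < strong_max


# Average interview reported in 2015: 50 contributions, 40 classically or
# contextually clean, 8 mildly leading, 2 strongly leading. The article does
# not split the 40, so all are labelled "classically clean" here; the split
# does not change the clean share.
average_2015 = (["classically clean"] * 40
                + ["mildly leading"] * 8
                + ["strongly leading"] * 2)

print(category_shares(average_2015))
print(meets_benchmark(average_2015))  # False: 80% clean, 4% strongly leading
```

Applied to the 2015 averages, the sketch gives 80% clean and 4% strongly leading, short of the later benchmark and in line with the reviewers’ comment that those interviews were “not quite as ‘clean’” as other Clean Language-based research.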

January 2020.[17] Jan Nehyba and I were the first to analyse statistically the inter-rater reliability of the cleanness rating, using 19 cleanness ratings undertaken by multiple raters. Our results were published in the Journal of Consciousness Studies:

Also presented is a new systematic third-person method of validation that evaluates the questions and other verbal interventions by the interviewer to produce an adherence-to-method or ‘cleanness’ rating. A review of 19 interviews from five research studies provides a benchmark for interviewers seeking to minimize leading questions. The inter-rater reliability analysis demonstrates substantial agreement among raters with an average Intraclass Correlation Coefficient of 0.72 (95% CI). We propose that this method of validation is applicable not only to CLI but to second-person interviews more generally. (p. 94)

August 2021.[18] The most authoritative published description of the cleanness rating to date can be found in the article by Heather Cairns-Lee, Paul Tosey and me in The Journal of Applied Behavioral Science. It was the first to publish a typology of how interview questions can lead responses.

July 2022.[19] The first book on Clean Language Interviewing is published, with the cleanness rating featuring in Chapter 2.

Conclusion. The idea of a ‘cleanness rating’ was born during the work-life balance project (2010) but was not formalised until my project with Susie Linder-Pelz, when the categories were given names and definitions (2014). The term ‘contextually clean’ was adopted from Wendy Sullivan and Di Tunney’s (2009) project.

