Annual Review 2017/2018
Methodology behind the analysis presented in the Annual Review
We use a series of regression analyses and additional tests to analyse the quantitative data obtained from the Survey.
The 36 questions that form our core Survey are answered on a five-point Likert scale (strongly agree, somewhat agree, neither agree nor disagree, somewhat disagree, strongly disagree). This is an ordinal scale, i.e. one where responses can be sorted by a rank order. The primary regression model we apply to the data is therefore an ordered logit model.
Regressions are run for every question separately (so there are 36 different regressions for the 36 Survey questions), and at the level of the individual respondent. Using data from both 2016 and 2017, we ran our models across a total of 64,390 lines of data.
The left-hand side variable for each regression is the outcome that we are trying to explain, i.e. the five-point scale response to the question concerned. We also gather from Survey respondents demographic and institutional data relating to gender, tenure, location, role type, business type, and firm. We use these factors as the right-hand side explanatory variables in our models.1
The outputs and their interpretation
The coefficients for all variables are calculated relative to a base and need to be interpreted accordingly. The results for the variable ‘Line manager’, for example, should be interpreted relative to not having line management duties. Figure 1 shows the explanatory variables in our models and the base in each case.
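Interpreting a coefficient relative to a base corresponds to indicator (dummy) coding with the base category left out. The following is a minimal sketch of that idea only; the function name, the variable and its level labels are hypothetical and are not taken from the Survey data set.

```python
def dummy_code(values, base):
    """Return one 0/1 indicator column per non-base level, omitting the base."""
    levels = sorted(set(values) - {base})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical answers for a 'line manager' variable; 'no' is the base,
# so only a 'yes' indicator is created and its coefficient is read
# relative to respondents without line management duties.
columns = dummy_code(["no", "yes", "yes", "no"], base="no")
print(columns)
```

Because the base level has no column of its own, every reported coefficient compares a level with that omitted base.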
The coefficients for this type of model are expressed as odds ratios. If, for example, the odds ratio on Q1 for the variable ‘female’ is 1.2, this means that, controlling for all the other factors in our model, being female raises the odds of agreeing with the statement in Q1 by 20% relative to being male. Odds ratios greater than 1 imply a higher likelihood of a positive response than for the base, and odds ratios below 1 imply a lower likelihood.
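To show what an odds ratio does to a probability, the sketch below multiplies the odds of a baseline group by an odds ratio and converts the result back to a probability. The 50% baseline is a hypothetical figure chosen for illustration, not a Survey result, and the function name is our own.

```python
def apply_odds_ratio(baseline_probability, odds_ratio):
    """Probability implied by multiplying the baseline odds by an odds ratio."""
    if not 0.0 < baseline_probability < 1.0:
        raise ValueError("baseline_probability must be strictly between 0 and 1")
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical baseline: the base group has a 50% chance of agreeing.
print(round(apply_odds_ratio(0.5, 1.2), 3))
```

With this hypothetical baseline, an odds ratio of 1.2 moves the probability from 0.5 to roughly 0.545. The same odds ratio implies a different change in probability at other baselines, which is why odds ratios should not be read as percentage-point differences.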
For ease of interpretation, we reverse the order of the Likert scale where a question is negatively phrased, so that higher odds ratios imply a more positive likelihood for all questions.
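Reversing the scale can be sketched as follows, assuming responses are stored as integers from 1 (strongly disagree) to 5 (strongly agree). That numeric coding and the function name are assumptions made for illustration.

```python
SCALE_POINTS = 5  # five-point Likert scale


def reverse_code(score, negatively_phrased):
    """Flip the score for negatively phrased questions so that 5 is always most favourable."""
    if not 1 <= score <= SCALE_POINTS:
        raise ValueError("score must be between 1 and 5")
    return SCALE_POINTS + 1 - score if negatively_phrased else score


# 'Strongly agree' (5) with a negatively phrased statement becomes the
# least favourable value (1); the neutral midpoint (3) is unchanged.
print([reverse_code(s, negatively_phrased=True) for s in (1, 2, 3, 4, 5)])
```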
Our regressions give us two pieces of information: an odds ratio (the size of the coefficient) and a p-value (whether the variable is statistically significant in explaining the outcome). To be able to observe patterns more easily across all the ordered logit regressions for our Survey questions, the results are presented in the visual format shown in Figure 2. The size of the circles denotes the size of the impact of the variable,2 and the colour reflects whether the variable is statistically significant, and if so in which direction (green for positive and red for negative). The results of the regressions across all of our Survey questions are presented in this format in Figure 5.
Validating the results
Our ordered logit regressions violate one of the assumptions of such models: the parallel lines assumption. We explain the nature of this violation and how we deal with it in greater detail in the box below. To validate the results of our primary regression model, we conduct additional non-parametric tests and run further generalised ordered logit regressions. If the results across all our regressions and tests are consistent, we can be reasonably confident of the direction of the relationship between an explanatory variable and the outcome of interest, as well as the broad size of the effect. In practice, we have found that, for most major cases we explore, the regressions and tests validate each other.
Fig 38 Series of regressions and tests
In presenting our analysis in this Annual Review we report primarily the results of our main (ordered logit) model. Where these results are contradicted through other tests, however, we report this as well.
Regression and tests for validation
An important assumption of ordered logit models is that the relationship between an explanatory variable and the dependent variable should not change for the different categories (in this case, the steps of the Likert scale). In other words, the odds of being in the lowest category (strongly disagree, for a positively phrased question) versus all higher categories of the response variable are the same as the odds of being in the two lowest categories (strongly disagree or somewhat disagree) versus all higher categories, and so on. This is known as the proportional odds or parallel lines assumption.
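In symbols, and using notation that is ours rather than the Review's, the ordered logit model for respondent i with response Y_i in {1, ..., 5} can be sketched as:

```latex
\log \frac{P(Y_i > j)}{P(Y_i \le j)} = x_i^{\top}\beta - \tau_j, \qquad j = 1, \dots, 4
```

Here x_i holds the explanatory variables, \tau_j are the four thresholds between the steps of the Likert scale, and \beta does not depend on j. Using a single \beta across all four thresholds is the proportional odds assumption, and \exp(\beta_k) is the odds ratio for variable k. Sign conventions for the thresholds differ between software packages.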
Brant tests show that, in our regressions, this assumption is violated. Some literature does, however, suggest that tests of the proportional odds assumption are ‘anti-conservative’; they almost always result in the assumption being rejected, particularly when the number of explanatory variables or the size of the sample is large.3 To gain greater certainty in our results we conduct three further sorts of analysis.
First, we run non-parametric tests (Mann-Whitney and χ²) to test for differences between distributions. As an example, we test for each Survey question whether the shift in the ordinal distribution from 2016 to 2017 is statistically significant.4
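As an illustration of the χ² comparison between two years, the sketch below computes the test statistic for a table of response counts and a p-value using the closed-form tail probability that holds when the degrees of freedom are even (two years by five categories gives four). The counts are invented for illustration and are not Survey data; the function names are our own, and the Mann-Whitney test is not shown.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts.

    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi_square_p_value(stat, dof):
    """Upper-tail probability of a chi-square variable; exact for even dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form requires a positive, even dof")
    half = stat / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(dof // 2))


# Invented counts per Likert category (strongly disagree ... strongly agree).
counts_2016 = [30, 50, 60, 40, 20]
counts_2017 = [20, 45, 60, 50, 25]
stat, dof = chi_square_statistic([counts_2016, counts_2017])
print(stat, dof, chi_square_p_value(stat, dof))
```

A small p-value would indicate that the two distributions differ by more than chance alone would suggest. The χ² test ignores the ordering of the categories, which is one reason for pairing it with a rank-based test such as Mann-Whitney.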
Second, we run generalised ordered logit regressions. This method has the advantage of freeing the variables from the proportional odds assumption, and is therefore statistically more appropriate. It also allows us to see how the odds ratios vary at the different thresholds (the steps of the Likert scale). This type of model does, however, introduce greater complexity by generating four sets of coefficients for each regression, making it difficult to present results across our 36 core Survey questions in a way that allows the easy identification of patterns.
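The four sets of coefficients correspond to the four ways of splitting a five-point response into 'above' and 'at or below' a threshold. The sketch below only builds those four binary outcomes, assuming responses coded 1 to 5; it does not fit a generalised ordered logit model, and the function name is our own.

```python
def cumulative_splits(responses, n_categories=5):
    """For each threshold j, flag whether each response lies above j."""
    return {
        threshold: [1 if r > threshold else 0 for r in responses]
        for threshold in range(1, n_categories)
    }


# Three hypothetical responses: strongly disagree, neutral, strongly agree.
for threshold, outcome in cumulative_splits([1, 3, 5]).items():
    print(threshold, outcome)
```

An ordered logit model constrains the effect of each explanatory variable to be the same across all four splits, whereas the generalised model allows it to differ from one split to the next.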
Third, we run a simpler logit model by collapsing the two most favourable response categories (strongly agree and somewhat agree for positively phrased questions) into one and the two least favourable response categories (strongly disagree and somewhat disagree for positively phrased questions) into another, and by ignoring all neutral responses.
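The collapsing step can be sketched as below, assuming responses are coded 1 (least favourable) to 5 (most favourable) after any reversal for negatively phrased questions. The coding and the function name are illustrative assumptions.

```python
def collapse_response(score):
    """Map a five-point response to 1 (favourable), 0 (unfavourable) or None (neutral)."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError("score must be an integer from 1 to 5")
    if score >= 4:
        return 1
    if score <= 2:
        return 0
    return None  # neutral responses are excluded from the binary model


# Hypothetical responses; the two neutral answers (3) are dropped.
responses = [5, 4, 3, 2, 1, 3]
binary = [b for b in map(collapse_response, responses) if b is not None]
print(binary)
```

The resulting 0/1 outcome can be used in a standard logit model, at the cost of discarding the neutral responses and the distinction between 'strongly' and 'somewhat'.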
We compare the results of all our regressions and tests. Where these are consistent, we can be reasonably confident of the direction and size of the relationship between an explanatory variable and the outcome of interest. Where they are not, we report on this inconsistency.
Analysing qualitative information in a rigorous and consistent manner presents its own challenges. It was, however, a challenge that we were keen to meet, given the potential value of information from focus groups in particular about what distinguished firms or business areas that appeared to be doing something well. To help us identify differences and similarities between organisational environments associated with relatively high or low Survey results, we worked with academics from the London School of Economics to apply a technique known as ‘grounded theory’.
Grounded theory is a qualitative research methodology that allows the systematic discovery of novel propositions from a data set. It does this through making comparisons between cases that differ on some observed dimension.
We applied this technique to the Assessment evidence, focused on our three themes. We began by looking across the different business areas of participating firms as a whole, to identify those business areas that had either relatively high or low Survey scores on questions relating to our themes. We then selected (where available) the focus groups that comprised participants drawn from these business areas. We therefore put aside, for the purpose of this analysis, all of the other focus groups: those from business areas that did not have relatively high or low scores, and those excluded for other reasons (e.g. because participants were drawn from a mixture of business areas within a firm). This is a sampling method that maximises the differences between cases; it does not attempt to represent the full spectrum of material available.
We then proceeded methodically to identify words, issues or observations that were mentioned in these focus group discussions. Pairs of analysts worked first independently and then together to interpret and code the focus group output.
This allowed us to identify codes (i.e. words, issues, observations) that were commonly used among focus groups drawn from the set of higher-scoring business areas and markedly different from those drawn from the set of lower-scoring business areas, as well as codes that were equally common across both sets of focus groups.
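The comparison of codes between the two sets of focus groups can be illustrated with a simple tally. This is a sketch only, not the procedure the analysts followed: the code labels are placeholders, and the `min_gap` cut-off for calling a code 'distinctive' is a hypothetical parameter introduced for the example.

```python
from collections import Counter


def code_shares(focus_groups):
    """Share of focus groups in which each code appears at least once."""
    if not focus_groups:
        raise ValueError("at least one focus group is required")
    counts = Counter(code for codes in focus_groups for code in set(codes))
    return {code: n / len(focus_groups) for code, n in counts.items()}


def compare_codes(high_groups, low_groups, min_gap=0.5):
    """Split codes into those with a large gap in share between the sets and the rest."""
    high, low = code_shares(high_groups), code_shares(low_groups)
    result = {"distinctive": [], "shared": []}
    for code in sorted(set(high) | set(low)):
        gap = abs(high.get(code, 0.0) - low.get(code, 0.0))
        result["distinctive" if gap >= min_gap else "shared"].append(code)
    return result


# Placeholder codes for two higher-scoring and two lower-scoring focus groups.
high_groups = [["code_a", "code_b"], ["code_a", "code_b"]]
low_groups = [["code_b"], ["code_b"]]
print(compare_codes(high_groups, low_groups))
```

In this toy input, `code_a` appears only in the higher-scoring set and `code_b` appears equally in both, mirroring the two kinds of code described above.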
To ensure that we were, as far as possible, focusing on differences stemming from the organisational environment and culture rather than from the business area itself, a focus group was only included in the analysis if we could also include one or more focus groups from similar business areas that had very different scores. Coding from a focus group drawn from a relatively high scoring Risk & Compliance area, for example, would only be included in our analysis if we could also include focus groups from one or more relatively low scoring Risk & Compliance areas.
Fig 3. Qualitative analysis: choosing focus groups based on Survey scores
As the sample of focus groups chosen is not built to be representative, further testing would of course be required to establish causal relationships. Nevertheless, by helping identify differences and similarities between sets of business areas with very different Survey scores, the results of this analysis will, we hope, contribute to our (and firms’) understanding and interpretation of the BSB Employee Survey results.
Fig 4. Identifying practices associated with good environments