The unscience of evaluation

Evaluation is notoriously underdone in the corporate sector.

And who can blame us?

With ever-increasing pressure bearing down on L&D professionals to put out the next big fire, it’s no wonder we don’t have time to scratch ourselves before shifting our attention to something new – let alone measure what has already been and gone.

Alas, today’s working environment favours activity over outcome.

Pseudo echo

I’m not suggesting that evaluation is never done. Obviously some organisations do it more often than others, even if they don’t do it often enough.

However, a secondary concern I have with evaluation goes beyond the question of quantity: it’s a matter of quality.

As a scientist – yes, it’s true! – I’ve seen some dodgy pseudoscience in my time. From political gamesmanship to biased TV and clueless newspaper reports, our world is bombarded with insidious half-truths and false conclusions.

The trained eye recognises the flaws (sometimes) but of course, most people are not science grads. They can fall for the con surprisingly easily.

The workplace is no exception. However, I don’t see it as employees trying to fool their colleagues with creative number crunching, so much as those employees unwittingly fooling themselves.

If a tree falls in the forest

The big challenge I see with evaluating learning in the workplace is how to demonstrate causality – i.e. the link between cause and effect.

Suppose a special training program is implemented to improve an organisation’s flagging culture metric. When the employee engagement survey is run again later, the metric goes up.

Congratulations to the L&D team for a job well done, right?

Not quite.

What actually caused the metric to go up? Sure, it could have been the training, or it could have been something else. Perhaps a raft of unhappy campers left the organisation and were replaced by eager beavers. Perhaps the CEO approved a special bonus to all staff. Perhaps the company opened an onsite crèche. Or perhaps it was a combination of factors.

If a tree falls in the forest and nobody hears it, did it make a sound? Well, if a few hundred employees undertook training but nobody measured its effect, did it make a difference?

Without a proper experimental design, the answer remains unclear.

Evaluation by design

To determine with some level of confidence whether a particular training activity was effective, the following eight factors must be considered…

1. Isolation – The effect of the training in a particular situation must be isolated from all other factors in that situation. Then, the metric attributed to the staff who undertook the training can be compared to the metric attributed to the staff who did not undertake the training.

In other words, everything except participation in the training program must be more-or-less the same between the two groups.

2. Placebo – It’s well known in the pharmaceutical industry that patients in a clinical trial who are given a sugar pill rather than the drug being tested sometimes get better. The power of the mind can be so strong that, despite the pill having no medicinal qualities whatsoever, the patient believes they are doing something effective and so their body responds in kind.

As far as I’m aware, this fact has never been applied to the evaluation of corporate training. If it were, the group of employees who were not undertaking the special training would still need to leave their desks and sit in the classroom for three 4-hour stints over three weeks.

Why?

Because it might not be the content that makes the difference! It could be escaping the emails and phone calls and constant interruptions. It could be the opportunity to network with colleagues and have a good ol’ chat. It might be seizing the moment to think and reflect. Or it could simply be an appreciation of being trained in something, anything.

3. Randomisation – Putting the actuaries through the training and then comparing their culture metric to everyone else’s sounds like a great idea, but it will skew the results. Sure, the stats will give you an insight into how the actuaries are feeling, but it won’t be representative of the whole organisation.

Maybe the actuaries have a range of perks and a great boss; or conversely, maybe they’ve just gone through a restructure and a bunch of their mates were made redundant. To minimise these effects, staff from different teams in the organisation should be randomly assigned to the training program. That way, any localised factors will be evened out across the board.

4. Sample size – A handful of people (even if they’re randomised) cannot be expected to represent an organisation of hundreds or thousands. So testing five or six employees is unlikely to produce useful results.

5. Validity – Calculating a few averages and generating a bar graph is a sure-fire way to go down the rabbit hole. When comparing numbers, statistically valid methods such as Analysis of Variance are required to determine whether a difference is significant or merely due to chance.

6. Replication – Even if you were to demonstrate a significant effect of the training for one group, that doesn’t guarantee the same effect for the next group. You need to do the test more than once to establish a pattern and negate the suspicion of a one-off.

7. Subsets – Variations among subsets of the population may exist. For example, the parents of young children might feel aggrieved for some reason, or older employees might feel like they’re being ignored. So it’s important to analyse subsets to see if any clusters exist.

8. Time and space – Just because you demonstrated the positive effect of the training program on culture in the Sydney office, doesn’t mean it will have the same effect in New York or Tokyo. Nor does it mean it will have the same effect in Sydney next year.
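Points 3 and 5 above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive method: the function names `assign_groups` and `permutation_p_value`, the staff IDs and the survey scores are all made up for the example, and it swaps Analysis of Variance for a simpler permutation test on the difference in means, because that needs nothing beyond the standard library.

```python
import random
from statistics import mean


def assign_groups(staff_ids, seed=None):
    """Randomly split staff into a training group and a control group."""
    rng = random.Random(seed)
    shuffled = list(staff_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def permutation_p_value(trained, control, n_permutations=10_000, seed=None):
    """Two-sided permutation test on the difference in group means.

    Returns the share of random relabellings whose absolute difference in
    means is at least as large as the one actually observed.
    """
    rng = random.Random(seed)
    observed = abs(mean(trained) - mean(control))
    pooled = list(trained) + list(control)
    n_trained = len(trained)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # pretend group membership was arbitrary
        diff = abs(mean(pooled[:n_trained]) - mean(pooled[n_trained:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


# Hypothetical staff list: 200 employees drawn from across the organisation.
staff = [f"emp{i:03d}" for i in range(200)]
training_group, control_group = assign_groups(staff, seed=1)

# Hypothetical culture scores, invented purely for illustration.
trained_scores = [72, 75, 71, 78, 74, 77, 73, 76]
control_scores = [70, 69, 73, 68, 71, 72, 67, 70]
p = permutation_p_value(trained_scores, control_scores, seed=1)
print(f"p-value: {p:.3f}")
```

A small p-value suggests the gap between the two groups is unlikely to be a fluke of who ended up in which group. It says nothing about placebo effects, replication or the other factors listed above, which still need to be designed in.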

Weird science

Don’t get me wrong: I’m not suggesting you need a PhD to evaluate your training activity. On the contrary, I believe that any evaluation – however informal – is better than none.

What I am saying, though, is that if you want your results to be more meaningful, a little bit of know-how goes a long way.

For organisations that are serious about training outcomes, I go so far as to propose employing a Training Evaluation Officer – someone who is charged not only with getting evaluation done, but with getting it done right.

5 thoughts on “The unscience of evaluation”

  1. Lots of very good points here. I think that many folks skip evaluation in part because it’s seen as a huge effort. It can be kept simple…but you’re right, in that we just have to keep some basic guidelines in mind to make sure the evaluation tells us a realistic story.

  2. I enjoyed your blog post. I think science has a lot to teach us about rigorous evaluation and trial design.

    I wanted to draw your attention to one error. The placebo effect is colloquially understood as mind over matter and the mysterious power of the mind to fix the body when it thinks it is being given an intervention.

    In research science, this isn’t actually what the placebo effect refers to and very few research scientists believe that the mind can heal the body in the way that has been popularised by people like Ben Goldacre. In reality, the placebo effect of a trial is a shorthand term for all of the biases that might be present in the data, rather than the patient.

    One example of this is regression to the mean: in other words, people often get better over time regardless of the medicine they are given. Another example is experimenter bias, where wanting to please the experimenter skews how you answer any questions they might ask.

    In the context of your piece, this is very relevant because the potential for bias is very high, particularly in self-reported surveys. If you’re interested in reading more, I have enclosed a link to an article written by someone much better versed in this than I am: https://sciencebasedmedicine.org/placebo-are-you-there/

    One other element that I think would be useful to include is ‘blinding’. In the case of training, it would be ideal if the person who is responsible for conducting the training doesn’t also do the evaluation because, knowing who took part, they are more likely to present the data in a favourable way, even unintentionally.

  3. Thanks for your comments, Tom. I’ll have to read about the placebo effect more, including that article.

    And blinding is an excellent addition. Cheers!
