Want to understand big data? Then try understanding little data first

August 22, 2013

Big data is the current hot topic, but is it a case of “Here we go again”? The next learning and development bandwagon is up and rolling and this time the wheels have been attached to big data. We’re being told that we’ve got to concentrate on big data; we’ve got to learn about it and we’ve got to embrace it (so some would say), but what’s the point of trying to grapple with big data when most of the people profession can’t really get their heads around small data?

In this post I want to be challenging – as always – but I also want to offer some advice and learning, so that you glean something useful from it rather than just a random rant . . . so that’ll make a change then 😉

Do we know data from big data?

The Learning and Performance Institute recently released the first six months of data from their innovative Capability Map.  Of the 983 people who took this personal assessment, only 32% felt able to assess themselves against the capability of ‘data interpretation’ and only 12% of this elite band scored themselves at the highest level.  This demonstrates that – for whatever reason – 68% of respondents felt unable to assess their data interpretation capabilities and only a rather lowly 3.7% of all respondents felt able to indicate that they excel at this skill.  Whichever way you look at it the picture is poor.  I appreciate that this is a generalisation, but as learning and development professionals we just don’t get data.  Given these facts it would seem rather foolish to rush off into the areas of big data when we hardly seem at ease with simple numbers.
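
As a quick check of the arithmetic above, here is a minimal Python sketch. It uses only the rounded percentages quoted in this post, so the result (a little under 4% of all respondents) is an approximation; the small gap from the quoted 3.7% presumably comes from rounding in the published figures, which is an assumption because the raw counts aren't given here. The variable names are purely illustrative.

```python
# Figures quoted from the Capability Map data above (rounded percentages).
respondents = 983
assessed_share = 0.32    # felt able to assess 'data interpretation'
top_level_share = 0.12   # of those, scored themselves at the highest level

# Share of ALL respondents who rated themselves at the highest level.
top_of_all = assessed_share * top_level_share

# Approximate head counts; estimates only, because the inputs are rounded.
approx_assessed = round(respondents * assessed_share)
approx_top = round(respondents * top_of_all)

print(f"Unable to assess: {1 - assessed_share:.0%}")
print(f"Top level, share of all respondents: {top_of_all:.1%}")
print(f"Roughly {approx_top} of {respondents} people")
```

Either way you cut it, that is only a few dozen people out of nearly a thousand.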

What would we do with big data anyway?

But let’s say we did get data – which we don’t. What on earth would we do with big data anyway?  In his excellent post about big data Sukh Pabial is exceptionally honest when he discusses the benefits or otherwise of big data, saying: “So where does HR fit into all of this? Well, I’m not entirely sure.”  And perhaps that’s it – we know it’s out there but we’re unsure of how to deal with it.  As Craig Taylor (@CraigTaylor74) tweeted beautifully: “I’m not so sure I’m that interested in BIG data, I’m more interested in using the (small) data that I already have, to better effect.”  Spot on, Craig!

What the heck is big data?

The term ‘big data’ is misleading because ‘big’ is a fuzzy term.  Let me explain . . . as someone who stands 6’6” tall, most people would consider me to be ‘big’ – yet in 1999 the average height of the Cambridge Boat Race crew was 6’9”, which would make me ‘below average’ and a veritable dwarf when compared to some professional basketball players.  Big can mean many things to many people and so ‘big data’ needs some further explanation.

According to Wikipedia “Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualisation.  As of 2012, limits on the size of data sets that are feasible to process in a reasonable amount of time were on the order of exabytes of data.”  The definition continues, “Big data is difficult to work with using most relational database management systems and desktop statistics and visualization packages, requiring instead massively parallel software running on tens, hundreds, or even thousands of servers.”

So that’s useful. Big data isn’t the stuff you’ve got in your average Excel spreadsheet or Access database; it’s the data that your entire organisation may hold. It’s BIG data with the emphasis on BIG!

Let’s not get too carried away though; we’ve been looking at big data for a while now.  If you’re in the UK and have a store loyalty card – which, according to the BBC, 85% of UK households do – then you’ll definitely be adding to the massive pool of data retailers already hold about you as this article from the Guardian explains.  It’s been suggested that big data allows retailers to know when you’re planning to start a family.  Big data has been helping credit card companies for years in their effort to detect fraud, and if you think that the recent revelations surrounding the PRISM ‘scandal’ are scary then think again – security services have been using big data for years in their constant effort to keep us safe.

But what if it’s not big?

As we can see, big data really is BIG – massive in fact – probably more massive than most of us will ever come across. To be honest, I guess most of us think that data is little more than a collection of numbers. Perhaps it’s the pass marks from tests, perhaps it’s the location of learners, perhaps it’s the length of service of a learner – or perhaps it’s all three. Perhaps it’s the ability to assess whether location and length of service have an impact on test scores and therefore potentially performance. Well no. That’s just data.

Returning to the top of this story, the problem is – and is likely to remain – that we just don’t understand data.  Perhaps then, instead of being wooed by big data, we should spend some time getting to know data, what it is, how it works and how it relates to other data.


Here are some questions all about data.  Feel free to use the feedback option of the blog to share your thoughts and discuss what you think the answers are.  This isn’t intended to show people up – it’s a real opportunity to share and learn.

Question 1:  A meeting of entrepreneurs contains 100 people.  Data shows that the average net worth of each of the entrepreneurs is £100 million.  List as many things as possible you can determine from this data.

Question 2:  It is war time.  You are tasked with reviewing damaged planes coming back from sorties over enemy territory to see which areas of the plane should be protected further.  You find that 70% of the fuselage and 30% of the fuel system of returned planes are much more likely to be damaged by bullets or flak than any other part of the planes.  Which single area of the plane would you reinforce – assuming there is no difference in cost or performance?

Question 3:  A study is looking at the success rates of two different approaches to teaching people to drive – called Approach A and Approach B.  The success rates for both male and female learners over a four month period are as follows:

  • Month 1 – Approach A: 93% success, 81 out of 87 passed first time
  • Month 2 – Approach B: 87% success, 234 out of 270 passed first time
  • Month 3 – Approach A: 73% success, 192 out of 263 passed first time
  • Month 4 – Approach B: 69% success, 55 out of 80 passed first time

You have to select one approach to be used across the country.  Which one do you choose, Approach A or Approach B?
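
If you want to check your own working for Question 3, here is a minimal Python sketch that tallies the figures above. The names `results` and `pooled_rate` are purely illustrative, and the sketch only adds up the numbers; it makes no attempt to choose between the approaches.

```python
# Pass figures copied from the four bullet points above: (approach, passed, taken).
results = [
    ("A", 81, 87),    # Month 1
    ("B", 234, 270),  # Month 2
    ("A", 192, 263),  # Month 3
    ("B", 55, 80),    # Month 4
]


def pooled_rate(approach):
    """Total passes divided by total learners for one approach."""
    passed = sum(p for a, p, _ in results if a == approach)
    taken = sum(t for a, _, t in results if a == approach)
    return passed / taken


# Month-by-month success rates, as listed in the question.
for approach, passed, taken in results:
    print(f"Approach {approach}: {passed}/{taken} = {passed / taken:.0%}")

# Success rates with both months for each approach added together.
for approach in ("A", "B"):
    print(f"Approach {approach} pooled: {pooled_rate(approach):.1%}")
```

Compare the month-by-month lines with the pooled lines before you decide.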


Get to know data before you try and get to grips with big data.  The truth is that if you don’t understand the former then you’ll never understand the latter.  Although for many learning and development professionals data is a horrible thing to think about, challenge your fears and learn to love data – because if you do then you’ll make far better decisions in the long run.

Call to action

A great starting point for getting to know and love data is the BBC’s More or Less radio programme.

8 replies
  1. Andrew Jacobs says:

    Hi Jonathan, thanks for this piece. I think you’ve highlighted a couple of issues.

    Firstly, I think there is a lack of understanding of what the term big data means. There is an argument that anyone working in an L&D function ‘should’ have an understanding of what this big data means in the context of their organisation. If an L&D professional doesn’t have a sense of the outputs, or more likely, outcomes that the organisation delivers, how can they place any faith in their activity making a difference? As a result, I believe that there IS a need for L&D people to look up more often; there’s a real danger that we’re looking down, looking in and at data from a range of sources. However, there is a problem that we try and prove that correlation = causation.

    This leads me to my second issue, and that is around the development of the Tin Can API. I’m not an expert on this by a very long way but I keep hearing how I’ll be able to capture more data, better data, data into a new place, data from lots of places, data from different and new sources. If we can’t interpret data now, what hope do we have?

    Thanks for the blog; interesting.

  2. Jonathan Kettleborough says:

    Hi Andrew, thanks for your positive comments – most welcome.

    I’m glad you agree about the confusion about big data – to be honest it’s not an ideal label – massively insanely huge data may be a more fitting description! Most L&D professionals, even with a workforce of 100,000 people, would be hard pushed to generate the amount of data that qualifies for the big data label.

    The understanding of data may seem ‘geeky’; however, we need to understand the basics and, as you so rightly point out, correlation does NOT equal causation, although for many any form of ‘weak’ alignment is a signal to rush towards a conclusion.

    Turning to the Tin Can API, I too am very much a novice here. There is talk that the amount of data that could be gathered is huge and that there could well be a number of data protection issues, but the person who I’m sure can help you out is Neil Lasher – you can reach him via Twitter @NeilLasher

    Thanks again for your comments and may you always know your mode from your median 😉

  3. Neil Lasher says:

    As usual some thought provoking content from you Jonathan.

    There is more to big data than just the volumetric size. A billion statements to the local butchers would be huge; however, the same billion statements to the US government would be a drop in the ocean.

    So it is more to do with number of touch points and number of times these touch points get touched. There is no rule of thumb for when data becomes big data, but obviously having been in the learning industry all these years, if there is a buck to be earned, someone will be there to sell you something.

    So with numbers out of the way let’s look at types of data; this is far more important. An LMS provides ‘metric’ data: a fixed amount of data, defined by the LMS’s ability to collate limited range data, based on a historical fact of something that happened and fixed by a number of constraints. We do not know how to understand this data because there is very little use for it in L&D. If you are going to tell me that 80% of the staff did the learning and passed an inane test with a score that is meaningless and not a real measure of any ability, then the data is of little use. What do you do with it now? Keep it for a regulator? Stop kidding yourself.

    However if we start to look at analytics of data, we begin to realise we can decide what the data will be. We define what we collect and how and when. We decide the context, the timing, the actual result. We can use this data as ‘live’ data collected in real time over particular periods and we can analyze and find patterns of common occurrence. We then can begin to extrapolate at earlier stages of data what is likely to happen next. From a training perspective we can then design the next step to change the behaviour before it happens. Amazon does this very well with recommendations and Tesco by giving you discounts on future shopping based on your likely product need.

    xAPI is the standard that can drive this; the data, however, is raw. It is all back office, and what is required are good analytics tools to enable the organisation to decipher the data they collate.

    This of course means we need better design of learning. Designed to leave a data trail. We also need better tools (not the LMS) to capture this data. Finally we need people with knowledge of analytics to decipher the data. Let’s not pretend that is another role for L&D.

  4. Jonathan Kettleborough says:

    Hi Neil,

    Thanks for your comments – as always they are a very useful addition.

    You raise a number of good issues. Starting with the ‘size’ issue, you are correct – to some organisations a billion statements would be overwhelming – to others a walk in the park. You also mention the local butcher and this provides another big data link, for it’s not just the local butcher but the link of data between the butcher, the baker and the candlestick maker.

    In L&D it’s not just the data about learning but also the data about performance – individually and commercially and about the route taken to achieve performance – as well as so much more.

    For me though it is your closing remarks that are the most telling – we can’t expect an LMS to give us all the data we need – though there will be many who’ll no doubt try and sell a big data LMS add-on – and that the details of analytics should NOT be another role for L&D. As always – very insightful.

    Thanks again Neil – some great additions to this important debate.

  5. Ger Driesen says:

    Hello Jonathan,
    I’ve been aware of your existence and your ideas only since June 2013. In the meantime I’ve read your book and posts and think your views are important and very useful. But this time I disagree with your post, so it’s time to post a comment. I think you fell deep into an obvious thinking trap about Big Data. Your explanation of Big Data is nice and relevant, but I think after that you took some kind of linear thinking path that keeps you from seeing the bigger picture of the possibilities of Big Data and the reason L&D pros don’t have to become data experts, small or big.

    With the introduction of the personal computer and application software we started using it for e-learning. But let’s be honest: for quite some time we didn’t get much further than ‘digital page turning’. It has become better in recent years with connectivity, broadband and social media possibilities. But do we all have to understand programming or in-depth use of application software to be a good L&D pro? No, we have to be good at designing the right blend of learning and performance support interventions to meet learning and performance improvement goals, and work together with the right (e-learning) experts. So Big Data has not that much to do with little data in my opinion.

    Of course Big Data is a confusing term and we might think the translation to ‘learning analytics’ is helpful, but for me it isn’t. We have to break loose from our ‘normal’ thinking pattern because it will only focus us on the use of data as feedback technology to get a better insight into how well our learning & performance interventions worked. Of course that is important and we should do so, and also in new ways if big data can help us. But the real deal is looking for the FEED FORWARD use of big data in learning and performance support. Real-time data gathering and analysis will give us the possibility to provide people at work with learning interventions and/or performance support at the moment of need. Even at the ‘before you know it moment of need’ as I like to call it.

    In the near future our surroundings will be filled with so many (embedded) sensors that we are evolving towards an ambient intelligent workplace. Speech recognition (Siri, Dragon), eye movement tracking, (limb) movement tracking, blood pressure and heartbeat measuring (Xbox Kinect), functional MRI, app use tracking, measuring mindset via social media: we will be quantified as workers soon. Every online activity we do at work, combined with these sensors, will produce the big data and ‘predictive analytics’ that will help us create a whole new range of learning and performance support interventions. We don’t have to understand how it works (neither coding e-learning nor coding big or small data applications); we have to know the possibilities and use them to create new approaches and blends for corporate learning and performance support. So my idea is that we have to approach big data and L&D from an ambient intelligent workplace perspective to see its real merits (some more ideas here: http://bit.ly/e0I330).

    So thanks for bringing this theme into discussion, Jonathan, and I hope to share and explore some more with you and the others: looking forward to Neil’s webinar. Cheers, Ger

  6. Jonathan Kettleborough says:

    Hi Ger,

    Firstly my apologies for taking so long to respond to your thoughtful and comprehensive comment.

    I agree! That is, I agree with your comments that Big Data has so much to offer the L&D profession – I’m currently planning a more detailed post on this subject and will include some of your content at that time.

    The purpose of my post was not to say that Big Data is irrelevant for the L&D profession – I think it has massive relevance – rather it was to challenge those who throw themselves into the realms of Big Data without having a basic understanding of the concepts of data and the issues that drive data.

    In my experience – as an example – L&D people can often get very mixed up with data. They think that correlation is the same as causation, they don’t understand the concepts of average, mode and median and they lack strength in a number of other basic data principles.

    I would agree completely that Big Data has so much to give us in terms of performance analytics and the effect that our L&D interventions have on the performance aspects of the people we touch. I am concerned however – and hence the reason for my post – that if we don’t have a basic grasp of data concepts then we are in REAL danger of mistakenly making the wrong decisions when faced with a plethora of linked data.

    Big data is here to stay – of that I have no doubt – but as L&D professionals we have to show – either through our own abilities or by employing others who can – that we have a real understanding of the underlying driving issues – and this is why I ‘issued’ the data challenge in my post, which to date not a single person has taken up.

    Big Data can transform the L&D profession – but only if we understand data itself. If we get it wrong then we can really mess up our people, our businesses and our customers.

    Thanks so much again for your great comments and I hope that my future post on the benefits of Big Data will have you cheering!


  7. Ger Driesen says:

    Hello Jonathan,

    thanks for your reply. I’m looking forward to your next post on big data. Maybe we should understand data ‘just enough’ to be able to work together with data specialists on relevant data related L&D tasks.

    It’s an honor to be the first one to answer the questions as stated in your post and I hope I’ll be the first ‘sheep’ and many others will follow me.

    Answer on Q1: entrepreneurs are people, there are at least 100 entrepreneurs, net worth seems an attribute in relation to entrepreneurs, entrepreneurs meet sometimes, net worth is a figure of entrepreneurs that is available as data, average assumes some are worth more, some less.

    Answer on Q2: I might like to reinforce the skills of the pilots to decrease overall damage. Otherwise it depends: damage of the fuel system seems a bigger risk (bigger consequences) to me than general fuselage damage. So I would like to reinforce the fuel system. But I would be more interested in the reasons why other planes didn’t come back and look for reinforcement from that perspective.

    Answer on Q3: I could choose approach B because the overall success rate is 82.5% against the 78% of the A approach. But I think I would choose the A approach: it has proven to deliver a ‘best practice’ of 93%, so I would investigate further the specific distinctions between the month 1 approach and the month 2, 3 and 4 approaches to find the ‘secret sauce’ for an at least 93% success rate, and maybe more.

  8. Jonathan Kettleborough says:


    Thanks so much for your contribution. I’ve responded privately so that I don’t give the answers away; however, thanks so much for sticking your neck out and being the first to take the Big Data challenge.

    Thanks again.


