Tools like Analytics and Emerge are great for drilling down to see what’s going on for individual pupils, and for doing so in a really granular way. For example, want to find out if Joanna’s behaviour is better on some days than on others? No problem. Want to check whether Fred’s absences coincide with Amin’s? Easy peasy. Having that level of detail available enables you to fine-tune what you do in order to address issues, even before they become intractable (see, for example, Warning signs of serial truants).
But there is also a lot of potential value in using anonymised big data, which Groupcall Analytics can generate. To make it truly anonymised, you need to strip out not only pupils’ names, but also other identifying characteristics, which could even include the courses they are taking. For instance, if a report refers to the three pupils taking ‘A’ Level Chemistry, it won’t take someone much effort to find out their names.
Indeed, according to research carried out in 2015, only four pieces of data are needed to ‘de-anonymise’ anonymous data, while other research has found that only three pieces are required. So, making data anonymous is not just a matter of deleting pupils’ names, but also other identifying data such as date of birth. (You might not think it essential to go to such lengths, but if your analysis of the data identifies a problem of truancy, and it would be easy for a third party to find out the names of those pupils, then you would have a problem, not least from a data protection point of view.)
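As a rough sketch of what this involves (the field names and the minimum group size here are illustrative assumptions, not Groupcall’s actual export format or any official guidance), anonymising a pupil-level extract means both dropping direct identifiers and suppressing any group so small that membership alone would give the game away:

```python
# Illustrative sketch of anonymising a pupil-level extract.
# Field names and the threshold are hypothetical.

DIRECT_IDENTIFIERS = {"name", "date_of_birth", "address"}
MIN_GROUP_SIZE = 3  # suppress any course group smaller than this

def anonymise(records):
    """Strip direct identifiers, then drop records whose course group
    is too small to hide in (e.g. three 'A' Level Chemistry pupils)."""
    # Count pupils per course first
    course_counts = {}
    for r in records:
        course_counts[r["course"]] = course_counts.get(r["course"], 0) + 1

    anonymised = []
    for r in records:
        if course_counts[r["course"]] < MIN_GROUP_SIZE:
            continue  # group too small: identifiable, so suppress it
        anonymised.append({k: v for k, v in r.items()
                           if k not in DIRECT_IDENTIFIERS})
    return anonymised

pupils = [
    {"name": "Joanna", "date_of_birth": "2007-03-01", "course": "Maths"},
    {"name": "Fred",   "date_of_birth": "2007-05-12", "course": "Maths"},
    {"name": "Priya",  "date_of_birth": "2007-11-02", "course": "Maths"},
    {"name": "Amin",   "date_of_birth": "2007-08-30", "course": "Chemistry"},
]

print(anonymise(pupils))
```

The single Chemistry pupil is removed entirely, and the remaining records keep only the course, nothing that points back at an individual.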
So, back to big data. The crucial difference between traditional research and analysis on the one hand, and big data on the other, is that with the former, you start with a hypothesis or a question. You might ask, for example, “Is lateness worse on Mondays than on other days?” or “Is Fred absent at the same times as Amin?”. With big data, though, all you have is a mass of data, which you feed into a program to find out whether there are any correlations you were not aware of, and which you could potentially act upon.
Suppose, for instance, that you had Analytics analyse the data from all the schools in your MAT, that is from several thousand pupils, and discovered that on Mondays more pupils were late than on other days, and as a result more pupils missed the first lesson. This is the kind of data that would be known to individual tutors, and to individual headteachers, but they, or the Executive Headteacher, would not necessarily be aware that it was a MAT-wide issue. Once such a fact has become apparent, steps can be taken to address it.
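A minimal sketch of that kind of MAT-wide aggregation might look like the following (the records are invented for illustration; in practice they would come from each school’s MIS export):

```python
from collections import Counter

# Hypothetical late-arrival records pooled across a MAT:
# one (school, weekday) pair per late arrival.
late_arrivals = [
    ("School A", "Mon"), ("School A", "Mon"), ("School A", "Tue"),
    ("School B", "Mon"), ("School B", "Wed"),
    ("School C", "Mon"), ("School C", "Fri"),
]

# Each school sees only its own handful of lates; pooled together,
# a pattern no single headteacher could see emerges.
by_weekday = Counter(day for _, day in late_arrivals)
print(by_weekday.most_common(1))  # prints [('Mon', 4)]
```

No one school’s two or three Monday lates looks remarkable on its own; it is only the pooled count that reveals the trend.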
Another possibility is to analyse pupils’ Learning Platform log-in data. On an individual level, failure to log in very often could be an early warning sign of disengagement. Combined with attendance and punctuality data, such analysis should provide the school with the information needed to make early interventions to prevent pupils dropping out of school or going ‘off the rails’ in some other way. But on a big data level, if it turns out that hardly any pupils log in regularly, that suggests there is something wrong with the Learning Platform or the logging-in process itself. It might even indicate that, having logged in once, pupils are kept permanently logged in rather than being ‘timed out’. Again, steps can be taken to fix the problem.
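The same log-in data can be read at both levels with one simple check, sketched below (the counts and the “rarely logs in” threshold are invented for illustration):

```python
# Sketch: reading login data at two levels.
# Individually, a low count flags possible disengagement; if low
# counts are near-universal, suspect the platform or login process.

logins_per_term = {"Joanna": 42, "Fred": 2, "Amin": 38, "Priya": 40}
THRESHOLD = 5  # hypothetical cut-off for "rarely logs in"

rarely = [pupil for pupil, n in logins_per_term.items() if n < THRESHOLD]

if len(rarely) / len(logins_per_term) > 0.8:
    # A cohort-wide pattern points at the system, not the pupils
    print("Most pupils rarely log in - check the platform itself")
else:
    # A small number of outliers points at individual pupils
    print(f"Possible early-warning flags: {rarely}")
```

Here only Fred is flagged, so the signal is an individual one; had nearly everyone fallen below the threshold, the sensible first question would be about the platform, not the pupils.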
It’s important to bear in mind, of course, that correlation does not necessarily mean causation, and some of the trends highlighted might be more apparent than real. Therefore, your big data analysis might be best thought of as a way of generating questions rather than answers. Is this a real phenomenon? What might be causing it? Once you’ve formulated several such questions, you will be in a good position to adopt a more traditional approach: if we try X, will that improve the situation? And the good news is that a MAT is in a prime position to try small-scale pilot approaches in different schools, and then compare the results.