What is Injury Prediction, Really?

Musculoskeletal injury prediction has been called the holy grail in military and sports circles: a storied endpoint that could solve all pain, figuratively and literally. In short, a prediction, or forecast, is a statement about a future event. Yet it is commonplace to hear practitioners say that injuries cannot be predicted; there are too many factors. This viewpoint misunderstands what prediction actually means, because prediction has been practiced for decades, if not centuries, on outcomes with as many or more contributing factors than musculoskeletal injuries.

Despite the clear science supporting injury prediction, dismissive statements are easy to make because the word “predictive” has been thrown around as carelessly as “analytics” in the world of big data, especially in health technology. Many products claim to predict injuries, yet have little to no data to support the claim. Instead, the message about predicting injuries is anecdotal: fancy graphs, charts, and infographics showing how much money is lost each year to injuries.

To use a more specific definition, predictive analytics and predictive models express the increased or decreased odds of an outcome given a change in the inputs. Rather than diving into statistical jargon, consider a game of chance most of us know: Blackjack. When playing Blackjack, the probability of being dealt a “Blackjack,” or 21, is 4.78%, or about 20:1 against. But if the first card dealt is an Ace, the probability increases to 31.07%, closer to 2:1. To a coach, the idea that the odds of hitting 21 rise sharply based on the first card drawn is easy to grasp. Drawing an Ace does not guarantee a Blackjack, nor does its absence guarantee you will not get one; the Ace simply shifts the odds dramatically.
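The Blackjack arithmetic above can be reproduced directly. The figures quoted (4.78% and 31.07%) correspond to a two-deck shoe of 104 cards, an assumption made here for illustration; a single deck gives slightly different values (4.83% and 31.37%).

```python
from fractions import Fraction

# Two-deck shoe (an assumption that reproduces the figures quoted above):
# 104 cards, 8 Aces, 32 ten-value cards (10, J, Q, K).
ACES, TENS, CARDS = 8, 32, 104

# P(Blackjack) = P(Ace first, then ten-value) + P(ten-value first, then Ace)
p_blackjack = 2 * Fraction(ACES, CARDS) * Fraction(TENS, CARDS - 1)

# P(Blackjack | first card is an Ace) = P(second card is ten-value)
p_given_ace = Fraction(TENS, CARDS - 1)

print(f"P(Blackjack)             = {float(p_blackjack):.2%}")   # 4.78%
print(f"P(Blackjack | Ace first) = {float(p_given_ace):.2%}")   # 31.07%
```

Note that the Ace does not change the rules of the game, only the conditional probability, which is exactly how a risk flag works in an injury model.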

The reality is that predictive models require years of consistent, reliable, and meaningful (valid) data collection. From that data, a predictive model is built that expresses increased or decreased odds that specific injuries or performance outcomes will occur. Think about predicting injuries like predicting Blackjack – the information offers no guarantees, only increased or decreased chances of certain outcomes with each hand.

The last piece in creating predictive models is validation: take observations that were not used to build the model, run them through it, and see whether its predictions hold up. Without the right infrastructure in place, gathering enough data to build a predictive model, plus enough held-out data to validate it, takes years, even decades.
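The build-then-validate loop can be sketched in a few lines. Everything here is hypothetical: the 50/50 risk flag, the 30% vs. 10% injury rates, and the sample sizes are illustrative numbers, not real athlete data. The point is only that an odds ratio estimated on "training" observations should roughly reappear on observations the model never saw.

```python
import random

def odds_ratio(records):
    """Odds ratio of injury for flagged vs. unflagged athletes.
    records: list of (flagged: bool, injured: bool) tuples."""
    a = sum(1 for f, i in records if f and i)          # flagged, injured
    b = sum(1 for f, i in records if f and not i)      # flagged, healthy
    c = sum(1 for f, i in records if not f and i)      # unflagged, injured
    d = sum(1 for f, i in records if not f and not i)  # unflagged, healthy
    return (a * d) / (b * c)

def simulate(n):
    """Hypothetical cohort: flagged athletes get injured 30% of the time,
    unflagged 10% -- purely illustrative rates (true odds ratio ~3.9)."""
    recs = []
    for _ in range(n):
        flagged = random.random() < 0.5
        p_injury = 0.30 if flagged else 0.10
        recs.append((flagged, random.random() < p_injury))
    return recs

random.seed(7)
train = simulate(2000)    # observations used to "build" the model
holdout = simulate(2000)  # observations the model never saw

print(f"training odds ratio: {odds_ratio(train):.2f}")
print(f"hold-out odds ratio: {odds_ratio(holdout):.2f}")
```

If the hold-out odds ratio collapses toward 1.0, the flag was noise; if it lands near the training estimate, the model generalizes.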

Prediction, assigning odds to a future outcome (in this case, an injury), really comes down to collecting “good,” clean data, and a lot of it:

1. Frequent and longitudinal collection of your leading KPI 

By collecting data frequently over a long period of time, some of your observations are more likely to fall close in time to the outcome you are trying to predict (an injury). Collecting force plate data last January and trying to link it to an injury in October is difficult, particularly when the dataset has large gaps.

2. Standardized data collection for reliability 

Open-source software lets you collect anything, anywhere, but it often enforces no standardized settings, so data cannot be compared across environments. Think of fasting versus non-fasting blood draws: it is very difficult to compare an outcome across the two different panel results. If individuals perform jumps, does the software standardize collection by confirming they have been still long enough before the jump? What if they double-bounce to artificially enhance the jump? What if you collect balance data with one leg on the force plate while the other is touching the ground?
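One of the standardization checks mentioned above, confirming quiet standing before a jump, could look something like this in software. The window length and tolerance are illustrative assumptions, not a published standard.

```python
def is_still(force_n, bodyweight_n, window=1000, tol=0.05):
    """Check that an athlete stood quietly before a jump trial.

    force_n      -- vertical ground reaction force samples, in newtons
    bodyweight_n -- the athlete's bodyweight, in newtons
    window       -- samples to inspect (1000 ~ 1 s at 1 kHz; illustrative)
    tol          -- allowed deviation from bodyweight (5%; illustrative)

    Every sample in the pre-jump window must stay within +/- tol of
    bodyweight; otherwise the trial should be rejected and repeated.
    """
    lo, hi = bodyweight_n * (1 - tol), bodyweight_n * (1 + tol)
    return all(lo <= f <= hi for f in force_n[:window])

# A quiet trace passes; a trace with an early countermovement does not.
print(is_still([800.0] * 1200, 800.0))                  # quiet standing
print(is_still([800.0] * 500 + [1400.0] * 700, 800.0))  # moved too soon
```

Automating checks like this is what turns “collect anything, anywhere” into data that can actually be compared across environments.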

3. Meaningful data from proper tests & variables

Ultimately, prediction is supported by meaningful metrics. Bodyweight is an excellent example that satisfies points #1 and #2 above: it is easy to collect frequently, and the process is relatively standardized because it only requires standing still. Yet bodyweight often fails point #3 because it does not always reflect function (outcomes), which ultimately calls its value in predicting injury into question.

So, when we hear that injury prediction cannot be done, the claim often comes from experience with poor tests and testing protocols. Movement testing is often performed only once or twice a year, which rightfully fuels skepticism about both data quality and quantity. The real question is whether we can see through this noise, the fog, to predict when it will be sunny again. With a reliable and valid database, the answer is yes, and that answer grows stronger every day with more good data.