In general, this is a good thing: more data is being generated. But without some amount of standardization (as every good researcher knows), we run the risk of collecting low-quality, unreliable data. Since the data itself matters far more than the models, collecting reliable data is the real challenge, and a prerequisite for any real-world operational utility.
In their book Real World AI, authors Alyssa Simpson Rochwerger and Wilson Pang highlight this as a key difference between industry and academic data science cultures:
“This is a reversal of the typical paradigm represented by academia, where data science PhDs spend most of their focus and effort on creating new models. But the data used to train models in academia are only meant to prove the functionality of the model, not solve real problems. Out in the real world, high-quality and accurate data that can be used to train a working model is incredibly tricky to collect.”
Organizations pursuing a data-driven approach need to proceed with caution if the data collection process isn’t designed for reliability. Otherwise they face one of two suboptimal outcomes: researchers or data scientists must participate in every data collection, or the quality of the data collected by laypeople remains questionable.
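One way to ease this dilemma is to encode standardization directly into the collection process. Below is a minimal sketch, with entirely hypothetical field names and thresholds, of automated completeness and range checks that flag questionable records at collection time, so a non-expert can gather data without an expert reviewing every entry:

```python
# Hypothetical schema for illustration only: required fields and plausible
# physical bounds would come from the organization's own protocol.
REQUIRED_FIELDS = {"subject_id", "jump_height_cm", "timestamp"}
VALID_RANGES = {"jump_height_cm": (5.0, 150.0)}

def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    for field, (lo, hi) in VALID_RANGES.items():
        value = record.get(field)
        if value is not None and not (lo <= value <= hi):
            problems.append(f"{field}={value} outside [{lo}, {hi}]")
    return problems

# An implausible measurement is flagged automatically, with no expert present:
print(validate_record({"subject_id": "A1",
                       "jump_height_cm": 900.0,
                       "timestamp": "2023-01-05T10:00:00"}))
# → ['jump_height_cm=900.0 outside [5.0, 150.0]']
```

Checks like these don't replace expert judgment, but they catch the most common entry errors before they contaminate a dataset.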
There is no free lunch
It is important to note that even with what we’d call high-quality or reliable data, there is no guarantee that modeling will be useful. For example, we likely wouldn’t be able to develop a useful weather forecast for Adelaide, Australia using historical data from Tulsa, Oklahoma. There is no “free lunch” with predictive modeling.
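The Tulsa-to-Adelaide point can be made concrete with a toy experiment on entirely synthetic data: a simple seasonal temperature model fit on a northern-hemisphere "city" transfers poorly to a southern-hemisphere one whose seasons are inverted. All numbers below are made up for illustration.

```python
import math
import random

random.seed(0)
DAYS = list(range(365))

def synth_climate(mean, amplitude, peak_day, noise_sd):
    """Generate a fake year of daily temperatures: seasonal wave + noise."""
    return [mean + amplitude * math.cos(2 * math.pi * (d - peak_day) / 365)
            + random.gauss(0, noise_sd) for d in DAYS]

def fit_seasonal(temps):
    """Closed-form least-squares fit of a + b*cos(w*d) + c*sin(w*d)."""
    n = len(temps)
    w = 2 * math.pi / 365
    a = sum(temps) / n
    b = 2 / n * sum(t * math.cos(w * d) for d, t in zip(DAYS, temps))
    c = 2 / n * sum(t * math.sin(w * d) for d, t in zip(DAYS, temps))
    return lambda d: a + b * math.cos(w * d) + c * math.sin(w * d)

def mae(model, temps):
    """Mean absolute error of the model over a year of observations."""
    return sum(abs(model(d) - t) for d, t in zip(DAYS, temps)) / len(temps)

# Hypothetical climates: "Tulsa" peaks mid-year, "Adelaide" in January.
tulsa = synth_climate(16, 12, peak_day=200, noise_sd=3)
adelaide = synth_climate(17, 6, peak_day=20, noise_sd=3)

model = fit_seasonal(tulsa)  # trained only on Tulsa's history
print(round(mae(model, tulsa), 1))     # small in-distribution error
print(round(mae(model, adelaide), 1))  # much larger error out of distribution
```

The fit is nearly perfect on the data it was trained on, yet badly wrong where the seasonal pattern differs: the model learned one climate, not weather in general.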
Here at Sparta Science we have been collecting and storing human movement assessment, outcome, and contextual data for over a decade, with amazing learnings along the way. But as we continue to grow into new areas, the applicability of existing insights can be more limited in new populations or unique environments. Our machine-learning pipeline and historical data modeling still provide significant out-of-the-box utility, but for organizations interested in optimizing and customizing insights, the time needed to generate relevant, high-quality data is not something that can be bought.
To leverage the true value of data, a long-term perspective is required. Not only is this approach the best way to manage expectations; it also enables organizations and individuals to objectively validate the data and modeling themselves. It can be easy to point to the results of an existing research paper or a related organization’s success as validation for your organization’s use of data. But without validating it yourself, how do you really know? Additionally, the appropriate interpretation of results and the downstream utility of data for decision-making may be unique to each organization or customer, so the data and technology need to be optimized accordingly.
NOT a replacement for human expertise, but a multiplier
This cannot and should not be seen as a threat to the human practitioners in the field. With almost every major new innovation, concerns arise that automation and computers will make humans obsolete; the reality couldn’t be further from the truth. The ability to automate manufacturing processes has greatly improved efficiency and safety for workers without removing the need for humans. Technology and data innovations in human movement health are no different: advances here will inform and enable humans to make better decisions.
While some challenges may be better addressed through data, many of the most important ones require human intellect and interaction, and the combination of data and expertise is most effective. Data and technology can’t replace human expertise, but they can make each individual exponentially more effective: they’re a multiplier.
- Image used from: Ruddy, J. D., Cormack, S. J., Whiteley, R., Williams, M. D., Timmins, R. G., & Opar, D. A. (2019). Modeling the risk of team sport injuries: a narrative review of different statistical approaches. Frontiers in physiology, 10, 829.