December 26, 2017

    Does Your Data Make Sense? Why Norms Matter

    1-Rep Max (1RM) testing has been used for decades to measure absolute strength and evaluate strength gains and losses over time. This type of testing is ingrained in our industry and continues to be extremely popular because the data is thought to be relatively easy to interpret. Athlete A is better than Athlete B because he/she has a higher 1RM. Athlete C has gotten better because their 1RM has improved, while Athlete D has not. Pretty simple, right? Well, no. While 1RMs are seductively simple, these measures are not valid predictors of performance. There is no doubt strength is important, but more is not always better.

    Despite this, the standard strength, speed, agility, and power tests are still commonplace in the world of strength and conditioning. This is largely because these values make the most sense to athletes and sport coaches. Generally speaking, they are familiar with these tests, know the norms, and can tell a good performance from a bad one. For example, even though acceleration and max velocity are thought to be better indicators of performance, the 40-yard sprint is still the most commonly used test of speed. If you instead test an athlete’s acceleration using a 10-yard sprint, the first question you will get after reporting their time will always be, “so… is that good?” Rather than attempting to explain or even find the answer to that question, many coaches choose the comfortable path and continue to test the 40, or the squat, or the bench press… because, well, that’s what we’ve always done.

    Translating the Science

    With the lack of validity in these more standard testing metrics (and a boom in technology), scientists and researchers have dived headfirst into trying to identify better KPIs that can predict performance. Force plates, accelerometers, high-speed cameras, and global positioning systems (just to name a few) give us thousands of different variables to measure and analyze. Easily understood data such as time (seconds), weight (lbs.), and distance (inches) unfortunately do not show the validity that we would like. More advanced measures of force (N), velocity (m/s), and power (watts) show much more promise, but are harder for a coach or athlete to understand. These advanced concepts may be invaluable, but most of our coaches and athletes will never have (nor should they need) the knowledge of biomechanics or physiology required to truly understand them. We must be able to simplify and translate. As scientists continue to dive deeper, the ability to take action on this data will likely be limited by the ability to explain it to those who matter most.

    Where do norms come from?

    Not only do we need to be able to explain these metrics in simpler terms, but we also need to be able to immediately answer that first question we will no doubt get: “so… is that good?” To do this, we need to collect enough data from across the population to understand the distribution and identify what these norms are. This takes time. Unfortunately, we cannot simply compile massive amounts of unstandardized, subjective, unreliable data and expect to create a meaningful dataset. In computer science, the saying “garbage in, garbage out” captures this concept: if the input data is flawed, no amount of analysis can create a meaningful output.

    For example, the subjectivity of squat depth will greatly influence the amount of weight an athlete is able to successfully lift. Simply type Max Squat Test into YouTube and take a look at the wide range of squat depths that exist. Is my team better than yours because I have more 600 lb. squatters? Unlikely.

    Attempting to compare these numbers across organizations is comparing apples to oranges, just as comparing hand-timed 40s to laser times does not allow for accurate data interpretation. Little Johnny’s hand-timed 4.2 in the 40-yard sprint, run on a track with blocks and spikes, will hardly translate to an NFL player’s combine time. Even “normative” data taken from texts and articles can be dangerous, as sample sizes are often small and methods can differ. For example, how you measure something as simple as vertical jump height (force plate, contact mat, Vertec) will greatly influence the results! Only with the standardization of equipment and protocols are we able to create data reliable enough to aggregate and find these norms.

    Ok… so… is that good?

    Finally, we can answer our original question. Here at Sparta, we do this by utilizing a statistical tool known as a T-Score. The T-Score was first popularized when measuring Bone Mineral Density (BMD) to identify risk of osteoporosis. The following is an excerpt from an article discussing the history of the T-Score:

    “As bone density technology evolved, it became clear BMD expressed in raw units would be difficult to interpret. Ideally, for BMD measurements to be clinically useful, they should be presented in terms that are readily understandable by patients and clinicians, as well as independent of the densitometer used or the skeletal site measured.”

    “Unlike common clinical measurements, such as blood pressure or cholesterol, the accepted normal values for BMD are not generally known. The T-score was suggested by researchers to simplify the interpretation of the bone density result and avoid the use of raw BMD values.” (1)

    The challenges these practitioners faced are very similar to the challenges in sports science today. Instead of reinventing the wheel, we can simply learn and apply, “standing on the shoulders of giants.”

    Immediately after an individual performs our countermovement jump assessment, their results (or Movement Signature) are displayed as a series of vertical bars normalized to the database. A value of 50 represents the mean, or average, of the population, and every 10 points in either direction (40 or 60) represents one standard deviation away from the mean. By normalizing data using T-Scores, coaches can compare values between individuals and across populations using a standardized scale. The fact that the raw variables have different units and different scales doesn’t affect the ability to interpret the results and, MOST importantly, to quickly relay this information to those who matter most!
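
    The scaling described above can be sketched in a few lines of Python. This is only an illustration of the general T-Score formula (50 plus 10 times the number of standard deviations from the mean), not Sparta’s actual implementation. The function name `t_score`, the jump heights, and the choice of a population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev


def t_score(raw, reference_mean, reference_sd):
    """Rescale a raw value so the reference mean maps to 50 and one SD to 10 points."""
    if reference_sd <= 0:
        raise ValueError("reference_sd must be positive")
    z = (raw - reference_mean) / reference_sd  # standard deviations from the mean
    return 50 + 10 * z


# Hypothetical reference data (made-up jump heights in cm, for illustration only).
reference_jump_heights_cm = [30.0, 35.0, 40.0, 45.0, 50.0]
ref_mean = mean(reference_jump_heights_cm)
# Assumption: population SD; a real database might use a sample SD instead.
ref_sd = pstdev(reference_jump_heights_cm)

# An athlete who jumps 47 cm lands about one SD above the reference mean.
athlete_score = t_score(47.0, ref_mean, ref_sd)
print(round(athlete_score, 1))
```

    Because the output is unitless, the same function works for a force in newtons or a time in seconds, as long as each variable is scored against the mean and standard deviation of its own reference data.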

    Without these T-Scores derived from a large and diverse database, the only way to evaluate scores would be within-individual percent change. Furthermore, while you could compare or rank individuals within a population, there would be no way to know the potential biases of that specific population. For example, a group of elite basketball players might all have similar absolute scores for force plate variables, but these patterns may be a result of similar athletic backgrounds and training histories. By incorporating a much larger and broader population, it is possible to better recognize strengths and weaknesses within a global movement assessment. While often overlooked, the size and breadth of the database behind Sparta’s T-Scores is one of its key advantages.

    Simplicity and Alignment


    While the simplicity of traditional performance tests is appealing, we can do better. The advancements in research and technology have allowed us to identify better KPIs, but these assessments won’t become mainstream until we can explain the results effectively. By utilizing an aggregated database and statistical tools such as T-Scores, practitioners can have the best of both worlds: highly technical and scientific measures with clear results anyone can understand.


    1. Faulkner, Kenneth G. “The Tale of the T-score: Review and Perspective.” (2005): 347-352.
