November 30, 2023

    Complete Collection: The Data You See and The Data You Don't



    In the era of advanced analytics and continuous learning, data is central to understanding and improving human experiences. In human data, each layer tells its own story, from the raw signals captured by wearable sensors to the high-level metrics that summarize our health and habits. Every layer contributes to a comprehensive understanding of human behavior.

    This article not only explores examples of the various layers of human data but also underscores the critical importance of collecting and storing data at multiple levels. A complete approach to human data collection, embracing all available layers, is indispensable for those seeking actionable insights and continual learning.

    The Deeper Layers of Human Data

    Complete data collection begins with raw sensor data.

    In the consumerized realm of wearables, it's easy to overlook that much of the human data we engage with originates from complex sensors that use fundamental physical principles (gravity, light, electromagnetism) to generate time-series data, an often unseen layer of data.

    [Image: wearable data source]

    This data undergoes processing by software, resulting in interpretable metrics such as current glucose levels, last night's HRV, or last month's step count—the data you see. Yet, even at this stage, the time-series layer has undergone a transformation.
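
    As a rough illustration of that transformation, the sketch below turns a short series of beat-to-beat (RR) intervals into a single HRV number and stores both layers together. RMSSD is used only as one common HRV metric; the article does not say which metric any particular device reports. The interval values and the record field names are made up for the example, and RR intervals are themselves already one step removed from the optical or electrical signal the sensor records.

```python
import math


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Differences between each interval and the one that follows it.
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical beat-to-beat intervals in milliseconds (illustrative values).
rr = [800, 810, 790, 805, 795]

# Keep the time-series layer next to the summary metric derived from it.
record = {
    "raw_rr_ms": rr,                      # the data you don't see
    "hrv_rmssd_ms": round(rmssd(rr), 1),  # the data you see
}
print(record)
```

    Storing only `hrv_rmssd_ms` would discard the intervals; storing both keeps the option of recomputing or inspecting the metric later.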

    Complete collection ensures that all available and relevant data layers are captured and stored.

    Wearables are not the sole contributors to the multi-layered human data spectrum. A previous article discusses our approach to the different layers (or levels) of data captured from force sensors. ECGs (image below) are another example: the sensors first produce time-series data, which is often transformed into summary metrics.

    In the example below, the information enclosed in the red box at the top indicates computer-estimated values (i.e., summary metrics) for heart rate, PR interval, QRS duration, and the mean P, QRS, and T wave electrical axes.

    [Image: ECG data]

    Hidden Layers of "Analog Data"

    Beyond physically measurable phenomena, subjective, cognitive, and expert-derived human data often occupies distinct layers.

    Scores from psychological tests, surveys, or prompts provide insights into the intricacies of human thoughts and emotions. Many such tests combine individual questions into scores for different domains, which are then compared against specific cutoffs. Storing only an overall score is not enough; the responses to individual questions matter too.
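
    A minimal sketch of what storing every layer of a questionnaire could look like is shown below. The question IDs, the two domains, and the simple sum scoring are all hypothetical; real instruments define their own items, scoring rules, and cutoffs.

```python
# Hypothetical mapping of question IDs to domains (not a real instrument).
DOMAINS = {
    "mood": ["q1", "q2"],
    "sleep": ["q3", "q4"],
}


def score_survey(responses):
    """Return item responses, domain scores, and total score in one record."""
    domain_scores = {
        domain: sum(responses[q] for q in questions)
        for domain, questions in DOMAINS.items()
    }
    return {
        "responses": dict(responses),       # item-level layer
        "domain_scores": domain_scores,     # intermediate layer
        "total_score": sum(domain_scores.values()),  # summary layer
    }


result = score_survey({"q1": 2, "q2": 3, "q3": 0, "q4": 1})
print(result)
```

    Because the item-level responses stay in the record, a domain score can be rechecked or rescored later without going back to the respondent.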

    Sometimes, the richness of human experiences lies in textual notes—from interviews, clinical observations, or personal narratives—providing a qualitative layer to quantitative data. Neglecting these textual elements could mean overlooking valuable context and nuances linked to overall scores, diagnoses, or recommendations.

    The Importance of Complete Data Collection

    While storing all this raw information may sound interesting, you may argue that it is often just the calculated, domain-specific, or summary metrics that offer actionable insights.

    Heart rate variability, sleep quality scores, and activity indices (again, the data you see) are metrics derived from raw sensor data and are often more directly useful in practice.

    So let us explain why capturing the data you don't see is critical:

    1. Preserving Information Integrity

    Each layer of data holds a unique piece of the puzzle. Complete data collection ensures that no valuable information is lost in translation. By preserving both raw sensor data and calculated metrics, organizations can examine human behavior at whatever level of detail a question requires.

    2. Evaluating Reliability

    Sensor malfunctions or inaccuracies in complex systems may go unnoticed if only processed data is stored. Preserving raw data allows anomalies to be traced back to their source. In scenarios with human input, storing only final scores makes it difficult to trace and correct errors in survey responses or manually recorded data before analysis. This is especially critical in real-world data collected in uncontrolled settings.
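
    The toy example below shows how a summary value can look plausible while the underlying samples contain a sensor dropout. The heart rate samples and the 30 to 220 bpm plausibility range are assumptions chosen for illustration, not clinical limits.

```python
from statistics import mean


def flag_out_of_range(samples, low, high):
    """Return the indices of samples that fall outside [low, high]."""
    return [i for i, value in enumerate(samples) if not (low <= value <= high)]


# Hypothetical heart rate samples (bpm); the zeros mimic a sensor dropout.
raw_hr = [62, 64, 0, 0, 180, 63]

# The summary alone looks like an ordinary resting heart rate.
summary = mean(raw_hr)

# With the raw layer stored, the dropout can be traced to specific samples.
suspect = flag_out_of_range(raw_hr, low=30, high=220)

print(summary, suspect)
```

    If only `summary` had been stored, nothing would indicate that two of the six samples were invalid.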

    [Image: cluster analysis]

    3. Facilitating Advanced Analytics

    For those aiming to harness the power of advanced analytics, the ability to analyze data at various levels is paramount. Predictive modeling, machine learning, and artificial intelligence thrive on diverse datasets. Complete human data collection lays the groundwork for more accurate predictions and insights.

    4. Future-proofing

    Progress in clinical and performance research and development often leads to improvements in algorithms or calculation methods. If the raw data is preserved, organizations can apply these innovations retrospectively to historical datasets. In healthcare, for example, updated diagnostic criteria or scoring systems can be applied to historical patient data. Storing raw data ensures compatibility with evolving technology and future-proofs datasets.
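
    One way to picture retrospective reprocessing is sketched below: an archive keeps the raw signal, and each algorithm version writes its result alongside the earlier ones. The threshold-crossing counter is only a stand-in for a real step-detection or scoring algorithm, and the version names, thresholds, and record layout are hypothetical.

```python
def count_crossings(signal, threshold):
    """Count upward crossings of `threshold` in a sampled signal."""
    return sum(
        1 for prev, cur in zip(signal, signal[1:]) if prev < threshold <= cur
    )


# Hypothetical algorithm versions; "v2" stands in for a later improvement.
ALGORITHMS = {
    "v1": lambda signal: count_crossings(signal, threshold=1.5),
    "v2": lambda signal: count_crossings(signal, threshold=1.2),
}


def reprocess(archive, version):
    """Apply one algorithm version to every stored raw record."""
    algorithm = ALGORITHMS[version]
    for rec in archive:
        # Results are stored per version, so earlier values are not overwritten.
        rec.setdefault("metrics", {})[version] = algorithm(rec["raw"])
    return archive


# Illustrative archive with one session of raw samples.
archive = [{"id": "session-1", "raw": [1.0, 1.3, 1.0, 1.6, 1.0, 1.25, 1.0]}]

reprocess(archive, "v1")  # metric as originally computed
reprocess(archive, "v2")  # new method applied to the same historical data
print(archive[0]["metrics"])
```

    Had only the "v1" count been stored, the "v2" result could not be produced for sessions recorded before the new method existed.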

    5. Enabling Continuous Learning

    Continuous learning relies on a constant stream of quality data. Organizations can iteratively refine models, algorithms, and processes by collecting and storing data comprehensively. This, in turn, enhances the ability to adapt and improve based on evolving human behaviors.

    Challenges and Considerations

    While the benefits of complete data collection are evident, challenges abound. Privacy concerns, ethical considerations, and the sheer volume of data pose significant hurdles. Striking a balance between collecting enough data for robust analysis and respecting individual privacy is a delicate task that requires careful planning and execution.

    Take Home

    • Complete Collection ensures that all available and relevant data layers are captured and stored.
    • Both technology-derived and traditional paper-and-pencil data collection often result in different layers of data being collected and stored (or discarded).
    • Complete data collection is critical for preserving information integrity, evaluating reliability, facilitating advanced analytics, adapting to changes or innovations, and enabling continuous learning. This helps to pave the way for a future of informed decision-making and innovation.

