Visualizing 2 Years of Sleep Data with Fitbit, Pandas, and Tableau

“The best bridge between despair and hope is a good night’s sleep.”
— Matthew Walker, Why We Sleep

There comes a time in every man’s life when he recognizes the importance of sleep. For me, that was around the time I got my Fitbit Charge 3. With its hassle-free sleep tracking and gamified sleep summaries, I quickly got into the habit of monitoring my sleep patterns. Little did I know that years down the line, all of this data collection could pay off with an insightful data story about sleep.

From the time I first put on my Fitbit (December 26, 2018) to the time I downloaded my data from Fitbit’s cloud service (January 28, 2021), I managed to record 774 individual sleep sessions. Though the device’s sleep monitoring is far from perfect, the combination of movement, heart rate, and temperature monitoring makes Fitbit more reliable than other common methods for sleep tracking.

Primary Data Cleaning and Wrangling with Pandas

To decide which questions to investigate, some cursory data cleaning and exploratory data analysis were required. The first step was to open up JupyterLab and read the data into a Pandas dataframe. The bulk of Fitbit’s archived data came packaged as JSON files in weekly installments. After loading those into a single dataframe, here’s what I’m working with.
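A minimal sketch of that loading step, assuming each export file holds a JSON list of sleep-session records. The `sleep-*.json` file pattern and the `load_sleep_logs` helper are illustrative assumptions, not names taken from the export itself.

```python
import json
from pathlib import Path

import pandas as pd


def load_sleep_logs(folder, pattern="sleep-*.json"):
    """Read every matching JSON file in `folder` into one DataFrame.

    Assumes each file contains a JSON list of sleep-session records.
    The default file pattern is a guess; adjust it to the real export.
    """
    records = []
    for path in sorted(Path(folder).glob(pattern)):
        with open(path, encoding="utf-8") as f:
            records.extend(json.load(f))
    return pd.DataFrame(records)
```

Calling `load_sleep_logs("path/to/export")` followed by `.head()` gives a five-row preview of the combined table.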

All in all, this appears to be a fairly well-structured and clean tabular data set. A few things stand out right away:

First, the ‘levels’ column, which holds the second-by-second observations of each sleep session, is stored as a nested JSON object. Decomposing it will be critical to finding deeper insights.

A look into a single row’s ‘levels’ object

A summation of time spent in each level of sleep

There is a lot of rich data in each ‘levels’ object, but for my purposes, I’m happy with summing the total minutes spent in each level of sleep.
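The summation can be sketched roughly as follows. This assumes each ‘levels’ object carries a `data` list whose entries have a `level` label and a duration in `seconds`; the helper name and the example values are made up for illustration.

```python
from collections import defaultdict


def minutes_per_level(levels):
    """Sum the minutes spent in each sleep level for one session.

    Assumes `levels["data"]` is a list of dicts with a "level" label
    and a duration in "seconds" (an assumed structure).
    """
    totals = defaultdict(float)
    for segment in levels.get("data", []):
        totals[segment["level"]] += segment["seconds"] / 60
    return dict(totals)


# Hypothetical 'levels' object for a single session
example_levels = {
    "data": [
        {"level": "light", "seconds": 1800},
        {"level": "deep", "seconds": 900},
        {"level": "light", "seconds": 600},
        {"level": "rem", "seconds": 1200},
    ]
}
print(minutes_per_level(example_levels))
# {'light': 40.0, 'deep': 15.0, 'rem': 20.0}
```

Applying the helper to every row of the ‘levels’ column yields one small dictionary per session, which can then be expanded into one column per level.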

Second, the ‘minutesToFallAsleep’ column shows ‘0’ in all five of the rows displayed. This suggests a data quality issue right away, as I am certainly not the kind of person to fall asleep as soon as my head hits the pillow.

Minutes to Fall Asleep Histogram

A quick histogram confirms my suspicion that this feature will not be reliable enough to use. The sole non-zero value was recorded on the first night of use at a value of 14 minutes.
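The same check can be made numerically before plotting. This is a small sketch with made-up values standing in for the real column.

```python
import pandas as pd

# Hypothetical stand-in for the real 'minutesToFallAsleep' column
sleep = pd.DataFrame({"minutesToFallAsleep": [14, 0, 0, 0, 0, 0]})

# How often each value occurs, and what share of sessions report zero
counts = sleep["minutesToFallAsleep"].value_counts()
share_zero = (sleep["minutesToFallAsleep"] == 0).mean()
print(counts)
print(f"{share_zero:.0%} of sessions report 0 minutes")
```

A share of zeros close to 100% is a strong hint that the field is not being populated, rather than a sign of instant sleep.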

Oddly, Fitbit does not record sleep data and sleep score data in the same table. So I take a look at that in pandas as well.

A look at sleep_score.csv

All looks well with this table except for different names for the sleep log ID. This will not be an issue after the tables are joined in Tableau. The only major drawback is that this dataframe only contains 508 of the original 774 sleep records.
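Although the join itself happens in Tableau, the coverage gap can be checked in pandas first. The sketch below uses toy data; the column names `logId`, `sleep_log_entry_id`, and `overall_score` are assumptions about how the two tables label their fields.

```python
import pandas as pd

# Toy stand-ins for the sleep log and the sleep score table
sleep = pd.DataFrame({"logId": [101, 102, 103],
                      "minutesAsleep": [430, 395, 120]})
scores = pd.DataFrame({"sleep_log_entry_id": [101, 102],
                       "overall_score": [80, 74]})

# Left join keeps every sleep session; `indicator` flags which ones matched
merged = sleep.merge(
    scores,
    how="left",
    left_on="logId",
    right_on="sleep_log_entry_id",
    indicator=True,
)
n_scored = (merged["_merge"] == "both").sum()
print(f"{n_scored} of {len(merged)} sessions have a sleep score")
```

Using `left_on` and `right_on` sidesteps the differing ID names without renaming either column.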

With the primary wrangling tasks complete, it is time to bring the data into Tableau.

Onward to Visualizations

The obvious first thing to look at (for me) is the distribution of time spent asleep. A quick conversion of ‘minutes asleep’ into ‘hours asleep’, and the data can be thrown into a histogram.

How is my sleep duration distributed?

Hours Asleep Histogram — Blue bars denote naps

Excluding the 52 naps that were recorded, this distribution appears to be fairly normal. The most frequent bin captured sleeps from 7.0 to 7.5 hours in duration (N = 232). The mean of this distribution is 7.23 hours with a standard deviation of 1.16 hours (thanks, numpy). Though this falls below the recommendation of 8 hours of sleep every night, it is about what I expected.

Interestingly, of the 52 naps recorded (Main Sleep == False), the mean duration was 2.02 hours. This likely overstates my true nap mean, as the Fitbit does not recognize short sleeps (≤ 30 minutes).
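For reference, the summary statistics above can be computed along these lines. The values are made up, and the column names `minutesAsleep` and `mainSleep` are assumed.

```python
import numpy as np
import pandas as pd

# Toy data: four main sleeps and one nap
sleep = pd.DataFrame({
    "minutesAsleep": [420, 450, 390, 480, 120],
    "mainSleep": [True, True, True, True, False],
})
sleep["hoursAsleep"] = sleep["minutesAsleep"] / 60

# Separate main sleeps from naps before summarizing
main = sleep.loc[sleep["mainSleep"], "hoursAsleep"].to_numpy()
naps = sleep.loc[~sleep["mainSleep"], "hoursAsleep"].to_numpy()

mean_hours = np.mean(main)
std_hours = np.std(main)  # population std; pass ddof=1 for the sample std
nap_mean = np.mean(naps)
print(mean_hours, std_hours, nap_mean)
```

Note that `np.std` defaults to the population standard deviation, while pandas’ `Series.std` defaults to the sample version, so the two can differ slightly on the same data.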

The next question I was interested in answering was how the day of the week might affect my sleep quality. To do this, I created a custom ‘Day of Week’ column and plotted that against ‘Overall Score’ in a box plot.

How does the Day of the Week affect my sleep score?

For the most part, the trend seems to stay fairly consistent every day of the week. The highest score was on a Friday with 93, while the lowest score was on a Saturday at 45. The median values of these distributions were all either 79 or 80, with the exception of Friday with the highest median of 81.
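A pandas version of the ‘Day of Week’ column and the per-day medians might look like the sketch below. The `timestamp` and `overall_score` column names and all values are assumptions for illustration.

```python
import pandas as pd

# Toy sleep-score records
scores = pd.DataFrame({
    "timestamp": ["2020-09-04T07:10:00Z",
                  "2020-09-05T09:30:00Z",
                  "2020-09-11T07:05:00Z"],
    "overall_score": [81, 70, 85],
})

# Derive the weekday name from the timestamp, then summarize per day
scores["day_of_week"] = pd.to_datetime(scores["timestamp"]).dt.day_name()
medians = scores.groupby("day_of_week")["overall_score"].median()
print(medians)
```

One design question worth settling up front is whether a session belongs to the day it started or the day it ended, since most of my sleeps cross midnight.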

Overall, day of the week does not seem to significantly affect my sleep quality.

Having enrolled in a master’s program in Data Science & Analytics (Hello, Professor), I was curious how well I’ve been keeping up with my sleep as a graduate student. A good night’s sleep has many benefits, including improvements in cognitive functioning. Conversely, poor sleep quality can have deleterious effects.

In order to assess my long-term sleep quality, I decided to create a calculated field called ‘Daily Deficit’. This field is a simple calculation of [Minutes Asleep] - 480 (8 hours in minutes). If I get more than 8 hours of sleep, Daily Deficit will be positive, and vice versa.
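The field was built in Tableau, but the same calculation in pandas is a one-liner. The values and the `minutesAsleep` and `dailyDeficit` column names below are illustrative.

```python
import pandas as pd

TARGET_MINUTES = 480  # 8 hours expressed in minutes

# Toy data: one long night, one exact night, one short night
sleep = pd.DataFrame({"minutesAsleep": [510, 480, 395]})
sleep["dailyDeficit"] = sleep["minutesAsleep"] - TARGET_MINUTES
print(sleep["dailyDeficit"].tolist())
# [30, 0, -85]
```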

In the following graph, I plot ‘Minutes Asleep’ and ‘Daily Deficit’ over time, from the start of my program to the most recent date in the data set.

How well have I slept during this program?

Blue line is ‘Minutes Asleep’ while the green line is a 14 day moving average for ‘Daily Deficit’

There’s a lot going on in this graph so let me explain:
- The x-axis is the date, from the end of August 2020 to the end of January 2021.
- The left y-axis is minutes asleep.
- The right y-axis is daily deficit.
- The horizontal constant line is both 8 hours in ‘Minutes Asleep’ and a ‘Daily Deficit’ of 0.
- The vertical constant line is the changeover from one set of classes to the next in the fall term.
- The shaded bands are reading week and winter break, when no classes were held.
- The blue line graph is individual data points for ‘Minutes Asleep’.
- The green line graph is a 14 day moving average for ‘Daily Deficit’.

I chose to use 14 days for the moving average because 7 days was more volatile than a bad day trader’s brokerage account. Unfortunately, the volatility still found its way in because a few days of bad sleep at the start of my program took 14 days to get pushed out of the moving average.
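To illustrate how long a few bad nights linger in each window, here is a small sketch with made-up numbers: three nights that are 120 minutes short, followed by nights that hit the target exactly.

```python
import pandas as pd

# Toy deficit series: three bad nights, then 17 nights exactly on target
dates = pd.date_range("2020-09-01", periods=20, freq="D")
deficit = pd.Series([-120] * 3 + [0] * 17, index=dates)

# Trailing moving averages; early values are NaN until the window fills
avg7 = deficit.rolling(window=7).mean()
avg14 = deficit.rolling(window=14).mean()

print(avg7.iloc[9])    # 0.0: the bad nights have left the 7-day window
print(avg14.iloc[15])  # still negative: one bad night remains in the window
print(avg14.iloc[16])  # 0.0: the 14-day window has finally cleared
```

The longer window is smoother but slower to forget. By default pandas leaves the first 13 values of the 14-day average empty; passing `min_periods=1` would fill them with partial-window averages instead.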

Observing the graph, I notice that there are dips in the sleep deficit toward the end of each set of classes I was enrolled in. Nowhere is it more obvious than at the start of the winter break, where a nearly linear decline in sleep deficit reverses as soon as the break begins.

There is certainly room for improvement, but this seems standard for grad students. (right?)

Next up is the topic that no one has been able to get away from for the past year: the COVID-19 pandemic.

These have been trying times for everyone: parents, students, health care workers, and major corporations alike have all had to adapt to the uncertainty of 2020. Our collective sense of reality and what the future held in store was discombobulated in an instant.

How did the initial pandemic lockdown change my sleep patterns?

Gantt Chart of time spent asleep in February, March, and April 2019/2020; Duration mapped to colour

I chose to focus on the months of February, March, and April for both 2019 and 2020. Though Canadians at large did not understand what was upon them until mid-March 2020, the context is important for comparison’s sake.

With a constant line at midnight, it is easy to recognize a fairly consistent night-owl sleep pattern through to February 2020. The pattern seems to change in March as I wound up going to bed even later more frequently. Some of the missing data points are due to data quality issues (taking off my watch) while some are truly sleepless nights.

Why not stay up to watch the sunrise if time isn’t real, right?

Finally, I wanted to get back to the ‘levels’ data that I mentioned in the first section. Fitbit tracks how ‘deep’ you are sleeping with multiple levels, including awake, REM, light, and deep. Decomposing the ‘levels’ object for each sleep session and summing the total time spent in each sleep level allows me to see how deeply I sleep.

What is the proportion of sleep levels across all data points?

Pie Chart for Proportion of Total Sleep Levels

The infamous pie chart is perfectly suited for this type of visualization. A staggering 53% of my time spent sleeping has been classified by my Fitbit as ‘Light’ sleep. The proportions attained through this method put me right in Fitbit’s typical ranges for men my age, which is reassuring.
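The proportions behind the pie chart can be sketched as follows, assuming the per-session minutes have already been summed into one column per level. The numbers and column names are invented for illustration.

```python
import pandas as pd

# Toy per-session minutes in each level (one row per sleep session)
level_minutes = pd.DataFrame({
    "light": [240, 220],
    "deep": [70, 80],
    "rem": [90, 100],
    "wake": [50, 50],
})

# Total minutes per level across all sessions, then each level's share
totals = level_minutes.sum()
proportions = totals / totals.sum()
print(proportions.round(3))
```

Summing across all sessions before dividing weights long nights more heavily than short ones; averaging each night’s own proportions would be a slightly different calculation.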

Conclusion

This project has been an excellent learning experience for me. The data retrieved through Fitbit was well-structured and had personal value to me. Finally, I have the graphs to confirm why I feel tired all the time.

I’m just short of the gold standard of 8 hours a night, so my sleeping habits still leave room for improvement.

Time to get some rest…

— Joel