Aside from exchanging playlists with my partner every once in a while, I’m not much of a Spotify user. Around this time every year, though, all of my friends start posting their Spotify Wrapped, and I get jealous, as the platform that I listen to music on doesn’t have anything like it. It does collect data about me, of course (it’s 2023!); last year, I got to wondering whether I could make a lo-fi knockoff of Wrapped using R, the tidyverse, and the data that I have access to. You already know:
If you’re an R user and a listener of local files on the Mac Music app, this post is for you.🎁
Importing the data
In the Mac Music app, navigate to:
Music app > File > Library > Export Library
…to export a .xml file. Last year, I griped about how much of a pain in the ass it was to tidy the resulting output. This year, we can all just install the package I wrote last year and forget about our woes:
pak::pak("simonpcouch/wrapped")
The wrapped package contains a function, wrap_library(), to tidy that .xml file into a tabular data frame.
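# Assumed setup: the code below uses %>%, the dplyr verbs, and lubridate's month(),
# all of which tidyverse 2.0.0 or later attaches.
library(tidyverse)
library(wrapped)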
wrapped <- wrap_library("data/Library.xml", 2022:2023)
wrapped
# A tibble: 12,545 × 8
      id track_title         artist album genre date_added skip_count play_count
   <int> <chr>               <chr>  <chr> <chr> <date>          <dbl>      <dbl>
 1 11729 Atom                Mediu… Heal… Indi… 2023-02-11         15        234
 2 11862 Reelin'             Matt … Ever… Indi… 2023-03-24         19        208
 3 11732 Gimme Back My Soul  Mediu… Heal… Indi… 2023-02-11         11        195
 4 12179 Swim                Noah … If T… Sing… 2023-07-21         11        191
 5 11733 Never Learned To D… Mediu… Heal… Indi… 2023-02-11         12        161
 6 12088 Get The Girl        Seafo… Get … Coun… 2023-06-16         26        159
 7 11855 Everything's Fine   Matt … Ever… Indi… 2023-03-24         12        153
 8 11656 Never Learned To D… Mediu… Neve… Indi… 2022-12-26          6        143
 9 12388 Desert Land         Matt … Dese… Indi… 2023-10-22         11        141
10 12097 Given               Justi… Dayd… R&B/… 2023-06-16         11        132
# ℹ 12,535 more rows
After that, Spotify Wrapped is just group_by() %>% summarize() %>% arrange() in a trench coat.🧥
For easier printing in this blog post, I’ll rearrange the columns so that the most relevant ones print first:
wrapped <-
  wrapped %>%
  select(-id) %>%
  relocate(date_added, skip_count, .after = everything()) %>%
  relocate(play_count, .before = everything())
wrapped
# A tibble: 12,545 × 7
   play_count track_title            artist    album genre date_added skip_count
        <dbl> <chr>                  <chr>     <chr> <chr> <date>          <dbl>
 1        234 Atom                   Medium B… Heal… Indi… 2023-02-11         15
 2        208 Reelin'                Matt Cor… Ever… Indi… 2023-03-24         19
 3        195 Gimme Back My Soul     Medium B… Heal… Indi… 2023-02-11         11
 4        191 Swim                   Noah Gun… If T… Sing… 2023-07-21         11
 5        161 Never Learned To Dance Medium B… Heal… Indi… 2023-02-11         12
 6        159 Get The Girl           Seaforth  Get … Coun… 2023-06-16         26
 7        153 Everything's Fine      Matt Cor… Ever… Indi… 2023-03-24         12
 8        143 Never Learned To Dance Medium B… Neve… Indi… 2022-12-26          6
 9        141 Desert Land            Matt Cor… Dese… Indi… 2023-10-22         11
10        132 Given                  Justin N… Dayd… R&B/… 2023-06-16         11
# ℹ 12,535 more rows
Analyzing it
Top songs
The output is already arranged in descending order by play count, so we can just print the first few rows:
# A tibble: 6 × 3
  track_title            artist         play_count
  <chr>                  <chr>               <dbl>
1 Atom                   Medium Build          234
2 Reelin'                Matt Corby            208
3 Gimme Back My Soul     Medium Build          195
4 Swim                   Noah Gundersen        191
5 Never Learned To Dance Medium Build          161
6 Get The Girl           Seaforth              159
Medium! Build!
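The call that printed the table above isn’t shown. A minimal sketch of what could produce output of that shape, assuming the data frame is already sorted by play count as described; wrapped_demo and its rows are made up for illustration and stand in for the real wrapped object:

```r
library(dplyr)

# Made-up stand-in for the real `wrapped` data frame (hypothetical rows),
# already sorted by descending play count like the real one.
wrapped_demo <- tibble::tribble(
  ~play_count, ~track_title, ~artist,    ~album,    ~genre,
  30,          "Song A",     "Artist X", "Album 1", "Genre 1",
  20,          "Song B",     "Artist Y", "Album 2", "Genre 2",
  10,          "Song C",     "Artist X", "Album 1", "Genre 1"
)

# Keep only the columns of interest, then take the first rows
# (head() returns at most six by default).
top_songs <- wrapped_demo %>%
  select(track_title, artist, play_count) %>%
  head()

top_songs
```

If the data weren’t already sorted, an arrange(desc(play_count)) before head() would take care of it.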
Top artists
wrapped %>%
  group_by(artist) %>%
  summarize(play_count = sum(play_count, na.rm = TRUE)) %>%
  arrange(desc(play_count)) %>%
  head()
# A tibble: 6 × 2
  artist         play_count
  <chr>               <dbl>
1 Medium Build         1921
2 Matt Corby           1622
3 Justin Nozuka        1058
4 Noah Gundersen        907
5 Patrick Droney        569
6 Mac Ayres             546
group_by() %>% summarize()! I told you!
I will fly to Australia to see Matt Corby play live if I have to.
Top genres
One of my first steps after buying a new record is to edit its metadata to fit into one of a few pre-defined genres. Many of these categorizations are sort of silly as a result, but it does make for a nice summary:
wrapped %>%
  group_by(genre) %>%
  summarize(play_count = sum(play_count, na.rm = TRUE)) %>%
  arrange(desc(play_count)) %>%
  head(5)
# A tibble: 5 × 2
  genre                  play_count
  <chr>                       <dbl>
1 Indie/Alternative            5337
2 Singer-Songwriter/Folk       3937
3 R&B/Soul                     2855
4 Country                      2258
5 Indie Pop                     971
Sort of confused by the existence of the “Indie Pop” category.😕 Definitely need to clean up some of those entries.
You can selectively use the n argument to head() to hide things that you’re embarrassed about.
Top albums
wrapped %>%
  group_by(album, artist) %>%
  summarize(play_count = sum(play_count, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(play_count)) %>%
  head()
# A tibble: 6 × 3
  album                        artist         play_count
  <chr>                        <chr>               <dbl>
1 Everything's Fine            Matt Corby           1217
2 Health - EP                  Medium Build          971
3 Never Learned To Dance       Medium Build          819
4 Daydreams and Endless Nights Justin Nozuka         736
5 If This Is The End           Noah Gundersen        598
6 Comfortable Enough           Mac Ayres             428
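The artist, genre, and album summaries above all share one shape, so they could be folded into a small helper. This is only a sketch: top_by() is a hypothetical function (it is not part of the wrapped package), and wrapped_demo is made-up data standing in for the real library.

```r
library(dplyr)

# Hypothetical helper: total plays per group, largest first.
# `...` takes the grouping columns; `n` is passed on to head().
top_by <- function(data, ..., n = 6) {
  data %>%
    group_by(...) %>%
    summarize(play_count = sum(play_count, na.rm = TRUE), .groups = "drop") %>%
    arrange(desc(play_count)) %>%
    head(n)
}

# Made-up rows for illustration only.
wrapped_demo <- tibble::tribble(
  ~track_title, ~artist,    ~album,    ~genre,    ~play_count,
  "Song A",     "Artist X", "Album 1", "Genre 1", 30,
  "Song B",     "Artist Y", "Album 2", "Genre 2", 20,
  "Song C",     "Artist X", "Album 1", "Genre 1", 10,
  "Song D",     "Artist Z", "Album 3", "Genre 2", NA
)

top_artists <- top_by(wrapped_demo, artist)
top_albums  <- top_by(wrapped_demo, album, artist, n = 2)

top_artists
top_albums
```

Grouping albums by both album and artist, as the post does, keeps two different artists’ albums with the same title from being merged.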
Bonus points
There are a couple summarizations that Wrapped doesn’t do that I’m curious about.
Top song by month
I don’t have the right level of observation to see which songs I listened to the most every month, but I do have a variable giving the date I added a given song. We can use that information to find the top songs by month added:
wrapped %>%
  mutate(month = month(date_added)) %>%
  group_by(month) %>%
  summarize(
    track_title = track_title[which.max(play_count)],
    artist = artist[which.max(play_count)]
  ) %>%
  head(11)
# A tibble: 11 × 3
   month track_title            artist
   <dbl> <chr>                  <chr>
 1     1 Sad Song               Brandon Ratcliff
 2     2 Atom                   Medium Build
 3     3 Reelin'                Matt Corby
 4     4 Be Yourself            Wilder Woods
 5     5 tennessee is mine      Alana Springsteen
 6     6 Get The Girl           Seaforth
 7     7 Swim                   Noah Gundersen
 8     8 You Take The High Road Bruno Major
 9     9 Better Days            Noah Gundersen
10    10 Desert Land            Matt Corby
11    11 PANIC ATTACK           Clinton Kane
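One caveat with this approach: month() keeps only the month number, so songs added in the same month of different years land in the same group. If the library spans more than one calendar year, flooring each date to the first of its month keeps the years apart. A minimal sketch using lubridate’s floor_date() on made-up data (wrapped_demo is hypothetical):

```r
library(dplyr)
library(lubridate)

# Made-up rows: two songs added in December 2022, one in December 2023.
wrapped_demo <- tibble::tibble(
  track_title = c("Song A", "Song B", "Song C", "Song D"),
  artist      = c("Artist X", "Artist Y", "Artist X", "Artist Z"),
  date_added  = as.Date(c("2022-12-26", "2022-12-30", "2023-12-01", "2023-02-11")),
  play_count  = c(30, 5, 10, 20)
)

# Group by year and month together by rounding each date down to the
# first day of its month, then pick the most-played song per group.
by_year_month <- wrapped_demo %>%
  mutate(month_added = floor_date(date_added, "month")) %>%
  group_by(month_added) %>%
  summarize(
    track_title = track_title[which.max(play_count)],
    artist = artist[which.max(play_count)]
  )

by_year_month
```

Here the two Decembers stay separate, whereas grouping on month(date_added) alone would have combined them into a single row.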
Top artist by genre
wrapped %>%
  group_by(genre, artist) %>%
  summarize(play_count = sum(play_count, na.rm = TRUE), .groups = "drop") %>%
  group_by(genre) %>%
  summarize(
    artist = artist[which.max(play_count)],
    play_count = play_count[which.max(play_count)]
  ) %>%
  arrange(desc(play_count)) %>%
  head()
# A tibble: 6 × 3
  genre                  artist            play_count
  <chr>                  <chr>                  <dbl>
1 Indie/Alternative      Matt Corby              1459
2 R&B/Soul               Justin Nozuka           1005
3 Indie Pop              Medium Build             971
4 Singer-Songwriter/Folk Noah Gundersen           598
5 Country                Alana Springsteen        538
6 Bluegrass              Mighty Poplar            369
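The pair of which.max() calls can also be written with dplyr’s slice_max(), which keeps the whole top row per group instead of rebuilding it column by column. A sketch on made-up data (wrapped_demo is hypothetical):

```r
library(dplyr)

# Made-up rows for illustration only.
wrapped_demo <- tibble::tribble(
  ~track_title, ~artist,    ~genre,    ~play_count,
  "Song A",     "Artist X", "Genre 1", 30,
  "Song B",     "Artist Y", "Genre 2", 20,
  "Song C",     "Artist X", "Genre 1", 10,
  "Song D",     "Artist Y", "Genre 1", 5,
  "Song E",     "Artist Z", "Genre 2", 15
)

top_artist_by_genre <- wrapped_demo %>%
  # Total plays for each artist within each genre.
  group_by(genre, artist) %>%
  summarize(play_count = sum(play_count, na.rm = TRUE), .groups = "drop") %>%
  # Keep the single most-played artist per genre; with_ties = FALSE
  # returns exactly one row per genre even if two artists are tied.
  group_by(genre) %>%
  slice_max(play_count, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(desc(play_count))

top_artist_by_genre
```

Leaving with_ties at its default of TRUE would instead return every artist tied for first place in a genre.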
Moved on
How many albums in my library did I not listen to at all this year? (I reset the play count for all of my library to zero each time I do this analysis.)
wrapped %>%
  group_by(album, artist) %>%
  summarize(play_count = sum(play_count, na.rm = TRUE), .groups = "drop") %>%
  filter(play_count == 0) %>%
  count()
# A tibble: 1 × 1
      n
  <int>
1  1195
That number is a lot bigger than I thought.😬
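To put that count in context, the same album-level totals can also give the share of albums with no plays. A rough sketch on made-up data (wrapped_demo is hypothetical). Note that with na.rm = TRUE, an album whose play counts are all missing sums to zero and so counts as unplayed here.

```r
library(dplyr)

# Made-up rows: Album 2 has explicit zeros, Album 3 has a missing play count.
wrapped_demo <- tibble::tribble(
  ~track_title, ~artist,    ~album,    ~play_count,
  "Song A",     "Artist X", "Album 1", 30,
  "Song B",     "Artist X", "Album 1", 10,
  "Song C",     "Artist Y", "Album 2", 0,
  "Song D",     "Artist Y", "Album 2", 0,
  "Song E",     "Artist Z", "Album 3", NA,
  "Song F",     "Artist W", "Album 4", 5
)

# Total plays per album.
album_plays <- wrapped_demo %>%
  group_by(album, artist) %>%
  summarize(play_count = sum(play_count, na.rm = TRUE), .groups = "drop")

# Count and proportion of albums with zero total plays.
unplayed <- album_plays %>%
  summarize(
    n_albums = n(),
    n_unplayed = sum(play_count == 0),
    share_unplayed = mean(play_count == 0)
  )

unplayed
```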