Title: | Data from all Seasons of Survivor (US) TV Series in Tidy Format |
---|---|
Description: | Several datasets which detail the results and events of each season of Survivor. This includes details on the cast, voting history, immunity and reward challenges, jury votes and viewers. This data is useful for practicing data wrangling, graph analytics and analysing how each season of Survivor played out. Includes 'ggplot2' scales and colour palettes for visualisation. |
Authors: | Daniel Oehm [aut, cre], Carly Levitz [ctb], Dario Mavec [ctb] |
Maintainer: | Daniel Oehm <[email protected]> |
License: | MIT + file LICENSE |
Version: | 2.3.4 |
Built: | 2024-11-15 11:29:48 UTC |
Source: | https://github.com/doehm/survivor |
A dataset containing the details and characteristics of each idol and advantage. This maps to 'advantage_movement'
advantage_details
This data frame contains the following columns:
version
Country code for the version of the show
version_season
Version season key
season_name
The season name
season
The season number
advantage_id
The ID / primary key of the advantage
advantage_type
Advantage type e.g. hidden immunity idol, extra vote, steal a vote, etc
clue_details
Details of whether a clue existed for the advantage and, if so, where the clue was found
location_found
The location the idol or advantage was found
conditions
Extra details about the unique conditions of the idol or advantage
There are split idols which need to be combined to be played. In these cases the first one found is given an ID. The second or subsequent parts are given the same ID with a trailing letter. For example, in season 40 Denise found an idol that was split (USHI4002). Later she found the other half (USHI4002b). When played, the second half is considered to have been 'absorbed' into the first idol. The first idol found is always considered the primary idol.
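A minimal sketch of the ID convention described above, assuming that primary IDs end in a digit and that later parts differ only by a single trailing lower-case letter. The vector below is toy data, not taken from the package.

```r
# Toy advantage IDs following the convention described above (not package data).
advantage_id <- c("USHI4001", "USHI4002", "USHI4002b")

# Strip a single trailing lower-case letter to recover the primary idol's ID.
primary_id <- sub("[a-z]$", "", advantage_id)

# Flag the rows that are second or subsequent parts of a split idol.
is_secondary_part <- advantage_id != primary_id

data.frame(advantage_id, primary_id, is_secondary_part)
```

Grouping by 'primary_id' would then treat all parts of a split idol as one advantage.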
A dataset containing the movement details of each advantage or hidden immunity idol. Each row is considered an event e.g. the idol was found, played, etc. If the advantage changed hands it records who received it. The logical flow is identified by the 'sequence_id'.
advantage_movement
This data frame contains the following columns:
version
Country code for the version of the show
version_season
Version season key
season_name
The season name
season
The season number
castaway
Name of the castaway involved in the event e.g. found, played, received, etc.
castaway_id
ID of the castaway (primary key). Consistent across seasons and name changes e.g. Amber Brkich / Amber Mariano. The first two letters reference the country of the version played e.g. US, AU.
advantage_id
The ID / primary key of the advantage
sequence_id
The sequence of events. For example 'sequence_id == 1' usually means the advantage was found. Each subsequent event follows the 'sequence_id'
day
The day the event occurred
episode
The episode the event occurred
event
The event e.g. the advantage was found, played, received, etc
played_for
If the advantage or idol was played this records who it was played for
played_for_id
The ID of the castaway the advantage or idol was played for
success
If the play was successful or not. Only relevant for advantages since playing a hidden immunity idol is always successful in terms of saving who it was played for.
votes_nullified
In the case of hidden immunity idols this is the count of how many votes were nullified when played
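As a rough illustration of how 'sequence_id' gives the logical flow of events, the sketch below sorts a toy table and keeps the last event for each advantage. The IDs, names and event labels are hypothetical stand-ins and may not match the values used in the package.

```r
# Toy rows mimicking a few 'advantage_movement' columns (hypothetical values).
movement <- data.frame(
  advantage_id = c("A1", "A1", "A1", "A2", "A2"),
  sequence_id  = c(2, 1, 3, 1, 2),
  castaway     = c("Bo", "Al", "Bo", "Cy", "Cy"),
  event        = c("Received", "Found", "Played", "Found", "Expired"),
  stringsAsFactors = FALSE
)

# Sort so that each advantage's events read in their logical order.
movement <- movement[order(movement$advantage_id, movement$sequence_id), ]

# Keep the final event per advantage, i.e. the row with the largest sequence_id.
is_last <- movement$sequence_id ==
  ave(movement$sequence_id, movement$advantage_id, FUN = max)
final_event <- movement[is_last, c("advantage_id", "castaway", "event")]
final_event
```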
The details of the items purchased at the Survivor Auction. 'survivor_auction' is at the castaway level and includes all castaways whether or not they purchased an item, and 'auction_details' is at the item level.
auction_details
This data frame contains the following columns:
version
Country code for the version of the show
version_season
Version season key
season_name
The season name
season
The season number
item
Item number
item_description
Item description
category
The item category. See details for more.
castaway
Castaway
castaway_id
Castaway ID
covered
If the item was covered or not
cost
The amount paid for the item
money_remaining
How much money the castaway has remaining
auction_num
If the same item is auctioned for a second time it has a value of 2
participated
The names of castaways that could participate in the purchased item e.g. sharing a tub of peanut butter with the tribe
notes
Additional notes
alternative_offered
If an alternative was offered to the player after purchase
alternative_accepted
If they accepted the alternative offer
other_item
Description of the refused item
other_item_category
Category of the refused item
Each item has been categorised into 5 main categories:
1. Food and drink: The most common item. It may be simply food or drink, not necessarily both.
2. Comfort: Things like a shower, toothpaste, etc.
3. Letters from home
4. Advantage: Could be a clue to a hidden immunity idol, an advantage in the next challenge, or an advantage in the current auction.
5. Bad item: The not good item, typically one of the covered items. Whether or not it's actually bad is subjective, but where someone is hoping for pizza and gets bat soup I consider it a bad item.
https://survivor.fandom.com/wiki/Main_Page
A mapping table for easily filtering to the set of castaways that are still in the game after a specified number of boots.
boot_mapping
This data frame contains the following columns:
version
Country code for the version of the show
version_season
Version season key
season_name
The season name
season
The season number
episode
Episode number
order
The number of boots that there have been in the game e.g. if 'order == 2' there have been 2 boots in the game so far and there are N-2 castaways left in the game
final_n
The final number of castaways e.g. you can filter to the final 4 by 'filter(boot_mapping, final_n == 4)'. There are missing values where players have returned to the game. This means there are multiple stages of the game where there is a different make up of the final 8, for example. This field just takes the last set so that you can filter for 'final_n' and it will return a single set of castaways.
n_boots
Similar to 'final_n' but the number of boots in the game. This is different to 'order' where order counts if someone has been booted twice. 'n_boots' is simply the number of people in the season minus the 'final_n'.
sog_id
Stage of game ID for joining to 'vote_history' and 'challenge_results'
castaway_id
ID of the castaway (primary key). Consistent across seasons and name changes e.g. Amber Brkich / Amber Mariano. The first two letters reference the country of the version played e.g. US, AU.
castaway
Name of the castaway
tribe
Name of the tribe the castaway was on
tribe_status
The status of the tribe e.g. original, swapped, merged, etc. See details for more
game_status
Logical flag to identify if the castaway is currently in the game. If 'FALSE' the castaway is on Redemption Island or Edge of Extinction.
https://en.wikipedia.org/wiki/Survivor_(American_TV_series) https://survivor.fandom.com/wiki/Main_Page
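The 'final_n' and 'n_boots' columns described above can be illustrated with a small sketch. It uses a toy table for a hypothetical 6-person season with no returning players, so the real 'boot_mapping' data will be richer than this.

```r
# Toy rows mimicking 'boot_mapping' for one hypothetical 6-person season.
boot_mapping_toy <- data.frame(
  version_season = "XX01",
  final_n        = rep(c(6, 5, 4), times = c(6, 5, 4)),
  castaway       = c(LETTERS[1:6], LETTERS[1:5], LETTERS[1:4]),
  stringsAsFactors = FALSE
)

# 'n_boots' as described: the number of people in the season minus 'final_n'.
n_cast <- 6
boot_mapping_toy$n_boots <- n_cast - boot_mapping_toy$final_n

# The castaways still in the game at the final 4.
final_four <- boot_mapping_toy$castaway[boot_mapping_toy$final_n == 4]
final_four
```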
A dataset containing details on the castaways for each season
castaway_details
This data frame contains the following columns:
castaway_id
ID of the castaway (primary key). Consistent across seasons and name changes e.g. Amber Brkich / Amber Mariano. The first two letters reference the country of the version played e.g. US, AU (TBA).
full_name
Full name of the castaway
full_name_detailed
A detailed version of full_name for plotting e.g. 'Boston' Rob Mariano
castaway
Short name of the castaway. Name typically used during the season. Sometimes there are multiple people with the same name e.g. Rob C and Rob M in Survivor All-Stars. This field takes the most verbose name used
date_of_birth
Date of birth
date_of_death
Date of death
gender
Gender of castaway
african
TRUE if African-American or African-Canadian as per https://survivor.fandom.com/wiki/Main_Page
asian
TRUE if Asian-American or Asian-Canadian as per https://survivor.fandom.com/wiki/Main_Page
latin_american
TRUE if Latin-American as per https://survivor.fandom.com/wiki/Main_Page
native_american
TRUE if Native-American as per https://survivor.fandom.com/wiki/Main_Page
bipoc
Black, Indigenous, or Person of Colour
lgbt
LGBTQIA+ status as listed on the survivor wiki.
personality_type
The Myers-Briggs personality type of the castaway
occupation
Occupation
three_words
Answer to the question "three words to describe you?"
hobbies
Answer to the question "what are your favourite hobbies?"
pet_peeves
Answer to the question "what are your pet peeves?"
race
Race (if known)
ethnicity
Ethnicity (if known)
Race and ethnicity data is included if known and can point to a source, rather than making an assumption about an individual.
'poc' has been deprecated and replaced with 'bipoc', which is now logical and only for the US. 'bipoc' is TRUE if any of 'african', 'asian', 'latin_american', or 'native_american' is TRUE.
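The rule for 'bipoc' can be written as a logical OR across the four flags. This is a sketch on toy rows with made-up castaways, and it assumes the flags contain no missing values, which may not hold in the real data.

```r
# Toy rows with the four flag columns (made-up castaways).
castaway_toy <- data.frame(
  castaway        = c("P", "Q", "R"),
  african         = c(TRUE, FALSE, FALSE),
  asian           = c(FALSE, FALSE, FALSE),
  latin_american  = c(FALSE, FALSE, TRUE),
  native_american = c(FALSE, FALSE, FALSE)
)

# 'bipoc' is TRUE when any of the four flags is TRUE.
flags <- c("african", "asian", "latin_american", "native_american")
castaway_toy$bipoc <- Reduce(`|`, castaway_toy[flags])

castaway_toy[, c("castaway", "bipoc")]
```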
https://survivor.fandom.com/wiki/Main_Page, https://www.personality-database.com/
library(dplyr)
castaway_details |>
  count(gender)
A dataset containing details on the results for every castaway and season
castaways
This data frame contains the following columns:
version
Country code for the version of the show
version_season
Version season key
season
Season number
season_name
Season name
full_name
Full name of the castaway
castaway_id
ID of the castaway (primary key). Consistent across seasons and name changes e.g. Amber Brkich / Amber Mariano. The first two letters reference the country of the version played e.g. US, AU (TBA).
castaway
Name of castaway. Generally this is the name they were most commonly referred to or nickname e.g. no one called Coach, Benjamin. He was simply Coach
age
Age of the castaway during the season they played
city
City of residence during the season they played
state
State of residence during the season they played
episode
Episode number
day
Number of days the castaway survived. A missing value indicates they later returned to the game that season
order
Boot order. Order in which castaway was voted out e.g. 5 is the 5th person voted off the island
result
Final result
result_number
Result number i.e. the final place. NA for castaways that were voted out but later returned e.g. Redemption Island
jury_status
Jury status
original_tribe
Original tribe name
finalist
Logical. TRUE if the castaway was a finalist
jury
Logical. TRUE if the castaway was a jury member
winner
Logical. TRUE if the castaway was the winner
acknowledge
Did the contestant acknowledge their teammates in one of these specific ways after snuffing — or just walk away?
ack_gesture
For any physical gestures towards the tribe after torch snuffing. Types: wave, nod, wink, bow or prayer sign with hands
ack_look
For making eye contact with one or more members of the tribe after torch snuffing
ack_smile
For smiling at the tribe after torch snuffing
ack_speak
For any verbal communication directed at the tribe after torch snuffing
ack_quote
What, if anything, the contestant said. Direct quotes only.
ack_score
The score is derived from the four subcategories of acknowledgment: words, look, gesture, and smile. Each true value in these categories adds 1 to the score.
Note that in the seasons where castaways returned to the game e.g. Redemption Island, a castaway may appear twice.
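Because a returning castaway can appear twice, it can help to keep only the row holding their final placement. The sketch below relies on 'result_number' being NA for the earlier exit, as described above; the rows are toy data.

```r
# Toy rows: castaway "A" was voted out first, returned, and finished 4th.
castaways_toy <- data.frame(
  castaway      = c("A", "B", "A", "C"),
  order         = c(1, 2, 3, 4),
  result_number = c(NA, 5, 4, 3),
  stringsAsFactors = FALSE
)

# Keep one row per castaway by dropping exits that were later reversed.
final_rows <- castaways_toy[!is.na(castaways_toy$result_number), ]
final_rows
```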
https://en.wikipedia.org/wiki/Survivor_(American_TV_series);
https://survivor.fandom.com/wiki/Main_Page;
'ack_' features from Matt Stiles https://github.com/stiles/survivor-voteoffs
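The 'ack_score' description amounts to adding up the four logical 'ack_' columns. A small sketch on toy rows, assuming the flags are stored as logicals with no missing values:

```r
# Toy rows with the four acknowledgment flags (made-up castaways).
ack_toy <- data.frame(
  castaway    = c("A", "B", "C"),
  ack_gesture = c(TRUE, FALSE, FALSE),
  ack_look    = c(TRUE, TRUE, FALSE),
  ack_smile   = c(FALSE, TRUE, FALSE),
  ack_speak   = c(TRUE, FALSE, FALSE)
)

# Each TRUE adds 1 to the score, so the score ranges from 0 to 4.
parts <- c("ack_gesture", "ack_look", "ack_smile", "ack_speak")
ack_toy$ack_score <- Reduce(`+`, ack_toy[parts])

ack_toy[, c("castaway", "ack_score")]
```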
library(dplyr)
castaways %>%
  filter(season == 40)
A dataset detailing the challenges played and the elements they include over all seasons of Survivor
challenge_description
This data frame contains the following columns:
version
Country code for the version of the show
version_season
Version season key
season_name
The season name
season
The season number
episode
Episode number
challenge_id
Primary key
challenge_number
challenge_type
name
The name of the challenge
recurring_name
Challenges can go by different names but are often associated with a particular challenge or element of a challenge. Some challenges use combinations of other challenges so it's not perfect but consistent with the wiki page. Use 'recurring_name' to analyse how often a challenge has been run.
description
Description of the challenge
reward
Description of the reward
additional_stipulation
Some challenges come with various rules or success criteria. This states those conditions.
race
If the challenge is a race between tribes, teams or individuals
endurance
If the challenge is an endurance event e.g. last tribe, team, individual standing
turn_based
If the challenge is turn based i.e. conducted in rounds
puzzle
If the challenge contains a puzzle element
puzzle_slide
If the challenge contained a slide puzzle
puzzle_word
If the challenge contained a word puzzle
precision
If the challenge contains a precision element e.g. shooting an arrow, hitting a target, etc
precision_catch
If the challenge featured catching a ball or similar
precision_roll_ball
If the challenge featured rolling a ball
precision_slingshot
If the challenge featured a slingshot, either the large version or handheld version
precision_throw_balls
If the challenge featured throwing balls
precision_throw_coconuts
If the challenge featured throwing coconuts
precision_throw_rings
If the challenge featured throwing rings
precision_throw_sandbags
If the challenge featured throwing sandbags
strength
If the challenge has a strength-based element
balance
If the challenge contains a balancing element. May refer to the player balancing on something or the player balancing an object on something e.g. The Ball Drop
balance_beam
If the challenge featured a balance beam or similar that castaways were required to balance on
balance_ball
If the challenge featured balancing a ball on something
food
If the challenge contains a food element e.g. the food challenge, biting off chunks of meat
knowledge
If the challenge contains a knowledge component e.g. Q and A about the location
memory
If the challenge contains a memory element e.g. memorising a sequence of items
fire
If the challenge contains an element of fire making / maintaining
water
If the challenge is held, in part, in the water
water_swim
If castaways had to swim in the challenge
water_paddling
If castaways were required to paddle a boat or similar
obstacle_blindfolded
If the challenge required castaways to be blindfolded
obstacle_cargo_net
If the challenge featured a cargo net
obstacle_chopping
If castaways were required to chop a rope or similar
obstacle_combination_lock
If the challenge featured a combination lock
obstacle_digging
If the challenge involved digging
obstacle_knots
If the challenge involved untying knots
obstacle_padlocks
If the challenge featured opening padlocks
mud
If the challenge required castaways to get covered in mud
This data set contains the name, description, and descriptive features for each challenge where it is known. Challenges can go by different names, so both the unique name and the recurring challenge name are included. These are taken directly from the [Survivor Wiki](https://survivor.fandom.com/wiki/Category:Recurring_Challenges). Sometimes there can be variations made on the challenge that go by the same name, or the challenge is integrated with a longer obstacle. In these cases the challenge may share the same recurring challenge name but have a different challenge name. Even if they share the same names the description could be different.
The features of each challenge have been determined largely through string searches of key words that describe the challenge. They may not be 100% accurate due to the different and inconsistent descriptions, but for the most part they will provide a good basis for analysis.
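To give a feel for the string-search approach, here is a minimal sketch that flags features with case-insensitive keyword matches. The descriptions and the keywords are made up for illustration; they are not the search terms actually used to build the dataset.

```r
# Toy challenge descriptions (invented for illustration).
challenges_toy <- data.frame(
  name = c("Toy Challenge 1", "Toy Challenge 2"),
  description = c(
    "Castaways swim to a platform and solve a slide puzzle.",
    "Tribes dig in the sand for keys to open padlocks."
  ),
  stringsAsFactors = FALSE
)

# Helper (hypothetical): TRUE where the description mentions the keyword.
has_keyword <- function(text, keyword) {
  grepl(keyword, text, ignore.case = TRUE)
}

desc <- challenges_toy$description
challenges_toy$water_swim        <- has_keyword(desc, "swim")
challenges_toy$puzzle            <- has_keyword(desc, "puzzle")
challenges_toy$obstacle_digging  <- has_keyword(desc, "dig")
challenges_toy$obstacle_padlocks <- has_keyword(desc, "padlock")

challenges_toy[, -2]
```

Simple keyword matches like these can misfire on inconsistent wording, which is consistent with the caveat above about accuracy.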
If any descriptive features need altering please let me know in the [issues](https://github.com/doehm/survivoR/issues).
For updated data please see the git version.
https://survivor.fandom.com/wiki/Category:Challenges https://survivor.fandom.com/wiki/Main_Page
library(dplyr)
library(tidyr)
challenge_description
A dataset detailing the challenges played including reward and immunity challenges.
challenge_results
This data frame contains the following columns
version
Country code for the version of the show
version_season
Version season key
season_name
The season name
season
The season number
episode
Episode number
n_boots
The number of boots that there have been in the game e.g. if 'n_boots == 2' there have been 2 boots in the game so far and there are N-2 castaways left in the game
castaway_id
ID of the castaway (primary key). Consistent across seasons and name changes e.g. Amber Brkich / Amber Mariano. The first two letters reference the country of the version played e.g. US, AU (TBA).
castaway
Name of castaway. Generally this is the name they were most commonly referred to or nickname e.g. no one called Coach, Benjamin. He was simply Coach
outcome_type
Whether the challenge is individual or tribal. Some individual reward challenges may involve multiple castaways as the winner gets to choose who they bring along
tribe
Current tribe the castaway is on
tribe_status
The status of the tribe e.g. original, swapped, merged, etc. See details for more
challenge_type
The challenge type e.g. immunity, reward, etc
challenge_id
Primary key to the 'challenge_description' data set which contains features of the challenge
result
Result of challenge
result_notes
Additional notes about the result of the challenge
order_of_finish
Order of finish for tribal challenges. Useful when there are 3 or more tribes to see who actually came first, second and who lost the challenge.
chosen_for_reward
If after the reward challenge the castaway was chosen to participate in the reward
sit_out
TRUE if they sat out of the challenge or FALSE if they participated
team
Team allocation when they are split into teams
sog_id
Stage of game ID for joining to 'boot_mapping' and 'vote_history'
https://en.wikipedia.org/wiki/Survivor_(American_TV_series) https://survivor.fandom.com/wiki/Main_Page
library(dplyr)
library(tidyr)
challenge_results %>%
  filter(season == 40)
A dataset summarising challenge_results
challenge_summary
This data frame contains the following columns
category
The category of the challenge e.g. tribal, individual, individual immunity, duel, etc. This makes it easy to split out the different types of challenges and avoid complications such as 'Team / Individual' challenges where there is a dependent outcome structure. Join to 'challenge_results' using 'challenge_id', 'version_season' and 'castaway_id'
version_season
Version season key
challenge_id
Primary key to the 'challenge_description' data set which contains features of the challenge
challenge_type
The challenge type e.g. immunity, reward, etc
outcome_type
Whether the challenge is individual or tribal. Some individual reward challenges may involve multiple castaways as the winner gets to choose who they bring along
tribe
Current tribe the castaway is on
castaway
Name of castaway. Generally this is the name they were most commonly referred to or nickname e.g. no one called Coach, Benjamin. He was simply Coach
castaway_id
ID of the castaway (primary key). Consistent across seasons and name changes e.g. Amber Brkich / Amber Mariano. The first two letters reference the country of the version played e.g. US, AU (TBA).
n_entities
Number of entities competing for the win e.g. the number of tribes, teams, or people.
n_winners
Number of winners (or winning entities) e.g. if there are two tribes there is only one winning tribe, if there are three tribes like the new era there are two winning tribes and one that goes to tribal council.
n_in_team
The number of people in the tribe or team
won
If the castaway won
https://en.wikipedia.org/wiki/Survivor_(American_TV_series) https://survivor.fandom.com/wiki/Main_Page
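The join to 'challenge_results' on 'challenge_id', 'version_season' and 'castaway_id' might look like the sketch below. It uses base R 'merge()' on toy tables with hypothetical IDs, and it assumes 'won' is stored as a logical, which may differ from the package.

```r
# Toy stand-ins for 'challenge_results' and 'challenge_summary' (hypothetical IDs).
results_toy <- data.frame(
  version_season = "XX01",
  challenge_id   = c(1, 1, 2, 2),
  castaway_id    = c("XX0001", "XX0002", "XX0001", "XX0002"),
  result         = c("Won", "Lost", "Lost", "Won"),
  stringsAsFactors = FALSE
)
summary_toy <- data.frame(
  version_season = "XX01",
  challenge_id   = c(1, 1, 2, 2),
  castaway_id    = c("XX0001", "XX0002", "XX0001", "XX0002"),
  category       = c("Tribal", "Tribal", "Individual", "Individual"),
  won            = c(TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Join on the three keys named in the 'category' description.
keys <- c("version_season", "challenge_id", "castaway_id")
joined <- merge(summary_toy, results_toy, by = keys)

# Share of challenges won within each category.
win_rate <- aggregate(won ~ category, data = joined, FUN = mean)
win_rate
```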
library(dplyr)
library(tidyr)
# version_season is a character key such as "US46", not a season number
challenge_summary %>%
  filter(version_season == "US46")
A dataset containing the count of confessionals per castaway per episode. A confessional is when the castaway is speaking directly to the camera about their game.
confessionals
This data frame contains the following columns:
version
Country code for the version of the show
version_season
Version season key
season_name
The season name
season
The season number
episode
Episode number
castaway
Name of the castaway
castaway_id
ID of the castaway (primary key). Consistent across seasons and name changes e.g. Amber Brkich / Amber Mariano. The first two letters reference the country of the version played e.g. US, AU.
confessional_count
The count of confessionals for the castaway during the episode
confessional_time
The total time for all confessionals for the episode for each castaway
index_count
The index based on the confessional counts. See details.
index_time
The index based on the confessional time. See details.
Confessional data has been counted by contributors of the survivoR R package and consolidated with external sources. The aim is to establish consistency in confessional counts in the absence of official sources. Given the subjective nature of the counts and the potential for clerical error no single source is more valid than another. Therefore, it is reasonable to average across all sources.
In the case of double or extended episodes, if the episode only has one title it is considered a single episode. This means the average number of confessionals per person is likely to be higher for this episode given its length. If there are two episode titles the confessionals are counted for the appropriate episode. This is to ensure consistency across all other datasets.
In the case of recap episodes, this episode is left blank.
The indexes are a measure of how many more confessionals, by count or by time, the castaway has received given the point in the game. For example, an 'index_count' of 1 implies the castaway has received the expected number of confessionals given an equal share within the tribe. An index of 1.5 implies they have received 50% more than expected. [...] typically receives more confessionals for the episode. Makes sense. 'index_time' is the same but uses time instead of counts.
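One possible reading of the index, for a single episode, is the castaway's count divided by the average count within their tribe. The sketch below is an assumption made for illustration on toy numbers; the package's own calculation may differ, for example by accumulating over the episodes up to that point in the game.

```r
# Toy confessional counts for one episode (made-up tribes and castaways).
conf_toy <- data.frame(
  tribe              = c("T1", "T1", "T1", "T2", "T2"),
  castaway           = c("A", "B", "C", "D", "E"),
  confessional_count = c(6, 2, 4, 3, 3),
  stringsAsFactors = FALSE
)

# Expected count under an equal share within the tribe: the tribe average.
expected <- ave(conf_toy$confessional_count, conf_toy$tribe, FUN = mean)

# Index of 1 = expected share, 1.5 = 50% more than expected.
conf_toy$index_count_sketch <- conf_toy$confessional_count / expected
conf_toy
```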
If you also count confessionals, please get in touch and I'll add them into the package.
A dataset containing a summary of each episode for all US seasons of Survivor
episode_summary
This data frame contains the following columns:
version
Country code for the version of the show
version_season
Version season key
episode
Episode number
episode_summary
Summary of the episode
https://en.wikipedia.org/wiki/Survivor_(American_TV_series)
A dataset containing details for each episode
episodes
This data frame contains the following columns:
version
Country code for the version of the show
version_season
Version season key
season_name
The season name
season
Season number
episode_number_overall
The cumulative episode number
episode
Episode number for the season
episode_title
Episode title
episode_label
A standardised episode label
episode_date
Date the episode aired
episode_length
Episode length in minutes
viewers
Number of viewers (millions) who tuned in
imdb_rating
IMDb rating for the episode on a scale of 0-10
n_ratings
The number of ratings submitted to IMDb
episode_summary
Description of the episode from Wikipedia
https://en.wikipedia.org/wiki/Survivor_(American_TV_series)
Returns the URL for the image of the specified castaways by their 'castaway_id' and season / version they were in
get_castaway_image(castaway_ids, version_season)
castaway_ids | Castaway ID |
version_season | Version season key for the season they played |
Character vector of URLs
library(dplyr)
survivoR::castaways %>%
  filter(version_season == "US42") %>%
  mutate(castaway_image = get_castaway_image(castaway_id, version_season))
Takes the output of the times recorded from the Shiny app and aggregates to the final confessional times and confessional counts. 'confessional_time' is the total duration in seconds for the episode. 'confessional_count' is the number of confessionals recorded to be at least 10 seconds apart.
get_confessional_timing(x, .vs, .episode, .mda = 3)
x | Either a data frame or path(s) to the csv file containing all the time stamps from the Shiny app |
.vs | Version season |
.episode | Episode |
.mda | Missing duration adjustment (MDA) in seconds. If either start or stop is missing from the records, the missing value is imputed with a 3 second adjustment by default. |
data frame
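The aggregation can be pictured with the rough sketch below. It is not the package's implementation: the 'start' and 'stop' columns (in seconds) and the helper 'count_confessionals()' are hypothetical, and the direction of the imputation (a missing stop becomes start plus the MDA, a missing start becomes stop minus the MDA) is an assumption.

```r
# Toy time stamps in seconds (hypothetical column names, not the app's format).
stamps <- data.frame(
  castaway = c("A", "A", "A", "B"),
  start    = c(10, 30, 100, 50),
  stop     = c(25, NA, 130, 70),
  stringsAsFactors = FALSE
)
mda <- 3  # missing duration adjustment in seconds

# Impute a missing start or stop from the other time stamp (assumed behaviour).
missing_stop  <- is.na(stamps$stop)
missing_start <- is.na(stamps$start)
stamps$stop[missing_stop]   <- stamps$start[missing_stop] + mda
stamps$start[missing_start] <- stamps$stop[missing_start] - mda

# Count confessionals, treating segments less than 'min_gap' seconds apart as one.
count_confessionals <- function(start_s, stop_s, min_gap = 10) {
  o <- order(start_s)
  gaps <- start_s[o][-1] - stop_s[o][-length(stop_s)]
  1 + sum(gaps >= min_gap)
}

by_castaway <- split(stamps, stamps$castaway)
timing <- data.frame(
  castaway = names(by_castaway),
  confessional_count = vapply(
    by_castaway, function(d) count_confessionals(d$start, d$stop), numeric(1)
  ),
  confessional_time = vapply(
    by_castaway, function(d) sum(d$stop - d$start), numeric(1)
  ),
  row.names = NULL
)
timing
```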
# After running app and recording confessionals, run...
# Example from a saved timing file
library(readr)
path <- system.file(package = "survivoR", "extdata/US4412.csv")
df_us4412 <- read_csv(path)
get_confessional_timing(df_us4412, .vs = "US44", .episode = 12)
A dataset containing details on the final jury votes to determine the winner for each season
jury_votes
This data frame contains the following columns:
version
Country code for the version of the show
version_season
Version season key
season_name
The season name
season
The season number
castaway
Name of the castaway
finalist
The finalists for which a vote can be placed
vote
Vote. 0-1 variable for easy summation
castaway_id
ID of the castaway (primary key). Consistent across seasons and name changes e.g. Amber Brkich / Amber Mariano. The first two letters reference the country of the version played e.g. US, AU.
finalist_id
The ID of the finalist for which a vote can be placed. Consistent with castaway ID
https://en.wikipedia.org/wiki/Survivor_(American_TV_series)
library(dplyr)
jury_votes %>%
  filter(season == 40) %>%
  group_by(finalist) %>%
  summarise(votes = sum(vote))
Launches the confessional timing app in either a browser or viewer. Default is set to browser. The user is required to provide a path to which the time stamps are recorded.
launch_confessional_app(browser = TRUE, path = NULL, write = TRUE)
browser | Open in browser instead of viewer. Default |
path | Parent directory for output files. Default is a sub-folder |
write | Write to disc. Default |
An active R shiny application
## Only run this example in interactive R sessions
if(interactive()) {
  # launch app
  # launch_confessional_app()
}
A dataset summarising the screen time of contestants on the TV show Survivor. Currently only contains Seasons 1-4 and 42.
screen_time
This data frame contains the following columns:
version_season
Version season key
episode
Episode number
castaway_id
ID of the castaway (primary key). Also includes two special IDs of host (i.e. Jeff Probst) or unknown (the image detection couldn't identify the face with sufficient accuracy)
screen_time
Estimated screen time for the individual in seconds.
Individuals' screen time is calculated, at a high level, via the following process:
1. Frames are sampled from episodes at a 1 second interval.
2. MTCNN detects the human faces within each frame.
3. VGGFace2 converts each detected face into a 512d vector space.
4. A training set of labelled images (1 for each contestant + 3 for Jeff Probst) is processed in the same way to determine where they sit in the vector space. TODO: This could be made more accurate by increasing the number of training images per contestant.
5. The Euclidean distance is calculated for the faces detected in the frame to each of the contestants in the season (+Jeff). If the minimum distance is greater than 1.2 the face is labelled as "unknown". TODO: Review how robust this distance cutoff truly is - currently based on manual review of Season 42.
6. A multi-class SVM is trained on the training set to label faces. For any face not identified as "unknown", the vector embedding is run into this model and a label is generated.
7. All labelled faces are aggregated together, with an assumption of 1 full second of screen time each time a face is seen.
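The distance cutoff and the final aggregation can be sketched on toy numbers. This is a simplification: 3-dimensional vectors stand in for the 512d embeddings, faces within the cutoff are labelled by their nearest reference vector rather than by the SVM described above, and the helper 'label_face()' is hypothetical.

```r
# Toy reference embeddings, one per labelled person (3-d stand-ins for 512-d).
reference <- rbind(
  A    = c(0, 0, 0),
  B    = c(2, 0, 0),
  host = c(0, 2, 0)
)

# Toy embeddings for faces detected in sampled frames (one face per row).
faces <- rbind(
  c(0.1, 0.0, 0.0),
  c(1.9, 0.1, 0.0),
  c(0.2, 0.0, 0.0),
  c(5.0, 5.0, 5.0)
)

# Label a face by its nearest reference, or "unknown" beyond the cutoff.
label_face <- function(face, reference, cutoff = 1.2) {
  d <- sqrt(rowSums(sweep(reference, 2, face)^2))  # Euclidean distances
  if (min(d) > cutoff) "unknown" else names(which.min(d))
}

labels <- apply(faces, 1, label_face, reference = reference)

# One second of screen time for each time a label is seen.
screen_time_sketch <- table(labels)
screen_time_sketch
```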
A dataset containing palettes generated from the season logos
season_palettes
This nested data frame contains the following columns:
version
Country code for the version of the show
version_season
Version season key
season_name
The season name
season
The season number
palette
The season palette
https://en.wikipedia.org/wiki/Survivor_(American_TV_series)
A dataset containing a summary of all seasons of Survivor
season_summary
This data frame contains the following columns:
version
Country code for the version of the show
version_season
Version season key
season_name
Season name
season
Season number
n_cast
Number of cast in the season
n_tribes
Number of starting tribes
n_finalists
Number of finalists
n_jury
Number of jury members
location
Location of the season
country
Country the season was held in
tribe_setup
Initial setup of the tribe e.g. Heroes vs Healers vs Hustlers
full_name
Full name of the winner
winner_id
ID for the winner of the season (primary key)
winner
Winner of the season
runner_ups
Runners-up for the season. Either one or two runners-up as a string
final_vote
Final vote allocation. See the 'jury_votes' data set for better aggregation of this data
timeslot
Timeslot of the show in the US
premiered
Date the first episode aired
ended
Date the season ended
filming_started
Date the filming of the season started
filming_ended
Date the filming ended (39 or 42 days after the start)
viewers_premiere
Number of viewers (millions) who tuned in for the premiere
viewers_finale
Number of viewers (millions) who tuned in for the finale
viewers_reunion
Number of viewers (millions) who tuned in for the reunion
viewers_mean
Average number of viewers (millions) who tuned in over the season
rank
Season rank
https://en.wikipedia.org/wiki/Survivor_(American_TV_series) https://survivor.fandom.com/wiki/Main_Page
A dataset showing who attended the Survivor Auction during the seasons they were held. 'survivor_auction' is at the castaway level and includes all castaways whether or not they purchased an item, and 'auction_details' is at the item level.
survivor_auction
This data frame contains the following columns:
version
Country code for the version of the show
version_season
Version season key
season_name
The season name
season
The season number
episode
Episode number
n_boots
The number of boots so far in the game
castaway_id
ID of the castaway (primary key). Consistent across seasons and name changes e.g. Amber Brkich / Amber Mariano. The first two letters reference the country of the version played e.g. US, AU (TBA).
castaway
Name of castaway. Generally this is the name they were most commonly referred to or nickname e.g. no one called Coach, Benjamin. He was simply Coach
tribe_status
The status of the tribe e.g. original, swapped, merged, etc. See details for more
tribe
Tribe name
currency
Currency
total
Total amount either given to or found by the castaway
https://survivor.fandom.com/wiki/Main_Page
'ggplot2' scales for each season of Survivor.
survivor_pal(season = NULL, scale_type = "d", reverse = FALSE, ...)
scale_fill_survivor(season = NULL, scale_type = "d", reverse = FALSE, ...)
scale_colour_survivor(season = NULL, scale_type = "d", reverse = FALSE, ...)
season | Season number |
scale_type | Discrete or continuous. Input |
reverse | Logical. Reverse the palette? |
... | Other arguments passed on to methods. |
Palettes are created from the logo for the season.
Scale functions for ggplot2
library(ggplot2)
library(dplyr)

mpg %>%
  ggplot(aes(x = displ, fill = manufacturer)) +
  geom_histogram(colour = "black") +
  scale_fill_survivor(40)
A dataset containing the tribe colours for each season
tribe_colours
This data frame contains the following columns:
version
Country code for the version of the show
version_season
Version season key
season_name
The season name
season
The season number
tribe
Tribe name
tribe_colour
Colour of the tribe
tribe_status
Tribe status e.g. original, swapped or merged. In the instance where a tribe is formed at the swap by splitting 2 tribes into 3, the 3rd tribe will be labelled 'swapped'
https://survivor.fandom.com/wiki/Tribe
library(ggplot2)
library(dplyr)
library(forcats)

df <- tribe_colours %>%
  group_by(season_name) %>%
  mutate(
    xmin = 1,
    xmax = 2,
    ymin = 1:n(),
    ymax = ymin + 1
  ) %>%
  ungroup() %>%
  mutate(
    season_name = fct_reorder(season_name, season),
    font_colour = ifelse(tribe_colour == "#000000", "white", "black")
  )

ggplot() +
  geom_rect(
    data = df,
    mapping = aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
    fill = df$tribe_colour
  ) +
  geom_text(
    data = df,
    mapping = aes(x = xmin + 0.5, y = ymin + 0.5, label = tribe),
    colour = df$font_colour
  ) +
  theme_void() +
  facet_wrap(~season_name, scales = "free_y")
A mapping of castaways to tribes for each day (day being the day of the tribal council). This is useful for observing who is on what tribe throughout the game.
tribe_mapping
This data frame contains the following columns:
version
Country code for the version of the show
version_season
Version season key
season_name
The season name
season
The season number
episode
Episode number
day
The day of the tribal council
castaway_id
ID of the castaway (primary key). Consistent across seasons and name changes e.g. Amber Brkich / Amber Mariano. The first two letters reference the country of the version played e.g. US, AU.
castaway
Name of the castaway
tribe
Name of the tribe the castaway was on
tribe_status
The status of the tribe e.g. original, swapped, merged, etc. See details for more
Each season by episode and day holds a complete list of castaways still in the game and which tribe they are on. Moving through each day you can observe the changes in the tribe. For example, the first day has all castaways mapped to their original tribe. The next day has the same minus the castaway just voted out. This is useful for observing changes in tribe make-up, whether due to castaways being voted off the island, tribe swaps, or castaways being on Redemption Island or Edge of Extinction.
https://en.wikipedia.org/wiki/Survivor_(American_TV_series) https://survivor.fandom.com/wiki/Main_Page
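The day-by-day comparison described for tribe_mapping can be sketched in base R. The data frame below is a hypothetical toy stand-in that only borrows the column names day, castaway and tribe; the castaway and tribe values are made up for illustration and are not taken from the dataset.

```r
# Hypothetical toy rows mimicking a few columns of tribe_mapping
toy_mapping <- data.frame(
  day      = c(3, 3, 3, 3, 6, 6, 6),
  castaway = c("A", "B", "C", "D", "A", "B", "D"),
  tribe    = c("Red", "Red", "Blue", "Blue", "Red", "Red", "Blue"),
  stringsAsFactors = FALSE
)

# Castaways mapped to a tribe on each recorded day
by_day <- split(toy_mapping$castaway, toy_mapping$day)

# Anyone listed on day 3 but missing on day 6 is no longer mapped to a tribe
left_game <- setdiff(by_day[["3"]], by_day[["6"]])
left_game

# Tribe sizes on each day (rows are days, columns are tribes)
tribe_sizes <- table(toy_mapping$day, toy_mapping$tribe)
tribe_sizes
```

With the package loaded, the same steps should apply to the real tribe_mapping after filtering to a single season.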
Creates scale functions for ggplot2. Given a season of Survivor, a palette is created from the tribe colours for that season, including the merged tribe.
tribes_pal(season = NULL, scale_type = "d", reverse = FALSE, tribe = NULL, ...)
scale_fill_tribes(season = NULL, scale_type = "d", reverse = FALSE, ...)
scale_colour_tribes(season = NULL, scale_type = "d", reverse = FALSE, ...)
season | Season number |
scale_type | Discrete or continuous. Input |
reverse | Logical. Reverse the palette? |
tribe | Tribe names. Default |
... | Other arguments passed on to methods. |
If the colours are intended to correspond to the tribes, e.g. a stacked bar chart of votes given to each finalist where the colour corresponds to their original tribe (as in the example below), the tribe vector needs to be passed to the scale function (for now). If no tribe vector is given, the tribe colours are simply treated as a colour palette.
Scale functions for ggplot2
library(ggplot2)
library(stringr)
library(dplyr)
library(glue)

ssn <- 35

labels <- castaways %>%
  filter(
    season == ssn,
    str_detect(result, "Sole|unner")
  ) %>%
  select(castaway, original_tribe) %>%
  mutate(label = glue("{castaway} ({original_tribe})")) %>%
  select(label, castaway)

jury_votes %>%
  filter(season == ssn) %>%
  left_join(
    castaways %>%
      filter(season == ssn) %>%
      select(castaway, original_tribe),
    by = "castaway"
  ) %>%
  group_by(finalist, original_tribe) %>%
  summarise(votes = sum(vote)) %>%
  left_join(labels, by = c("finalist" = "castaway")) %>%
  {
    ggplot(., aes(x = label, y = votes, fill = original_tribe)) +
      geom_bar(stat = "identity", width = 0.5) +
      scale_fill_tribes(ssn, tribe = .$original_tribe) +
      theme_minimal() +
      labs(
        x = "Finalist (original tribe)",
        y = "Votes",
        fill = "Original\ntribe",
        title = "Votes received by each finalist"
      )
  }
A dataset containing the viewer history for each season and episode
viewers
This data frame contains the following columns:
version
Country code for the version of the show
version_season
Version season key
season_name
The season name
season
Season number
episode_number_overall
The cumulative episode number
episode
Episode number for the season
episode_title
Episode title
episode_label
A standardised episode label
episode_date
Date the episode aired
episode_length
Episode length in minutes
viewers
Number of viewers (millions) who tuned in
imdb_rating
IMDb rating for the episode on a scale of 0-10
n_ratings
The number of ratings submitted to IMDb
https://en.wikipedia.org/wiki/Survivor_(American_TV_series)
A dataset containing details on the vote history for each season
vote_history
This data frame contains the following columns:
version
Country code for the version of the show
version_season
Version season key
season_name
The season name
season
The season number
episode
Episode number
day
Day the tribal council took place
tribe_status
The status of the tribe e.g. original, swapped, merged, etc. See details for more
tribe
Tribe name
castaway
Name of the castaway
immunity
Type of immunity held by the castaway at the time of the vote e.g. individual, hidden (see details for hidden immunity data)
vote
The castaway for which the vote was cast
vote_event
Extra details on the vote e.g. Won or lost the fire challenge, played an extra vote, etc
vote_event_outcome
The outcome of the vote event
split_vote
If there was a decision to split the vote this records who the vote was split with. Helps to identify successful boots
nullified
Was the vote nullified by a hidden immunity idol? Logical
tie
If the set of votes resulted in a tie. Logical
voted_out
The castaway who was voted out
order
Boot order. Order in which the castaway was voted out e.g. 5 is the 5th person voted off the island
vote_order
In the case of ties this indicates the order the votes took place
castaway_id
ID of the castaway (primary key). Consistent across seasons and name changes e.g. Amber Brkich / Amber Mariano. The first two letters reference the country of the version played e.g. US, AU.
vote_id
ID of the castaway voted for
voted_out_id
ID of the castaway voted_out
sog_id
Stage of game ID for joining to boot_mapping and challenge_results
challenge_id
Primary key to the challenge_description data set, which contains features of the challenge. This helps map the immunity challenge that resulted in the tribal council.
This data frame contains a complete history of votes cast across all seasons of Survivor. While there are consistent events across the seasons, there are some unique events such as the 'mutiny' in Survivor: Cook Islands (season 13) or the 'Outcasts' in Survivor: Pearl Islands (season 7). To maintain a standard, any change in tribe for the castaways is recorded as swapped. swapped is used as the term since 'the tribe swap' is a typical recurring milestone in each season of Survivor. Subsequent changes are recorded with a trailing digit e.g. swapped2. This includes absorbed tribes e.g. Stephanie was 'absorbed' in Survivor: Palau (season 10), and cases where 3 tribes are reduced to 2. These cases are still considered 'swapped' to indicate a change in tribe status.
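The trailing-digit convention for tribe_status can be handled with a simple pattern match. The sketch below uses a hypothetical vector of status values (not real rows from vote_history) and assumes, as described above, that the first swap carries no digit.

```r
# Hypothetical tribe_status values following the convention described above
toy_status <- c("original", "swapped", "swapped2", "merged", "swapped")

# Flag every status that represents a change in tribe
is_swap <- grepl("^swapped", toy_status)

# Recover which swap it was: no suffix is treated as the first swap
suffix <- sub("^swapped", "", toy_status[is_swap])
swap_round <- ifelse(suffix == "", 1L, as.integer(suffix))
swap_round
```

Matching on the prefix rather than the exact string keeps later swaps such as swapped2 in the same group as the first swap.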
Some events result in a castaway attending tribal council but not voting. These are recorded as:
Win
The castaway won the fire challenge
Lose
The castaway lost the fire challenge
None
The castaway did not cast a vote. This may be due to a vote steal or some other means
Immune
The castaway did not vote but was immune from the vote
Where a castaway has immunity == 'hidden', that player is protected by a hidden immunity idol. This does not necessarily mean they played the idol; it may have been played for them. While the nullified votes data is complete, the immunity data does not include those who had immunity but did not receive a vote. This is a TODO.
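As a rough illustration of how the nullified flag might be used, the sketch below counts the votes that actually counted at a single tribal council. The rows are hypothetical toy data that only borrow the column names castaway, vote and nullified.

```r
# Hypothetical votes at one tribal council; E is protected by an idol
toy_votes <- data.frame(
  castaway  = c("A", "B", "C", "D", "E"),
  vote      = c("D", "D", "E", "E", "E"),
  nullified = c(FALSE, FALSE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# All votes cast, including those later nullified
all_votes <- table(toy_votes$vote)

# Only the votes that counted towards the result
counted <- table(toy_votes$vote[!toy_votes$nullified])
counted
```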
In the case where the 'steal a vote' advantage was played, there is a second row for the castaway who stole the vote. The castaway who had their vote stolen is recorded as None.
Many castaways have been medically evacuated, quit or left the game for some other reason. In cases where no votes were cast, there is a skip in the order variable. Since no votes were cast there is nothing to record in this data frame. The correct order in which castaways departed the island is recorded in castaways.
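The skipped values of order can be found by comparing the observed values against a full sequence. This is a minimal sketch on a hypothetical vector of order values for one season, not real data.

```r
# Hypothetical 'order' values from the vote rows of one season;
# the 3rd boot left without a vote, so 3 never appears
toy_order <- c(1, 1, 1, 2, 2, 4, 4, 5)

# Boot numbers with no recorded votes
missing_orders <- setdiff(seq_len(max(toy_order)), unique(toy_order))
missing_orders
```

Under the convention described above, these gaps would correspond to departures that are recorded in castaways rather than in vote_history.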
In the case of a tie, voted_out is recorded as tie to indicate no one was voted off the island in that instance. The re-vote is recorded with vote_order = 2 to indicate this is the second round of voting. In the case of a second tie, voted_out is recorded as tie2. The third step is either a draw of rocks, a fire challenge or a countback (in the early days of Survivor). In these cases vote is recorded as the colour of the rock drawn, the result of the fire challenge or 'countback'.
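The tie handling above suggests a simple way to find the final outcome of a tribal council: take the rows from the last round of voting. The sketch below uses hypothetical toy rows that borrow the column names castaway, vote, voted_out and vote_order.

```r
# Hypothetical tribal council: a 2-2 tie followed by a re-vote
toy_tie <- data.frame(
  castaway   = c("A", "B", "C", "D", "A", "B"),
  vote       = c("C", "C", "D", "D", "C", "C"),
  voted_out  = c("tie", "tie", "tie", "tie", "C", "C"),
  vote_order = c(1, 1, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Did any round end in a tie? Matches both 'tie' and 'tie2'
had_tie <- any(grepl("^tie", toy_tie$voted_out))

# The last round of voting holds the final outcome
final_round <- toy_tie[toy_tie$vote_order == max(toy_tie$vote_order), ]
eliminated <- unique(final_round$voted_out)
eliminated
```

On the real data this would need to be done per tribal council, for example by grouping on season, episode and order first.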
https://en.wikipedia.org/wiki/Survivor_(American_TV_series)
# The number of times Tony voted for each castaway in Survivor: Winners at War
library(dplyr)

vote_history %>%
  filter(
    season == 40,
    castaway == "Tony"
  ) %>%
  count(vote)