library(tidyverse)
library(skimr)
Deal or No Deal - Investigating Shark Tank Deals Throughout the Show
Proposal
Data 1 - Shark Tank
Introduction and data
Identify the source of the data.
- The first data source, shark_tank.csv, is located in the data folder. This dataset is a collection of observations of business pitches from the first 14 seasons of the American TV show “Shark Tank.” The dataset contains 1038 observations, each a unique business pitch. Each observation has 52 columns containing information about that pitch.
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
- The dataset, which is available for public use via kaggle.com, was curated by a collaborator named Satya Thirumani. The page containing the dataset can be found at this link. The dataset is continually updated by this collaborator as Shark Tank airs new episodes and continues to release seasons of the show. The dataset was last updated on March 9, 2023. For the sake of this project, we will not include additional updates to this dataset past this March 9th update.
Write a brief description of the observations.
- Each observation contains 52 columns regarding the business pitch. Within these columns, there is a wide range of information. First, there are details about the pitch as it relates to the Shark Tank show (i.e., season, episode, when it aired, etc.). There are columns regarding the company being pitched, such as its name and industry, as well as the entrepreneurs who run the company. Finally, there is a great deal of information about the economics of each pitch. This includes the amount of money requested, the company’s original valuation, and ultimately which sharks (investors), if any, invested in the company and at what new valuation. Because not all companies on the show receive investment, many of the deal-related columns have blank (i.e., NA) values for pitches that did not reach a deal.
Research question
- Research question(s)
- What makes a Shark Tank pitch more likely to reach a deal on the show?
- This question is important as it allows us to visualize trends in the types of Shark Tank pitches that ultimately reach a deal. The data may reveal that certain industries, investors, or investment deals prove to have higher success rates, which could serve as a guide to future entrepreneurs looking to pitch their business on the show.
- When a company makes a deal on Shark Tank, how often do they receive the deal they requested? Do companies usually have to lower their valuations to reach a deal?
- This question is important as it may show that certain investment offers proposed by entrepreneurs are more or less beneficial toward reaching a deal based on the metrics. It may reveal a ‘sweet spot’, where there is a trend in the ranges of valuations that ultimately result in a deal with an investor.
- Which sharks (investors) on the show are most likely to invest in a company given certain metrics about the company? Do certain sharks typically invest in companies in certain industries? Or in companies where the entrepreneurs come from a certain background?
- This question is important as it could unveil biases or tendencies in the deals that certain investors tend to make with entrepreneurs. Certain investors may be more inclined to reach a deal with a business if it meets the specific qualifications (if any) that they prefer in a pitch.
- Description of research topic
- Analysis of Shark Tank business pitches: how a company does or doesn’t reach a deal on the hit show “Shark Tank”
- Hypotheses on the topic
Companies that reach deals on Shark Tank typically must lower their valuation from their original ask.
Sharks typically invest more often in companies that are in their respective areas of expertise.
Sharks are more likely to invest in companies whose entrepreneurs come from a similar background as themselves.
- Types of variables in the research question
- In order to answer the questions listed above, there are a number of categorical and quantitative variables that are needed. In terms of quantitative variables, there must be data regarding capital/equity that was initially requested at the onset of the pitch, as well as capital/equity that is agreed upon if a deal is ultimately reached. In terms of categorical data, there will have to be data points regarding the business such as its industry and name. There will also have to be data about the background of the pitchers/entrepreneurs. Moreover, there should be data about the background of the sharks (investors). There also must be a crucial variable - whether or not the pitch results in a deal.
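As a rough illustration of how the requested and agreed valuations could be compared, the sketch below classifies each deal as lowered, unchanged, or raised. The column names (`Got Deal`, `Valuation Requested`, `Deal Valuation`) come from shark_tank.csv, but the four rows are made-up toy values, and the object names `pitches` and `valuation_change` are hypothetical. This is one possible approach under those assumptions, not the project's final analysis.

```r
library(dplyr)

# Toy rows with made-up values; only the column names follow shark_tank.csv.
pitches <- data.frame(
  `Startup Name`        = c("A", "B", "C", "D"),
  `Got Deal`            = c(1, 1, 0, 1),
  `Valuation Requested` = c(1000000, 500000, 2000000, 800000),
  `Deal Valuation`      = c(600000, 500000, NA, 1000000),
  check.names = FALSE
)

# Keep only pitches that ended in a deal, then classify how the
# agreed valuation compares with the valuation originally requested.
valuation_change <- pitches %>%
  filter(`Got Deal` == 1) %>%
  mutate(
    change = case_when(
      `Deal Valuation` < `Valuation Requested`  ~ "lowered",
      `Deal Valuation` == `Valuation Requested` ~ "unchanged",
      TRUE                                      ~ "raised"
    )
  ) %>%
  count(change) %>%
  mutate(share = n / sum(n))

valuation_change
```

On the real data, deals recorded with a `Deal Valuation` of 0 would likely need separate handling before a comparison like this is meaningful.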
Glimpse of data
shark_tank <- read_csv("data/shark_tank.csv")
Rows: 1038 Columns: 52
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (14): Season Start, Season End, Original Air Date, Startup Name, Industr...
dbl (38): Season Number, Episode Number, Pitch Number, Multiple Entrepreneur...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
skim(shark_tank)
Name | shark_tank |
Number of rows | 1038 |
Number of columns | 52 |
_______________________ | |
Column type frequency: | |
character | 14 |
numeric | 38 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Season Start | 0 | 1.00 | 9 | 9 | 0 | 14 | 0 |
Season End | 7 | 0.99 | 9 | 9 | 0 | 13 | 0 |
Original Air Date | 408 | 0.61 | 9 | 9 | 0 | 154 | 0 |
Startup Name | 0 | 1.00 | 3 | 32 | 0 | 1036 | 0 |
Industry | 0 | 1.00 | 6 | 23 | 0 | 15 | 0 |
Business Description | 0 | 1.00 | 5 | 92 | 0 | 1036 | 0 |
Pitchers Gender | 5 | 1.00 | 4 | 10 | 0 | 3 | 0 |
Pitchers City | 540 | 0.48 | 3 | 18 | 0 | 250 | 0 |
Pitchers State | 299 | 0.71 | 2 | 6 | 0 | 46 | 0 |
Pitchers Average Age | 992 | 0.04 | 3 | 6 | 0 | 4 | 0 |
Entrepreneur Names | 557 | 0.46 | 8 | 60 | 0 | 479 | 0 |
Company Website | 570 | 0.45 | 9 | 65 | 0 | 466 | 0 |
Guest Name | 837 | 0.19 | 9 | 17 | 0 | 24 | 0 |
Notes | 899 | 0.13 | 9 | 191 | 0 | 129 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Season Number | 0 | 1.00 | 6.76 | 3.11 | 1.00 | 4.00 | 7.00 | 9.00 | 1.400e+01 | ▃▇▅▇▁ |
Episode Number | 0 | 1.00 | 12.12 | 7.74 | 1.00 | 5.00 | 11.00 | 18.00 | 2.900e+01 | ▇▆▅▅▂ |
Pitch Number | 0 | 1.00 | 519.50 | 299.79 | 1.00 | 260.25 | 519.50 | 778.75 | 1.038e+03 | ▇▇▇▇▇ |
Multiple Entrepreneurs | 487 | 0.53 | 0.35 | 0.48 | 0.00 | 0.00 | 0.00 | 1.00 | 1.000e+00 | ▇▁▁▁▅ |
US Viewership | 416 | 0.60 | 6.10 | 1.35 | 2.31 | 5.15 | 6.38 | 7.11 | 8.640e+00 | ▁▃▅▇▃ |
Original Ask Amount | 0 | 1.00 | 281798.65 | 379843.24 | 10000.00 | 100000.00 | 200000.00 | 300000.00 | 5.000e+06 | ▇▁▁▁▁ |
Original Offered Equity | 0 | 1.00 | 14.64 | 8.91 | 1.50 | 10.00 | 10.00 | 20.00 | 1.000e+02 | ▇▁▁▁▁ |
Valuation Requested | 0 | 1.00 | 3163290.63 | 4804725.88 | 40000.00 | 600000.00 | 1485294.00 | 3333333.00 | 4.500e+07 | ▇▁▁▁▁ |
Got Deal | 0 | 1.00 | 0.58 | 0.49 | 0.00 | 0.00 | 1.00 | 1.00 | 1.000e+00 | ▆▁▁▁▇ |
Total Deal Amount | 436 | 0.58 | 290921.37 | 378899.37 | 0.00 | 100000.00 | 200000.00 | 300000.00 | 5.000e+06 | ▇▁▁▁▁ |
Total Deal Equity | 436 | 0.58 | 25.51 | 16.18 | 0.00 | 15.00 | 25.00 | 33.00 | 1.000e+02 | ▇▇▂▁▁ |
Deal Valuation | 436 | 0.58 | 2042821.14 | 3718413.81 | 0.00 | 336206.75 | 800000.00 | 2000000.00 | 3.600e+07 | ▇▁▁▁▁ |
Number of sharks in deal | 436 | 0.58 | 1.32 | 0.63 | 1.00 | 1.00 | 1.00 | 2.00 | 5.000e+00 | ▇▂▁▁▁ |
Investment Amount Per Shark | 436 | 0.58 | 245115.72 | 350301.99 | 0.00 | 75000.00 | 150000.00 | 300000.00 | 5.000e+06 | ▇▁▁▁▁ |
Equity Per Shark | 436 | 0.58 | 21.55 | 15.17 | 0.00 | 10.00 | 20.00 | 25.00 | 1.000e+02 | ▇▅▁▁▁ |
Royalty Deal | 987 | 0.05 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.000e+00 | ▁▁▇▁▁ |
Loan | 1001 | 0.04 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.000e+00 | ▁▁▇▁▁ |
Barbara Corcoran Investment Amount | 940 | 0.09 | 143520.41 | 137398.90 | 12500.00 | 50000.00 | 100000.00 | 200000.00 | 1.000e+06 | ▇▂▁▁▁ |
Barbara Corcoran Investment Equity | 940 | 0.09 | 23.98 | 13.09 | 5.00 | 15.00 | 20.00 | 32.25 | 5.500e+01 | ▇▇▂▂▂ |
Mark Cuban Investment Amount | 857 | 0.17 | 245649.17 | 278613.24 | 12500.00 | 75000.00 | 150000.00 | 300000.00 | 2.000e+06 | ▇▁▁▁▁ |
Mark Cuban Investment Equity | 857 | 0.17 | 18.80 | 15.40 | 2.50 | 10.00 | 15.00 | 25.00 | 1.000e+02 | ▇▃▁▁▁ |
Lori Greiner Investment Amount | 882 | 0.15 | 205993.59 | 198022.87 | 17500.00 | 75000.00 | 150000.00 | 250000.00 | 1.000e+06 | ▇▂▁▁▁ |
Lori Greiner Investment Equity | 882 | 0.15 | 16.61 | 12.03 | 0.00 | 10.00 | 12.50 | 20.00 | 6.500e+01 | ▇▅▁▁▁ |
Robert Herjavec Investment Amount | 938 | 0.10 | 290973.33 | 581148.81 | 5000.00 | 86458.33 | 150000.00 | 300000.00 | 5.000e+06 | ▇▁▁▁▁ |
Robert Herjavec Investment Equity | 938 | 0.10 | 18.66 | 13.36 | 0.00 | 10.00 | 15.00 | 25.00 | 1.000e+02 | ▇▃▁▁▁ |
Daymond John Investment Amount | 943 | 0.09 | 186805.26 | 319390.55 | 5000.00 | 50000.00 | 100000.00 | 240000.00 | 3.000e+06 | ▇▁▁▁▁ |
Daymond John Investment Equity | 943 | 0.09 | 26.06 | 16.18 | 0.00 | 15.82 | 25.00 | 33.30 | 1.000e+02 | ▇▇▁▁▁ |
Kevin O Leary Investment Amount | 942 | 0.09 | 236276.04 | 315926.33 | 20000.00 | 80000.00 | 150000.00 | 250000.00 | 2.500e+06 | ▇▁▁▁▁ |
Kevin O Leary Investment Equity | 942 | 0.09 | 15.83 | 11.65 | 0.00 | 8.56 | 10.83 | 25.00 | 5.000e+01 | ▇▃▂▁▁ |
Guest Investment Amount | 969 | 0.07 | 216606.28 | 239754.19 | 0.00 | 75000.00 | 125000.00 | 250000.00 | 1.250e+06 | ▇▂▁▁▁ |
Guest Investment Equity | 969 | 0.07 | 16.71 | 15.52 | 0.00 | 10.00 | 11.25 | 20.00 | 1.000e+02 | ▇▂▁▁▁ |
Barbara Corcoran Present | 143 | 0.86 | 0.56 | 0.50 | 0.00 | 0.00 | 1.00 | 1.00 | 1.000e+00 | ▆▁▁▁▇ |
Mark Cuban Present | 142 | 0.86 | 0.90 | 0.30 | 0.00 | 1.00 | 1.00 | 1.00 | 1.000e+00 | ▁▁▁▁▇ |
Lori Greiner Present | 142 | 0.86 | 0.75 | 0.43 | 0.00 | 0.75 | 1.00 | 1.00 | 1.000e+00 | ▂▁▁▁▇ |
Robert Herjavec Present | 142 | 0.86 | 0.88 | 0.33 | 0.00 | 1.00 | 1.00 | 1.00 | 1.000e+00 | ▁▁▁▁▇ |
Daymond John Present | 143 | 0.86 | 0.66 | 0.47 | 0.00 | 0.00 | 1.00 | 1.00 | 1.000e+00 | ▅▁▁▁▇ |
Kevin O Leary Present | 143 | 0.86 | 0.96 | 0.21 | 0.00 | 1.00 | 1.00 | 1.00 | 1.000e+00 | ▁▁▁▁▇ |
Kevin Harrington Present | 143 | 0.86 | 0.95 | 0.23 | 0.00 | 1.00 | 1.00 | 1.00 | 1.000e+00 | ▁▁▁▁▇ |
Data 2 - March Madness
Introduction and data
Identify the source of the data.
- The data source, march_madness.csv, is located in the data folder. This dataset is a collection of observations from March Madness (the college basketball playoff tournament). The data comes from Kaggle, and can be found at this link.
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
- This data was taken by the Washington Post from the NCAA tournament data that has been collected and recorded by the NCAA throughout the duration of past March Madness tournaments.
Write a brief description of the observations.
- Each observation represents one game. There is an observation for every game in each round of all tournaments from 1985-2021.
Research question
Research question(s)
- What is the relationship between a team’s seeding and how far it makes it in the tournament?
- What “underdog” seeds are most likely to win an upset?
Description of research topic
- March Madness is known to have a degree of randomness in which teams win each game, as nobody has ever predicted every game winner accurately. We are wondering how this randomness affects the likelihood of a team making it far in the tournament.
Hypotheses on the topic
High seeds and low seeds are more indicative of performance, while teams with middle-range seeds are less predictable.
Smaller underdogs (i.e., underdogs that are ranked better) will have the best likelihood of upsetting a higher-ranked team.
Types of variables in the research question
- The categorical variables are the seed and the year of the tournament. The quantitative variable is the average round that a seed reaches.
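One way the upset question could be approached is sketched below. The column names (`YEAR`, `ROUND`, `WSEED`, `LSEED`) match march_madness.csv, but the five games are made-up toy rows, and the names `games` and `upset_rates` are hypothetical. The sketch assumes an upset means the winner had the larger (worse) seed number, and it drops games between equal seeds.

```r
library(dplyr)

# Toy games with made-up values; only the column names follow march_madness.csv.
games <- data.frame(
  YEAR  = c(2019, 2019, 2019, 2021, 2021),
  ROUND = c(1, 1, 2, 1, 1),
  WSEED = c(1, 12, 1, 15, 2),
  LSEED = c(16, 5, 8, 2, 15)
)

# Assumption: the underdog is the team with the larger seed number,
# and an upset is a game the underdog wins (WSEED > LSEED).
upset_rates <- games %>%
  filter(WSEED != LSEED) %>%            # equal seeds have no underdog
  mutate(
    underdog_seed = pmax(WSEED, LSEED),
    underdog_won  = WSEED > LSEED
  ) %>%
  group_by(underdog_seed) %>%
  summarise(
    games      = n(),
    upset_rate = mean(underdog_won),
    .groups    = "drop"
  )

upset_rates
```

Grouping by the underdog's seed alone ignores how strong the favorite was; grouping by the seed difference would be a natural variation.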
Glimpse of data
march_madness <- read_csv("data/march_madness.csv")
Rows: 2317 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): WTEAM, LTEAM
dbl (6): YEAR, ROUND, WSEED, WSCORE, LSEED, LSCORE
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
skim(march_madness)
Name | march_madness |
Number of rows | 2317 |
Number of columns | 8 |
_______________________ | |
Column type frequency: | |
character | 2 |
numeric | 6 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
WTEAM | 0 | 1 | 3 | 25 | 0 | 207 | 0 |
LTEAM | 0 | 1 | 3 | 25 | 0 | 302 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
YEAR | 0 | 1 | 2002.76 | 10.47 | 1985 | 1994 | 2003 | 2012 | 2021 | ▇▇▇▇▇ |
ROUND | 0 | 1 | 1.86 | 1.21 | 0 | 1 | 1 | 2 | 6 | ▇▃▂▁▁ |
WSEED | 0 | 1 | 4.98 | 3.84 | 1 | 2 | 4 | 7 | 16 | ▇▃▂▂▁ |
WSCORE | 0 | 1 | 76.87 | 11.84 | 43 | 69 | 76 | 84 | 149 | ▂▇▂▁▁ |
LSEED | 0 | 1 | 8.72 | 4.60 | 1 | 5 | 9 | 13 | 16 | ▇▆▆▆▇ |
LSCORE | 0 | 1 | 65.19 | 11.05 | 29 | 58 | 65 | 72 | 115 | ▁▇▇▂▁ |
Data 3 - Goodreads and Google Books
Introduction and data
Identify the source of the data.
- This data comes in two parts. The first is web-scraped from the Goodreads website, specifically its most popular list, “Books That Everyone Should Read At Least Once.” The link is: “https://www.goodreads.com/list/show/264.Books_That_Everyone_Should_Read_At_Least_Once”. The second part comes from the Google Books API, which we queried by searching for each title from the Goodreads list. We originally wanted to use the Goodreads API, but it has been discontinued, so we switched to the Google Books API. This resulted in some missing data points: while we used 800 books from the Goodreads list, only around 380 had an equivalent match on Google Books. With more time, we can expand this data set back toward the recommended 800 data points. ISBNs, which would have provided a more accurate way to search, were not available on the Goodreads website.
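The source does not show how the title search was issued, so the following is only a guess at its shape: a minimal sketch that builds one Google Books volumes search URL per title using the API's `intitle:` keyword. The helper `build_books_query()` and the two example titles are hypothetical, and the sketch stops at constructing the URLs rather than sending any requests.

```r
# Hypothetical helper: builds a Google Books volumes search URL for one title.
# The title is percent-encoded so that spaces and punctuation are URL-safe.
build_books_query <- function(title, max_results = 1) {
  paste0(
    "https://www.googleapis.com/books/v1/volumes?q=intitle:",
    utils::URLencode(title, reserved = TRUE),
    "&maxResults=", max_results
  )
}

# Example titles chosen for illustration only.
titles <- c("To Kill a Mockingbird", "Pride and Prejudice")
query_urls <- vapply(titles, build_books_query, character(1), USE.NAMES = FALSE)

query_urls
```

Searching by title alone can return a different edition than the one on the Goodreads list, which is consistent with the matching problems described above.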
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
- The Goodreads list was voted on by over 100,000 users. In total, the list features 20,000 books. Any user can vote, and the books are ranked by the number of votes they receive. The Google Books data was all collected by Google in its attempt to digitize the books of the world en masse. Some books are not on the site due to copyright, so the selection varies. Further, Google Books often has multiple editions of the same book, and which edition is returned as most relevant changes with the query, making selection difficult. Verified users and Google developers can add data to the site.
Write a brief description of the observations.
- This data set has 27 columns that cover everything from maturity rating to retail price. The relevant columns are title, author, publisher, publish date, description, page count, print type, categories (genre), average rating, number of ratings, maturity rating, language, links to preview or buy, subtitle, country, and price. The table can use a little more tidying and filtering.
Research question
- A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
Research Questions:
How does the average rating of a book vary across different genres, publication dates, and book lengths? Is there a singular model of book that emerges as most popular?
Do different publishing houses set different standards for qualities like page count and retail price of books? Does one of these models stand out as the most successful?
- A description of the research topic along with a concise statement of your hypotheses on this topic.
Our research topic with this data set involves an analysis of popular books based on their genre, publication date, book length, publishing house, page count, and retail price. The study aims to investigate the relationship between these variables and the average rating of books. Additionally, we will seek to explore whether certain models of books emerge as more popular than others and/or whether publishing houses set different standards for qualities like page count and retail price.
Hypotheses:
The average rating of books varies significantly across different genres, publication dates, and book lengths. However, as this list is voted on by people, we expect to see certain genres mirror the trends of recent years; for example, Romance may have the most presence on the list. Longer books may be perceived as having more depth and value, resulting in higher average ratings but significantly fewer ratings. We don’t believe that a singular model of book will emerge as most popular from this dataset, as readers’ preferences and tastes can vary widely.
Different publishing houses may set different standards for qualities like page count and retail price of books. Some publishing houses may prioritize longer books with higher page counts, while others may prioritize shorter books that are more accessible to readers. However, we predict that they will be more consistent on genre and retail price than on page length. We expect to see variance between the trends of smaller publishing firms and those of the dominant firms.
- Identify the types of variables in your research question. Categorical? Quantitative?
Categorical variables: genre, language, and maturity rating.
Quantitative, discrete variables: publication date, page count, average rating, number of ratings.
Quantitative continuous: price.
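Since publishedDate is read in as text with mixed formats (for example "1999-11-03" and "2018") and pageCount contains zeros, some preparation is likely needed before these variables can be used as quantitative ones. The sketch below shows one possible approach on made-up toy rows. The column names match books.csv, while `books`, `books_clean`, `rating_by_length`, and the 300-page cutoff are assumptions chosen purely for illustration.

```r
library(dplyr)

# Toy rows with made-up values; only the column names follow books.csv.
books <- data.frame(
  publishedDate = c("1999-11-03", "2018", "2016", NA),
  pageCount     = c(350, 519, 0, 128),
  averageRating = c(4.5, NA, 4.0, 4.0),
  stringsAsFactors = FALSE
)

books_clean <- books %>%
  mutate(
    # Assumption: every date string starts with a four-digit year.
    pub_year  = as.integer(substr(publishedDate, 1, 4)),
    # Assumption: a page count of 0 means "not recorded", not a real length.
    pageCount = na_if(pageCount, 0)
  )

# Mean rating for shorter versus longer books (300 pages is an arbitrary cutoff).
rating_by_length <- books_clean %>%
  filter(!is.na(pageCount), !is.na(averageRating)) %>%
  mutate(length_group = if_else(pageCount >= 300, "300+ pages", "under 300 pages")) %>%
  group_by(length_group) %>%
  summarise(mean_rating = mean(averageRating), n = n(), .groups = "drop")

rating_by_length
```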
Glimpse of data
goodreadsdata <- read_csv("data/books.csv")
New names:
• `amount...24` -> `amount...23`
• `currencyCode...25` -> `currencyCode...24`
• `amount...26` -> `amount...25`
• `currencyCode...27` -> `currencyCode...26`
Rows: 386 Columns: 27
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (19): title, authors, publisher, publishedDate, description, printType, ...
dbl  (5): pageCount, averageRating, ratingsCount, amount...23, amount...25
lgl  (3): allowAnonLogging, comicsContent, isEbook
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(goodreadsdata)
Rows: 386
Columns: 27
$ title <chr> "To Kill a Mockingbird 40th", "Harry Potter and th…
$ authors <chr> "Harper Lee", "J.K. Rowling", "Jane Austen", "Anne…
$ publisher <chr> "HarperCollins Christian Publishing", "Pottermore …
$ publishedDate <chr> "1999-11-03", "2015-12-08", "2018", "2016", "2021-…
$ description <chr> "The explosion of racial hate and violence in a sm…
$ pageCount <dbl> 350, 311, 519, 0, 128, 117, 246, 165, 192, 0, 578,…
$ printType <chr> "BOOK", "BOOK", "BOOK", "BOOK", "BOOK", "BOOK", "B…
$ categories <chr> "FICTION", "Juvenile Fiction", "Courtship", "Amste…
$ averageRating <dbl> 4.5, 4.5, NA, 4.0, 4.0, 4.0, NA, 3.5, 3.5, 4.5, 4.…
$ ratingsCount <dbl> 2163, 2057, NA, 166, 7, 1906, NA, 3123, 485, 124, …
$ maturityRating <chr> "NOT_MATURE", "NOT_MATURE", "NOT_MATURE", "NOT_MAT…
$ allowAnonLogging <lgl> FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRU…
$ contentVersion <chr> "0.2.4.0.preview.0", "2.36.33.0.preview.3", "previ…
$ language <chr> "en", "en", "en", "en", "en", "en", "en", "en", "e…
$ previewLink <chr> "http://books.google.com/books?id=ayJpGQeyxgkC&pri…
$ infoLink <chr> "http://books.google.com/books?id=ayJpGQeyxgkC&dq=…
$ canonicalVolumeLink <chr> "https://books.google.com/books/about/To_Kill_a_Mo…
$ subtitle <chr> NA, NA, NA, NA, NA, NA, NA, "The Authorized Editio…
$ comicsContent <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ country <chr> "US", "US", "US", "US", "US", "US", "US", "US", "U…
$ saleability <chr> "NOT_FOR_SALE", "FOR_SALE", "NOT_FOR_SALE", "NOT_F…
$ isEbook <lgl> FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TR…
$ amount...23 <dbl> NA, 9.99, NA, NA, NA, NA, NA, 1.99, NA, NA, NA, NA…
$ currencyCode...24 <chr> NA, "USD", NA, NA, NA, NA, NA, "USD", NA, NA, NA, …
$ amount...25 <dbl> NA, 9.99, NA, NA, NA, NA, NA, 1.99, NA, NA, NA, NA…
$ currencyCode...26 <chr> NA, "USD", NA, NA, NA, NA, NA, "USD", NA, NA, NA, …
$ buyLink <chr> NA, "https://play.google.com/store/books/details?i…
skim(goodreadsdata)
Name | goodreadsdata |
Number of rows | 386 |
Number of columns | 27 |
_______________________ | |
Column type frequency: | |
character | 19 |
logical | 3 |
numeric | 5 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
title | 0 | 1.00 | 4 | 87 | 0 | 382 | 0 |
authors | 0 | 1.00 | 4 | 65 | 0 | 295 | 0 |
publisher | 51 | 0.87 | 2 | 44 | 0 | 151 | 0 |
publishedDate | 6 | 0.98 | 4 | 20 | 0 | 325 | 0 |
description | 27 | 0.93 | 14 | 4857 | 0 | 357 | 0 |
printType | 3 | 0.99 | 4 | 4 | 0 | 1 | 0 |
categories | 0 | 1.00 | 4 | 40 | 0 | 77 | 0 |
maturityRating | 0 | 1.00 | 10 | 10 | 0 | 1 | 0 |
contentVersion | 0 | 1.00 | 10 | 21 | 0 | 183 | 0 |
language | 0 | 1.00 | 2 | 2 | 0 | 7 | 0 |
previewLink | 0 | 1.00 | 77 | 210 | 0 | 385 | 0 |
infoLink | 0 | 1.00 | 72 | 194 | 0 | 383 | 0 |
canonicalVolumeLink | 0 | 1.00 | 59 | 102 | 0 | 383 | 0 |
subtitle | 278 | 0.28 | 6 | 165 | 0 | 85 | 0 |
country | 0 | 1.00 | 2 | 2 | 0 | 1 | 0 |
saleability | 0 | 1.00 | 4 | 12 | 0 | 3 | 0 |
currencyCode...24 | 278 | 0.28 | 3 | 3 | 0 | 1 | 0 |
currencyCode...26 | 278 | 0.28 | 3 | 3 | 0 | 1 | 0 |
buyLink | 263 | 0.32 | 104 | 104 | 0 | 121 | 0 |
Variable type: logical
skim_variable | n_missing | complete_rate | mean | count |
---|---|---|---|---|
allowAnonLogging | 0 | 1.00 | 0.32 | FAL: 262, TRU: 124 |
comicsContent | 384 | 0.01 | 1.00 | TRU: 2 |
isEbook | 0 | 1.00 | 0.32 | FAL: 263, TRU: 123 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
pageCount | 12 | 0.97 | 340.86 | 279.84 | 0 | 164.00 | 297.00 | 448.00 | 1600 | ▇▅▁▁▁ |
averageRating | 90 | 0.77 | 4.04 | 0.53 | 2 | 4.00 | 4.00 | 4.50 | 5 | ▁▁▃▇▅ |
ratingsCount | 90 | 0.77 | 690.37 | 1217.53 | 1 | 11.75 | 118.50 | 482.00 | 4916 | ▇▁▁▁▁ |
amount...23 | 278 | 0.28 | 9.72 | 5.82 | 0 | 5.99 | 9.99 | 12.99 | 35 | ▅▇▂▁▁ |
amount...25 | 278 | 0.28 | 9.33 | 5.57 | 0 | 5.99 | 9.99 | 12.99 | 35 | ▅▇▁▁▁ |