The R Journal: Palmer Archipelago Penguins Data in the palmerpenguins R Package - An Alternative to Anderson's Irises

Allison M. Horst; Alison Presmanes Hill; Kristen B. Gorman

doi:10.32614/RJ-2022-020

Data in the penguins object have been minimally updated from penguins_raw as follows:

Summary of the penguins_raw dataset

palmerpenguins for other programming languages

Python: Python users can install the palmerpenguins Python package and use it to load the palmerpenguins datasets into their Python environment.
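As a minimal sketch of that workflow, and not necessarily the exact code from the original article, the helper below assumes the third-party palmerpenguins package has been installed from PyPI (for example with `pip install palmerpenguins`). The helper name `load_penguins_frame` is hypothetical; `load_penguins` is the loader exposed by the package.

```python
def load_penguins_frame():
    """Return the curated penguins table as a pandas DataFrame.

    Assumption: the third-party `palmerpenguins` package is installed.
    """
    # Imported lazily so that defining this helper does not require the package.
    from palmerpenguins import load_penguins

    return load_penguins()
```

Calling `load_penguins_frame()` in an environment with the package installed should return the curated penguins table described in this article.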

Julia: Julia users can access the penguins data through the PalmerPenguins.jl package. More information on PalmerPenguins.jl, including example code to import the penguins data, is available from David Widmann.

TensorFlow: TensorFlow users can access the penguins data in TensorFlow Datasets, whose documentation provides information and examples for the penguins data.

Acknowledgements

All analyses were performed in the R language environment using version 4.1.2 (R Core Team 2021). Complete code for this paper is shared in the Supplemental Material. We acknowledge the following R packages used in analyses, with gratitude to developers and contributors:

Supplementary materials

Supplementary materials are available in addition to this article. They can be downloaded as RJ-2022-020.zip.

CRAN packages used

CRAN Task Views implied by cited packages

References

E. Anderson. The irises of the Gaspé Peninsula. Bulletin of the American Iris Society, 59: 2–5, 1935.

B. T. Bestelmeyer, A. M. Ellison, W. R. Fraser, K. B. Gorman, S. J. Holbrook, C. M. Laney, M. D. Ohman, D. P. C. Peters, F. C. Pillsbury, A. Rassweiler, et al. Analysis of abrupt transitions in ecological systems. Ecosphere, 2(12): art129, 2011. URL http://doi.wiley.com/10.1890/ES11-00216.1 [online; last accessed March 27, 2021].

R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2): 179–188, 1936. URL http://doi.wiley.com/10.1111/j.1469-1809.1936.tb02137.x [online; last accessed July 1, 2020].

D. Gohel and P. Skintzos. ggiraph: Make ’ggplot2’ graphics interactive. 2022. URL https://CRAN.R-project.org/package=ggiraph. R package version 0.8.2.

K. B. Gorman, K. E. Ruck, T. D. Williams and W. R. Fraser. Advancing the Sea Ice Hypothesis: Trophic Interactions Among Breeding Pygoscelis Penguins With Divergent Population Trends Throughout the Western Antarctic Peninsula. Frontiers in Marine Science, 8: 526092, 2021. URL https://www.frontiersin.org/articles/10.3389/fmars.2021.526092/full [online; last accessed September 25, 2021].

K. B. Gorman, S. L. Talbot, S. A. Sonsthagen, G. K. Sage, M. C. Gravely, W. R. Fraser and T. D. Williams. Population genetic structure and gene flow of Adélie penguins (Pygoscelis adeliae) breeding throughout the western Antarctic Peninsula. Antarctic Science, 29(6): 499–510, 2017. URL https://www.cambridge.org/core/product/identifier/S0954102017000293/type/journal_article [online; last accessed March 27, 2021].

K. B. Gorman, T. D. Williams and W. R. Fraser. Ecological Sexual Dimorphism and Environmental Variability within a Community of Antarctic Penguins (Genus Pygoscelis). PLoS ONE, 9(3): e90081, 2014. URL https://dx.plos.org/10.1371/journal.pone.0090081 [online; last accessed July 1, 2020].

A. Horst, A. Hill and K. Gorman. palmerpenguins: Palmer Archipelago (Antarctica) penguin data. 2020. URL https://CRAN.R-project.org/package=palmerpenguins. R package version 0.1.0.

E. Hvitfeldt. paletteer: Comprehensive collection of color palettes. 2021. URL https://github.com/EmilHvitfeldt/paletteer. R package version 1.3.0.

M. Kuhn and H. Wickham. recipes: Preprocessing and feature engineering steps for modeling. 2021. URL https://CRAN.R-project.org/package=recipes. R package version 0.1.17.

Palmer Station Antarctica LTER and K. B. Gorman. Structural size measurements and isotopic signatures of foraging among adult male and female Adélie penguins (Pygoscelis adeliae) nesting along the Palmer Archipelago near Palmer Station, 2007-2009. 2020a. URL https://portal.edirepository.org/nis/mapbrowse?packageid=knb-lter-pal.219.5 [online; last accessed July 1, 2020].

Palmer Station Antarctica LTER and K. B. Gorman. Structural size measurements and isotopic signatures of foraging among adult male and female Chinstrap penguin (Pygoscelis antarctica) nesting along the Palmer Archipelago near Palmer Station, 2007-2009. 2020b. URL https://portal.edirepository.org/nis/mapbrowse?packageid=knb-lter-pal.221.6 [online; last accessed July 1, 2020].

Palmer Station Antarctica LTER and K. B. Gorman. Structural size measurements and isotopic signatures of foraging among adult male and female Gentoo penguin (Pygoscelis papua) nesting along the Palmer Archipelago near Palmer Station, 2007-2009. 2020c. URL https://portal.edirepository.org/nis/mapbrowse?packageid=knb-lter-pal.220.5 [online; last accessed July 1, 2020].

T. L. Pedersen. patchwork: The composer of plots. 2020. URL https://CRAN.R-project.org/package=patchwork. R package version 1.1.1.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12: 2825–2830, 2011.

R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2021. URL https://www.R-project.org/.

D. Robinson, A. Hayes and S. Couch. broom: Convert statistical objects into tidy tibbles. 2022. URL https://CRAN.R-project.org/package=broom. R package version 0.7.11.

B. Schloerke, D. Cook, J. Larmarange, F. Briatte, M. Marbach, E. Thoen, A. Elberg and J. Crowley. GGally: Extension to ggplot2. 2021. URL https://CRAN.R-project.org/package=GGally. R package version 2.1.2.

C. Sievert, C. Parmer, T. Hocking, S. Chamberlain, K. Ram, M. Corvellec and P. Despouy. plotly: Create interactive web graphics via plotly.js. 2021. URL https://CRAN.R-project.org/package=plotly. R package version 4.10.0.

H. Wickham. Tidy Data. Journal of Statistical Software, 59(10), 2014. URL http://www.jstatsoft.org/v59/i10/ [online; last accessed July 1, 2020].

H. Wickham, M. Averick, J. Bryan, W. Chang, L. D. McGowan, R. François, G. Grolemund, A. Hayes, L. Henry, J. Hester, et al. Welcome to the tidyverse. Journal of Open Source Software, 4(43): 1686, 2019. DOI 10.21105/joss.01686.

H. Wickham, W. Chang, L. Henry, T. L. Pedersen, K. Takahashi, C. Wilke, K. Woo, H. Yutani and D. Dunnington. ggplot2: Create elegant data visualisations using the grammar of graphics. 2021. URL https://CRAN.R-project.org/package=ggplot2. R package version 3.3.5.

G. Yu. shadowtext: Shadow text grob and layer. 2022. URL https://github.com/GuangchuangYu/shadowtext/. R package version 0.1.1.

H. Zhu. kableExtra: Construct complex table with kable and pipe syntax. 2021. URL https://CRAN.R-project.org/package=kableExtra. R package version 1.3.4.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Feature	iris	penguins
Year(s) collected	1935	2007 - 2009
Dimensions (col x row)	5 x 150	8 x 344
Documentation	minimal	complete metadata
Variable classes	double (4), factor (1)	double (2), int (3), factor (3)
Missing values?	no (n = 0; 0.0%)	yes (n = 19; 0.7%)

iris sample size (by species)
Iris species	Sample size
setosa	50
versicolor	50
virginica	50

penguins sample size (by species and sex)
Penguin species	Female	Male	NA
Adélie	73	73	6
Chinstrap	34	34	0
Gentoo	58	61	5
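The penguin counts in the sample size table can be cross-checked against the 344 rows reported for the penguins data. The short sketch below only re-adds the numbers printed in the table; the variable names are illustrative and not taken from the article.

```python
# species -> (female, male, sex not recorded), copied from the sample size table
penguin_counts = {
    "Adélie": (73, 73, 6),
    "Chinstrap": (34, 34, 0),
    "Gentoo": (58, 61, 5),
}

# Total penguins per species, across all three sex categories
species_totals = {species: sum(counts) for species, counts in penguin_counts.items()}

# Overall sample size and number of penguins with no recorded sex
n_total = sum(species_totals.values())
n_sex_missing = sum(counts[2] for counts in penguin_counts.values())

print(species_totals)   # {'Adélie': 152, 'Chinstrap': 68, 'Gentoo': 124}
print(n_total)          # 344
print(n_sex_missing)    # 11
```

The total of 344 matches the row count of the penguins data, and 11 of the 19 missing values reported for penguins are accounted for by the sex variable alone.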

Penguins cluster assignments
Cluster	Adélie	Chinstrap	Gentoo
1	0	9	116
2	4	54	6
3	147	5	1

Iris cluster assignments
Cluster	setosa	versicolor	virginica
1	0	2	46
2	0	48	4
3	50	0	0
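One way to put a single number on these cluster assignment tables is cluster purity: the share of observations that fall in the majority species of their cluster. Purity is an illustrative summary chosen here, not necessarily the measure used in the article, and the function below is a sketch that works directly on the counts printed in the tables.

```python
def purity(table):
    """Share of observations that belong to their cluster's majority species.

    `table` is a list of rows, one per cluster, holding per-species counts.
    """
    total = sum(sum(row) for row in table)
    majority = sum(max(row) for row in table)
    return majority / total

# Rows are clusters 1-3; columns are Adélie, Chinstrap, Gentoo
penguins_clusters = [
    [0, 9, 116],
    [4, 54, 6],
    [147, 5, 1],
]

# Rows are clusters 1-3; columns are setosa, versicolor, virginica
iris_clusters = [
    [0, 2, 46],
    [0, 48, 4],
    [50, 0, 0],
]

print(f"penguins: {purity(penguins_clusters):.1%}")  # penguins: 92.7%
print(f"iris: {purity(iris_clusters):.1%}")          # iris: 96.0%
```

The penguins table sums to 342 observations rather than 344, which suggests that two penguins were left out of the clustering, presumably because of missing measurements.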

Feature	penguins_raw
Year(s) collected	2007 - 2009
Dimensions (col x row)	17 x 344
Documentation	complete metadata
Variable classes	character (9), Date (1), numeric (7)
Missing values?	yes (n = 336; 5.7%)
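The missing value percentages in the feature tables follow from dividing the number of missing cells by the total number of cells (columns times rows). The sketch below reproduces that arithmetic from the reported dimensions; the function and variable names are illustrative.

```python
def missing_share(n_cols, n_rows, n_missing):
    """Percentage of cells that are missing in an n_rows x n_cols table."""
    return 100 * n_missing / (n_cols * n_rows)

# (columns, rows, missing cells) copied from the feature tables
datasets = {
    "iris": (5, 150, 0),
    "penguins": (8, 344, 19),
    "penguins_raw": (17, 344, 336),
}

# Round to one decimal place, as in the tables
shares = {name: round(missing_share(*dims), 1) for name, dims in datasets.items()}

for name, pct in shares.items():
    print(f"{name}: {pct}% missing")
# iris: 0.0% missing
# penguins: 0.7% missing
# penguins_raw: 5.7% missing
```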

