Ecological datasets in Python

Datasets that ship with a library are very practical for explaining concepts and for tutorials, since no extra download is required. A while ago, I posted a list of biodiversity datasets that come with R-core. Here I continue in the same vein and list datasets that come with popular Python libraries.

Scikit-learn

Scikit-learn comes with the ubiquitous iris dataset, which contains sepal and petal length and width for 150 flowers from 3 iris species (50 per species).
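As a quick illustration, here is a minimal sketch of loading iris into a pandas DataFrame with `sklearn.datasets.load_iris`. It assumes a recent scikit-learn (the `as_frame` option was added in version 0.23) and that pandas is installed; the `species` column is something I add for readability.

```python
from sklearn.datasets import load_iris

# load_iris is bundled with scikit-learn, so no download is needed.
# as_frame=True returns the data as a pandas DataFrame (requires pandas).
iris = load_iris(as_frame=True)
df = iris.frame  # four measurement columns plus a numeric "target" column

# Map the numeric target codes (0, 1, 2) to species names for readability.
df["species"] = iris.target_names[iris.target.to_numpy()]

print(df.shape)
print(df.groupby("species")["petal length (cm)"].mean())
```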

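Another interesting dataset describes the distribution of two South American mammal species, the brown-throated sloth (Bradypus variegatus) and the forest small rice rat (Microryzomys minutus), and comes with modelling examples. Unlike iris, it is downloaded on first use rather than shipped with the package.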

There are many visualization examples of the iris and other datasets on this page.
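For a first look without leaving Python, a basic scatter plot of petal length against petal width already separates the three species quite well. This is only a minimal matplotlib sketch of my own, not one of the examples linked above; the off-screen `Agg` backend and the output file name `iris_petals.png` are arbitrary choices.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

fig, ax = plt.subplots()
# One scatter call per species so each gets its own colour and legend entry.
for code, name in enumerate(iris.target_names):
    mask = y == code
    ax.scatter(X[mask, 2], X[mask, 3], label=name)  # columns 2, 3: petal length, width

ax.set_xlabel(iris.feature_names[2])
ax.set_ylabel(iris.feature_names[3])
ax.legend()
fig.savefig("iris_petals.png")
```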

Seaborn

The plotting library Seaborn comes with a variety of datasets, among them a dataset on penguin sizes with plot examples. Here's an example of applying principal component analysis to this dataset.
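For readers who want to try PCA on the penguins themselves, below is a minimal sketch of my own with scikit-learn (not the linked example). It assumes the column names of Seaborn's `penguins` dataset (`bill_length_mm`, `bill_depth_mm`, `flipper_length_mm`, `body_mass_g`, `species`); the function name `penguin_pca` is hypothetical. Note that `seaborn.load_dataset` fetches the data from Seaborn's online data repository and caches it locally, so an internet connection is needed the first time.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Numeric measurement columns in Seaborn's "penguins" dataset (assumed names).
MEASUREMENTS = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]


def penguin_pca(penguins: pd.DataFrame, n_components: int = 2) -> pd.DataFrame:
    """Project the penguin measurements onto their first principal components."""
    # PCA cannot handle missing values, so drop incomplete rows first.
    data = penguins.dropna(subset=MEASUREMENTS)
    # Standardise first: body mass (in grams) would otherwise dominate the variance.
    pipeline = make_pipeline(StandardScaler(), PCA(n_components=n_components))
    scores = pipeline.fit_transform(data[MEASUREMENTS])
    result = pd.DataFrame(
        scores,
        columns=[f"PC{i + 1}" for i in range(n_components)],
        index=data.index,
    )
    # Keep the species label so the components can be plotted by species.
    result["species"] = data["species"]
    return result
```

With Seaborn installed, usage would be `import seaborn as sns` followed by `scores = penguin_pca(sns.load_dataset("penguins"))`; the result can then be plotted with, for example, `sns.scatterplot(data=scores, x="PC1", y="PC2", hue="species")`. Standardising the columns is a design choice: without it, the first component would mostly reflect body mass simply because of its units.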

TensorFlow

TensorFlow provides a tutorial on how to use the datasets that can be loaded through the library, as well as example notebooks. People interested in biodiversity should definitely check the Fine tuning models for plant disease detection example, which is based on data from i_naturalist. Other datasets for biodiversity:

Not directly related to biodiversity, but interesting nonetheless: the EuroSAT dataset of Sentinel-2 satellite images labelled by land use and land cover.
