Create occurrence cubes for virtual species
Apart from simulating occurrence cubes from scratch, gcube can also be used to create cubes for virtual species generated using bioclimatic variables. This tutorial demonstrates how to generate a virtual species, apply a geographical limit, sample occurrences, and convert them into an occurrence cube.
Create virtual species
Virtual species can be created using the virtualspecies R package (Leroy et al., 2015). Below is a basic example of how to create a virtual species. For additional options and a more comprehensive guide, refer to the package documentation and the tutorial available here.
# Load packageslibrary(gcube)
library(geodata) # get bioclimatic datalibrary(virtualspecies) # generate virtual species
library(sf) # work with spatial objectslibrary(terra) # work with raster datalibrary(ggplot2) # data visualisation
Step 1: Download WorldClim Data
We first obtain WorldClim bioclimatic data at a 10-minute resolution.
# Download bioclimatic dataworldclim <- worldclim_global(var = "bio", res = 10, path = tempdir())
Step 2: Define the region of interest
We define the extent for the Western Palearctic and crop the climate data.
# Western Palearcticwp <- terra::ext(-15, 65, 30, 75)
# Crop climate dataworldclim <- terra::crop(worldclim, wp)
Step 3: Select relevant bioclimatic variables
We select a subset of bioclimatic variables: bio2, bio5, bio6, bio13, bio14, and bio15.
# Specify variablesbio_vars <- c("bio2", "bio5", "bio6", "bio13", "bio14", "bio15")
# Select variablesbio_string <- paste0("bio_", sub("bio", "", bio_vars), collapse = "|")bio_vars_selected <- names(worldclim)[grepl(bio_string, names(worldclim))]
# Subset climate dataworldclim_vars <- worldclim[[bio_vars_selected]]
Step 4: Generate a virtual species
We create a random virtual species based on the selected bioclimatic variables.
set.seed(123)
# Generate random virtual speciesvirtual_species <- generateRandomSp(worldclim_vars)#> Reading raster values. If it fails for very large rasters, use arguments 'sample.points = TRUE' and define a number of points to sample with 'nb.point'.#> - Perfoming the pca#> - Defining the response of the species along PCA axes#> - Calculating suitability values#> The final environmental suitability was rescaled between 0 and 1.#> To disable, set argument rescale = FALSE#> - Converting into Presence - Absence#> --- Generating a random value of beta for the logistic conversion#> --- Determing species.prevalence automatically according to alpha and beta#> Logistic conversion finished:#>#> - beta = 0.194194194194194#> - alpha = -0.1#> - species prevalence =0.770632849948426

Step 5: Apply a geographical limit
We limit the species distribution to the United Kingdom and Ireland.
# Limit distributionvirtual_distribution <- limitDistribution( x = virtual_species, geographical.limit = "country", area = c("United Kingdom", "Ireland"))#> Warning: [rasterize] unexpected additional argument(s): silent

Step 6: Sample occurrences
We sample 150 occurrences from the restricted distribution with a detection probability of 0.8 using the virtualspecies::sampleOccurrences()
function.
Detection probability and sampling bias can be introduced similar to sample_observations()
.
?virtualspecies::sampleOccurrences
# Sample from virtual species distributionvirtual_sample <- sampleOccurrences( virtual_distribution, n = 150, detection.probability = 0.8)

Grid designation with gcube
Now we can create an occurrence cube for this virtual species
The most important step, is to make the virtualspecies output compatible with gcube input.
This is done using the virtualsample_to_sf()
function.
For grid designation, see also the tutorial 3. Specifying the grid designation process.
Step 1: Convert sampled points to sf format
We transform the sampled points from a virtualspecies object so that it can be used by gcube with the virtualsample_to_sf()
function.
?virtualsample_to_sf
# Transform virtual samples to sf objectdetections_df_raw <- virtualsample_to_sf(virtual_sample)detections_df_raw#> Simple feature collection with 150 features and 2 fields#> Geometry type: POINT#> Dimension: XY#> Bounding box: xmin: -9.25 ymin: 50.58333 xmax: 1.416667 ymax: 58.25#> Geodetic CRS: WGS 84#> First 10 features:#> id observed geometry#> 1 62966 TRUE POINT (-0.75 53.08333)#> 2 60516 TRUE POINT (-9.083333 53.91667)#> 3 63937 TRUE POINT (1.083333 52.75)#> 4 59607 TRUE POINT (-0.5833333 54.25)#> 5 66802 TRUE POINT (-1.416667 51.75)#> 6 65795 TRUE POINT (-9.25 52.08333)#> 7 60087 TRUE POINT (-0.5833333 54.08333)#> 8 55755 TRUE POINT (-2.583333 55.58333)#> 9 54310 TRUE POINT (-3.416667 56.08333)#> 10 56230 TRUE POINT (-3.416667 55.41667)
Optional: Extract values from a raster
You can also extract raster values when transforming the sample data. For instance, extracting suitability or occurrence probability can be helpful for downstream analysis.
# Rename raster layers for clear outputnames(virtual_species$suitab.raster) <- "suitability"names(virtual_species$probability.of.occurrence) <- "occ_prob"
# Transform virtual samples and extract valuesvirtualsample_to_sf( virtual_sample, raster_lyr = c(virtual_species$suitab.raster, virtual_species$probability.of.occurrence))#> Simple feature collection with 150 features and 4 fields#> Geometry type: POINT#> Dimension: XY#> Bounding box: xmin: -9.25 ymin: 50.58333 xmax: 1.416667 ymax: 58.25#> Geodetic CRS: WGS 84#> First 10 features:#> id observed suitability occ_prob geometry#> 1 62966 TRUE 0.608657235 0.9843980 POINT (-0.75 53.08333)#> 2 60516 TRUE 0.081922322 0.2455073 POINT (-9.083333 53.91667)#> 3 63937 TRUE 0.481108846 0.9463000 POINT (1.083333 52.75)#> 4 59607 TRUE 0.375077019 0.8592202 POINT (-0.5833333 54.25)#> 5 66802 TRUE 0.516553006 0.9617124 POINT (-1.416667 51.75)#> 6 65795 TRUE 0.019556962 0.1485053 POINT (-9.25 52.08333)#> 7 60087 TRUE 0.354688757 0.8327085 POINT (-0.5833333 54.08333)#> 8 55755 TRUE 0.294534380 0.7317269 POINT (-2.583333 55.58333)#> 9 54310 TRUE 0.248244943 0.6319305 POINT (-3.416667 56.08333)#> 10 56230 TRUE 0.004863729 0.1308681 POINT (-3.416667 55.41667)
We transform the detected points into an sf object to the LAEA Europe CRS (EPSG:3035).
# Transform CRSdetections_df_transformed <- st_transform(detections_df_raw, crs = 3035)
Step 2: Add coordinate uncertainty
See also the tutorial 2. Simulating the detection process.
We only keep the detected occurrences using the filter_observations()
function.
detections_df <- filter_observations(detections_df_transformed)
Optionally, we can add coordinate uncertainty using the add_coordinate_uncertainty()
function.
We add 25 m uncertainty for each observation.
# Add coordinate uncertaintyobservations_df <- add_coordinate_uncertainty( observations = detections_df, coords_uncertainty_meters = 25)
Step 3: Specify reference grid
We also need a grid. Each observation will be designated to a grid cell. You can provide your own grid (e.g. EEA reference grid for Europe). For this example we create a simple grid around the sampled points.
# Create reference gridbuffered_observations <- st_buffer(observations_df, 25)cube_grid <- st_make_grid( buffered_observations, n = c(20, 20), square = TRUE) %>% st_sf()
The grid looks like this.
ggplot() + geom_sf(data = observations_df) + geom_sf(data = cube_grid, alpha = 0) + coord_sf(datum = 3035) + theme_minimal()

Step 4: Perform grid designation
We perform grid designation both in aggregated version and not for visualisation purposes.
# Perform grid designationoccurrence_cube_df <- grid_designation( observations_df, cube_grid, seed = 123)#> Warning in sample_from_uniform_circle(observations = observations, missing_uncertainty = missing_uncertainty, : No column `time_point` present!#> Assuming only a single time point.
# Get sampled points from uncertainty circlesampled_points <- grid_designation( observations_df, cube_grid, seed = 123, aggregate = FALSE)#> Warning in sample_from_uniform_circle(observations = observations, missing_uncertainty = missing_uncertainty, : No column `time_point` present!#> Assuming only a single time point.
Lets visualise were the samples were taken.
Note that no distinction is made between zeroes and NA
values.
Every absence gets a zero value.
ggplot() + geom_sf(data = occurrence_cube_df, alpha = 0) + geom_sf_text(data = occurrence_cube_df, aes(label = n)) + geom_sf(data = sampled_points, colour = "blue") + coord_sf(datum = 3035) + theme_minimal()
