We are extracting Copernicus Marine data from across National Marine Sanctuaries, borrowing polygons from the noaa-onms/climate-dashboard (see Github code).
Code
if (!file.exists(sanctuaries_geo)) {# source: https://github.com/noaa-onms/climate-dashboard/blob/main/data/sanctuaries.rds sanctuaries_rds <-here("../climate-dashboard/data/sanctuaries.rds")readRDS(sanctuaries_rds) |>filter(nms !="TBNMS") |># exclude Great Lakeswrite_sf(sanctuaries_geo, delete_dsn = T)}sanctuaries <-read_sf(sanctuaries_geo)sanctuaries |>st_drop_geometry() |>datatable()
File data/copernicus/FKNMS/cmems_obs-oc_glo_bgc-pp_my_l4-multi-4km_P1M_multi-vars_83.23W-79.98W_24.23N-25.73N_2026-02-01-2026-03-01.nc (NC_FORMAT_NETCDF4):
3 variables (excluding dimension variables):
float PP[longitude,latitude,time] (Contiguous storage)
_FillValue: -999
valid_max: 10000
units: mg m-2 day-1
valid_min: 0
standard_name: primary_productivity_of_biomass_expressed_as_carbon
long_name: Primary Productivity - Mean of the binned pixels
coordinates: time lat lon
missing_value: -999
short PP_uncertainty[longitude,latitude,time] (Contiguous storage)
_FillValue: -32768
valid_max: 32767
units: %
valid_min: 0
long_name: Primary Productivity - Uncertainty estimation
coordinates: time lat lon
scale_factor: 0.00999999977648258
missing_value: -32768
byte flags[longitude,latitude,time] (Contiguous storage)
standard_name: status_flag
valid_max: 1
valid_min: 0
long_name: Flags
coordinates: time lat lon
3 dimensions:
time Size:2
standard_name: time
unit_long: Days Since 1900-01-01
axis: T
long_name: Time
units: days since 1900-01-01
calendar: Gregorian
latitude Size:37
standard_name: latitude
axis: Y
units: degrees_north
long_name: latitude
longitude Size:79
standard_name: longitude
axis: X
units: degrees_east
long_name: longitude
9 global attributes:
comment: average
title: cmems_obs-oc_glo_bgc-pp_my_l4-multi-4km_P1M
source: surface observation
institution: ACRI
history: Created using software developed at ACRI-ST
Conventions: CF-1.8, ACDD-1.3
contact: servicedesk.cmems@acri-st.fr
references: http://www.globcolour.info GlobColour has been originally funded by ESA with data from ESA, NASA, NOAA and GeoEye. This version has received funding from the European Community s Seventh Framework Programme ([FP7/2007-2013]) under grant agreement n. 282723 [OSS2015 project].
copernicusmarine_version: 2.2.5
Using terra::rast() to read the netcdf file, show first layer:
Code
r <-rast(nc)
Warning: [rast] unknown calendar (assuming standard): Gregorian
The *_uncertainty variables in the GlobColour plankton dataset exhibit pronounced inflections toward high values across all sanctuaries. These patterns are methodological, not ecological, and reflect the uncertainty estimation framework described below.
What the uncertainty values represent. Uncertainty is expressed as percent error bars (relative uncertainty) on each PFT chlorophyll-a concentration estimate. Values are stored as 2-byte signed integers with a scale factor of 0.01, producing a maximum representable value of 327.67 %(Copernicus Marine Environment Monitoring Service 2025b).
How uncertainty is computed. The per-pixel uncertainty is derived via error propagation combining three independent sources (Xi et al. 2021, Eq. 7):
Remote-sensing reflectance (\(R_{rs}\)) input uncertainty — estimated via Monte Carlo perturbation of the \(R_{rs}\) spectrum, propagated through the EOF-regression retrieval algorithm using a look-up table (LUT) approach;
Model parameter uncertainty — arising from the regression coefficients relating EOF scores to PFT concentrations;
Sea surface temperature (SST) uncertainty — SST is an auxiliary predictor in the EOF-SST hybrid retrieval, and its uncertainty is propagated into the final PFT estimate.
Algorithm origin and sensor dependence. The EOF-SST hybrid algorithm was originally developed for OLCI (Sentinel-3), which has 21 spectral bands, and subsequently extended to the multi-sensor merged GlobColour product (Xi et al. 2020). Because the merged product combines sensors with different spectral configurations, uncertainty is generally higher for sensors with fewer bands (e.g., MODIS Aqua has 9 ocean-colour bands vs. OLCI’s 21).
Sensor transitions drive the inflections. The multi-sensor record combines: SeaWiFS (1997–2010), MODIS Aqua (2002–present), MERIS (2002–2012), VIIRS NPP (2012–present), and OLCI-S3A (2016–present) (Garnesson et al. 2019; Maritorena et al. 2010). Each sensor transition changes the input \(R_{rs}\) characteristics and the number of available spectral bands, producing step-changes in the propagated uncertainty even when the underlying ocean state is stable.
Low concentrations amplify relative uncertainty. PFTs that occur at very low concentrations in temperate and subpolar waters — particularly prokaryotes (PROKAR) and Prochlorococcus (PROCHLO) — yield high percent uncertainties because a small absolute error translates to a large relative error.
Reprocessing discontinuities. The November 2023 reprocessing (NASA R2022, MERIS 4th reprocessing, OLCI baseline collection 3.01) may introduce discontinuities in both concentration and uncertainty time series (Copernicus Marine Environment Monitoring Service 2025b).
Weakest retrieval skill. Prokaryotes and Prochlorococcus have the lowest retrieval skill among all PFTs (\(R^2\) = 0.42–0.51 against in-situ validation) (Copernicus Marine Environment Monitoring Service 2025a; Xi et al. 2021), meaning their uncertainty estimates are both the largest and least well-constrained.
VIIRS-SNPP post-2017 exclusions. Data from VIIRS-SNPP after 2017 have been excluded from some products due to identified spurious trends in the merged record (Xi et al. 2025).
Using PFT Data for Multivariate Ordination
Despite the uncertainty caveats above, the PFT concentration data are well-validated and suitable for multivariate analyses. Below are recommended approaches for detecting ecosystem-level shifts across sanctuaries.
Use concentration variables only (exclude _uncertainty). The PFT chlorophyll-a concentrations themselves are the validated retrieval products; the _uncertainty fields are informational metadata describing retrieval confidence. They should not be included as ordination input variables alongside the concentrations.
Log-transform concentrations. PFT concentrations span orders of magnitude. Log-transformation stabilizes variance and matches the scale on which the retrieval algorithm was trained and validated (Xi et al. 2021). Use \(\log_{10}(\text{Chl-}a + \epsilon)\) where \(\epsilon\) is a small constant to handle zeros.
Exclude low-skill PFTs or aggregate them. Consider dropping PROKAR and PROCHLO from ordination (weakest \(R^2 < 0.55\)), or combine them into a single “prokaryote” group to improve signal-to-noise ratio.
PCA / RDA on a sanctuary \(\times\) time matrix. Structure the data as: rows = sanctuary-month combinations, columns = PFT variables (CHL, DIATO, DINO, GREEN, HAPTO, MICRO, NANO, PICO) plus physical variables (thetao, so, mlotst, bottomT, PP). Redundancy analysis (RDA) can then partition biological community variation explained by environmental drivers.
Use uncertainty as weights. In weighted ordination methods (e.g., weighted PCA), use \(1 / \text{uncertainty}\) as observation weights to downweight unreliable observations rather than discarding them entirely.
Temporal anomaly approach. Compute anomalies relative to a monthly climatology per sanctuary to remove the dominant seasonal signal, then ordinate on the anomaly matrix to detect regime shifts and long-term trends.
Non-metric MDS (NMDS). NMDS is robust to the non-normal distributions common in ocean-colour data. Use Bray-Curtis dissimilarity on the PFT concentration matrix (after log-transformation) to preserve compositional distances.
Rolling window analysis. Apply ordination in sliding temporal windows (e.g., 5-year windows advancing by 1 year) to track community composition shifts over time, overlaid with environmental drivers (SST, salinity, mixed layer depth).
———. 2025b. “Ocean Colour Thematic Assembly Centre Product User Manual.” CMEMS-OC-PUM, Issue 6.0. CMEMS.
Garnesson, Philippe, Antoine Mangin, Odile Fanton d’Andon, Julien Demaria, and Marine Bretagnon. 2019. “The CMEMSGlobColour Chlorophyll a Product Based on Satellite Observation: Multi-Sensor Merging and Flagging Strategies.”Ocean Science 15 (3): 819–30. https://doi.org/10.5194/os-15-819-2019.
Maritorena, Stéphane, Odile Hembise Fanton d’Andon, Antoine Mangin, and David A. Siegel. 2010. “Merged Satellite Ocean Color Data Products Using a Bio-Optical Model: Characteristics, Benefits and Issues.”Remote Sensing of Environment 114 (8): 1791–1804. https://doi.org/10.1016/j.rse.2010.04.002.
Xi, Hongyan, Marine Bretagnon, Ehsan Mehdipour, Julien Demaria, Antoine Mangin, and Astrid Bracher. 2025. “Consistent Long-Term Observations of Surface Phytoplankton Functional Types from Space.”State of the Planet 6-osr9: 7. https://doi.org/10.5194/sp-6-osr9-7-2025.
Xi, Hongyan, Svetlana N. Losa, Antoine Mangin, Philippe Garnesson, Marine Bretagnon, Julien Demaria, Mariana A. Soppa, Odile Hembise Fanton d’Andon, and Astrid Bracher. 2021. “Global Chlorophyll a Concentrations of Phytoplankton Functional Types with Detailed Uncertainty Assessment Using Multisensor Ocean Color and Sea Surface Temperature Satellite Products.”Journal of Geophysical Research: Oceans 126 (5): e2020JC017127. https://doi.org/10.1029/2020JC017127.
Xi, Hongyan, Svetlana N. Losa, Antoine Mangin, Mariana A. Soppa, Philippe Garnesson, Julien Demaria, Yangyang Liu, Odile Hembise Fanton d’Andon, and Astrid Bracher. 2020. “Global Retrieval of Phytoplankton Functional Types Based on Empirical Orthogonal Functions Using CMEMSGlobColour Merged Products and Further Extension to OLCI Data.”Remote Sensing of Environment 240: 111704. https://doi.org/10.1016/j.rse.2020.111704.
Source Code
---title: "Copernicus for Sanctuaries"bibliography: copernicus.bibformat: html: toc: true embed-resources: true code-fold: true code-tools: trueexecute: echo: true # Shows all code chunks in the output message: true # Shows all messages generated by the code (e.g., from library() calls) warning: true # Shows all warnings generated by the code error: true # Allows errors to stop the render process and displays them as a messageeditor_options: chunk_output_type: console---```{r}#| label: setup#| warning: falselibrarian::shelf( dplyr, DT, fs, ggplot2, glue, here, jsonlite, leaflet, listviewer, mapview, ncdf4, purrr, readr, reticulate, sf, stringr, terra, tidyr,quiet = T)options(readr.show_col_types = F)sanctuaries_geo <-here("data/sanctuaries.geojson")# copernicusmarine username and passworduser <-"bbest1"pass_txt <-"~/My Drive/private/data.marine.copernicus.eu_bbest1-password.txt"dir_cm <-here("data/copernicus")results_csv <-here("data/copernicus.csv")```## SanctuariesWe are extracting Copernicus Marine data from across National Marine Sanctuaries, borrowing polygons from the [noaa-onms/climate-dashboard](https://github.com/noaa-onms/climate-dashboard) (see Github [code](https://noaa-onms.github.io/climate-dashboard/)).```{r}#| label: sanctuariesif (!file.exists(sanctuaries_geo)) {# source: https://github.com/noaa-onms/climate-dashboard/blob/main/data/sanctuaries.rds sanctuaries_rds <-here("../climate-dashboard/data/sanctuaries.rds")readRDS(sanctuaries_rds) |>filter(nms !="TBNMS") |># exclude Great Lakeswrite_sf(sanctuaries_geo, delete_dsn = T)}sanctuaries <-read_sf(sanctuaries_geo)sanctuaries |>st_drop_geometry() |>datatable()mapView(sanctuaries)```## Setup for copernicusmarine in R- [How to sign up for Copernicus Marine Service? | Copernicus Marine Help Center](https://help.marine.copernicus.eu/en/articles/4220332-how-to-sign-up-for-copernicus-marine-service)- [How to download data via the Copernicus Marine Toolbox in R? \| Copernicus Marine Help Center](https://help.marine.copernicus.eu/en/articles/8638253-how-to-download-data-via-the-copernicus-marine-toolbox-in-r#h_c480a903fd): R `reticulate` to Python `copernicusmarine````{r}#| label: cmt# do once: create virtual enviroment and install copernicusmarine Python module# virtualenv_create(envname = "CopernicusMarine")# virtualenv_install(envname = "CopernicusMarine", packages = c("copernicusmarine"))# TODO: check for CopernicusMarine env with copernicusmarine Python module# use virtualenv and reticulate::import copernicusmarine Python moduleuse_virtualenv(virtualenv ="CopernicusMarine", required =TRUE)cmt <-import("copernicusmarine")# loginpass <-readLines(pass_txt)stopifnot( cmt$login(user, pass, force_overwrite = T)) # py_help(cmt$login)# writes to ~/.copernicusmarine/.copernicusmarine-credentials``````{r}#| label: helpersget_dataset_description <-function(dataset_id) { cmt$describe(dataset_id = dataset_id,disable_progress_bar = T)$json() |>fromJSON()}get_dataset_times <-function(dataset_description) {# dataset_description = ds_times services <-pluck( dataset_description, "products", "datasets", 1,"versions", 1, "parts", 1, "services", 1) service_names <-pluck(services, "service_name") i_ts <-which(service_names =="arco-time-series") coordinates <-pluck(services, "variables", i_ts, "coordinates") coord_ids <-pluck(coordinates, 1, "coordinate_id") i_time <-which(coord_ids =="time") time_vals <-pluck(coordinates, 1, "values", i_time) |>as.numeric() %>% {./1000} time_min <-min(time_vals) |>as.POSIXct(origin ="1970-01-01", tz ="UTC") time_max <-max(time_vals) |>as.POSIXct(origin ="1970-01-01", tz ="UTC")stopifnot("get_dataset_times(): time_min is NA"=!is.na(time_min),"get_dataset_times(): time_max is NA"=!is.na(time_max))list(min = time_min, max = time_max)}get_last_date_in_csv <-function(results_csv) {if (!file.exists(results_csv))return(tibble(nms =character(), dataset =character(),last_time =as.Date(character())))read_csv(results_csv) |>mutate(time =as.Date(time)) |>group_by(nms, dataset) |>summarize(last_time =max(time, na.rm =TRUE), .groups ="drop")}```## Extract Global Ocean Physics ReanalysisProduct > Dataset > Variable- Product: [Global Ocean Physics Reanalysis | Copernicus Marine Service](https://data.marine.copernicus.eu/product/GLOBAL_MULTIYEAR_PHY_001_030/description)\ Product name: Global Ocean Physics Reanalysis\ Product identifier: `GLOBAL_MULTIYEAR_PHY_001_030`- Dataset: `cmems_mod_glo_phy_my_0.083deg_P1M-m` (**monthly**)[Subset Form | Copernicus Marine Service](https://data.marine.copernicus.eu/product/GLOBAL_MULTIYEAR_PHY_001_030/download?dataset=cmems_mod_glo_phy_my_0.083deg_P1M-m_202311) * Dates: `01/01/1993, 00:00`→`06/01/2021, 00:00` * Depths: `0.5 m`→`5727.9 m`<!-- - Dataset: `cmems_mod_glo_phy_myint_0.083deg_P1M-m` (**Interim, monthly**)\ --><!-- [Subset Form | Copernicus Marine Service](https://data.marine.copernicus.eu/product/GLOBAL_MULTIYEAR_PHY_001_030/download?dataset=cmems_mod_glo_phy_myint_0.083deg_P1M-m_202311) --><!-- same as above except: --><!-- * Dates: `07/01/2021, 00:00`→`11/01/2024, 00:00` -->Variables, keep:- `thetao`: Sea water potential temperature [°C]- `bottomT`: Sea water potential temperature at sea floor [°C]- `so`: Sea water salinity [/ 103]- `mlotst`: Ocean mixed layer thickness defined by sigma theta [m]Variables, skip:- `siconc`: Sea ice area fraction- `sithick`: Sea ice thickness [m]- `usi`: Eastward sea ice velocity [m/s]- `vsi`: Northward sea ice velocity [m/s]- `uo`: Eastward sea water velocity [m/s]- `vo`: Northward sea water velocity [m/s]- `zos`: Sea surface height above geoid [m]## Extract Global Ocean Colour- [Global Ocean Colour (Copernicus-GlobColour), Bio-Geo-Chemical, L4 (monthly and interpolated) from Satellite Observations (1997-ongoing) | Copernicus Marine Service](https://data.marine.copernicus.eu/product/OCEANCOLOUR_GLO_BGC_L4_MY_009_104/services) - [Plankton, multi-sensor (4 km), monthly](https://data.marine.copernicus.eu/product/OCEANCOLOUR_GLO_BGC_L4_MY_009_104/download?dataset=cmems_obs-oc_glo_bgc-plankton_my_l4-multi-4km_P1M_202411) - [Primary production, multi-sensor (4 km), monthly](https://data.marine.copernicus.eu/product/OCEANCOLOUR_GLO_BGC_L4_MY_009_104/download?dataset=cmems_obs-oc_glo_bgc-pp_my_l4-multi-4km_P1M_202311)### Setup dataset parameters```{r}#| label: datasetsdatasets <-list(# Global Ocean Physics Reanalysis - monthly# https://data.marine.copernicus.eu/product/GLOBAL_MULTIYEAR_PHY_001_030/download?dataset=cmems_mod_glo_phy_my_0.083deg_P1M-m"cmems_mod_glo_phy_my_0.083deg_P1M-m"=list(vars =list("thetao", "bottomT", "so", "mlotst"),depth_rng =c(0, 0.5)),# Plankton, multi-sensor (4 km), monthly"cmems_obs-oc_glo_bgc-plankton_my_l4-multi-4km_P1M"=list(vars =list("CHL","CHL_uncertainty","DIATO","DIATO_uncertainty","DINO","DINO_uncertainty","GREEN","GREEN_uncertainty","HAPTO","HAPTO_uncertainty","MICRO","MICRO_uncertainty","NANO","NANO_uncertainty","PICO","PICO_uncertainty","PROCHLO","PROCHLO_uncertainty","PROKAR","PROKAR_uncertainty","flags"),depth_rng =c(0, 0.5)),# Primary production, multi-sensor (4 km), monthly"cmems_obs-oc_glo_bgc-pp_my_l4-multi-4km_P1M"=list(vars =list("PP", "PP_uncertainty", "flags"),depth_rng =c(0, 0.5)))# query Copernicus API to set date_rng (min & max) per datasetfmt_dt <- \(x) format(x, "%Y-%m-%dT00:00:00")for (id innames(datasets)) { # id = "cmems_mod_glo_phy_my_0.083deg_P1M-m" ds_times <-get_dataset_description(id) |>get_dataset_times() datasets[[id]]$date_rng <-c(fmt_dt(ds_times$min), fmt_dt(ds_times$max))message(glue(" {id}: {datasets[[id]]$date_rng[1]} -> {datasets[[id]]$date_rng[2]}"))}# summarize last fetched date per dataset from existing resultslast_dates <-get_last_date_in_csv(results_csv)for (id innames(datasets)) { last <- last_dates |>filter(dataset == id) |>pull(last_time) datasets[[id]]$date_last_fetched <-if (length(last) >0)as.character(min(last)) elseNA}datasets |>toJSON(auto_unbox = T, pretty = T) |>jsonedit() # view in browser```## Dataset Variables```{r}#| label: dataset_varsvars_csv <-here("data/copernicus_variables.csv")get_var_coord_range <-function(coord_df, coord_name) {if (!is.data.frame(coord_df)) return(c(NA_real_, NA_real_)) i <-which(coord_df$coordinate_id == coord_name)if (length(i) ==0) return(c(NA_real_, NA_real_)) vals <-as.numeric(coord_df$values[[i]])c(min(vals, na.rm = T), max(vals, na.rm = T))}get_dataset_vars_info <-function(dataset_id) { desc <-get_dataset_description(dataset_id) product <-pluck(desc, "products") |>as_tibble() |>slice(1) services <-pluck( desc, "products", "datasets", 1,"versions", 1, "parts", 1, "services", 1) i_ts <-which(services$service_name =="arco-time-series") vars <-pluck(services, "variables", i_ts) times <-get_dataset_times(desc)tibble(dataset_id = dataset_id,product_id = product$product_id,product_title = product$title,time_min =as.character(times$min),time_max =as.character(times$max),variable = vars$short_name,standard_name = vars$standard_name,units = vars$units,depth_min =map_dbl(vars$coordinates, \(c) get_var_coord_range(c, "depth")[1]),depth_max =map_dbl(vars$coordinates, \(c) get_var_coord_range(c, "depth")[2]),longitude_min =map_dbl(vars$coordinates, \(c) get_var_coord_range(c, "longitude")[1]),longitude_max =map_dbl(vars$coordinates, \(c) get_var_coord_range(c, "longitude")[2]),latitude_min =map_dbl(vars$coordinates, \(c) get_var_coord_range(c, "latitude")[1]),latitude_max =map_dbl(vars$coordinates, \(c) get_var_coord_range(c, "latitude")[2]) )}vars_info <-map_dfr(names(datasets), get_dataset_vars_info)write_csv(vars_info, vars_csv)datatable(vars_info, options =list(scrollX =TRUE))```### Iterate over sanctuaries & datasets to subset```{r}#| label: subset#| eval: truelast_dates <-get_last_date_in_csv(results_csv)for (i in1:nrow(sanctuaries)) {# i = 1 d_s <-slice(sanctuaries, i) bb <- d_s |>st_buffer(9.25*1000) |># buffer by a pixel width 1/12° ~ 9.25 km at equatorst_bbox() dir_out <-here(glue("{dir_cm}/{d_s$nms}"))dir.create(dir_out, showWarnings = F, recursive = T)message(glue("{i}/{nrow(sanctuaries)}: {d_s$sanctuary} ({d_s$nms}) : beg ~ {format(Sys.time(), '%H:%M:%S')}" ))for (j in1:length(datasets)) {# j = 1 ds_id <-names(datasets)[j] ds <- datasets[[j]]# check what's already processed last <- last_dates |>filter(nms == d_s$nms, dataset == ds_id) |>pull(last_time) avail_end <-as.Date(ds$date_rng[2]) last_msg <-if (length(last) ==1) as.character(last) else"none"message(glue(" {j}/{length(datasets)}: {ds_id}\n"," last retrieved: {last_msg}, available: {ds$date_rng[1]} to {ds$date_rng[2]}"))if (length(last) ==1&& last >= avail_end) {message(glue(" -> up to date, skipping"))next }# start one month after last processed, or from the beginning start_dt <-if (length(last) ==1) {format(seq.Date(last, by ="month", length.out =2)[2],"%Y-%m-%dT00:00:00") } else { ds$date_rng[1] }message(glue(" -> fetching {start_dt} to {ds$date_rng[2]}"))# browser() nc <- cmt$subset(dataset_id = ds_id,variables = ds$vars,minimum_longitude = bb$xmin,maximum_longitude = bb$xmax,minimum_latitude = bb$ymin,maximum_latitude = bb$ymax,start_datetime = start_dt,end_datetime = ds$date_rng[2],minimum_depth = ds$depth_rng[1],maximum_depth = ds$depth_rng[2],output_directory = dir_out,overwrite =TRUE,service ="timeseries",disable_progress_bar =TRUE) nc <-as.character(nc)message(glue(" : end ~ {format(Sys.time(), '%H:%M:%S')}")) }}```## Show rasterGrabbing first netcdf file, show full metadata on variables and attributes:```{r}#| label: show_ncnms ="FKNMS"ply <- sanctuaries |>filter(nms ==!!nms)# nc <- list.files(glue("data/copernicus/{nms}"), full.names = T)[1:6]nc <-dir_info(glue("data/copernicus/{nms}")) |>arrange(desc(modification_time)) |>slice(1) |>pull(path)o <-nc_open(nc)o```Using `terra::rast()` to read the netcdf file, show first layer:```{r}#| label: show_rastr <-rast(nc)ri <-1cat(glue(" index: {i} name: {names(r)[i]} time: {time(r[[i]])} varname: {varnames(r[[i]])} longname: {longnames(r[[i]])}"))# glue("{year}/{var}.tif/{time}_{depth}")```Plot the raster with the sanctuary:```{r}#| label: plot_rast# plot(r[[i]])r_1 <- r[[i]] |>project(leaflet:::epsg3857, method ="bilinear")# plot(r_1)plet(# r[[i]], r_1,# main = glue("{longnames(r[[i]])}\n{names(r)[i]}\n{time(r[[i]])}"),main =glue("{longnames(r[[i]])}\nthetao_depth=0.5\n{time(r[[i]])}"),tiles ="Esri.OceanBasemap",fill_range = T) |>addPolygons(data = ply,fillOpacity =0.2 )```## Summarize raster to sanctuary```{r}#| label: extract#| eval: trueextract_tbl <-function(r, p, stat ="mean") { terra::extract(r, p, fun = stat, na.rm = T) |>pivot_longer(!ID, names_to ="var_it") |>mutate(var =str_replace(var_it, "_[0-9]+$", ""),stat =!!stat,time =time(r) ) |>select(var, time, stat, value)}last_dates <-get_last_date_in_csv(results_csv)ncs <-dir_info(dir_cm, regexp ="\\.nc$", recurse =TRUE) |>pull(path)for (i in1:length(ncs)) {# i = 1 nc <- ncs[i] nms <-str_replace(nc, ".*/copernicus/([A-z-]+)/(.+)_multi-vars.*", "\\1") ds_id <-str_replace(nc, ".*/copernicus/([A-z-]+)/(.+)_multi-vars.*", "\\2")# parse end date from filename to check if already extracted nc_end <-str_extract(basename(nc), "[0-9]{4}-[0-9]{2}-[0-9]{2}\\.nc$") |>str_remove("\\.nc$") |>as.Date() last <- last_dates |>filter(nms ==!!nms, dataset ==!!ds_id) |>pull(last_time)if (length(last) ==1&&!is.na(nc_end) && nc_end <= last) {message(glue("{i}/{length(ncs)}: {nms}/{ds_id} already extracted through {last}, skipping"))next }message(glue("{i}/{length(ncs)}: nms:{nms}, ds:{ds_id}")) r <-rast(nc)# plet(r[[1]]) p <- sanctuaries |>filter(nms ==!!nms) d <-bind_rows(extract_tbl(r, p, "mean"),extract_tbl(r, p, "sd") ) |>mutate(nms =!!nms,dataset =!!ds_id ) |>relocate(nms, dataset)if (file.exists(results_csv)) {read_csv(results_csv) |>bind_rows(d) |>group_by(nms, dataset, var, time, stat) |>summarize(value =last(value),.groups ="drop" ) |>arrange(nms, dataset, var, time, stat) |>write_csv(results_csv) } else {write_csv(d, results_csv) }}``````{r}#| label: show_tsresults_url <- results_csv |>str_replace(here(),"https://github.com/noaa-onms/eco-indicators/blob/main" )# TODO: inspect how flags might be distributed across the dataset# d <- read_csv(results_csv)# d |># filter(var == "flags") |> # pull(value) |> # table()# 0 0.00125156445556946 0.00180831826401446 # 12204 678 678 # 0.00196850393700787 0.00217391304347826 0.00784313725490196 # 678 678 678 # 0.0116279069767442 0.035377456883861 0.0425050794246722 # 678 678 678 # 0.0443678254708057 0.0465998661296397 0.0883001161535571 # 678 678 678 # 0.107517010378544 # 678# show tableread_csv(results_csv)```table of results: [`r basename(results_url)`](`r results_url`)## Generate timeseries plot per sanctuary ```{r}#| label: plot_ts#| eval: trued <-read_csv(results_csv)# table(d$var)# bottomT mlotst so_depth=0.49402499 thetao_depth=0.49402499# 12288 12288 12288 12288# bottomT CHL CHL_uncertainty DIATO# 11820 10170 10170 10170# DIATO_uncertainty DINO DINO_uncertainty flags# 10170 10170 10170 10170# GREEN GREEN_uncertainty HAPTO HAPTO_uncertainty# 10170 10170 10170 10170# MICRO MICRO_uncertainty mlotst NANO# 10170 10170 11820 10170# NANO_uncertainty PICO PICO_uncertainty PROCHLO# 10170 10170 10170 10170# PROCHLO_uncertainty PROKAR PROKAR_uncertainty so_depth=0.49402499# 10170 10170 10170 11820# thetao_depth=0.49402499# 11820# table(d$dataset, useNA = "ifany")# cmems_mod_glo_phy_my_0.083deg_P1M-m# 47280# cmems_obs-oc_glo_bgc-plankton_my_l4-multi-4km_P1M# 213570# cmems_obs-oc_glo_bgc-pp_my_l4-multi-4km_P1M# 30510plot_var_facets <-function(nms, data) { png <-glue("figures/copernicus/{nms}.png") data |>filter( nms ==!!nms ) |>mutate(var =str_replace(var, "_depth=0.49402499", "_0.5m") ) |>pivot_wider(names_from = stat,values_from = value ) |>ggplot(aes(x = time, y = mean, color = var)) +facet_wrap(~var) +geom_line() +geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd, fill = var),color =NA,alpha =0.2 ) +theme(legend.position ="none") +facet_wrap(~var, scales ="free_y") +labs(title = nms,x ="Time",y ="Value" )ggsave(png, width =8, height =6)}sapply(sanctuaries$nms, plot_var_facets, data = d)```## Show timeseries per sanctuary::: {.panel-tabset}```{r}#| results: asisfor (nms in sanctuaries$nms) {cat(glue::glue('### {nms}' ))}```<!--PROMPT to Claude CODE 2026-02-09:There are some strange *_uncertainty measures from the [Global Ocean Colour (Copernicus-GlobColour), Bio-Geo-Chemical, L4 (monthly and interpolated) from Satellite Observations (1997-ongoing) | Copernicus Marine Service](https://data.marine.copernicus.eu/product/OCEANCOLOUR_GLO_BGC_L4_MY_009_104/services) for all the sanctuaries (only showing CBNMS) showing up from @eco-indicators/copernicus.qmd inflecting to high valuesReview these documents to explain how these uncertainties are derived and possible explanations for these inflections:- [User Manual](https://documentation.marine.copernicus.eu/PUM/CMEMS-OC-PUM.pdf)- [Quality Information Document](https://documentation.marine.copernicus.eu/QUID/CMEMS-OC-QUID-009-101to104-111-113-116-118.pdf)- [Synthesis Quality Overview](https://documentation.marine.copernicus.eu/SQO/CMEMS-OC-SQO-009-101to104-111-113-116-118.pdf)Inject your findings at the end of this document @eco-indicators/copernicus.qmd, ie after displaying the plots in section "Show timeseries per sanctuary"-->:::## Show timeseries per sanctuary, without uncertainty```{r}#| label: plot_ts_without_uncertainty#| eval: trueplot_var_facets_without_uncertainty <-function(nms, data) { png <-glue("figures/copernicus/{nms}_without_uncertainty.png") data |>filter( nms ==!!nms,!str_ends(var, "_uncertainty") ) |>mutate(var =str_replace(var, "_depth=0.49402499", "_0.5m") ) |>pivot_wider(names_from = stat,values_from = value ) |>ggplot(aes(x = time, y = mean, color = var)) +facet_wrap(~var) +geom_line() +geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd, fill = var),color =NA,alpha =0.2 ) +theme(legend.position ="none") +facet_wrap(~var, scales ="free_y") +labs(title = nms,x ="Time",y ="Value" )ggsave(png, width =8, height =6)}sapply(sanctuaries$nms, plot_var_facets_without_uncertainty, data = d)```::: {.panel-tabset}```{r}#| results: asisfor (nms in sanctuaries$nms) {cat(glue::glue('### {nms}' ))}```:::## Notes on PFT Uncertainty VariablesThe `*_uncertainty` variables in the GlobColour plankton dataset exhibit pronouncedinflections toward high values across all sanctuaries. These patterns aremethodological, not ecological, and reflect the uncertainty estimation frameworkdescribed below.**What the uncertainty values represent.** Uncertainty is expressed as **percenterror bars** (relative uncertainty) on each PFT chlorophyll-*a* concentrationestimate. Values are stored as 2-byte signed integers with a scale factor of0.01, producing a maximum representable value of **327.67 %** [@cmems_pum].**How uncertainty is computed.** The per-pixel uncertainty is derived via**error propagation** combining three independent sources [@xi2021, Eq. 7]:1. **Remote-sensing reflectance ($R_{rs}$) input uncertainty** — estimated via Monte Carlo perturbation of the $R_{rs}$ spectrum, propagated through the EOF-regression retrieval algorithm using a look-up table (LUT) approach;2. **Model parameter uncertainty** — arising from the regression coefficients relating EOF scores to PFT concentrations;3. **Sea surface temperature (SST) uncertainty** — SST is an auxiliary predictor in the EOF-SST hybrid retrieval, and its uncertainty is propagated into the final PFT estimate.**Algorithm origin and sensor dependence.** The EOF-SST hybrid algorithm wasoriginally developed for **OLCI** (Sentinel-3), which has 21 spectral bands, andsubsequently extended to the multi-sensor merged GlobColour product [@xi2020].Because the merged product combines sensors with different spectralconfigurations, uncertainty is generally higher for sensors with fewer bands(e.g., MODIS Aqua has 9 ocean-colour bands vs. OLCI's 21).**Sensor transitions drive the inflections.** The multi-sensor record combines:SeaWiFS (1997–2010), MODIS Aqua (2002–present), MERIS (2002–2012), VIIRS NPP(2012–present), and OLCI-S3A (2016–present) [@garnesson2019; @maritorena2010].Each sensor transition changes the input $R_{rs}$ characteristics and thenumber of available spectral bands, producing step-changes in the propagateduncertainty even when the underlying ocean state is stable.**Low concentrations amplify relative uncertainty.** PFTs that occur at very lowconcentrations in temperate and subpolar waters — particularly prokaryotes(PROKAR) and *Prochlorococcus* (PROCHLO) — yield high percent uncertaintiesbecause a small absolute error translates to a large relative error.**Reprocessing discontinuities.** The November 2023 reprocessing (NASA R2022,MERIS 4th reprocessing, OLCI baseline collection 3.01) may introducediscontinuities in both concentration and uncertainty time series [@cmems_pum].**Known sensor drifts.** NASA sensors exhibit documented radiometric drifts thataffect the merged product. The SQO report flags these as a known limitation[@cmems_sqo].**Weakest retrieval skill.** Prokaryotes and *Prochlorococcus* have the lowestretrieval skill among all PFTs ($R^2$ = 0.42–0.51 against in-situ validation)[@cmems_sqo; @xi2021], meaning their uncertainty estimates are both the largestand least well-constrained.**VIIRS-SNPP post-2017 exclusions.** Data from VIIRS-SNPP after 2017 have beenexcluded from some products due to identified spurious trends in the mergedrecord [@xi2025sp].## Using PFT Data for Multivariate OrdinationDespite the uncertainty caveats above, the PFT concentration data arewell-validated and suitable for multivariate analyses. Below are recommendedapproaches for detecting ecosystem-level shifts across sanctuaries.1. **Use concentration variables only (exclude `_uncertainty`).** The PFT chlorophyll-*a* concentrations themselves are the validated retrieval products; the `_uncertainty` fields are informational metadata describing retrieval confidence. They should not be included as ordination input variables alongside the concentrations.2. **Log-transform concentrations.** PFT concentrations span orders of magnitude. Log-transformation stabilizes variance and matches the scale on which the retrieval algorithm was trained and validated [@xi2021]. Use $\log_{10}(\text{Chl-}a + \epsilon)$ where $\epsilon$ is a small constant to handle zeros.3. **Exclude low-skill PFTs or aggregate them.** Consider dropping PROKAR and PROCHLO from ordination (weakest $R^2 < 0.55$), or combine them into a single "prokaryote" group to improve signal-to-noise ratio.4. **PCA / RDA on a sanctuary $\times$ time matrix.** Structure the data as: rows = sanctuary-month combinations, columns = PFT variables (CHL, DIATO, DINO, GREEN, HAPTO, MICRO, NANO, PICO) plus physical variables (thetao, so, mlotst, bottomT, PP). Redundancy analysis (RDA) can then partition biological community variation explained by environmental drivers.5. **Use uncertainty as weights.** In weighted ordination methods (e.g., weighted PCA), use $1 / \text{uncertainty}$ as observation weights to downweight unreliable observations rather than discarding them entirely.6. **Temporal anomaly approach.** Compute anomalies relative to a monthly climatology per sanctuary to remove the dominant seasonal signal, then ordinate on the anomaly matrix to detect regime shifts and long-term trends.7. **Non-metric MDS (NMDS).** NMDS is robust to the non-normal distributions common in ocean-colour data. Use Bray-Curtis dissimilarity on the PFT concentration matrix (after log-transformation) to preserve compositional distances.8. **Rolling window analysis.** Apply ordination in sliding temporal windows (e.g., 5-year windows advancing by 1 year) to track community composition shifts over time, overlaid with environmental drivers (SST, salinity, mixed layer depth).