    EXTRAPOLATING CONTINUOUS PLANKTON RECORDER DATA THROUGH THE SOUTHERN OCEAN USING BOOSTED REGRESSION TREES

    Document Number:
    WG-SAM-08/12
    Author(s):
    M.H. Pinkerton, A.N.H. Smith (New Zealand), B. Raymond, G. Hosie (Australia) and B. Sharp (New Zealand)
    Abstract

    Innovative multivariate statistical modelling techniques make it possible to generate spatially comprehensive species distribution layers from discontinuous biological data, by fitting complex and scale-dependent relationships between species abundance and available environmental data. The resulting species-specific distribution layers have many potential applications. We apply one such method, Boosted Regression Trees (BRT), to data on the distribution of Oithona similis, a small cyclopoid copepod that is abundant through much of the near-surface waters of the Southern Ocean. A large dataset (>19 000 records) of O. similis abundances was collected during the SCAR Southern Ocean Continuous Plankton Recorder (SO-CPR) Survey. We demonstrate that it is possible to obtain a relationship between both the abundance and the probability of presence of O. similis and the long-term, broad-scale environmental conditions at the location where the CPR sample was taken. These fitted relationships were used to estimate abundances of O. similis through the Southern Ocean. We present a number of methods for investigating the robustness of the prediction. (1) Non-spatial cross-validation tested the relationship against data withheld from the fitting. We found that data must be withheld from the model fitting on a tow-by-tow basis because there is significant within-tow correlation. (2) Spatial cross-validation withheld data from particular geographic regions from the fitting process and subsequently used these data to test the predictive accuracy of the model. These cross-validation methods showed that the fitted relationships explained 28–38% of the total variance in abundance (depending on the method of cross-validation). The area under the ROC curve for the model predicting presence was 0.77, indicating good discrimination between presence and absence. (3) Spatially resolved measures were used to test how well the environmental space of the predicted area was spanned by the environmental characteristics associated with the biological samples. This method was applied to the individual environmental data layers singly and to the multivariate space defined by all environmental data layers together. The multivariate statistic was used to create a “mask” which excluded from prediction those geographic areas of the Southern Ocean where environmental conditions were not well represented by the SO-CPR sample locations.
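
The tow-by-tow withholding described in (1) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `tow_wise_folds` and `train_test_indices`, the round-robin assignment of tows to folds, and the example tow labels are all assumptions made for illustration. The point it shows is that every record from a given tow lands in the same fold, so correlated samples from one tow never appear on both the fitting and the testing side.

```python
def tow_wise_folds(tow_ids, n_folds):
    """Assign each record a fold number so that all records from one tow share a fold."""
    # Distinct tows, in order of first appearance (keeps the result deterministic).
    tows = list(dict.fromkeys(tow_ids))
    if n_folds < 2 or n_folds > len(tows):
        raise ValueError("n_folds must be between 2 and the number of tows")
    # Assumption: simple round-robin assignment of whole tows to folds.
    fold_of_tow = {tow: i % n_folds for i, tow in enumerate(tows)}
    return [fold_of_tow[tow] for tow in tow_ids]


def train_test_indices(folds, held_out):
    """Split record indices into fitting and withheld sets for one fold."""
    train = [i for i, fold in enumerate(folds) if fold != held_out]
    test = [i for i, fold in enumerate(folds) if fold == held_out]
    return train, test


# Hypothetical example: nine records drawn from four tows.
tow_ids = ["A", "A", "A", "B", "B", "C", "C", "C", "D"]
folds = tow_wise_folds(tow_ids, n_folds=2)
train, test = train_test_indices(folds, held_out=0)

# No tow contributes records to both sides of the split.
train_tows = {tow_ids[i] for i in train}
test_tows = {tow_ids[i] for i in test}
```

Withholding individual records at random would instead place neighbouring samples from the same tow in both sets, which would tend to overstate predictive skill when within-tow correlation is significant.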