1. Perform a retrospective case-control study of pathophysiological traits in bees collected in September from colonies that either survived the winter or did not.
2. Further refine and validate a colony survival predictive model.
The purpose of this project is to help beekeepers mitigate colony loss. Overwinter colony losses have averaged 30% over the past 10 years, which is double the rate beekeepers deem acceptable. As it is difficult, if not impossible, to diagnose ill colonies prior to loss, my goal is to develop new tools that give beekeepers better insight into the health of their colonies. I will achieve this goal by combining pathophysiological analysis with Computer-Aided Diagnosis to develop a model that predicts, in the fall, which colonies will be lost over the winter. If successful, these tools may provide beekeepers with the information necessary to anticipate certain losses, thus reducing their labor and materials costs for overwinter care. They will also lay the groundwork for researchers to investigate the putative causes of predictive pathophysiological states, which may lead to better tools in the future.
Subsamples from the Sentinel Apiary program will be selected as pairs of colonies from the same apiary: one that survived the winter and one that did not. For each of the sample years 2017 and 2018, 50 pairs of colonies sampled in September will be selected, and 20 bees from each sample will be autopsied, for a total of 4,000 autopsies across 200 samples. All geographic regions and management practices represented by Sentinel Apiary participants will be included. This retrospective case-control design is a powerful basis for building predictive models, because pairing within apiaries balances the number of surviving and lost colonies.
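The sample totals follow directly from the design figures above. The short check below is a sketch in Python, used here only for illustration (the project's analysis is carried out in R); the constant and function names are hypothetical.

```python
# Design figures taken from the sampling description above.
YEARS = (2017, 2018)
PAIRS_PER_YEAR = 50
COLONIES_PER_PAIR = 2   # one colony that survived, one that did not
BEES_PER_SAMPLE = 20


def design_totals(years=YEARS, pairs_per_year=PAIRS_PER_YEAR,
                  colonies_per_pair=COLONIES_PER_PAIR,
                  bees_per_sample=BEES_PER_SAMPLE):
    """Return (number of colony samples, number of bee autopsies)."""
    samples = len(years) * pairs_per_year * colonies_per_pair
    autopsies = samples * bees_per_sample
    return samples, autopsies


samples, autopsies = design_totals()
print(samples, autopsies)  # 200 4000
```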
Autopsies will be performed using standardized methods. In previous studies, 17 physiological differences were assessed. I propose monitoring two additional traits: quantity of fat body and hypopharyngeal gland condition. Both recent and past work has implicated these organs in colony health, so I include them because they may help increase the precision of our predictive models. Twenty bees, randomly selected from each stored subsample, will be autopsied, requiring an average of 2 hours to complete. Data will be recorded as a binary response indicating the presence or absence of each trait. The resulting data set will be combined with all existing data from the Sentinel Apiary program, including parasite loads (Varroa and Nosema), location, number of frames of bees, brood pattern, queen status, survivorship, and any applied management practices.
All analyses will be carried out in the R statistical programming language. The original work on Pathophysiology and Colony Collapse Disorder successfully classified colonies and apiaries using a statistical learning technique called Classification and Regression Trees (CART). This is a non-linear, non-parametric method that recursively partitions the predictor space based on the response variable. The algorithm chooses each division by minimizing the Gini impurity, a measure of how mixed the response classes are within the resulting groups, at each split. The output is often an easy-to-interpret decision tree, where the highest branch (or first split/node) is the variable whose split yields the lowest Gini impurity. Given the right set of variables and enough observations, CART can be very useful for determining the relationship between many predictors and a response variable. These trees are, however, highly sensitive to small changes in the data, such as outliers, which can lead to misclassification, instability, or misinterpretation of the results.
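To make the split criterion concrete, the sketch below computes the Gini impurity of a set of class labels and picks the binary trait whose split gives the lowest weighted impurity. It is written in Python for illustration only and is not the R code used in this project. The trait names, the toy data, and the function names are all hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(rows, labels, trait):
    """Weighted Gini impurity of the two groups formed by a binary (0/1) trait."""
    absent = [y for x, y in zip(rows, labels) if x[trait] == 0]
    present = [y for x, y in zip(rows, labels) if x[trait] == 1]
    n = len(labels)
    return (len(absent) / n) * gini_impurity(absent) \
        + (len(present) / n) * gini_impurity(present)


def best_split(rows, labels):
    """Trait whose split gives the lowest weighted Gini impurity."""
    return min(sorted(rows[0]), key=lambda t: split_impurity(rows, labels, t))


# Hypothetical toy data: four colonies, two made-up binary traits.
rows = [
    {"trait_a": 1, "trait_b": 0},
    {"trait_a": 1, "trait_b": 1},
    {"trait_a": 0, "trait_b": 0},
    {"trait_a": 0, "trait_b": 1},
]
labels = ["died", "died", "survived", "survived"]

print(gini_impurity(labels))  # 0.5
print(best_split(rows, labels))  # trait_a
```

In this toy example trait_a separates the two outcomes perfectly (weighted impurity 0), while trait_b leaves both groups evenly mixed (weighted impurity 0.5), so trait_a would become the first split.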
To improve upon the predictive accuracy of CART, ensemble learning methods that combine the construction of many trees can be applied. Such methods include:
• Bagging
o Bootstrap aggregation: builds multiple trees on bootstrap samples of the data and averages their predictions
• Random Forests
o Like Bagging, but considers only a random subset of variables at each split of each tree before averaging
o Can reveal important relationships between underrepresented predictors and the response
• Boosting
o Each tree is built on the error of its predecessor, with the goal of reducing the error at each iteration
o The result is potentially an ensemble with low error and high prediction accuracy
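The bootstrap-and-combine idea behind Bagging and Random Forests can be sketched as follows. This is an illustrative Python example, not the R implementation used in this project. To stay short it uses one-split trees (stumps) and a majority vote, which stands in for averaging in a classification setting. All names and data are hypothetical. Setting n_traits below the number of traits gives the Random Forest flavor of restricting the variables available to each split.

```python
import random
from collections import Counter


def bootstrap_sample(rows, labels, rng):
    """Draw len(rows) observations with replacement."""
    idx = [rng.randrange(len(rows)) for _ in rows]
    return [rows[i] for i in idx], [labels[i] for i in idx]


def fit_stump(rows, labels, traits):
    """One-split tree: pick the binary trait that misclassifies the fewest rows."""
    overall = Counter(labels).most_common(1)[0][0]
    best = None
    for t in traits:
        rule, errors = {}, 0
        for v in (0, 1):
            side = [y for x, y in zip(rows, labels) if x[t] == v]
            # Predict the majority class on this side (overall majority if empty).
            rule[v] = Counter(side).most_common(1)[0][0] if side else overall
            errors += sum(y != rule[v] for y in side)
        if best is None or errors < best[0]:
            best = (errors, t, rule)
    return best[1], best[2]


def fit_bagged_stumps(rows, labels, n_trees=25, n_traits=None, seed=0):
    """Fit one stump per bootstrap sample; n_traits limits the traits each may use."""
    rng = random.Random(seed)
    all_traits = sorted(rows[0])
    ensemble = []
    for _ in range(n_trees):
        b_rows, b_labels = bootstrap_sample(rows, labels, rng)
        traits = all_traits if n_traits is None else rng.sample(all_traits, n_traits)
        ensemble.append(fit_stump(b_rows, b_labels, traits))
    return ensemble


def predict(ensemble, row):
    """Majority vote over all stumps in the ensemble."""
    votes = Counter(rule[row[trait]] for trait, rule in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: trait_a tracks the outcome, trait_b is noise.
rows = [{"trait_a": a, "trait_b": b}
        for a, b in [(1, 0), (1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (0, 0), (0, 1)]]
labels = ["died"] * 4 + ["survived"] * 4

bagged = fit_bagged_stumps(rows, labels)
print([predict(bagged, r) for r in rows])
```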
I have applied all of these methods to our current data set to determine which achieves the highest accuracy. In each instance, two tiers of cross validation were used to assess each model: 10-fold cross validation on the entire data set and Leave-One-Out cross validation, where each colony was a single observation. In its current form, the data set is classified well by all of the methods listed, but the lowest error occurs with a form of Boosting called Adaptive Boosting, or AdaBoost, which combines a series of weak learners to produce a strong learner through iterative error reduction.
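The iterative error reduction in AdaBoost can be illustrated with a minimal sketch of the standard two-class algorithm, using one-split stumps as weak learners. It is written in Python for illustration only and is not the R code or the data used in this project. The toy data are hypothetical and chosen so that no single trait predicts the outcome on its own.

```python
import math


def fit_adaboost(X, y, n_rounds=3):
    """Discrete AdaBoost with stumps. X: rows of 0/1 traits; y: labels in {+1, -1}."""
    n, n_features = len(X), len(X[0])
    w = [1.0 / n] * n                      # start with equal observation weights
    ensemble = []                          # entries: (alpha, feature index, polarity)
    for _ in range(n_rounds):
        best = None
        for j in range(n_features):
            for polarity in (1, -1):
                # Stump predicts `polarity` if the trait is present, else the opposite.
                preds = [polarity if row[j] == 1 else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, polarity, preds)
        err, j, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)      # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # weight of this weak learner
        ensemble.append((alpha, j, polarity))
        # Increase the weight of misclassified observations, then renormalize.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (pol if row[j] == 1 else -pol) for a, j, pol in ensemble)
    return 1 if score >= 0 else -1


def training_errors(ensemble, X, y):
    return sum(predict(ensemble, row) != yi for row, yi in zip(X, y))


# Hypothetical toy data: outcome is +1 when at least two of three traits are present.
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [1 if sum(row) >= 2 else -1 for row in X]

print(training_errors(fit_adaboost(X, y, n_rounds=1), X, y))
print(training_errors(fit_adaboost(X, y, n_rounds=3), X, y))
```

On this toy problem a single stump cannot classify every row, while three boosting rounds combine one stump per trait into a weighted vote that does.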
Upon completion of the proposed autopsies, I will incorporate the new data with my existing set and revisit all the above methods, tuning and assessing each model accordingly. Selection of the final model will be based on the highest and most consistent prediction accuracy through the described methods of cross validation.
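The two tiers of cross validation differ only in how observations are assigned to held-out folds. The sketch below is an illustrative Python example with hypothetical function names, not the R code used in this project. It builds both kinds of folds and computes a pooled misclassification rate for any fit/predict pair. The majority-class baseline model is included only so that the example runs.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def leave_one_out_indices(n):
    """Leave-One-Out: every observation (here, every colony) is its own fold."""
    return [[i] for i in range(n)]


def cross_val_error(rows, labels, fit, predict, folds):
    """Pooled misclassification rate over all held-out observations."""
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(rows)) if i not in held_out]
        model = fit([rows[i] for i in train], [labels[i] for i in train])
        errors += sum(predict(model, rows[i]) != labels[i] for i in fold)
    return errors / len(rows)


# Baseline model for demonstration only: always predict the majority class.
def fit_majority(rows, labels):
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, row):
    return model


# Hypothetical toy data: 20 colonies, 12 survived and 8 died, no predictors used.
rows = [{} for _ in range(20)]
labels = ["survived"] * 12 + ["died"] * 8

folds_10 = k_fold_indices(len(rows), k=10)
folds_loo = leave_one_out_indices(len(rows))
print(cross_val_error(rows, labels, fit_majority, predict_majority, folds_10))
print(cross_val_error(rows, labels, fit_majority, predict_majority, folds_loo))
```

Any of the tree-based models can be substituted for the baseline by passing its own fit and predict functions, which keeps the comparison between methods on identical folds.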
1/15/2020 – There are currently no results to discuss.
1/15/2020 – I am currently on schedule with the proposed timeline for this project. Two undergraduate technicians have been hired and trained in the standardized methods for performing autopsies and collecting data. During that time, I identified and located the samples needed using the predetermined criteria detailed in the methods section. Sample processing officially began in October, and the technicians have completed ~50% of the samples, keeping us on track for the goal of having all data collected by April 2020.