The BIKE app is a graphical user interface developed for running the BIKE model (a Bayesian dietary exposure assessment model for microbiological and chemical hazards) and inspecting the results.
It is based on connected Bayesian hierarchical models, utilizing OpenBUGS and R in tandem. Chronic and acute exposures are estimated for chemical and microbiological hazards, respectively. Uncertainty and variability in exposures are visualized, and a few optional model structures are available.
The BIKE app is open source and available on GitHub (https://github.com/jukran/BIKE2). Simulated synthetic data resembling real occurrence and consumption data are provided with the code as an example.
More about the BIKE model can be found in:
Ranta J, Mikkelä A, Suomi J, Tuominen P. BIKE: Dietary Exposure Model for Foodborne Microbiological and Chemical Hazards. Foods. 2021; 10(11):2520. https://doi.org/10.3390/foods10112520
To run BIKE, four separate files need to be prepared, containing data on concentrations, consumption, occurrence, and prevalence, respectively.
The columns in the files should have specific names. The names of the food types and the hazards should match in all data files.
The data should be uploaded as CSV files using a point (.) as the decimal separator and a comma (,) as the field separator. The data should not contain any special characters (e.g., ä, ö, å).
How to prepare each of the files is described in detail below.
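The formatting rules above can be checked before uploading. The sketch below is a hypothetical pre-upload helper (`check_bike_csv` is not part of BIKE) built on Python's standard `csv` module. It flags non-ASCII characters, a semicolon-separated header, and rows whose field count differs from the header, which is a typical symptom of decimal commas. It assumes the file has been read as text and contains no quoted multi-line fields.

```python
import csv
import io


def check_bike_csv(text: str) -> list[str]:
    """Return problems found in CSV text intended for upload to BIKE.

    Hypothetical helper, not part of BIKE. It checks the rules stated above:
    ASCII-only content, comma (,) as field separator and point (.) as decimal.
    """
    problems = []

    # Special characters such as ä, ö or å are not allowed anywhere.
    for lineno, line in enumerate(text.splitlines(), start=1):
        bad = sorted({ch for ch in line if not ch.isascii()})
        if bad:
            problems.append(f"line {lineno}: non-ASCII characters {bad}")

    # csv.reader yields one row per line here (blank lines give empty rows),
    # assuming no quoted fields span several lines.
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return problems + ["file is empty"]
    header = rows[0]

    # A one-field header containing ';' suggests a semicolon-separated file,
    # the usual companion of decimal commas.
    if len(header) == 1 and ";" in header[0]:
        problems.append("header looks semicolon-separated; use comma (,)")

    # With comma separators, a decimal comma such as 0,5 splits one value into
    # two fields, so a row longer than the header hints at decimal commas.
    for lineno, row in enumerate(rows[1:], start=2):
        if row and len(row) != len(header):
            problems.append(f"line {lineno}: {len(row)} fields, header has {len(header)}")
    return problems
```

A file that passes all checks returns an empty list; otherwise each message points at the offending line.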
BIKE lets the user choose the model settings most suitable for the input data.
These include Consumption model, Correlation models, Priors for variances, and Number of MCMC iterations.
Note that the time needed for the simulations to complete depends on the number of iterations selected. It is recommended to start with a small number, e.g., the default 4000.
After the four files are uploaded and the model is set up, the simulations can be run.
The results are visualized with figures and tables, and their content can be changed, e.g., food type, hazard, or credible interval.
In addition, adjustment factors for both the concentration level and the prevalence of each food-hazard combination can be assigned in the exposures section.
A new 2D simulation for the quantiles figure is run when the 'Generate plot' button is pressed.
The exposure limit analyses table and the posterior predictive distribution summaries table are generated when the 'Generate table' button is pressed.
The data file with hazard concentrations has to contain at least the columns with the following headings: Type, Hazard, Concentration, LOQ, LOD, Unit. Each row should contain one measurement result of one hazard in one food type.
The concentration data file may also contain other columns, although these are not currently used in the model.
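A quick way to confirm the minimum headings is to compare them with the header row. This is an illustrative sketch with a hypothetical function name, not BIKE code; it assumes the headings must match exactly, including letter case, and it allows any extra columns.

```python
import csv
import io

# Minimum headings of the concentration file, as listed above. Matching is
# assumed to be exact and case-sensitive.
REQUIRED_CONCENTRATION_COLUMNS = ("Type", "Hazard", "Concentration", "LOQ", "LOD", "Unit")


def missing_columns(text: str, required=REQUIRED_CONCENTRATION_COLUMNS) -> list[str]:
    """Return required column names absent from the CSV header.

    Extra columns (e.g. a broader food category) are allowed and ignored.
    """
    header = next(csv.reader(io.StringIO(text)), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]
```

For example, a header of `Type,Hazard,Concentration,LOD` yields `['LOQ', 'Unit']`.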
The column Type is for the food type in question, e.g. 'broiler'. This must be the same food type for which consumption data are given in the other data files, i.e., the names must be spelled exactly the same way. There can be other columns giving broader food categories, e.g. 'poultry' or 'meat foods', but these are not used in the model, since hazard data and food consumption data are connected at the finest feasible level of food type classification provided. Hence, 'Type' can denote raw ingredients or composite food types containing many ingredients. The names of the food types can be any character strings without spaces, e.g. FoodEx2 codes or another naming system, but very long names should be avoided for clarity and for more compact labels in the plotting windows. For example, 'minced meat casserole' could be shortened to 'mmeatcas' when preparing the data files. The column Hazard specifies the name of the hazard, e.g. 'cadmium' or 'salmonella', for each row, while the Concentration column holds the numeric concentration value measured for that hazard and food type.
Columns LOQ (limit of quantification) and LOD (limit of detection) specify the measurement limits. The notation format is the same for chemical and microbiological hazards. Each measurement is one of the following: a reported numerical value (above LOQ), a value between LOD and LOQ, or a value below LOD. The limits may also differ between measurements. If a concentration value is reported in the column 'Concentration', it is interpreted as an exact measurement. If the value was between the two limits, both LOQ and LOD need to be given as numerical values, while the concentration is marked NA. If the value was below LOD, LOD needs to be given as a numerical value, and both the concentration and LOQ are marked NA. In this way, BIKE can tell which of the three situations applies to each hazard concentration measurement (row).
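The three cases can be expressed as a small classification rule per row. In this sketch (hypothetical names, not BIKE's own code) `None` or NaN stands for NA. The check that LOD lies below LOQ is an added assumption based on the usual definition of the two limits.

```python
import math
from typing import Optional


def _given(x: Optional[float]) -> bool:
    """True if x is a number, False if it stands for NA (None or NaN)."""
    return x is not None and not math.isnan(x)


def classify_measurement(conc: Optional[float], loq: Optional[float],
                         lod: Optional[float]) -> str:
    """Classify one concentration row by the NA conventions described above.

    Returns 'exact', 'between LOD and LOQ' or 'below LOD'; raises ValueError
    for combinations that fit none of the three cases.
    """
    if _given(conc):
        return "exact"  # a reported value is taken as an exact measurement
    if _given(loq) and _given(lod):
        # Assumption: LOD lies below LOQ, as is conventional.
        if lod >= loq:
            raise ValueError(f"LOD ({lod}) should be below LOQ ({loq})")
        return "between LOD and LOQ"
    if _given(lod):
        return "below LOD"  # only LOD given; Concentration and LOQ are NA
    raise ValueError("row matches none of the three cases; LOD is missing")
```

A row with only LOQ given, for example, is rejected, because it fits none of the three documented cases.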
The column Unit specifies the measurement units, e.g. mg/kg or cfu/g. Units are not automatically converted to be compatible in the calculations, so compatible units must be used when preparing the data files. If the concentration values are given per gram, the food consumption amounts must also be given as grams per day. A suitable measurement unit is one that does not lead to extremely small or large numerical values, since such values can also affect the numerical computations.
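Because BIKE does not convert units, any rescaling has to be done before upload. The sketch below assumes a small, hypothetical conversion table covering a few common units; it rescales concentrations to a per-gram basis so that multiplying by consumption in grams per day gives an amount per day. Units are written in ASCII ("ug" for micrograms) because special characters are not allowed in the data files.

```python
# Hypothetical conversion table: BIKE does not convert units, so concentrations
# are rescaled to "per gram" before upload to match consumption in g/day.
TO_PER_GRAM = {
    "mg/kg": ("ug/g", 1.0),     # 1 mg/kg equals 1 ug/g
    "ug/kg": ("ng/g", 1.0),     # 1 ug/kg equals 1 ng/g
    "cfu/g": ("cfu/g", 1.0),
    "cfu/kg": ("cfu/g", 1e-3),  # 1000 g in 1 kg
}


def per_gram(value: float, unit: str) -> tuple[float, str]:
    """Express a concentration per gram of food."""
    if unit not in TO_PER_GRAM:
        raise ValueError(f"no conversion defined for unit {unit!r}")
    new_unit, factor = TO_PER_GRAM[unit]
    return value * factor, new_unit


def daily_intake(conc_per_gram: float, grams_per_day: float) -> float:
    """Amount of hazard ingested per day from one food type."""
    return conc_per_gram * grams_per_day
```

For instance, 0.02 mg/kg becomes 0.02 ug/g, and 150 g/day of that food then corresponds to about 3 ug/day. Expressing the same concentration in mg/g would give 0.00002, the kind of very small number the advice above warns against.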
The data file with food consumption has to contain at least the columns with the following names: IDnum, Weight, foodA1, foodA2, foodB1, foodB2, etc.
The number of reporting days must be the same for all persons, and missing values are not allowed. The consumption data file may also contain other columns, such as the age of respondents, although these are not currently used in the model.
Food consumption data follow a food diary format in which the daily consumption amounts of each individual are tabulated per food item, one row per individual. The column IDnum holds the respondent's number, Weight holds the body weight, and the remaining columns are named after the detailed food types consumed on a specific day. For example, broiler1 holds the consumption amounts of broiler on the first day. The next columns could likewise be fish1, apple1, etc. These columns would be followed by the same list of food types for the second day, e.g. broiler2, fish2, apple2, etc. At least two days must be recorded for each consumer, and the number of days must be the same for all. Each row represents the reported consumption of one consumer. The food types can represent composite foods or raw ingredients as needed, but the names of the food types (apart from the day number as the last character) have to be the same as those used in the hazard concentration data. Each row gives either the consumed food amounts or zeros for the reported days. The measurement units also need to be compatible with those in the concentration data, e.g. consumption in grams if concentrations are given per gram. Consumption data may originally come in a hierarchical form with several levels of food types of increasing detail, e.g. seafood, fish, smoked fish, smoked salmon. However, only one of those labels (a character string without spaces) has to be selected and used consistently in both the consumption data and the concentration data. This labeling of food items can only be as detailed as both data sets permit.
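The column layout can be checked against the food type names of the concentration file. This sketch (hypothetical function, not part of BIKE) relies on the rule that the day number is the last character of a column name, so it covers at most nine reporting days. Columns that do not match a known food type, such as an age column, are ignored.

```python
def consumption_days(header: list[str], food_types: list[str]) -> int:
    """Return the number of reporting days after checking the layout rules.

    header     -- column names of the consumption file
    food_types -- food type names used in the 'Type' column of the
                  concentration file (spelled exactly the same way)

    Assumes the day number is the last character of a consumption column
    name, e.g. broiler1, fish1, broiler2, fish2.
    """
    missing = [c for c in ("IDnum", "Weight") if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if not food_types:
        raise ValueError("no food types given")

    days = {}
    for food in food_types:
        # Exact food type name plus exactly one trailing digit.
        days[food] = {
            col[-1]
            for col in header
            if len(col) == len(food) + 1 and col.startswith(food) and col[-1].isdigit()
        }

    absent = [food for food, d in days.items() if not d]
    if absent:
        raise ValueError(f"no consumption columns for: {absent}")
    if len({frozenset(d) for d in days.values()}) != 1:
        raise ValueError("food types are not reported for the same days")
    n_days = len(next(iter(days.values())))
    if n_days < 2:
        raise ValueError("at least two reporting days are required")
    return n_days
```

With the header `IDnum, Weight, broiler1, fish1, broiler2, fish2` and the food types `broiler` and `fish`, the function returns 2.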
The file with occurrence data needs to contain the columns with the following headings: hazardnames, hazardtypes, limitexpo, foodA, foodB, etc.
Note that 'all' for concentration information implies that the concentration distribution will be estimated jointly with the prevalence parameter using a zero-inflated model, in which the fraction of measurements below LOD is interpreted as allowing for the possibility of true zeros. The prevalence data file should then mark the corresponding hazard sample data as 'NA' to signify that the hazard prevalence is not estimated from separate sample information.
The file should contain a table with one row per hazard specifying the name of the hazard (e.g. 'cadmium'), its type ('K' for chemical, 'M' for microbiological), and the exposure limit. The remaining columns are headed by the food types, and each row entry specifies whether the concentration data for that food-hazard pair should be interpreted as representing only truly positive concentrations (even when below LOD), or as any measurements, which might also include true zeros when the measurement fell below LOD. In the former case, the correct entry is positives, and in the latter case the correct entry is all. If data for some food-hazard pair are missing, the correct entry is NA. Note that this interpretation applies to the full set of concentration data for the particular food-hazard pair.
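To make the 'all' versus 'positives' distinction concrete, a generic likelihood sketch for one measurement i is given below. It assumes that positive concentrations follow some distribution with density f and cumulative distribution function F (a lognormal is a common choice) and that p is the prevalence. This is a textbook zero-inflated, censored formulation, not necessarily the exact model implemented in BIKE.

```latex
% 'all': zero-inflated, censored likelihood for measurement i
L_i^{\text{all}} =
\begin{cases}
p\, f(x_i) & \text{value } x_i \text{ reported},\\
p\,\bigl[F(\mathrm{LOQ}_i) - F(\mathrm{LOD}_i)\bigr] & \text{between LOD and LOQ},\\
(1 - p) + p\, F(\mathrm{LOD}_i) & \text{below LOD}.
\end{cases}

% 'positives': censored likelihood without true zeros, plus separate sample data
L_i^{\text{pos}} =
\begin{cases}
f(x_i) & \text{value } x_i \text{ reported},\\
F(\mathrm{LOQ}_i) - F(\mathrm{LOD}_i) & \text{between LOD and LOQ},\\
F(\mathrm{LOD}_i) & \text{below LOD},
\end{cases}
\qquad
n_{\text{positive}} \sim \mathrm{Binomial}(n_{\text{sample}}, p).
```

In the 'all' case, the (1 - p) term lets true zeros account for part of the below-LOD fraction, which is why p can be estimated from the concentration data alone. In the 'positives' case, the concentration data carry no information about p, so it must come from the separate sample counts.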
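The allowed values in the occurrence table can be validated when the file is read. The sketch below is hypothetical helper code using Python's `csv.DictReader`. It treats every column except hazardnames, hazardtypes and limitexpo as a food type and accepts only positives, all or NA as entries.

```python
import csv
import io

FIXED_COLUMNS = ("hazardnames", "hazardtypes", "limitexpo")
HAZARD_TYPES = {"K", "M"}                 # K = chemical, M = microbiological
FOOD_ENTRIES = {"positives", "all", "NA"}


def read_occurrence(text: str) -> dict[tuple[str, str], str]:
    """Parse an occurrence table into {(hazard, food type): entry}.

    Every column other than the three fixed ones is taken to be a food type.
    Raises ValueError on a missing fixed column, an unknown hazard type or
    an entry other than positives, all or NA.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in FIXED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    foods = [c for c in header if c not in FIXED_COLUMNS]

    table = {}
    for row in reader:
        hazard = row["hazardnames"]
        if row["hazardtypes"] not in HAZARD_TYPES:
            raise ValueError(f"{hazard}: hazardtypes must be K or M")
        for food in foods:
            # A short row leaves the field as None; treat it as an empty entry.
            entry = (row.get(food) or "").strip()
            if entry not in FOOD_ENTRIES:
                raise ValueError(f"{hazard}/{food}: unexpected entry {entry!r}")
            table[(hazard, food)] = entry
    return table
```

A misspelled entry such as `positive` is reported together with the hazard and food type it belongs to.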
The file with prevalence data needs to contain a table with the following column headings: hazardnames, hazardtypes, infoods, npositive, nsample.
Note that 'NA' for sample information implies that the prevalence will be estimated jointly from the concentration data using a zero-inflated model, in which the fraction of measurements below LOD is interpreted as allowing for the possibility of true zeros. The occurrence data file should then mark the corresponding hazard concentrations as 'all' to signify that they may contain both true zeros and small positive values when below LOD.
The file with prevalence data needs to contain a table with the column names hazardnames, hazardtypes, infoods, npositive, nsample, in exactly this order (!). The row entries for the last two columns define the number of true positives and the sample size to be used if the concentration data for that food-hazard pair should represent only positive concentrations. Otherwise, the number of true positives and the sample size are marked 'NA'. If the sample information is marked 'NA', the corresponding entry in the occurrence table should be 'all' to allow the prevalence to be estimated jointly with the concentration distribution using zero-inflated modeling. Hence, 'positives' in the occurrence table should go together with specific values for the prevalence sample data, and 'all' should go together with 'NA' in the prevalence table.
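The pairing rule between the two tables lends itself to a mechanical check. In this sketch (hypothetical names), the occurrence entries are given as a dictionary keyed by (hazard, food type), and each prevalence row is assumed to name a single food type in the infoods column. The check that npositive does not exceed nsample is an added sanity check.

```python
import csv
import io

PREVALENCE_COLUMNS = ["hazardnames", "hazardtypes", "infoods", "npositive", "nsample"]


def check_pairing(occurrence: dict[tuple[str, str], str], prevalence_text: str) -> list[str]:
    """Cross-check occurrence entries against prevalence sample data.

    occurrence maps (hazard, food type) to 'positives', 'all' or 'NA'.
    Assumes each prevalence row names a single food type in 'infoods'.
    """
    rows = [r for r in csv.reader(io.StringIO(prevalence_text)) if r]
    if not rows or rows[0] != PREVALENCE_COLUMNS:
        return [f"header must be exactly {','.join(PREVALENCE_COLUMNS)}, in this order"]

    problems = []
    for lineno, row in enumerate(rows[1:], start=2):
        if len(row) != len(PREVALENCE_COLUMNS):
            problems.append(f"row {lineno}: expected 5 fields")
            continue
        hazard, _htype, food, npos, nsam = row
        where = f"{hazard}/{food}"
        entry = occurrence.get((hazard, food))
        if entry == "positives":
            # 'positives' goes together with numeric sample data.
            if "NA" in (npos, nsam):
                problems.append(f"{where}: 'positives' needs npositive and nsample")
            elif not 0 <= int(npos) <= int(nsam):
                problems.append(f"{where}: need 0 <= npositive <= nsample")
        elif entry == "all" and (npos, nsam) != ("NA", "NA"):
            # 'all' goes together with NA sample data (zero-inflated estimation).
            problems.append(f"{where}: 'all' needs npositive and nsample as NA")
    return problems
```

An empty result means every 'positives' entry has sample counts and every 'all' entry has NA in the prevalence table.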
The source code is available on GitHub (https://github.com/jukran/BIKE2).
Ranta J, Marinova-Todorova M, Mikkelä A, Suomi J, Tuominen P. 2023. BIKE foodborne exposure model - A graphical user interface for the Bayesian dietary exposure assessment model for microbiological and chemical hazards (BIKE). Finnish Food Authority, Helsinki, Finland. Available at https://bike-expo-shiny.rahtiapp.fi/