Approximate Bayesian Computation related methods

BiochemNetABC.abc_smc - Method
abc_smc(pm::ParametricModel, l_obs, func_dist; nbr_particles, alpha, kernel_type, NT,
        duration_time, bound_sim, sym_var_aut, verbose)

Run the ABC-SMC algorithm with the pm parametric model.

func_dist(l_sim, l_obs) is the distance function between simulations and observations; it corresponds to $\rho(\eta(y_{sim}), \eta(y_{exp}))$. l_obs::Vector{<:T2} is a collection of observations. func_dist must have a signature of the form func_dist(l_sim::Vector{T1}, l_obs::Vector{T2}).

If pm is defined on a ContinuousTimeModel, then T1 must satisfy T1 <: Trajectory.

!!! Distance function and distributed ABC
    If you use abc_smc with multiple workers, func_dist has to be defined on every worker by using @everywhere.
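The signature and the @everywhere requirement described above can be sketched as follows. This is only an illustration under stated assumptions: mean_dist is a hypothetical distance, and plain numeric vectors stand in for the simulation type (with a ContinuousTimeModel, T1 would be a Trajectory). The summary statistic used here is simply the mean of each vector.

```julia
using Distributed
@everywhere using Statistics

# Hypothetical distance function with the required two-argument form.
# Assumption: simulations and observations are plain numeric vectors here;
# with a ContinuousTimeModel the simulations would be Trajectory objects.
# @everywhere defines the function on the master process and on every worker.
@everywhere function mean_dist(l_sim::Vector{T1}, l_obs::Vector{T2}) where {T1<:AbstractVector, T2<:AbstractVector}
    eta_sim = mean(mean.(l_sim))   # summary statistic of the simulations
    eta_obs = mean(mean.(l_obs))   # summary statistic of the observations
    return abs(eta_sim - eta_obs)
end

l_obs = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
l_sim = [[1.5, 2.5, 3.5]]
d = mean_dist(l_sim, l_obs)
```

A function defined this way could then be passed as the func_dist argument; without @everywhere, worker processes would not know the function when the computation is distributed.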

BiochemNetABC.abc_model_choice_dataset - Method
abc_model_choice_dataset(models,
                         summary_stats_observations,
                         summary_stats_func::Function, distance_func::Function,
                         k::Int, N_ref::Int; dir_results::Union{Nothing,String} = nothing)

Creates a reference table for ABC model choice with a discrete uniform prior distribution over the models.

BiochemNetABC.abc_model_choice_dataset - Method
abc_model_choice_dataset(models, models_prior,
                         summary_stats_observations,
                         summary_stats_func::Function, distance_func::Function,
                         k::Int, N_ref::Int; dir_results::Union{Nothing,String} = nothing)

Creates a reference table for ABC model choice.

The mandatory arguments are:

  • models: a list of objects inherited from Model or ParametricModel,
  • models_prior: the prior over the models (by default: discrete uniform distribution),
  • summary_stats_observations: the summary statistics of the observations,
  • summary_stats_func::Function: the function that computes the summary statistics over a model simulation,
  • distance_func: the distance function over the summary statistics space,
  • N_ref: the number of samples in the reference table,
  • k: the number of samples nearest to the observations to keep in the reference table (k < N_ref).

The result is an AbcModelChoiceDataset with fields:

  • summary_stats_matrix: the (Nstats, Nref) features matrix. Accessible via .X.
  • summary_stats_observations: the observations used for simulating the dataset.
  • models_indexes: the labels vector. Accessible via .y.

If specified, dir_results is the directory where the summary statistics matrix and associated models are stored (CSV).
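The construction described above can be sketched in plain Julia: draw a model index from the prior, simulate, compute summary statistics, then keep the k samples nearest to the observed statistics. This is a minimal illustration, not the package's implementation. The names sim_m1, sim_m2, summary_stats, distance and reference_table are hypothetical, the models are toy Gaussian samplers, and the sketch assumes that only the k retained columns are returned.

```julia
using Random

# Hypothetical stand-ins for two models: each returns one simulated dataset.
sim_m1(rng) = randn(rng, 50)
sim_m2(rng) = randn(rng, 50) .+ 3.0
sim_models = [sim_m1, sim_m2]
models_prior = [0.5, 0.5]          # discrete prior over the models

# Hypothetical summary statistics (mean and range) and Euclidean distance.
summary_stats(x) = [sum(x) / length(x), maximum(x) - minimum(x)]
distance(s1, s2) = sqrt(sum(abs2, s1 .- s2))

function reference_table(rng, sim_models, models_prior, s_obs, k, N_ref)
    cum_prior = cumsum(models_prior)
    X = Matrix{Float64}(undef, length(s_obs), N_ref)   # features, one column per sample
    y = Vector{Int}(undef, N_ref)                      # model labels
    for i in 1:N_ref
        # Inverse-CDF draw of a model index from the prior.
        m = min(searchsortedfirst(cum_prior, rand(rng)), length(sim_models))
        y[i] = m
        X[:, i] = summary_stats(sim_models[m](rng))
    end
    # Keep the k samples whose summary statistics are closest to the observed ones.
    dists = [distance(view(X, :, i), s_obs) for i in 1:N_ref]
    keep = partialsortperm(dists, 1:k)
    return X[:, keep], y[keep]
end

rng = MersenneTwister(1)
s_obs = summary_stats(randn(rng, 50) .+ 3.0)
X, y = reference_table(rng, sim_models, models_prior, s_obs, 20, 200)
```

Here X plays the role of the features matrix (one row per summary statistic) and y the role of the labels vector.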

BiochemNetABC.posterior_proba_model - Method
posterior_proba_model(rf_abc::RandomForestABC)

Estimates the posterior probability of the model $P(M = \widehat{M}(s_{obs}) | s_{obs})$ with the Random Forest ABC method.

BiochemNetABC.rf_abc_model_choice - Method
rf_abc_model_choice(abc_trainset;
                    k::Int = N_ref, distance_func::Function = (x,y) -> 1, 
                    hyperparameters_range::Dict)

Run the Random Forest Approximate Bayesian Computation model choice method with an already simulated dataset.

The mandatory arguments are:

  • abc_trainset: a dataset already simulated with abc_model_choice_dataset.

The optional arguments are:

  • hyperparameters_range: a dict giving, for each hyperparameter, the range of values explored in the cross-validation fit of the Random Forest (by default: Dict(:n_estimators => [200], :min_samples_leaf => [1], :min_samples_split => [2])). See the scikit-learn documentation of RandomForestClassifier for the hyperparameter names.
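
To make the shape of hyperparameters_range concrete, the following sketch expands such a dict into the list of candidate settings that a grid search would visit. How the package actually runs the cross-validation is not shown here; param_names and grid are hypothetical names, and the value ranges are chosen only for illustration.

```julia
# A hyperparameter range dict in the documented form: each key is a
# RandomForestClassifier hyperparameter name, each value a vector of candidates.
hyperparameters_range = Dict(:n_estimators => [100, 200],
                             :min_samples_leaf => [1, 5],
                             :min_samples_split => [2])

# Hypothetical expansion into every combination of candidate values.
param_names = collect(keys(hyperparameters_range))
value_lists = [hyperparameters_range[name] for name in param_names]
grid = vec([Dict(zip(param_names, vals)) for vals in Iterators.product(value_lists...)])

n_candidates = length(grid)   # 2 * 2 * 1 = 4 settings
```

With the default dict, every range holds a single value, so the grid contains exactly one setting.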

The result is a RandomForestABC object with fields:

  • reference_table: an AbcModelChoiceDataset that corresponds to the reference table of the algorithm,
  • clf: a random forest classifier (PyObject from scikit-learn),
  • summary_stats_observations: the summary statistics of the observations,
  • estim_model: the underlying model of the observations inferred with the RF-ABC method.

BiochemNetABC.rf_abc_model_choice - Method
rf_abc_model_choice(models, summary_stats_observations,
                    summary_stats_func::Function, N_ref::Int;
                    k::Int = N_ref, distance_func::Function = (x,y) -> 1, 
                    hyperparameters_range::Dict)

Run the Random Forest Approximate Bayesian Computation model choice method.

The mandatory arguments are:

  • models: a list of objects inherited from Model or ParametricModel,
  • summary_stats_observations: the summary statistics of the observations,
  • summary_stats_func::Function: the function that computes the summary statistics over a model simulation,
  • N_ref: the number of samples in the reference table.

The optional arguments are:

  • models_prior: the prior over the models (by default: discrete uniform distribution),
  • k: the number of samples nearest to the observations to keep in the reference table (by default: k = N_ref),
  • distance_func: the distance function; it has to be defined if k < N_ref,
  • hyperparameters_range: a dict giving, for each hyperparameter, the range of values explored in the cross-validation fit of the Random Forest (by default: Dict(:n_estimators => [200], :min_samples_leaf => [1], :min_samples_split => [2])). See the scikit-learn documentation of RandomForestClassifier for the hyperparameter names.

The result is a RandomForestABC object with fields:

  • reference_table: an AbcModelChoiceDataset that corresponds to the reference table of the algorithm,
  • clf: a random forest classifier (PyObject from scikit-learn),
  • summary_stats_observations: the summary statistics of the observations,
  • estim_model: the underlying model of the observations inferred with the RF-ABC method.