PosteriorDB.jl

PosteriorDB.jl is a Julia package for easily working with a posteriordb database. It includes convenience functions for accessing data, model code, and information for individual posteriors, models, data, and reference draws.

Installation

PosteriorDB can be installed from the Julia general registry with

] add PosteriorDB

Example usage

When PosteriorDB.jl is installed, a copy of posteriordb is downloaded. A database created with database uses this copy automatically.

julia> using PosteriorDB
julia> pdb = PosteriorDB.database()
PosteriorDatabase(...)
julia> PosteriorDB.path(pdb)
"/home/runner/.julia/artifacts/6bceea9b69198fe2c8610d9a3543229de8862e5d/posteriordb-0.5.0/posterior_database"

For now, the database is read-only. We can list the available posteriors with posterior_names.

julia> PosteriorDB.posterior_names(pdb)
147-element Vector{String}:
 "GLMM_Poisson_data-GLMM_Poisson_model"
 "GLMM_data-GLMM1_model"
 "GLM_Binomial_data-GLM_Binomial_model"
 "GLM_Poisson_Data-GLM_Poisson_model"
 "M0_data-M0_model"
 "Mb_data-Mb_model"
 "Mh_data-Mh_model"
 "Mt_data-Mt_model"
 "Mtbh_data-Mtbh_model"
 "Mth_data-Mth_model"
 ⋮
 "wells_data-wells_daae_c_model"
 "wells_data-wells_dae_c_model"
 "wells_data-wells_dae_inter_model"
 "wells_data-wells_dae_model"
 "wells_data-wells_dist"
 "wells_data-wells_dist100_model"
 "wells_data-wells_dist100ars_model"
 "wells_data-wells_interaction_c_model"
 "wells_data-wells_interaction_model"
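
Because posterior_names returns an ordinary Vector{String}, standard Julia functions can narrow the list. The following is a minimal sketch: the variable names posterior_list and wells are illustrative, and a few names copied from the output above stand in for the full list so that the snippet runs without the package.

```julia
# A few names copied from the listing above. With the package loaded, this
# would instead be: posterior_list = PosteriorDB.posterior_names(pdb)
posterior_list = [
    "GLMM_Poisson_data-GLMM_Poisson_model",
    "M0_data-M0_model",
    "wells_data-wells_dist",
    "wells_data-wells_interaction_model",
]

# Keep only the posteriors whose name begins with the wells dataset name.
wells = filter(startswith("wells_data-"), posterior_list)
```

The same pattern applies to the vectors returned by model_names and dataset_names.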

We can also list available models with model_names and datasets with dataset_names.

julia> PosteriorDB.model_names(pdb)
120-element Vector{String}:
 "2pl_latent_reg_irt"
 "GLMM1_model"
 "GLMM_Poisson_model"
 "GLM_Binomial_model"
 "GLM_Poisson_model"
 "M0_model"
 "Mb_model"
 "Mh_model"
 "Mt_model"
 "Mtbh_model"
 ⋮
 "wells_daae_c_model"
 "wells_dae_c_model"
 "wells_dae_inter_model"
 "wells_dae_model"
 "wells_dist"
 "wells_dist100_model"
 "wells_dist100ars_model"
 "wells_interaction_c_model"
 "wells_interaction_model"
julia> PosteriorDB.dataset_names(pdb)
91-element Vector{String}:
 "GLMM_Poisson_data"
 "GLMM_data"
 "GLM_Binomial_data"
 "GLM_Poisson_Data"
 "M0_data"
 "Mb_data"
 "Mh_data"
 "Mt_data"
 "Mtbh_data"
 "Mth_data"
 ⋮
 "synthetic_grid_RBF_kernels"
 "three_docs1200"
 "three_men1"
 "three_men2"
 "three_men3"
 "timssAusTwn_irt"
 "traffic_accident_nyc"
 "uk_drivers"
 "wells_data"

We can fetch a posterior by name with posterior.

julia> post = PosteriorDB.posterior(pdb, "eight_schools-eight_schools_centered")
Posterior: eight_schools-eight_schools_centered

Each posterior has a corresponding model and dataset, which can be fetched with model and dataset.

julia> mod = PosteriorDB.model(post)
Model: eight_schools_centered
Title: A centered hiearchical model for 8 schools
julia> data = PosteriorDB.dataset(post)
Dataset: eight_schools
Title: The 8 schools dataset of Rubin (1981)

The same model and dataset can be accessed directly from the database.

julia> PosteriorDB.model(pdb, "eight_schools_centered")
Model: eight_schools_centered
Title: A centered hiearchical model for 8 schools
julia> PosteriorDB.dataset(pdb, "eight_schools")
Dataset: eight_schools
Title: The 8 schools dataset of Rubin (1981)

The functions database, name, and info can be applied to any posterior, model, or dataset.

julia> PosteriorDB.database(post)
PosteriorDatabase(...)
julia> PosteriorDB.name(post)
"eight_schools-eight_schools_centered"
julia> PosteriorDB.info(post)
OrderedCollections.OrderedDict{String, Any} with 8 entries:
  "name"                     => "eight_schools-eight_schools_centered"
  "keywords"                 => ["stan benchmark", "pathfinder paper"]
  "model_name"               => "eight_schools_centered"
  "reference_posterior_name" => "eight_schools-eight_schools_noncentered"
  "data_name"                => "eight_schools"
  "dimensions"               => OrderedDict{String, Any}("theta"=>8, "mu"=>1, "…
  "added_by"                 => "Mans Magnusson"
  "added_date"               => "2019-08-12"

From the model we can access implementation code and model information.

julia> impl = PosteriorDB.implementation(mod, "stan")
PosteriorDB.StanModelImplementation(...)
julia> PosteriorDB.path(impl)
"/home/runner/.julia/artifacts/6bceea9b69198fe2c8610d9a3543229de8862e5d/posteriordb-0.5.0/posterior_database/models/stan/eight_schools_centered.stan"
julia> mod_code = PosteriorDB.load(impl)
"data {\n int<lower=0> J; // number of schools\n array[J] real y; // estimated treatment\n array[J] real<lower=0> sigma; // std of estimated effect\n}\nparameters {\n array[J] real theta; // treatment effect in school j\n real mu; // hyper-parameter of mean\n real<lower=0> tau; // hyper-parameter of sdv\n}\nmodel {\n tau ~ cauchy(0, 5); // a non-informative prior\n theta ~ normal(mu, tau);\n y ~ normal(theta, sigma);\n mu ~ normal(0, 5);\n}\n\n\n"
julia> println(mod_code)
data {
 int<lower=0> J; // number of schools
 array[J] real y; // estimated treatment
 array[J] real<lower=0> sigma; // std of estimated effect
}
parameters {
 array[J] real theta; // treatment effect in school j
 real mu; // hyper-parameter of mean
 real<lower=0> tau; // hyper-parameter of sdv
}
model {
 tau ~ cauchy(0, 5); // a non-informative prior
 theta ~ normal(mu, tau);
 y ~ normal(theta, sigma);
 mu ~ normal(0, 5);
}
julia> PosteriorDB.info(mod)
OrderedCollections.OrderedDict{String, Any} with 10 entries:
  "name"                  => "eight_schools_centered"
  "title"                 => "A centered hiearchical model for 8 schools"
  "description"           => "A centered hiearchical model for the 8 schools ex…
  "keywords"              => ["bda3_example", "hiearchical"]
  "references"            => ["rubin1981estimation", "gelman2013bayesian"]
  "urls"                  => "http://www.stat.columbia.edu/~gelman/arm/examples…
  "prior"                 => OrderedDict{String, Any}("keywords"=>Any[])
  "model_implementations" => OrderedDict{String, Any}("stan"=>OrderedDict{Strin…
  "added_by"              => "Mans Magnusson"
  "added_date"            => "2019-08-12"

We can access information about the dataset and load it with load.

julia> PosteriorDB.info(data)
OrderedCollections.OrderedDict{String, Any} with 9 entries:
  "name"        => "eight_schools"
  "keywords"    => ["bda3_example"]
  "title"       => "The 8 schools dataset of Rubin (1981)"
  "description" => "A study for the Educational Testing Service to analyze the …
  "urls"        => ["http://www.stat.columbia.edu/~gelman/arm/examples/schools"]
  "references"  => ["rubin1981estimation", "gelman2013bayesian"]
  "data_file"   => "data/data/eight_schools.json"
  "added_by"    => "Mans Magnusson"
  "added_date"  => "2019-08-12"
julia> PosteriorDB.path(data)
"/home/runner/.julia/artifacts/6bceea9b69198fe2c8610d9a3543229de8862e5d/posteriordb-0.5.0/posterior_database/data/data/eight_schools.json.zip"
julia> PosteriorDB.load(data)
OrderedCollections.OrderedDict{String, Any} with 3 entries:
  "J"     => 8
  "y"     => [28, 8, -3, 7, -1, 1, 18, 12]
  "sigma" => [15, 10, 16, 11, 9, 11, 10, 18]
julia> PosteriorDB.load(data, String)
"{\n \"J\": 8,\n \"y\": [28, 8, -3, 7, -1, 1, 18, 12],\n \"sigma\": [15, 10, 16, 11, 9, 11, 10, 18]\n}\n"
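
The loaded dataset is a dictionary keyed by the variable names declared in the Stan data block, so its entries can be pulled out and checked before they are passed to a sampler. The sketch below uses a literal copy of the dictionary printed above (a plain Dict rather than an OrderedDict) so that it runs without the package. The names data_dict and sizes_match are illustrative.

```julia
# Literal copy of the dictionary printed above. With the package loaded, this
# would instead be: data_dict = PosteriorDB.load(data)
data_dict = Dict{String,Any}(
    "J" => 8,
    "y" => [28, 8, -3, 7, -1, 1, 18, 12],
    "sigma" => [15, 10, 16, 11, 9, 11, 10, 18],
)

J = data_dict["J"]
# Convert to Float64 so the vectors have a concrete numeric element type.
y = Float64.(data_dict["y"])
sigma = Float64.(data_dict["sigma"])

# The Stan model declares y and sigma as arrays of length J.
sizes_match = length(y) == J && length(sigma) == J
```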

Lastly, we can access gold-standard posterior draws with reference_posterior and load.

julia> ref = PosteriorDB.reference_posterior(post)
Reference posterior: eight_schools-eight_schools_noncentered
julia> PosteriorDB.info(ref)
OrderedCollections.OrderedDict{String, Any} with 8 entries:
  "name"        => "eight_schools-eight_schools_noncentered"
  "inference"   => OrderedDict{String, Any}("method"=>"stan_sampling", "method_…
  "diagnostics" => OrderedDict{String, Any}("diagnostic_information"=>OrderedDi…
  "checks_made" => OrderedDict{String, Any}("ndraws_is_10k"=>true, "nchains_is_…
  "comments"    => nothing
  "added_by"    => "Måns Magnusson"
  "added_date"  => "2020-04-06"
  "versions"    => OrderedDict{String, Any}("rstan_version"=>"rstan 2.21.1", "r…
julia> PosteriorDB.path(ref)
"/home/runner/.julia/artifacts/6bceea9b69198fe2c8610d9a3543229de8862e5d/posteriordb-0.5.0/posterior_database/reference_posteriors/draws/draws/eight_schools-eight_schools_noncentered.json.zip"
julia> using DataFrames
julia> DataFrame(PosteriorDB.load(ref))
10×10 DataFrame
 Row │ theta[1]                           theta[2]                           t ⋯
     │ Array…                             Array…                             A ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ [10.6803, 6.45384, -2.24163, 2.4…  [9.71771, 4.41031, 0.761705, 4.4…  [ ⋯
   2 │ [5.70891, 10.3012, 4.24395, 11.0…  [-2.32311, 14.8122, 6.51726, 5.0…  [
   3 │ [7.23747, -0.427832, 9.14782, 18…  [7.35426, 8.6958, 8.90588, -8.12…  [
   4 │ [4.44916, 2.34711, 17.6804, 0.40…  [2.4368, 5.89809, 8.63032, 5.568…  [
   5 │ [5.60707, 3.21463, 7.56894, 9.26…  [16.8475, 3.61332, 10.1766, 8.95…  [ ⋯
   6 │ [-1.75423, 9.82372, 5.38233, 14.…  [0.387243, 9.79837, 5.26381, 3.9…  [
   7 │ [15.7747, 1.04266, 6.83041, 9.07…  [14.1234, 2.9573, 7.18835, 7.422…  [
   8 │ [2.27576, 4.01067, 5.04264, 2.46…  [1.23522, 11.2636, 5.1447, 3.754…  [
   9 │ [14.0308, 6.02058, 12.2585, 6.36…  [9.52779, 4.7798, 20.4301, 1.738…  [ ⋯
  10 │ [3.69286, 7.65907, 0.358297, 1.1…  [8.63972, 9.72966, 2.33489, -1.5…  [
                                                               8 columns omitted
julia> PosteriorDB.load(ref, String)
"[\n {\n \"theta[1]\": [10.6802773011458, 6.45383910854259, -2.24162964640445, 2.46002311906488, 5.62488583140483, 7.50336783926614, 7.81599603245006, 5.01930011367663, 10.0453731387824, 5.18654343159969, 18.6991518451465, 8.87556423359689, 7.57183387302954, 5.831791526" ⋯ 1813642 bytes ⋯ "7942696, 6.05821941997084, 4.0325475521696, 0.915313753788002, 5.14859498223275, 1.62034403382475, 3.09807118666218, 5.04154487841247, 3.95785320605706, 1.89947657026868, 7.24701481671206, 2.00970915673199, 0.749392121060776, 0.579837182291891, 7.8650469735373]\n }\n]\n"
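
The raw JSON above is an array with one object per chain, each mapping a parameter name to that chain's draws, which is why the DataFrame has one row per chain and an array in every cell. A common next step is to pool the chains for one parameter. The sketch below uses a small hypothetical stand-in for PosteriorDB.load(ref) so that it runs without the package. It assumes each loaded chain can be indexed by parameter name, which the JSON layout suggests but which should be checked against the actual return type.

```julia
using Statistics  # standard library, provides mean

# Hypothetical stand-in for PosteriorDB.load(ref): two short chains instead of
# the real ten. Assumption: each chain can be indexed by parameter name.
chains = [
    Dict("mu" => [1.0, 2.0]),
    Dict("mu" => [3.0, 5.0]),
]

# Concatenate the per-chain draws of one parameter into a single vector.
pooled_mu = reduce(vcat, [chain["mu"] for chain in chains])

# Estimate the posterior mean from the pooled draws.
mu_mean = mean(pooled_mu)
```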