Overview
Before constructing a query, you need to know what values are valid for each variable in a table. The GSS StatsBank API provides metadata for every table that acts like a menu of available options. This guide shows you how to retrieve and inspect metadata to build accurate queries.
When building queries, you might wonder:
What are the exact values I can use for the “WaterDisposal” variable?
What options are available for “Geographic_Area”?
Are these values case-sensitive?
Metadata answers these questions by providing the complete list of valid values for each variable.
Metadata is automatically fetched when you navigate to a table, but you can also retrieve it explicitly to inspect the available options.
Prerequisites
This guide assumes you’ve:
Established a connection to the API
Navigated to a specific table
Saved the table request to a variable
URL = "https://statsbank.statsghana.gov.gh:443/api/v1/en/"
table_req = build_url(URL, "PHC 2021 StatsBank" , "Water and Sanitation" , "waterDisposal_table.px" )
Fetch the Table Metadata
Perform a GET request to retrieve metadata: metadata = table_req |>
req_perform() |>
resp_body_json()
This returns a nested list containing all table information, including variables and their valid values.
Understand the Metadata Structure
The metadata object contains several components:
title: The table title
variables: A list of all variables in the table
comments: Any notes about the data
Each variable contains:
code: The variable identifier used in queries
text: A human-readable description
values: List of valid values
valueTexts: Descriptions of each value
Inspecting Variable Values
Find a Specific Variable
Filter the metadata to inspect a single variable: variable_info = metadata $ variables |>
purrr::keep( ~ .x $ code == "WaterDisposal" ) |>
purrr::flatten()
This extracts all information about the “WaterDisposal” variable.
Extract Valid Values
Create a tibble of available options: tibble(
Code = unlist (variable_info $ values)
)
Output: # A tibble: 6 × 1
Code
< chr >
1 Total households in occupied dwelling unit
2 Thrown onto the ground / street / outside
3 Through the sewerage system
4 Through drainage into a pit (soak away)
5 Flows or thrown into drains / gutter
6 Other (Specify)
These are the exact values you must use in your query’s values list. Copy them directly to avoid typos.
View All Variables at Once
See all variables in the table: map_chr(metadata $ variables, "code" )
Output: [ 1 ] "WaterDisposal" "Locality" "Geographic_Area"
Creating a Helper Function
Streamline metadata inspection with a reusable function:
get_variable_values = function ( table_req , variable_code ) {
# Fetch metadata
metadata = table_req |>
req_perform() |>
resp_body_json()
# Find the specific variable
variable_info = metadata $ variables |>
purrr::keep( ~ .x $ code == variable_code) |>
purrr::flatten()
# Return a tibble of values
tibble(
Code = unlist (variable_info $ values),
Description = unlist (variable_info $ valueTexts)
)
}
Usage:
# Get all water disposal options
get_variable_values(table_req, "WaterDisposal" )
# Get all geographic areas
get_variable_values(table_req, "Geographic_Area" )
Understanding Variable Types
Variables like “WaterDisposal” represent categories: get_variable_values(table_req, "WaterDisposal" )
# Returns categorical options:
# - Total households
# - Sewerage system
# - Drainage into pit
# etc.
Use the "item" filter to select specific categories in your query.
Variables like “Geographic_Area” represent locations: get_variable_values(table_req, "Geographic_Area" )
# Returns locations:
# - Ghana
# - Western
# - Ashanti
# etc.
You can select specific regions or use filter = "all" for all locations.
Some tables have time periods (though not in our example): # If a table had a "Year" variable:
get_variable_values(table_req, "Year" )
# Would return years:
# - 2020
# - 2021
# - 2022
Exploring Complete Table Structure
View comprehensive metadata for a table:
# Fetch metadata
metadata = table_req |>
req_perform() |>
resp_body_json()
# Table title
metadata $ title
# All variables
map_chr(metadata $ variables, "code" )
# First variable details
metadata $ variables[[ 1 ]] $ code
metadata $ variables[[ 1 ]] $ text
length (metadata $ variables[[ 1 ]] $ values) # Number of values
Use metadata inspection to construct accurate queries:
Inspect Available Values
# See what water disposal options exist
water_options = get_variable_values(table_req, "WaterDisposal" )
print (water_options)
Select Values for Your Query
# Choose specific options from the metadata output
query_list = list (
query = list (
list (
code = "WaterDisposal" ,
selection = list (
filter = "item" ,
values = list (
"Total households in occupied dwelling unit" ,
"Through the sewerage system" ,
"Through drainage into a pit (soak away)"
)
)
),
list (
code = "Geographic_Area" ,
selection = list ( filter = "all" , values = list ( "*" ))
)
),
response = list ( format = "csv" )
)
Execute the Query
response = table_req |>
req_body_json(query_list) |>
req_perform()
data = response |>
resp_body_raw() |>
readr::read_csv()
Finding the Total Category
Many variables include a “Total” category for aggregated data:
water_options = get_variable_values(table_req, "WaterDisposal" )
water_options |> filter (str_detect(Code, "Total" ))
# Result: "Total households in occupied dwelling unit"
Including the “Total” category in your query allows you to calculate percentages and proportions.
Checking Geographic Hierarchy
Geographic areas often follow a hierarchy:
geo_options = get_variable_values(table_req, "Geographic_Area" )
head (geo_options, 20 )
# Typically shows:
# - National level (Ghana)
# - Regional level (Western, Ashanti, etc.)
# - District level (within each region)
Best Practices
Always Inspect Metadata First : Before building a query, check what values are available
Copy Exact Values : Variable values are case-sensitive and must match exactly
Save Metadata Locally : For tables you use frequently, save metadata to avoid repeated API calls
Check for Updates : Metadata may change when new data is added to tables
Case Sensitivity : Values like “Through the sewerage system” and “through the sewerage system” are different. Always use the exact case returned by the metadata.
Special Characters : Some values contain special characters or parentheses. Use them exactly as shown in the metadata.
Here’s a complete example workflow:
library (httr2)
library (tidyverse)
# Helper function
get_variable_values = function ( table_req , variable_code ) {
metadata = table_req |>
req_perform() |>
resp_body_json()
variable_info = metadata $ variables |>
purrr::keep( ~ .x $ code == variable_code) |>
purrr::flatten()
tibble(
Code = unlist (variable_info $ values)
)
}
# Connect and navigate
URL = "https://statsbank.statsghana.gov.gh:443/api/v1/en/"
table_req = build_url(URL, "PHC 2021 StatsBank" , "Water and Sanitation" , "waterDisposal_table.px" )
# Inspect metadata
water_disposal_options = get_variable_values(table_req, "WaterDisposal" )
print (water_disposal_options)
# Build query using valid values from metadata
query_list = list (
query = list (
list (
code = "WaterDisposal" ,
selection = list (
filter = "item" ,
values = list (
"Total households in occupied dwelling unit" ,
"Through the sewerage system" )
)
),
list (
code = "Geographic_Area" ,
selection = list ( filter = "all" , values = list ( "*" ))
)
),
response = list ( format = "csv" )
)
# Execute query
response = table_req |>
req_body_json(query_list) |>
req_perform()
data = response |>
resp_body_raw() |>
readr::read_csv()
Next Steps
You now have a complete understanding of how to work with the GSS StatsBank API. You can:
Connect to the API
Navigate databases and tables
Inspect metadata to find valid values
Construct accurate queries
Retrieve and parse data
For more advanced usage patterns and tips, refer to the Complete Example documentation.