Retrieving Data

Overview

Once you’ve constructed a query, you need to send it to the API, retrieve the response, and parse it into a usable format. This guide shows you how to execute queries and transform the raw data into analysis-ready dataframes.

Prerequisites

This guide assumes you’ve already:

Established a connection and navigated to a table
Saved the table request (e.g., table_req)
Constructed a query object (e.g., query_list)

# From previous steps
table_req = build_url(URL, "PHC 2021 StatsBank", "Water and Sanitation", "waterDisposal_table.px")

query_list = list(
  query = list(
    list(
      code = "WaterDisposal",
      selection = list(
        filter = "item",
        values = list(
          "Total households in occupied dwelling unit",
          "Through the sewerage system")
      )
    ),
    list(
       code = "Geographic_Area",
       selection = list(filter = "all", values = list("*"))
    )
  ),
  response = list(format = "csv")
)

Executing the Query

Attach the Query to Your Request

Combine your saved table request with the query:

final_req = table_req |> 
  req_body_json(query_list)

This attaches the query parameters to the request object.

Perform the POST Request

Execute the request to retrieve data:

response = final_req |> 
  req_perform()

The API requires a POST request to retrieve actual data. This is different from the GET requests used for navigation.

Parse the Response

Convert the raw response into a dataframe:

data = response |> 
  resp_body_raw() |> 
  readr::read_csv()

The resp_body_raw() function extracts the raw CSV bytes, which are then parsed by read_csv().

Transforming the Data

The raw data often comes in a “wide” format with regions as columns. You’ll typically want to reshape it into a tidy format for analysis.

View the Raw Data Structure

The initial data structure looks like this:

head(data)
# A wide table with geographic areas as column headers

Pivot to Long Format

Transform the data into a long format:

data = data |> 
  tidyr::pivot_longer(
    cols = -1,  # All columns except the first
    names_to = "Geographic_Area", 
    values_to = "count" 
  )

This converts geographic areas from column headers to rows.

Pivot to Wide Format by Variable

Reshape so each water disposal method becomes a column:

data = data |>
  tidyr::pivot_wider(
    names_from = 1,  # First column contains variable names
    values_from = count
  )

Rename Columns for Clarity

Create cleaner column names:

data = data |> 
  rename(
    total_households = `Total households in occupied dwelling unit`,
    sewerage_disposal = `Through the sewerage system`
  )

Complete Data Retrieval Pipeline

Here’s the full pipeline from query to clean data:

# Attach the query to our existing request object
final_req = table_req |> 
  req_body_json(query_list)

# Perform the request 
response = final_req |> 
  req_perform()  

# Parse and transform the data
data = response |> 
  resp_body_raw() |> 
  readr::read_csv() |> 
  tidyr::pivot_longer(
    cols = -1, 
    names_to = "Geographic_Area", 
    values_to = "count" 
  ) |>
  tidyr::pivot_wider(
    names_from = 1, 
    values_from = count
  ) |> 
  rename(
    total_households = `Total households in occupied dwelling unit`,
    sewerage_disposal = `Through the sewerage system`
  )

Viewing the Results

Preview the First Rows

Check the first few rows of your clean data:

head(data, n=10)

Expected output:

# A tibble: 10 × 3
   Geographic_Area                           total_households sewerage_disposal
   <chr>                                                <dbl>            <dbl>
Ghana                                              8356779           192638
Western                                             620785             9238
Jomoro Municipal                                     39438              418
Ellembelle                                           37206              651
Nzema East Municipal                                 28386              116
Ahanta West Municipal                                47288              581
Effia Kwesimintsim Municipal                         53717             1650
Sekondi Takoradi Metropolitan Area (STMA)            73876             1406
STMA-Takoradi                                        16271              323
STMA-Sekondi                                         17137              404

Verify Data Types

Check the structure of your dataframe:

str(data)

Ensure numeric columns are properly typed as <dbl> and character columns as <chr>.

Handling Different Response Formats

Working with JSON Responses

If you requested JSON format instead of CSV:

# Query with JSON format
query_list = list(
  query = list(...),
  response = list(format = "json")
)

# Parse JSON response
response = final_req |> req_perform()
data = response |> resp_body_json()

# data is now a nested list structure

Handling Large Datasets

For large responses, consider streaming or pagination:

# Set a timeout for large requests
final_req = table_req |> 
  req_body_json(query_list) |>
  req_timeout(60)  # 60 second timeout

response = final_req |> req_perform()

Data Transformation Patterns

Pipeline Approach: Chain operations together using the pipe operator (|>) for clean, readable code that transforms data in logical steps.

Creating Calculated Fields

Add derived columns to your data:

data = data |> 
  mutate(
    sewerage_percentage = (sewerage_disposal / total_households) * 100,
    non_sewerage = total_households - sewerage_disposal
  )

Filtering Results

Filter to specific geographic areas:

# Filter to regional level only
regional_data = data |> 
  filter(Geographic_Area %in% c("Ghana", "Western", "Eastern", "Greater Accra"))

Error Handling

Empty Results: If your query returns no data, check that your variable values match exactly with the metadata. Values are case-sensitive.

Parsing Errors: If read_csv() fails, verify that you specified format = "csv" in your query’s response section.

Saving Your Data

Save the processed data for later use:

# Save as CSV
write_csv(data, "water_disposal_data.csv")

# Save as RDS (preserves R data types)
saveRDS(data, "water_disposal_data.rds")

# Load RDS later
data = readRDS("water_disposal_data.rds")

Complete Working Example

Here’s a complete script from start to finish:

library(httr2)
library(tidyverse)

# Setup
URL = "https://statsbank.statsghana.gov.gh:443/api/v1/en/"
table_req = build_url(URL, "PHC 2021 StatsBank", "Water and Sanitation", "waterDisposal_table.px")

# Query
query_list = list(
  query = list(
    list(
      code = "WaterDisposal",
      selection = list(
        filter = "item",
        values = list(
          "Total households in occupied dwelling unit",
          "Through the sewerage system")
      )
    ),
    list(
       code = "Geographic_Area",
       selection = list(filter = "all", values = list("*"))
    )
  ),
  response = list(format = "csv")
)

# Retrieve and parse
response = table_req |> 
  req_body_json(query_list) |> 
  req_perform()

data = response |> 
  resp_body_raw() |> 
  readr::read_csv() |> 
  tidyr::pivot_longer(
    cols = -1, 
    names_to = "Geographic_Area", 
    values_to = "count" 
  ) |>
  tidyr::pivot_wider(
    names_from = 1, 
    values_from = count
  ) |> 
  rename(
    total_households = `Total households in occupied dwelling unit`,
    sewerage_disposal = `Through the sewerage system`
  )

# View results
head(data, n=10)

Next Steps

Learn how to inspect table metadata to discover all available variables and their valid values. See the Working with Metadata guide to explore table structure and find valid query parameters.

Get Started

Core Concepts

Guides

Reference

Resources

Overview

Prerequisites

Executing the Query

Transforming the Data

Complete Data Retrieval Pipeline

Viewing the Results

Handling Different Response Formats

Data Transformation Patterns

Creating Calculated Fields

Filtering Results

Error Handling

Saving Your Data

Complete Working Example

Next Steps

Build docs developers (and LLMs) love

Get Started

Core Concepts

Guides

Reference

Resources

​Overview

​Prerequisites

​Executing the Query

​Transforming the Data

​Complete Data Retrieval Pipeline

​Viewing the Results

​Handling Different Response Formats

​Data Transformation Patterns

​Creating Calculated Fields

​Filtering Results

​Error Handling

​Saving Your Data

​Complete Working Example

​Next Steps

Build docs developers (and LLMs) love

Overview

Prerequisites

Executing the Query

Transforming the Data

Complete Data Retrieval Pipeline

Viewing the Results

Handling Different Response Formats

Data Transformation Patterns

Creating Calculated Fields

Filtering Results

Error Handling

Saving Your Data

Complete Working Example

Next Steps