Skip to main content

Overview

Once you’ve navigated to a specific table, you need to construct a query that tells the API exactly what data you want. Queries are sent as JSON objects and specify which variables to include, what filters to apply, and what format you want the response in.

Query Structure

A query has two main components:
  1. Query: Specifies which variables and filters to apply
  2. Response: Defines the output format (e.g., CSV, JSON)
The query structure follows the PxWeb API standard. For more details, see the API documentation.

Prerequisites

This guide assumes you’ve already:
  • Established a connection to the API
  • Navigated to a specific table
  • Saved the table request to a variable (e.g., table_req)
# From the previous navigation step
table_req = build_url(URL, "PHC 2021 StatsBank", "Water and Sanitation", "waterDisposal_table.px")

Building a Basic Query

1

Understand Available Variables

When you navigated to the table, you saw the available variables:
[1] "WaterDisposal"   "Locality"        "Geographic_Area"
These are the variables you can filter and select in your query.
2

Define Your Query List

Create a query to select specific water disposal methods:
query_list = list(
  query = list(
    list(
      code = "WaterDisposal",
      selection = list(
        filter = "item",
        values = list(
          "Total households in occupied dwelling unit",
          "Through the sewerage system") # Selecting one specific method
      )
    ),
    # We must select values for other variables or the API might aggregate them
    list(
       code = "Geographic_Area",
       selection = list(filter = "all", values = list("*"))
    )
  ),
  response = list(
    format = "csv" # We want the data back as a CSV
  )
)
3

Understand the Query Components

Let’s break down each part:Variable Selection:
list(
  code = "WaterDisposal",  # The variable name
  selection = list(
    filter = "item",        # Filter type
    values = list(...)      # Specific values to include
  )
)
Geographic Area (all values):
list(
   code = "Geographic_Area",
   selection = list(filter = "all", values = list("*"))
)
Response Format:
response = list(
  format = "csv"  # Options: csv, json, json-stat, json-stat2, px
)

Filter Types

The API supports different filter types for variable selection:

Item Filter

Select specific values from a variable:
selection = list(
  filter = "item",
  values = list(
    "Total households in occupied dwelling unit",
    "Through the sewerage system"
  )
)
Use the “item” filter when you want to select only certain categories from a variable.

All Filter

Select all available values using the wildcard *:
selection = list(
  filter = "all",
  values = list("*")
)
You must include all variables from the table in your query, either by selecting specific values or using the “all” filter.

Response Formats

The API supports multiple output formats:
FormatDescriptionUse Case
csvComma-separated valuesEasy to parse with read_csv()
jsonJSON formatWeb applications, flexible parsing
json-statJSON-stat formatStatistical applications
json-stat2JSON-stat version 2Updated statistical format
pxPC-Axis formatNative PxWeb format
CSV format is recommended for most R workflows as it integrates seamlessly with readr::read_csv().

Complete Query Example

Here’s a complete example querying water disposal data:
# Define the query
query_list = list(
  query = list(
    list(
      code = "WaterDisposal",
      selection = list(
        filter = "item",
        values = list(
          "Total households in occupied dwelling unit",
          "Through the sewerage system")
      )
    ),
    list(
       code = "Geographic_Area",
       selection = list(filter = "all", values = list("*"))
    )
  ),
  response = list(
    format = "csv"
  )
)

Advanced Query Patterns

Select multiple specific categories:
query_list = list(
  query = list(
    list(
      code = "WaterDisposal",
      selection = list(
        filter = "item",
        values = list(
          "Total households in occupied dwelling unit",
          "Through the sewerage system",
          "Through drainage into a pit (soak away)",
          "Flows or thrown into drains/gutter"
        )
      )
    ),
    list(
       code = "Geographic_Area",
       selection = list(filter = "all", values = list("*"))
    )
  ),
  response = list(format = "csv")
)
Get data in JSON format instead of CSV:
query_list = list(
  query = list(
    list(
      code = "WaterDisposal",
      selection = list(
        filter = "item",
        values = list("Total households in occupied dwelling unit")
      )
    ),
    list(
       code = "Geographic_Area",
       selection = list(filter = "all", values = list("*"))
    )
  ),
  response = list(
    format = "json"  # Changed to JSON
  )
)

Common Issues

All Variables Required: You must include all table variables in your query. If a table has three variables, your query must specify selections for all three.
Exact Value Names: Variable values are case-sensitive and must match exactly as returned by the metadata. “Through the sewerage system” is different from “through the sewerage system”.

Finding Valid Values

Not sure what values to use in your query? See the Working with Metadata guide to learn how to inspect table metadata and find all valid values for each variable.

Next Steps

Now that you’ve constructed a query, you’re ready to retrieve and parse the data. See the Retrieving Data guide to learn how to execute your query and process the results.

Build docs developers (and LLMs) love