Overview
The content keywords indicate that an instance contains non-JSON data encoded in a JSON string. These properties provide additional information required to interpret JSON data as rich multimedia documents by describing the type of content, how it is encoded, and/or how it may be validated.
Purpose
Content keywords are designed to:
- Describe binary data encoded as strings (e.g., base64-encoded images)
- Specify the media type of string content (e.g., HTML, JSON, XML)
- Provide a schema for validating decoded content
- Enable applications to properly handle embedded data
Implementation Requirements
Due to security and performance concerns, as well as the open-ended nature of possible content types, implementations MUST NOT automatically decode, parse, and/or validate the string contents.
Instead, applications that need this behavior should:
- Read the content annotations from the schema
- Use these annotations to invoke appropriate libraries separately
- Handle decoding, parsing, and validation explicitly when needed
The content keywords:
- Apply only to strings
- Have no effect on other data types
- Produce annotations, not assertions
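As a rough illustration of the opt-in handling described above, the following Python sketch leaves the string untouched unless the caller explicitly asks for decoding. The function name process_content and its decode flag are hypothetical, and only base64 and application/json are handled; this is not how any particular implementation works.

```python
import base64
import json


def process_content(instance, schema: dict, *, decode: bool = False):
    """Return the instance untouched unless the caller opts in to decoding."""
    # Content keywords have no effect on non-strings, and nothing is
    # decoded automatically: the caller must pass decode=True.
    if not decode or not isinstance(instance, str):
        return instance
    data = instance.encode("utf-8")
    if schema.get("contentEncoding") == "base64":
        data = base64.b64decode(data, validate=True)
    if schema.get("contentMediaType") == "application/json":
        return json.loads(data)
    return data


# Hypothetical schema fragment, written as a Python dict.
schema = {
    "type": "string",
    "contentEncoding": "base64",
    "contentMediaType": "application/json",
}
```

Calling process_content(value, schema) returns the original string; only process_content(value, schema, decode=True) decodes and parses it.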
Keywords
contentEncoding
Defines how binary data is encoded in a string.
Value: String
If the instance value is a string, this property defines that the string SHOULD be interpreted as encoded binary data. Applications wishing to decode it SHOULD do so using the encoding named by this property.
Common Encoding Values
base64
Base64 encoding as defined in RFC 4648. This is the most common encoding for binary data in JSON.
Use case: Images, PDFs, binary files
base32
Base32 encoding as defined in RFC 4648.
Use case: Case-insensitive encoding needs
base16
Base16 (hexadecimal) encoding as defined in RFC 4648.
Use case: Hexadecimal data representation
quoted-printable
Quoted-printable encoding as defined in RFC 2045, section 6.7.
Use case: MIME email content
7bit, 8bit, binary
MIME transfer encodings from RFC 2045, section 6.2.
Use case: MIME context
As “base64” is defined in both RFC 4648 and RFC 2045, the definition from RFC 4648 SHOULD be assumed unless the string is specifically intended for use in a MIME context.
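One way an application might act on a contentEncoding annotation is a lookup table from encoding names to decoders. The sketch below uses only Python standard-library functions; the DECODERS table and decode_content helper are hypothetical names, and the RFC 4648 alphabets are assumed (base32 and base16 input must be uppercase with these defaults).

```python
import base64
import quopri

# Assumed mapping from contentEncoding values to standard-library decoders.
DECODERS = {
    "base64": lambda data: base64.b64decode(data, validate=True),
    "base32": base64.b32decode,
    "base16": base64.b16decode,
    "quoted-printable": quopri.decodestring,
}


def decode_content(value: str, content_encoding: str) -> bytes:
    """Decode a string instance using the encoding named by contentEncoding."""
    try:
        decoder = DECODERS[content_encoding]
    except KeyError:
        raise ValueError(f"unsupported contentEncoding: {content_encoding!r}") from None
    # These encodings only produce 7-bit ASCII, so non-ASCII input is rejected.
    return decoder(value.encode("ascii"))
```

For instance, decode_content("aGVsbG8=", "base64") and decode_content("68656C6C6F", "base16") both yield the bytes of "hello".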
Identity Encoding
If contentEncoding is absent but contentMediaType is present, this indicates that the encoding is the identity encoding (no transformation was needed to represent the content in a UTF-8 string).
All encoding values defined in the RFCs result in strings consisting only of 7-bit ASCII characters. Therefore, contentEncoding has no meaning for strings containing characters outside of that range.
contentMediaType
Indicates the media type (MIME type) of the string contents.
Value: String (must be a valid media type)
Standard: RFC 2046
If the instance is a string, this property indicates the media type of the contents. If contentEncoding is present, this property describes the decoded string.
Common Media Types
- text/html: HTML content
- application/json: JSON data
- application/xml: XML data
- image/png: PNG image
- image/jpeg: JPEG image
- application/pdf: PDF document
- application/jwt: JSON Web Token
- text/csv: CSV data
contentSchema
Describes the structure of the decoded string content.
Value: Valid JSON Schema
If the instance is a string, and if contentMediaType is present, this keyword’s subschema describes the structure of the string.
Requirements
- This keyword MAY be used with any media type that can be mapped into JSON Schema’s data model
- Specifying such mappings is outside the scope of this specification
- The subschema is produced as an annotation
contentSchema SHOULD NOT produce an annotation if contentMediaType is not present.
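The annotation rules above can be sketched as a small Python function. The name content_annotations is hypothetical, and the function only collects the three content keywords; it performs no decoding or validation.

```python
def content_annotations(schema: dict, instance) -> dict:
    """Return the annotations the content keywords would produce for an instance."""
    if not isinstance(instance, str):
        return {}  # content keywords apply only to strings
    result = {}
    for keyword in ("contentEncoding", "contentMediaType"):
        if keyword in schema:
            result[keyword] = schema[keyword]
    # contentSchema is only reported when contentMediaType is also present.
    if "contentSchema" in schema and "contentMediaType" in schema:
        result["contentSchema"] = schema["contentSchema"]
    return result
```

A schema that contains contentSchema without contentMediaType therefore yields no contentSchema annotation in this sketch.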
Examples
Base64-Encoded Image
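A minimal sketch of this case, with the schema written as a Python dict (it maps one-to-one to JSON). The looks_like_png helper is a hypothetical application-side check and is not part of schema validation.

```python
import base64

# Illustrative schema for a PNG image carried as a base64 string.
image_schema = {
    "type": "string",
    "contentEncoding": "base64",
    "contentMediaType": "image/png",
}

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(instance: str) -> bool:
    """Decode the base64 string and check for the 8-byte PNG signature."""
    try:
        raw = base64.b64decode(instance, validate=True)
    except ValueError:  # invalid base64 or non-ASCII input
        return False
    return raw.startswith(PNG_SIGNATURE)
```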
HTML Content
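A sketch of a schema for an HTML string, written as a Python dict. No contentEncoding is needed because HTML can be stored directly in a JSON string. The TagCollector class and html_tags function are hypothetical helpers showing how an application might hand the string to an HTML parser on its own initiative.

```python
from html.parser import HTMLParser

# Illustrative schema: the string holds HTML, with no extra encoding.
html_schema = {
    "type": "string",
    "contentMediaType": "text/html",
}


class TagCollector(HTMLParser):
    """Record the start tags seen while parsing an HTML string."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_tags(instance: str) -> list:
    """Parse the string as HTML and return its start tags in order."""
    collector = TagCollector()
    collector.feed(instance)
    collector.close()
    return collector.tags
```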
Embedded JSON
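A sketch of a string that holds another JSON document, with contentSchema describing the parsed value. The schema is a Python dict, and check_embedded is a hypothetical helper that applies only the "type": "object" and "required" parts of the subschema by hand; a real application would pass the parsed value to a full JSON Schema validator.

```python
import json

# Illustrative schema: a JSON string whose content is itself JSON.
embedded_json_schema = {
    "type": "string",
    "contentMediaType": "application/json",
    "contentSchema": {
        "type": "object",
        "required": ["name"],
        "properties": {"name": {"type": "string"}},
    },
}


def check_embedded(instance: str, schema: dict) -> bool:
    """Parse the string as JSON and apply a tiny subset of contentSchema."""
    try:
        value = json.loads(instance)
    except json.JSONDecodeError:
        return False
    subschema = schema["contentSchema"]
    if subschema.get("type") == "object" and not isinstance(value, dict):
        return False
    return all(key in value for key in subschema.get("required", []))
```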
JWT with Schema
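A schema of the following shape, written here as a Python dict, fits this case. The jwt_to_json helper is a hypothetical illustration of mapping a compact JWT to a two-item list; it does not verify the signature and should not be used for authentication.

```python
import base64
import json

jwt_schema = {
    "type": "string",
    "contentMediaType": "application/jwt",
    "contentSchema": {
        "type": "array",
        "minItems": 2,
        "prefixItems": [
            {"const": {"typ": "JWT", "alg": "HS256"}},
            {
                "type": "object",
                "required": ["iss", "exp"],
                "properties": {
                    "iss": {"type": "string"},
                    "exp": {"type": "integer"},
                },
            },
        ],
    },
}


def _b64url_decode(segment: str) -> bytes:
    # JWT segments omit base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def jwt_to_json(token: str) -> list:
    """Map a compact JWT to [header, payload]; the signature is NOT verified."""
    header, payload, _signature = token.split(".")
    return [json.loads(_b64url_decode(header)), json.loads(_b64url_decode(payload))]
```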
This example describes a JWT that is MACed using the HMAC SHA-256 algorithm, and requires the “iss” and “exp” fields in its claim set.
contentEncoding does not appear in this example. While the application/jwt media type uses base64url encoding, that is defined by the media type itself, which determines how the JWT string is decoded into a list of two JSON data structures: first the header, and then the payload. Since the JWT media type ensures that the JWT can be represented in a JSON string, there is no need for further encoding or decoding.
Base64-Encoded PDF
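A sketch of a schema for a base64-encoded PDF, written as a Python dict. The decode_pdf helper is hypothetical; it only checks the "%PDF-" file header after decoding and makes no claim that the document is otherwise well formed.

```python
import base64

# Illustrative schema for a PDF document carried as a base64 string.
pdf_schema = {
    "type": "string",
    "contentEncoding": "base64",
    "contentMediaType": "application/pdf",
}


def decode_pdf(instance: str) -> bytes:
    """Decode the base64 string and require the '%PDF-' file header."""
    raw = base64.b64decode(instance, validate=True)
    if not raw.startswith(b"%PDF-"):
        raise ValueError("decoded content does not start with a PDF header")
    return raw
```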
CSV Data
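A sketch for CSV stored directly in a string, with the schema as a Python dict. The parse_csv helper is hypothetical and assumes the first row contains column names.

```python
import csv
import io

# Illustrative schema: the string holds CSV text, with no extra encoding.
csv_schema = {
    "type": "string",
    "contentMediaType": "text/csv",
}


def parse_csv(instance: str) -> list:
    """Parse the string as CSV, treating the first row as column names."""
    return list(csv.DictReader(io.StringIO(instance)))
```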
Base64-Encoded Binary File
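A sketch for arbitrary binary data, assuming the generic application/octet-stream media type. The schema is a Python dict, and sha256_of_content is a hypothetical helper showing one thing an application might do with the decoded bytes.

```python
import base64
import hashlib

# Illustrative schema for an arbitrary binary file carried as base64.
binary_schema = {
    "type": "string",
    "contentEncoding": "base64",
    "contentMediaType": "application/octet-stream",
}


def sha256_of_content(instance: str) -> str:
    """Decode the base64 string and return the SHA-256 hex digest of the bytes."""
    raw = base64.b64decode(instance, validate=True)
    return hashlib.sha256(raw).hexdigest()
```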
Complete Example: File Upload Schema
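A sketch of what a file upload schema could look like, written as a Python dict. The property names (filename, content), the PDF media type, and the maxLength values are assumptions chosen for illustration. The read_upload helper is hypothetical and checks only the required properties and the encoded length before decoding.

```python
import base64

file_upload_schema = {
    "type": "object",
    "required": ["filename", "content"],
    "properties": {
        "filename": {"type": "string", "maxLength": 255},
        "content": {
            "type": "string",
            "contentEncoding": "base64",
            "contentMediaType": "application/pdf",
            "maxLength": 1_400_000,  # assumed cap on the encoded string
        },
    },
}


def read_upload(upload: dict, schema: dict = file_upload_schema) -> tuple:
    """Return (filename, decoded bytes) for an upload shaped like the schema."""
    for name in schema["required"]:
        if name not in upload:
            raise ValueError(f"missing property: {name}")
    content_rules = schema["properties"]["content"]
    if len(upload["content"]) > content_rules["maxLength"]:
        raise ValueError("encoded content exceeds maxLength")
    return upload["filename"], base64.b64decode(upload["content"], validate=True)
```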
Security Considerations
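Implementations that decode or otherwise evaluate instance strings based on contentEncoding and/or contentMediaType risk processing data unsafely on the basis of misleading information.
Applications can mitigate this risk by:
- Only performing processing when a relationship between the schema and instance is established (e.g., they share the same authority)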
- Validating the decoded content in a sandboxed environment
- Setting appropriate resource limits for decoding operations
- Being aware of the security considerations of the specific media type or encoding being processed
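The resource-limit point can be sketched for base64 as follows. The 1 MiB limit and the name bounded_b64decode are assumptions; the check relies on the fact that every 4 base64 characters decode to at most 3 bytes, so the size can be bounded before any decoding work is done.

```python
import base64

MAX_DECODED_BYTES = 1024 * 1024  # assumed limit, for illustration only


def bounded_b64decode(instance: str, limit: int = MAX_DECODED_BYTES) -> bytes:
    """Reject oversized input before decoding, using base64's 4:3 size ratio."""
    # Every 4 encoded characters yield at most 3 decoded bytes.
    upper_bound = (len(instance) // 4) * 3
    if upper_bound > limit:
        raise ValueError("encoded content exceeds the configured size limit")
    return base64.b64decode(instance, validate=True)
```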
Use Cases
File Upload APIs
APIs that accept file uploads as base64-encoded strings can use content keywords to specify the expected encoding and media type.
Embedded Documents
Systems that store HTML, XML, or other documents as JSON strings can use content keywords to indicate the document type.
Binary Data Storage
Applications storing binary data (images, PDFs, etc.) as base64 strings can document the encoding format.
Nested JSON
Systems with JSON strings containing other JSON data can use contentSchema to validate the nested structure.
JWT Validation
APIs using JWTs can specify the expected JWT structure and claims using content keywords.
Best Practices
- Always specify media type: When using contentEncoding, also include contentMediaType to fully describe the data
- Use standard encodings: Prefer well-known encodings like base64 over custom encoding schemes
- Document behavior: Clearly document how your application handles content keywords
- Validate separately: Perform content validation separately from schema validation
- Consider size limits: Set appropriate limits on encoded data size to prevent resource exhaustion
Related Topics
- Format Validation - Semantic format validation
- Annotation Keywords - Meta-data keywords
- Structural Validation - Type and structural constraints