
Overview

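Python strings (str objects) are immutable sequences of Unicode code points. CPython does not necessarily store them as UTF-8 internally, but the functions covered here exchange C strings (char *) encoded as UTF-8. The C API provides extensive functions for creating, converting, and manipulating Unicode strings.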

Type Object

PyTypeObject PyUnicode_Type
The type object for the Python str type. It is the same object that Python code refers to as str.

Type Checking

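int PyUnicode_Check(PyObject *o)
Return true if o is a str object or an instance of a subtype of str.
int PyUnicode_CheckExact(PyObject *o)
Return true if o is a str object but not an instance of a subtype. Neither function can fail.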

Creating Unicode Strings

From C Strings

PyObject* PyUnicode_FromString(const char *u)
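Create a str from a null-terminated, UTF-8 encoded C string. Returns a new reference, or NULL with an exception set (for example UnicodeDecodeError if the bytes are not valid UTF-8). Example: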
PyObject *hello = PyUnicode_FromString("Hello, World!");
return hello;

With Length

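PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)
Create a str from the first size bytes of the UTF-8 encoded buffer u. The buffer does not need to be null-terminated, and size counts bytes, not characters. Example: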
PyObject *str = PyUnicode_FromStringAndSize("Hello", 5);

From Format String

PyObject* PyUnicode_FromFormat(const char *format, ...)
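Works like C's sprintf(), but returns a new str object. Commonly used format codes:
  • %s - null-terminated C string (UTF-8)
  • %d, %ld, %zd - int, long, Py_ssize_t
  • %U - PyObject* that must be a str
  • %S - PyObject* (via PyObject_Str())
  • %R - PyObject* (via PyObject_Repr())
Each argument must have exactly the C type its format code expects.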
Example:
PyObject *msg = PyUnicode_FromFormat("Value: %d, Name: %s", 42, "test");

Converting to C Strings

const char* PyUnicode_AsUTF8(PyObject *unicode)
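Return the UTF-8 encoding of the string as a null-terminated C string; the encoded buffer is cached in the Unicode object. Returns NULL with an exception set on error (for example, if unicode is not a str). Example: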
const char *str = PyUnicode_AsUTF8(obj);
if (str == NULL)
    return NULL;
printf("String: %s\n", str);
The returned pointer is owned by the Unicode object and is valid only while that object is alive. Don’t free or modify it!

String Operations

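Py_ssize_t PyUnicode_GetLength(PyObject *unicode)
Return the length of the string in code points, or -1 with an exception set on error.
PyObject* PyUnicode_Concat(PyObject *left, PyObject *right)
Return a new str equal to left + right, or NULL on error.
PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq)
Join the strings in seq with separator between them, like separator.join(seq) in Python.
PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)
Split s at occurrences of sep and return a new list, like s.split(sep, maxsplit). If sep is NULL, split on runs of whitespace; a negative maxsplit means no limit.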
Example:
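PyObject *comma = PyUnicode_FromString(",");
if (comma == NULL)
    return NULL;
PyObject *parts = PyUnicode_Split(str, comma, -1);
Py_DECREF(comma);  /* Split does not steal the separator reference */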

Complete Example

static PyObject* format_name(PyObject *self, PyObject *args) {
    const char *first, *last;
    
    if (!PyArg_ParseTuple(args, "ss", &first, &last))
        return NULL;
    
    return PyUnicode_FromFormat("%s %s", first, last);
}
