Understand Go’s string type and Unicode handling with runes
A Go string is a read-only slice of bytes. The language and standard library treat strings specially — as containers of text encoded in UTF-8. Unlike other languages where strings are made of “characters”, Go uses the concept of a rune to represent a Unicode code point.
The rune type is an alias for int32, so a rune is simply an integer holding a Unicode code point. For more background, see the Go blog post “Strings, bytes, runes and characters in Go”.
    package main

    import (
        "fmt"
        "unicode/utf8"
    )

    func main() {

        // s is a string assigned a literal value representing
        // the word "hello" in Thai. Go string literals are UTF-8 encoded text.
        const s = "สวัสดี"

        // Since strings are equivalent to []byte, this will produce
        // the length of the raw bytes stored within.
        fmt.Println("Len:", len(s))

        // Indexing into a string produces the raw byte values at each index.
        for i := 0; i < len(s); i++ {
            fmt.Printf("%x ", s[i])
        }
        fmt.Println()

        // To count how many runes are in a string, use the utf8 package.
        fmt.Println("Rune count:", utf8.RuneCountInString(s))

        // A range loop handles strings specially and decodes each rune
        // along with its offset in the string.
        for idx, runeValue := range s {
            fmt.Printf("%#U starts at %d\n", runeValue, idx)
        }

        // You can also use utf8.DecodeRuneInString explicitly.
        fmt.Println("\nUsing DecodeRuneInString")
        for i, w := 0, 0; i < len(s); i += w {
            runeValue, width := utf8.DecodeRuneInString(s[i:])
            fmt.Printf("%#U starts at %d\n", runeValue, i)
            w = width
        }
    }

    func examineRune(r rune) {

        // Values enclosed in single quotes are rune literals.
        // You can compare a rune value to a rune literal directly.
        if r == 't' {
            fmt.Println("found tee")
        } else if r == 'ส' {
            fmt.Println("found so sua")
        }
    }
Rune literals use single quotes ('t'), while string literals use double quotes ("hello").