Toit strings

Toit strings are immutable value objects, containing Unicode strings. The characters are encoded with UTF-8, although most of the time you can ignore this.

String literals

String literals use double quotes, ". (Single quotes are reserved for character literals.)

If a string literal contains double quotes ("), or backslashes (\), then these must be escaped with backslashes:

main:
  print "String with \"quotes\"."  // >> String with "quotes".
  print "Zig zag string /\\/\\."   // >> Zig zag string /\/\.

As an alternative, you can embed double quote characters using triple-quote delimited strings:

main:
  print """String with "quotes"."""  // >> String with "quotes".

Escaped characters

String and character literals can also contain other escaped characters, most of which could be written literally, but where it is often more readable to use escapes. For example, you can put a literal tab character in a literal string, but it will look like a series of spaces, so you can use \t instead to make it clearer.

Toit supports the following escape sequences (see the ASCII control code chart for the list of names):

Control code chart namesUnicode characterUnicode code point (rune)
\0NullNUL0
\aAlert (Beep, Bell)BEL7
\bBackspaceBS8
\fForm feedFF12
\nLine Feed or NewlineLF10
\rCarriage ReturnCR13
\tHorizontal TabHT9
\vVertical TabVT11
\$$ (See below for the use of $ for string interpolation)36
\""34
\''39
\\\ 92
\uXXXX (where X is a hex char)Unicode rune
\u{X...} (where X... are hex chars)Unicode rune
\xXX (where X is a hex char)Unicode rune
\x{X...} (where X... are hex chars)Unicode rune

The following shows some examples of the numeric escapes:

  // For two-digit code points (runes).
  // using \x, the curly braces are optional.
  s1 /string := "S\xf8en s\x{e5} s\u{00e6}r ud."
  print s1  // >> Søen så sær ud.

  // For four-digit values you can either use \x with curly
  // braces or \u.
  s2 /string := "\u{20ac} \u20ac \x{20ac}"
  print s2  // Prints "€ € €", which is Unicode 0x20ac.

  // The \u version can also be used with curly braces.
  // It is in fact necessary for 5-digit Unicode runes.
  s3 /string := "\u{1f648}"
  print s3  // Prints the see-no-evil-monkey emoji.

  // \x{} is recommended for one or two digits, and \u{} is
  // recommended for three to five digits.

  // The same escapes work in character literals.
  c /int := '\x2a'
  print c  // Prints "42", which is the decimal from of 0x2a.

  c2 /int := '\''
  print c2 // Prints "39", the ASCII code for a single quote.

See below for more advanced string literals.

String representation

Strings are represented internally as UTF-8 byte sequences. This is visible when requesting the length, which is given in bytes with the size getter:

  print "Hello!".size  // Prints "6", the number of bytes.
  print "Hellö!".size  // Prints "7", the number of bytes.

It is also visible when using [] to index into a string:

  print "Hello!"[0]  // Prints "72", the ASCII code for 'H'.
  print "Hellö!"[4]  // Prints "246", the Unicode code point for 'ö'.

Since some UTF-8 sequences consist of multiple bytes, there are some index points that do not correspond to a character. At these points, the [] indexing operation returns null. For example, the Euro sign is coded as three bytes:

  print "€2".size  // Prints "4", the number of bytes in the string.
  print "€2"[0]    // Prints "8364", the Unicode code for '€'.
  print "€2"[1]    // Prints "null", no character starts at byte index 1.
  print "€2"[2]    // Prints "null", no character starts at byte index 2.
  print "€2"[3]    // Prints "50", the ASCII code for '2'.

Since null is falsy we can index only the valid code points quite simply. The following program iterates over the 5-character, 6-byte string, printing 163, 50, 46, 50, 52:

  STR ::= "£2.24"
  for i := 0; i < STR.size; i++:
    ch := STR[i]
    if ch:  // Only handle non-null.
      print ch

In this case it would be simpler to use the do --runes method:

  STR ::= "£2.24"
  STR.do --runes:
    print it

Toit does not allow the construction of illegal UTF-8 sequences. For example when converting byte arrays to strings:

  // The following examples use the #[...] syntax for
  // byte array literals.
  // Create a single-character string containing an 'A':
  s1/string := #[65].to_string
  // Create a string with the word "Hello":
  s2/string := #[72, 101, 108, 108, 111].to_string
  // Null characters are allowed in strings:
  s3/string := #[0].to_string
  // Bytes above 127 are part of multi-byte UTF-8 sequences
  // and cannot stand alone:
  s4/string := #[128].to_string // Error: Not legal UTF-8!
  s5/string := #[0xc0, 0xbb].to_string // Error: Overlong encoding!

Surrogates are also not legal in Toit strings.

The raw UTF-8 bytes in a Toit string can be accessed with at --raw.

  // Print all 10 UTF-8 bytes in the 8-character string.
  s6 := "RöckDöts"
  list := []  // Empty list.
  s6.size.repeat: | index |  // Index ranges from 0 to 10.
    list.add (s6.at --raw index)
  print list  // >> [82, 195, 182, 99, 107, 68, 195, 182, 116, 115]

Alternatively, the string's UTF-8 bytes can be written to a byte array with the to_byte_array or write_to_byte_array methods.

Slices of strings

Toit supports the [x..y] notation for slices of strings. This creates a new string consisting of bytes x to y, where x is inclusive and y is exclusive:

  hello ::= "Hello"
  print hello[2..4] // Prints "ll", characters 2 to 4 of "Hello".
  two_euro := "€2"
  print two_euro[0..2] // Error, this 2-byte slice cuts the '€' character!
  print two_euro[0..3] // Prints "€", these 3 bytes make up the '€' character.

The first index can be omitted, in which case the slice starts at the beginning of the source string. If the second index is omitted, the slice ends at the end of the source string.

  hello_world := "Hello, World!"
  print hello_world[..5] // >> Hello
  print hello_world[7..] // >> World!

Multi-line strings

For strings that span multiple lines, newlines can be inserted with \n, but it is often clearer to use the triple-quotes syntax, """...""".

help_message := """
  Toit CLI command
      -h help
      -d device name
      -p prefix
  """
main:
  print help_message

The multi-line strings trim indentation-based white-space (see below). Therefore the above program will output the following:

Toit CLI command
    -h help
    -d device name
    -p prefix
Toit removes the first newline if it is immediately after the `"""`.
You should almost always put a newline after `"""` if you are using the triple quotes for multi-line strings, since it makes the indentation rules much easier to understand.

In multi-line strings there are some extra escapes that can be used.

\ followed by a newline (ASCII 10)Removes a newline in a multi-line string.
\ followed by a carriage return (ASCII 13)Removes a newline in a multi-line string. If followed by a newline, that one is removed as well.
\sSpace (see below for a detailed explanation).

Toit removes the indentation of a multi-line string. It looks for the line with the smallest indentation and removes that amount of indentation for each line.

If a space (or several spaces) are needed in front of each line, use \s to indicate to the compiler that this space shouldn't be seen as indentation.

main:
  str1 := """
    foo"""
  str2 := """
    \sfoo"""
  print "[" + str1 + "]"  // >> [foo]
  print "[" + str2 + "]"  // >> [ foo]

Note that multi-line strings frequently end with a newline anyway, in which case the last line can be used to indicate how much the indentation is:

main:
  print """
    foo
    bar
  """ // <== indented less than the previous lines. As such
      // it gives the indentation for the other lines.

  print """
    foo
    bar
    """

This results in the output:

  foo
  bar
foo
bar

Long string literals

Triple-quoting can also be used for very long strings that don't necessarily contain newlines. In this case you end each line with a backslash, which will cause the parser to ignore the newline at the end of the line.

You should start the string with a non-escaped newline that will be ignored in accordance with the above rules for triple quoted strings.
main:
  print """
    This is a very long line that goes on \
    and on for a very long time. It's easier \
    to put any spaces at the end of the line, \
    but you can put them at the start,\
     making use of the usual\
     indentation-removing rules \
    for triple-quoted strings."""

This prints

This is a very long line that goes on and on for a very long time. It's easier to put any spaces at the end of the line, but you can put them at the start, making use of the usual indentation-removing rules for triple-quoted strings.

String interpolation

Using the dollar sign ('$') we can interpolate variables and other expressions into strings:

// Print:
//   Little pig 1
//   Little pig 2
//   Little pig 3
main:
  for i := 1; i <= 3; i++:
    print "  Little pig $i"

The variable after the dollar sign is turned into a string by calling its stringify method. The variable stops at the first non-alpha-numeric character, but the expression can be extended using dot or [] notation:

main:
  i := 42
  print "Level $i, the band."        // >> Level 42, the band.
  j := -42
  print "Level $j.abs, the band."    // >> Level 42, the band.
  list := [0, 42, 103]
  print "Level $list[1], the band."  // >> Level 42, the band.

If the variable is followed immediately by alphanumeric characters or we want a more complicated interpolation expression we can put parentheses, (), around an arbitrary expression:

main:
  print "Level $(6 * 7), the band." // >> Level 42, the band.
  j := -42
  print "Level $(-j), the band."    // >> Level 42, the band.
  i := 42
  print "Almost $(i)mm."            // >> Almost 42mm.

The string interpolation feature also has support for some printf-style formatting options. Unlike printf the formatting directives are directly adjacent to the value they are formatting, so there is no need to count function arguments:

import math show PI

main:
  // %f can be used to specify the number of decimals for
  // floating point numbers:
  print "PI $(%.2f PI)"         // >> 3.14
  // %x is for hexadecimal numbers:
  print "In hex 0x$(%x 42)."    // >> In hex 0x2a.
  // %02x pads a hex number with zeros:
  print "CR is 0x$(%02x 0xd)."  // >> CR is 0x0d.
  // %4d pads a decimal number with spaces:
  // Prints:
  // >   0<
  // >   1<
  // >   8<
  // >  27<
  // >  64<
  // > 125<
  6.repeat:
    print ">$(%4d it * it * it)<"
  // %c inserts a character by its Unicode code point (rune):
  // Prints: Life, the Universe and *.
  print "Life, the Universe and $(%c 42)."
  // %o writes in octal:
  print "The VAX answer: $(%03o 42)." // >> The VAX answer: 052.

For other bases than 8, 10, and 16, see stringify.

The full set of formatting options includes left-padding, and center-padding. See the documentation on the static string method, format.