The data_encoding
library#
Throughout the Tezos protocol, data is serialized so that it can be used via RPC, written to disk, or placed in a block. This serialization/de-serialization is handled via the data-encoding library by providing a set primitive encodings and a variety of combinators.
Examples/Tutorial#
The following is a very brief introduction to the data-encoding library.
Note
A more complete documentation, including a full tutorial with many examples and a reference of all available combinators, is available in data-encoding’s API documentation.
Encoding an integer#
Integers are defined as other concrete data types with a generic
encoding type type 'a encoding
. This means that it is an encoding
to/from type int
. There are a variety of ways to encode an integer,
depending on what binary serialization you want to achieve:
Data_encoding.int8
Data_encoding.uint8
Data_encoding.int16
Data_encoding.uint16
Data_encoding.int31
Data_encoding.int32
Data_encoding.int64
For example, an encoding that represents a 31 bit integer has type
Data_encoding.int31 = int Data_encoding.encoding
.
let int31_encoding = Data_encoding.int31
Encoding an object#
Encoding a single integer is fairly uninteresting. The data-encoding library provides a number of combinators that can be used to build more complicated objects. Consider the type that represents an interval from the first number to the second:
type interval = int64 * int64
We can define an encoding for this type as:
let interval_encoding =
Data_encoding.(obj2 (req "min" int64) (req "max" int64))
In the example above we construct a new value interval_encoding
by
combining two int64
integers using the obj2
(object with two fields)
constructor.
The library provides different constructors, i.e. for objects that have
no data (Data_encoding.empty
), constructors for object up to 10
fields, constructors for tuples, list, etc.
These are serialized to binary by converting each internal object to binary and placing them in the order of the original object and to JSON as a JSON object with field names.
Lists, arrays, and options#
List, arrays and options types can be built on top of ground data types.
type interval_list = interval list
type interval_array = interval array
type interval_option = interval option
And the encoders for these types as
let interval_list_encoding = Data_encoding.list interval_encoding
let interval_array_encoding = Data_encoding.array interval_encoding
let interval_option_encoding = Data_encoding.option interval_encoding
Union types#
The Octez codebase makes heavy use of variant types. Consider the following variant type:
type variant = B of bool
| S of string
Encoding for this types can be expressed as:
let variant_encoding =
let open Data_encoding in
union ~tag_size:`Uint8
[ case ~title:"B" (Tag 0) bool
(function B b -> Some b | _ -> None)
(fun b -> B b) ;
case ~title:"S" (Tag 1) string
(function S s -> Some s | _ -> None)
(fun s -> S s) ]
This variant encoding is a bit more complicated. Let’s look at the parts of the encoding:
We include an optimization hint to the binary encoding to inform it of the number of elements we expect in the tag. In most cases, we can use
`Uint8
, which allows you to have up to 256 possible cases (default).We provide a function to wrap the datatype. The encoding works by repeatedly trying to decode the datatype using these functions until one returns
Some payload
. This payload is then encoded using the dataencoding specified.We specify a function from the encoded type to the actual datatype.
Since the library does not provide an exhaustive check on these constructors, the user must be careful when constructing union types to avoid unfortunate runtime failures.
How the Dataencoding module works#
This section is 100% optional. You do not need to understand this section to use the library.
The library uses GADTs to provide type-safe serialization/de-serialization. From there, a runtime representation of JSON objects is parsed into the type-safe version.
First we define an untyped JSON AST:
type json =
[ `O of (string * json) list
| `Bool of bool
| `Float of float
| `A of json list
| `Null
| `String of string ]
This is then parsed into a typed AST (we eliminate several cases for clarity):
type 'a desc =
| Null : unit desc
| Empty : unit desc
| Bool : bool desc
| Int64 : Int64.t desc
| Float : float desc
| Bytes : Kind.length -> Bytes.t desc
| String : Kind.length -> string desc
| String_enum : Kind.length * (string * 'a) list -> 'a desc
| Array : 'a t -> 'a array desc
| List : 'a t -> 'a list desc
| Obj : 'a field -> 'a desc
| Objs : Kind.t * 'a t * 'b t -> ('a * 'b) desc
| Tup : 'a t -> 'a desc
| Union : Kind.t * tag_size * 'a case list -> 'a desc
| Mu : Kind.enum * string * ('a t -> 'a t) -> 'a desc
| Conv :
{ proj : ('a -> 'b) ;
inj : ('b -> 'a) ;
encoding : 'b t ;
schema : Json_schema.schema option } -> 'a desc
| Describe :
{ title : string option ;
description : string option ;
encoding : 'a t } -> 'a desc
| Def : { name : string ;
encoding : 'a t } -> 'a desc
The first few constructors define all ground types.
The constructors for
Bytes
,String
andString_enum
include a length field in order to provide safe binary serialization.The constructors for
Array
andList
are used by the combinators we saw earlier.The
Obj
andObjs
constructors create JSON objects. These are wrapped in theConv
constructor to remove nesting that results when these constructors are used naively.The
Mu
constructor is used to create self-referential definitions.The
Conv
constructor allows you to clean up a nested definition or compute another type from an existing one.The
Describe
andDef
constructors are used to add documentation
The library also provides various wrappers and convenience functions to make constructing these objects easier. Reading the documentation in the mli file should orient you on how to use these functions.