You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
1050 lines
47 KiB
Markdown
1050 lines
47 KiB
Markdown
4 months ago
|
# Rapid YAML
|
||
|
[![MIT Licensed](https://img.shields.io/badge/License-MIT-green.svg)](https://github.com/biojppm/rapidyaml/blob/master/LICENSE.txt)
|
||
|
[![release](https://img.shields.io/github/v/release/biojppm/rapidyaml?color=g&include_prereleases&label=release%20&sort=semver)](https://github.com/biojppm/rapidyaml/releases)
|
||
|
[![PyPI](https://img.shields.io/pypi/v/rapidyaml?color=g)](https://pypi.org/project/rapidyaml/)
|
||
|
[![Docs](https://img.shields.io/badge/docs-docsforge-blue)](https://rapidyaml.docsforge.com/)
|
||
|
[![Gitter](https://badges.gitter.im/rapidyaml/community.svg)](https://gitter.im/rapidyaml/community)
|
||
|
|
||
|
[![test](https://github.com/biojppm/rapidyaml/workflows/test/badge.svg?branch=master)](https://github.com/biojppm/rapidyaml/actions)
|
||
|
[![Coveralls](https://coveralls.io/repos/github/biojppm/rapidyaml/badge.svg?branch=master)](https://coveralls.io/github/biojppm/rapidyaml)
|
||
|
[![Codecov](https://codecov.io/gh/biojppm/rapidyaml/branch/master/graph/badge.svg?branch=master)](https://codecov.io/gh/biojppm/rapidyaml)
|
||
|
[![Total alerts](https://img.shields.io/lgtm/alerts/g/biojppm/rapidyaml.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/biojppm/rapidyaml/alerts/)
|
||
|
[![Language grade: C/C++](https://img.shields.io/lgtm/grade/cpp/g/biojppm/rapidyaml.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/biojppm/rapidyaml/context:cpp)
|
||
|
|
||
|
|
||
|
Or ryml, for short. ryml is a C++ library to parse and emit YAML,
|
||
|
and do it fast, on everything from x64 to bare-metal chips without
|
||
|
operating system. (If you are looking to use your programs with a YAML tree
|
||
|
as a configuration tree with override facilities, take a look at
|
||
|
[c4conf](https://github.com/biojppm/c4conf)).
|
||
|
|
||
|
ryml parses both read-only and in-situ source buffers; the resulting
|
||
|
data nodes hold only views to sub-ranges of the source buffer. No
|
||
|
string copies or duplications are done, and no virtual functions are
|
||
|
used. The data tree is a flat index-based structure stored in a single
|
||
|
array. Serialization happens only at your direct request, after
|
||
|
parsing / before emitting. Internally, the data tree representation
|
||
|
stores only string views and has no knowledge of types, but of course,
|
||
|
every node can have a YAML type tag. ryml makes it easy and fast to
|
||
|
read and modify the data tree.
|
||
|
|
||
|
ryml is available as a single header file, or it can be used as a
|
||
|
simple library with cmake -- both separately (ie
|
||
|
build->install->`find_package()`) or together with your project (ie with
|
||
|
`add_subdirectory()`). (See below for examples).
|
||
|
|
||
|
ryml can use custom global and per-tree memory allocators and error
|
||
|
handler callbacks, and is exception-agnostic. ryml provides a default
|
||
|
implementation for the allocator (using `std::malloc()`) and error
|
||
|
handlers (using using `std::abort()` is provided, but you can opt out
|
||
|
and provide your own memory allocation and eg, exception-throwing
|
||
|
callbacks.
|
||
|
|
||
|
ryml does not depend on the STL, ie, it does not use any std container
|
||
|
as part of its data structures), but it can serialize and deserialize
|
||
|
these containers into the data tree, with the use of optional
|
||
|
headers. ryml ships with [c4core](https://github.com/biojppm/c4core) a
|
||
|
small C++ utilities multiplatform library.
|
||
|
|
||
|
ryml is written in C++11, and compiles cleanly with:
|
||
|
* Visual Studio 2015 and later
|
||
|
* clang++ 3.9 and later
|
||
|
* g++ 4.8 and later
|
||
|
* Intel Compiler
|
||
|
|
||
|
ryml is [extensively unit-tested in Linux, Windows and
|
||
|
MacOS](https://github.com/biojppm/rapidyaml/actions). The tests cover
|
||
|
x64, x86, wasm (emscripten), arm, aarch64, ppc64le and s390x
|
||
|
architectures, and include analysing ryml with:
|
||
|
* valgrind
|
||
|
* clang-tidy
|
||
|
* clang sanitizers:
|
||
|
* memory
|
||
|
* address
|
||
|
* undefined behavior
|
||
|
* thread
|
||
|
* [LGTM.com](https://lgtm.com/projects/g/biojppm/rapidyaml)
|
||
|
|
||
|
ryml also [runs in
|
||
|
bare-metal](https://github.com/biojppm/rapidyaml/issues/193), and
|
||
|
[RISC-V
|
||
|
architectures](https://github.com/biojppm/c4core/pull/69). Both of
|
||
|
these are pending implementation of CI actions for continuous
|
||
|
validation, but ryml has been proven to work there.
|
||
|
|
||
|
ryml is [available in Python](https://pypi.org/project/rapidyaml/),
|
||
|
and can very easily be compiled to JavaScript through emscripten (see
|
||
|
below).
|
||
|
|
||
|
See also [the changelog](https://github.com/biojppm/rapidyaml/tree/master/changelog)
|
||
|
and [the roadmap](https://github.com/biojppm/rapidyaml/tree/master/ROADMAP.md).
|
||
|
|
||
|
<!-- endpythonreadme -->
|
||
|
|
||
|
|
||
|
------
|
||
|
|
||
|
## Table of contents
|
||
|
* [Is it rapid?](#is-it-rapid)
|
||
|
* [Comparison with yaml-cpp](#comparison-with-yaml-cpp)
|
||
|
* [Performance reading JSON](#performance-reading-json)
|
||
|
* [Performance emitting](#performance-emitting)
|
||
|
* [Quick start](#quick-start)
|
||
|
* [Using ryml in your project](#using-ryml-in-your-project)
|
||
|
* [Package managers](#package-managers)
|
||
|
* [Single header file](#single-header-file)
|
||
|
* [As a library](#as-a-library)
|
||
|
* [Quickstart samples](#quickstart-samples)
|
||
|
* [CMake build settings for ryml](#cmake-build-settings-for-ryml)
|
||
|
* [Forcing ryml to use a different c4core version](#forcing-ryml-to-use-a-different-c4core-version)
|
||
|
* [Other languages](#other-languages)
|
||
|
* [JavaScript](#javascript)
|
||
|
* [Python](#python)
|
||
|
* [YAML standard conformance](#yaml-standard-conformance)
|
||
|
* [Test suite status](#test-suite-status)
|
||
|
* [Known limitations](#known-limitations)
|
||
|
* [Alternative libraries](#alternative-libraries)
|
||
|
* [License](#license)
|
||
|
|
||
|
|
||
|
------
|
||
|
|
||
|
## Is it rapid?
|
||
|
|
||
|
You bet! On a i7-6800K CPU @3.40GHz:
|
||
|
* ryml parses YAML at about ~150MB/s on Linux and ~100MB/s on Windows (vs2017).
|
||
|
* **ryml parses JSON at about ~450MB/s on Linux**, faster than sajson (didn't
|
||
|
try yet on Windows).
|
||
|
* compared against the other existing YAML libraries for C/C++:
|
||
|
* ryml is in general between 2 and 3 times faster than [libyaml](https://github.com/yaml/libyaml)
|
||
|
* ryml is in general between 10 and 70 times faster than
|
||
|
[yaml-cpp](https://github.com/jbeder/yaml-cpp), and in some cases as
|
||
|
much as 100x and [even
|
||
|
200x](https://github.com/biojppm/c4core/pull/16#issuecomment-700972614) faster.
|
||
|
|
||
|
[Here's the benchmark](./bm/bm_parse.cpp). Using different
|
||
|
approaches within ryml (in-situ/read-only vs. with/without reuse), a YAML /
|
||
|
JSON buffer is repeatedly parsed, and compared against other libraries.
|
||
|
|
||
|
### Comparison with yaml-cpp
|
||
|
|
||
|
The first result set is for Windows, and is using a [appveyor.yml config
|
||
|
file](./bm/cases/appveyor.yml). A comparison of these results is
|
||
|
summarized on the table below:
|
||
|
|
||
|
| Read rates (MB/s) | ryml | yamlcpp | compared |
|
||
|
|------------------------------|--------|---------|--------------|
|
||
|
| appveyor / vs2017 / Release | 101.5 | 5.3 | 20x / 5.2% |
|
||
|
| appveyor / vs2017 / Debug | 6.4 | 0.0844 | 76x / 1.3% |
|
||
|
|
||
|
|
||
|
The next set of results is taken in Linux, comparing g++ 8.2 and clang++ 7.0.1 in
|
||
|
parsing a YAML buffer from a [travis.yml config
|
||
|
file](./bm/cases/travis.yml) or a JSON buffer from a [compile_commands.json
|
||
|
file](./bm/cases/compile_commands.json). You
|
||
|
can [see the full results here](./bm/results/parse.linux.i7_6800K.md).
|
||
|
Summarizing:
|
||
|
|
||
|
| Read rates (MB/s) | ryml | yamlcpp | compared |
|
||
|
|-----------------------------|--------|---------|------------|
|
||
|
| json / clang++ / Release | 453.5 | 15.1 | 30x / 3% |
|
||
|
| json / g++ / Release | 430.5 | 16.3 | 26x / 4% |
|
||
|
| json / clang++ / Debug | 61.9 | 1.63 | 38x / 3% |
|
||
|
| json / g++ / Debug | 72.6 | 1.53 | 47x / 2% |
|
||
|
| travis / clang++ / Release | 131.6 | 8.08 | 16x / 6% |
|
||
|
| travis / g++ / Release | 176.4 | 8.23 | 21x / 5% |
|
||
|
| travis / clang++ / Debug | 10.2 | 1.08 | 9x / 1% |
|
||
|
| travis / g++ / Debug | 12.5 | 1.01 | 12x / 8% |
|
||
|
|
||
|
The 450MB/s read rate for JSON puts ryml squarely in the same ballpark
|
||
|
as [RapidJSON](https://github.com/Tencent/rapidjson) and other fast json
|
||
|
readers
|
||
|
([data from here](https://lemire.me/blog/2018/05/03/how-fast-can-you-parse-json/)).
|
||
|
Even parsing full YAML is at ~150MB/s, which is still in that performance
|
||
|
ballpark, albeit at its lower end. This is something to be proud of, as the
|
||
|
YAML specification is much more complex than JSON: [23449 vs 1969 words](https://www.arp242.net/yaml-config.html#its-pretty-complex).
|
||
|
|
||
|
|
||
|
### Performance reading JSON
|
||
|
|
||
|
So how does ryml compare against other JSON readers? Well, it's one of the
|
||
|
fastest!
|
||
|
|
||
|
The benchmark is the [same as above](./bm/parse.cpp), and it is reading
|
||
|
the [compile_commands.json](./bm/cases/compile_commands.json), The `_arena`
|
||
|
suffix notes parsing a read-only buffer (so buffer copies are performed),
|
||
|
while the `_inplace` suffix means that the source buffer can be parsed in
|
||
|
place. The `_reuse` means the data tree and/or parser are reused on each
|
||
|
benchmark repeat.
|
||
|
|
||
|
Here's what we get with g++ 8.2:
|
||
|
|
||
|
| Benchmark | Release,MB/s | Debug,MB/s |
|
||
|
|:----------------------|-------------:|------------:|
|
||
|
| rapidjson_arena | 509.9 | 43.4 |
|
||
|
| rapidjson_inplace | 1329.4 | 68.2 |
|
||
|
| sajson_inplace | 434.2 | 176.5 |
|
||
|
| sajson_arena | 430.7 | 175.6 |
|
||
|
| jsoncpp_arena | 183.6 | ? 187.9 |
|
||
|
| nlohmann_json_arena | 115.8 | 21.5 |
|
||
|
| yamlcpp_arena | 16.6 | 1.6 |
|
||
|
| libyaml_arena | 113.9 | 35.7 |
|
||
|
| libyaml_arena_reuse | 114.6 | 35.9 |
|
||
|
| ryml_arena | 388.6 | 36.9 |
|
||
|
| ryml_inplace | 393.7 | 36.9 |
|
||
|
| ryml_arena_reuse | 446.2 | 74.6 |
|
||
|
| ryml_inplace_reuse | 457.1 | 74.9 |
|
||
|
|
||
|
You can verify that (at least for this test) ryml beats most json
|
||
|
parsers at their own game, with the only exception of
|
||
|
[rapidjson](https://github.com/Tencent/rapidjson). And actually, in
|
||
|
Debug, [rapidjson](https://github.com/Tencent/rapidjson) is slower
|
||
|
than ryml, and [sajson](https://github.com/chadaustin/sajson)
|
||
|
manages to be faster (but not sure about jsoncpp; need to scrutinize there
|
||
|
the suspicious fact that the Debug result is faster than the Release result).
|
||
|
|
||
|
|
||
|
### Performance emitting
|
||
|
|
||
|
[Emitting benchmarks](bm/bm_emit.cpp) also show similar speedups from
|
||
|
the existing libraries, also anecdotally reported by some users [(eg,
|
||
|
here's a user reporting 25x speedup from
|
||
|
yaml-cpp)](https://github.com/biojppm/rapidyaml/issues/28#issue-553855608). Also, in
|
||
|
some cases (eg, block folded multiline scalars), the speedup is as
|
||
|
high as 200x (eg, 7.3MB/s -> 1.416MG/s).
|
||
|
|
||
|
|
||
|
### CI results and request for files
|
||
|
|
||
|
While a more effective way of showing the benchmark results is not
|
||
|
available yet, you can browse through the [runs of the benchmark
|
||
|
workflow in the
|
||
|
CI](https://github.com/biojppm/rapidyaml/actions/workflows/benchmarks.yml)
|
||
|
to scroll through the results for yourself.
|
||
|
|
||
|
Also, if you have a case where ryml behaves very nicely or not as nicely as
|
||
|
claimed above, we would definitely like to see it! Please submit a pull request
|
||
|
adding the file to [bm/cases](bm/cases), or just send us the files.
|
||
|
|
||
|
|
||
|
------
|
||
|
|
||
|
## Quick start
|
||
|
|
||
|
If you're wondering whether ryml's speed comes at a usage cost, you
|
||
|
need not: with ryml, you can have your cake and eat it too. Being
|
||
|
rapid is definitely NOT the same as being unpractical, so ryml was
|
||
|
written with easy AND efficient usage in mind, and comes with a two
|
||
|
level API for accessing and traversing the data tree.
|
||
|
|
||
|
The following snippet is a quick overview taken from [the quickstart
|
||
|
sample](samples/quickstart.cpp). After cloning ryml (don't forget the
|
||
|
`--recursive` flag for git), you can very
|
||
|
easily build and run this executable using any of the build samples,
|
||
|
eg the [`add_subdirectory()` sample](samples/add_subdirectory/).
|
||
|
|
||
|
```c++
|
||
|
// Parse YAML code in place, potentially mutating the buffer.
|
||
|
// It is also possible to:
|
||
|
// - parse a read-only buffer using parse_in_arena()
|
||
|
// - reuse an existing tree (advised)
|
||
|
// - reuse an existing parser (advised)
|
||
|
char yml_buf[] = "{foo: 1, bar: [2, 3], john: doe}";
|
||
|
ryml::Tree tree = ryml::parse_in_place(ryml::substr(yml_buf));
|
||
|
|
||
|
// Note: it will always be significantly faster to use mutable
|
||
|
// buffers and reuse tree+parser.
|
||
|
//
|
||
|
// Below you will find samples that show how to achieve reuse; but
|
||
|
// please note that for brevity and clarity, many of the examples
|
||
|
// here are parsing immutable buffers, and not reusing tree or
|
||
|
// parser.
|
||
|
|
||
|
|
||
|
//------------------------------------------------------------------
|
||
|
// API overview
|
||
|
|
||
|
// ryml has a two-level API:
|
||
|
//
|
||
|
// The lower level index API is based on the indices of nodes,
|
||
|
// where the node's id is the node's position in the tree's data
|
||
|
// array. This API is very efficient, but somewhat difficult to use:
|
||
|
size_t root_id = tree.root_id();
|
||
|
size_t bar_id = tree.find_child(root_id, "bar"); // need to get the index right
|
||
|
CHECK(tree.is_map(root_id)); // all of the index methods are in the tree
|
||
|
CHECK(tree.is_seq(bar_id)); // ... and receive the subject index
|
||
|
|
||
|
// The node API is a lightweight abstraction sitting on top of the
|
||
|
// index API, but offering a much more convenient interaction:
|
||
|
ryml::NodeRef root = tree.rootref();
|
||
|
ryml::NodeRef bar = tree["bar"];
|
||
|
CHECK(root.is_map());
|
||
|
CHECK(bar.is_seq());
|
||
|
// NodeRef is a lightweight handle to the tree and associated id:
|
||
|
CHECK(root.tree() == &tree); // NodeRef points at its tree, WITHOUT refcount
|
||
|
CHECK(root.id() == root_id); // NodeRef's id is the index of the node
|
||
|
CHECK(bar.id() == bar_id); // NodeRef's id is the index of the node
|
||
|
|
||
|
// The node API translates very cleanly to the index API, so most
|
||
|
// of the code examples below are using the node API.
|
||
|
|
||
|
// One significant point of the node API is that it holds a raw
|
||
|
// pointer to the tree. Care must be taken to ensure the lifetimes
|
||
|
// match, so that a node will never access the tree after the tree
|
||
|
// went out of scope.
|
||
|
|
||
|
|
||
|
//------------------------------------------------------------------
|
||
|
// To read the parsed tree
|
||
|
|
||
|
// Node::operator[] does a lookup, is O(num_children[node]).
|
||
|
// maps use string keys, seqs use integral keys.
|
||
|
CHECK(tree["foo"].is_keyval());
|
||
|
CHECK(tree["foo"].key() == "foo");
|
||
|
CHECK(tree["foo"].val() == "1");
|
||
|
CHECK(tree["bar"].is_seq());
|
||
|
CHECK(tree["bar"].has_key());
|
||
|
CHECK(tree["bar"].key() == "bar");
|
||
|
CHECK(tree["bar"][0].val() == "2");
|
||
|
CHECK(tree["bar"][1].val() == "3");
|
||
|
CHECK(tree["john"].val() == "doe");
|
||
|
// An integral key is the position of the child within its parent,
|
||
|
// so even maps can also use int keys, if the key position is
|
||
|
// known.
|
||
|
CHECK(tree[0].id() == tree["foo"].id());
|
||
|
CHECK(tree[1].id() == tree["bar"].id());
|
||
|
CHECK(tree[2].id() == tree["john"].id());
|
||
|
// Tree::operator[](int) searches a root child by its position.
|
||
|
CHECK(tree[0].id() == tree["foo"].id()); // 0: first child of root
|
||
|
CHECK(tree[1].id() == tree["bar"].id()); // 1: first child of root
|
||
|
CHECK(tree[2].id() == tree["john"].id()); // 2: first child of root
|
||
|
// NodeRef::operator[](int) searches a node child by its position
|
||
|
// on __the node__'s children list:
|
||
|
CHECK(bar[0].val() == "2"); // 0 means first child of bar
|
||
|
CHECK(bar[1].val() == "3"); // 1 means second child of bar
|
||
|
// NodeRef::operator[](string):
|
||
|
// A string key is the key of the node: lookup is by name. So it
|
||
|
// is only available for maps, and it is NOT available for seqs,
|
||
|
// since seq members do not have keys.
|
||
|
CHECK(tree["foo"].key() == "foo");
|
||
|
CHECK(tree["bar"].key() == "bar");
|
||
|
CHECK(tree["john"].key() == "john");
|
||
|
CHECK(bar.is_seq());
|
||
|
// CHECK(bar["BOOM!"].is_seed()); // error, seqs do not have key lookup
|
||
|
|
||
|
// Note that maps can also use index keys as well as string keys:
|
||
|
CHECK(root["foo"].id() == root[0].id());
|
||
|
CHECK(root["bar"].id() == root[1].id());
|
||
|
CHECK(root["john"].id() == root[2].id());
|
||
|
|
||
|
// Please note that since a ryml tree uses indexed linked lists for storing
|
||
|
// children, the complexity of `Tree::operator[csubstr]` and
|
||
|
// `Tree::operator[size_t]` is linear on the number of root children. If you use
|
||
|
// it with a large tree where the root has many children, you may get a
|
||
|
// performance hit. To avoid this hit, you can create your own accelerator
|
||
|
// structure. For example, before doing a lookup, do a single traverse at the
|
||
|
// root level to fill an `std::map<csubstr,size_t>` mapping key names to node
|
||
|
// indices; with a node index, a lookup (via `Tree::get()`) is O(1), so this way
|
||
|
// you can get O(log n) lookup from a key.
|
||
|
//
|
||
|
// As for `NodeRef`, the difference from `NodeRef::operator[]`
|
||
|
// to `Tree::operator[]` is that the latter refers to the root node, whereas
|
||
|
// the former can be invoked on any node. But the lookup process is the same for
|
||
|
// both and their algorithmic complexity is the same: they are both linear in
|
||
|
// the number of direct children; but depending on the data, that number may
|
||
|
// be very different from one to another.
|
||
|
|
||
|
//------------------------------------------------------------------
|
||
|
// Hierarchy:
|
||
|
|
||
|
{
|
||
|
ryml::NodeRef foo = root.first_child();
|
||
|
ryml::NodeRef john = root.last_child();
|
||
|
CHECK(tree.size() == 6); // O(1) number of nodes in the tree
|
||
|
CHECK(root.num_children() == 3); // O(num_children[root])
|
||
|
CHECK(foo.num_siblings() == 3); // O(num_children[parent(foo)])
|
||
|
CHECK(foo.parent().id() == root.id()); // parent() is O(1)
|
||
|
CHECK(root.first_child().id() == root["foo"].id()); // first_child() is O(1)
|
||
|
CHECK(root.last_child().id() == root["john"].id()); // last_child() is O(1)
|
||
|
CHECK(john.first_sibling().id() == foo.id());
|
||
|
CHECK(foo.last_sibling().id() == john.id());
|
||
|
// prev_sibling(), next_sibling(): (both are O(1))
|
||
|
CHECK(foo.num_siblings() == root.num_children());
|
||
|
CHECK(foo.prev_sibling().id() == ryml::NONE); // foo is the first_child()
|
||
|
CHECK(foo.next_sibling().key() == "bar");
|
||
|
CHECK(foo.next_sibling().next_sibling().key() == "john");
|
||
|
CHECK(foo.next_sibling().next_sibling().next_sibling().id() == ryml::NONE); // john is the last_child()
|
||
|
}
|
||
|
|
||
|
|
||
|
//------------------------------------------------------------------
|
||
|
// Iterating:
|
||
|
{
|
||
|
ryml::csubstr expected_keys[] = {"foo", "bar", "john"};
|
||
|
// iterate children using the high-level node API:
|
||
|
{
|
||
|
size_t count = 0;
|
||
|
for(ryml::NodeRef const& child : root.children())
|
||
|
CHECK(child.key() == expected_keys[count++]);
|
||
|
}
|
||
|
// iterate siblings using the high-level node API:
|
||
|
{
|
||
|
size_t count = 0;
|
||
|
for(ryml::NodeRef const& child : root["foo"].siblings())
|
||
|
CHECK(child.key() == expected_keys[count++]);
|
||
|
}
|
||
|
// iterate children using the lower-level tree index API:
|
||
|
{
|
||
|
size_t count = 0;
|
||
|
for(size_t child_id = tree.first_child(root_id); child_id != ryml::NONE; child_id = tree.next_sibling(child_id))
|
||
|
CHECK(tree.key(child_id) == expected_keys[count++]);
|
||
|
}
|
||
|
// iterate siblings using the lower-level tree index API:
|
||
|
// (notice the only difference from above is in the loop
|
||
|
// preamble, which calls tree.first_sibling(bar_id) instead of
|
||
|
// tree.first_child(root_id))
|
||
|
{
|
||
|
size_t count = 0;
|
||
|
for(size_t child_id = tree.first_sibling(bar_id); child_id != ryml::NONE; child_id = tree.next_sibling(child_id))
|
||
|
CHECK(tree.key(child_id) == expected_keys[count++]);
|
||
|
}
|
||
|
}
|
||
|
|
||
|
|
||
|
//------------------------------------------------------------------
|
||
|
// Gotchas:
|
||
|
CHECK(!tree["bar"].has_val()); // seq is a container, so no val
|
||
|
CHECK(!tree["bar"][0].has_key()); // belongs to a seq, so no key
|
||
|
CHECK(!tree["bar"][1].has_key()); // belongs to a seq, so no key
|
||
|
//CHECK(tree["bar"].val() == BOOM!); // ... so attempting to get a val is undefined behavior
|
||
|
//CHECK(tree["bar"][0].key() == BOOM!); // ... so attempting to get a key is undefined behavior
|
||
|
//CHECK(tree["bar"][1].key() == BOOM!); // ... so attempting to get a key is undefined behavior
|
||
|
|
||
|
|
||
|
//------------------------------------------------------------------
|
||
|
// Deserializing: use operator>>
|
||
|
{
|
||
|
int foo = 0, bar0 = 0, bar1 = 0;
|
||
|
std::string john;
|
||
|
root["foo"] >> foo;
|
||
|
root["bar"][0] >> bar0;
|
||
|
root["bar"][1] >> bar1;
|
||
|
root["john"] >> john; // requires from_chars(std::string). see serialization samples below.
|
||
|
CHECK(foo == 1);
|
||
|
CHECK(bar0 == 2);
|
||
|
CHECK(bar1 == 3);
|
||
|
CHECK(john == "doe");
|
||
|
}
|
||
|
|
||
|
|
||
|
//------------------------------------------------------------------
|
||
|
// Modifying existing nodes: operator<< vs operator=
|
||
|
|
||
|
// operator= assigns an existing string to the receiving node.
|
||
|
// This pointer will be in effect until the tree goes out of scope
|
||
|
// so beware to only assign from strings outliving the tree.
|
||
|
root["foo"] = "says you";
|
||
|
root["bar"][0] = "-2";
|
||
|
root["bar"][1] = "-3";
|
||
|
root["john"] = "ron";
|
||
|
// Now the tree is _pointing_ at the memory of the strings above.
|
||
|
// That is OK because those are static strings and will outlive
|
||
|
// the tree.
|
||
|
CHECK(root["foo"].val() == "says you");
|
||
|
CHECK(root["bar"][0].val() == "-2");
|
||
|
CHECK(root["bar"][1].val() == "-3");
|
||
|
CHECK(root["john"].val() == "ron");
|
||
|
// WATCHOUT: do not assign from temporary objects:
|
||
|
// {
|
||
|
// std::string crash("will dangle");
|
||
|
// root["john"] = ryml::to_csubstr(crash);
|
||
|
// }
|
||
|
// CHECK(root["john"] == "dangling"); // CRASH! the string was deallocated
|
||
|
|
||
|
// operator<< first serializes the input to the tree's arena, then
|
||
|
// assigns the serialized string to the receiving node. This avoids
|
||
|
// constraints with the lifetime, since the arena lives with the tree.
|
||
|
CHECK(tree.arena().empty());
|
||
|
root["foo"] << "says who"; // requires to_chars(). see serialization samples below.
|
||
|
root["bar"][0] << 20;
|
||
|
root["bar"][1] << 30;
|
||
|
root["john"] << "deere";
|
||
|
CHECK(root["foo"].val() == "says who");
|
||
|
CHECK(root["bar"][0].val() == "20");
|
||
|
CHECK(root["bar"][1].val() == "30");
|
||
|
CHECK(root["john"].val() == "deere");
|
||
|
CHECK(tree.arena() == "says who2030deere"); // the result of serializations to the tree arena
|
||
|
// using operator<< instead of operator=, the crash above is avoided:
|
||
|
{
|
||
|
std::string ok("in_scope");
|
||
|
// root["john"] = ryml::to_csubstr(ok); // don't, will dangle
|
||
|
root["john"] << ryml::to_csubstr(ok); // OK, copy to the tree's arena
|
||
|
}
|
||
|
CHECK(root["john"] == "in_scope"); // OK!
|
||
|
CHECK(tree.arena() == "says who2030deerein_scope"); // the result of serializations to the tree arena
|
||
|
|
||
|
|
||
|
//------------------------------------------------------------------
|
||
|
// Adding new nodes:
|
||
|
|
||
|
// adding a keyval node to a map:
|
||
|
CHECK(root.num_children() == 3);
|
||
|
root["newkeyval"] = "shiny and new"; // using these strings
|
||
|
root.append_child() << ryml::key("newkeyval (serialized)") << "shiny and new (serialized)"; // serializes and assigns the serialization
|
||
|
CHECK(root.num_children() == 5);
|
||
|
CHECK(root["newkeyval"].key() == "newkeyval");
|
||
|
CHECK(root["newkeyval"].val() == "shiny and new");
|
||
|
CHECK(root["newkeyval (serialized)"].key() == "newkeyval (serialized)");
|
||
|
CHECK(root["newkeyval (serialized)"].val() == "shiny and new (serialized)");
|
||
|
CHECK( ! root["newkeyval"].key().is_sub(tree.arena())); // it's using directly the static string above
|
||
|
CHECK( ! root["newkeyval"].val().is_sub(tree.arena())); // it's using directly the static string above
|
||
|
CHECK( root["newkeyval (serialized)"].key().is_sub(tree.arena())); // it's using a serialization of the string above
|
||
|
CHECK( root["newkeyval (serialized)"].val().is_sub(tree.arena())); // it's using a serialization of the string above
|
||
|
// adding a val node to a seq:
|
||
|
CHECK(root["bar"].num_children() == 2);
|
||
|
root["bar"][2] = "oh so nice";
|
||
|
root["bar"][3] << "oh so nice (serialized)";
|
||
|
CHECK(root["bar"].num_children() == 4);
|
||
|
CHECK(root["bar"][2].val() == "oh so nice");
|
||
|
CHECK(root["bar"][3].val() == "oh so nice (serialized)");
|
||
|
// adding a seq node:
|
||
|
CHECK(root.num_children() == 5);
|
||
|
root["newseq"] |= ryml::SEQ;
|
||
|
root.append_child() << ryml::key("newseq (serialized)") |= ryml::SEQ;
|
||
|
CHECK(root.num_children() == 7);
|
||
|
CHECK(root["newseq"].num_children() == 0);
|
||
|
CHECK(root["newseq (serialized)"].num_children() == 0);
|
||
|
// adding a map node:
|
||
|
CHECK(root.num_children() == 7);
|
||
|
root["newmap"] |= ryml::MAP;
|
||
|
root.append_child() << ryml::key("newmap (serialized)") |= ryml::SEQ;
|
||
|
CHECK(root.num_children() == 9);
|
||
|
CHECK(root["newmap"].num_children() == 0);
|
||
|
CHECK(root["newmap (serialized)"].num_children() == 0);
|
||
|
// operator[] does not mutate the tree until the returned node is
|
||
|
// written to.
|
||
|
//
|
||
|
// Until such time, the NodeRef object keeps in itself the required
|
||
|
// information to write to the proper place in the tree. This is
|
||
|
// called being in a "seed" state.
|
||
|
//
|
||
|
// This means that passing a key/index which does not exist will
|
||
|
// not mutate the tree, but will instead store (in the node) the
|
||
|
// proper place of the tree to do so if and when it is required.
|
||
|
//
|
||
|
// This is a significant difference from eg, the behavior of
|
||
|
// std::map, which mutates the map immediately within the call to
|
||
|
// operator[].
|
||
|
CHECK(!root.has_child("I am nobody"));
|
||
|
ryml::NodeRef nobody = root["I am nobody"];
|
||
|
CHECK(nobody.valid()); // points at the tree, and a specific place in the tree
|
||
|
CHECK(nobody.is_seed()); // ... but nothing is there yet.
|
||
|
CHECK(!root.has_child("I am nobody")); // same as above
|
||
|
ryml::NodeRef somebody = root["I am somebody"];
|
||
|
CHECK(!root.has_child("I am somebody")); // same as above
|
||
|
CHECK(somebody.valid());
|
||
|
CHECK(somebody.is_seed()); // same as above
|
||
|
somebody = "indeed"; // this will commit to the tree, mutating at the proper place
|
||
|
CHECK(somebody.valid());
|
||
|
CHECK(!somebody.is_seed()); // now the tree has this node, and it is no longer a seed
|
||
|
CHECK(root.has_child("I am somebody"));
|
||
|
CHECK(root["I am somebody"].val() == "indeed");
|
||
|
|
||
|
|
||
|
//------------------------------------------------------------------
|
||
|
// Emitting:
|
||
|
|
||
|
// emit to a FILE*
|
||
|
ryml::emit(tree, stdout);
|
||
|
// emit to a stream
|
||
|
std::stringstream ss;
|
||
|
ss << tree;
|
||
|
std::string stream_result = ss.str();
|
||
|
// emit to a buffer:
|
||
|
std::string str_result = ryml::emitrs<std::string>(tree);
|
||
|
// can emit to any given buffer:
|
||
|
char buf[1024];
|
||
|
ryml::csubstr buf_result = ryml::emit(tree, buf);
|
||
|
// now check
|
||
|
ryml::csubstr expected_result = R"(foo: says who
|
||
|
bar:
|
||
|
- 20
|
||
|
- 30
|
||
|
- oh so nice
|
||
|
- oh so nice (serialized)
|
||
|
john: in_scope
|
||
|
newkeyval: shiny and new
|
||
|
newkeyval (serialized): shiny and new (serialized)
|
||
|
newseq: []
|
||
|
newseq (serialized): []
|
||
|
newmap: {}
|
||
|
newmap (serialized): []
|
||
|
I am somebody: indeed
|
||
|
)";
|
||
|
CHECK(buf_result == expected_result);
|
||
|
CHECK(str_result == expected_result);
|
||
|
CHECK(stream_result == expected_result);
|
||
|
// There are many possibilities to emit to buffer;
|
||
|
// please look at the emit sample functions below.
|
||
|
|
||
|
//------------------------------------------------------------------
|
||
|
// Dealing with UTF8
|
||
|
ryml::Tree langs = ryml::parse_in_arena(R"(
|
||
|
en: Planet (Gas)
|
||
|
fr: Planète (Gazeuse)
|
||
|
ru: Планета (Газ)
|
||
|
ja: 惑星(ガス)
|
||
|
zh: 行星(气体)
|
||
|
decode this: "\u263A \xE2\x98\xBA"
|
||
|
and this as well: "\u2705 \U0001D11E"
|
||
|
)");
|
||
|
// in-place UTF8 just works:
|
||
|
CHECK(langs["en"].val() == "Planet (Gas)");
|
||
|
CHECK(langs["fr"].val() == "Planète (Gazeuse)");
|
||
|
CHECK(langs["ru"].val() == "Планета (Газ)");
|
||
|
CHECK(langs["ja"].val() == "惑星(ガス)");
|
||
|
CHECK(langs["zh"].val() == "行星(气体)");
|
||
|
// and \x \u \U codepoints are decoded (but only when
|
||
|
// they appear inside double-quoted strings):
|
||
|
CHECK(langs["decode this"].val() == "☺ ☺");
|
||
|
CHECK(langs["and this as well"].val() == "✅ 𝄞");
|
||
|
|
||
|
//------------------------------------------------------------------
|
||
|
// Getting the location of nodes in the source:
|
||
|
ryml::Parser parser;
|
||
|
ryml::Tree tree2 = parser.parse_in_arena("expected.yml", expected_result);
|
||
|
ryml::Location loc = parser.location(tree2["bar"][1]);
|
||
|
CHECK(parser.location_contents(loc).begins_with("30"));
|
||
|
CHECK(loc.line == 3u);
|
||
|
CHECK(loc.col == 4u);
|
||
|
// For further details in location tracking, refer to the sample function.
|
||
|
```
|
||
|
|
||
|
The [quickstart.cpp sample](./samples/quickstart.cpp) (from which the
|
||
|
above overview was taken) has many more detailed examples, and should
|
||
|
be your first port of call to find out any particular point about
|
||
|
ryml's API. It is tested in the CI, and thus has the correct behavior.
|
||
|
There you can find the following subjects being addressed:
|
||
|
|
||
|
```c++
|
||
|
sample_substr(); ///< about ryml's string views (from c4core)
|
||
|
sample_parse_file(); ///< ready-to-go example of parsing a file from disk
|
||
|
sample_parse_in_place(); ///< parse a mutable YAML source buffer
|
||
|
sample_parse_in_arena(); ///< parse a read-only YAML source buffer
|
||
|
sample_parse_reuse_tree(); ///< parse into an existing tree, maybe into a node
|
||
|
sample_parse_reuse_parser(); ///< reuse an existing parser
|
||
|
sample_parse_reuse_tree_and_parser(); ///< how to reuse existing trees and parsers
|
||
|
sample_iterate_trees(); ///< visit individual nodes and iterate through trees
|
||
|
sample_create_trees(); ///< programatically create trees
|
||
|
sample_tree_arena(); ///< interact with the tree's serialization arena
|
||
|
sample_fundamental_types(); ///< serialize/deserialize fundamental types
|
||
|
sample_formatting(); ///< control formatting when serializing/deserializing
|
||
|
sample_base64(); ///< encode/decode base64
|
||
|
sample_user_scalar_types(); ///< serialize/deserialize scalar (leaf/string) types
|
||
|
sample_user_container_types(); ///< serialize/deserialize container (map or seq) types
|
||
|
sample_std_types(); ///< serialize/deserialize STL containers
|
||
|
sample_emit_to_container(); ///< emit to memory, eg a string or vector-like container
|
||
|
sample_emit_to_stream(); ///< emit to a stream, eg std::ostream
|
||
|
sample_emit_to_file(); ///< emit to a FILE*
|
||
|
sample_emit_nested_node(); ///< pick a nested node as the root when emitting
|
||
|
sample_json(); ///< JSON parsing and emitting: notes and constraints
|
||
|
sample_anchors_and_aliases(); ///< deal with YAML anchors and aliases
|
||
|
sample_tags(); ///< deal with YAML type tags
|
||
|
sample_docs(); ///< deal with YAML docs
|
||
|
sample_error_handler(); ///< set a custom error handler
|
||
|
sample_global_allocator(); ///< set a global allocator for ryml
|
||
|
sample_per_tree_allocator(); ///< set per-tree allocators
|
||
|
sample_location_tracking(); ///< track node locations in the parsed source tree
|
||
|
```
|
||
|
|
||
|
|
||
|
------
|
||
|
|
||
|
## Using ryml in your project
|
||
|
|
||
|
### Package managers
|
||
|
|
||
|
If you opt for package managers, here's where ryml is available so far
|
||
|
(thanks to all the contributors!):
|
||
|
* [vcpkg](https://vcpkg.io/en/packages.html): `vcpkg install ryml`
|
||
|
* Arch Linux/Manjaro:
|
||
|
* [rapidyaml-git (AUR)](https://aur.archlinux.org/packages/rapidyaml-git/)
|
||
|
* [python-rapidyaml-git (AUR)](https://aur.archlinux.org/packages/python-rapidyaml-git/)
|
||
|
* [PyPI](https://pypi.org/project/rapidyaml/)
|
||
|
|
||
|
Although package managers are very useful for quickly getting up to
|
||
|
speed, the advised way is still to bring ryml as a submodule of your
|
||
|
project, building both together. This makes it easy to track any
|
||
|
upstream changes in ryml. Also, ryml is small and quick to build, so
|
||
|
there's not much of a cost for building it with your project.
|
||
|
|
||
|
### Single header file
|
||
|
ryml is provided chiefly as a cmake library project, but it can also
|
||
|
be used as a single header file, and there is a [tool to
|
||
|
amalgamate](./tools/amalgamate.py) the code into a single header
|
||
|
file. The amalgamated header file is provided with each release, but
|
||
|
you can also generate a customized file suiting your particular needs
|
||
|
(or commit):
|
||
|
|
||
|
```console
|
||
|
[user@host rapidyaml]$ python tools/amalgamate.py -h
|
||
|
usage: amalgamate.py [-h] [--c4core | --no-c4core] [--fastfloat | --no-fastfloat] [--stl | --no-stl] [output]
|
||
|
|
||
|
positional arguments:
|
||
|
output output file. defaults to stdout
|
||
|
|
||
|
optional arguments:
|
||
|
-h, --help show this help message and exit
|
||
|
--c4core amalgamate c4core together with ryml. this is the default.
|
||
|
--no-c4core amalgamate c4core together with ryml. the default is --c4core.
|
||
|
--fastfloat enable fastfloat library. this is the default.
|
||
|
--no-fastfloat enable fastfloat library. the default is --fastfloat.
|
||
|
--stl enable stl interop. this is the default.
|
||
|
--no-stl enable stl interop. the default is --stl.
|
||
|
```
|
||
|
|
||
|
The amalgamated header file contains all the function declarations and
|
||
|
definitions. To use it in the project, `#include` the header at will
|
||
|
in any header or source file in the project, but in one source file,
|
||
|
and only in that one source file, `#define` the macro
|
||
|
`RYML_SINGLE_HDR_DEFINE_NOW` **before including the header**. This
|
||
|
will enable the function definitions. For example:
|
||
|
```c++
|
||
|
// foo.h
|
||
|
#include <ryml_all.hpp>
|
||
|
|
||
|
// foo.cpp
|
||
|
// ensure that foo.h is not included before this define!
|
||
|
#define RYML_SINGLE_HDR_DEFINE_NOW
|
||
|
#include <ryml_all.hpp>
|
||
|
```
|
||
|
|
||
|
If you wish to package the single header into a shared library, then
|
||
|
you will need to define the preprocessor symbol `RYML_SHARED` during
|
||
|
compilation.
|
||
|
|
||
|
|
||
|
### As a library
|
||
|
The single header file is a good approach to quickly try the library,
|
||
|
but if you wish to make good use of CMake and its tooling ecosystem,
|
||
|
(and get better compile times), then ryml has you covered.
|
||
|
|
||
|
As with any other cmake library, you have the option to integrate ryml into
|
||
|
your project's build setup, thereby building ryml together with your
|
||
|
project, or -- prior to configuring your project -- you can have ryml
|
||
|
installed either manually or through package managers.
|
||
|
|
||
|
Currently [cmake](https://cmake.org/) is required to build ryml; we
|
||
|
recommend a recent cmake version, at least 3.13.
|
||
|
|
||
|
Note that ryml uses submodules. Take care to use the `--recursive` flag
|
||
|
when cloning the repo, to ensure ryml's submodules are checked out as well:
|
||
|
```bash
|
||
|
git clone --recursive https://github.com/biojppm/rapidyaml
|
||
|
```
|
||
|
If you omit `--recursive`, after cloning you
|
||
|
will have to do `git submodule init` and `git submodule update`
|
||
|
to ensure ryml's submodules are checked out.
|
||
|
|
||
|
### Quickstart samples
|
||
|
|
||
|
These samples show different ways of getting ryml into your application. All the
|
||
|
samples use [the same quickstart executable
|
||
|
source](./samples/quickstart.cpp), but are built in different ways,
|
||
|
showing several alternatives to integrate ryml into your project. We
|
||
|
also encourage you to refer to the [quickstart source](./samples/quickstart.cpp) itself, which
|
||
|
extensively covers most of the functionality that you may want out of
|
||
|
ryml.
|
||
|
|
||
|
Each sample brings a `run.sh` script with the sequence of commands
|
||
|
required to successfully build and run the application (this is a bash
|
||
|
script and runs in Linux and MacOS, but it is also possible to run in
|
||
|
Windows via Git Bash or the WSL). Click on the links below to find out
|
||
|
more about each sample:
|
||
|
|
||
|
| Sample name | ryml is part of build? | cmake file | commands |
|
||
|
|:-------------------|--------------------------|:-------------|:-------------|
|
||
|
| [`singleheader`](./samples/singleheader) | **yes**<br>ryml brought as a single header file,<br>not as a library | [`CMakeLists.txt`](./samples/singleheader/CMakeLists.txt) | [`run.sh`](./samples/singleheader/run.sh) |
|
||
|
| [`singleheaderlib`](./samples/singleheaderlib) | **yes**<br>ryml brought as a library<br>but from the single header file | [`CMakeLists.txt`](./samples/singleheaderlib/CMakeLists.txt) | [`run_shared.sh` (shared library)](./samples/singleheaderlib/run_shared.sh)<br> [`run_static.sh` (static library)](./samples/singleheaderlib/run_static.sh) |
|
||
|
| [`add_subdirectory`](./samples/add_subdirectory) | **yes** | [`CMakeLists.txt`](./samples/add_subdirectory/CMakeLists.txt) | [`run.sh`](./samples/add_subdirectory/run.sh) |
|
||
|
| [`fetch_content`](./samples/fetch_content) | **yes** | [`CMakeLists.txt`](./samples/fetch_content/CMakeLists.txt) | [`run.sh`](./samples/fetch_content/run.sh) |
|
||
|
| [`find_package`](./samples/find_package) | **no**<br>needs prior install or package | [`CMakeLists.txt`](./samples/find_package/CMakeLists.txt) | [`run.sh`](./samples/find_package/run.sh) |
|
||
|
|
||
|
### CMake build settings for ryml
|
||
|
The following cmake variables can be used to control the build behavior of
|
||
|
ryml:
|
||
|
|
||
|
* `RYML_WITH_TAB_TOKENS=ON/OFF`. Enable/disable support for tabs as
|
||
|
valid container tokens after `:` and `-`. Defaults to `OFF`,
|
||
|
because this may cost up to 10% in processing time.
|
||
|
* `RYML_DEFAULT_CALLBACKS=ON/OFF`. Enable/disable ryml's default
|
||
|
implementation of error and allocation callbacks. Defaults to `ON`.
|
||
|
* `RYML_STANDALONE=ON/OFF`. ryml uses
|
||
|
[c4core](https://github.com/biojppm/c4core), a C++ library with low-level
|
||
|
multi-platform utilities for C++. When `RYML_STANDALONE=ON`, c4core is
|
||
|
incorporated into ryml as if it is the same library. Defaults to `ON`.
|
||
|
|
||
|
If you're developing ryml or just debugging problems with ryml itself, the
|
||
|
following cmake variables can be helpful:
|
||
|
* `RYML_DEV=ON/OFF`: a bool variable which enables development targets such as
|
||
|
unit tests, benchmarks, etc. Defaults to `OFF`.
|
||
|
* `RYML_DBG=ON/OFF`: a bool variable which enables verbose prints from
|
||
|
parsing code; can be useful to figure out parsing problems. Defaults to
|
||
|
`OFF`.
|
||
|
|
||
|
#### Forcing ryml to use a different c4core version
|
||
|
|
||
|
ryml is strongly coupled to c4core, and this is reinforced by the fact
|
||
|
that c4core is a submodule of the current repo. However, it is still
|
||
|
possible to use a c4core version different from the one in the repo
|
||
|
(of course, only if there are no incompatibilities between the
|
||
|
versions). You can find out how to achieve this by looking at the
|
||
|
[`custom_c4core` sample](./samples/custom_c4core/CMakeLists.txt).
|
||
|
|
||
|
|
||
|
------
|
||
|
|
||
|
## Other languages
|
||
|
|
||
|
One of the aims of ryml is to provide an efficient YAML API for other
|
||
|
languages. JavaScript is fully available, and there is already a
|
||
|
cursory implementation for Python using only the low-level API. After
|
||
|
ironing out the general approach, other languages are likely to
|
||
|
follow (all of this is possible because we're using
|
||
|
[SWIG](http://www.swig.org/), which makes it easy to do so).
|
||
|
|
||
|
### JavaScript
|
||
|
|
||
|
A JavaScript+WebAssembly port is available, compiled through [emscripten](https://emscripten.org/).
|
||
|
|
||
|
|
||
|
### Python
|
||
|
|
||
|
(Note that this is a work in progress. Additions will be made and things will
|
||
|
be changed.) With that said, here's an example of the Python API:
|
||
|
|
||
|
```python
|
||
|
import ryml
|
||
|
|
||
|
# ryml cannot accept strings because it does not take ownership of the
|
||
|
# source buffer; only bytes or bytearrays are accepted.
|
||
|
src = b"{HELLO: a, foo: b, bar: c, baz: d, seq: [0, 1, 2, 3]}"
|
||
|
|
||
|
def check(tree):
|
||
|
# for now, only the index-based low-level API is implemented
|
||
|
assert tree.size() == 10
|
||
|
assert tree.root_id() == 0
|
||
|
assert tree.first_child(0) == 1
|
||
|
assert tree.next_sibling(1) == 2
|
||
|
assert tree.first_sibling(5) == 2
|
||
|
assert tree.last_sibling(1) == 5
|
||
|
# use bytes objects for queries
|
||
|
assert tree.find_child(0, b"foo") == 1
|
||
|
assert tree.key(1) == b"foo")
|
||
|
assert tree.val(1) == b"b")
|
||
|
assert tree.find_child(0, b"seq") == 5
|
||
|
assert tree.is_seq(5)
|
||
|
# to loop over children:
|
||
|
for i, ch in enumerate(ryml.children(tree, 5)):
|
||
|
assert tree.val(ch) == [b"0", b"1", b"2", b"3"][i]
|
||
|
# to loop over siblings:
|
||
|
for i, sib in enumerate(ryml.siblings(tree, 5)):
|
||
|
assert tree.key(sib) == [b"HELLO", b"foo", b"bar", b"baz", b"seq"][i]
|
||
|
# to walk over all elements
|
||
|
visited = [False] * tree.size()
|
||
|
for n, indentation_level in ryml.walk(tree):
|
||
|
# just a dumb emitter
|
||
|
left = " " * indentation_level
|
||
|
if tree.is_keyval(n):
|
||
|
print("{}{}: {}".format(left, tree.key(n), tree.val(n))
|
||
|
elif tree.is_val(n):
|
||
|
print("- {}".format(left, tree.val(n))
|
||
|
elif tree.is_keyseq(n):
|
||
|
print("{}{}:".format(left, tree.key(n))
|
||
|
visited[inode] = True
|
||
|
assert False not in visited
|
||
|
# NOTE about encoding!
|
||
|
k = tree.get_key(5)
|
||
|
print(k) # '<memory at 0x7f80d5b93f48>'
|
||
|
assert k == b"seq" # ok, as expected
|
||
|
assert k != "seq" # not ok - NOTE THIS!
|
||
|
assert str(k) != "seq" # not ok
|
||
|
assert str(k, "utf8") == "seq" # ok again
|
||
|
|
||
|
# parse immutable buffer
|
||
|
tree = ryml.parse(src)
|
||
|
check(tree) # OK
|
||
|
|
||
|
# also works, but requires bytearrays or
|
||
|
# objects offering writeable memory
|
||
|
mutable = bytearray(src)
|
||
|
tree = ryml.parse_in_place(mutable)
|
||
|
check(tree) # OK
|
||
|
```
|
||
|
|
||
|
As expected, the performance results so far are encouraging. In
|
||
|
a [timeit benchmark](api/python/parse_bm.py) compared
|
||
|
against [PyYaml](https://pyyaml.org/)
|
||
|
and [ruamel.yaml](https://yaml.readthedocs.io/en/latest/), ryml parses
|
||
|
quicker by a factor of 30x-50x:
|
||
|
|
||
|
```
|
||
|
+-----------------------+-------+----------+---------+----------------+
|
||
|
| case | iters | time(ms) | avg(ms) | avg_read(MB/s) |
|
||
|
+-----------------------+-------+----------+---------+----------------+
|
||
|
| parse:RuamelYaml | 88 | 800.483 | 9.096 | 0.234 |
|
||
|
| parse:PyYaml | 88 | 541.370 | 6.152 | 0.346 |
|
||
|
| parse:RymlRo | 3888 | 776.020 | 0.200 | 10.667 |
|
||
|
| parse:RymlRoReuse | 1888 | 381.558 | 0.202 | 10.535 |
|
||
|
| parse:RymlRw | 3888 | 775.121 | 0.199 | 10.679 |
|
||
|
| parse:RymlRwReuse | 3888 | 774.534 | 0.199 | 10.687 |
|
||
|
+-----------------------+-------+----------+---------+----------------+
|
||
|
```
|
||
|
|
||
|
(Note that the results above are somewhat biased towards ryml, because it does
|
||
|
not perform any type conversions: return types are merely `memoryviews` to
|
||
|
the source buffer.)
|
||
|
|
||
|
|
||
|
------
|
||
|
|
||
|
## YAML standard conformance
|
||
|
|
||
|
ryml is close to feature complete. Most of the YAML features are well
|
||
|
covered in the unit tests, and expected to work, unless in the
|
||
|
exceptions noted below.
|
||
|
|
||
|
Of course, there are many dark corners in YAML, and there certainly
|
||
|
can appear cases which ryml fails to parse. Your [bug reports or pull
|
||
|
requests](https://github.com/biojppm/rapidyaml/issues) are very
|
||
|
welcome.
|
||
|
|
||
|
See also [the roadmap](./ROADMAP.md) for a list of future work.
|
||
|
|
||
|
|
||
|
### Known limitations
|
||
|
|
||
|
ryml deliberately makes no effort to follow the standard in the following situations:
|
||
|
|
||
|
* Containers are not accepted as mapping keys: keys must be scalars.
|
||
|
* Tab characters after `:` and `-` are not accepted tokens, unless
|
||
|
ryml is compiled with the macro `RYML_WITH_TAB_TOKENS`. This
|
||
|
requirement exists because checking for tabs introduces branching
|
||
|
into the parser's hot code and in some cases costs as much as 10%
|
||
|
in parsing time.
|
||
|
* Anchor names must not end with a terminating colon: eg `&anchor: key: val`.
|
||
|
* `%YAML` directives have no effect and are ignored.
|
||
|
* `%TAG` directives are limited to a default maximum of 4 instances
|
||
|
per `Tree`. To increase this maximum, define the preprocessor symbol
|
||
|
`RYML_MAX_TAG_DIRECTIVES` to a suitable value. This arbitrary limit
|
||
|
reflects the usual practice of having at most 1 or 2 tag directives;
|
||
|
also, be aware that this feature is under consideration for removal
|
||
|
in YAML 1.3.
|
||
|
|
||
|
Also, ryml tends to be on the permissive side where the YAML standard
|
||
|
dictates there should be an error; in many of these cases, ryml will
|
||
|
tolerate the input. This may be good or bad, but in any case is being
|
||
|
improved on (meaning ryml will grow progressively less tolerant of
|
||
|
YAML errors in the coming releases). So we strongly suggest to stay
|
||
|
away from those dark corners of YAML which are generally a source of
|
||
|
problems, which is a good practice anyway.
|
||
|
|
||
|
If you do run into trouble and would like to investigate conformance
|
||
|
of your YAML code, beware of existing online YAML linters, many of
|
||
|
which are not fully conformant; instead, try using
|
||
|
[https://play.yaml.io](https://play.yaml.io), an amazing tool which
|
||
|
lets you dynamically input your YAML and continuously see the results
|
||
|
from all the existing parsers (kudos to @ingydotnet and the people
|
||
|
from the YAML test suite). And of course, if you detect anything wrong
|
||
|
with ryml, please [open an
|
||
|
issue](https://github.com/biojppm/rapidyaml/issues) so that we can
|
||
|
improve.
|
||
|
|
||
|
|
||
|
### Test suite status
|
||
|
|
||
|
As part of its CI testing, ryml uses the [YAML test
|
||
|
suite](https://github.com/yaml/yaml-test-suite). This is an extensive
|
||
|
set of reference cases covering the full YAML spec. Each of these
|
||
|
cases have several subparts:
|
||
|
* `in-yaml`: mildly, plainly or extremely difficult-to-parse YAML
|
||
|
* `in-json`: equivalent JSON (where possible/meaningful)
|
||
|
* `out-yaml`: equivalent standard YAML
|
||
|
* `emit-yaml`: equivalent standard YAML
|
||
|
* `events`: reference results (ie, expected tree)
|
||
|
|
||
|
When testing, ryml parses each of the 4 yaml/json parts, then emits
|
||
|
the parsed tree, then parses the emitted result and verifies that
|
||
|
emission is idempotent, ie that the emitted result is semantically the
|
||
|
same as its input without any loss of information. To ensure
|
||
|
consistency, this happens over four levels of parse/emission
|
||
|
pairs. And to ensure correctness, each of the stages is compared
|
||
|
against the `events` spec from the test, which constitutes the
|
||
|
reference. The tests also check for equality between the reference
|
||
|
events in the test case and the events emitted by ryml from the data
|
||
|
tree parsed from the test case input. All of this is then carried out
|
||
|
combining several variations: both unix `\n` vs windows `\r\n` line
|
||
|
endings, emitting to string, file or streams, which results in ~250
|
||
|
tests per case part. With multiple parts per case and ~400 reference
|
||
|
cases in the test suite, this makes over several hundred thousand
|
||
|
individual tests to which ryml is subjected, which are added to the
|
||
|
unit tests in ryml, which also employ the same extensive
|
||
|
combinatorial approach.
|
||
|
|
||
|
Also, note that in [their own words](http://matrix.yaml.io/), the
|
||
|
tests from the YAML test suite *contain a lot of edge cases that don't
|
||
|
play such an important role in real world examples*. And yet, despite
|
||
|
the extreme focus of the test suite, currently ryml only fails a minor
|
||
|
fraction of the test cases, mostly related with the deliberate
|
||
|
limitations noted above. Other than those limitations, by far the main
|
||
|
issue with ryml is that several standard-mandated parse errors fail to
|
||
|
materialize. For the up-to-date list of ryml failures in the
|
||
|
test-suite, refer to the [list of known
|
||
|
exceptions](test/test_suite/test_suite_parts.cpp) from ryml's test
|
||
|
suite runner, which is used as part of ryml's CI process.
|
||
|
|
||
|
|
||
|
------
|
||
|
|
||
|
## Alternative libraries
|
||
|
|
||
|
Why this library? Because none of the existing libraries was quite
|
||
|
what I wanted. When I started this project in 2018, I was aware of these two
|
||
|
alternative C/C++ libraries:
|
||
|
|
||
|
* [libyaml](https://github.com/yaml/libyaml). This is a bare C
|
||
|
library. It does not create a representation of the data tree, so
|
||
|
I don't see it as practical. My initial idea was to wrap parsing
|
||
|
and emitting around libyaml's convenient event handling, but to my
|
||
|
surprise I found out it makes heavy use of allocations and string
|
||
|
duplications when parsing. I briefly pondered on sending PRs to
|
||
|
reduce these allocation needs, but not having a permanent tree to
|
||
|
store the parsed data was too much of a downside.
|
||
|
* [yaml-cpp](https://github.com/jbeder/yaml-cpp). This library may
|
||
|
be full of functionality, but is heavy on the use of
|
||
|
node-pointer-based structures like `std::map`, allocations, string
|
||
|
copies, polymorphism and slow C++ stream serializations. This is
|
||
|
generally a sure way of making your code slower, and strong
|
||
|
evidence of this can be seen in the benchmark results above.
|
||
|
|
||
|
Recently [libfyaml](https://github.com/pantoniou/libfyaml)
|
||
|
appeared. This is a newer C library, fully conformant to the YAML
|
||
|
standard with an amazing 100% success in the test suite; it also offers
|
||
|
the tree as a data structure. As a downside, it does not work in
|
||
|
Windows, and it is also multiple times slower parsing and emitting.
|
||
|
|
||
|
When performance and low latency are important, using contiguous
|
||
|
structures for better cache behavior and to prevent the library from
|
||
|
trampling caches, parsing in place and using non-owning strings is of
|
||
|
central importance. Hence this Rapid YAML library which, with minimal
|
||
|
compromise, bridges the gap from efficiency to usability. This library
|
||
|
takes inspiration from
|
||
|
[RapidJSON](https://github.com/Tencent/rapidjson) and
|
||
|
[RapidXML](http://rapidxml.sourceforge.net/).
|
||
|
|
||
|
------
|
||
|
## License
|
||
|
|
||
|
ryml is permissively licensed under the [MIT license](LICENSE.txt).
|
||
|
|