Generating AWS Embedded Metrics Format in C++
Amazon Web Services introduced[ref] the Embedded Metrics Format (from now on referred to as EMF) which allows for metrics generation out of the log stream.
For AWS Lambda serverless functions this is even easier: the function just writes to the standard output stream, so the log data is ingested asynchronously without any additional network requests.
There are various libraries available for different languages, but I have not found one for C++ to be used in combination with the AWS Lambda Custom Runtime, so I decided to write one.
Generating EMF Messages
The example creates an EMF logger for the metrics namespace test_ns with one metric of unit Count and two dimensions: requestId, whose value is set at runtime, and function_name, whose value is static.
Once the logger variable goes out of scope and the object is destroyed, it flushes the message to the default message sink, which is the standard output sink.
The values for the metrics and dynamic dimensions can be set either by index, matching the position in the template parameter list, or by the name used in the logger setup.
emf_logger<"test_ns",
    emf_metrics<
        emf_metric<"my_counter", Aws::CloudWatch::Model::StandardUnit::Count>>,
    emf_dimensions<
        emf_dimension<"requestId">,
        emf_static_dimension<"function_name", "my_lambda_fun">>> logger;
...
logger.metric_value_by_name<"my_counter">(12);
// Or
// logger.put_metrics_value<0>(12);
...
logger.dimension_value_by_name<"requestId">(request.id());
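For orientation, the message flushed for the logger above should look roughly like the document built in the sketch below. This is not how the library produces it (the library goes through a JSON object); it is a hand-written illustration of the EMF layout: an _aws object with a Timestamp in milliseconds and a CloudWatchMetrics array, plus the dimension and metric values as root-level members. The helper name make_emf_message and its parameters are hypothetical, and the values are assumed not to need JSON escaping.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper, not part of the library: writes the EMF document for
// the example logger by hand. Values are assumed not to need JSON escaping.
std::string make_emf_message(std::int64_t timestamp_ms,
                             const std::string& request_id,
                             double counter_value) {
    std::ostringstream out;
    out << "{\"_aws\":{\"Timestamp\":" << timestamp_ms
        << ",\"CloudWatchMetrics\":[{"
        << "\"Namespace\":\"test_ns\","
        // One dimension set that names both dimensions of the logger.
        << "\"Dimensions\":[[\"requestId\",\"function_name\"]],"
        << "\"Metrics\":[{\"Name\":\"my_counter\",\"Unit\":\"Count\"}]}]},"
        // Dimension values and metric values live at the root of the document.
        << "\"requestId\":\"" << request_id << "\","
        << "\"function_name\":\"my_lambda_fun\","
        << "\"my_counter\":" << counter_value << "}";
    return out.str();
}
```

Writing the returned string followed by a newline to standard output is all a Lambda function needs to do for CloudWatch to pick the metric up.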
Message Sink
The message sink is the type used to send the metric log messages. A sink is any type that satisfies this concept:
template<typename S> concept emf_msg_sink_c = requires(S sink, const nlohmann::json& data) {
    { sink.sink(data) };
    { sink.generate() } -> std::same_as<bool>;
};
Currently there are 2 sinks available: the standard output sink and a null sink that silently drops all messages. More sinks could be added for environments other than AWS Lambda, which have to send those messages through the CloudWatch API.
Dimensions
To accommodate "multiple" parameter packs, the dimensions and the metrics are each wrapped in their own wrapper type. The compiler also ensures that no more than 9 dimensions are specified, which was the limit in the AWS EMF specification at the time of writing. Each individual dimension is a type that satisfies this concept:
template<typename D> concept emf_dimension_c = requires(D dimension) {
    { dimension.name() } -> std::same_as<std::string_view>;
    { dimension.value() } -> std::same_as<std::string_view>;
};
Currently there are 2 implementations of this concept: one with a static value that is fixed at compile time and one with a dynamic value that can be set at runtime.
Metrics
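The metrics are set up using the interface defined by this concept:
// The std::same_as arguments are assumed: std::string_view by analogy with
// emf_dimension_c, and std::size_t for size().
template<typename M> concept emf_metric_c = requires (M metric, typename M::type value) {
    { metric.name() } -> std::same_as<std::string_view>;
    { metric.unit_name() } -> std::same_as<std::string_view>;
    { metric.put_value(value) };
    { metric.size() } -> std::same_as<std::size_t>;
};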
Each metric is configured through 3 parameters: its name, its CloudWatch unit, and its C++ value type, which defaults to double.
If you call the put_value() method multiple times (through the logger's interface), the output contains a JSON array rather than a single numeric value. EMF has a limit of 100 elements per metric array. If that limit is exceeded, the library creates multiple messages with blocks of 100 values each, with the remainder in the last message.
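The splitting rule can be sketched in a few lines. This is only an illustration of the blocking logic, not the library's implementation; split_into_blocks and max_values_per_message are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// EMF allows at most 100 values per metric array in one message.
constexpr std::size_t max_values_per_message = 100;

// Hypothetical helper: splits the recorded values into consecutive blocks of
// at most 100 elements; the last block holds the remainder.
template<typename T>
std::vector<std::vector<T>> split_into_blocks(const std::vector<T>& values) {
    std::vector<std::vector<T>> blocks;
    for (std::size_t start = 0; start < values.size();
         start += max_values_per_message) {
        const std::size_t end =
            std::min(start + max_values_per_message, values.size());
        blocks.emplace_back(
            values.begin() + static_cast<std::ptrdiff_t>(start),
            values.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return blocks;
}
```

With 150 recorded values this yields two blocks of 100 and 50 elements, so two messages would be written.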
Benchmarks
To check the performance of the library, I ran catch2[ref] benchmarks on an Intel Core i7-8550U CPU at 1.80GHz with 40GB of RAM and an M.2 SSD. All benchmarks are optimised release builds compiled with gcc 10.2 using the C++20 standard. The first set of benchmarks runs without actually generating the message output.
The first benchmark uses a single integer metric with a single value and no output:
benchmark name                       samples       iterations    estimated
                                     mean          low mean      high mean
                                     std dev       low std dev   high std dev
-------------------------------------------------------------------------------
Single metric, no output                       100          1093     2.6232 ms
                                        23.2795 ns     23.114 ns    23.6399 ns
                                        1.18915 ns   0.670797 ns    2.37045 ns
The next benchmark uses two metrics which are set via the name lookup:
benchmark name                       samples       iterations    estimated
                                     mean          low mean      high mean
                                     std dev       low std dev   high std dev
-------------------------------------------------------------------------------
Metric By Name, no output                      100           577     2.6542 ms
                                        44.5421 ns    43.8828 ns     46.222 ns
                                        4.79016 ns   0.696764 ns    9.05731 ns
Generating 150 metric values into a single metric using the index of the metric:
...............................................................................
benchmark name                       samples       iterations    estimated
                                     mean          low mean      high mean
                                     std dev       low std dev   high std dev
-------------------------------------------------------------------------------
150 Metrics, no output                         100            73     2.7594 ms
                                        391.399 ns    386.147 ns    399.212 ns
                                        32.4228 ns    23.6457 ns    43.2131 ns
And then all three benchmarks again, this time generating the output into a string:
...............................................................................
benchmark name                       samples       iterations    estimated
                                     mean          low mean      high mean
                                     std dev       low std dev   high std dev
-------------------------------------------------------------------------------
Single metric                                  100             5      3.191 ms
                                        5.86022 us    5.78607 us    6.16137 us
                                        656.242 ns    108.839 ns    1.53253 us

Metric By Name                                 100             4     3.2084 ms
                                        8.40916 us    8.24599 us    8.68389 us
                                        1.06261 us     738.91 ns    1.65966 us

150 Metrics                                    100             2      4.648 ms
                                        21.8884 us    21.5728 us    22.4238 us
                                        2.04818 us    1.33857 us    3.20579 us
Closing Notes
Currently the library uses Niels Lohmann's JSON library[ref] to generate the output. A possible future performance enhancement is to write directly to the output stream without building an intermediate JSON data structure.
To pass "named" string template parameters, I use this utility class, which was inspired by a Stack Overflow comment:
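#include <algorithm>  // std::copy_n
#include <compare>    // defaulted operator<=>

// Wraps a string literal so it can be used as a template argument (C++20).
template<int N> struct named {
    // Copies the literal, including its terminating '\0'.
    constexpr named(char const (&s)[N]) {
        std::copy_n(s, N, this->m_elems);
    }
    constexpr auto operator<=>(named const&) const = default;
    constexpr const char* name() const {
        return &m_elems[0];
    }
    /**
     * Contained Data
     */
    char m_elems[N];
};
// Deduces N from the length of the string literal.
template<int N> named(char const(&)[N])->named<N>;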