Protocol Buffer

Protocol Buffers is a tool developed by Google for serializing structured data so that it can be transmitted from one system to another. For example, sensor readings captured by a microcontroller could be serialized into bytes and sent to the cloud for logging and diagnostics.

Protocol Buffers supports a variety of languages, such as Java, Python, Objective-C, C++, Dart, Go, Ruby, and C#, and runs on any platform.


Google's Protocol Buffers tooling generates data structures for C++ but not for C. Because microcontrollers have limited RAM and code memory, the embedded industry generally prefers C over C++, which makes Google's tool less suitable for these targets. Nanopb provides a C-based library for encoding and decoding messages in Google's Protocol Buffers format.

Why do we need Protocol Buffers (or Nanopb)?

Consider a scenario in which we need to transmit CAN message frames from a vehicle to the cloud for data logging and analysis. The data of a CAN frame can be held in the following C data structure:

#include <stdint.h>

typedef struct {
  // 8 bytes:
  uint32_t message_id;   ///< Message ID of the CAN bus message
  uint32_t timestamp_ms; ///< The receive timestamp (in milliseconds) of the message

  // 2 bytes:
  uint8_t dlc;        ///< Data length code, 0-8 bytes
  uint8_t bus_id : 4; ///< Identifies which CAN bus this message belongs to
  uint8_t ide : 1;    ///< Identifier Extension (extended 29-bit ID)
  uint8_t rtr : 1;    ///< Remote Transmission Request

  // Usually has 8 bytes.
  uint8_t data[8]; ///< Data bytes
} can_message_s;

A popular choice for sending the data held in the structure above to the cloud is BSD sockets. After establishing a connection with the cloud and obtaining the socket file descriptor sockfd, we can use the send socket API to send our data:

#include <sys/socket.h>

/**
 * @param sockfd Specifies the socket file descriptor.
 * @param buf    Points to the buffer containing the message to send.
 * @param len    Specifies the length of the message in bytes.
 * @param flags  Specifies the type of message transmission (0 for default behavior).
 *
 * @return Number of bytes sent, or -1 on error (errno is set).
 */
ssize_t send(int sockfd, const void *buf, size_t len, int flags);
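One detail worth handling: on a stream socket, send() may transmit fewer bytes than requested, and it can fail with EINTR if a signal interrupts it, so callers usually loop until the whole buffer has gone out. The helper below, send_all, is a hypothetical name used only for this sketch; it relies on nothing but the POSIX send() call documented above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: keep calling send() until all len bytes have been
 * written or a non-recoverable error occurs. Returns true on success. */
bool send_all(int sockfd, const void *buf, size_t len)
{
    const uint8_t *p = (const uint8_t *)buf;
    while (len > 0) {
        ssize_t n = send(sockfd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR) {
                continue; /* interrupted by a signal: retry */
            }
            return false; /* real error, errno describes it */
        }
        p += (size_t)n;   /* skip the bytes that were accepted */
        len -= (size_t)n; /* and try again with the remainder */
    }
    return true;
}
```

An encoded packet would then be sent with send_all(sockfd, buffer, bytes_written) rather than a single send() call.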

Since the send API transfers an array of bytes, we need a way to convert our CAN C data structure into serialized data bytes. This is where Protocol Buffers (or Nanopb) comes into play: it serializes the CAN data structure into a compact array of bytes, which can then be transferred to the cloud server using the socket send API.
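To see why a serializer helps, consider doing the conversion by hand. Copying the raw struct memory is not portable, because padding, bitfield layout and byte order depend on the compiler and CPU. The sketch below instead packs the CAN structure into a fixed little-endian layout invented purely for illustration; pack_can_message and put_u32_le are hypothetical helpers, not part of Nanopb. Protocol Buffers automates this kind of packing and, through field tags, lets the message format evolve without breaking older readers.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t message_id;
    uint32_t timestamp_ms;
    uint8_t dlc;
    uint8_t bus_id : 4;
    uint8_t ide : 1;
    uint8_t rtr : 1;
    uint8_t data[8];
} can_message_s;

/* Write a 32-bit value in little-endian order, independent of host endianness. */
static void put_u32_le(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v & 0xFF);
    out[1] = (uint8_t)((v >> 8) & 0xFF);
    out[2] = (uint8_t)((v >> 16) & 0xFF);
    out[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Hypothetical hand-written packer with the fixed layout
 * [id:4][timestamp:4][dlc:1][flags:1][data:dlc].
 * Returns the number of bytes written, or 0 if dlc is invalid or out is too small. */
size_t pack_can_message(const can_message_s *msg, uint8_t *out, size_t out_size)
{
    if (msg->dlc > 8) {
        return 0;
    }
    size_t needed = 10u + msg->dlc;
    if (out_size < needed) {
        return 0;
    }
    put_u32_le(&out[0], msg->message_id);
    put_u32_le(&out[4], msg->timestamp_ms);
    out[8] = msg->dlc;
    /* Bitfields are packed explicitly: bus_id in bits 0-3, ide in bit 4, rtr in bit 5. */
    out[9] = (uint8_t)(msg->bus_id | (msg->ide << 4) | (msg->rtr << 5));
    for (size_t i = 0; i < msg->dlc; i++) {
        out[10 + i] = msg->data[i];
    }
    return needed;
}
```

The receiver must know this exact layout, and adding a field later shifts every following offset; tagged protobuf fields avoid that problem.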

Getting started

Start by reading the following material, which is very useful for getting up to speed with protobufs:

  1. Google's Documentation: A great tutorial with simple examples and explanations of the common APIs.

  2. Nanopb Documentation

Installation and Setup

For Sibros Employees:

The Sibros internal repository integrates the Nanopb library with Bazel. A custom cc_nanopb_library Bazel rule auto-generates C structures from a .proto file in the form of a header (.h) file. Refer to the Bazel BUILD file in the Nanopb examples to generate the header (.h) and source (.c) files.

For others:

To use Nanopb, the protocol compiler needs to be installed. First, install the python3 and python3-pip packages:

$ apt-get install python3 python3-pip

Then install the protobuf and grpcio-tools Python packages needed by Nanopb:

$ pip3 install protobuf grpcio-tools

Protocol Buffers messages are defined in a proto file as follows:

This .proto file is used to generate the Nanopb header (.h) and source (.c) files with the Python script provided by Nanopb.

Implementation and Examples

Note: The examples should work out of the box for Sibros employees (provided the Bazel targets are defined correctly), since all Bazel rules and BUILD files are already set up. Others need to create a Makefile that builds and links all the source files (including the Nanopb library) into a binary target. I will add a Makefile for all the examples soon.

Example - 1

In this example, a single CAN message with a single byte of data is serialized to protobuf format using Nanopb.

Proto Files

Proto files describe the structure of protocol buffer data using the protocol buffer language. We define the CAN message and its contents inside a proto file as follows:

In the proto file, a message to be transmitted is defined using the message keyword followed by the message name (proto_can_message in the example above). A proto message can have multiple data members, such as message_id, bus_id, timestamp_ms and data_byte above.

If the required keyword (available only in proto2 syntax) is used before a member's declaration, that member must be set before the message is serialized. Alternatively, the optional keyword can be used, which makes setting the data member optional.

The number after the equals sign is the member's tag (field number). Tags identify each field in the encoded binary data and are used to match fields when serializing and deserializing. For example, the tag number for timestamp_ms is 1.
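On the wire, the tag is combined with a wire type to form the key that precedes every encoded field: key = (field_number << 3) | wire_type, itself written as a varint. The sketch below computes it; field_key and the WT_* names are illustrative only (not Nanopb API), while the wire type values come from the Protocol Buffers encoding specification.

```c
#include <stdint.h>

/* Wire types defined by the Protocol Buffers encoding specification. */
enum wire_type {
    WT_VARINT = 0, /* int32, uint32, int64, bool, enum, ... */
    WT_I64 = 1,    /* fixed64, sfixed64, double */
    WT_LEN = 2,    /* string, bytes, embedded messages, packed repeated */
    WT_I32 = 5     /* fixed32, sfixed32, float */
};

/* Each encoded field starts with this key; the low 3 bits hold the wire type
 * and the remaining bits hold the field number (tag). */
uint32_t field_key(uint32_t field_number, enum wire_type wt)
{
    return (field_number << 3) | (uint32_t)wt;
}
```

For example, a varint field with tag 1, such as timestamp_ms if it is declared as an integer type, is introduced by the single byte 0x08.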

For more information about proto message, visit Google's proto file documentation.

Compiling proto files and generation of C header(.h) file

For Bazel users, we create an instance (or target) of the cc_nanopb_library rule, which, when built, compiles the proto file and generates the header (.h) file.

Building the target produces the Nanopb-generated file.

Non-Bazel users can generate the header (.h) file using the Python script provided by Nanopb:

The Nanopb-generated header (.h) file defines a C structure equivalent to the proto message, as follows:

Refer to the complete Nanopb generated header.

Encoding data using Nanopb

We start by creating our .c and .h files, which define the encoding operation in an encode_packet function. These files must include the Nanopb headers and the generated header file.

Note: serial_data_packet_s *packet holds the memory buffer for the Nanopb-encoded data bytes:

Now we can populate the Nanopb-generated data structure proto_msg with the CAN message to be serialized, as follows:

We then create an output stream for writing the data to the buffer:

  • pb_ostream_t pb_ostream_from_buffer(pb_byte_t *buf, size_t bufsize) constructs an output stream for writing data bytes into a memory buffer. It uses an internal callback function and stores the buffer pointer in the stream's state field.

  • pb_ostream_t stream holds the state of the output stream (its write callback, buffer pointer, maximum size and the number of bytes written so far).

The proto message is then encoded into a serialized format using Nanopb's pb_encode() API. The number of encoded bytes is obtained from the stream.bytes_written data member.
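Conceptually, an output stream is a buffer, a capacity and a running count of bytes written, and integers are written as base-128 varints. The following sketch imitates that idea with hypothetical demo_* types and functions. It is not Nanopb's implementation (the real pb_ostream_t writes through a callback), but it shows where a value like stream.bytes_written comes from.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an output stream over a memory buffer. */
typedef struct {
    uint8_t *buf;         /* destination memory buffer */
    size_t max_size;      /* capacity of buf */
    size_t bytes_written; /* bytes written so far */
} demo_ostream_t;

demo_ostream_t demo_ostream_from_buffer(uint8_t *buf, size_t size)
{
    demo_ostream_t s = { buf, size, 0 };
    return s;
}

/* Append count bytes, refusing to write past max_size. */
static bool demo_write(demo_ostream_t *s, const uint8_t *src, size_t count)
{
    if (s->bytes_written + count > s->max_size) {
        return false; /* stream full */
    }
    for (size_t i = 0; i < count; i++) {
        s->buf[s->bytes_written + i] = src[i];
    }
    s->bytes_written += count;
    return true;
}

/* Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte. */
bool demo_encode_varint(demo_ostream_t *s, uint64_t value)
{
    uint8_t tmp[10];
    size_t n = 0;
    do {
        uint8_t byte = (uint8_t)(value & 0x7F);
        value >>= 7;
        if (value != 0) {
            byte |= 0x80; /* more bytes follow */
        }
        tmp[n++] = byte;
    } while (value != 0);
    return demo_write(s, tmp, n);
}
```

Encoding 300 produces the two bytes 0xAC 0x02, so bytes_written becomes 2. Writing past max_size fails instead of overflowing, much as a full Nanopb stream makes pb_encode() return false.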

The pb_encode() function encodes the contents of a proto message C structure and writes it to the output stream. Visit the Nanopb docs for more information on pb_encode().

The proto_can_message_fields argument to pb_encode() is an auto-generated message descriptor that tells Nanopb how each member of the structure is encoded. Visit the Nanopb docs for more information on proto_can_message_fields, and see the auto-generated Nanopb header file for reference.

At this point, the data bytes are encoded in a serialized format and stored in the serial_data_packet_s memory buffer.


To verify the integrity of encoded data, we decode the encoded packet. First, we create a stream that reads from the encoded buffer.

Then we decode the encoded message using Nanopb's decoding API, which re-populates the proto message (i.e. proto_can_message *proto_msg).

Finally, we re-populate the CAN message from the newly populated proto_msg, i.e. we copy the decoded data into the CAN message.

  • See the Nanopb docs for more on pb_istream_from_buffer.

  • See the Nanopb docs for more on pb_decode.
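Decoding mirrors encoding: read a key, split it into field number and wire type, then read the value, while tracking how many input bytes remain (the role bytes_left plays in pb_istream_t). The sketch below uses hypothetical demo_* names and a hand-made buffer. The three bytes 0x08 0xE8 0x07 are what field 1 (timestamp_ms above) holding 1000 would look like, assuming it is declared as a varint integer type.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an input stream over a memory buffer. */
typedef struct {
    const uint8_t *buf;
    size_t bytes_left; /* input bytes not yet consumed */
} demo_istream_t;

demo_istream_t demo_istream_from_buffer(const uint8_t *buf, size_t size)
{
    demo_istream_t s = { buf, size };
    return s;
}

static bool demo_read_byte(demo_istream_t *s, uint8_t *out)
{
    if (s->bytes_left == 0) {
        return false; /* truncated input */
    }
    *out = *s->buf++;
    s->bytes_left--;
    return true;
}

/* Reassemble a base-128 varint, least significant 7-bit group first. */
bool demo_decode_varint(demo_istream_t *s, uint64_t *value)
{
    uint64_t result = 0;
    for (unsigned shift = 0; shift < 64; shift += 7) {
        uint8_t byte;
        if (!demo_read_byte(s, &byte)) {
            return false;
        }
        result |= (uint64_t)(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {
            *value = result;
            return true;
        }
    }
    return false; /* more than 10 bytes: malformed varint */
}
```

Reading the key 0x08 gives field number 1 (0x08 >> 3) and wire type 0 (0x08 & 7); the next varint then yields 1000 and bytes_left drops to 0.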

Example - 2 : Using Callbacks for Nanopb

A classic CAN frame has a data field of up to 8 bytes (a CAN FD frame carries up to 64 bytes). We therefore modify the CAN message in the proto file so that the proto message can encode all 8 bytes at once.

Note: The data type of the data_byte member has changed from int32 to bytes, which holds a variable-length sequence of bytes. Refer to the Scalar Value Types section for more context on data types.

Compiling proto files and generation of C header(.h) file

After recompiling the .proto file and regenerating the Nanopb header (ex2.npb.h), the proto message C structure proto_can_message changes as follows:

Note that the data type of data_byte has changed from int32_t to pb_callback_t. Nanopb callbacks are explained next.


Callbacks are used when a message member has variable length and no storage is statically allocated for it. For example, if a proto message (in a .proto file) contains a string member such as string name, Nanopb does not generate char *name; instead it generates a member of type pb_callback_t (i.e. pb_callback_t name). A custom callback function can then handle a name of any length.

Thus, Nanopb generates pb_callback_t members for variable-length arrays/strings and for repeated member messages (demonstrated below), unless a maximum size is set through generator options.

The pb_callback_t structure is defined as follows:

The pb_callback_t structure consists of two members:

  • funcs, a union of function pointers: holds the encode or decode callback function that processes the variable-length member of the message.

  • void *arg: passes a pointer to the data structure processed by the callback function. If the function pointer is NULL, the corresponding message field is skipped.
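The dispatch rule described in these two bullets can be sketched with a simplified mirror of pb_callback_t. Everything prefixed demo_ (and the sample callback encode_one_byte) is hypothetical; the real Nanopb callbacks also receive a const pb_field_t *field parameter, which is omitted here to keep the example self-contained.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-buffer stream used by this sketch. */
typedef struct { uint8_t *buf; size_t size; size_t pos; } demo_stream_t;

/* Hypothetical mirror of the shape of pb_callback_t: a union of
 * encode/decode function pointers plus a free-form arg pointer. */
typedef struct {
    union {
        bool (*encode)(demo_stream_t *stream, void *const *arg);
        bool (*decode)(demo_stream_t *stream, void **arg);
    } funcs;
    void *arg;
} demo_callback_t;

/* How an encoder treats a callback field: a NULL function pointer means the
 * field is skipped; otherwise the user function receives a pointer to arg. */
bool demo_encode_callback_field(demo_stream_t *stream, const demo_callback_t *cb)
{
    if (cb->funcs.encode == NULL) {
        return true; /* field omitted from the output */
    }
    return cb->funcs.encode(stream, &cb->arg);
}

/* Example user callback: writes the single byte that arg points to. */
bool encode_one_byte(demo_stream_t *stream, void *const *arg)
{
    const uint8_t *value = (const uint8_t *)*arg;
    if (stream->pos >= stream->size) {
        return false; /* buffer full */
    }
    stream->buf[stream->pos++] = *value;
    return true;
}
```

Registering a function and an arg makes the encoder call it; leaving funcs.encode as NULL omits the field entirely.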

For more information on callbacks in Nanopb visit here.

Encoding a callback message

Let's encode a single CAN message instantiated as follows:

Here, the data field of CAN message has 8 bytes, and each byte is set to the value 0x0E.

Inside the encode_packet function, proto_can_message *proto_msg (i.e. the generated proto C structure) is populated from can_message_s *can_msg as before. Since the data_byte member is of type pb_callback_t, it is populated as follows:

  • proto_msg->data_byte.arg = can_msg: can_msg is passed as an argument to the callback function so that its variable number of data bytes (8 in this case) can be encoded.

  • proto_msg->data_byte.funcs.encode = &callback_encode_can_bytes: holds the function pointer for the callback function. The callback function definition is described below.

We then call Nanopb's pb_encode() function, which internally calls the callback registered above to encode all the data bytes of the CAN message.

The structure of the callback function is as follows:

We dereference the arg pointer, which points to the CAN message registered above, determine how many data bytes it holds, and encode all the bytes at once using Nanopb's pb_encode_string API. Before that, pb_encode_tag_for_field starts the field in the Protocol Buffers binary format by writing its tag. More information here.
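The two steps of such a callback, writing the field key and then the length-prefixed bytes, can be imitated without Nanopb. In the sketch below, demo_encode_can_bytes, can_payload_s and the field number DATA_BYTE_FIELD (4) are all hypothetical; in real code pb_encode_tag_for_field and pb_encode_string perform the corresponding steps, and the field number comes from the generated header. The sketch also checks dlc before reading data[], so a corrupt DLC cannot cause an out-of-bounds read.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified output stream. */
typedef struct { uint8_t *buf; size_t max_size; size_t bytes_written; } demo_ostream_t;

/* Hypothetical: only the CAN fields this sketch needs. */
typedef struct {
    uint8_t dlc;     /* number of valid bytes in data[] */
    uint8_t data[8];
} can_payload_s;

#define DATA_BYTE_FIELD 4u /* hypothetical field number of data_byte */

static bool write_varint(demo_ostream_t *s, uint32_t v)
{
    do {
        if (s->bytes_written >= s->max_size) {
            return false; /* stream full */
        }
        uint8_t b = (uint8_t)(v & 0x7F);
        v >>= 7;
        s->buf[s->bytes_written++] = v != 0 ? (uint8_t)(b | 0x80) : b;
    } while (v != 0);
    return true;
}

/* Callback-shaped sketch: arg points at the CAN message whose bytes we encode. */
bool demo_encode_can_bytes(demo_ostream_t *s, void *const *arg)
{
    const can_payload_s *msg = (const can_payload_s *)*arg;
    if (msg->dlc > sizeof msg->data) {
        return false; /* invalid DLC: refuse to read past data[] */
    }
    /* Step 1: the key, with wire type 2 (length-delimited). */
    if (!write_varint(s, (DATA_BYTE_FIELD << 3) | 2u)) {
        return false;
    }
    /* Step 2: the length, followed by the raw bytes. */
    if (!write_varint(s, msg->dlc)) {
        return false;
    }
    if (s->max_size - s->bytes_written < msg->dlc) {
        return false;
    }
    for (size_t i = 0; i < msg->dlc; i++) {
        s->buf[s->bytes_written++] = msg->data[i];
    }
    return true;
}
```

With dlc = 8 and every byte set to 0x0E (the message used in this example), the callback emits 10 bytes: the key 0x22, the length 0x08 and the eight data bytes.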


Decoding the packet is similar to the first example; however, we use a callback structure to decode multiple CAN bytes.

The pb_callback_t structure members are assigned the pointer to the callback function and the CAN message data structure that will hold the decoded CAN message. Nanopb's pb_decode API is then called.

The callback that decodes multiple CAN bytes reads all the bytes at once using the pb_read API. Here, pb_byte_t pb_bytes is essentially a byte array. The istream->bytes_left argument to pb_read tells Nanopb how many bytes remain to be read, which for a bytes field is the length of that field.
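Because the stream handed to a bytes-field decode callback is limited to that field's payload, bytes_left equals the payload length. A decode callback should compare it with the destination size before reading; otherwise a malformed packet could overflow the 8-byte data array. A minimal sketch with hypothetical demo_* types and a hypothetical can_payload_s (not the Nanopb API):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified input stream, already limited to one field's payload. */
typedef struct { const uint8_t *buf; size_t bytes_left; } demo_istream_t;

/* Hypothetical: only the CAN fields this sketch needs. */
typedef struct {
    uint8_t dlc;
    uint8_t data[8];
} can_payload_s;

/* Callback-shaped sketch: arg points at the CAN message to fill in. */
bool demo_decode_can_bytes(demo_istream_t *stream, void **arg)
{
    can_payload_s *msg = (can_payload_s *)*arg;
    size_t n = stream->bytes_left; /* length of the bytes field */
    if (n > sizeof msg->data) {
        return false; /* reject payloads that would overflow data[8] */
    }
    for (size_t i = 0; i < n; i++) {
        msg->data[i] = stream->buf[i];
    }
    stream->buf += n;      /* consume the payload */
    stream->bytes_left = 0;
    msg->dlc = (uint8_t)n; /* the payload length doubles as the DLC */
    return true;
}
```

A 3-byte payload fills data[0..2] and sets dlc to 3, while a 9-byte payload is rejected before anything is copied.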

Example - 3 : Nested Callback Structures

Consider a scenario in which we need to send multiple CAN frames, each containing multiple data bytes, together with some other information (e.g. software version information) to the cloud as encoded data bytes.

For this we need to define the following .proto file:

  • The .proto file defines proto_can_message message as before.

  • The version_info message encodes the release version information as integers.

  • proto_message wraps the version_info message, multiple can_msgs and sw_version (i.e. the software version) as a string.

  • The can_msgs member of proto_message (of type proto_can_message) is declared as repeated. This allows us to encode multiple CAN frames within a single proto message.

.options file

Using generator options, we can set maximum sizes for fields in order to allocate them statically. The preferred way to do this is to create an .options file with the same name as your .proto file:

In the .options file above, we fix the size of the proto_message.sw_version member to 6 bytes, which leaves room for at most 5 characters plus the terminating null character.

For more information on Proto .options file in Nanopb visit here.

Compiling proto files and generation of C header(.h) file

For Bazel users, we modify the BUILD file from Example 1 by adding ex3.options to the data attribute of the proto_library rule.

Building the Nanopb target generates the ex3.npb.h file.

We therefore need two callback functions:

  • The first callback, callback_encode_can_messages, encodes a variable number of CAN messages.

  • The second callback, callback_encode_can_bytes, encodes the data bytes of each CAN message.

In addition, the sw_version member is generated as a char array of 6 bytes, as specified in the .options file.


First, we populate the software release information and the software version string in proto_message *proto_msg using the following function:
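Because the .options file limits sw_version to a 6-byte array, and (assuming the entry uses max_size:6) Nanopb's max_size for strings includes the terminating null character, at most five visible characters fit: "1.2.3" does, "1.2.10" does not. The hypothetical helper below, set_sw_version, is one way to copy a version string safely; the function described above may do this differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SW_VERSION_SIZE 6 /* matches the 6-byte sw_version array */

/* Hypothetical helper: copy src into the fixed-size field. With 6 bytes of
 * storage, at most 5 characters fit because the last byte holds '\0'.
 * Returns false (and stores an empty string) if src is too long. */
bool set_sw_version(char dst[SW_VERSION_SIZE], const char *src)
{
    size_t len = strlen(src);
    if (len >= SW_VERSION_SIZE) {
        dst[0] = '\0';
        return false;
    }
    memcpy(dst, src, len + 1); /* +1 copies the terminator too */
    return true;
}
```

Rejecting an over-long string, rather than silently truncating it, keeps a wrong version number from reaching the cloud unnoticed.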

Next, we pass N CAN messages to the encode_packet function. The can_msg_count variable holds the number of CAN messages in the can_msg array.

We register the array of CAN messages to be encoded in the arg member of the pb_callback_t structure, along with the callback_encode_can_messages callback function. We then pass the entire proto_pkt to Nanopb's pb_encode API for encoding.

The callback function callback_encode_can_messages is called internally by the pb_encode function.

We then dereference the array of CAN messages passed in through the void *const *arg argument and encode the CAN messages one at a time. Since each element of can_msgs is itself a message (proto_can_message), we use Nanopb's pb_encode_submessage() API to encode each CAN message.

Nanopb's pb_encode_submessage() function internally calls the callback_encode_can_bytes callback function to encode the data bytes of each CAN message.

In this way, all the CAN messages are encoded in turn.
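The encoding walk described above can be sketched end to end without Nanopb. The sketch makes several hypothetical assumptions: the array and its count are bundled into a can_array_ctx_t passed through arg, can_msgs has field number 2, data_byte has field number 4, and each submessage carries only its data bytes. A submessage is length-prefixed, so its size must be known before it is written; the sketch therefore runs a counting-only sizing pass first. pb_encode_submessage() relies on the same idea, which is why encode callbacks nested inside a submessage may be invoked more than once and must produce identical output each time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stream: when buf is NULL it only counts bytes (a sizing pass). */
typedef struct { uint8_t *buf; size_t max_size; size_t bytes_written; } demo_ostream_t;

typedef struct { uint8_t dlc; uint8_t data[8]; } can_payload_s; /* hypothetical */

/* Hypothetical context passed through void *arg: the array and its length. */
typedef struct { const can_payload_s *msgs; size_t count; } can_array_ctx_t;

#define CAN_MSGS_FIELD 2u  /* hypothetical field number of can_msgs */
#define DATA_BYTE_FIELD 4u /* hypothetical field number of data_byte */

static bool put(demo_ostream_t *s, uint8_t b)
{
    if (s->buf != NULL) {
        if (s->bytes_written >= s->max_size) {
            return false; /* buffer full */
        }
        s->buf[s->bytes_written] = b;
    }
    s->bytes_written++; /* counted even in a sizing pass */
    return true;
}

static bool put_varint(demo_ostream_t *s, uint32_t v)
{
    do {
        uint8_t b = (uint8_t)(v & 0x7F);
        v >>= 7;
        if (!put(s, v != 0 ? (uint8_t)(b | 0x80) : b)) {
            return false;
        }
    } while (v != 0);
    return true;
}

/* Body of one proto_can_message-like submessage: only its data_byte field. */
static bool encode_can_body(demo_ostream_t *s, const can_payload_s *m)
{
    if (m->dlc > sizeof m->data) {
        return false;
    }
    if (!put_varint(s, (DATA_BYTE_FIELD << 3) | 2u) || !put_varint(s, m->dlc)) {
        return false;
    }
    for (size_t i = 0; i < m->dlc; i++) {
        if (!put(s, m->data[i])) {
            return false;
        }
    }
    return true;
}

/* Repeated-field callback sketch: for each element write the key, the
 * submessage length (measured by a sizing pass), then the submessage. */
bool demo_encode_can_messages(demo_ostream_t *s, void *const *arg)
{
    const can_array_ctx_t *ctx = (const can_array_ctx_t *)*arg;
    for (size_t i = 0; i < ctx->count; i++) {
        demo_ostream_t sizing = { NULL, 0, 0 };
        if (!encode_can_body(&sizing, &ctx->msgs[i])) {
            return false;
        }
        if (!put_varint(s, (CAN_MSGS_FIELD << 3) | 2u) ||
            !put_varint(s, (uint32_t)sizing.bytes_written) ||
            !encode_can_body(s, &ctx->msgs[i])) {
            return false;
        }
    }
    return true;
}
```

For two messages carrying {0xAA} and {0xBB, 0xCC}, the output is 11 bytes: 12 03 22 01 AA 12 04 22 02 BB CC. Each element repeats the same key (0x12), which is how a repeated field appears on the wire.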


To decode the array of CAN messages, we reverse the encoding process. We pass an empty array of CAN messages, which will hold the data decoded from the serial_data_packet_s *packet data structure.

To decode the array of messages, we register the decode_callback_can_messages callback in the pb_callback_t structure can_msgs and also pass the empty CAN message array that will store the decoded CAN data. We then call Nanopb's pb_decode() API.

decode_callback_can_messages is invoked once for each element of the repeated field. This callback uses Nanopb's pb_decode() function to decode a single CAN message.

The decode_callback_can_messages() function internally calls the callback_decode_can_bytes callback, as in the second example, to decode the 8 bytes of the CAN message's data field.
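The decoding side can also be sketched without Nanopb. A decoder walks the outer message and, each time it meets the repeated can_msgs key, hands that element's bytes to a per-element step, which is why the real callback runs once per CAN message. The context struct (array, capacity, count) and the field numbers 2 (can_msgs) and 4 (data_byte) are hypothetical. The capacity check stops a packet with too many frames from overflowing the caller's array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t dlc; uint8_t data[8]; } can_payload_s; /* hypothetical */

/* Hypothetical decode context: output array, its capacity, and how many were filled. */
typedef struct { can_payload_s *msgs; size_t capacity; size_t count; } can_decode_ctx_t;

#define CAN_MSGS_FIELD 2u  /* hypothetical field number of can_msgs */
#define DATA_BYTE_FIELD 4u /* hypothetical field number of data_byte */

/* Read a varint (at most 5 bytes in this 32-bit sketch) and advance *p. */
static bool get_varint(const uint8_t **p, const uint8_t *end, uint32_t *out)
{
    uint32_t v = 0;
    for (unsigned shift = 0; shift < 35; shift += 7) {
        if (*p == end) {
            return false; /* truncated input */
        }
        uint8_t b = *(*p)++;
        v |= (uint32_t)(b & 0x7F) << shift;
        if ((b & 0x80) == 0) {
            *out = v;
            return true;
        }
    }
    return false; /* too long for 32 bits */
}

/* Per-element step: decode one CAN submessage occupying [p, end). */
static bool decode_one_can_message(const uint8_t *p, const uint8_t *end, can_decode_ctx_t *ctx)
{
    if (ctx->count >= ctx->capacity) {
        return false; /* no room left: fail instead of overflowing */
    }
    can_payload_s *m = &ctx->msgs[ctx->count];
    m->dlc = 0;
    while (p < end) {
        uint32_t key, len;
        if (!get_varint(&p, end, &key) || key != ((DATA_BYTE_FIELD << 3) | 2u)) {
            return false; /* this sketch only understands the data_byte field */
        }
        if (!get_varint(&p, end, &len) || len > sizeof m->data || len > (size_t)(end - p)) {
            return false;
        }
        for (uint32_t i = 0; i < len; i++) {
            m->data[i] = p[i];
        }
        m->dlc = (uint8_t)len;
        p += len;
    }
    ctx->count++;
    return true;
}

/* Walk the outer message; run the per-element step for every can_msgs entry. */
bool demo_decode_can_messages(const uint8_t *buf, size_t size, can_decode_ctx_t *ctx)
{
    const uint8_t *p = buf;
    const uint8_t *end = buf + size;
    while (p < end) {
        uint32_t key, len;
        /* This sketch only handles length-delimited top-level fields (wire type 2). */
        if (!get_varint(&p, end, &key) || (key & 7u) != 2u) {
            return false;
        }
        if (!get_varint(&p, end, &len) || len > (size_t)(end - p)) {
            return false;
        }
        if ((key >> 3) == CAN_MSGS_FIELD && !decode_one_can_message(p, p + len, ctx)) {
            return false;
        }
        p += len; /* other fields (e.g. sw_version) are skipped here */
    }
    return true;
}
```

Fed the 11 bytes 12 03 22 01 AA 12 04 22 02 BB CC (two CAN messages under these field numbers), it recovers both messages; with room for only one message it returns false instead of writing past the array.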

External Resources