Language Guide (proto 3) (2024)

Covers how to use version 3 of Protocol Buffers in your project.

This guide describes how to use the protocol buffer language to structure your protocol buffer data, including .proto file syntax and how to generate data access classes from your .proto files. It covers the proto3 version of the protocol buffers language: for information on the proto2 syntax, see the Proto2 Language Guide.

This is a reference guide – for a step by step example that uses many of the features described in this document, see the tutorial for your chosen language.

Defining A Message Type

First let’s look at a very simple example. Let’s say you want to define a search request message format, where each search request has a query string, the particular page of results you are interested in, and a number of results per page. Here’s the .proto file you use to define the message type.

syntax = "proto3";

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 results_per_page = 3;
}

  • The first line of the file specifies that you’re using proto3 syntax: if you don’t do this the protocol buffer compiler will assume you are using proto2. This must be the first non-empty, non-comment line of the file.
  • The SearchRequest message definition specifies three fields (name/value pairs), one for each piece of data that you want to include in this type of message. Each field has a name and a type.

Specifying Field Types

In the earlier example, all the fields are scalar types: two integers (page_number and results_per_page) and a string (query). You can also specify enumerations and composite types like other message types for your field.

Assigning Field Numbers

You must give each field in your message definition a number between 1 and 536,870,911 with the following restrictions:

  • The given number must be unique among all fields for that message.
  • Field numbers 19,000 to 19,999 are reserved for the Protocol Buffers implementation. The protocol buffer compiler will complain if you use one of these reserved field numbers in your message.
  • You cannot use any previously reserved field numbers or any field numbers that have been allocated to extensions.
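
The restrictions above can be written as a small check. The following is a minimal sketch in Python and is not part of any protobuf library; the names is_valid_field_number, used, and reserved are hypothetical and chosen only for illustration.

```python
# Illustrative only: mirrors the numbering rules listed above.
MAX_FIELD_NUMBER = 2**29 - 1                    # 536,870,911
IMPLEMENTATION_RESERVED = range(19000, 20000)   # 19,000 through 19,999


def is_valid_field_number(number, used=(), reserved=()):
    """Return True if `number` may be assigned to a new field.

    `used` holds numbers already taken in the message; `reserved` holds
    numbers listed in the message's `reserved` statements.
    """
    if not 1 <= number <= MAX_FIELD_NUMBER:
        return False
    if number in IMPLEMENTATION_RESERVED:
        return False
    return number not in used and number not in reserved
```

For example, is_valid_field_number(19500) is False, while is_valid_field_number(20000) is True when that number is neither used nor reserved.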

This number cannot be changed once your message type is in use because it identifies the field in the message wire format. “Changing” a field number is equivalent to deleting that field and creating a new field with the same type but a new number. See Deleting Fields for how to do this properly.

Field numbers should never be reused. Never take a field number out of the reserved list for reuse with a new field definition. See Consequences of Reusing Field Numbers.

You should use the field numbers 1 through 15 for the most-frequently-set fields. Lower field number values take less space in the wire format. For example, field numbers in the range 1 through 15 take one byte to encode. Field numbers in the range 16 through 2047 take two bytes. You can find out more about this in Protocol Buffer Encoding.
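
The byte counts above follow from how a field tag is encoded: the field number is shifted left by three bits, combined with the wire type, and written as a varint. The sketch below reproduces that arithmetic in Python. The helper names encode_varint and tag_size are hypothetical and are not part of the protobuf runtime.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def tag_size(field_number, wire_type=0):
    """Number of bytes used by the tag (field number plus wire type)."""
    return len(encode_varint((field_number << 3) | wire_type))
```

With these helpers, tag_size(15) is 1, tag_size(16) and tag_size(2047) are 2, and tag_size(2048) is 3.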

Consequences of Reusing Field Numbers

Reusing a field number makes decoding wire-format messages ambiguous.

The protobuf wire format is lean and doesn’t provide a way to detect fields encoded using one definition and decoded using another.

Encoding a field using one definition and then decoding that same field with a different definition can lead to:

  • Developer time lost to debugging
  • A parse/merge error (best case scenario)
  • Leaked PII/SPII
  • Data corruption

Common causes of field number reuse:

  • renumbering fields (sometimes done to achieve a more aesthetically pleasing number order for fields). Renumbering effectively deletes and re-adds all the fields involved in the renumbering, resulting in incompatible wire-format changes.
  • deleting a field and not reserving the number to prevent future reuse.

The max field is 29 bits instead of the more-typical 32 bits because three lower bits are used for the wire format. For more on this, see the Encoding topic.

Specifying Field Labels

Message fields can be one of the following:

  • optional: An optional field is in one of two possible states:

    • the field is set, and contains a value that was explicitly set or parsed from the wire. It will be serialized to the wire.
    • the field is unset, and will return the default value. It will not be serialized to the wire.

    You can check to see if the value was explicitly set.

  • repeated: this field type can be repeated zero or more times in a well-formed message. The order of the repeated values will be preserved.

  • map: this is a paired key/value field type. See Maps for more on this field type.

  • If no explicit field label is applied, the default field label, called “implicit field presence,” is assumed. (You cannot explicitly set a field to this state.) A well-formed message can have zero or one of this field (but not more than one). You also cannot determine whether a field of this type was parsed from the wire. An implicit presence field will be serialized to the wire unless it is the default value. For more on this subject, see Field Presence.

In proto3, repeated fields of scalar numeric types use packed encoding by default. You can find out more about packed encoding in Protocol Buffer Encoding.

Well-formed Messages

The term “well-formed,” when applied to protobuf messages, refers to the bytes serialized/deserialized. The protoc parser validates that a given proto definition file is parseable.

In the case of optional fields that have more than one value, the protoc parser will accept the input, but only uses the last field. So, the “bytes” may not be “well-formed” but the resulting message would have only one and would be “well-formed” (but would not roundtrip the same).
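
The last-value-wins behavior can be illustrated with a toy parser. This is a minimal Python sketch that only understands varint fields (wire type 0); it is not how any real protobuf parser is implemented, and the names decode_varint and parse_singular_varints are hypothetical.

```python
def decode_varint(data, pos):
    """Decode a varint starting at `pos`; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_singular_varints(data):
    """Parse a message made only of singular varint fields.

    A later occurrence of a field number overwrites the earlier one.
    """
    fields = {}
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 0:
            raise ValueError("this sketch supports varint fields only")
        value, pos = decode_varint(data, pos)
        fields[field_number] = value
    return fields
```

Here the bytes 08 01 08 02 carry field 1 twice (values 1 and 2); the parsed result keeps only the value 2, so re-serializing it would not reproduce the original bytes.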

Adding More Message Types

Multiple message types can be defined in a single .proto file. This is useful if you are defining multiple related messages – so, for example, if you wanted to define the reply message format that corresponds to your SearchRequest message type, you could add it to the same .proto:

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 results_per_page = 3;
}

message SearchResponse {
 ...
}

Combining Messages leads to bloat: While multiple message types (such as message, enum, and service) can be defined in a single .proto file, it can also lead to dependency bloat when large numbers of messages with varying dependencies are defined in a single file. It’s recommended to include as few message types per .proto file as possible.

Adding Comments

To add comments to your .proto files, use C/C++-style // and /* ... */ syntax.

/* SearchRequest represents a search query, with pagination options to
 * indicate which results to include in the response. */
message SearchRequest {
  string query = 1;
  int32 page_number = 2;  // Which page number do we want?
  int32 results_per_page = 3;  // Number of results to return per page.
}

Deleting Fields

Deleting fields can cause serious problems if not done properly.

When you no longer need a field and all references have been deleted from client code, you may delete the field definition from the message. However, you must reserve the deleted field number. If you do not reserve the field number, it is possible for a developer to reuse that number in the future.

You should also reserve the field name to allow JSON and TextFormat encodings of your message to continue to parse.

Reserved Fields

If you update a message type by entirely deleting a field, or commenting it out, future developers can reuse the field number when making their own updates to the type. This can cause severe issues, as described in Consequences of Reusing Field Numbers.

To make sure this doesn’t happen, add your deleted field number to the reserved list. To make sure JSON and TextFormat instances of your message can still be parsed, also add the deleted field name to a reserved list.

The protocol buffer compiler will complain if any future developers try to use these reserved field numbers or names.

message Foo {
  reserved 2, 15, 9 to 11;
  reserved "foo", "bar";
}

Reserved field number ranges are inclusive (9 to 11 is the same as 9, 10, 11). Note that you can’t mix field names and field numbers in the same reserved statement.

What’s Generated from Your .proto?

When you run the protocol buffer compiler on a .proto, the compiler generates the code in your chosen language you’ll need to work with the message types you’ve described in the file, including getting and setting field values, serializing your messages to an output stream, and parsing your messages from an input stream.

  • For C++, the compiler generates a .h and .cc file from each .proto, with a class for each message type described in your file.
  • For Java, the compiler generates a .java file with a class for each message type, as well as a special Builder class for creating message class instances.
  • For Kotlin, in addition to the Java generated code, the compiler generates a .kt file for each message type with an improved Kotlin API. This includes a DSL that simplifies creating message instances, a nullable field accessor, and a copy function.
  • Python is a little different: the Python compiler generates a module with a static descriptor of each message type in your .proto, which is then used with a metaclass to create the necessary Python data access class at runtime.
  • For Go, the compiler generates a .pb.go file with a type for each message type in your file.
  • For Ruby, the compiler generates a .rb file with a Ruby module containing your message types.
  • For Objective-C, the compiler generates a pbobjc.h and pbobjc.m file from each .proto, with a class for each message type described in your file.
  • For C#, the compiler generates a .cs file from each .proto, with a class for each message type described in your file.
  • For PHP, the compiler generates a .php message file for each message type described in your file, and a .php metadata file for each .proto file you compile. The metadata file is used to load the valid message types into the descriptor pool.
  • For Dart, the compiler generates a .pb.dart file with a class for each message type in your file.

You can find out more about using the APIs for each language by following the tutorial for your chosen language. For even more API details, see the relevant API reference.

Scalar Value Types

A scalar message field can have one of the following types – the table shows the type specified in the .proto file, and the corresponding type in the automatically generated class:

| .proto Type | Notes | C++ Type | Java/Kotlin Type[1] | Python Type[3] | Go Type | Ruby Type | C# Type | PHP Type | Dart Type |
|---|---|---|---|---|---|---|---|---|---|
| double | | double | double | float | float64 | Float | double | float | double |
| float | | float | float | float | float32 | Float | float | float | double |
| int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 | Fixnum or Bignum (as required) | int | integer | int |
| int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long[4] | int64 | Bignum | long | integer/string[6] | Int64 |
| uint32 | Uses variable-length encoding. | uint32 | int[2] | int/long[4] | uint32 | Fixnum or Bignum (as required) | uint | integer | int |
| uint64 | Uses variable-length encoding. | uint64 | long[2] | int/long[4] | uint64 | Bignum | ulong | integer/string[6] | Int64 |
| sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 | Fixnum or Bignum (as required) | int | integer | int |
| sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long[4] | int64 | Bignum | long | integer/string[6] | Int64 |
| fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int[2] | int/long[4] | uint32 | Fixnum or Bignum (as required) | uint | integer | int |
| fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long[2] | int/long[4] | uint64 | Bignum | ulong | integer/string[6] | Int64 |
| sfixed32 | Always four bytes. | int32 | int | int | int32 | Fixnum or Bignum (as required) | int | integer | int |
| sfixed64 | Always eight bytes. | int64 | long | int/long[4] | int64 | Bignum | long | integer/string[6] | Int64 |
| bool | | bool | boolean | bool | bool | TrueClass/FalseClass | bool | boolean | bool |
| string | A string must always contain UTF-8 encoded or 7-bit ASCII text, and cannot be longer than 2^32. | string | String | str/unicode[5] | string | String (UTF-8) | string | string | String |
| bytes | May contain any arbitrary sequence of bytes no longer than 2^32. | string | ByteString | str (Python 2), bytes (Python 3) | []byte | String (ASCII-8BIT) | ByteString | string | List |

[1] Kotlin uses the corresponding types from Java, even for unsigned types, to ensure compatibility in mixed Java/Kotlin codebases.

[2] In Java, unsigned 32-bit and 64-bit integers are represented using their signed counterparts, with the top bit simply being stored in the sign bit.

[3] In all cases, setting values to a field will perform type checking to make sure it is valid.

[4] 64-bit or unsigned 32-bit integers are always represented as long when decoded, but can be an int if an int is given when setting the field. In all cases, the value must fit in the type represented when set. See [2].

[5] Python strings are represented as unicode on decode but can be str if an ASCII string is given (this is subject to change).

[6] Integer is used on 64-bit machines and string is used on 32-bit machines.

You can find out more about how these types are encoded when you serialize your message in Protocol Buffer Encoding.
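
To make the notes on negative numbers concrete, the following Python sketch compares the encoded size of a value as int32 (sign-extended to 64 bits, then varint-encoded) and as sint32 (ZigZag-encoded first). It is an illustration of the wire-format arithmetic only; the function names are hypothetical and do not come from the protobuf runtime.

```python
def zigzag_encode(n, bits=32):
    """Map a signed integer to an unsigned one so small magnitudes stay small."""
    return ((n << 1) ^ (n >> (bits - 1))) & ((1 << bits) - 1)


def varint_size(value):
    """Bytes needed to varint-encode a non-negative integer."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def int32_size(n):
    """int32: negative values are sign-extended to 64 bits before encoding."""
    return varint_size(n & 0xFFFFFFFFFFFFFFFF)


def sint32_size(n):
    """sint32: the value is ZigZag-encoded before varint encoding."""
    return varint_size(zigzag_encode(n, 32))
```

Under these assumptions, -1 takes ten bytes as an int32 but one byte as a sint32. The same varint_size helper shows why fixed32 wins for large values: a uint32 of 2^28 or more needs five varint bytes, against a constant four for fixed32.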

Default Values

When a message is parsed, if the encoded message does not contain a particular implicit presence element, accessing the corresponding field in the parsed object returns the default value for that field. These defaults are type-specific:

  • For strings, the default value is the empty string.
  • For bytes, the default value is empty bytes.
  • For bools, the default value is false.
  • For numeric types, the default value is zero.
  • For enums, the default value is the first defined enum value, which must be 0.
  • For message fields, the field is not set. Its exact value is language-dependent. See the generated code guide for details.

The default value for repeated fields is empty (generally an empty list in the appropriate language).

Note that for scalar message fields, once a message is parsed there’s no way of telling whether a field was explicitly set to the default value (for example whether a boolean was set to false) or just not set at all: you should bear this in mind when defining your message types. For example, don’t have a boolean that switches on some behavior when set to false if you don’t want that behavior to also happen by default. Also note that if a scalar message field is set to its default, the value will not be serialized on the wire. If a float or double value is set to +0 it will not be serialized, but -0 is considered distinct and will be serialized.
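
The serialization rule for implicit presence scalars can be sketched as a predicate. The Python below is an illustration of the rule as stated above, not code from any protobuf runtime; the name is_serialized is hypothetical, and Python values stand in for the proto scalar types.

```python
import math


def is_serialized(value):
    """Would an implicit-presence scalar with this value be written to the wire?

    Default values are skipped, except that negative zero is treated as
    distinct from +0 for float and double fields.
    """
    if isinstance(value, float):
        return value != 0.0 or math.copysign(1.0, value) < 0
    # bool, int, str, and bytes all have a falsy default (False, 0, "", b"").
    return bool(value)
```

A reader of the parsed message therefore cannot distinguish "explicitly set to false" from "never set": in both cases nothing was written to the wire.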

See the generated code guide for your chosen language for more details about how defaults work in generated code.

Enumerations

When you’re defining a message type, you might want one of its fields to only have one of a predefined list of values. For example, let’s say you want to add a corpus field for each SearchRequest, where the corpus can be UNIVERSAL, WEB, IMAGES, LOCAL, NEWS, PRODUCTS or VIDEO. You can do this very simply by adding an enum to your message definition with a constant for each possible value.

In the following example we’ve added an enum called Corpus with all the possible values, and a field of type Corpus:

enum Corpus {
  CORPUS_UNSPECIFIED = 0;
  CORPUS_UNIVERSAL = 1;
  CORPUS_WEB = 2;
  CORPUS_IMAGES = 3;
  CORPUS_LOCAL = 4;
  CORPUS_NEWS = 5;
  CORPUS_PRODUCTS = 6;
  CORPUS_VIDEO = 7;
}

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 results_per_page = 3;
  Corpus corpus = 4;
}

As you can see, the Corpus enum’s first constant maps to zero: every enum definition must contain a constant that maps to zero as its first element. This is because:

  • There must be a zero value, so that we can use 0 as a numeric default value.
  • The zero value needs to be the first element, for compatibility with the proto2 semantics where the first enum value is the default unless a different value is explicitly specified.

You can define aliases by assigning the same value to different enum constants. To do this you need to set the allow_alias option to true. Otherwise, the protocol buffer compiler generates a warning message when aliases are found. Though all alias values are valid during deserialization, the first value is always used when serializing.

enum EnumAllowingAlias {
  option allow_alias = true;
  EAA_UNSPECIFIED = 0;
  EAA_STARTED = 1;
  EAA_RUNNING = 1;
  EAA_FINISHED = 2;
}

enum EnumNotAllowingAlias {
  ENAA_UNSPECIFIED = 0;
  ENAA_STARTED = 1;
  // ENAA_RUNNING = 1;  // Uncommenting this line will cause a warning message.
  ENAA_FINISHED = 2;
}

Enumerator constants must be in the range of a 32-bit integer. Since enum values use varint encoding on the wire, negative values are inefficient and thus not recommended. You can define enums within a message definition, as in the earlier example, or outside – these enums can be reused in any message definition in your .proto file. You can also use an enum type declared in one message as the type of a field in a different message, using the syntax _MessageType_._EnumType_.

When you run the protocol buffer compiler on a .proto that uses an enum, the generated code will have a corresponding enum for Java, Kotlin, or C++, or a special EnumDescriptor class for Python that’s used to create a set of symbolic constants with integer values in the runtime-generated class.

Important

The generated code may be subject to language-specific limitations on the number of enumerators (low thousands for one language). Review the limitations for the languages you plan to use.

During deserialization, unrecognized enum values will be preserved in the message, though how this is represented when the message is deserialized is language-dependent. In languages that support open enum types with values outside the range of specified symbols, such as C++ and Go, the unknown enum value is simply stored as its underlying integer representation. In languages with closed enum types such as Java, a case in the enum is used to represent an unrecognized value, and the underlying integer can be accessed with special accessors. In either case, if the message is serialized the unrecognized value will still be serialized with the message.
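
The idea of preserving an unrecognized value can be sketched in Python. This is only an analogy for the open-enum behavior described above and does not show how the Python protobuf runtime represents enums; the Corpus class here is a hand-written stand-in with a subset of the values, and decode_corpus is a hypothetical helper.

```python
import enum


class Corpus(enum.IntEnum):
    # Hand-written stand-in for part of the Corpus enum shown earlier.
    CORPUS_UNSPECIFIED = 0
    CORPUS_UNIVERSAL = 1
    CORPUS_WEB = 2


def decode_corpus(raw):
    """Return a Corpus member when known, otherwise keep the raw integer."""
    try:
        return Corpus(raw)
    except ValueError:
        return raw  # the unrecognized value is preserved, not dropped
```

Because the raw integer survives decoding, writing the message back out can emit the same number even though this reader has no symbol for it.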

Important

For information on how enums should work contrasted with how they currently work in different languages, see Enum Behavior.

For more information about how to work with message enums in your applications, see the generated code guide for your chosen language.

Reserved Values

If you update an enum type by entirely removing an enum entry, or commenting it out, future users can reuse the numeric value when making their own updates to the type. This can cause severe issues if they later load old versions of the same .proto, including data corruption, privacy bugs, and so on. One way to make sure this doesn’t happen is to specify that the numeric values (and/or names, which can also cause issues for JSON serialization) of your deleted entries are reserved. The protocol buffer compiler will complain if any future users try to use these identifiers. You can specify that your reserved numeric value range goes up to the maximum possible value using the max keyword.

enum Foo {
  reserved 2, 15, 9 to 11, 40 to max;
  reserved "FOO", "BAR";
}

Note that you can’t mix field names and numeric values in the same reserved statement.

Using Other Message Types

You can use other message types as field types. For example, let’s say you wanted to include Result messages in each SearchResponse message – to do this, you can define a Result message type in the same .proto and then specify a field of type Result in SearchResponse:

message SearchResponse {
  repeated Result results = 1;
}

message Result {
  string url = 1;
  string title = 2;
  repeated string snippets = 3;
}

Importing Definitions

In the earlier example, the Result message type is defined in the same file as SearchResponse – what if the message type you want to use as a field type is already defined in another .proto file?

You can use definitions from other .proto files by importing them. To import another .proto’s definitions, you add an import statement to the top of your file:

import "myproject/other_protos.proto";

By default, you can use definitions only from directly imported .proto files. However, sometimes you may need to move a .proto file to a new location. Instead of moving the .proto file directly and updating all the call sites in a single change, you can put a placeholder .proto file in the old location to forward all the imports to the new location using the import public notion.

Note that the public import functionality is not available in Java.

import public dependencies can be transitively relied upon by any code importing the proto containing the import public statement. For example:

// new.proto
// All definitions are moved here

// old.proto
// This is the proto that all clients are importing.
import public "new.proto";
import "other.proto";

// client.proto
import "old.proto";
// You use definitions from old.proto and new.proto, but not other.proto

The protocol compiler searches for imported files in a set of directories specified on the protocol compiler command line using the -I/--proto_path flag. If no flag was given, it looks in the directory in which the compiler was invoked. In general you should set the --proto_path flag to the root of your project and use fully qualified names for all imports.
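
The lookup described above can be pictured as joining the import string onto each search directory in turn. The Python sketch below is a simplified model and not the compiler's actual implementation; resolve_import is a hypothetical name, and searching the directories in the order given is an assumption of this sketch.

```python
from pathlib import Path


def resolve_import(import_name, proto_paths=(".",)):
    """Return the first existing file for `import_name`, or None.

    `proto_paths` plays the role of the -I/--proto_path directories. The
    default of "." models the case where no flag was given and the current
    directory is used.
    """
    for root in proto_paths:
        candidate = Path(root) / import_name
        if candidate.is_file():
            return candidate
    return None
```

This model also shows why import strings written relative to the project root stay stable: the same import resolves wherever the importing file itself lives, as long as --proto_path points at the root.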

Using proto2 Message Types

It’s possible to import proto2 message types and use them in your proto3 messages, and vice versa. However, proto2 enums cannot be used directly in proto3 syntax (it’s okay if an imported proto2 message uses them).

Nested Types

You can define and use message types inside other message types, as in the following example – here the Result message is defined inside the SearchResponse message:

message SearchResponse {
  message Result {
    string url = 1;
    string title = 2;
    repeated string snippets = 3;
  }
  repeated Result results = 1;
}

If you want to reuse this message type outside its parent message type, you refer to it as _Parent_._Type_:

message SomeOtherMessage {
  SearchResponse.Result result = 1;
}

You can nest messages as deeply as you like. In the example below, note that the two nested types named Inner are entirely independent, since they are defined within different messages:

message Outer {       // Level 0
  message MiddleAA {  // Level 1
    message Inner {   // Level 2
      int64 ival = 1;
      bool  booly = 2;
    }
  }
  message MiddleBB {  // Level 1
    message Inner {   // Level 2
      int32 ival = 1;
      bool  booly = 2;
    }
  }
}

Updating A Message Type

If an existing message type no longer meets all your needs – for example, you’d like the message format to have an extra field – but you’d still like to use code created with the old format, don’t worry! It’s very simple to update message types without breaking any of your existing code when you use the binary wire format.

Note

If you use JSON or proto text format to store your protocol buffer messages, the changes that you can make in your proto definition are different.

Check Proto Best Practices and the following rules:

  • Don’t change the field numbers for any existing fields. “Changing” the field number is equivalent to deleting the field and adding a new field with the same type. If you want to renumber a field, see the instructions for deleting a field.
  • If you add new fields, any messages serialized by code using your “old” message format can still be parsed by your new generated code. You should keep in mind the default values for these elements so that new code can properly interact with messages generated by old code. Similarly, messages created by your new code can be parsed by your old code: old binaries simply ignore the new field when parsing. See the Unknown Fields section for details.
  • Fields can be removed, as long as the field number is not used again in your updated message type. You may want to rename the field instead, perhaps adding the prefix “OBSOLETE_”, or make the field number reserved, so that future users of your .proto can’t accidentally reuse the number.
  • int32, uint32, int64, uint64, and bool are all compatible – this means you can change a field from one of these types to another without breaking forwards- or backwards-compatibility. If a number is parsed from the wire which doesn’t fit in the corresponding type, you will get the same effect as if you had cast the number to that type in C++ (for example, if a 64-bit number is read as an int32, it will be truncated to 32 bits).
  • sint32 and sint64 are compatible with each other but are not compatible with the other integer types.
  • string and bytes are compatible as long as the bytes are valid UTF-8.
  • Embedded messages are compatible with bytes if the bytes contain an encoded version of the message.
  • fixed32 is compatible with sfixed32, and fixed64 with sfixed64.
  • For string, bytes, and message fields, optional is compatible with repeated. Given serialized data of a repeated field as input, clients that expect this field to be optional will take the last input value if it’s a primitive type field or merge all input elements if it’s a message type field. Note that this is not generally safe for numeric types, including bools and enums. Repeated fields of numeric types can be serialized in the packed format, which will not be parsed correctly when an optional field is expected.
  • enum is compatible with int32, uint32, int64, and uint64 in terms of wire format (note that values will be truncated if they don’t fit). However, be aware that client code may treat them differently when the message is deserialized: for example, unrecognized proto3 enum types will be preserved in the message, but how this is represented when the message is deserialized is language-dependent. Int fields always just preserve their value.
  • Changing a single optional field or extension into a member of a new oneof is binary compatible, however for some languages (notably, Go) the generated code’s API will change in incompatible ways. For this reason, Google does not make such changes in its public APIs, as documented in AIP-180. With the same caveat about source-compatibility, moving multiple fields into a new oneof may be safe if you are sure that no code sets more than one at a time. Moving fields into an existing oneof is not safe. Likewise, changing a single field oneof to an optional field or extension is safe.
  • Changing a field between a map<K, V> and the corresponding repeated message field is binary compatible (see Maps, below, for the message layout and other restrictions). However, the safety of the change is application-dependent: when deserializing and reserializing a message, clients using the repeated field definition will produce a semantically identical result; however, clients using the map field definition may reorder entries and drop entries with duplicate keys.
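
The truncation mentioned in the integer-compatibility rule above can be reproduced with plain bit arithmetic. The following Python sketch mimics a C++-style cast to a 32-bit type; as_int32 and as_uint32 are hypothetical helper names, not protobuf APIs.

```python
def as_uint32(value):
    """Keep only the low 32 bits, as a cast to an unsigned 32-bit type would."""
    return value & 0xFFFFFFFF


def as_int32(value):
    """Reinterpret the low 32 bits as a signed two's-complement value."""
    low = value & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low
```

For instance, a writer that stores 2^32 + 5 in an int64 field is read as 5 by a reader that declares the same field as int32, so the change is wire-compatible but can silently alter large values.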

Unknown Fields

Unknown fields are well-formed protocol buffer serialized data representing fields that the parser does not recognize. For example, when an old binary parses data sent by a new binary with new fields, those new fields become unknown fields in the old binary.

Proto3 messages preserve unknown fields and include them during parsing and in the serialized output, which matches proto2 behavior.

Any

The Any message type lets you use messages as embedded types without having their .proto definition. An Any contains an arbitrary serialized message as bytes, along with a URL that acts as a globally unique identifier for and resolves to that message’s type. To use the Any type, you need to import google/protobuf/any.proto.

import "google/protobuf/any.proto";

message ErrorStatus {
  string message = 1;
  repeated google.protobuf.Any details = 2;
}

The default type URL for a given message type is type.googleapis.com/_packagename_._messagename_.
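
As a small illustration of the default URL shape, the Python sketch below builds a type URL from a package and message name and recovers the qualified name from it. Both helper names are hypothetical; real code would normally rely on the pack and unpack helpers of the runtime library instead.

```python
DEFAULT_TYPE_URL_PREFIX = "type.googleapis.com"


def default_type_url(package, message):
    """Build the default type URL for `message` declared in `package`."""
    return f"{DEFAULT_TYPE_URL_PREFIX}/{package}.{message}"


def type_name_from_url(type_url):
    """Return the fully qualified message name after the last '/'."""
    return type_url.rsplit("/", 1)[-1]
```

For example, default_type_url("google.protobuf", "Duration") yields "type.googleapis.com/google.protobuf.Duration".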

Different language implementations will support runtime library helpers to pack and unpack Any values in a typesafe manner – for example, in Java, the Any type will have special pack() and unpack() accessors, while in C++ there are PackFrom() and UnpackTo() methods:

// Storing an arbitrary message type in Any.
NetworkErrorDetails details = ...;
ErrorStatus status;
status.add_details()->PackFrom(details);

// Reading an arbitrary message from Any.
ErrorStatus status = ...;
for (const google::protobuf::Any& detail : status.details()) {
  if (detail.Is<NetworkErrorDetails>()) {
    NetworkErrorDetails network_error;
    detail.UnpackTo(&network_error);
    ... processing network_error ...
  }
}

Currently the runtime libraries for working with Any types are under development.

The Any message types can hold arbitrary proto3 messages, similar to proto2 messages which can allow extensions.

Oneof

If you have a message with many fields and where at most one field will be set at the same time, you can enforce this behavior and save memory by using the oneof feature.

Oneof fields are like regular fields except all the fields in a oneof share memory, and at most one field can be set at the same time. Setting any member of the oneof automatically clears all the other members. You can check which value in a oneof is set (if any) using a special case() or WhichOneof() method, depending on your chosen language.

Note that if multiple values are set, the last set value as determined by the order in the proto will overwrite all previous ones.

Field numbers for oneof fields must be unique within the enclosing message.
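
The set-one-clears-the-others behavior can be modeled in a few lines. The class below is a conceptual Python sketch, not generated protobuf code; OneofSketch and its methods are hypothetical names, with which() standing in for the case()/WhichOneof() style of accessor.

```python
class OneofSketch:
    """Holds at most one of several named members at a time."""

    def __init__(self, *members):
        self._members = set(members)
        self._case = None    # name of the member currently set, if any
        self._value = None

    def set(self, name, value):
        if name not in self._members:
            raise KeyError(name)
        # Storing a single (case, value) pair means that setting one member
        # implicitly clears whichever member was set before.
        self._case, self._value = name, value

    def which(self):
        """Name of the member currently set, or None."""
        return self._case

    def get(self, name, default=None):
        return self._value if name == self._case else default
```

A short usage example: after set("name", "x") followed by set("sub_message", {}), which() returns "sub_message" and get("name") falls back to the default.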

Using Oneof

To define a oneof in your .proto you use the oneof keyword followed by your oneof name, in this case test_oneof:

message SampleMessage {
  oneof test_oneof {
    string name = 4;
    SubMessage sub_message = 9;
  }
}

You then add your oneof fields to the oneof definition. You can add fields of any type, except map fields and repeated fields. If you need to add a repeated field to a oneof, you can use a message containing the repeated field.

In your generated code, oneof fields have the same getters and setters as regular fields. You also get a special method for checking which value (if any) in the oneof is set. You can find out more about the oneof API for your chosen language in the relevant API reference.

Oneof Features

  • Setting a oneof field will automatically clear all other members of the oneof. So if you set several oneof fields, only the last field you set will still have a value.

    SampleMessage message;
    message.set_name("name");
    CHECK_EQ(message.name(), "name");
    // Calling mutable_sub_message() will clear the name field and will set
    // sub_message to a new instance of SubMessage with none of its fields set.
    message.mutable_sub_message();
    CHECK(message.name().empty());

  • If the parser encounters multiple members of the same oneof on the wire, only the last member seen is used in the parsed message.

  • A oneof cannot be repeated.

  • Reflection APIs work for oneof fields.

  • If you set a oneof field to the default value (such as setting an int32 oneof field to 0), the “case” of that oneof field will be set, and the value will be serialized on the wire.

  • If you’re using C++, make sure your code doesn’t cause memory crashes. The following sample code will crash because sub_message was already deleted by calling the set_name() method.

    SampleMessage message;
    SubMessage* sub_message = message.mutable_sub_message();
    message.set_name("name");      // Will delete sub_message
    sub_message->set_...            // Crashes here

  • Again in C++, if you Swap() two messages with oneofs, each message will end up with the other’s oneof case: in the example below, msg1 will have a sub_message and msg2 will have a name.

    SampleMessage msg1;
    msg1.set_name("name");
    SampleMessage msg2;
    msg2.mutable_sub_message();
    msg1.Swap(&msg2);
    CHECK(msg1.has_sub_message());
    CHECK_EQ(msg2.name(), "name");

Backwards-compatibility issues

Be careful when adding or removing oneof fields. If checking the value of a oneof returns None/NOT_SET, it could mean that the oneof has not been set or it has been set to a field in a different version of the oneof. There is no way to tell the difference, since there’s no way to know if an unknown field on the wire is a member of the oneof.

Tag Reuse Issues

  • Move fields into or out of a oneof: You may lose some of your information (some fields will be cleared) after the message is serialized and parsed. However, you can safely move a single field into a new oneof and may be able to move multiple fields if it is known that only one is ever set. See Updating A Message Type for further details.
  • Delete a oneof field and add it back: This may clear your currently set oneof field after the message is serialized and parsed.
  • Split or merge oneof: This has similar issues to moving regular fields.

Maps

If you want to create an associative map as part of your data definition, protocol buffers provides a handy shortcut syntax:

map<key_type, value_type> map_field = N;

...where the key_type can be any integral or string type (so, any scalar type except for floating point types and bytes). Note that neither enum nor proto messages are valid for key_type. The value_type can be any type except another map.

So, for example, if you wanted to create a map of projects where each Project message is associated with a string key, you could define it like this:

map<string, Project> projects = 3;

Maps Features

  • Map fields cannot be repeated.
  • Wire format ordering and map iteration ordering of map values are undefined, so you cannot rely on your map items being in a particular order.
  • When generating text format for a .proto, maps are sorted by key. Numeric keys are sorted numerically.
  • When parsing from the wire or when merging, if there are duplicate map keys the last key seen is used. When parsing a map from text format, parsing may fail if there are duplicate keys.
  • If you provide a key but no value for a map field, the behavior when the field is serialized is language-dependent. In C++, Java, Kotlin, and Python the default value for the type is serialized, while in other languages nothing is serialized.
  • No symbol FooEntry can exist in the same scope as a map foo, because FooEntry is already used by the implementation of the map.

The generated map API is currently available for all supported languages. You can find out more about the map API for your chosen language in the relevant API reference.

Backwards compatibility

The map syntax is equivalent to the following on the wire, so protocol buffers implementations that do not support maps can still handle your data:

message MapFieldEntry {
  key_type key = 1;
  value_type value = 2;
}

repeated MapFieldEntry map_field = N;

Any protocol buffers implementation that supports maps must both produce and accept data that can be accepted by the earlier definition.
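
To make the equivalence concrete, the following Python sketch treats a map as a list of (key, value) entries and collapses it the way a parser is described as doing above, with the last duplicate key winning. The function names and sample data are hypothetical; this models the semantics only and does not touch the actual wire bytes.

```python
def entries_to_map(entries):
    """Collapse (key, value) entries into a dict; the last duplicate key wins."""
    result = {}
    for key, value in entries:
        result[key] = value
    return result


def map_to_entries(mapping):
    """Expand a dict into the equivalent list of (key, value) entries."""
    return [(key, value) for key, value in mapping.items()]


# Three entries as they might arrive in a repeated MapFieldEntry field.
wire_entries = [("alpha", 1), ("beta", 2), ("alpha", 3)]
projects = entries_to_map(wire_entries)
```

Because ordering on the wire is undefined, code that consumes map_to_entries(projects) should not depend on the order of the returned entries.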

Packages

You can add an optional package specifier to a .proto file to prevent name clashes between protocol message types.

package foo.bar;
message Open { ... }

You can then use the package specifier when defining fields of your message type:

message Foo {
  ...
  foo.bar.Open open = 1;
  ...
}

The way a package specifier affects the generated code depends on your chosen language:

  • In C++ the generated classes are wrapped inside a C++ namespace. For example, Open would be in the namespace foo::bar.
  • In Java and Kotlin, the package is used as the Java package, unless you explicitly provide an option java_package in your .proto file.
  • In Python, the package directive is ignored, since Python modules are organized according to their location in the file system.
  • In Go, the package directive is ignored, and the generated .pb.go file is in the package named after the corresponding go_proto_library Bazel rule. For open source projects, you must provide either a go_package option or set the Bazel -M flag.
  • In Ruby, the generated classes are wrapped inside nested Ruby namespaces, converted to the required Ruby capitalization style (first letter capitalized; if the first character is not a letter, PB_ is prepended). For example, Open would be in the namespace Foo::Bar.
  • In PHP the package is used as the namespace after converting to PascalCase, unless you explicitly provide an option php_namespace in your .proto file. For example, Open would be in the namespace Foo\Bar.
  • In C# the package is used as the namespace after converting to PascalCase, unless you explicitly provide an option csharp_namespace in your .proto file. For example, Open would be in the namespace Foo.Bar.

Note that even when the package directive does not directly affect the generated code, for example in Python, it is still strongly recommended to specify the package for the .proto file, as otherwise it may lead to naming conflicts in descriptors and make the proto not portable for other languages.

Packages and Name Resolution

Type name resolution in the protocol buffer language works like C++: first the innermost scope is searched, then the next-innermost, and so on, with each package considered to be “inner” to its parent package. A leading ‘.’ (for example, .foo.bar.Baz) means to start from the outermost scope instead.

The protocol buffer compiler resolves all type names by parsing the imported .proto files. The code generator for each language knows how to refer to each type in that language, even if it has different scoping rules.
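
The innermost-to-outermost search can be sketched in Python as follows. This is a simplified model and not the compiler's implementation: resolve_type_name and the set of defined names are hypothetical, and the sketch ignores finer points of the real compiler, such as how partially matching names are handled.

```python
def resolve_type_name(name, scope, defined_types):
    """Resolve `name` as seen from `scope` against fully qualified names.

    `scope` is the dotted scope the reference appears in, for example
    "foo.bar.Foo". Returns the fully qualified name, or None if not found.
    """
    if name.startswith("."):
        # A leading dot means: start from the outermost scope.
        full_name = name[1:]
        return full_name if full_name in defined_types else None

    parts = scope.split(".") if scope else []
    # Try the innermost scope first, then walk outward to the root.
    for depth in range(len(parts), -1, -1):
        candidate = ".".join(parts[:depth] + [name])
        if candidate in defined_types:
            return candidate
    return None


# Hypothetical set of types known to the compiler.
defined = {"foo.bar.Open", "foo.Open", "Open"}
```

With these definitions, a reference to Open inside message foo.bar.Foo resolves to foo.bar.Open, while .Open always resolves to the top-level Open.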

Defining Services

If you want to use your message types with an RPC (Remote Procedure Call) system, you can define an RPC service interface in a .proto file and the protocol buffer compiler will generate service interface code and stubs in your chosen language. So, for example, if you want to define an RPC service with a method that takes your SearchRequest and returns a SearchResponse, you can define it in your .proto file as follows:

service SearchService {
  rpc Search(SearchRequest) returns (SearchResponse);
}

The most straightforward RPC system to use with protocol buffers is gRPC: a language- and platform-neutral open source RPC system developed at Google. gRPC works particularly well with protocol buffers and lets you generate the relevant RPC code directly from your .proto files using a special protocol buffer compiler plugin.

If you don’t want to use gRPC, it’s also possible to use protocol buffers with your own RPC implementation. You can find out more about this in the Proto2 Language Guide.

There are also a number of ongoing third-party projects to develop RPC implementations for Protocol Buffers. For a list of links to projects we know about, see the third-party add-ons wiki page.

JSON Mapping

Proto3 supports a canonical encoding in JSON, making it easier to share data between systems. The encoding is described on a type-by-type basis in the table below.

When parsing JSON-encoded data into a protocol buffer, if a value is missing or if its value is null, it will be interpreted as the corresponding default value.

When generating JSON-encoded output from a protocol buffer, if a protobuf field has the default value and if the field doesn’t support field presence, it will be omitted from the output by default. An implementation may provide options to include fields with default values in the output.

A proto3 field that is defined with the optional keyword supports field presence. Fields that have a value set and that support field presence always include the field value in the JSON-encoded output, even if it is the default value.
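
The omission rule can be illustrated with a small Python model. It is a sketch under stated assumptions, not a real serializer: emit_json_fields, its parameters, and the field names are hypothetical, and the model treats any falsy Python value (0, "", False, empty list or dict) as the default.

```python
def emit_json_fields(fields, explicit_presence=frozenset(), emit_defaults=False):
    """Choose which set fields appear in the JSON output.

    `fields` maps JSON names to values that are set on the message.
    Fields named in `explicit_presence` model proto3 `optional` fields:
    if they are set, they are emitted even when holding a default value.
    """
    output = {}
    for name, value in fields.items():
        has_non_default_value = bool(value)
        if name in explicit_presence or emit_defaults or has_non_default_value:
            output[name] = value
    return output


request = {"query": "", "pageNumber": 0, "resultsPerPage": 10}
default_output = emit_json_fields(request)
with_presence = emit_json_fields(request, explicit_presence={"pageNumber"})
```

Here default_output keeps only resultsPerPage, while with_presence also keeps pageNumber because that field is modeled as supporting presence.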

proto3 | JSON | JSON example | Notes
--- | --- | --- | ---
message | object | {"fooBar": v, "g": null, ...} | Generates JSON objects. Message field names are mapped to lowerCamelCase and become JSON object keys. If the json_name field option is specified, the specified value will be used as the key instead. Parsers accept both the lowerCamelCase name (or the one specified by the json_name option) and the original proto field name. null is an accepted value for all field types and treated as the default value of the corresponding field type. However, null cannot be used for the json_name value. For more on why, see Stricter validation for json_name.
enum | string | "FOO_BAR" | The name of the enum value as specified in proto is used. Parsers accept both enum names and integer values.
map<K,V> | object | {"k": v, ...} | All keys are converted to strings.
repeated V | array | [v, ...] | null is accepted as the empty list [].
bool | true, false | true, false |
string | string | "Hello World!" |
bytes | base64 string | "YWJjMTIzIT8kKiYoKSctPUB+" | JSON value will be the data encoded as a string using standard base64 encoding with paddings. Either standard or URL-safe base64 encoding with/without paddings are accepted.
int32, fixed32, uint32 | number | 1, -10, 0 | JSON value will be a decimal number. Either numbers or strings are accepted.
int64, fixed64, uint64 | string | "1", "-10" | JSON value will be a decimal string. Either numbers or strings are accepted.
float, double | number | 1.1, -10.0, 0, "NaN", "Infinity" | JSON value will be a number or one of the special string values "NaN", "Infinity", and "-Infinity". Either numbers or strings are accepted. Exponent notation is also accepted.
Any | object | {"@type": "url", "f": v, ... } | If the Any contains a value that has a special JSON mapping, it will be converted as follows: {"@type": xxx, "value": yyy}. Otherwise, the value will be converted into a JSON object, and the "@type" field will be inserted to indicate the actual data type.
Timestamp | string | "1972-01-01T10:00:20.021Z" | Uses RFC 3339, where generated output will always be Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted.
Duration | string | "1.000340012s", "1s" | Generated output always contains 0, 3, 6, or 9 fractional digits, depending on required precision, followed by the suffix "s". Accepted are any fractional digits (also none) as long as they fit into nano-seconds precision and the suffix "s" is required.
Struct | object | { ... } | Any JSON object. See struct.proto.
Wrapper types | various types | 2, "2", "foo", true, "true", null, 0, ... | Wrappers use the same representation in JSON as the wrapped primitive type, except that null is allowed and preserved during data conversion and transfer.
FieldMask | string | "f.fooBar,h" | See field_mask.proto.
ListValue | array | [foo, bar, ...] |
Value | value | | Any JSON value. Check google.protobuf.Value for details.
NullValue | null | | JSON null
Empty | object | {} | An empty JSON object
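
A few rows of the table can be reproduced with the Python standard library. The helper names below are hypothetical and the code is a sketch of the output side only, assuming non-negative durations; it is not the encoder used by any protobuf runtime.

```python
import base64
import json


def encode_int64(value):
    """64-bit integers are emitted as decimal strings."""
    return str(value)


def encode_bytes(data):
    """bytes are emitted as standard base64 with padding."""
    return base64.b64encode(data).decode("ascii")


def encode_duration(seconds, nanos):
    """Emit 0, 3, 6, or 9 fractional digits, followed by the suffix "s".

    Assumes seconds >= 0 and 0 <= nanos < 1_000_000_000.
    """
    if nanos == 0:
        return f"{seconds}s"
    if nanos % 1_000_000 == 0:
        return f"{seconds}.{nanos // 1_000_000:03d}s"
    if nanos % 1_000 == 0:
        return f"{seconds}.{nanos // 1_000:06d}s"
    return f"{seconds}.{nanos:09d}s"


# A value beyond 2**53 survives as a string in the JSON text.
document = json.dumps({"id": encode_int64(2**53 + 1)})
```

Emitting 64-bit integers as strings avoids precision loss in JSON consumers that store every number as a double.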

JSON Options

A proto3 JSON implementation may provide the following options:

  • Always emit fields without presence: Fields that don’t support presence and that have their default value are omitted by default in JSON output (for example, an implicit presence integer with a 0 value, implicit presence string fields that are empty strings, and empty repeated and map fields). An implementation may provide an option to override this behavior and output fields with their default values.
  • Ignore unknown fields: Proto3 JSON parser should reject unknown fields by default but may provide an option to ignore unknown fields in parsing.
  • Use proto field name instead of lowerCamelCase name: By default proto3 JSON printer should convert the field name to lowerCamelCase and use that as the JSON name. An implementation may provide an option to use proto field name as the JSON name instead. Proto3 JSON parsers are required to accept both the converted lowerCamelCase name and the proto field name.
  • Emit enum values as integers instead of strings: The name of an enum value is used by default in JSON output. An option may be provided to use the numeric value of the enum value instead.
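
The parser-side options above can be sketched with a toy key matcher in Python. Everything here is hypothetical: the function names, the ignore_unknown_fields parameter, and the conversion rule, which is assumed to drop each underscore and uppercase the character that follows it. Real implementations expose their own option names.

```python
def to_lower_camel_case(field_name):
    """Assumed rule: drop each underscore and uppercase the next character."""
    out = []
    capitalize_next = False
    for ch in field_name:
        if ch == "_":
            capitalize_next = True
        elif capitalize_next:
            out.append(ch.upper())
            capitalize_next = False
        else:
            out.append(ch)
    return "".join(out)


def parse_json_object(obj, field_names, ignore_unknown_fields=False):
    """Map JSON keys back to proto field names.

    Both the lowerCamelCase name and the original proto field name are
    accepted; unknown keys are rejected unless ignore_unknown_fields is set.
    """
    accepted = {}
    for name in field_names:
        accepted[name] = name
        accepted[to_lower_camel_case(name)] = name

    parsed = {}
    for key, value in obj.items():
        if key in accepted:
            parsed[accepted[key]] = value
        elif not ignore_unknown_fields:
            raise ValueError(f"unknown field: {key}")
    return parsed


fields = ["query", "page_number", "results_per_page"]
parsed = parse_json_object({"pageNumber": 2, "results_per_page": 10}, fields)
```

Rejecting unknown keys by default surfaces schema mismatches early; the opt-out is useful when a newer sender may include fields the receiver does not know yet.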

Options

Individual declarations in a .proto file can be annotated with a number of options. Options do not change the overall meaning of a declaration, but may affect the way it is handled in a particular context. The complete list of available options is defined in /google/protobuf/descriptor.proto.

Some options are file-level options, meaning they should be written at the top-level scope, not inside any message, enum, or service definition. Some options are message-level options, meaning they should be written inside message definitions. Some options are field-level options, meaning they should be written inside field definitions. Options can also be written on enum types, enum values, oneof fields, service types, and service methods; however, no useful options currently exist for any of these.

Here are a few of the most commonly used options:

  • java_package (file option): The package you want to use for your generated Java/Kotlin classes. If no explicit java_package option is given in the .proto file, then by default the proto package (specified using the “package” keyword in the .proto file) will be used. However, proto packages generally do not make good Java packages since proto packages are not expected to start with reverse domain names. If not generating Java or Kotlin code, this option has no effect.

    option java_package = "com.example.foo";

  • java_outer_classname (file option): The class name (and hence the file name) for the wrapper Java class you want to generate. If no explicit java_outer_classname is specified in the .proto file, the class name will be constructed by converting the .proto file name to camel-case (so foo_bar.proto becomes FooBar.java). If the java_multiple_files option is disabled, then all other classes/enums/etc. generated for the .proto file will be generated within this outer wrapper Java class as nested classes/enums/etc. If not generating Java code, this option has no effect.

    option java_outer_classname = "Ponycopter";

  • java_multiple_files (file option): If false, only a single .java file will be generated for this .proto file, and all the Java classes/enums/etc. generated for the top-level messages, services, and enumerations will be nested inside of an outer class (see java_outer_classname). If true, separate .java files will be generated for each of the Java classes/enums/etc. generated for the top-level messages, services, and enumerations, and the wrapper Java class generated for this .proto file won’t contain any nested classes/enums/etc. This is a Boolean option which defaults to false. If not generating Java code, this option has no effect.

    option java_multiple_files = true;
  • optimize_for (file option): Can be set to SPEED, CODE_SIZE, or LITE_RUNTIME. This affects the C++ and Java code generators (and possibly third-party generators) in the following ways:

    • SPEED (default): The protocol buffer compiler will generate code for serializing, parsing, and performing other common operations on your message types. This code is highly optimized.
    • CODE_SIZE: The protocol buffer compiler will generate minimal classes and will rely on shared, reflection-based code to implement serialization, parsing, and various other operations. The generated code will thus be much smaller than with SPEED, but operations will be slower. Classes will still implement exactly the same public API as they do in SPEED mode. This mode is most useful in apps that contain a very large number of .proto files and do not need all of them to be blindingly fast.
    • LITE_RUNTIME: The protocol buffer compiler will generate classes that depend only on the “lite” runtime library (libprotobuf-lite instead of libprotobuf). The lite runtime is much smaller than the full library (around an order of magnitude smaller) but omits certain features like descriptors and reflection. This is particularly useful for apps running on constrained platforms like mobile phones. The compiler will still generate fast implementations of all methods as it does in SPEED mode. Generated classes will only implement the MessageLite interface in each language, which provides only a subset of the methods of the full Message interface.

    option optimize_for = CODE_SIZE;
  • cc_generic_services, java_generic_services, py_generic_services (file options): Generic services are deprecated. These options control whether the protocol buffer compiler generates abstract service code based on service definitions in C++, Java, and Python, respectively. For legacy reasons, these default to true. However, as of version 2.3.0 (January 2010), it is considered preferable for RPC implementations to provide code generator plugins to generate code more specific to each system, rather than rely on the “abstract” services.

    // This file relies on plugins to generate service code.
    option cc_generic_services = false;
    option java_generic_services = false;
    option py_generic_services = false;
  • cc_enable_arenas (file option): Enables arena allocation for C++ generated code.

  • objc_class_prefix (file option): Sets the Objective-C class prefix which is prepended to all Objective-C generated classes and enums from this .proto. There is no default. You should use prefixes that are between 3-5 uppercase characters as recommended by Apple. Note that all 2 letter prefixes are reserved by Apple.

  • packed (field option): Defaults to true on a repeated field of a basic numeric type, causing a more compact encoding to be used. There is no downside to using this option, but it can be set to false. Note that prior to version 2.3.0, parsers that received packed data when not expected would ignore it. Therefore, it was not possible to change an existing field to packed format without breaking wire compatibility. In 2.3.0 and later, this change is safe, as parsers for packable fields will always accept both formats, but be careful if you have to deal with old programs using old protobuf versions.

    repeated int32 samples = 4 [packed = false];

  • deprecated (field option): If set to true, indicates that the field is deprecated and should not be used by new code. In most languages this has no actual effect. In Java, this becomes a @Deprecated annotation. For C++, clang-tidy will generate warnings whenever deprecated fields are used. In the future, other language-specific code generators may generate deprecation annotations on the field’s accessors, which will in turn cause a warning to be emitted when compiling code which attempts to use the field. If the field is not used by anyone and you want to prevent new users from using it, consider replacing the field declaration with a reserved statement.

    int32 old_field = 6 [deprecated = true];
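
To show why the packed option in the list above yields a more compact encoding, here is a Python sketch that encodes a repeated int32 field both ways, following the published wire format (base-128 varints, with wire type 0 for varint and 2 for length-delimited). The function names are hypothetical, and the sketch handles non-negative values only.

```python
WIRE_TYPE_VARINT = 0
WIRE_TYPE_LEN = 2


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number, wire_type):
    """A tag is the field number shifted left by 3, OR-ed with the wire type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_packed(field_number, values):
    """One tag and one length prefix, then all values back to back."""
    if not values:
        return b""
    payload = b"".join(encode_varint(v) for v in values)
    return encode_tag(field_number, WIRE_TYPE_LEN) + encode_varint(len(payload)) + payload


def encode_unpacked(field_number, values):
    """A separate tag in front of every value."""
    tag = encode_tag(field_number, WIRE_TYPE_VARINT)
    return b"".join(tag + encode_varint(v) for v in values)


samples = [3, 270]
packed = encode_packed(4, samples)
unpacked = encode_unpacked(4, samples)
```

For two elements both forms happen to be five bytes long; the packed form pulls ahead as the list grows, because the tag is written once rather than once per element.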

Enum Value Options

Enum value options are supported. You can use the deprecated option to indicate that a value shouldn’t be used anymore. You can also create custom options using extensions.

The following example shows the syntax for adding these options:

import "google/protobuf/descriptor.proto";

extend google.protobuf.EnumValueOptions {
  optional string string_name = 123456789;
}

enum Data {
  DATA_UNSPECIFIED = 0;
  DATA_SEARCH = 1 [deprecated = true];
  DATA_DISPLAY = 2 [
    (string_name) = "display_value"
  ];
}

See Custom Options to see how to apply custom options to enum values and to fields.

Custom Options

Protocol Buffers also allows you to define and use your own options. Note that this is an advanced feature which most people don’t need. If you do think you need to create your own options, see the Proto2 Language Guide for details. Note that creating custom options uses extensions, which are permitted only for custom options in proto3.

Option Retention

Options have a notion of retention, which controls whether an option is retained in the generated code. Options have runtime retention by default, meaning that they are retained in the generated code and are thus visible at runtime in the generated descriptor pool. However, you can set retention = RETENTION_SOURCE to specify that an option (or field within an option) must not be retained at runtime. This is called source retention.

Option retention is an advanced feature that most users should not need to worry about, but it can be useful if you would like to use certain options without paying the code size cost of retaining them in your binaries. Options with source retention are still visible to protoc and protoc plugins, so code generators can use them to customize their behavior.

Retention can be set directly on an option, like this:

extend google.protobuf.FileOptions {
  optional int32 source_retention_option = 1234
      [retention = RETENTION_SOURCE];
}

It can also be set on a plain field, in which case it takes effect only when that field appears inside an option:

message OptionsMessage {
  int32 source_retention_field = 1 [retention = RETENTION_SOURCE];
}

You can set retention = RETENTION_RUNTIME if you like, but this has no effect since it is the default behavior. When a message field is marked RETENTION_SOURCE, its entire contents are dropped; fields inside it cannot override that by trying to set RETENTION_RUNTIME.

Note

As of Protocol Buffers 22.0, support for option retention is still in progress and only C++ and Java are supported. Go has support starting from 1.29.0. Python support is complete but has not made it into a release yet.

Option Targets

Fields have a targets option which controls the types of entities that the field may apply to when used as an option. For example, if a field has targets = TARGET_TYPE_MESSAGE then that field cannot be set in a custom option on an enum (or any other non-message entity). Protoc enforces this and will raise an error if there is a violation of the target constraints.

At first glance, this feature may seem unnecessary given that every custom option is an extension of the options message for a specific entity, which already constrains the option to that one entity. However, option targets are useful in the case where you have a shared options message applied to multiple entity types and you want to control the usage of individual fields in that message. For example:

message MyOptions {
  string file_only_option = 1 [targets = TARGET_TYPE_FILE];
  int32 message_and_enum_option = 2 [targets = TARGET_TYPE_MESSAGE,
                                     targets = TARGET_TYPE_ENUM];
}

extend google.protobuf.FileOptions {
  optional MyOptions file_options = 50000;
}

extend google.protobuf.MessageOptions {
  optional MyOptions message_options = 50000;
}

extend google.protobuf.EnumOptions {
  optional MyOptions enum_options = 50000;
}

// OK: this field is allowed on file options
option (file_options).file_only_option = "abc";

message MyMessage {
  // OK: this field is allowed on both message and enum options
  option (message_options).message_and_enum_option = 42;
}

enum MyEnum {
  MY_ENUM_UNSPECIFIED = 0;
  // Error: file_only_option cannot be set on an enum.
  option (enum_options).file_only_option = "xyz";
}

Generating Your Classes

To generate the Java, Kotlin, Python, C++, Go, Ruby, Objective-C, or C# code that you need to work with the message types defined in a .proto file, you need to run the protocol buffer compiler protoc on the .proto file. If you haven’t installed the compiler, download the package and follow the instructions in the README. For Go, you also need to install a special code generator plugin for the compiler; you can find this and installation instructions in the golang/protobuf repository on GitHub.

The Protocol Compiler is invoked as follows:

protoc --proto_path=IMPORT_PATH --cpp_out=DST_DIR --java_out=DST_DIR --python_out=DST_DIR --go_out=DST_DIR --ruby_out=DST_DIR --objc_out=DST_DIR --csharp_out=DST_DIR path/to/file.proto

  • IMPORT_PATH specifies a directory in which to look for .proto files when resolving import directives. If omitted, the current directory is used. Multiple import directories can be specified by passing the --proto_path option multiple times; they will be searched in order. -I=IMPORT_PATH can be used as a short form of --proto_path.

  • You can provide one or more output directives:

    • --cpp_out generates C++ code in DST_DIR. See the C++ generated code reference for more.
    • --java_out generates Java code in DST_DIR. See the Java generated code reference for more.
    • --kotlin_out generates additional Kotlin code in DST_DIR. See the Kotlin generated code reference for more.
    • --python_out generates Python code in DST_DIR. See the Python generated code reference for more.
    • --go_out generates Go code in DST_DIR. See the Go generated code reference for more.
    • --ruby_out generates Ruby code in DST_DIR. See the Ruby generated code reference for more.
    • --objc_out generates Objective-C code in DST_DIR. See the Objective-C generated code reference for more.
    • --csharp_out generates C# code in DST_DIR. See the C# generated code reference for more.
    • --php_out generates PHP code in DST_DIR. See the PHP generated code reference for more.

    As an extra convenience, if the DST_DIR ends in .zip or .jar, the compiler will write the output to a single ZIP-format archive file with the given name. .jar outputs will also be given a manifest file as required by the Java JAR specification. Note that if the output archive already exists, it will be overwritten.

  • You must provide one or more .proto files as input. Multiple .proto files can be specified at once. Although the files are named relative to the current directory, each file must reside in one of the IMPORT_PATHs so that the compiler can determine its canonical name.
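
When the compiler is driven from a build script, the argument list described above can be assembled programmatically. The Python sketch below only builds the argument list and does not run protoc; build_protoc_command and the directory names are hypothetical, while the --proto_path and --LANG_out flags are the ones listed above.

```python
def build_protoc_command(proto_files, import_paths=(), outputs=None):
    """Assemble an argv list for protoc without running the compiler.

    `outputs` maps a language prefix such as "python" or "cpp" to DST_DIR.
    """
    if not proto_files:
        raise ValueError("at least one .proto file is required")

    command = ["protoc"]
    # Import directories are searched in the order they are given.
    for path in import_paths:
        command.append(f"--proto_path={path}")
    for language, dst_dir in (outputs or {}).items():
        command.append(f"--{language}_out={dst_dir}")
    command.extend(proto_files)
    return command


# Hypothetical project layout.
command = build_protoc_command(
    ["protos/search.proto"],
    import_paths=["protos"],
    outputs={"python": "gen/py", "cpp": "gen/cpp"},
)
```

On a machine where protoc is installed, the resulting list could be passed to subprocess.run(command, check=True).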

File location

Prefer not to put .proto files in the same directory as other language sources. Consider creating a subpackage proto for .proto files, under the root package for your project.

Location Should be Language-agnostic

When working with Java code, it’s handy to put related .proto files in the same directory as the Java source. However, if any non-Java code ever uses the same protos, the path prefix will no longer make sense. So in general, put the protos in a related language-agnostic directory such as //myteam/mypackage.

The exception to this rule is when it’s clear that the protos will be used only in a Java context, such as for testing.

Supported Platforms

For information about:

Article information

Author: Melvina Ondricka
