Serializing, storing, and reading multiple protobuf objects in Python

2021/6/12 1:20:56


My personal WeChat official account, "代码就是生产力" (Code Is Productivity), publishes more useful tools.

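First, let me define the problem. Because protobuf stores and transmits data very quickly, we want to use it to store and read data. The stored data contains multiple protobuf objects, but a naive read yields only the last one. For example, I wrote 10 protobuf objects to a binary file in sequence, but when reading the file back I could only recover the last one. This article proposes one solution to that problem.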

[Protobuf] Generating and parsing proto binary files (with full source code), Yngz_Miao's blog on CSDN

The article linked above tackles the same problem with a slightly different approach and is also worth consulting. Its idea is:

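Before each chunk of serialized binary data, place a 4-byte field that stores the byte length of the binary data that follows.
The byte length is obtained as follows: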
proto_len = obj.ByteSize()
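
The length-prefix idea can be sketched roughly as follows. This is only an illustration, not the linked article's code: the helper names write_framed and read_framed are hypothetical, the little-endian byte order of the 4-byte header is an assumption, and plain byte strings stand in for the result of obj.SerializeToString() so that the sketch runs without any generated _pb2 module.

```python
import io
import struct

# 4-byte unsigned length header; little-endian is an assumption of this sketch.
LEN_FORMAT = "<I"
LEN_SIZE = struct.calcsize(LEN_FORMAT)


def write_framed(stream, payload):
    """Write a 4-byte length header followed by the payload bytes."""
    stream.write(struct.pack(LEN_FORMAT, len(payload)))
    stream.write(payload)


def read_framed(stream):
    """Yield each payload in turn until the stream is exhausted."""
    while True:
        header = stream.read(LEN_SIZE)
        if not header:
            return  # clean end of stream
        if len(header) < LEN_SIZE:
            raise EOFError("truncated length header")
        (length,) = struct.unpack(LEN_FORMAT, header)
        payload = stream.read(length)
        if len(payload) < length:
            raise EOFError("truncated payload")
        yield payload


# Stand-ins for obj.SerializeToString() results (hypothetical data).
payloads = [b"\x08\x0b", b"\x08\x16", b""]
buffer = io.BytesIO()
for p in payloads:
    write_framed(buffer, p)

buffer.seek(0)
recovered = list(read_framed(buffer))
print(recovered)
```

With real messages, each recovered payload would be passed to ParseFromString on a fresh or reused message object.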

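My approach is to write a separate TXT file dedicated to storing the byte length of each protobuf object. This is less elegant than the approach in the blog post above, but it takes much less code and is much easier to implement. The Python code below shows the flow.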

path = r'point.txt'
pathlen = r'point_len.txt'

import ImuPointCloud_pb2

#-------------------------save part---------------------------------------------

b = ImuPointCloud_pb2.Cloud() # first object
b.id = 11
po = b.Points.add()
po.x = 11
po.y = 11
po.z = 11
b.latitude = 11.88
b.lon = 11
b.yaw = 11
b.speed = 11.4
b.yawRate = 11.99
b.imuTime = 11.99
b.lidarTime = 11

b1 = ImuPointCloud_pb2.Cloud() # second object
b1.id = 22
po1 = b1.Points.add()
po1.x = 22
po1.y = 22
po1.z = 22
b1.latitude = 22.88
b1.lon = 22
b1.yaw = 22
b1.speed = 22.4
b1.yawRate = 22.99
b1.imuTime = 22.99
b1.lidarTime = 22

# Open the length file and the binary data file together; both are closed on exit.
with open(pathlen, 'w') as flen, open(path, 'wb') as f:
    bb = b.SerializeToString()
    bb1 = b1.SerializeToString()
    # print(len(bb), len(bb1))

    # Each time an object is written, also record its byte length.
    flen.write(str(len(bb)))
    flen.write('\n')
    f.write(bb)

    flen.write(str(len(bb1)))
    flen.write('\n')
    f.write(bb1)

#-------------------------read part---------------------------------------------

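# Read the recorded lengths, one per line.
with open(pathlen, 'r') as flen:
    lens = flen.readlines()

a = ImuPointCloud_pb2.Cloud()
try:
    with open(path, 'rb') as f:
        for le in lens:
            print(int(le))
            # Read exactly one message's bytes; ParseFromString clears `a` first, so it can be reused.
            a.ParseFromString(f.read(int(le)))
            print(a)
except IOError:
    print(path + ": not found")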

Running this prints both protobuf objects.
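
The same flow extends to any number of objects with a loop. The following is a minimal sketch under stated assumptions: save_all and load_all are hypothetical helper names, the files live in a temporary directory, and short byte strings stand in for the SerializeToString() output of real messages.

```python
import os
import tempfile


def save_all(path, pathlen, payloads):
    """Write every payload to the data file and record one length per line."""
    with open(path, "wb") as f, open(pathlen, "w") as flen:
        for payload in payloads:
            flen.write(f"{len(payload)}\n")
            f.write(payload)


def load_all(path, pathlen):
    """Read the lengths back, then cut the data file into payloads."""
    with open(pathlen, "r") as flen:
        lengths = [int(line) for line in flen if line.strip()]
    chunks = []
    with open(path, "rb") as f:
        for length in lengths:
            chunk = f.read(length)
            if len(chunk) != length:
                raise EOFError("data file is shorter than the length file claims")
            chunks.append(chunk)
    return chunks


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "point.txt")
    pathlen = os.path.join(tmp, "point_len.txt")
    # Ten stand-in messages (hypothetical data, not real Cloud objects).
    payloads = [bytes([0x08, i]) for i in range(10)]
    save_all(path, pathlen, payloads)
    restored = load_all(path, pathlen)

print(len(restored))
```

One trade-off of this design is that the two files must always be kept together and in sync; if the length file is lost, the message boundaries in the data file cannot be recovered.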

The reason a naive read returns only the last object is explained in the Protocol Buffers documentation:

If you want to write multiple messages to a single file or stream, it is up to you to keep track of where one message ends and the next begins. The Protocol Buffer wire format is not self-delimiting, so protocol buffer parsers cannot determine where a message ends on their own. The easiest way to solve this problem is to write the size of each message before you write the message itself. When you read the messages back in, you read the size, then read the bytes into a separate buffer, then parse from that buffer. (If you want to avoid copying bytes to a separate buffer, check out the CodedInputStream class (in both C++ and Java) which can be told to limit reads to a certain number of bytes.)


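From this description we can see that the PB format is not self-delimiting. ParseFromIstream reads the entire stream, parses the key-value pairs inside it, and sets each field's value according to its key (a key is identified by its field number). When several messages are concatenated, a later value for the same singular field overwrites the earlier one, which is why only the last object appears to survive.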

Put simply, multiple messages stored with protobuf are not self-delimiting; you must determine for yourself where each message begins and ends.
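
To make the overwrite effect concrete, here is a toy illustration, not the real protobuf parser. It assumes a message whose field 1 is a varint and handles only that wire type; read_varint and parse_varint_fields are hypothetical names. Like a real parser, it consumes the whole buffer and lets a later value for the same field replace the earlier one.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_varint_fields(buf):
    """Toy parser that supports only wire type 0 (varint) fields."""
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type != 0:
            raise ValueError("toy parser supports varint fields only")
        value, pos = read_varint(buf, pos)
        fields[field_number] = value  # a later occurrence overwrites an earlier one
    return fields


msg_a = bytes([0x08, 11])  # key 0x08 = field 1, wire type 0; value 11
msg_b = bytes([0x08, 22])  # same field, value 22

# Parsing the concatenation sees field 1 twice and keeps only the last value.
merged = parse_varint_fields(msg_a + msg_b)
print(merged)  # {1: 22}
```

Nothing in the byte stream marks where msg_a ends, so the boundary has to be recorded separately, as both approaches in this article do.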

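One more point to watch: when parsing in C++, if a message is very large, do not hold its bytes in a large local (temporary) array, presumably because such an array lives on the stack and can overflow it. I use a global buffer instead; a heap-allocated buffer such as std::vector<char> should work as well.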

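for (const int le : len_vec) {
    result_proto.Clear();
    // gReadBuf is a large global array; its capacity must be at least the size of the largest message to parse
    char* temp = &gReadBuf[0];
    if (!ifsp.read(temp, le)) break;  // stop if fewer than le bytes could be read
    std::string temp2(temp, le);
    result_proto.ParseFromString(temp2);
}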
