Space-first protobuf/JSON decoder

Original link: https://blog.codingnow.com/2022/08/memory_compat_protobuffer_json_unmarshaling.html

Today a colleague sent me a post discussing a problem in Go: when a program handles a large number of concurrent JSON/protobuf unmarshaling transactions, it can generate a huge amount (10 GB) of temporary memory that cannot be reclaimed in time.

My point is: if one module of a system may use 10 GB of memory, then it is a core problem and must be treated specially; a core problem deserves a solution designed for a core problem. This is not the GC's fault, and manually managing memory is not the answer either. Even if you manage memory by hand, you are merely handing the memory blocks over to a "heap" data structure you would normally rather not think about, hoping that someone has implemented a general-purpose allocator that solves the problem for you as far as possible. Use it carelessly and you will run into problems like fragmentation that cannot be merged, eating up your extra memory; if it sits in your core module, you still need to think it through carefully.

On this specific issue, I think it is better to start by simplifying the data structures, making the core data structures manageable. Obviously, fixed-size data structures are the easiest to manage: if your data structures are fixed size, you no longer need heap management at all; a fixed array used in rotation is enough to handle the temporary data.
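As a minimal sketch of such a rotating fixed array in C: the slot count and slot size below are assumptions, and a real system must guarantee a slot's contents are consumed before the ring wraps back to it.

```c
#include <stdint.h>

#define SLOTS 64        /* max in-flight decode results (assumption) */
#define SLOT_SIZE 4096  /* fixed upper bound for one decoded result (assumption) */

static uint8_t pool[SLOTS][SLOT_SIZE];
static int next_slot = 0;

/* hand out scratch slots round-robin; no free() and no heap involved.
   the caller must be done with a slot before the ring wraps back to it. */
void *acquire_slot(void) {
    void *p = pool[next_slot];
    next_slot = (next_slot + 1) % SLOTS;
    return p;
}
```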

A flexible format like JSON does not look easy to describe with a fixed-size data structure, but with a little thought it is not hard. There are plenty of C libraries in this area, so I won't list them one by one. The core idea is usually this: use a fixed structure to reference slices of the encoded data, so the decoded structure records only each element's type and position in the original buffer. If you roughly know the order of magnitude of the number of elements to be decoded, you can hold the decoded result in a fixed-size structure without any dynamic memory allocation.
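jsmn is one well-known C library of this kind, and it illustrates the idea: it tokenizes JSON into a caller-supplied fixed array, where each token is just a type plus a start/end position into the original text.

```c
#include <stdio.h>
#include <string.h>
#include "jsmn.h"  /* single-header tokenizer: https://github.com/zserge/jsmn */

#define MAX_TOKENS 128  /* fixed bound chosen from the expected element count */

int main(void) {
    const char *js = "{\"a\":{\"b\":42}}";
    jsmn_parser p;
    jsmntok_t tokens[MAX_TOKENS];  /* the entire decode result, no heap at all */

    jsmn_init(&p);
    int n = jsmn_parse(&p, js, strlen(js), tokens, MAX_TOKENS);
    if (n < 0) return 1;  /* JSMN_ERROR_NOMEM: our fixed bound was too small */

    /* each token is only a type plus a slice of the original text */
    for (int i = 0; i < n; i++)
        printf("token %d: type=%d text=%.*s\n", i, tokens[i].type,
               tokens[i].end - tokens[i].start, js + tokens[i].start);
    return 0;
}
```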

It's just that most libraries optimize for time, making decoding faster, rather than for space, making the decoding process use less memory, so this style of decoder is not commonly seen. If this is your core problem, though, it is exactly the kind of approach you have to consider.

Protobuf is actually better suited to this than JSON: it has already converted the keys of the data into numeric field ids, and the overall structure of the data is explicit. But precisely because protobuf's encoding is more involved than JSON's simple text format, there are fewer libraries of this kind for it.
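Such a decoder is not hard to write by hand, though. Below is a minimal, hypothetical sketch (the names pb_slice, read_varint, and pb_scan are mine, not from any real library) that scans the protobuf wire format into a fixed array of slices, recording only field id, wire type, and payload position:

```c
#include <stdint.h>
#include <stddef.h>

/* one decoded entry: field id, wire type, and where its payload sits */
struct pb_slice {
    uint32_t field;    /* field number from the .proto definition */
    uint32_t wiretype; /* 0=varint 1=64-bit 2=length-delimited 5=32-bit */
    size_t   offset;   /* payload position in the original buffer */
    size_t   len;      /* payload length in bytes */
};

/* read a varint; returns bytes consumed, or 0 on truncated input */
static size_t read_varint(const uint8_t *p, size_t sz, uint64_t *v) {
    uint64_t r = 0;
    for (size_t i = 0; i < sz && i < 10; i++) {
        r |= (uint64_t)(p[i] & 0x7f) << (7 * i);
        if (!(p[i] & 0x80)) { *v = r; return i + 1; }
    }
    return 0;
}

/* scan one message into a fixed array of slices; returns slice count or -1 */
int pb_scan(const uint8_t *buf, size_t sz, struct pb_slice *out, int max) {
    size_t i = 0;
    int n = 0;
    while (i < sz) {
        uint64_t key, tmp;
        size_t k = read_varint(buf + i, sz - i, &key);
        if (k == 0 || n == max) return -1;
        i += k;
        out[n].field = (uint32_t)(key >> 3);
        out[n].wiretype = (uint32_t)(key & 7);
        switch (key & 7) {
        case 0: /* varint */
            out[n].offset = i;
            k = read_varint(buf + i, sz - i, &tmp);
            if (k == 0) return -1;
            out[n].len = k; i += k;
            break;
        case 1: /* fixed 64-bit */
            if (sz - i < 8) return -1;
            out[n].offset = i; out[n].len = 8; i += 8;
            break;
        case 2: /* length-delimited: bytes, string, sub-message */
            k = read_varint(buf + i, sz - i, &tmp);
            if (k == 0 || tmp > sz - i - k) return -1;
            out[n].offset = i + k; out[n].len = (size_t)tmp; i += k + tmp;
            break;
        case 5: /* fixed 32-bit */
            if (sz - i < 4) return -1;
            out[n].offset = i; out[n].len = 4; i += 4;
            break;
        default:
            return -1; /* groups (wire types 3/4) not handled in this sketch */
        }
        n++;
    }
    return n;
}
```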


Next, if you really do have such a need, decoding thousands upon thousands of data packets with similar structures, the following pattern is worth considering:

Suppose the type of the data block you want to decode is X. You can pre-generate an accessor for X: when your business logic needs the X.a.b field, it calls X.a.b(binary), where binary is the encoded JSON/protobuf data block and X.a.b() is a pre-generated accessor function that extracts exactly the required data from the data block in an optimized way.
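Continuing the wire-format sketch above (same hypothetical helpers, in the same file, plus a made-up schema X { A a = 1; } with A { int32 b = 2; }), such a generated accessor might look like this:

```c
#include <stdbool.h>

/* find one field's payload inside an encoded message; uses pb_scan above */
static bool pb_find(const uint8_t *buf, size_t sz, uint32_t field,
                    struct pb_slice *out) {
    struct pb_slice s[64]; /* fixed scratch; a generated accessor knows the bound */
    int n = pb_scan(buf, sz, s, 64);
    for (int i = 0; i < n; i++)
        if (s[i].field == field) { *out = s[i]; return true; }
    return false;
}

/* hypothetical generated accessor: extracts X.a.b straight from the encoded
   buffer, with no full unmarshaling and no allocation */
bool X_a_b(const uint8_t *binary, size_t sz, int32_t *value) {
    struct pb_slice a, b;
    uint64_t v;
    /* walk X.a (field 1, a sub-message), then read b (field 2, a varint) */
    if (!pb_find(binary, sz, 1, &a) || a.wiretype != 2) return false;
    if (!pb_find(binary + a.offset, a.len, 2, &b) || b.wiretype != 0) return false;
    if (read_varint(binary + a.offset + b.offset, b.len, &v) == 0) return false;
    *value = (int32_t)v;
    return true;
}
```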

We can go one step further and generate an index function for X, which preprocesses binary into an index structure of fixed memory size, speeding up access to specific fields.
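For instance, still building on the hypothetical pb_scan above, the index could simply be one fixed slot per field number, built once per packet so that every later field access is O(1):

```c
#include <string.h>

#define X_MAX_FIELD 16  /* highest field number in X's .proto (assumption) */

/* a fixed-size index over one encoded X: one slot per field number.
   slot[f].offset == 0 means absent (a payload never starts at offset 0,
   because its key byte precedes it). */
struct X_index {
    struct pb_slice slot[X_MAX_FIELD + 1];
};

int X_index_build(const uint8_t *buf, size_t sz, struct X_index *idx) {
    struct pb_slice s[64]; /* fixed scratch, same bound as the accessor sketch */
    int n = pb_scan(buf, sz, s, 64);
    if (n < 0) return -1;
    memset(idx, 0, sizeof(*idx));
    for (int i = 0; i < n; i++)
        if (s[i].field <= X_MAX_FIELD)
            idx->slot[s[i].field] = s[i]; /* last occurrence wins, per protobuf */
    return 0;
}
```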

How complicated this accessor object is to construct hardly matters, since it may be initialized only once in the entire program.
