Data Structure Optimization for an Ordered ID Sequence

Original link: https://blog.codingnow.com/2023/07/monotonic_array.html

In our ecs module, an important internal data structure is the array of eids. It is part of the Component structure and records which Entity each Component belongs to. Currently it is implemented as a sorted array of ids.

The common operations on this data structure are traversal, random access, and locating a given id. A sorted array handles all of these well: O(1) random access and O(log N) lookup.
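
For reference, the lookup on a flat sorted array is just a lower-bound binary search. A minimal sketch (illustrative names, not the actual module API):

```c
#include <stdint.h>
#include <stddef.h>

/* Return the position of id in the sorted array, or -1 if absent.
   Random access is plain indexing, so only the O(log N) lookup
   needs any code. */
static int
id_lookup(const uint32_t *ids, size_t n, uint32_t id) {
	size_t lo = 0, hi = n;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (ids[mid] < id)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (lo < n && ids[lo] == id) ? (int)lo : -1;
}
```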

Our Entity ID is 48 bits. Storing an array of 48- or 64-bit ids felt a bit wasteful to me, and the larger the footprint, the lower the practical efficiency. In practice the number of Entities will never be very large; I think limiting it to 2^24 (about 16 million) is enough. So an indirect index is used: a 24-bit internal id is stored, which in turn indexes the real entity id.
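
A hypothetical sketch of that indirection (names are mine, not the module's):

```c
#include <stdint.h>
#include <stddef.h>

static uint64_t *entity_id;   /* table of real 48-bit eids, at most 2^24 entries */

/* Unaligned little-endian load of a 3-byte internal id. */
static inline uint32_t
read_id24(const uint8_t *p) {
	return p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16;
}

/* Map a Component's slot to its real entity id via the 24-bit index. */
static inline uint64_t
component_eid(const uint8_t *ids, size_t i) {
	return entity_id[read_id24(ids + i * 3)];
}
```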

Three bytes for the internal id sounds a bit awkward, since memory accesses are unaligned. But 4 bytes is somewhat wasteful, and I worried that 2 bytes might not be enough. So, last month, I tried a small optimization:

Group the data in fixed blocks of 32 elements (the block size could be up to 256). Each group is a 64-bit header plus 32 × 8 bits. The header holds the high 40 bits of the first element, followed by the low 8 bits of each element. A group divides into three sections: 1. elements whose high bits equal the first element's; 2. elements whose high bits are one greater; 3. the remaining, discrete data. Two bytes in the header record the lengths of sections 1 and 2, and 16 bits hold the offset of the discrete data.

All discrete data is stored separately, in order, in an array of 40+24-bit entries: the high 40 bits of the id plus its 24-bit position. Each group's discrete segment is contiguous in this array and can be indexed directly via the offset in the header. So any random access is strictly O(1).
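
Reconstructed from the description above, the layout might look like this; plain struct fields stand in for the packed 64-bit header, and all names are my own:

```c
#include <stdint.h>

/* One group covers 32 ids: a packed header plus the low byte of
   each element. */
struct packed_group {
	uint64_t high40;       /* high 40 bits of the first element */
	uint8_t n_same;        /* section 1: same high part as element 0 */
	uint8_t n_plus1;       /* section 2: high part + 1 */
	uint16_t discrete_off; /* where this group's discrete ids start */
	uint8_t low[32];       /* low 8 bits of all 32 elements */
};

/* Discrete ids live out of line, contiguous group by group, as
   (high 40 bits, 24-bit position) pairs. */
struct discrete_id {
	uint64_t high40 : 40;
	uint64_t index : 24;
};
```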

In the worst case, the structure degenerates into a 64-bit array of size n × 31/32 plus a fixed overhead of 10 bits per element. The number of comparisons for binary search does not get any worse.

I went ahead and implemented it, but the resulting code turned out far too complicated. Although the big-O estimates were fine, in practice it was much slower than a flat array. So I dropped the idea.

Of course, the more important reason I gave up is this: measurements of our actual game show the peak number of Entities is only about 10,000, nowhere near the million level the design assumed. And if the number of entities can be kept below 64K, a 2-byte internal id can be used directly.


So I turned to another question:

A flat sorted array has very poor insertion and deletion performance: O(n). Is there room for optimization here?

Of course, our ecs system does not allow inserting a Component (an Entity must fix its Component composition when it is created), so insertion is a non-issue in practice. Deletion is batched per frame: when a frame deletes all of its dead Components in one pass, the overhead is O(n) rather than O(n^2).
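
For illustration (this is not the actual ecs API), such a batched removal is a single compaction pass over the pool:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include <string.h>

/* Remove every Component whose owner is dead, in one O(n) pass.
   eid[] and the data blob are kept in step; returns the new count. */
static size_t
remove_dead(uint32_t *eid, void *data, size_t stride, size_t n,
            bool (*dead)(uint32_t id)) {
	char *buf = data;
	size_t out = 0;
	for (size_t i = 0; i < n; i++) {
		if (!dead(eid[i])) {
			if (out != i) {
				eid[out] = eid[i];
				memmove(buf + out * stride, buf + i * stride, stride);
			}
			out++;
		}
	}
	return out;
}
```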

But the tag is an exception. A Component without data is called a tag; it is used only for Entity filtering, and tags can be added and removed dynamically. I made some optimizations specifically for them a couple of years ago: deletion can be reduced to O(log N), and addition is O(n) at worst but usually cheaper.

How to optimize insertion and deletion in a sorted array is actually a classic question. The best-known answers are B-trees and skip lists.

The idea of the B-tree is: store the sequence in segments, and leave each segment deliberately unfilled. An insertion then usually only moves data within one segment instead of reshuffling the entire array.
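
A minimal sketch of that idea, with sizes and names of my own choosing:

```c
#include <stdint.h>
#include <string.h>

#define SEG_CAP 64

/* A segment deliberately keeps n below SEG_CAP, so insertions
   have slack to move into. */
struct segment {
	int n;
	uint32_t v[SEG_CAP];
};

/* Insert id at position pos of one segment: at most SEG_CAP moves,
   no matter how long the whole sequence is. */
static void
seg_insert(struct segment *s, int pos, uint32_t id) {
	memmove(&s->v[pos + 1], &s->v[pos], (s->n - pos) * sizeof(uint32_t));
	s->v[pos] = id;
	s->n++;
}
```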

The idea of the skip list is: store the sequence in a linked list, so that once a position is found, an element can be inserted or deleted at O(1) cost. However, a linked list does not support random access; indexing a specific position costs O(n). To solve this, space is traded for time: every few elements, nodes are also linked into a number of higher-level lists. A random access then iterates from the highest level down to the lowest, reaching the target in about O(log N).
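
A minimal sketch of the skip-list search, for illustration:

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_LEVEL 16

/* Each node carries several forward pointers; higher levels skip
   over more elements. */
struct slnode {
	uint32_t id;
	struct slnode *next[MAX_LEVEL];
};

/* Descend from the highest level to the lowest; roughly O(log N). */
static struct slnode *
sl_find(struct slnode *head, int level, uint32_t id) {
	struct slnode *x = head;
	for (int k = level - 1; k >= 0; k--) {
		while (x->next[k] && x->next[k]->id < id)
			x = x->next[k];
	}
	x = x->next[0];
	return (x && x->id == id) ? x : NULL;
}
```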

The problem I face is that the data is not large: from hundreds to tens of thousands of elements, rarely more than 100,000. It is entirely possible to design a simpler, more efficient structure for this specific case. That is what I built today; it is something like a combination of the B-tree and skip-list ideas:

I group the sequence in blocks of 256 and store it in sections. Each group has a small header, and the 256 slots of a group form a full circular queue. Only the last group is an exception: it may be partially filled.

When we do random access, we first compute which group the index falls in; then a simple modulo operation, combined with the head position of the circular queue, locates the element.
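
A sketch of that access path, with names of my own (16-bit slots are shown here; their exact encoding is described below):

```c
#include <stdint.h>

struct group {
	uint8_t head;        /* start of the circular queue inside data[] */
	uint16_t data[256];  /* one full circular queue per group */
};

/* Element i of one group: uint8_t arithmetic wraps mod 256 for free. */
static inline uint16_t
group_at(const struct group *g, int i) {
	return g->data[(uint8_t)(g->head + i)];
}

/* Element at a global index: pick the group, then wrap inside it. */
static inline uint16_t
sequence_at(const struct group *groups, int index) {
	return group_at(&groups[index / 256], index % 256);
}
```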

When deleting an element, first remove it within its group, shifting at most 127 elements (forward or backward, whichever side is shorter), and then move the first element of each subsequent group into the group before it, in turn.

Adding an element is the inverse of deletion; it likewise moves at most 127 + n elements, where n is the number of groups.
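
Continuing the sketch above, deletion might look like this; bookkeeping for the final, partially filled group is left out:

```c
/* Delete the element at logical position index. Inside its group,
   shift whichever side is shorter (at most 127 moves); afterwards
   every later group donates its first element to its predecessor,
   one copy plus a head bump apiece. */
static void
sequence_delete(struct group *groups, int ngroup, int index) {
	struct group *g = &groups[index / 256];
	int slot = index % 256;
	if (slot < 128) {
		/* shift the front part up by one, then advance head */
		for (int i = slot; i > 0; i--)
			g->data[(uint8_t)(g->head + i)] = g->data[(uint8_t)(g->head + i - 1)];
		g->head++;
	} else {
		/* shift the tail part down by one */
		for (int i = slot; i < 255; i++)
			g->data[(uint8_t)(g->head + i)] = g->data[(uint8_t)(g->head + i + 1)];
	}
	/* each later group's first element moves into the freed last
	   slot of the group before it */
	for (int gi = index / 256 + 1; gi < ngroup; gi++) {
		struct group *prev = &groups[gi - 1], *cur = &groups[gi];
		prev->data[(uint8_t)(prev->head + 255)] = cur->data[cur->head];
		cur->head++;
	}
}
```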


Because in most cases we do not have that many entities, the internal ids stay small, and I think 2 bytes are enough. Still, I want to support more than 64K of them. So I divide the groups described above into two kinds: dense and sparse.

In a so-called dense group, the smallest and largest elements differ by less than 64K, so a 32-bit base is kept in the header and the chunk itself stores int16 deltas: signed values ranging from -32K to +32K. If a group cannot be stored in this form, a sparse group is used instead, holding 256 raw 4-byte ids directly. A sparse chunk takes exactly twice the memory of a dense one.
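
Concretely, the two flavors might be laid out like this; field names and the header layout are my guesses:

```c
#include <stdint.h>

struct group_header {
	uint32_t base;   /* dense groups only: stored id = base + delta */
	uint8_t head;    /* head of the circular queue, wraps mod 256 */
	uint8_t sparse;  /* which flavor the chunk is */
	void *chunk;     /* 512 bytes of int16 deltas, or 1024 bytes of uint32 ids */
};

/* Read the id at logical slot (0..255) of one group. */
static inline uint32_t
group_id(const struct group_header *h, int slot) {
	int i = (uint8_t)(h->head + slot);
	if (h->sparse)
		return ((const uint32_t *)h->chunk)[i];
	return h->base + ((const int16_t *)h->chunk)[i];
}
```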

To manage these two kinds of groups, I use a freelist whose nodes are the size of a sparse chunk (1024 bytes); a single node can hold two dense chunks at once.
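
A minimal freelist along those lines (illustrative):

```c
#include <stdlib.h>

#define NODE_SIZE 1024  /* one sparse chunk, or two dense chunks */

union node {
	union node *next;   /* link while the node sits on the freelist */
	unsigned char bytes[NODE_SIZE];
};

static union node *freelist = NULL;

static void *
node_alloc(void) {
	union node *n = freelist;
	if (n) {
		freelist = n->next;
		return n;
	}
	return malloc(sizeof(union node));
}

static void
node_free(void *p) {
	union node *n = p;
	n->next = freelist;
	freelist = n;
}
```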

Most ids live in dense groups, and two adjacent dense groups occupy one node as partners. When a group must be promoted from dense to sparse, first check whether it has a partner. If it does, mark the partner as single and allocate an independent new node from the freelist for the promoted group; if it does not, the group simply takes over its current node.

Downgrading from sparse back to dense is just the same procedure in reverse.
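
A sketch of the promotion path under the assumptions above, in particular that a lone group's chunk sits at the start of its node, and glossing over strict-aliasing details:

```c
/* Promote one group from dense to sparse. With a partner, the
   promoted group moves to a fresh node and the partner keeps the
   old node to itself; without one, the group widens in place. The
   conversion runs back to front so no 16-bit delta is overwritten
   before it has been read. Demotion is these steps in reverse. */
static void
promote_to_sparse(struct group_header *h, struct group_header *partner) {
	const int16_t *src = (const int16_t *)h->chunk;
	uint32_t *dst = partner ? node_alloc() : (uint32_t *)h->chunk;
	for (int i = 255; i >= 0; i--)
		dst[i] = h->base + (int32_t)src[i];
	h->chunk = dst;
	h->sparse = 1;
	/* if there was a partner, it is now marked single and owns the
	   old node alone (bookkeeping not shown) */
}
```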


Today I implemented it in C. At -O3, random access is nearly as fast as with a flat, plain array, while binary search is about 5% to 10% slower; the larger the array, the smaller the gap. I think this is because a traditional array spends 32 bits per element while most of this implementation uses 16, which is friendlier to memory access. Moreover, the base carried in each header helps with the coarse first step of a search.
