A VLA (dynamic-length array) implementation

Original link: https://blog.codingnow.com/2022/06/vla.html

VLA (dynamic-length array) is a very convenient language feature added by the C language after C99, but MSVC has clearly not supported VLA. And Linux kernel code used to use VLA, and now VLA has been removed. It appears that VLAs pose more security concerns than convenience.

However, in daily development in C language, variable-length arrays are often needed. Since there are many problems with VLA directly using C language, it is better to implement an additional one. C does not have template support like C++, and it is difficult for general-purpose VLA implementations to be type-safe. Even with C++, vector in STL, the most commonly used VLA implementation, is not always suitable for application scenarios. For example, std::vector’s data is generally allocated on the heap, not the stack. It may have performance implications and may create more heap fragmentation than an array created natively on the stack.

I think a generic VLA library should do:

  1. Strongly typed support, and no need to specify the data type every time the related API is called.
  2. When we use VLA on the stack, we should try to get as close to the performance of the native array as possible and avoid allocating memory from the heap as much as possible.
  3. VLAs make it easy to pass references between functions.
  4. References to VLAs are persistent.
  5. Accessing VLA data can be done inline, avoiding extra function calls as much as possible.

Since my C code interacts a lot with Lua, it would be a nice bonus if the VLA could use Lua to manage memory instead of using C’s memory heap directly. In this way, when VLA is temporarily used in Lua functions, small blocks of memory can be allocated directly on the C Stack; large blocks of memory can be used Lua’s temporary userdata, and then handed over to Lua’s GC for recycling. We don’t have to explicitly destroy the VLA object when we exit the function call.

I have written several versions of the VLA library before and I am not very satisfied. Recently, I found some skills and re-implemented a version. The above requirements are basically satisfied.

The first tricky question is, how can the C language achieve a general and type-safe VLA? I solved it like this:

For a VLA object, it is actually divided into two parts, one is the pointer used for data access, which I call the accessor, and the other is the data of the VLA itself, including actual data and metadata. Metadata generally includes the size and capacity of the array, and the actual data is generally stored in contiguous memory space.

The accessor itself needs to conform to the type of the data inside the VLA. The C language is weak in generic support, and the traditional method is to use void * to refer to the data area, which makes many APIs implemented by VLA difficult to use.

My VLA implementation introduces two concepts. For the VLA object itself, I use an abstract vla_handle_t type to hold the reference, and the accessor uses the raw pointer, which is directly the type of the internal data. Because the accessor itself can be constructed from the handle very cheaply, it does not need to be persisted. This leaves room for macro tricks.

So, the final API is vla_using(name, type, handle) . This is a macro whose role is to create an accessor named name on the stack and associate the related VLA object with handle. The following is a simplified schematic version (actually more complicated than this because of the need to implement more functions):

 #define vla_using(name, type, handle) \     type * name; \     vla_handle_t * name##_ref_ = &handle; \     init_vla_accessor((void **)&name, name##_ref_); 

When using this api, the stack actually has two more variables. One is an accessor with a user-specified name, which is a native pointer, so when accessing data, it is no different from a native array; at the same time, a reference to the VLA object itself is recorded on the stack. When we want to operate on the VLA object, we can access it. For example, the API for finding VLA size is also a macro:

 #define vla_size(name) vla_size_(name##_ref_) 

This macro forwards the operation of the pointer accessor to the corresponding VLA object handle.

The second question is how to balance the memory usage of heap and stack.

If there is only one VLA data structure, when we use it on the stack, we tend to reserve a temporary space of several hundred bytes in the data structure. If the data does not exceed this space, there is no need to allocate heap memory. After the function call ends, the stack space is reclaimed immediately. But if we need persistent references to VLA objects, the extra space reserved is too wasteful. Of course, two sets of implementations can be explicitly made, and the users who use them use the corresponding solutions according to the occasion. However, it is still necessary to unify the two, so that there can be a consistent interface. When the module does not care about the details, it can be used consistently.

So I chose to use a handle to represent the VLAs of different implementations.

The third problem is related to Lua. If the temporary memory space on the stack exceeds, we need to apply for additional memory to maintain a larger VLA. In a typical implementation, this additional memory is allocated from the C memory heap. Correspondingly, when the function returns, the memory needs to be released. If we temporarily create userdata of lua, there will be no such trouble. After Lua exits the C function, those temporarily constructed userdata will lose their reference and then be reclaimed by the lua GC. In Lua 5.4, with generational GC enabled, this collection is very timely.

So, we also need a third VLA implementation: using Lua to manage memory.

Lua’s memory management is very different from traditional C programs. It is not paired with allocation/deallocation calls, but around references. When we do Lua binding for C modules, we often encounter a situation where a VLA object is required in a fixed C Struct. At this time, if the VLA also uses Lua to manage the memory lifetime, there is no need to add additional gc metamethods to C Struct, but only the userdata used by the VLA object is referenced on the host object through uservalue.

Our VLA module needs to be compatible with this usage.

In summary, I have initially implemented a version .

This article is reproduced from: https://blog.codingnow.com/2022/06/vla.html
This site is for inclusion only, and the copyright belongs to the original author.

Leave a Comment