2023-18: Increase OpenDAL KV performance by 1000%

Original link: https://xuanwo.io/reports/2023-18/

This weekly report shares a piece of low-hanging fruit in OpenDAL: KV performance increased by 1000% after removing extra copy overhead!

Background

OpenDAL added the Kv Adapter a long time ago. It abstracts the common GET/SET operations of key-value storage backends, greatly reducing the cost of integrating a kv service: maintainers only need to implement functions such as GET/SET/DELETE to connect storage backends like Redis or HashMap:

#[async_trait]
pub trait Adapter: Send + Sync + Debug + Unpin + 'static {
    async fn get(&self, path: &str) -> Result<Option<Vec<u8>>>;

    async fn set(&self, path: &str, value: &[u8]) -> Result<()>;

    async fn delete(&self, path: &str) -> Result<()>;
}
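
For example, a purely in-memory backend needs little more than a map. The sketch below is hypothetical and simplified (it assumes the Adapter trait and Result alias from the snippet above are in scope, and uses a plain Mutex<HashMap> rather than OpenDAL's actual implementation), but it shows the shape of the work and already hints at the copies discussed below:

use std::collections::HashMap;
use std::sync::Mutex;

use async_trait::async_trait;

#[derive(Debug, Default)]
pub struct MemoryAdapter {
    inner: Mutex<HashMap<String, Vec<u8>>>,
}

#[async_trait]
impl Adapter for MemoryAdapter {
    async fn get(&self, path: &str) -> Result<Option<Vec<u8>>> {
        // Cloning the Vec copies every byte back out of the map.
        Ok(self.inner.lock().unwrap().get(path).cloned())
    }

    async fn set(&self, path: &str, value: &[u8]) -> Result<()> {
        // `to_vec` copies the caller's buffer into a fresh allocation.
        self.inner
            .lock()
            .unwrap()
            .insert(path.to_string(), value.to_vec());
        Ok(())
    }

    async fn delete(&self, path: &str) -> Result<()> {
        self.inner.lock().unwrap().remove(path);
        Ok(())
    }
}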

But for purely in-memory data structures, the Kv Adapter is not zero-overhead:

  • Writing requires multiple extra copies of the data
  • Reading copies all of the data back out

On top of that, the current Kv Adapter can only store byte streams and cannot attach any metadata, which limits its use cases unnecessarily.

Improvement

The issue kv: Use Bytes as value to avoid copy between read/write proposes storing Bytes in the map instead of Vec<u8> to avoid extra copies, and Alternative implementation for memory backend further points out that the map can store a custom data structure to which metadata can be attached.

Combining these two ideas, I propose a new Typed Kv Adapter:

#[derive(Debug, Clone)]
pub struct Value {
    pub metadata: Metadata,
    pub value: Bytes,
}

#[async_trait]
pub trait Adapter: Send + Sync + Debug + Unpin + 'static {
    async fn get(&self, path: &str) -> Result<Option<Value>>;

    async fn set(&self, path: &str, value: Value) -> Result<()>;

    async fn delete(&self, path: &str) -> Result<()>;
}

The new Value struct carries metadata alongside the data, and stores the data as Bytes instead of Vec<u8>. Because Bytes can be cloned without copying the underlying buffer, the Adapter can read and write data without any extra copies.
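
The key property here is that Bytes from the bytes crate is reference-counted: cloning it only bumps a counter and shares the underlying buffer, whereas cloning a Vec<u8> allocates and copies every byte. A quick standalone illustration (not from the OpenDAL codebase):

use bytes::Bytes;

fn main() {
    // 16 MiB payload.
    let payload = vec![0u8; 16 * 1024 * 1024];

    // Cloning a Vec<u8> allocates a new buffer and copies all 16 MiB.
    let vec_clone = payload.clone();
    assert_ne!(payload.as_ptr(), vec_clone.as_ptr());

    // Converting into Bytes reuses the Vec's allocation, and cloning
    // the Bytes only increments a reference count.
    let bytes = Bytes::from(payload);
    let bytes_clone = bytes.clone();
    assert_eq!(bytes.as_ptr(), bytes_clone.as_ptr()); // same underlying buffer
}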

Migrating moka

Migrating a service from kv::Adapter to typed_kv::Adapter is straightforward. First, change the field definition:

- inner: SegmentedCache<String, Vec<u8>>,
+ inner: SegmentedCache<String, typed_kv::Value>,

Then update the Adapter methods accordingly, for example:

- async fn get(&self, path: &str) -> Result<Option<Vec<u8>>>
+ async fn get(&self, path: &str) -> Result<Option<typed_kv::Value>>
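
Filling in the method bodies is equally mechanical. The sketch below is hypothetical (names and error handling are simplified compared to the real services/moka code), assuming the adapter struct wraps the SegmentedCache field shown above:

async fn get(&self, path: &str) -> Result<Option<typed_kv::Value>> {
    // moka returns a clone of the cached value; cloning typed_kv::Value
    // is cheap because its Bytes payload is reference-counted.
    Ok(self.inner.get(path))
}

async fn set(&self, path: &str, value: typed_kv::Value) -> Result<()> {
    // The value is moved into the cache; no byte copy is involved.
    self.inner.insert(path.to_string(), value);
    Ok(())
}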

Results

In the demo PR, I migrated moka from kv::Adapter to typed_kv::Adapter. Read and write performance improved significantly across all benchmarks: reads improved by at least 10% and by up to 1519.8%, while writes benefited from the zero-copy Bytes type and saw a 166323% improvement on service_moka_write_once/16.0 MiB.

Next steps

Next, OpenDAL will migrate all in-memory backends to typed_kv::Adapter and add support for more kv backends. After that, we can run a side-by-side read/write benchmark to see which in-memory backend performs best in OpenDAL's standard usage scenarios.
