A Lock-Free Sorted List, my implementation

Chris Benesch
7 min readAug 14, 2021

The holy grail of modern computing is the concept of a lock free list. This is a list of objects in memory that does not have to wait for other processes to complete to add, read, or delete from it. It is commonly referred to as a ring buffer, however the only difference between what I needed and that is that I didn’t tie the head to the tail, so its a list.

I needed a very simple data structure, a time_t for a time index into an array of metadata represented as a JSON string in Qts QByteArray object. All the while, multiple producers, multiple consumers, and a cleaning thread deleting and moving off expired entries into permanent storage (BDB). Being sorted would be a huge improvement speed wise, as the common case was an entry gets added, and another process comes along soon and gets it.The biggest thing to consider is not necessarily the edge cases, but the common use cases and optimize for that. I knew the lock free list existed but I could only find scant resources on it and nothing seemed to really “fit” what I needed so I set about writing one that worked and hopefully sharing my implementation will help someone.

The theory behind it is a singly linked list with atomic pointers to the next item in the list. Now why atomic pointers instead of just mutex guarding sensitive code? For one, multiple threads can read and write at the same time. To maintain integrity with mutexes, a lock would need to be obtained for each operation. Even then, sometimes mutexes can be defeated. Also, mutexes are implemented at the operating system level whereas atomic operations are implemented at the hardware level. Although the operations are short in duration, every thread wanting access will have to wait its turn in a mutex world. In the lock free world, everyone can do what they need to and wait for only a single memory address to be written to. You get 2, maybe 3 threads simultaneously working on it before saturation occurs and you spend more time locked than working. Using this implementation, our code which had close to a gigabyte of memory usage and 20–30% CPU usage went to barely noticeable CPU and 1/4 the memory (less copying).

First of all, I wrote it as a template with C++11/14 constructs, so you can have any data type you want being used, I figured this may be useful to others.

#ifndef LOCKFREESORTEDLIST_H
#define LOCKFREESORTEDLIST_H
#include <atomic>
#include <vector>

template <class K,class v>
class LockFreeSortedList
{
public:
LockFreeSortedList():head(nullptr),m_size(0),m_transcount(0) {};
~LockFreeSortedList() {clear();}
struct LockFreeSortedListItem
{
K key;
v value;
std::atomic<struct LockFreeSortedListItem *> next_ptr;
LockFreeSortedListItem(K mkey,v mvalue):key(mkey),value(mvalue),next_ptr(nullptr){}
};

First off, we have a standard header guard, and a class with two template parameters, K and v representing key and value. I also have a size and transaction count. I included atomic for obvious reasons and vector for returns from queries. There is also a private struct used to represent the data itself being stored. As you can tell it is a singly linked list. A doubly linked list introduces a time window for something to go amiss when adjusting both values. At any time you can have changes to the data, but using atomic values, you can ensure they are going to be correct and not junk data, ie dangling pointers.

The clear function is interesting and illustrates that idea:

/**
* @brief clear — clears the list
*/
void clear() {
struct LockFreeSortedListItem *p = head.load(),*n = nullptr;
head = nullptr;
while (p)
{
n = p->next_ptr;
p->next_ptr = nullptr;
delete p;
p = n;
if (!p) // At end, lets see if new stuff came in
{
p = head.load();
head = nullptr;
}
}
m_size.store(0);
m_transcount.store(0);
}

In this case, when hitting the end, it assumes something else may have come along and added items. For retrieving items, we cant rely on pointers as they may become deleted in the intervening time, so although a little expensive, we copy the values themselves to a return array:

/**
* @brief findRange find a range of values given a key range
* @param begin beginning value of key to search for <= 0 means no beginning
* @param end ending value of key to search for <= 0 means no end
* @return a vector of const pointers to values inside the list
*/
std::vector<v> findRange(K begin,K end, bool *scanNeeded = nullptr)
{
m_transcount.fetch_add(1);
struct LockFreeSortedListItem *p = head;
std::vector<v> rv;
if (scanNeeded)
*scanNeeded = true;
while (p)
{
K ik = p->key;
if ((ik == 0 || ik >= begin || begin <= 0) && (ik == 0 || ik <= end || end <= 0))
{
v rba(p->value);
rv.push_back(rba);
} else {
if (ik < begin && scanNeeded)
*scanNeeded = false;
}
p = p->next_ptr;
}
return rv;
}

The purpose of the scanNeeded variable is to indicate that the data we are interested in was all found in memory, or further querying against permanent storage is needed. I personally needed this as the data sets we use can be very large. So far all of this has been relatively similar to standard singly linked list data structures, but the insert is where the magic happens:

/**
* @brief insert creates a new unique node in the list
* @param key
* @param value
* @return true if inserted false if not
*/
bool insert(K key,v value)
{
m_transcount.fetch_add(1);
struct LockFreeSortedListItem *p = head.load();
bool success = false;
if (!p) // new list
{
head.store(new LockFreeSortedListItem(key,value));
m_size.fetch_add(1);
return true;
} else if (p->key < key) { // new head node candidate
struct LockFreeSortedListItem *newItem = new LockFreeSortedListItem(key,value);
newItem->next_ptr = p;
if (!head.compare_exchange_strong(p,newItem))
{
delete newItem;
return insert(key,value); // eventually it will work
}
m_size.fetch_add(1);
return true;
}
// Walk down to find value to stick it between
while (!success)
{
struct LockFreeSortedListItem *n = nullptr;
while (p)
{
if (p->key == key && p->value == value)
return false;
n = p->next_ptr;
if (!n)
break;
if (n->key < key) // Next pointer is lower, so this pointer is the one to insert after
break;
p = n;
}
struct LockFreeSortedListItem *newItem = new LockFreeSortedListItem(key,value);
newItem->next_ptr.store(n);
// We have our new object set up pointing to the next one
// If its changed, we just throw it away and try again.
// compare exchange will do it atomically only if nothing has changed
if (!p->next_ptr.compare_exchange_strong(n,newItem))
{
delete newItem;
p = head.load();
} else {
success = true;
m_size.fetch_add(1);
}
}
return success;
}

In this case, to maintain a sorted list, I have each item insert between the next higher value and the equal or less than value. The key is to set up the entire record beforehand, and swap the pointers once and be done (compare_exchange_strong). That is why there are two pointers in there (p,n). The comparison is between the last known value and the current value. If it has changed, we knew we need to start over. That scenario doesn’t happen very often, but enough to be accounted for. If the expected value matches, the pointer is swapped to the new object which already points to the old next one in line. Now delete is a little complex as well, the purpose is to isolate the node we want to delete then “delete” it.

/**
* @brief remove remove a node with key and value combination
* @param key
* @param value
* @param hints — an optional vector of pointers to internal items to try starting the search at
*/
void remove(K key,v value, std::vector<LockFreeSortedListItem *> *hints = nullptr)
{
m_transcount.fetch_add(1);
struct LockFreeSortedListItem *p = head.load();
struct LockFreeSortedListItem *n = nullptr,*i = nullptr;
bool found = false;

// Check our passed in hints to make this a lot faster
if (hints)
{
for (auto it = hints->begin(); it < hints->end(); it++)
{
i = dynamic_cast<struct LockFreeSortedListItem *>(*it);
if (!i)
continue;
struct LockFreeSortedListItem *j = i->next_ptr;
if (i && j && j->key == key && j->value == value)
{
struct LockFreeSortedListItem *j = i->next_ptr;
if (i->next_ptr.compare_exchange_strong(j,j->next_ptr))
{
p = i;
n = j;
found = true;
m_size.fetch_sub(1);
break;
} else {
break;
}
}
}
}
// Test for head node needing removed
if (!found && p && p->key == key && p->value == value)
{
n = p->next_ptr;
if (head.compare_exchange_strong(p,n))
{
n = p;
found = true;
m_size.fetch_sub(1);
}
}

// Loop till we find it
while (p && !found)
{
n = p->next_ptr;
if (!n)
{
return; // Never found and at end of list, return
}
if (n->key == key && n->value == value)
{
// We found it, n points to the item to remove, p points to previous
if (!p->next_ptr.compare_exchange_strong(n,n->next_ptr)) // Item changed
{
p = head.load(); // start again
} else {
found = true;
m_size.fetch_sub(1);
break; // Exchange happened, n is isolated node now
}
}
p = n;
}
if (found)
delete n;
}

As you may notice I also added a pointer to a vector of “hints” to search for. Let me explain why. Since this is a dynamic singly linked list, there is no “end” pointer, and the removal use case was usually a process scanning through the whole list and wiping out items that shouldn’t be in there anymore. Therefore, I added the “hints” vector so that as you got further and further along, you would not only get more items needing to be removed, but a lot of needless scanning to get there in the first place. That is approaching O(n² ) performance, which is a no-no in computer science. (Unfortunately sometimes there is no good way around it, but not in this case).

For a full example please see: https://github.com/beneschtech/LockFreeSortedList . Its a quick little example program and assumes a *nix environment with gcc, clang or mingw, etc.. and standard make.

Anyway, hopefully this is helpful to someone, and if you have ideas or see holes where this wont work, please dont hesitate to contact me or better yet, make a pull request.

--

--

Chris Benesch

Professional software engineer. Math enthusiast. Entrepreneur at heart. http://www.beneschtech.com