Crafting Cross-platform Memory Allocator in C



This content originally appeared on DEV Community and was authored by Unknown Rori

Trinity Club meeting from Blue Archive

Have you ever wondered on how memory allocation work in most programming languages? Most of the time we do not care about how memory are being handled because we take these feature for granted.

So today in this article we are going to discuss on how memory allocation work by requesting to our operating system, and by doing that we also going to make this allocator work cross-platform.

Table of Content

  • Interlude
  • Getting Started
  • Windows Memory Allocation
  • Linux Memory Allocation
  • Malloc and Free
  • Finishing Thought

Interlude

First off we should install our tools for windows you can install Visual Studio or Msys2, in Linux feel free to use gcc or clang.

So what is allocator?

Arona from Blue Archive

Allocator is a mechanism that manage and distribute memory to a program, but… why do we need it?

Our program might uses memory more than we original asked for, for example our array might need to be resized, and it’s not so efficient if we request memory from OS every time we need.

Why bother on creating this thing?

There are some cases when we are limited on memory space on the hardware or we are our own without help of standard library, or we love learning stuff.

Getting Started

Okay, so far so good now, let’s start creating our project, let’s call it blog_malloc, but before that let’s create a abstraction or should I say helper function that return a requested memory from operating system.

Millenium Gamepad from Blue Archive

First off, we should create file called blog_memory.h we are going to use concept called stb, why we should do this? because it’s much simpler than handling complex dependency if we can just include single header file right?

Let’s start by setting up classic macro so we didn’t accidentally redefine it when we import it on our project along with macro for default implementation.

#ifndef     BLOG_MEMORY_H
#define     BLOG_MEMORY_H

#ifdef BLOG_MEMORY_IMPLEMENTATION
// our implementation detail
#endif

#endif      // BLOG_MEMORY_H

Our ifndef is compile time macro that everything inside it will exist if BLOG_MEMORY_H not defined, but when we define it, it will not include it again, and then we got ifdef this macro is the same as ifndef but in reverse if the BLOG_MEMORY_IMPLEMENTATION is defined then everything inside of it till #endif will exist.

‘Kay now we need to define struct that will hold our memory page and chunk, but what is page and chunk?

Ui from Blue Archive

There is quite loose definition for Chunk but Chunk is just data either in the disk or RAM or the thing we are processing, and for Page is block of memory in RAM that usually be aligned within 4 Kib and it handled by the operating system.

But why do we need to align it?

Hina from Blue Archive
We need these alignment because our OS or should I say our hardware always read on these increment (4 – 8 bytes based on architecture), so if we miss align our memory our hardware require 2 read access to our memory which can be slow, and some hardware expect aligned data as well such as 128 bit operation.

All right time to write our abstraction, firstly we need our malloc and free function definition, we can just copy from stdlib pretty straight forward.

// stuff

#include <stddef.h>

void* blog_malloc(size_t size);
void blog_free(void* ptr);

#ifdef BLOG_MEMORY_IMPLEMENTATION
// our implementation detail
#endif // BLOG_MEMORY_IMPLEMENTATION

// stuff

Good, now we need a way for us to know what the state of memory we requested since we are the one who manage it of course. We can define a struct that act like a header for our memory page or chunk, let’s put it inside our implementation.

// stuff
typedef struct blog_memory_chunk {
    size_t  capacity;
    char    flags;

    struct blog_memory_chunk* next;
} BlogMemoryChunk;

typedef struct blog_memory_page {
    size_t  capacity;

    struct blog_memory_page* next;
    BlogMemoryChunk*         child;
} BlogMemoryPage;
// stuff

As you might notice that our memory chunk has flags which will we use to check if the chunk is in use or not, these struct will act like header for our memory.

We should also put a constants for our in use flag.

#define BLOCK_USE               0b10000000

The concept we are going to use is Linked List because since we are going to have many of our BlogMemoryChunk and BlogMemoryPage we can’t store it as an array because we don’t know the exact size, but with Linked List we can use pointer to point the next data, let me visualize it.

Linked List Visualization

As you can see our ptr will point to next data until the ptr is NULL which we can confidently say that our list is finished, this approach is simpler but if you are cool guy who worried about performance the big O notation of this algorithm is O(n).

Right, next stop is we need function that abstract our call to OS for new memory and put it inside our implementation, let’s call it blog_request_memory_page and blog_free_memory_page for freeing the memory.

// stuff

#ifdef BLOG_MEMORY_IMPLEMENTATION

BlogMemoryPage* blog_request_memory_page(size_t capacity)
{
    // TODO: Implementation
}

void blog_free_memory_page(BlogMemoryPage* ptr)
{
    // TODO: Implementation
}


#endif // BLOG_MEMORY_IMPLEMENTATION


// stuff

We also accepting capacity but we will need to align later, let’s create a macro that handle these memory alignment and put it above our memory header struct.

#define ALIGN(x, align) (((x) + ((align) - 1)) & ~((align) - 1))

These simple macro will help us to align given x with size of alignment, so if we put x is 4 and our alignment is 64, these macro will return 64 because 4 is below 64, and these alignment only work properly in the power of 2.

So now, how do we make this function to be able to operate different operating system seamlessly? that’s right we can use macro for it, we can check the macro that identify OS in this awesome resource.

// stuff
BlogMemoryPage* blog_request_memory_page(size_t capacity)
{
#if defined(_WIN32) || defined(_WIN64)
    // TODO : Implementation
#elif defined(__linux__)
    // TODO : Implementation
#else
    #error "Unsupported OS"
#endif
}

void blog_free_memory_page(BlogMemoryPage* ptr)
{
#if defined(_WIN32) || defined(_WIN64)
    // TODO : Implementation
#elif defined(__linux__)
    // TODO : Implementation
#else
    #error "Unsupported OS"
#endif
}
// stuff

If you are not sleeping right now, the defined is just a way for us to check if the _WIN32 is defined by our compiler or by ourself, and error is just way for us to tell the compiler if you reach this part then throw an error.

We also need to put this macro to have conditional import on our implementation.

#if defined(_WIN32) || defined(_WIN64)
    // TODO
#elif defined(__linux__)
    // TODO
#else
    #error "Unsupported OS"
#endif

All right onto next part – Windows | Linux.

Happy Fubuku

Windows Memory Allocation

Now we come to fun part, how do we request memory in Windows?

Yuzu from Veritas

Looking at microsoft documentation we can use VirtualAlloc to request memory from windows and VirtualFree for freeing memory, we will use MEM_COMMIT to commit our memory for usage and PAGE_READWRITE to allow us to read and write onto that memory.

BlogMemoryPage* blog_request_memory_page(size_t capacity)
{
#if defined(_WIN32) || defined(_WIN64)
    capacity = ALIGN(capacity + sizeof(BlogMemoryPage), 4096);

    BlogMemoryPage* page = (BlogMemoryPage*)VirtualAlloc(NULL, capacity, MEM_COMMIT, PAGE_READWRITE);


    if (page == NULL) return NULL;

    page->capacity = capacity - sizeof(BlogMemoryPage);
    page->child = NULL;
    page->next = NULL;

    return page;
#elif defined(__linux__)
    // Implementation
#else
    #error "Unsupported OS"
#endif
}

Please note that the page we are allocating only store the capacity it can hold so it doesn’t hold the total of the memory it has, we also setting our child (BlogMemoryChunk) to NULL since we don’t have any data yet and next as null because there is no BlogMemoryPage after this yet.

After that we can add our VirtualFree to our free function.

void blog_free_memory_page(BlogMemoryPage* ptr)
{
#if defined(_WIN32) || defined(_WIN64)
    VirtualFree(ptr, 0, MEM_RELEASE);
#elif defined(__linux__)
    // Implementation
#else
    #error "Unsupported OS"
#endif
}

Since we want to release the memory we need to set the second parameter as 0 to release all the memory of that page.

Don’t forget to include the windows header

#if defined(_WIN32) || defined(_WIN64)
    #include <Windows.h>
#elif defined(__linux__)
    // TODO
#else
    #error "Unsupported OS"
#endif

Linux Memory Allocation

Now we come to fun part, how do we request memory in Linux?

Yuzu from Veritas

In the man page there is called mmap, these function allow us to request memory from the OS, it has lot of feature it can create mapping to file, creating shared access but let’s talk mapping on the other time but we can use these to allocate memory from OS.

So we can use it like this.

// stuff
BlogMemoryPage* blog_request_memory_page(size_t capacity)
{
#if defined(_WIN32) || defined(_WIN64)
    // Implementation
#elif defined(__linux__)
    capacity = ALIGN(capacity + sizeof(BlogMemoryPage), 4096);

    BlogMemoryPage* page = (BlogMemoryPage*)mmap(
        NULL, 
        capacity, 
        PROT_READ | PROT_WRITE, 
        MAP_ANONYMOUS | MAP_PRIVATE, 
        -1, 
        0
    );

    if (page == MAP_FAILED) return NULL;

    page->capacity = capacity - sizeof(BlogMemoryPage);
    page->child = NULL;
    page->next = NULL;

    return page;
#else
    #error "Unsupported OS"
#endif
}
// stuff

Since we want our memory is only readable and write-able we can use PROT_READ and PROT_WRITE, we also want our mapping is private too, these function will return NULL if it fail to allocate, don’t forget we need to align it 4 Kib too.

Please note that the page we are allocating only store the capacity it can hold so it doesn’t hold the total of the memory it has, we also setting our child (BlogMemoryChunk) to NULL since we don’t have any data yet and next as null because there is no BlogMemoryPage after this yet.

We also need to free it too, it’s quite easy.

void blog_free_memory_page(BlogMemoryPage* ptr)
{
#if defined(_WIN32) || defined(_WIN64)
    // Implementation
#elif defined(__linux__)
    munmap(ptr, ptr->capacity + sizeof(BlogMemoryPage));
#else
    #error "Unsupported OS"
#endif
}

We need to pass our pointer and capacity to allow munmap to deallocate.

Don’t forget to include the mman header to allow us to use mmap.

#if defined(_WIN32) || defined(_WIN64)
    // TODO
#elif defined(__linux__)
    #include <sys/mman.h>
#else
    #error "Unsupported OS"
#endif

Malloc and Free

All right we should have working blog_request_memory_page and blog_free_memory_page, now come to fun part.

How do we implement the malloc and free?

Yuuka with Phone from Blue Archive

We can start by defining our number one page and set it as null, we can use this as foundation along with our function implementation of blog_malloc and blog_free, make sure to put this stuff inside the implementation section.

BlogMemoryPage *__ALLOCATOR = NULL;

void* blog_malloc(size_t size) { }

void blog_free(void* ptr) { }

Firstly we need to align the size first, since we are in 64-bit we can go 8 bytes alignment.

size = ALIGN(size, 8);

We can start by checking the __ALLOCATOR is NULL or not blog_malloc and if null then we will allocate it.

void* blog_malloc(size_t size)
{
    size = ALIGN(size, 8);
    if (__ALLOCATOR == NULL) {
        __ALLOCATOR = blog_request_memory_page(size);
        if (__ALLOCATOR == NULL) return NULL;
    }
}

This will ensure us that our __ALLOCATOR always is valid and since our implementation of blog_request_memory_page is automatically align it with 4 Kib we can safely pass size as parameter.

Next part is checking that our page has slot for new memory chunk, but firstly we need to get the chunk first, we can do it via pointer arithmetic.

BlogMemoryChunk *current_chunk = (BlogMemoryChunk)(__ALLOCATOR + 1);

We can use plus 1 to skip the current BlogMemoryPage and we just cast it into BlogMemoryChunk, next we need to initialize it and then return the pointer to the caller but don’t forget to skip the BlogMemoryChunk by using plus 1, we don’t want the end user modify our header.

void* blog_malloc(size_t size)
{
    // stuff

    BlogMemoryChunk *current_chunk = (BlogMemoryChunk*)(__ALLOCATOR + 1);
    current_chunk->capacity = size;
    current_chunk->flags = BLOCK_USE;
    current_chunk->next = NULL;
    __ALLOCATOR->child = current_chunk;

    return current_chunk + 1;
}

now this is the simplest and dumbest thing that this function can do, but let’s see if we get proper address.

Create main.c and with this basic example code.

#include <stdio.h>

#define BLOG_MEMORY_IMPLEMENTATION
#include "../blog_memory.h"

int main()
{
    char *ptr = (char*)blog_malloc(4);
    if (ptr == NULL) {
        printf("Memory allocation fail\n");
        return -1;
    }
    printf("Allocated: %p\n", ptr);


    ptr[0] = 'H';
    ptr[1] = 'i';
    ptr[2] = '!';
    ptr[3] = '\0';
    printf("%s\n", ptr);

    return 0;
}

If everything is all right and it should print Hi! along with memory address, under the hood blog_malloc will actually give 8 bytes because of the alignment.

Okay now we need to free it, if we don’t free it we got memory leak, let’s implement our blog_free function, it actually simple, we just set the flag as free one by removing the bit that indicate being use.

void blog_free(void* ptr)
{
    BlogMemoryChunk* current = ((BlogMemoryChunk*)ptr) - 1;
    current->flags = current->flags & (~BLOCK_USE);
}

We are using bit masking by negating the BLOCK_USE and use and to remove the bits that set BLOCK_USE inside current->flags.

By using these flag technique we can reuse old BlogMemoryChunk without worrying another allocation from OS.

Now if we do another memory allocation it will corrupt our current allocated memory because we didn’t make search for empty space, so we need to rewrite our blog_malloc.

Confused Hare from Blue Archive

Let’s start by searching empty spot inside our BlogMemoryPage, we can use while loop .

// stuff
size = ALIGN(size, 8);
if (__ALLOCATOR == NULL) {
    __ALLOCATOR = blog_request_memory_page(size);
    if (__ALLOCATOR == NULL) return NULL;
}

BlogMemoryChunk* chunk = __ALLOCATOR->child;
BlogMemoryChunk* last  = NULL;
while (chunk) {
    last = chunk;
    chunk = chunk->next;
}

if (last == NULL) {
    BlogMemoryChunk *current_chunk = (BlogMemoryChunk*)(__ALLOCATOR + 1);
    current_chunk->capacity = size;
    current_chunk->flags = BLOCK_USE;
    current_chunk->next = NULL;
    __ALLOCATOR->child = current_chunk;

    return current_chunk + 1;
}

BlogMemoryChunk *current_chunk = (last + 1);
current_chunk->capacity = size;
current_chunk->flags = BLOCK_USE;
current_chunk->next = NULL;
last->next = current_chunk;

return current_chunk + 1;

// stuff

With this loop we can loop over our Linked List BlogMemoryChunk until we get NULL and then in the bottom, if we got last == NULL basically we are new to the page so we use the pointer of __ALLOCATOR as a base, if not then we use the last pointer.

With this setup we got new chunk, but we might overflow our page if we keep continue like this so we need to calculate the current allocated BlogMemoryChunk when we loop, and then we will do check.

// stuff
size_t allocated = 0;
BlogMemoryChunk* chunk = __ALLOCATOR->child;
BlogMemoryChunk* last  = NULL;
while (chunk) {
    allocated += chunk->capacity + sizeof(BlogMemoryChunk);

    last = chunk;
    chunk = chunk->next;
}

if (allocated + sizeof(BlogMemoryChunk) + size >= __ALLOCATOR->capacity) {
    return NULL; // We need to request new page
}
// stuff

Now we are missing one thing, if you are not sleeping you might be remember that when we free we don’t deallocate on OS level but just setting the flag right? We can check flag on BlogMemoryChunk as we loop.

// stuff
while (chunk) {
    allocated += chunk->capacity + sizeof(BlogMemoryChunk);
    if (!(chunk->flags & BLOCK_USE) && chunk->capacity >= size) {
        chunk->flags |= BLOCK_USE;
        return chunk + 1;
    }

    last = chunk;
    chunk = chunk->next;
}
// stuff

Yatta, now we can reuse old BlogMemoryChunk now.

Happy Fubuku

But are we forgetting something? yeah if we are out of space on our page we need to allocate new page. We can approach the solution in many way but we are going to divide our problem into smaller one first.

We have

  1. Allocate allocator if null
  2. Allocate a chunk if it have empty slot
  3. Allocate new page and append it to parent next pointer
  4. Go back to number 2

We can start by creating utility function that handle allocating allocator if null.

static BlogMemoryPage* __blog_init_allocator(size_t capacity)
{
    if (__ALLOCATOR != NULL) return __ALLOCATOR;
    __ALLOCATOR = blog_request_memory_page(capacity);
    return __ALLOCATOR;
}

By using static keyword we can say that this function is only exist on that file so with current setup it can only be accessed inside main.c.

Now next part is to allocate chunk if it have empty slot, we can create a function that received the BlogMemoryPage and return a pointer of new BlogMemoryChunk.

static BlogMemoryChunk* __blog_allocate_chunk_on_page(size_t capacity, BlogMemoryPage* page)
{
    size_t allocated = 0;
    BlogMemoryChunk* chunk = page->child;
    BlogMemoryChunk* last  = NULL;
    while (chunk) {
        allocated += chunk->capacity + sizeof(BlogMemoryChunk);
        if (!(chunk->flags & BLOCK_USE) && chunk->capacity >= capacity) {
            chunk->flags |= BLOCK_USE;
            return chunk + 1;
        }

        last = chunk;
        chunk = chunk->next;
    }

    if (allocated + sizeof(BlogMemoryChunk) + capacity >= page->capacity) {
        return NULL;
    }

    if (last == NULL) {
        BlogMemoryChunk *current_chunk = (BlogMemoryChunk*)(page + 1);
        current_chunk->capacity = capacity;
        current_chunk->flags = BLOCK_USE;
        current_chunk->next = NULL;
        page->child = current_chunk;

        return current_chunk;
    }


    BlogMemoryChunk *current_chunk = (last + 1);
    current_chunk->capacity = capacity;
    current_chunk->flags = BLOCK_USE;
    current_chunk->next = NULL;
    last->next = current_chunk;


    return current_chunk;
}

you might notice that this is actually code from our blog_malloc with exception of instead of using __ALLOCATOR we use page from parameter.

Next part is allocating new page if we need, since we know that our function that return new BlogMemoryChunk return NULL if it doesn’t have enough space we can do this on our blog_malloc.

void* blog_malloc(size_t size)
{
    size = ALIGN(size, 8);
    if (__blog_init_allocator(size) == NULL) return NULL;

    BlogMemoryChunk* ptr = __blog_allocate_chunk_on_page(size, __ALLOCATOR);

    BlogMemoryPage* page = __ALLOCATOR;
    while(ptr == NULL) {
        if (page->next == NULL) {
            page->next = blog_request_memory_page(size);

            if (page->next == NULL) return NULL;
        }

        page = page->next;
        ptr = __blog_allocate_chunk_on_page(size, page);
    }

    return ptr + 1;
}

We change our initialization for our allocator at the beginning of the function, and we loop through our page Linked List if we found the NULL in the next ptr we just ask the OS for new page, don’t forget to add 1 before we return the pointer, we don’t want the user to modify header freely.

If we try our code in main.c instead of 4 we use 4096 as the size…

Panic Veritas member from Blue Archive

We got infinite loop, this is flaw on current implementation, because the size doesn’t account our BlogMemoryChunk header struct so it just keep allocating new page until we run out of memory.

The fix is kinda simple we just add our size of our BlogMemoryChunk before we request new memory page.

// stuff
if (page->next == NULL) {
    page->next = blog_request_memory_page(size + sizeof(BlogMemoryChunk));

    if (page->next == NULL) return NULL;
}
// stuff

That’s it we finished this whole ordeal and if you are not sleeping congratulation for reaching this part!

Finishing Thought

Milenium Conference

We are successfully craft our own allocator, it was interesting recreational programming session, we learn a lot of stuff from Linked List, API from OS, memory alignment.

There still lot of improvement or optimization we can made or maybe edge cases that we didn’t dealt with and we still missing freeing our page but that is not necessarily required for short live application but if you want to create that I will leave it as homework.

The entire source code can be accessed on this url :
https://github.com/UnknownRori/blog_memory

That’s it from me folks if there are feedback let me know. So go on do whatever you usually do. Cheers!


This content originally appeared on DEV Community and was authored by Unknown Rori