This content originally appeared on DEV Community and was authored by Unknown Rori
Have you ever wondered on how memory allocation work in most programming languages? Most of the time we do not care about how memory are being handled because we take these feature for granted.
So today in this article we are going to discuss on how memory allocation work by requesting to our operating system, and by doing that we also going to make this allocator work cross-platform.
Table of Content
- Interlude
- Getting Started
- Windows Memory Allocation
- Linux Memory Allocation
- Malloc and Free
- Finishing Thought
Interlude
First off we should install our tools for windows you can install Visual Studio or Msys2, in Linux feel free to use gcc
or clang
.
So what is allocator
?
Allocator is a mechanism that manage and distribute memory to a program, but… why do we need it?
Our program might uses memory more than we original asked for, for example our array might need to be resized, and it’s not so efficient if we request memory from OS every time we need.
Why bother on creating this thing?
There are some cases when we are limited on memory space on the hardware or we are our own without help of standard library, or we love learning stuff.
Getting Started
Okay, so far so good now, let’s start creating our project, let’s call it blog_malloc
, but before that let’s create a abstraction or should I say helper function that return a requested memory from operating system.
First off, we should create file called blog_memory.h
we are going to use concept called stb, why we should do this? because it’s much simpler than handling complex dependency if we can just include single header file right?
Let’s start by setting up classic macro so we didn’t accidentally redefine it when we import it on our project along with macro for default implementation.
#ifndef BLOG_MEMORY_H
#define BLOG_MEMORY_H
#ifdef BLOG_MEMORY_IMPLEMENTATION
// our implementation detail
#endif
#endif // BLOG_MEMORY_H
Our ifndef
is compile time macro that everything inside it will exist if BLOG_MEMORY_H
not defined, but when we define it, it will not include it again, and then we got ifdef
this macro is the same as ifndef
but in reverse if the BLOG_MEMORY_IMPLEMENTATION
is defined then everything inside of it till #endif
will exist.
‘Kay now we need to define struct that will hold our memory page and chunk, but what is page and chunk?
There is quite loose definition for Chunk but Chunk is just data either in the disk or RAM or the thing we are processing, and for Page is block of memory in RAM that usually be aligned within 4 Kib and it handled by the operating system.
But why do we need to align it?
We need these alignment because our OS or should I say our hardware always read on these increment (4 – 8 bytes based on architecture), so if we miss align our memory our hardware require 2 read access to our memory which can be slow, and some hardware expect aligned data as well such as 128 bit operation.
All right time to write our abstraction, firstly we need our malloc
and free
function definition, we can just copy from stdlib
pretty straight forward.
// stuff
#include <stddef.h>
void* blog_malloc(size_t size);
void blog_free(void* ptr);
#ifdef BLOG_MEMORY_IMPLEMENTATION
// our implementation detail
#endif // BLOG_MEMORY_IMPLEMENTATION
// stuff
Good, now we need a way for us to know what the state of memory we requested since we are the one who manage it of course. We can define a struct that act like a header for our memory page or chunk, let’s put it inside our implementation.
// stuff
typedef struct blog_memory_chunk {
size_t capacity;
char flags;
struct blog_memory_chunk* next;
} BlogMemoryChunk;
typedef struct blog_memory_page {
size_t capacity;
struct blog_memory_page* next;
BlogMemoryChunk* child;
} BlogMemoryPage;
// stuff
As you might notice that our memory chunk has flags which will we use to check if the chunk is in use or not, these struct will act like header for our memory.
We should also put a constants for our in use flag.
#define BLOCK_USE 0b10000000
The concept we are going to use is Linked List because since we are going to have many of our BlogMemoryChunk
and BlogMemoryPage
we can’t store it as an array because we don’t know the exact size, but with Linked List we can use pointer to point the next data, let me visualize it.
As you can see our ptr
will point to next data until the ptr
is NULL
which we can confidently say that our list is finished, this approach is simpler but if you are cool guy who worried about performance the big O notation of this algorithm is O(n).
Right, next stop is we need function that abstract our call to OS for new memory and put it inside our implementation, let’s call it blog_request_memory_page
and blog_free_memory_page
for freeing the memory.
// stuff
#ifdef BLOG_MEMORY_IMPLEMENTATION
BlogMemoryPage* blog_request_memory_page(size_t capacity)
{
// TODO: Implementation
}
void blog_free_memory_page(BlogMemoryPage* ptr)
{
// TODO: Implementation
}
#endif // BLOG_MEMORY_IMPLEMENTATION
// stuff
We also accepting capacity
but we will need to align later, let’s create a macro that handle these memory alignment and put it above our memory header struct.
#define ALIGN(x, align) (((x) + ((align) - 1)) & ~((align) - 1))
These simple macro will help us to align given x
with size of alignment, so if we put x is 4 and our alignment is 64, these macro will return 64 because 4 is below 64, and these alignment only work properly in the power of 2.
So now, how do we make this function to be able to operate different operating system seamlessly? that’s right we can use macro for it, we can check the macro that identify OS in this awesome resource.
// stuff
BlogMemoryPage* blog_request_memory_page(size_t capacity)
{
#if defined(_WIN32) || defined(_WIN64)
// TODO : Implementation
#elif defined(__linux__)
// TODO : Implementation
#else
#error "Unsupported OS"
#endif
}
void blog_free_memory_page(BlogMemoryPage* ptr)
{
#if defined(_WIN32) || defined(_WIN64)
// TODO : Implementation
#elif defined(__linux__)
// TODO : Implementation
#else
#error "Unsupported OS"
#endif
}
// stuff
If you are not sleeping right now, the defined
is just a way for us to check if the _WIN32
is defined by our compiler or by ourself, and error
is just way for us to tell the compiler if you reach this part then throw an error.
We also need to put this macro to have conditional import on our implementation.
#if defined(_WIN32) || defined(_WIN64)
// TODO
#elif defined(__linux__)
// TODO
#else
#error "Unsupported OS"
#endif
All right onto next part – Windows | Linux.
Windows Memory Allocation
Now we come to fun part, how do we request memory in Windows?
Looking at microsoft documentation we can use VirtualAlloc
to request memory from windows and VirtualFree
for freeing memory, we will use MEM_COMMIT
to commit our memory for usage and PAGE_READWRITE
to allow us to read and write onto that memory.
BlogMemoryPage* blog_request_memory_page(size_t capacity)
{
#if defined(_WIN32) || defined(_WIN64)
capacity = ALIGN(capacity + sizeof(BlogMemoryPage), 4096);
BlogMemoryPage* page = (BlogMemoryPage*)VirtualAlloc(NULL, capacity, MEM_COMMIT, PAGE_READWRITE);
if (page == NULL) return NULL;
page->capacity = capacity - sizeof(BlogMemoryPage);
page->child = NULL;
page->next = NULL;
return page;
#elif defined(__linux__)
// Implementation
#else
#error "Unsupported OS"
#endif
}
Please note that the page
we are allocating only store the capacity it can hold so it doesn’t hold the total of the memory it has, we also setting our child (BlogMemoryChunk
) to NULL
since we don’t have any data yet and next
as null because there is no BlogMemoryPage
after this yet.
After that we can add our VirtualFree
to our free function.
void blog_free_memory_page(BlogMemoryPage* ptr)
{
#if defined(_WIN32) || defined(_WIN64)
VirtualFree(ptr, 0, MEM_RELEASE);
#elif defined(__linux__)
// Implementation
#else
#error "Unsupported OS"
#endif
}
Since we want to release the memory we need to set the second parameter as 0
to release all the memory of that page.
Don’t forget to include the windows header
#if defined(_WIN32) || defined(_WIN64)
#include <Windows.h>
#elif defined(__linux__)
// TODO
#else
#error "Unsupported OS"
#endif
Linux Memory Allocation
Now we come to fun part, how do we request memory in Linux?
In the man
page there is called mmap
, these function allow us to request memory from the OS, it has lot of feature it can create mapping to file, creating shared access but let’s talk mapping on the other time but we can use these to allocate memory from OS.
So we can use it like this.
// stuff
BlogMemoryPage* blog_request_memory_page(size_t capacity)
{
#if defined(_WIN32) || defined(_WIN64)
// Implementation
#elif defined(__linux__)
capacity = ALIGN(capacity + sizeof(BlogMemoryPage), 4096);
BlogMemoryPage* page = (BlogMemoryPage*)mmap(
NULL,
capacity,
PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_PRIVATE,
-1,
0
);
if (page == MAP_FAILED) return NULL;
page->capacity = capacity - sizeof(BlogMemoryPage);
page->child = NULL;
page->next = NULL;
return page;
#else
#error "Unsupported OS"
#endif
}
// stuff
Since we want our memory is only readable and write-able we can use PROT_READ
and PROT_WRITE
, we also want our mapping is private too, these function will return NULL
if it fail to allocate, don’t forget we need to align it 4 Kib too.
Please note that the page
we are allocating only store the capacity it can hold so it doesn’t hold the total of the memory it has, we also setting our child (BlogMemoryChunk
) to NULL
since we don’t have any data yet and next
as null because there is no BlogMemoryPage
after this yet.
We also need to free it too, it’s quite easy.
void blog_free_memory_page(BlogMemoryPage* ptr)
{
#if defined(_WIN32) || defined(_WIN64)
// Implementation
#elif defined(__linux__)
munmap(ptr, ptr->capacity + sizeof(BlogMemoryPage));
#else
#error "Unsupported OS"
#endif
}
We need to pass our pointer and capacity to allow munmap
to deallocate.
Don’t forget to include the mman
header to allow us to use mmap
.
#if defined(_WIN32) || defined(_WIN64)
// TODO
#elif defined(__linux__)
#include <sys/mman.h>
#else
#error "Unsupported OS"
#endif
Malloc and Free
All right we should have working blog_request_memory_page
and blog_free_memory_page
, now come to fun part.
How do we implement the malloc and free?
We can start by defining our number one page and set it as null, we can use this as foundation along with our function implementation of blog_malloc
and blog_free
, make sure to put this stuff inside the implementation section.
BlogMemoryPage *__ALLOCATOR = NULL;
void* blog_malloc(size_t size) { }
void blog_free(void* ptr) { }
Firstly we need to align the size first, since we are in 64-bit we can go 8 bytes alignment.
size = ALIGN(size, 8);
We can start by checking the __ALLOCATOR
is NULL
or not blog_malloc
and if null then we will allocate it.
void* blog_malloc(size_t size)
{
size = ALIGN(size, 8);
if (__ALLOCATOR == NULL) {
__ALLOCATOR = blog_request_memory_page(size);
if (__ALLOCATOR == NULL) return NULL;
}
}
This will ensure us that our __ALLOCATOR
always is valid and since our implementation of blog_request_memory_page
is automatically align it with 4 Kib we can safely pass size
as parameter.
Next part is checking that our page has slot for new memory chunk, but firstly we need to get the chunk first, we can do it via pointer arithmetic.
BlogMemoryChunk *current_chunk = (BlogMemoryChunk)(__ALLOCATOR + 1);
We can use plus 1 to skip the current BlogMemoryPage
and we just cast it into BlogMemoryChunk
, next we need to initialize it and then return the pointer to the caller but don’t forget to skip the BlogMemoryChunk
by using plus 1, we don’t want the end user modify our header.
void* blog_malloc(size_t size)
{
// stuff
BlogMemoryChunk *current_chunk = (BlogMemoryChunk*)(__ALLOCATOR + 1);
current_chunk->capacity = size;
current_chunk->flags = BLOCK_USE;
current_chunk->next = NULL;
__ALLOCATOR->child = current_chunk;
return current_chunk + 1;
}
now this is the simplest and dumbest thing that this function can do, but let’s see if we get proper address.
Create main.c
and with this basic example code.
#include <stdio.h>
#define BLOG_MEMORY_IMPLEMENTATION
#include "../blog_memory.h"
int main()
{
char *ptr = (char*)blog_malloc(4);
if (ptr == NULL) {
printf("Memory allocation fail\n");
return -1;
}
printf("Allocated: %p\n", ptr);
ptr[0] = 'H';
ptr[1] = 'i';
ptr[2] = '!';
ptr[3] = '\0';
printf("%s\n", ptr);
return 0;
}
If everything is all right and it should print Hi!
along with memory address, under the hood blog_malloc
will actually give 8 bytes because of the alignment.
Okay now we need to free it, if we don’t free it we got memory leak, let’s implement our blog_free
function, it actually simple, we just set the flag as free one by removing the bit that indicate being use.
void blog_free(void* ptr)
{
BlogMemoryChunk* current = ((BlogMemoryChunk*)ptr) - 1;
current->flags = current->flags & (~BLOCK_USE);
}
We are using bit masking by negating the BLOCK_USE
and use and
to remove the bits that set BLOCK_USE
inside current->flags
.
By using these flag technique we can reuse old BlogMemoryChunk
without worrying another allocation from OS.
Now if we do another memory allocation it will corrupt our current allocated memory because we didn’t make search for empty space, so we need to rewrite our blog_malloc
.
Let’s start by searching empty spot inside our BlogMemoryPage
, we can use while loop .
// stuff
size = ALIGN(size, 8);
if (__ALLOCATOR == NULL) {
__ALLOCATOR = blog_request_memory_page(size);
if (__ALLOCATOR == NULL) return NULL;
}
BlogMemoryChunk* chunk = __ALLOCATOR->child;
BlogMemoryChunk* last = NULL;
while (chunk) {
last = chunk;
chunk = chunk->next;
}
if (last == NULL) {
BlogMemoryChunk *current_chunk = (BlogMemoryChunk*)(__ALLOCATOR + 1);
current_chunk->capacity = size;
current_chunk->flags = BLOCK_USE;
current_chunk->next = NULL;
__ALLOCATOR->child = current_chunk;
return current_chunk + 1;
}
BlogMemoryChunk *current_chunk = (last + 1);
current_chunk->capacity = size;
current_chunk->flags = BLOCK_USE;
current_chunk->next = NULL;
last->next = current_chunk;
return current_chunk + 1;
// stuff
With this loop we can loop over our Linked List BlogMemoryChunk
until we get NULL
and then in the bottom, if we got last == NULL
basically we are new to the page so we use the pointer of __ALLOCATOR
as a base, if not then we use the last
pointer.
With this setup we got new chunk, but we might overflow our page if we keep continue like this so we need to calculate the current allocated BlogMemoryChunk
when we loop, and then we will do check.
// stuff
size_t allocated = 0;
BlogMemoryChunk* chunk = __ALLOCATOR->child;
BlogMemoryChunk* last = NULL;
while (chunk) {
allocated += chunk->capacity + sizeof(BlogMemoryChunk);
last = chunk;
chunk = chunk->next;
}
if (allocated + sizeof(BlogMemoryChunk) + size >= __ALLOCATOR->capacity) {
return NULL; // We need to request new page
}
// stuff
Now we are missing one thing, if you are not sleeping you might be remember that when we free
we don’t deallocate on OS level but just setting the flag right? We can check flag on BlogMemoryChunk
as we loop.
// stuff
while (chunk) {
allocated += chunk->capacity + sizeof(BlogMemoryChunk);
if (!(chunk->flags & BLOCK_USE) && chunk->capacity >= size) {
chunk->flags |= BLOCK_USE;
return chunk + 1;
}
last = chunk;
chunk = chunk->next;
}
// stuff
Yatta, now we can reuse old BlogMemoryChunk
now.
But are we forgetting something? yeah if we are out of space on our page we need to allocate new page. We can approach the solution in many way but we are going to divide our problem into smaller one first.
We have
- Allocate allocator if null
- Allocate a chunk if it have empty slot
- Allocate new page and append it to parent next pointer
- Go back to number 2
We can start by creating utility function that handle allocating allocator if null.
static BlogMemoryPage* __blog_init_allocator(size_t capacity)
{
if (__ALLOCATOR != NULL) return __ALLOCATOR;
__ALLOCATOR = blog_request_memory_page(capacity);
return __ALLOCATOR;
}
By using static
keyword we can say that this function is only exist on that file so with current setup it can only be accessed inside main.c
.
Now next part is to allocate chunk if it have empty slot, we can create a function that received the BlogMemoryPage
and return a pointer of new BlogMemoryChunk
.
static BlogMemoryChunk* __blog_allocate_chunk_on_page(size_t capacity, BlogMemoryPage* page)
{
size_t allocated = 0;
BlogMemoryChunk* chunk = page->child;
BlogMemoryChunk* last = NULL;
while (chunk) {
allocated += chunk->capacity + sizeof(BlogMemoryChunk);
if (!(chunk->flags & BLOCK_USE) && chunk->capacity >= capacity) {
chunk->flags |= BLOCK_USE;
return chunk + 1;
}
last = chunk;
chunk = chunk->next;
}
if (allocated + sizeof(BlogMemoryChunk) + capacity >= page->capacity) {
return NULL;
}
if (last == NULL) {
BlogMemoryChunk *current_chunk = (BlogMemoryChunk*)(page + 1);
current_chunk->capacity = capacity;
current_chunk->flags = BLOCK_USE;
current_chunk->next = NULL;
page->child = current_chunk;
return current_chunk;
}
BlogMemoryChunk *current_chunk = (last + 1);
current_chunk->capacity = capacity;
current_chunk->flags = BLOCK_USE;
current_chunk->next = NULL;
last->next = current_chunk;
return current_chunk;
}
you might notice that this is actually code from our blog_malloc
with exception of instead of using __ALLOCATOR
we use page
from parameter.
Next part is allocating new page if we need, since we know that our function that return new BlogMemoryChunk
return NULL
if it doesn’t have enough space we can do this on our blog_malloc
.
void* blog_malloc(size_t size)
{
size = ALIGN(size, 8);
if (__blog_init_allocator(size) == NULL) return NULL;
BlogMemoryChunk* ptr = __blog_allocate_chunk_on_page(size, __ALLOCATOR);
BlogMemoryPage* page = __ALLOCATOR;
while(ptr == NULL) {
if (page->next == NULL) {
page->next = blog_request_memory_page(size);
if (page->next == NULL) return NULL;
}
page = page->next;
ptr = __blog_allocate_chunk_on_page(size, page);
}
return ptr + 1;
}
We change our initialization for our allocator at the beginning of the function, and we loop through our page Linked List if we found the NULL
in the next ptr we just ask the OS for new page, don’t forget to add 1 before we return the pointer, we don’t want the user to modify header freely.
If we try our code in main.c
instead of 4
we use 4096
as the size…
We got infinite loop, this is flaw on current implementation, because the size doesn’t account our BlogMemoryChunk
header struct so it just keep allocating new page until we run out of memory.
The fix is kinda simple we just add our size of our BlogMemoryChunk
before we request new memory page.
// stuff
if (page->next == NULL) {
page->next = blog_request_memory_page(size + sizeof(BlogMemoryChunk));
if (page->next == NULL) return NULL;
}
// stuff
That’s it we finished this whole ordeal and if you are not sleeping congratulation for reaching this part!
Finishing Thought
We are successfully craft our own allocator, it was interesting recreational programming session, we learn a lot of stuff from Linked List, API from OS, memory alignment.
There still lot of improvement or optimization we can made or maybe edge cases that we didn’t dealt with and we still missing freeing our page but that is not necessarily required for short live application but if you want to create that I will leave it as homework.
The entire source code can be accessed on this url :
https://github.com/UnknownRori/blog_memory
That’s it from me folks if there are feedback let me know. So go on do whatever you usually do. Cheers!
This content originally appeared on DEV Community and was authored by Unknown Rori