Monday, 13 February 2017

C++ Memory for Beginners

This is a short introduction to the C++ memory model. In C++ the programmer has a great deal of control over how the program handles its memory, but you usually do not have to think much about it. Still, it is good to know the basics to understand how things work under the hood.

The memory a program uses is typically divided into different areas, called segments. This is a simple model of them:


The code segment, where the compiled program is stored in memory.
The data segment, where global and static variables are stored.
The heap, from which dynamically allocated variables are allocated.
The stack, where function parameters, local variables, and other function-related information are stored.
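
As a rough sketch of how the four areas show up in ordinary code, the snippet below marks where each object would typically live. The names (calls_made, next_id, make_heap_int) are made up for this illustration, and the exact placement is up to the compiler and platform.

```cpp
int calls_made = 0;                // data segment: global, lives for the whole program

// The compiled instructions of next_id itself sit in the code segment.
int next_id() {
    static int last_id = 0;        // data segment: static local, keeps its value between calls
    int step = 1;                  // stack: created on every call, gone when the call returns
    last_id += step;
    ++calls_made;
    return last_id;
}

int* make_heap_int(int value) {
    return new int(value);         // heap: stays allocated until the caller uses delete
}
```

Calling next_id() twice gives 1 and then 2, because last_id survives between calls while step is recreated each time. Whoever calls make_heap_int is responsible for deleting the returned pointer.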

The heap

The heap segment (also known as the “free store”) keeps track of memory used for dynamic memory allocation. The heap is not handled automatically: you allocate memory and release it yourself. In C++ you use the new operator to allocate memory in the heap segment and the delete operator to free it.

Allocated memory stays allocated until it is deallocated (beware memory leaks) or the application ends (at which point the OS should clean it up).
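
A minimal sketch of a matching allocation and deallocation could look like this. The function sum_first is a hypothetical helper invented for the example; the point is only that every new[] is paired with a delete[].

```cpp
#include <cstddef>

// Hypothetical helper: allocates `count` ints on the heap, fills them with
// 0, 1, 2, ... and returns their sum after releasing the memory again.
int sum_first(std::size_t count) {
    int* values = new int[count];            // allocate on the heap
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        values[i] = static_cast<int>(i);
        sum += values[i];
    }
    delete[] values;                          // memory from new[] must be freed with delete[]
    return sum;
}
```

If the delete[] line were left out, the array would stay allocated after the function returned and nothing could reach it anymore, which is exactly what a memory leak is.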

The stack

The stack keeps track of all the active functions (those that have been called but have not yet terminated) from the start of the program to the current point of execution, and handles allocation of all function parameters and local variables.

The call stack is implemented as a stack data structure and is handled automatically.

When a function call is encountered, the function's data, the memory position to return to, and the parameters are pushed onto the call stack. The data stored on the stack for one call is often referred to as a stack frame.

When the current function ends, its frame is popped off the call stack. The state before the call is restored (unless the function had side effects, for example by modifying a variable that was passed by reference), the return value is put on the stack (or returned in CPU registers, depending on computer architecture and compiler), and execution continues where the function was called.
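
The remark about side effects and pass by reference can be illustrated with two small functions. Both names are hypothetical and only serve this example.

```cpp
// Takes its argument by value: the function works on a copy in its own
// stack frame, so the caller's variable is untouched after the call.
int doubled(int x) {
    x *= 2;
    return x;
}

// Takes its argument by reference: the function changes the caller's
// variable, a side effect that remains after the frame is popped.
void double_in_place(int& x) {
    x *= 2;
}
```

With int a = 5, the call doubled(a) returns 10 and leaves a at 5, while double_in_place(a) changes a itself to 10.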

Allocating memory on the stack is comparatively fast. Memory allocated on the stack stays valid as long as it is on the stack. It is destroyed when it is popped off the stack.
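
To make the lifetime of stack objects visible, here is a small sketch that uses a made-up Tracer type. It writes a letter to a log when an object is created and a '~' plus the letter when it is destroyed. Nothing here comes from a library; all names are assumptions for the illustration.

```cpp
#include <string>

// Hypothetical tracer: records its own construction and destruction in a log.
struct Tracer {
    std::string& log;
    char id;
    Tracer(std::string& target, char name) : log(target), id(name) { log += id; }
    ~Tracer() { log += '~'; log += id; }
};

void inner(std::string& log) {
    Tracer b(log, 'b');    // lives in the stack frame of inner
}                          // b is destroyed here, when inner returns

std::string run() {
    std::string log;
    {
        Tracer a(log, 'a');    // lives until the end of this block
        inner(log);
    }                          // a is destroyed here
    return log;
}
```

The objects are destroyed in the reverse order of their creation, with no delete anywhere in the code, which is the "handled automatically" part of the stack.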

This is a trap…

A common mistake, even among experienced programmers, is to return a pointer to a value on the stack. The pointer is often created by declaring a plain C array and trying to return it. The most common case is an array of char, also known as a C string.

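// Don’t do this
#include <iostream>

char* get_name() {
    char name[50];      // local array: lives in get_name's stack frame
    std::cin >> name;
    return name;        // returns a pointer to memory that is about to be popped
}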

Look at the function above. It returns a pointer to a character array, but the array is local to the function and thus created on the stack. When the function returns, its stack frame is popped. The programmer might be “lucky” and the data may still be in that memory area, but it might also have been overwritten. It might work sometimes (for example when debugging but not in a release build), and it might fail on another operating system or with another compiler. That makes this kind of bug very hard to track down, so it should be avoided. The easy way out is to return a C++ string (std::string), or to have the caller pass in the C string buffer as a parameter for the function to fill. The C++ string works because it knows how to copy itself when returned.
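
One possible shape of the std::string fix is sketched below. It is an assumption of this example, not part of the original function, that get_name takes the input stream as a parameter; that makes it easy to try out without typing at the keyboard. demo_get_name is likewise a made-up helper.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Returns the name by value. The std::string owns its characters, so the
// caller receives a valid string even though the local `name` is gone.
std::string get_name(std::istream& in) {
    std::string name;
    in >> name;             // reads one whitespace-delimited word
    return name;
}

// Usage sketch: reads from an in-memory stream instead of std::cin.
std::string demo_get_name() {
    std::istringstream input("Ada Lovelace");
    return get_name(input);     // yields "Ada", since >> stops at the first space
}
```

In a real program you would call get_name(std::cin). As a bonus, std::string grows as needed, so the fixed limit of 50 characters disappears as well.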