måndag 27 mars 2017

C++ Containers

There are a lot of different containers in C++. A container is a holder object that stores a collection of other objects (its elements). They are implemented as class templates, which makes them able to store almost any type of objects.

The container manages the storage space for its elements and provides member functions to access them, either directly or through iterators. This makes them much more powerful and safe than the plain old C-arrays.

Containers implements structures very commonly used in data science: dynamic arrays (vector), queues (queue), stacks (stack), heaps (priority_queue), linked lists (list), trees (set), associative arrays (map) and more. Each of these have their own characteristics.

C++ have a lot of containers, but in this series of articles we will first look at the characteristics of each of the four ones you probably will need in your toolbox:
  • std::list 
  • std::vector 
  • std:set 
  • std::map 

Today we will start with the list and later issues will handle the rest. We also will look at iterators.

The list


std::list is implemented as a double-inked list. A list allow insert and erase operations anywhere within the sequence, and iteration in both directions. The cost of growing the list is always constant time and memory.

The drawback with linked lists are that they add extra data for each element so the total storage will be larger than for a C-array or std::vector. So if you have long lists, they might be better stored as vectors to conserve memory. Another drawback is that lists don’t allow direct access to their elements, you will have to iterate through the container to find a specific element.

In the sample below you can see some of the features of std::list. We create a list of numbers, push numbers at front and back.

Then we use an iterator to point to an element in the list. The iterator is set by the find-algorithm. We will get back to the algorithms in STL (standard template library) in a later issue of this news-letter.

int main()
{
    // Create a list containing integers    
    std::list<int> l = { 7, 5, 16, 8 };
    // Add the integer 25 to the front of the list    
    l.push_front(25);
    // Add the integer 13 to the back of the list    
    l.push_back(13);

    // Insert 42 before 16 by searching    
    auto it = std::find(l.begin(), l.end(), 16);
    if (it != l.end()) {
        l.insert(it, 42);
    }
    // Iterate and print values of the list    
    for (auto element: l) {
        std::cout << element << std::endl;
    }
}

Iterators

A std::iterator is an object that, pointing to some element in a container). It also has the ability to iterate through the elements of that range using a standard set of operators for example increment (++) and dereference (*) operators. The syntax for the std::iterators in many ways works as standard c-pointers but with a lot more under the hood.

To iterate over a container with an iterator has two benefits: It is generic and works on all containers so if you change container type you don’t have to change your code. It is also guaranteed to be the most efficient form of iteration for the given container.

When should you use the C++ std::list?


The std::list is ideal for small datasets that change a lot, especially when you add things in other places than at the end. For larger datasets the memory-overhead could be too large to ignore. This is always compared to the size of the objects stored: Big objects might be expensive to move, which might be needed in other containers.