måndag 10 april 2017

Vectors and sets in the standard library

In the last post we looked at the std::list in C++ standard library. It was implemented as a linked list. The drawback with this is that we don’t have direct access to elements and that there is a memory overhead for the “links”. The benefits are that insertions in the list are at constant time. The vector is often a better alternative as it has direct access to specific elements without iterators and usually handles growth good.

std::vector is a container that is similar to arrays in Pascal and to C-arrays. The C++ vector is not of fixed size and can grow and shrink runtime. Adding elements at the end are usually cheap, but not always constant time.

std::vector:s use contiguous storage locations for their elements, which means that their elements can also be accessed using offsets on regular pointers to its elements, and just as efficiently as in C-arrays. But unlike C-arrays, their size can change dynamically, with their storage being handled automatically by the container.

Internally, vectors use a dynamically allocated C-array to store their elements. This array may need to be reallocated in order to grow in size when new elements are inserted, which implies allocating a new array and moving all elements to it. This is a relatively expensive task in terms of processing time, and thus, vectors do not reallocate each time an element is added to the container. They allocate extra space at the end just in case and you can give them a initial capacity when you create them.

Adding or removing elements on other places than at the end are always expensive in time as the old data has to be moved in the C-array used for internal storage.

To access an element you use the operator [] or as with any container an iterator:

std::vector<string> strings;
strings.push_back("Hello");
strings.push_back("World");
std::cout << strings[0] << " " << strings[1] << std::endl;

If you are used to arrays it might be tempting to iterate over a vector using an integer as a counter-variable. This works, but it is preferable to use an iterator for the loop. This makes your code more generic and possible to use with any container. The iterator approach also always guaranties the most efficient way to iterate:

vector<int> ints;
for(int i = 0; i<1000; ++i) {
   ints.push_back(i);
}
for(vector::iterator it = ints.begin(); it != ints.end(); ++it) {
   std::cout << *it << std::endl;
}

Set

The set is a container that works as a list without duplicates. If you insert a new element with the same value as an existing element it will replace the old one.

A set is implemented as a balanced binary tree in C++. This means that the elements in the set are sorted. This also means that the elements have to be comparable to each other. Otherwise they cannot be sorted.

An element is comparable if it can be compared with the operators <, > and ==. std:string, int and the rest of the built in types supports this. If you create your own types (structs and classes) you have to create these operators yourself. Consider the following struct:

struct Person {
 std::string fname;
 std::string lname;
};


The set would not know how to sort the persons: first-name or last-name?

Most modern IDE-s can add the needed operators automagically, through a wizard flow. But we can define the operators by hand:

struct Person {
 std::string fname;
 std::string lname;
 bool operator>(const Person& other) const;
 bool operator==(const Person& other) const;
};

bool Person::operator<(const Person& other) const {
   if(lname<other.lname)
       return true;
   else
       return fname<other.fname;
}
bool Person::operator>(const Person& other) const {
   if(lname>other.lname)
       return true;
   else
       return fname>other.fname;
}
bool Person::operator==(const Person& other) const {
   return lname==other.lname && fname==other.fname;operator<(const Person& other) const;
 bool

}