Monday, April 10, 2017

Vectors and sets in the standard library

In the last post we looked at std::list in the C++ standard library. It is implemented as a linked list. The drawback with this is that we don’t have direct access to elements and that there is a memory overhead for the “links”. The benefit is that insertions into the list take constant time. The vector is often a better alternative, as it gives direct access to specific elements without iterators and usually handles growth well.

std::vector is a container that is similar to arrays in Pascal and to C arrays. The C++ vector is not of fixed size and can grow and shrink at runtime. Adding elements at the end is usually cheap, but not always constant time.

std::vectors use contiguous storage locations for their elements, which means that their elements can also be accessed using offsets on regular pointers, just as efficiently as in C arrays. But unlike C arrays, their size can change dynamically, with the storage handled automatically by the container.

Internally, vectors use a dynamically allocated C array to store their elements. This array may need to be reallocated in order to grow when new elements are inserted, which implies allocating a new array and moving all elements to it. This is a relatively expensive task in terms of processing time, so vectors do not reallocate each time an element is added to the container. They allocate extra space at the end just in case, and you can give them an initial capacity when you create them.

Adding or removing elements anywhere other than at the end is always expensive in time, as the old data has to be moved around in the C array used for internal storage.

To access an element you use operator[] or, as with any container, an iterator:

std::vector<std::string> strings = { "hello", "world" };
std::cout << strings[0] << " " << strings[1] << std::endl;

If you are used to arrays it might be tempting to iterate over a vector using an integer counter variable. This works, but it is preferable to use an iterator for the loop. This makes your code more generic and usable with any container. The iterator approach is also guaranteed to be the most efficient way to iterate:

std::vector<int> ints;
for (int i = 0; i < 1000; ++i) {
    ints.push_back(i);
}
for (std::vector<int>::iterator it = ints.begin(); it != ints.end(); ++it) {
    std::cout << *it << std::endl;
}

The set is a container that works as a list without duplicates. If you insert a new element with the same value as an existing element, the insert fails and the old element is kept.

A set is implemented as a balanced binary tree in C++. This means that the elements in the set are sorted. It also means that the elements have to be comparable to each other, otherwise they cannot be sorted.

The set only needs operator< to sort its elements, but it is good practice to also provide > and ==. std::string, int and the rest of the built-in types support this. If you create your own types (structs and classes) you have to define these operators yourself. Consider the following struct:

struct Person {
    std::string fname;
    std::string lname;
};
The set would not know how to sort the persons: by first name or by last name?

Most modern IDEs can add the needed operators automagically through a wizard. But we can also define the operators by hand:

struct Person {
    std::string fname;
    std::string lname;
    bool operator<(const Person& other) const;
    bool operator>(const Person& other) const;
    bool operator==(const Person& other) const;
};

bool Person::operator<(const Person& other) const {
    // Sort by first name, then by last name. Note that for a std::set
    // this ordering also defines equality: two elements are considered
    // equal if neither is less than the other.
    if (fname != other.fname) return fname < other.fname;
    return lname < other.lname;
}

bool Person::operator>(const Person& other) const {
    return other < *this;
}

bool Person::operator==(const Person& other) const {
    return lname == other.lname && fname == other.fname;
}


Monday, March 27, 2017

C++ Containers

There are a lot of different containers in C++. A container is a holder object that stores a collection of other objects (its elements). They are implemented as class templates, which makes them able to store almost any type of object.

The container manages the storage space for its elements and provides member functions to access them, either directly or through iterators. This makes them much more powerful and safer than plain old C arrays.

Containers implement data structures very commonly used in computer science: dynamic arrays (vector), queues (queue), stacks (stack), heaps (priority_queue), linked lists (list), trees (set), associative arrays (map) and more. Each of these has its own characteristics.

C++ has a lot of containers, but in this series of articles we will look at the characteristics of the four you will probably need in your toolbox:
  • std::list 
  • std::vector 
  • std::set 
  • std::map 

Today we will start with the list; later issues will cover the rest. We will also look at iterators.

The list

std::list is implemented as a doubly-linked list. A list allows insert and erase operations anywhere within the sequence, and iteration in both directions. The cost of growing the list is always constant in time and memory per element.

The drawback with linked lists is that they add extra data for each element, so the total storage will be larger than for a C array or std::vector. If you have long lists, they might be better stored as vectors to conserve memory. Another drawback is that lists don’t allow direct access to their elements; you have to iterate through the container to find a specific element.

In the sample below you can see some of the features of std::list. We create a list of numbers and push numbers at the front and at the back.

Then we use an iterator to point to an element in the list. The iterator is set by the find algorithm. We will get back to the algorithms in the STL (Standard Template Library) in a later issue of this newsletter.

#include <algorithm>
#include <iostream>
#include <list>

int main() {
    // Create a list containing integers
    std::list<int> l = { 7, 5, 16, 8 };

    // Add the integer 25 to the front of the list
    l.push_front(25);

    // Add the integer 13 to the back of the list
    l.push_back(13);

    // Insert 42 before 16 by searching
    auto it = std::find(l.begin(), l.end(), 16);
    if (it != l.end()) {
        l.insert(it, 42);
    }

    // Iterate and print the values of the list
    for (auto element : l) {
        std::cout << element << std::endl;
    }
}

An iterator is an object that points to some element in a container. It has the ability to move through the elements of that range using a standard set of operators, for example the increment (++) and dereference (*) operators. The syntax for iterators in many ways works like standard C pointers, but with a lot more under the hood.

Iterating over a container with an iterator has two benefits: it is generic and works on all containers, so if you change container type you don’t have to change your code. It is also guaranteed to be the most efficient form of iteration for the given container.

When should you use the C++ std::list?

The std::list is ideal for small datasets that change a lot, especially when you add things at places other than the end. For larger datasets the memory overhead could be too large to ignore. This should always be weighed against the size of the objects stored: big objects might be expensive to move, which might be needed in other containers.

Monday, February 13, 2017

C++ Memory for Beginners

This is a short introduction to the C++ memory model. In C++ the programmer has great control over how the program handles its memory, but you usually do not have to think much about it. Still, it is good to know the basics to understand how things work under the hood.

The memory a program uses is typically divided into different areas, called segments. This is a simple model:

  • The code segment, where the compiled program is stored in memory. 
  • The data segment, where global and static variables are stored. 
  • The heap, where dynamically allocated variables are allocated from. 
  • The stack, where function parameters, local variables, and other function-related information are stored. 

The heap

The heap segment (also known as the “free store”) keeps track of memory used for dynamic memory allocation. The heap is not automatically managed: you allocate memory and delete the allocations yourself. In C++ you use the new operator to allocate memory in the heap segment, and you free the memory with the delete operator.

Allocated memory stays allocated until it is deallocated (beware memory leaks) or the application ends (at which point the OS should clean it up).

The stack

The stack keeps track of all the active functions (those that have been called but have not yet terminated) from the start of the program to the current point of execution, and handles allocation of all function parameters and local variables.

The call stack is implemented as a stack data structure and is handled automatically.

When a function call is encountered, the data of the function, the memory position to return to, and the parameters are pushed onto the call stack. The data stored on the stack is often referred to as a stack frame.

When the current function ends, that function is popped off the call stack. The state before the call is restored (apart from side effects, for example if a variable was passed by reference and modified), the return value is put on the stack (or returned in CPU registers depending on computer architecture and compiler) and execution continues where the function was called.

Allocating memory on the stack is comparatively fast. Memory allocated on the stack stays in scope as long as it is on the stack. It is destroyed when it is popped off the stack.

This is a trap…

A common mistake, even by experienced programmers, is to return a pointer to a value on the stack. The pointer is often created by using a plain C array and trying to return it. The most common case is an array of char, also known as a C string.

// Don’t do this
char* get_name() {
    char name[50];
    std::cin >> name;
    return name; // returns a pointer to memory that is about to be popped
}
Look at the program above. The function returns a pointer to a character array, but the array is local to the function and thus created on the stack. After the function call ends the stack will be popped. The programmer might be “lucky” and the data is still in that memory segment, but it might also have been overwritten. It might work sometimes (like when debugging but not in release) and it might fail on another operating system or with another compiler. That makes this a very hard bug to track down, and it should be avoided. The easy way out is to return a C++ string or to take the C string by reference as a parameter. The C++ string works because it knows how to copy itself when returned.

Monday, January 30, 2017

Maintaining Legacy Software


"Weed is a plant that has ended up in the wrong place. Either it got there through carelessness or it has been allowed to germinate and grow from being insignificant." (New Farmers Handbook)
Software is full of weeds. By weeds I mean in this case badly architected software, not pure bugs. But, believe me: bugs arise from this later even if they are not there yet. This is not only the result of sloppy programmers and architects. In my last blog post I wrote about architecture in legacy software, and by weeds I mean parts of the software not adhering to the architecture. Vines growing between layers in the chosen software model.

When creating an architecture we often have high ambitions. Whichever framework we use (Domain Driven Design, Model View Controller or similar), folders and modules are created for the then-current ways of working and separation of concerns.

Since starting work, the product owner has ordered things and you have had to solve problems. You are forced to be pragmatic to create the feature that should be in the sprint. An advanced architecture sometimes feels as if it opposes the simple solutions required to handle all the problems you need to solve. You take shortcuts and allow a view to do something that a controller should have done, only because it is convenient or because you do not want to intrude on someone else's code.

The project also evolves as it continues. Someone new takes over parts of the code and starts writing things. This new person has a different view of architecture and it starts to become inconsistent. Different parts of the code follow different cultures, and the cultures evolve. The way of writing code is not the same over time. Old code can have fine "structured programming", as the paradigm was called then. Then there might be a period of object orientation. Object orientation itself has evolved, from complex objects handling everything from infrastructure to APIs in the same object, to modern ways of looking at separation of concerns. The revolution of patterns-thinking can leave traces. It is like archeology, with cultural layers to dig through.


It is important to remove weeds. Keep in mind that a lot of weeds can be useful elsewhere. It is often something that has ended up in the wrong place, rather than being wrong in itself. A good refactoring tool allows it to be moved to the right place with a few clicks, if you know where to put it. Step one is to separate the misplaced code into a new method, if it is not separated from the beginning. Often the IDE will have refactoring tools to help with extracting a new method. Then move the method to the class it belongs to. If it is not useful at the moment, "compost" it: remove it and let the revision management system take care of it.


A problem closely related to weeding is pruning. This deals with branches from one part of the code reaching into other parts, like runners from a plant. This may involve unwanted dependencies where things reach too far into the code they work with. If you need to expose anything outward, it requires a clear interface. Create this interface and do not expose the interior of your code. Create a contract, a clear API, through which the different parts of the code communicate with each other.

Bugs and vermin

This is perhaps the most common form of problem in software. Something that we all know. Bugs can be fought in many ways. Spraying is not uncommon in agriculture, and the equivalent in terms of software is different kinds of automatic linters. Just as with spraying in agriculture, you should beware of over-using these tools. They complain too much, so developers tend to ignore them more and more. Maybe you miss large errors because they are hidden by numerous small ones. Therefore, set your static linter at a reasonable level for your project.

Monday, January 9, 2017

Architecture in Legacy Software

The software developer has typically used the engineer as a model. This obviously springs from the fact that we are dealing with technology. We build stuff! Our line of work grew out of electrical engineering and mathematics; hand in hand, they became a new profession.

Development also has a soft side that you might not see from the outside, and that not all developers discover themselves. They get stuck in problem solving, happy that things are working. Programmers borrow many words from linguists. We have languages to describe what we do; languages have a grammar and help us to communicate. Communication is not only between us and the computer, but also with other developers. Really well written programming code can be like poetry for those who understand it. Code is in many ways a cultural manifestation, almost art.

Most books and articles we read about software development describe an ideal state. You get recipes for how to design the project and how to write your code to create good software. Those of us who have worked with software development and read them usually agree that these are good principles, but in reality it usually does not look like that. We are moving in a landscape where the old code lives as archeological layers under our feet, and we do not know what we will find when we dig among them.

The layers have also been moved around by changes from several developers through the history of the code base. The way to write code has changed through time. What was considered best practice a few years ago is no longer so today. Different and changing views of what is right, through time and between developers, collide and make the code messier. Each developer has his or her own perspective, often good in its own context.

Software systems are often described as machines. The technical background shines through. To me they are usually more organic: small systems are like gardens, sometimes tied together into large systems and thus into an entire cultural landscape. This is an attempt to describe software development from the perspective that software is something that grows under our supervision, rather than something constructed. We are gardeners or farmers who grow software.

Now you may be thinking: software does not grow or write itself. That is of course true, but a body of software written by a team over a long time tends to take on almost a life of its own. There are so many factors that influence the work that not everything will be rationally designed or constructed. If you do not take this into account it may become a problem, but handled correctly it can become a strength and improve the system.

Taking care of software is in many ways similar to tillage. It is a process of cultivation. The idea in this text is to create a good culture in which to grow and develop software, as well as yourself. I would like to describe an iterative process, much like the seasons, to adapt the development process to reality, instead of dreaming of the ideal project where each sprint from planning to retrospective is merely a moment of happiness in which we commend each other and drink coffee with smiles as in a religious advertising brochure.

The Swedish Farmer's Almanac (Bondepraktikan) was originally a German book of advice to farmers on how to adapt their work to the seasons and therefore the weather. It eventually became incorporated into Swedish cultural tradition. The book contains a lot of advice on how to manage farming by reading the signs of nature.

"Pious readers buy me now.
Much wisdom I teach you.
Bondepraktikan is my name.
Read me, you will benefit.
The flow of the year I want you to learn, then you will rule."

(Bondepraktikan - “Swedish Farmers Almanac”)

This is an attempt to give some thoughts on an almanac for software development.

Planning of the lands

The main objective of planning is to create separation of concerns to keep the system stable.
Illustration 1: A Map of the Lands (New Farmers Handbook)

In computer programming, the term Separation of Concerns means using different mechanisms to separate things that do not belong together. The best known is perhaps the MVC pattern, which separates Models, Views and Controllers.

Often the planning is already done long before you enter the lands. Parts have been added and other parts have been removed. Parts might have been split in not-so-logical ways and parts may have grown into each other. Some projects might be quite messy, others stay orderly for their whole lifetime. There might be circular dependencies that make changes hard to do.

Regardless of the project status you should have a map of reality. Often the architectural maps you get are old and not updated. The best is if you have a system where the code itself is the map, or generates the map. With good package and sub-package names you always have a current map.

When planning a new project, refactoring, or adding your own architecture, you should have a well documented way of separating concerns.

First note that the separation goes in two different directions: the different data the application handles (like persons and books) and the different logical layers of the application (like in MVC). The data is often separated into different bounded contexts, and the application is often separated into layers handling things like storage, business logic and front end.

There is no “right” way to do things. Below I present a model that I like, which can inspire the model you think is right for your project.

Note that you should always begin with reality. Use the existing landscape and think of change as a long process. In Swedish history we have had land reforms (enskifte, laga skifte etc.) decided by a king and applied locally all over the country. In software this could be done too, a land reform across the whole system, but it is hard and violates the organic growth. It might also hinder development during the reform.

That being said here is my model:
Table 1: The layer structure

Application layer

The application layer is responsible for driving the work flow of the application. This layer can be used for transactions, high-level logging, and security.
The application layer is often thin compared to the domain logic: it just coordinates domain objects without actually performing the work. The application layer also provides interface access to the domain layer. The application starts up the services and the interfaces needed, but then delegates the work to them.
Most of this layer is often given to you out of the box by the framework you use, when you get a standard skeleton for an application using that framework.

Interface Layer

This layer holds everything that interacts with other systems, such as web services, RPC, user interfaces or web applications. It handles the interpretation, validation and translation of incoming data. It also handles serialization of output data, such as JSON over REST.

This layer is often called the Ports and Adapters layer. We can have a number of possible ways of connectivity (ports) and adapters that adapt the domain objects to the protocols and act as an anti-corruption layer.

Domain objects are never directly visible outside the domain. They have other representations in the protocol. This is essential to keep the system stable: you should be able to change the domain model while keeping the old interfaces stable. Instead you define DTOs (Data Transfer Objects) that are representations of the data sent, regardless of the internal model. A DTO converter is used to translate between domain objects and the world outside. This is true even for communication inside your application: different bounded contexts should communicate in a defined way.

If you want to combine DDD with an MVC or MVP pattern in a desktop application, this is where you can place your views and controllers, as one of the ports. Another way is to think of the MVC part of your application as a separate client connecting through the interface layer.

The domain layer

The domain layer is the heart of the program, and it is where all the interesting stuff happens. In Java you normally create one package per bounded context, and each package includes entities, value objects, domain events, a repository interface, and sometimes factories.

The core of the business logic is here. The structure and the naming of assemblies, classes and methods in the domain layer follow the ubiquitous language, and you should be able to explain to a domain expert how this part of the program works by drawing a few simple graphs and using real class and method names from the source code.

The interface to the upper layers and to other bounded contexts is provided by one or more services, adapted by the interface (or ports and adapters) layer when someone wants to connect to the service from outside the domain. The interface is defined by the service.

Repositories and factories have to use the infrastructure to get access to the database. They therefore define interfaces for Data Objects (DO) and Data Access Objects (DAO) used to store entities in the database. The implementation is later done in the infrastructure, but the domain is in charge of the data and logic. The domain drives the development.

The relationship between the repositories, the factories and the DAOs might be hard to understand. The responsibility of the DAO is simple storage. All domain logic should ideally be in the repository. If you for example need to validate data before storing it, the repository should handle this and then use the DAO to store the data of the entity. Most calls to repositories have the structure of first doing some domain logic and then storing the result with the DAO as a delegate.

One DAO should only handle one type of DO. Likewise, a repository should ideally only handle one type of entity. To handle more complex operations involving many types of entities, you should create a domain service that uses the factories and repositories. A domain service should cover one bounded context.

We will get back to the relationship between a domain entity and a data object. For now we can say that the entity is the abstraction of the data we handle, and the data object is the actual data stored by the infrastructure.

This is what makes some people think the model is a bit cumbersome: we have to define a lot of interfaces in the domain that are later implemented in the infrastructure. The different abstraction levels also add to the complexity. The system also has a lot of classes in the domain handling the same entities: the services, repositories and factories. The system becomes a bit heavy to handle with these extra layers of abstraction. The benefit is the separation of concerns and the possibility to change infrastructure without having to change the domain logic.

Infrastructure Layer

In addition to the three vertical layers, there is also the infrastructure. The application runs in some environment where it has to use infrastructure services.

In the farming analogy this is the machine park. The solution should not depend on any special machines: you should be able to “grow your crops” with different sets of machines and replace them as time passes. As the picture shows, the infrastructure supports all three layers. This is done in different ways, which facilitates communication between the layers. Simply put, the infrastructure consists of everything that exists independently of our application: external libraries, database, application server, messaging and so on.

All communication between the layers should be abstracted so that it is independent of changes in other layers. The clearest example of this is that the domain layer uses a Repository to manage storage. The repository defines interfaces for the database connection against the infrastructure, but the implementations are in the infrastructure layer.

Although it can be difficult to give a hard definition of what type of code belongs to the infrastructure layer in each given situation, it should be possible to completely replace the infrastructure and still be able to use the domain layer, and possibly the application layer, to solve the key business problems.

Bounded contexts

Now we are entering the main part of planning your land: the domain.


The environment in which a word or proposition occurs, which determines its meaning.

Bounded contexts are a breakdown of the domain. All major projects cover different sub-models, and when code based on the different models is combined, it can become complex. Communication to and from, and within, the team gets confusing. It is often not clear in which context one model should or should not be applied. The model must be divided into bounded contexts with clear interfaces to other parts of the system. Each bounded context shall be independent of the others.

When a number of people work in the same bounded context, there is a risk that the model becomes fragmented. The more people involved, the greater the problems. The system breaks down into smaller contexts and eventually loses integration and coherence. There must be a process for merging all code and other artifacts, with automated testing. Use the ubiquitous language to hammer out a common understanding of the model and concepts. Continuous integration is an essential part of DDD.

Implementations that have grown over time with no good analysis of the bounded contexts often follow a design pattern that could be called BBoM, the Big Ball of Mud. The borders between the parts of the server are unclear. Even the responsibilities between the client and the server are entangled. There is no good way to see the protocols.

Context Map

A bounded context leaves some problems in the absence of a global perspective. The relationships with other parts of the model may still be vague and in constant change.

Developers on other teams may lack knowledge of your context's limits and will unconsciously make changes that blur the edges or complicate relations. When connections are made between different contexts, they tend to bleed into each other.

Identify each sub-model in the project and define its distinct context. Name each bounded context and make the name part of the ubiquitous language. Describe the points of contact between the models and their communications. Preferably, the communication goes through service classes and not directly into another context's internals.