Repository Evolution

Repository

A repository is a collection of source code and other files organized in a hierarchical directory structure used to build and maintain software artifacts.

This document describes the BENV repository structure as an evolution of a more traditional organization.

Organization

Repository organization is one of the biggest problems that must be solved if software reuse is to be effective. Too often, repository organization is ad hoc at best. Unfortunately, repository organization directly affects whether subsequent projects can successfully reuse the source code.

The effect of repository organization on the reusability of source code is amplified when source code must be used across a variety of target platforms. This is often the case for embedded systems, which is where this organization evolved.

Project Oriented Archetype

Traditionally, source code tends to be organized along project boundaries. Most software efforts start with a project concept and then evolve as described in the Big Ball of Mud pattern. A combination of forces, including schedule pressure, tradition, and laziness, results in an organizational structure that precludes software reuse.

The diagram below illustrates a traditional project-oriented structure.

Project Oriented 1

Now let's see what happens if we want to create a new project that reuses module2, as illustrated below.

Project Oriented 2

Notice that this organization cannot be accomplished within the confines of a traditional hierarchical file system, since module2 cannot be a subdirectory of both project1 and project2.

Yes, on some platforms it is possible to create symbolic links to simulate this structure, but not on all of them. Symbolic links also have subtle (and some not-so-subtle) properties that differ from those of directories. The result is that one project would have a stronger relationship to module2 than the other.

Also, symbolic links are often not supported by configuration management (revision control) systems.

Another problem, discussed later, is that Makefiles are a kind of project-specific meta-data that is not directly related to the source code contained within the same directory. While the Makefile depends on the source files that it builds, the source files themselves have no dependency on any particular Makefile. A Makefile contains information about how the source code is used, which is independent of the source code itself.

Reuse Orientation

A reuse-oriented structure recognizes that we plan to reuse the source code of one project in another project (without modification).

A better solution is to eliminate the notion that module2 belongs to the project, by maintaining the module outside of any project directory tree.

Project Oriented 3

Notice that module2 is no longer contained by either project1 or project2.

Also notice the introduction of the root directory. Root is italicized for two reasons: 1) to indicate that this is not the root of the file system (e.g. the Linux '/' directory); and 2) to indicate that the name of this directory is unspecified and not important to the structure. In other words, the root directory can have any name. It is simply whatever directory contains the repository.

Keep'm Separated

We further improve the repository structure described above by removing the project specific meta-data, which are the Makefiles, from the shared source code directory.

The illustration below accomplishes this by assuming that each project contains a script (which we'll call buildscript to reduce confusion) that describes how to build the project. This script encompasses the functionality of the Makefiles scattered about the repository structure.

Most importantly, the buildscript removes the Makefile, which is potentially project-specific meta-data, from the shared module2 directory.

Build Script Separation

Clearly, if each buildscript contains "script fragments" to build the common module2 directory, there is potential for redundancy that is contrary to our goal of reuse, albeit in the build system rather than in the target source code. If the script fragments that project1 and project2 use to build module2 are identical, the build code is simply duplicated.

Shared Build Scripts

By creating the directory benv to contain the common buildscripts, we allow the build mechanisms that are common across projects to be shared among those projects.
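As a minimal sketch of this sharing, assume the buildscripts are written in Python (the text does not say what language they use). A single helper kept in benv can then generate the compile commands for a shared module, and each project supplies only what is specific to it. The function name `module_compile_commands`, the `cc` compiler name, and the source file names are hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def module_compile_commands(module_dir: str, sources: list[str],
                            cflags: list[str]) -> list[list[str]]:
    """Hypothetical shared benv helper: one compile command per source file.

    The rules for compiling a module live here, once, instead of being
    copied into every project's buildscript.
    """
    return [
        ["cc", *cflags, "-c", str(PurePosixPath(module_dir, src))]
        for src in sources
    ]


# Each project's buildscript calls the shared helper and passes only
# project-specific settings (here, an optimization flag).
project1_cmds = module_compile_commands("module2", ["a.c", "b.c"], ["-O2"])
project2_cmds = module_compile_commands("module2", ["a.c", "b.c"], ["-Os"])
```

Returning command lists rather than running them keeps the sketch free of side effects. A real buildscript could pass each list to `subprocess.run`.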

Build Specification Separation

The buildspec in each project directory represents one or more files that specify the project-specific (not reusable) portions of the build for each project.

As an example, the project1/buildspec may have instructions to compile and link together main.c, module1 and module2.

Likewise, the project2/buildspec may simply compile module1 and create an exported library of the result.
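One way to picture a buildspec is as plain data that the shared buildscripts interpret. The sketch below assumes a Python representation and encodes the two examples above. The `BuildSpec` class, its fields, and `describe` are hypothetical. They are not a format defined by BENV.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BuildSpec:
    """Hypothetical project-specific build description."""
    project: str
    inputs: list[str]   # files and modules to compile
    target: str         # "executable" (link) or "library" (archive)


def describe(spec: BuildSpec) -> str:
    """Summarize what the shared buildscripts would do with a spec."""
    if spec.target == "executable":
        step = "link into an executable"
    else:
        step = "archive into a library"
    return f"{spec.project}: compile {', '.join(spec.inputs)}; {step}"


project1 = BuildSpec("project1", ["main.c", "module1", "module2"], "executable")
project2 = BuildSpec("project2", ["module1"], "library")
```

Because the spec carries only data, the knowledge of *how* to compile, link, or archive stays in the shared buildscripts.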

Shared Derived Objects

Consider, however, what happens if project1 and project2 are both built, and each project leaves derived objects in the module2 directory.

Derived objects are typically the output of the compiler (files with a '.o' or '.obj' extension) or the output of a linker (files with a '.o', '.a' or '.so' extension), but may also include other files that are created as the result of the build process. Derived objects are never stored as part of the configuration or revision control system, since they can be reproduced by building the project.
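Because derived objects are identified largely by their extensions, a buildscript (or a revision control ignore list) can recognize them mechanically. This minimal sketch assumes the extensions listed above are the only ones of interest. A real build may produce other derived files as well, and the helper name `derived_objects` is hypothetical.

```python
from pathlib import PurePath

# Extensions named in the text; extend as the build requires.
DERIVED_SUFFIXES = {".o", ".obj", ".a", ".so"}


def derived_objects(paths):
    """Return the paths whose final extension marks them as build output.

    Versioned names such as libfoo.so.1 have the suffix ".1" and are
    not matched by this simple check.
    """
    return [p for p in paths if PurePath(p).suffix in DERIVED_SUFFIXES]


files = ["module2/a.c", "module2/a.o", "module2/a.h", "lib/libm.a", "x.so", "README"]
print(derived_objects(files))  # ['module2/a.o', 'lib/libm.a', 'x.so']
```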

In the following illustration, the derived objects are:

Shared Derived Objects

Notice, however, that only the derived objects in the module2 directory are shared. Those include:

The existence of derived objects in the shared directories creates a situation where only one project in the repository can be built at a time, because of the artifacts remaining from the build of another project.

Some of the reasons that shared derived objects may cause problems include:

Sharing of derived objects can also cause problems if two processes building two separate projects both attempt to build the same shared directory at the same time.

Inadvertent sharing of derived objects can result in problems that are very difficult to diagnose when the shared derived objects are compatible with the linker but were built with different options than the project expects.

Derived Object Tree

To solve the problems associated with shared derived objects as described in the previous section, it is desirable for each project to store its own derived objects such that they are contained within the project directory. This is illustrated below.

Derived Object Tree

Notice that module2 contains no derived objects. The derived objects related to module2 are deposited in the project tree in the project1/module2 directory and the project2/module2 directory.

The project1/module2 and project2/module2 directories must be created as a part of the build process. This is an important function that must be provided by the benv/buildscripts (e.g. BENV) and can be implemented in part by using the vpath mechanism of make.
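The mapping from a shared source file to its derived object can be expressed as a path computation. The sketch below assumes the project tree mirrors each shared module's directory name, as in the illustration, and contrasts placing objects beside the shared source with placing them in the derived object tree. The function names are hypothetical, and this is not BENV's actual implementation.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def naive_object_path(root: Path, module: str, source: str) -> Path:
    """Shared placement: the object lands beside the shared source, so every
    project that builds the module writes to the same file."""
    return root / module / Path(source).with_suffix(".o")


def tree_object_path(root: Path, project: str, module: str, source: str) -> Path:
    """Derived object tree: root/module/x.c -> root/project/module/x.o."""
    return root / project / module / Path(source).with_suffix(".o")


def prepare_object_dirs(root: Path, project: str, modules: list[str]) -> list[Path]:
    """Create the per-project mirror directories, e.g. project1/module2."""
    dirs = []
    for module in modules:
        d = root / project / module
        d.mkdir(parents=True, exist_ok=True)
        dirs.append(d)
    return dirs


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for project in ("project1", "project2"):
        prepare_object_dirs(root, project, ["module2"])
        obj = tree_object_path(root, project, "module2", "a.c")
        print(obj.relative_to(root).as_posix())
# project1/module2/a.o
# project2/module2/a.o
```

In make, roughly the same effect comes from running the build inside project1/module2 with VPATH (or a vpath directive) pointing at the shared module2 directory: sources are found in the shared directory, while targets built by pattern rules are written to the current directory.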

Abstract Trees

Up until this point, we have iteratively evolved our initial example repository structure to enable us to share the source code of module2.

The first level of the repository structure is shown in the following illustration.

Level 1 Directories

By analyzing the resulting tree, we can identify three abstract components: projects, shared source code, and buildscripts.

Clearly, given the preceding discussion, we wish to allow for more than one project within the structure.

It should also be clear that allowing for more than one shared source directory is desirable, such as module3 in the following illustration.

Level 1 Directories 2

We have evolved our structure in such a way that the shared source code has no dependencies upon either the projects that use its contents or the buildscripts.

Any implementation of the buildscripts must understand the structure that we have set forth, and adhere to our requirements for handling derived objects, but aside from that, there are many possible implementations of a buildscripts system.

This leads us to the observation that we can support more than one set of tools that satisfy the buildscripts requirements.

Indeed, allowing for more than one source of buildscripts protects our source code investment against dependence on a single source of buildscripts.
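One way to make "satisfies the buildscripts requirements" concrete is to state the requirements as an interface and check each implementation against it. This sketch assumes a Python `typing.Protocol` and models only one requirement: derived objects must be deposited inside the project tree. `BuildTool`, the three layout classes, and `keeps_objects_in_project` are hypothetical names, not part of BENV or BENVC.

```python
from pathlib import PurePosixPath
from typing import Protocol


class BuildTool(Protocol):
    """Hypothetical contract that any set of buildscripts must satisfy."""

    def object_dir(self, project: str, module: str) -> PurePosixPath: ...


class MirrorLayout:
    """Puts a module's objects in project/module."""

    def object_dir(self, project: str, module: str) -> PurePosixPath:
        return PurePosixPath(project, module)


class ObjSubdirLayout:
    """A different but still acceptable layout: project/obj/module."""

    def object_dir(self, project: str, module: str) -> PurePosixPath:
        return PurePosixPath(project, "obj", module)


class SharedDirLayout:
    """Violates the requirement: objects go into the shared module."""

    def object_dir(self, project: str, module: str) -> PurePosixPath:
        return PurePosixPath(module)


def keeps_objects_in_project(tool: BuildTool, project: str, module: str) -> bool:
    """True if the tool deposits derived objects inside the project tree."""
    parts = tool.object_dir(project, module).parts
    # Reject paths that start elsewhere or climb out with "..".
    return len(parts) > 0 and parts[0] == project and ".." not in parts
```

Any number of buildscript implementations can coexist as long as each passes checks like this one, which is the independence the structure is designed to allow.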

To summarize, we have identified three fundamental types of repository directories:

Level 1

For organizational purposes, we treat these directories as containers for their respective types. The result is the following illustration.

Level 1 plus

To illustrate the notion that we may have more than one buildscript implementation, benvc is added to the following illustration.

Level 1 BENVC

Vendor Namespaces

Examining the relationship between benv and benvc, we can see that each of these directories can be considered an independent namespace. Each name must be unique within the tools directory but is otherwise arbitrary, and can reflect either the name of a product or that of a vendor.

We have given our repository structure the ability to accommodate multiple projects, shared source modules, and build tools. The illustration below adds another layer to our repository structure to allow for the presence of third party vendors and products within the repository by creating a namespace layer.

Namespace

In this case mnm and jtt are vendor names, and oscl is a product name.

If we assume that project1, project2, module2 and module3 are all owned by mnm, then the result is a structure like the one in the following illustration.

Namespace2
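With the namespace layer in place, paths to projects and shared source follow one pattern: container, then namespace, then item. Build tools sit directly under tools, where each tool directory is itself the namespace. This sketch assumes the container names `projects` and `src`, which are placeholders because only `tools` appears in the text. The helper `repo_path` is hypothetical.

```python
from pathlib import PurePosixPath

# "tools" comes from the text; "projects" and "src" are placeholder names.
CONTAINERS = {"project": "projects", "source": "src", "tool": "tools"}


def repo_path(root: str, kind: str, namespace: str, *rest: str) -> PurePosixPath:
    """Build root/<container>/<namespace>/... for one kind of directory."""
    return PurePosixPath(root, CONTAINERS[kind], namespace, *rest)


print(repo_path("root", "project", "mnm", "project1"))  # root/projects/mnm/project1
print(repo_path("root", "source", "mnm", "module2"))    # root/src/mnm/module2
print(repo_path("root", "tool", "benvc"))               # root/tools/benvc
```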

Following is a final illustration of the layered BENV repository organization. Included is the substructure of the BENVC (BENV Classic) build environment support scripts contained in tools/benvc.

Repository Layers