This article introduces you to Git, including how to install the necessary software to access Git servers where your software project will be stored.
Version control concepts
To understand Git and the concept of version control, it helps to look at version control from a historical perspective. There have been three generations of version control software.
The first generation
The first generation was very simple. Developers worked on the same physical system and “checked out” one file at a time.
This generation of version control software made use of a technique called file locking. When a developer checked out a file, it was locked so no other developer could edit the file. Figure 1 illustrates the concept of this type of version control.
Examples of first-generation version control software include Revision Control System (RCS) and Source Code Control System (SCCS).
The second generation
The problems with the first generation included the following:
- Only one developer can work on a file at a time. This results in a bottleneck in the development process.
- Developers have to log in directly to the system that contains the version control software.
These problems were solved in the second generation of version control software. In the second generation, files are stored on a centralized server in a repository. Developers can check out separate copies of a file. When the developer completes work on a file, the file is checked in to the repository. Figure 2 illustrates the concept of this type of version control.
If two developers check out the same version of a file, their changes can collide when both of them check the file back in. This situation is handled by a process called a merge.
What is a merge? Suppose two developers, Bob and Sue, check out version 5 of a file named abc.txt. After Bob completes his work, he checks the file back in. Typically, this results in a new version of the file, version 6.
Some time later, Sue checks in her file. This new file must incorporate both her changes and Bob’s changes. This is accomplished through the process of a merge.
Depending on the version control software that you use, there could be different ways to handle this merge. In some cases, such as when Bob and Sue have worked on completely different parts of the file, the merge process is very simple. However, in cases in which Sue and Bob worked on the same lines of code in the file, the merge process can be more complex. In those cases, Sue will have to make decisions, such as whether Bob’s code or her code will be in the new version of the file.
After the merge completes, Sue commits the file to the repository. To commit a file essentially means to create a new version in the repository; in this case, version 7 of the file.
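Git happens to include a standalone three-way file merge command, `git merge-file`, that makes the Bob and Sue scenario easy to try. The sketch below only illustrates the concept; it is not how CVS or Subversion implement merging, and the file names, contents, and temporary directory are assumptions. `base.txt` plays the role of version 5, `bob.txt` is Bob's checked-in version 6, and `sue.txt` is Sue's working copy, which becomes the merged version 7.

```shell
#!/bin/sh
# Sketch: a three-way merge of one file, modeled on the Bob and Sue example.
# All file names and contents are hypothetical.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Version 5: the common version that Bob and Sue both checked out.
printf 'line 1\nline 2\nline 3\nline 4\nline 5\n' > base.txt

# Case 1: Bob and Sue change different parts of the file.
printf 'line 1 (Bob)\nline 2\nline 3\nline 4\nline 5\n' > bob.txt   # version 6
printf 'line 1\nline 2\nline 3\nline 4\nline 5 (Sue)\n' > sue.txt   # Sue's copy

# Apply the changes that lead from base.txt to bob.txt onto sue.txt, in place.
# The exit status is the number of conflicts (0 means a clean merge).
git merge-file sue.txt base.txt bob.txt
clean_status=$?
echo "different lines: $clean_status conflict(s)"
cat sue.txt   # contains both Bob's and Sue's changes

# Case 2: Bob and Sue change the same line.
printf 'line 1\nline 2\nline 3 (Bob)\nline 4\nline 5\n' > bob2.txt
printf 'line 1\nline 2\nline 3 (Sue)\nline 4\nline 5\n' > sue2.txt

# -L sets the labels shown in the conflict markers (Sue, base, Bob).
git merge-file -L Sue -L base -L Bob sue2.txt base.txt bob2.txt
conflict_status=$?
echo "same line: $conflict_status conflict(s)"
cat sue2.txt  # Sue must resolve the <<<<<<< / ======= / >>>>>>> block by hand
```

In the second case, the conflict markers are exactly the point at which Sue has to decide whether her line, Bob's line, or some combination ends up in the new version.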
Examples of second-generation version control software include Concurrent Versions System (CVS) and Subversion.
The third generation
The third generation is referred to as distributed version control systems (DVCSs). As with the second generation, a central repository server contains all the files for the project. However, developers don’t check out individual files from the repository. Instead, the entire project is checked out, allowing the developer to work on the complete set of files rather than just individual files. Figure 3 illustrates the concept of this type of version control.
Another (very big) difference between the second and third generation of version control software has to do with how the merge and commit process works. As previously mentioned, the steps in the second generation are to perform a merge and then commit the new version to the repository.
With third-generation version control software, files are checked in and then they are merged. To understand the difference between these two techniques, first consider the merge-then-commit approach shown in Figure 4.
In phase 1 of Figure 4, two developers check out a file that is based on the third version. In phase 2, one developer checks that file in, resulting in a version 4 of the file.
In phase 3, the second developer must first merge the changes from his checked-out copy with the changes of version 4 (and, potentially, other versions). After the merge is complete, the new version can be committed to the repository as version 5.
If you focus on what is in the repository (the center part of each phase), you see that there is a very straight line of development (ver1, ver2, ver3, ver4, ver5, and so on). This simple approach to software development poses some potential problems:
- Requiring a developer to merge before committing often results in developers’ not wanting to commit their changes on a regular basis. The merge process can be a pain, and developers might decide to wait and do one large merge later rather than a series of regular merges. This has a negative impact on software development because huge chunks of code are then added to a file all at once. Additionally, you want to encourage developers to commit changes to the repository, just like you want to encourage someone who is writing a document to save on a regular basis.
- Very important: Version 5 in this example is not necessarily the work that the developer originally completed. During the merging process, the developer might discard some of his work to complete the merge process. This isn’t ideal because it results in the loss of potentially good code.
A better, although arguably more complex, technique can be used. It is based on a directed acyclic graph (DAG), and you can see an example of how it works in Figure 5.
Phases 1 and 2 are the same as shown in Figure 4. However, note that in phase 3 the second check-in results in a version 5 that is based on version 3 rather than on version 4, so versions 4 and 5 are independent of each other. In phase 4 of the process, versions 4 and 5 of the file are merged to create version 6.
Although this process is more complex (and, potentially, much more complex if you have a large number of developers), it does provide some advantages over a single line of development:
- Developers can commit their changes on a regular basis and not have to worry about merging until a later time.
- The merging process could be delegated to a specific developer who has a better idea of the entire project or code than the other developers have.
- At any time, the project manager can go back and see exactly what work each individual developer created.
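Git records history as exactly this kind of graph. The sketch below recreates the four phases of Figure 5 in a throwaway repository. It is illustrative only: Git identifies commits by hashes rather than version numbers, so the commit messages stand in for the figure's version labels, and the developer names, the "Integrator" role, and the file names are hypothetical. Bob and Sue edit different files so that the merge in phase 4 is conflict-free.

```shell
#!/bin/sh
# Sketch: build the directed acyclic graph from Figure 5 with real commits.
# Names, e-mail addresses, and file contents are hypothetical.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
git init -q
main=$(git symbolic-ref --short HEAD)   # default branch name (master or main)

# Commit as a given developer without changing any global Git settings.
commit_as() {
  git -c user.name="$1" -c user.email="$1@example.com" commit -q -m "$2"
}

# Versions 1 to 3: a single line of development.
for v in 1 2 3; do
  echo "version $v" > abc.txt
  git add abc.txt
  commit_as Bob "version $v"
done

# Phases 1 and 2: Bob works from version 3 and commits version 4.
git checkout -q -b bob
echo "Bob's change" > bob.txt
git add bob.txt
commit_as Bob "version 4 (Bob)"

# Phase 3: Sue also works from version 3, so her version 5 does not contain version 4.
git checkout -q -b sue "$main"
echo "Sue's change" > sue.txt
git add sue.txt
commit_as Sue "version 5 (Sue)"

# Phase 4: a designated integrator merges versions 4 and 5 into version 6.
git checkout -q "$main"
git merge -q bob                      # fast-forward: the main branch now points at version 4
git -c user.name=Integrator -c user.email=integrator@example.com \
  merge -q --no-edit -m "version 6 (merge of 4 and 5)" sue

git log --graph --oneline             # draws the graph: version 6 has two parents
git log --oneline --author=Sue        # only the work that Sue created
```

The last two commands correspond to the advantages listed above: the merge could be done later by someone other than Bob or Sue, and each developer's original commits remain in the history unchanged.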
Certainly an argument exists for both methods. However, keep in mind that this article focuses on Git, which uses the directed acyclic graph method of third-generation version control systems.
Installing Git
You might already have Git on your system because it is sometimes installed by default (or another administrator might have installed it). If you have access to the system as a regular user, you can execute the following command to determine whether you have Git installed:
ocs@ubuntu:~$ which git
/usr/bin/git
If Git is installed, then the path to the git command is provided, as shown in the preceding example. If it isn’t installed, then you either get no output or an error like the following:
[ocs@centos ~]# which git
/usr/bin/which: no git in (/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:/root/bin)
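If you want to perform the same check inside a script, the POSIX `command -v` builtin is a portable alternative to `which`. A minimal sketch (the messages are illustrative):

```shell
#!/bin/sh
# Sketch: report whether the git command can be found on the PATH.
# `command -v git` prints the path to git and succeeds if git is found;
# otherwise it prints nothing and returns a non-zero status.
if git_path=$(command -v git); then
  echo "Git is installed at $git_path"
  git --version
else
  echo "Git is not installed (or is not on your PATH)"
fi
```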
As an administrator on a Debian-based system, you can use the dpkg command to determine whether the Git package has been installed:
root@ubuntu:~# dpkg -l git
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name     Version       Architecture  Description
+++-========-=============-=============-========================================
ii  git      1:1.9.1-1ubun amd64         fast, scalable, distributed revision con
As an administrator on a Red Hat–based system, you could use the rpm command to determine whether the git package has been installed:
[root@centos ~]# rpm -q git
git-1.8.3.1-6.el7_2.1.x86_64
If Git isn’t installed on your system, you must either log in as the root user or use su to install the software. If you are logged in as the root user on a Debian-based system, you can use the following command to install Git:
apt-get install git
If you are logged in as the root user on a Red Hat–based system, you can use the following command to install Git:
yum install git
Git concepts and features
One of the challenges of using Git is simply understanding the concepts behind it. If you don’t understand the concepts, then all the commands seem like some sort of black magic. This section focuses on the critical Git concepts and introduces you to some of the basic commands.
It is very important to remember that you check out an entire project and that most of the work you do will be local to the system that you are working on. The files that you check out are placed in a new directory inside your current directory, which is typically your home directory.
To get a copy of a project from a Git repository, you use a process called cloning. Cloning doesn’t just create a copy of all the files from the repository; it actually performs three primary functions:
- Creates a local repository of the project under the project_name/.git directory in your home directory. The files of the project in this location are considered to be checked out from the central repository.
- Creates a directory where you can directly see the files. This is called the working area. Changes made in the working area are not immediately version controlled.
- Creates a staging area. The staging area is designed to store changes to files before you commit them to the local repository.
This means that if you were to clone a project called Jacumba, the entire project would be stored in the Jacumba/.git directory under your home directory. You should not modify the files in that directory directly. Instead, look in the ~/Jacumba directory to see the files from the project. These are the files that you should change.
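You can see all three areas by cloning a project. In the sketch below, a bare repository in a temporary directory (rather than your home directory) stands in for the central repository on a Git hosting service, and the project contents and identity are hypothetical; with a real host, you would pass `git clone` the repository URL that the host gives you. In Git itself, the staging area is stored as the index file inside the `.git` directory.

```shell
#!/bin/sh
# Sketch: clone a project and look at what cloning creates.
# A local bare repository stands in for the central repository.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Create the stand-in central repository and give it one commit.
git init -q --bare Jacumba-central.git
git clone -q Jacumba-central.git seed     # may warn that the repository is empty
(
  cd seed &&
  echo "Welcome to Jacumba" > README &&
  git add README &&
  git -c user.name=Admin -c user.email=admin@example.com commit -q -m "Initial commit" &&
  git push -q origin HEAD
)

# Clone the project, as a developer would.
git clone -q Jacumba-central.git Jacumba
cd Jacumba || exit 1

ls -a               # working area: README, plus the hidden .git directory
ls .git/index       # staging area (the index), stored inside the local repository
git status          # nothing staged or modified yet
git log --oneline   # the project history was copied into the local repository
```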
Suppose you have changed a file, but you need to work on some other files before you are ready to commit your changes to the local repository. In that case, you would stage the file that you have finished working on, which prepares it to be committed to the local repository.
After you have made all your changes and staged all the files, you commit them to the local repository. See Figure 6 for a visual demonstration of this process.
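The staging workflow maps onto two commands: `git add` stages a file, and `git commit` records everything that is staged in the local repository. The sketch below uses a fresh local repository so that it runs on its own; the file names and identity are hypothetical.

```shell
#!/bin/sh
# Sketch: stage files as you finish them, then commit them together.
# File names and the identity below are hypothetical.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
git init -q
git config user.name "Example Developer"   # identity for this repository only
git config user.email "dev@example.com"

echo "finished feature" > feature.txt
git add feature.txt              # feature.txt is done: stage it now

echo "draft notes" > notes.txt   # keep working on another file
git status --short               # "A  feature.txt" (staged), "?? notes.txt" (not staged)

git add notes.txt                # notes.txt is done too: stage it
git commit -q -m "Add feature and notes"   # commit everything that is staged
git status --short               # prints nothing: the working area is clean
git log --oneline                # the commit exists only in the local repository
```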
Keep in mind that committing the staged files only records them in the local repository, so only you have access to the changes that have been made. The process of sending the new versions to the central repository is called a push.
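Pushing is a separate, explicit step. In the sketch below, a bare repository in a temporary directory again stands in for the central repository on a hosting service (with a real host, `origin` would point at the URL you cloned from), and the names are hypothetical. `git ls-remote` shows what the central repository contains before and after the push.

```shell
#!/bin/sh
# Sketch: a commit stays local until it is pushed.
# A local bare repository stands in for the central repository.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
git init -q --bare central.git
git clone -q central.git work    # may warn that the repository is empty
cd work || exit 1
git config user.name "Example Developer"
git config user.email "dev@example.com"

echo "hello" > hello.txt
git add hello.txt
git commit -q -m "Add hello.txt"   # recorded in the local repository only

git ls-remote origin      # no branches listed: the central repository is still empty

git push -q origin HEAD   # send the current branch to the central repository
git ls-remote origin      # now lists the pushed branch
```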
Choosing your Git repository host
First, the good news: Many organizations provide Git hosting; at the time of this writing, there are more than two dozen choices. This means you have many options to choose from. That’s the good news … and the bad news.
It is only bad news because it means you really need to spend some time researching the pros and cons of the hosting organizations. For example, most don’t charge for basic hosting but do charge for large-scale projects. Some only provide public repositories (anyone can see your repository) whereas others let you create private repositories. There are many other features to consider.
One feature that might be high on your list is a web interface. Although you can do just about all repository operations locally on your system, being able to perform some operations via a web interface can be very useful. Explore the interface that is provided before making your choice.
At the very least, I recommend considering the following:
Note that I chose GitLab.com for the examples in the book. Any of the hosts in the preceding list would have been just fine for the book; I chose GitLab.com simply because it happened to be the one I used on my last Git project.
Now that you have gotten through all the theory, it is time to actually do something with Git. This next section assumes the following:
- You have installed the git-all software package on your system.
- You have created an account on a Git hosting service.