Git has become one of the leading Version Control Systems today. So far in this series, we have developed an understanding of Version Control Systems and discussed their different types. From here on, we will learn about Git, one of the most popular of them. Let us start with:
- What is Git?
- Why use Git?
What is Git?
Git is an Open Source Distributed Version Control System. It is designed for
- Speed
- Simplicity
- Fully Distributed
- Excellent support for parallel development, including hundreds of parallel branches
- Integrity
As we saw in our previous tutorial, in a Distributed Version Control System a full local copy of the repository is saved on each node (the computer of each person working on the project). Teams usually also keep a shared central server to which all members push their changes. This makes Git resilient to crashes, since each node has its own copy of the source tree.
History of Git
Git was developed in 2005 by Linus Torvalds, the creator of the Linux operating system. He created it after the relationship between the Linux kernel team and the company behind BitKeeper, the Version Control System the team had been using, broke down.
How Git stores revisions
Git stores changes to files differently from other Version Control Systems such as SVN and CVS. This is one of the most important concepts of Git, and you should internalize it as early as possible. Most other Version Control Systems store the differences between versions. For example, consider File A, which was changed three times. The first version of the file is stored as is, meaning the complete file is saved. As new versions are introduced, only the difference from the previous version is saved. This will become clearer in the image below, which shows how changes to three files are stored over multiple revisions.
Here we can see that other Version Control Systems store the delta (changes) to a given file over time. Git, on the other hand, stores a Snapshot of the changed file. For example, if you make a change to File A, a complete snapshot of the changed file is stored. If a file has not changed between two versions, Git keeps a reference to the original file instead of copying it again into the new version. The image below summarizes how Git internally stores the changes.
To summarize this section, the three important points we learned about Git are:
- Git stores a Snapshot of a file instead of storing a Difference, as many other Version Control Systems do.
- Git only takes a Snapshot of the changed files.
- To save storage space, Git keeps a Reference to a file that has not changed instead of making a copy of it in the new version.
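To make the snapshot idea concrete, here is a minimal Python sketch of a toy object store. It is a simplified model under stated assumptions, not Git's actual storage format: each commit records a snapshot that maps file names to content hashes, and each unique file content is stored only once, so an unchanged file in a new snapshot is just a reference to the existing object. The class `ToyStore` and its methods are hypothetical names used only for illustration.

```python
import hashlib


class ToyStore:
    """Hypothetical, simplified model of snapshot-based storage (not Git's real format)."""

    def __init__(self):
        self.objects = {}  # content hash -> file content; each unique content stored once
        self.commits = []  # each commit is a snapshot: file name -> content hash

    def _store(self, content: bytes) -> str:
        digest = hashlib.sha1(content).hexdigest()
        # If identical content is already stored, only the reference (hash) is reused.
        self.objects.setdefault(digest, content)
        return digest

    def commit(self, files: dict) -> dict:
        """Record a full snapshot of every file passed in."""
        snapshot = {name: self._store(data) for name, data in files.items()}
        self.commits.append(snapshot)
        return snapshot


store = ToyStore()
v1 = store.commit({"A.txt": b"alpha v1", "B.txt": b"beta", "C.txt": b"gamma"})
v2 = store.commit({"A.txt": b"alpha v2", "B.txt": b"beta", "C.txt": b"gamma"})

# B.txt and C.txt did not change, so both snapshots point at the same stored objects.
print(v1["B.txt"] == v2["B.txt"])  # True
print(len(store.objects))          # 4: A v1, A v2, B, C
```

Only File A produces a new stored object in the second commit; the other two entries are references. Real Git additionally compresses its objects and, when it packs them, may store some of them as deltas on disk, which this sketch leaves out.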
Git is a Distributed Version Control System
Git keeps a remote repository on a server and a local repository on each developer's computer. This means the code is not stored only on a central server: a full copy of it, including its history, is present on every developer's computer as well. Because every node has a local copy, almost all Git operations are local; the main exceptions are commands that communicate with a remote repository, such as push, pull, and fetch. As a result, you don't have to be connected to the remote repository all the time to do your work.
Version Control Systems like CVS and SVN, on the other hand, require you to be connected to the server for most operations. Because most Git operations are done locally, they are very fast, which gives Git a significant speed advantage over these systems. For example, you can commit a change right there on your own system. Later, once you have a few commits, you can push them to the central server for all team members to use. Similarly, if you want to see the history of changes made to a project, you don't need a network connection: the whole history can be viewed from your local copy.
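The following Python sketch models the distributed workflow described above. It is a deliberately simplified illustration, not how Git is implemented: a repository is reduced to a list of commit messages, and the hypothetical helpers `clone` and `push` only stand in for the real `git clone` and `git push` commands. Committing and reading the log work entirely on the local copy; only `push` touches the "server".

```python
class Repository:
    """Hypothetical toy repository: its history is an ordered list of commit messages."""

    def __init__(self, history=None):
        self.history = list(history or [])

    def commit(self, message: str) -> None:
        # Local operation: no server involved.
        self.history.append(message)

    def log(self) -> list:
        # Local operation: the full history is already on this machine.
        return list(self.history)


def clone(remote: Repository) -> Repository:
    # Cloning copies the entire history, not just the latest version.
    return Repository(remote.history)


def push(local: Repository, remote: Repository) -> None:
    # Network operation: send the commits the remote does not have yet.
    # Simplification: refuse if the remote has commits the local copy lacks.
    if local.history[:len(remote.history)] != remote.history:
        raise ValueError("remote has diverged; pull first")
    remote.history.extend(local.history[len(remote.history):])


server = Repository(["Initial commit"])
alice = clone(server)
alice.commit("Add login page")  # works offline
alice.commit("Fix typo")        # works offline
print(alice.log())              # full history, read locally
push(alice, server)             # the only step that needs the server
print(server.log())
```

The design choice to make `clone` copy the whole history is what lets `log` and `commit` run without the server, mirroring the speed argument in the text.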
Note: For more information on Distributed VCS and why we need it, please read What is Distributed Version Control System?
Git Integrity
Git is designed to keep the content under version control secure and to maintain its integrity. It uses checksums to confirm that information is not lost in transit or tampered with on the file system. Internally, Git computes a checksum from the content of a file and then verifies it while transmitting or storing data. If the checksum differs, the file has been corrupted. To calculate this checksum, Git uses the Secure Hash Algorithm 1 (SHA-1).
The checksum is so important to Git that you will find hashes all over the place. A SHA-1 hash is a string of 40 hexadecimal characters; for example, Git identifies an empty file by the hash e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.
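As a sketch of how such a checksum is produced, the Python function below computes the SHA-1 that Git assigns to a file's content (a "blob" object). Git hashes a short header, the word `blob`, a space, the content length in bytes and a NUL byte, followed by the raw content. The function names `git_blob_hash` and `verify` are hypothetical; you can compare the result with the output of `git hash-object` run on a file with the same content.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the 40-character SHA-1 hex digest Git uses for a blob with this content."""
    # Header format: b"blob <size-in-bytes>\0", then the raw file content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def verify(content: bytes, expected_hash: str) -> bool:
    # Recompute the hash; any change to the content, even one byte, gives a different value.
    return git_blob_hash(content) == expected_hash


data = b"hello world\n"
stored_hash = git_blob_hash(data)
print(stored_hash)                               # 40 hexadecimal characters
print(verify(data, stored_hash))                 # True
print(verify(b"hello world!\n", stored_hash))    # False: the content was altered
```

Because the hash is derived from the content itself, a corrupted or tampered copy is detected simply by recomputing the hash and comparing it with the stored one.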
Why use Git?
Since its release, Git has gained huge popularity among developers and, being open source, has incorporated many features contributed by its community. Today, a staggering number of projects, both commercial and personal, use Git for version control. Let's see why Git has become so popular by discussing its main features:
- Performance: Git is among the fastest version control systems. Committing, branching, and merging are all optimized to perform better than in many other systems, helped by the fact that most operations run locally.
- Security: Git protects the integrity of your work with the SHA-1 cryptographic hash. Every version, file, and directory is checksummed, so corruption or tampering is detected instead of going unnoticed.
- Branching Model: Git has a different branching model from other VCSs. It lets you have multiple local branches that are independent of each other. This enables frictionless context switching (create a branch to try something new, commit to it, then switch back), role-based code lines (for example, one branch that always goes to production and another for testing), and disposable experimentation (try something out and, if it does not work, delete the branch without losing any other code).
- Staging Area: Git has an intermediate stage called the "index" or "staging area", where you can review and arrange the changes that will go into a commit before completing it.
- Distributed: Git is distributed in nature. The complete repository, including its history, is mirrored onto each developer's system, so developers can work on it locally.
- Open Source: This is a very important feature of any software today. Being open source invites developers from all over the world to contribute to the software and make it more powerful through new features and additional plugins. The Linux kernel, the open source project Git was originally created for, shows how far this collaborative model can scale: it has grown to about 15 million lines of code.
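The Branching Model and Staging Area features can be illustrated with a small Python model. It is a hypothetical simplification, not Git's implementation: the methods `add`, `commit`, `branch`, `checkout`, and `delete_branch` only loosely mirror `git add`, `git commit`, `git branch`, `git checkout`, and `git branch -d`. Two ideas carry over from real Git: a branch is just a named, movable pointer to a commit, and a commit records whatever is currently staged in the index.

```python
class ToyRepo:
    """Hypothetical model of Git's staging area and branches (greatly simplified)."""

    def __init__(self):
        self.commits = {}               # commit id -> (parent id, snapshot)
        self.branches = {"main": None}  # branch name -> commit id (a movable pointer)
        self.head = "main"              # name of the current branch
        self.index = {}                 # staging area: file name -> content
        self._next_id = 1

    def add(self, name, content):
        # Like "git add": copy the file's current content into the staging area.
        self.index[name] = content

    def commit(self):
        # Like "git commit": turn whatever is staged into a new snapshot.
        parent = self.branches[self.head]
        commit_id = self._next_id
        self._next_id += 1
        self.commits[commit_id] = (parent, dict(self.index))
        self.branches[self.head] = commit_id  # advance the current branch pointer
        return commit_id

    def branch(self, name):
        # A new branch is just another pointer to the current commit.
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        # Switch branches and load that branch's snapshot into the index.
        self.head = name
        commit_id = self.branches[name]
        self.index = dict(self.commits[commit_id][1]) if commit_id is not None else {}

    def delete_branch(self, name):
        # Removes only the pointer; in this model the commits stay in the store.
        del self.branches[name]


repo = ToyRepo()
repo.add("app.py", "v1")
repo.commit()

repo.branch("experiment")      # disposable experimentation
repo.checkout("experiment")
repo.add("app.py", "risky idea")
repo.commit()

repo.checkout("main")          # frictionless context switch
repo.delete_branch("experiment")
print(repo.commits[repo.branches["main"]][1])  # {'app.py': 'v1'}
```

Deleting the experimental branch leaves `main` untouched, which is the "delete it without any loss of code" property described in the Branching Model feature.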
So far, we have seen that Git is a powerful and widely used Version Control System. Now that we understand what Git is and why to use it, the next step is to learn how to use it.