Git is a version control system for tracking changes in computer files and coordinating work on those files among multiple people. It is primarily used for source code management in software development, but it can be used to keep track of changes in any set of files. As a distributed revision control system, it is aimed at speed, data integrity, and support for distributed, non-linear workflows.
Git was created by Linus Torvalds in 2005 for the development of the Linux kernel, with other kernel developers contributing to its initial development. Its current maintainer since 2005 is Junio Hamano. As with most other distributed version control systems, and unlike most client-server systems, every Git directory on every computer is a complete repository with full history and full version tracking abilities, independent of network access or a central server. Git is free and open source software distributed under the terms of the GNU General Public License version 2.
History
Git development began in April 2005, after many developers of the Linux kernel gave up access to BitKeeper, the source control management (SCM) system they had previously used to maintain the project. The copyright holder of BitKeeper, Larry McVoy, had withdrawn free use of the product after claiming that Andrew Tridgell had reverse-engineered the BitKeeper protocols. (The same incident also spurred the creation of another version control system, Mercurial.)
Linus Torvalds wanted a distributed system that he could use like BitKeeper, but none of the available free systems met his needs. Torvalds cited the example of a source control management system that needed 30 seconds to apply a patch and update all associated metadata, and noted that this would not scale to the needs of Linux kernel development, where synchronizing with fellow maintainers could require 250 such actions at once. For his design criteria, he specified that patching should take no more than three seconds, and added three more points:
- Take Concurrent Versions System (CVS) as an example of what not to do; if in doubt, make the exact opposite decision
- Support a distributed, BitKeeper-like workflow
- Include very strong safeguards against corruption, either accidental or malicious
These criteria eliminated every version control system in use at the time, so immediately after the 2.6.12-rc2 Linux kernel development release, Torvalds set out to write his own.
Torvalds quipped about the name git (which means unpleasant person in British English slang): "I'm an egotistical bastard, and I name all my projects after myself. First 'Linux', now 'git'." The man page describes Git as "the stupid content tracker". The readme file of the source code elaborates further on the name.
The development of Git began on April 3, 2005. Torvalds announced the project on April 6; it became self-hosting on April 7. The first merge of multiple branches took place on April 18. Torvalds achieved his performance goals; on April 29, the nascent Git was benchmarked recording patches to the Linux kernel tree at a rate of 6.7 patches per second. On June 16, Git managed the kernel 2.6.12 release.
Torvalds handed over maintenance on July 26, 2005 to Junio Hamano.
Design
Git's design was inspired by BitKeeper and Monotone. Git was originally designed as a low-level version control system engine on top of which others could write front ends, such as Cogito or StGIT. The core Git project has since become a complete version control system that is usable directly. While strongly influenced by BitKeeper, Torvalds deliberately avoided conventional approaches, leading to a unique design.
Characteristics
One property of Git is that it takes snapshots of whole directory trees of files. The earliest systems for tracking versions of source code, Source Code Control System (SCCS) and Revision Control System (RCS), worked on individual files and emphasized the space savings to be gained from storing the (mostly similar) versions as interleaved deltas (SCCS) or with delta encoding (RCS). Later revision control systems kept this notion of a file having an identity across multiple revisions of a project. However, Torvalds rejected this concept. Consequently, Git does not explicitly record file revision relationships at any level below the source code tree.
These implicit revision relationships have several significant consequences:
- It is slightly more expensive to examine the change history of a single file than that of the whole project. To obtain a history of changes affecting a given file, Git must walk the global history and then determine whether each change modified that file. This method of examining history does, however, let Git produce with equal efficiency a single history showing the changes to an arbitrary set of files. For example, a subdirectory of the source tree plus an associated global header file is a very common case.
- Renames are handled implicitly rather than explicitly. A common complaint with CVS is that it uses the name of a file to identify its revision history, so moving or renaming a file is not possible without either interrupting its history or renaming the history and thereby making it inaccurate. Most post-CVS revision control systems solve this by giving a file a unique long-lived name (analogous to an inode number) that survives renaming. Git does not record such an identifier, and this is claimed as an advantage. Source code files are sometimes split or merged, or simply renamed, and recording this as a simple rename would freeze an inaccurate description of what happened in the (immutable) history. Git addresses the issue by detecting renames while browsing the history of snapshots rather than recording them when making a snapshot. (Briefly, given a file in revision N, a file of the same name in revision N-1 is its default ancestor. However, when there is no like-named file in revision N-1, Git searches for a file that existed only in revision N-1 and is very similar to the new file.) This does require more CPU-intensive work every time the history is reviewed, and several options to adjust the heuristics are available. The mechanism does not always work; sometimes a file that is renamed and changed in the same commit is read as a deletion of the old file and the creation of a new file. Developers can work around this limitation by committing the rename and the changes separately.
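The rename heuristic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Git's actual algorithm: each snapshot is modeled as a hypothetical dict mapping path to file content, similarity is measured with `difflib.SequenceMatcher`, and the 0.5 threshold is an arbitrary cutoff chosen for the example.

```python
from difflib import SequenceMatcher


def detect_renames(old_tree, new_tree, threshold=0.5):
    """Pair files that vanished from old_tree with similar new files.

    Both trees are hypothetical dicts mapping path -> file content (str).
    Returns a dict mapping new path -> old path for each detected rename.
    """
    # Only paths present in exactly one snapshot are rename candidates.
    deleted = {p: c for p, c in old_tree.items() if p not in new_tree}
    added = {p: c for p, c in new_tree.items() if p not in old_tree}

    renames = {}
    for new_path, new_content in added.items():
        best_path, best_score = None, threshold
        for old_path, old_content in deleted.items():
            score = SequenceMatcher(None, old_content, new_content).ratio()
            if score >= best_score:
                best_path, best_score = old_path, score
        if best_path is not None:
            renames[new_path] = best_path
            del deleted[best_path]  # each old file is matched at most once
    return renames
```

Because nothing about the rename is stored in the snapshots themselves, the pairing is recomputed whenever history is inspected, which is the extra CPU cost mentioned above. A file that is both renamed and heavily edited can fall below the threshold and then shows up as an unrelated deletion and addition.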
Git implements several merging strategies; a non-default strategy can be selected at merge time:
- resolve: the traditional three-way merge algorithm.
- recursive: This is the default when pulling or merging one branch, and is a variant of the three-way merge algorithm. When more than one common ancestor can be used for the three-way merge, it creates a merged tree of the common ancestors and uses that as the reference tree for the three-way merge. In tests done on actual merge commits taken from the Linux 2.6 kernel development history, this has been reported to result in fewer merge conflicts without causing mis-merges. It can also detect and handle merges involving renames.
- octopus: This is the default when merging more than two heads.
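The core rule of a three-way merge can be sketched as follows. This is a simplified illustration at whole-file granularity, assuming each side is a hypothetical dict mapping path to content; it does not merge changes within a file, handle renames, or build a merged ancestor as the recursive strategy does.

```python
def three_way_merge(base, ours, theirs):
    """Merge two snapshots against their common ancestor.

    Each argument is a dict mapping path -> content. A missing path is
    treated as None, so deletions are handled by the same rules.
    Returns (merged, conflicts).
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (including both deleting)
            result = o
        elif o == b:      # only their side changed the file
            result = t
        elif t == b:      # only our side changed the file
            result = o
        else:             # both sides changed it in different ways
            conflicts.append(path)
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts
```

The common ancestor is what lets the merge tell "changed on one side" apart from "changed on both sides"; without it, any difference between the two heads would look like a conflict.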
Data structures
Git's primitives are not inherently a source code management system. Torvalds explains,
In many ways you can just see git as a filesystem - it's content-addressable, and it has a notion of versioning, but I really really designed it coming at the problem from the viewpoint of a filesystem person (hey, kernels is what I do), and I actually have absolutely zero interest in creating a traditional SCM system.
From this initial design approach, Git has developed the full set of features expected of a traditional SCM, with features mostly being created as needed, then refined and extended over time.
Git has two data structures: a mutable index (also called stage or cache) that caches information about the working directory and the next revision to be committed; and an immutable, append-only object database.
The index serves as a connection point between the object database and the working tree.
The object database contains four types of objects:
- A blob (binary large object) is the content of a file. Blobs have no proper file name, time stamps, or other metadata. (A blob's name internally is a hash of its content.)
- A tree object is the equivalent of a directory. It contains a list of file names, each with some type bits and a reference to the blob or tree object that is that file, symbolic link, or directory's contents. These objects are a snapshot of the source tree. (Taken as a whole, this forms a Merkle tree, meaning that a single hash for the root tree is sufficient, and is actually used in commits, to pinpoint the exact state of a whole tree structure of any number of sub-directories and files.)
- A commit object links tree objects together into a history. It contains the name of a tree object (of the top-level source directory), a time stamp, a log message, and the names of zero or more parent commit objects.
- A tag object is a container that contains a reference to another object and can hold additional metadata related to that object. Most commonly, it is used to store a digital signature of a commit object corresponding to a particular release of the data being tracked by Git.
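The Merkle-tree property of blobs and trees can be illustrated with a small sketch. The encoding below is deliberately simplified and is not Git's real tree format: a directory is modeled as a hypothetical nested dict, and each entry is serialized as a text line before hashing with SHA-1.

```python
import hashlib


def hash_blob(content: bytes) -> str:
    """Name a file purely by its content (simplified, not Git's format)."""
    return hashlib.sha1(b"blob:" + content).hexdigest()


def hash_tree(tree: dict) -> str:
    """Hash a directory: name -> bytes (file) or dict (sub-directory).

    Each entry contributes its type, the hash of its contents, and its
    name, so the root hash depends on every file and directory below it.
    """
    lines = []
    for name in sorted(tree):
        entry = tree[name]
        if isinstance(entry, dict):
            lines.append(f"tree {hash_tree(entry)} {name}")
        else:
            lines.append(f"blob {hash_blob(entry)} {name}")
    return hashlib.sha1("\n".join(lines).encode()).hexdigest()
```

Changing a single byte in a deeply nested file changes that blob's hash, then the hash of each enclosing tree, and finally the root hash. This is why a commit only needs to record one tree name to identify a complete snapshot.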
Each object is identified by a SHA-1 hash of its contents. Git computes the hash and uses this value as the object's name. The object is put into a directory matching the first two characters of its hash. The rest of the hash is used as the file name for that object.
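As a rough sketch of this naming scheme for blobs: the hashed data is the file content preceded by a short header giving the object type and length. The header is a detail not stated in the text above, and the function names here are hypothetical helpers for illustration rather than part of any Git API.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 name of a blob holding the given content."""
    # Assumed header layout: object type, a space, the length in
    # decimal, then a NUL byte, followed by the raw content.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Map an object name to its location inside the object database."""
    # The first two hex characters select the directory; the remaining
    # 38 characters are the file name.
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"
```

For empty content, `git_blob_id(b"")` yields `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, which would be stored at `.git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391`. Identical content always maps to the same name, so the same file stored twice occupies one object.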
Git stores each revision of a file as a unique blob. The relationships between the blobs can be found by examining the tree and commit objects. Newly added objects are stored in their entirety using zlib compression. This can consume a large amount of disk space quickly, so objects can be combined into packs, which use delta compression to save space, storing blobs as their changes relative to other blobs.
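Storing a newly added object whole with zlib compression might look like the sketch below. It is a minimal illustration, assuming a blob header of type and length and a caller-supplied objects directory; the helper names are hypothetical, and packs and delta compression are not modeled.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(objects_dir: str, content: bytes) -> str:
    """Compress a blob with zlib and store it under its SHA-1 name."""
    data = b"blob " + str(len(content)).encode() + b"\0" + content
    object_id = hashlib.sha1(data).hexdigest()
    directory = os.path.join(objects_dir, object_id[:2])
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, object_id[2:]), "wb") as handle:
        handle.write(zlib.compress(data))
    return object_id


def read_loose_object(objects_dir: str, object_id: str) -> bytes:
    """Load an object by name and return its content without the header."""
    path = os.path.join(objects_dir, object_id[:2], object_id[2:])
    with open(path, "rb") as handle:
        data = zlib.decompress(handle.read())
    _header, _, content = data.partition(b"\0")
    return content


def round_trip(content: bytes) -> bytes:
    """Store content in a temporary object directory and read it back."""
    with tempfile.TemporaryDirectory() as objects_dir:
        object_id = write_loose_object(objects_dir, content)
        return read_loose_object(objects_dir, object_id)
```

Every revision is written in full here, so two nearly identical revisions of a large file cost roughly twice the space. That is the cost that packs address by storing one revision as a delta against another.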
Git servers typically listen on TCP port 9418.
References
Every object in the Git database that is not referred to may be cleaned up by using a garbage collection command, or automatically. An object may be referenced by another object, or by an explicit reference. Git knows different types of references. The commands to create, move, and delete references vary. "git show-ref" lists all references. Some types are:
- heads: refers to an object locally
- remotes: refers to an object that exists in a remote repository
- stash: refers to an object not yet committed
- meta
- tags: see above
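The idea that unreferenced objects can be cleaned up amounts to a reachability walk starting from the references. The sketch below illustrates that idea only, assuming a hypothetical in-memory model in which each object ID maps to the IDs it refers to; it is not how Git's garbage collection command is implemented.

```python
def reachable(objects, refs):
    """Return the IDs of all objects reachable from any reference.

    objects: dict mapping object ID -> list of IDs it refers to
             (a commit refers to its tree and parents, a tree to its
             blobs and sub-trees, a tag to its target).
    refs:    dict mapping reference name -> object ID.
    """
    seen = set()
    stack = list(refs.values())
    while stack:
        object_id = stack.pop()
        if object_id in seen or object_id not in objects:
            continue
        seen.add(object_id)
        stack.extend(objects[object_id])
    return seen


def collect_garbage(objects, refs):
    """Keep only the objects that some reference can still reach."""
    live = reachable(objects, refs)
    return {oid: links for oid, links in objects.items() if oid in live}
```

Deleting a branch therefore does not delete any data by itself; it only removes one starting point for the walk, and objects disappear later if no other reference reaches them.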
Implementations
Git is primarily developed on Linux, although it also supports most major operating systems, including BSD, Solaris, macOS, and Windows.
The first Windows port of Git was primarily a Linux emulation framework that hosts the Linux version. Installing Git under Windows creates a similarly named Program Files directory containing the MinGW port of the GNU Compiler Collection, Perl 5, msys2.0 (itself a fork of Cygwin, a Unix-like emulation environment for Windows) and various other Windows ports or emulations of Linux utilities and libraries. Currently, native Windows builds of Git are distributed as 32-bit and 64-bit installers.
The JGit implementation of Git is a pure Java software library, designed to be embedded in any Java application. JGit is used in the Gerrit code review tool and in EGit, a Git client for the Eclipse IDE.
The Dulwich implementation of Git is a pure Python software component for Python 2.7, 3.4 and 3.5.
The libgit2 implementation of Git is an ANSI C software library with no other dependencies, which can be built on various platforms including Windows, Linux, macOS, and BSD. It has bindings for many programming languages, including Ruby, Python, and Haskell.
JS-Git is a JavaScript implementation of a subset of Git.
Git Server
Because Git is a distributed version control system, it can be used as a server out of the box. Dedicated Git server software helps, among other things, to add access control, display the contents of a Git repository on the web, and manage multiple repositories. Remote file store and shell access: a Git repository can be cloned to a shared file system and accessed by other people. It can also be accessed via a remote shell simply by having the Git software installed and allowing a user to log in.
Open source
- gitolite, scripts on top of the git software that provide fine-grained access control
- Gerrit, a git server that can be configured to support code reviews, and that provides access via ssh, an integrated Apache MINA or OpenSSH, or an integrated Jetty web server. Gerrit provides integration for LDAP, Active Directory, OpenID, OAuth, Kerberos/GSSAPI, and X509 https client certificates. With Gerrit 3.0 all configuration will be stored as git repositories, and no database is required to run it. Gerrit has a pull-request feature implemented in its core but no GUI for it.
- Phabricator, a spin-off from Facebook. As Facebook primarily uses Mercurial, its git support is not as prominent.
- Trac, supporting git, Mercurial, and Subversion, under a modified BSD license.
- Kallithea, supporting both git and Mercurial, developed in Python under the GPL license.
- Other complete FLOSS solutions for self-hosting include Gogs and Gitea, both developed in the Go language under the MIT license.
Open core
Some parts of the software are open source; additional features require a commercial license.
- GitLab, similar to GitHub, does not have a code review feature like Gerrit's, but implements pull requests, which it calls "merge requests"
Proprietary
There are a number of proprietary solutions that can be installed on premises. These include Atlassian Bitbucket, Microsoft Team Foundation Server, and more.
As a service
The best known are probably GitHub and the GitLab offering, but many others are available, such as GerritForge.
Adoption
The Eclipse Foundation reported in its annual community survey that as of May 2014, Git was the most widely used source code management tool, with 42.9% of professional software developers reporting that they used Git as their primary source control system, compared with 36.3% in 2013 and 32% in 2012; or, for Git responses excluding use of GitHub: 33.3% in 2014, 30.3% in 2013, 27.6% in 2012 and 12.8% in 2011. The open source directory Black Duck Open Hub reports a similar uptake among open source projects. The Stack Overflow developer survey reported in 2015 that 69.3% of developers used Git, 36.9% used Subversion, 12.2% used TFS, and 7.9% used Mercurial.
A UK IT jobs website reported that as of late September 2016, 29.27% of UK permanent software development job openings mentioned Git, ahead of 12.17% for Microsoft Team Foundation Server, 10.60% for Subversion, 1.30% for Mercurial, and 0.48% for Visual SourceSafe.
Since February 2017, Microsoft has been in the process of migrating Microsoft Windows development from Perforce to Git. To handle the size of the Windows source tree, Microsoft had to develop customizations for the software, including the Git Virtual File System (GVFS), a system that lets cloned repositories use placeholders whose contents are downloaded only once a file is accessed.
Extensions
There are many Git extensions, such as Git LFS, which started as an extension to Git in the GitHub community and is now widely used by other repositories. Extensions are usually developed and maintained independently by different people, but at some point in the future a widely used extension can be merged into Git.
Security
Git does not provide access control mechanisms, but was designed for operation with other tools that specialize in access control.
On December 17, 2014, an exploit was found affecting the Windows and Mac versions of the Git client. An attacker could execute arbitrary code on a target computer with Git installed by creating a malicious Git tree (directory) named .git (the directory in a Git repository that stores all of the repository's data) in a different case (such as .GIT or .Git, needed because Git does not allow the all-lowercase version of .git to be created manually) with malicious files in the .git/hooks subdirectory (a folder of executable files that Git runs), either in a repository the attacker created or in one the attacker could modify. If a Windows or Mac user pulled (downloaded) a version of the repository containing the malicious directory and then switched to that directory, the .git directory would be overwritten (because Windows and Mac filesystems are case-insensitive) and the malicious executable files in .git/hooks could be run, resulting in the attacker's commands being executed. An attacker could also modify the .git/config configuration file, which would allow them to create malicious Git aliases (aliases for Git commands or external commands) or modify existing aliases to execute malicious commands when run. The vulnerability was patched in version 2.2.1 of Git, released on December 17, 2014, and announced the next day.
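The core of the problem, a path component that differs from .git only in letter case, can be shown with a short check. This is a simplified sketch, not the code of the actual fix: the function names are hypothetical, paths are assumed to use "/" as the separator, and other filesystem-specific ways of aliasing a directory name are not considered.

```python
def is_dotgit_like(component: str) -> bool:
    """True if a path component equals ".git" when case is ignored."""
    return component.casefold() == ".git"


def unsafe_paths(paths):
    """Return the paths that contain a .git-like component anywhere."""
    return [
        path
        for path in paths
        if any(is_dotgit_like(part) for part in path.split("/"))
    ]
```

On a case-sensitive filesystem `.GIT/hooks/post-checkout` is an ordinary tracked file, but on a case-insensitive one it lands inside the repository's own `.git` directory, which is why such names have to be rejected before they are written to disk.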
Git version 2.6.1, released on September 29, 2015, contained a patch for a security vulnerability (CVE-2015-7545) that allowed arbitrary code execution. The vulnerability was exploitable if an attacker could convince a victim to clone a specific URL, because the arbitrary commands were embedded in the URL itself. An attacker could use the exploit via a man-in-the-middle attack if the connection was unencrypted, because they could redirect the user to a URL of their choice. Recursive clones were also vulnerable, since they allowed the controller of a repository to specify arbitrary URLs via the gitmodules file.
Git uses SHA-1 hashes internally. Linus Torvalds has responded that the hash was mostly meant to guard against accidental corruption, and that the security provided by a cryptographically secure hash was just an accidental side effect, with the main security being signing elsewhere.
See also
- GitHub
- Comparison of version control software
- Comparison of source code hosting facilities
- List of revision control software
Source of the article : Wikipedia