As developers, we all make mistakes. Sometimes these mistakes are simple but have a significant impact on our work. One common mistake is committing large files to a Git repository, which is awkward to undo once the file is part of the history. Fortunately, there is a simple way to avoid repeating it: the pre-commit tool.
What is the pre-commit tool?
The pre-commit tool runs linters, code formatters, and file checkers using Git hooks. Git hooks are scripts that run before or after specific Git actions, such as committing code. As the name suggests, pre-commit runs before the commit action, ensuring that your files pass the configured checks. If any of the checks fail, your commit is rejected.
The pre-commit tool supports a wide range of hooks for languages and file formats commonly used by data engineers, including Dockerfile validation, Python, R, SQL, and Terraform.
Why use pre-commit?
Using pre-commit has several benefits:
- Speed: By running code quality checks locally before submitting code to a CI/CD pipeline, you can catch potential issues early and avoid wasting time and resources on failed pipeline runs.
- Cost saving: If your CI/CD pipeline runs in the cloud, failed runs can incur direct costs or result in lost time. Pre-commit helps prevent unnecessary pipeline runs by catching issues locally.
- Consistency: Pre-commit ensures that code quality checks are consistently applied across the team or project, reducing the chances of introducing code style or standard violations.
- Quality enforcement: By running code quality checks before committing code, you can enforce best practices and maintain high-quality code.
- Security best practice detection: Pre-commit can also include hooks to detect security vulnerabilities or best practice violations, providing an additional layer of protection.
Getting started with pre-commit
To start using pre-commit, follow these steps:
- Install the pre-commit tool according to the provided installation instructions.
- Create a file called .pre-commit-config.yaml in your project’s root directory. This file specifies where to get Git hooks from and which hooks to use.
- Generate a small starter configuration by running the command pre-commit sample-config > .pre-commit-config.yaml
- Edit the .pre-commit-config.yaml file to include the desired hooks. For example, you can include hooks for removing trailing whitespace, ensuring a newline character at the end of files, validating YAML files, and enforcing file size limits.
- Attach the pre-commit hooks to your project by running the command pre-commit install so that the hooks are executed whenever you run git commit.
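As a rough illustration of the editing step, a configuration covering the four example checks might look like the sketch below. It assumes the hooks come from the pre-commit/pre-commit-hooks repository, which is the one the sample configuration points to. The rev value and the 500 KB size limit are illustrative assumptions, not recommendations from this article.

```yaml
# .pre-commit-config.yaml (sketch; rev and size limit are illustrative assumptions)
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0  # assumed release tag; pin whichever release you prefer
    hooks:
      - id: trailing-whitespace      # removes trailing whitespace
      - id: end-of-file-fixer        # makes sure files end with a newline
      - id: check-yaml               # checks that YAML files parse
      - id: check-added-large-files  # rejects commits that add large files
        args: ['--maxkb=500']        # assumed limit, in kilobytes
```

With a file like this in place, a commit that adds a file above the configured limit should be rejected by the check-added-large-files hook, which addresses the large-file mistake mentioned at the start.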
You can also run the pre-commit hooks against all files in your project, not just the ones in a commit, by using the command pre-commit run --all-files, which is useful when you want to run checks without committing the changes.
Detaching pre-commit from your commits
Sometimes you may want to change the .pre-commit-config.yaml file and test those changes without affecting other files. In such cases, you can detach pre-commit using the command pre-commit uninstall, which lets you commit changes to the .pre-commit-config.yaml file separately from other changes. Running pre-commit install again reattaches the hooks.
Pre-commit hooks for SQL code
For linting SQL code and fixing any linting errors, SQLFluff is a popular package that provides two hooks: sqlfluff-lint and sqlfluff-fix. To add these hooks to your project, include the following configuration in your .pre-commit-config.yaml file:

    - repo: https://github.com/sqlfluff/sqlfluff
      rev: 1.4.5
      hooks:
        - id: sqlfluff-lint
          args: [ '--dialect=tsql' ]
        - id: sqlfluff-fix
          args: [ '--dialect=tsql' ]

These hooks check your SQL files against a set of SQL code style rules. You can configure SQLFluff to ignore specific rules using inline comments in your SQL code or by using a .sqlfluff configuration file.
Housekeeping for pre-commit
Periodically, you may need to update the versions of the hooks used by pre-commit. The command pre-commit autoupdate updates the versions in your configuration. Additionally, you can clean up cached hooks: pre-commit clean clears out pre-commit's cached files, and pre-commit gc removes cached hook repositories that are no longer used.
It’s also worth mentioning that pre-commit is an open-source project, and many of the hooks are written in Python or Ruby. This means you can modify the hooks or contribute to the project if you need additional functionality or improvements.
Concluding thoughts
The pre-commit tool is a valuable asset for improving code quality and enforcing best practices. By running code quality checks locally before committing code, you can catch issues early, save time and resources, and ensure consistent and high-quality code. With a wide range of hooks available, including hooks for SQL code, pre-commit is a versatile tool that can benefit any development project.
So why not give pre-commit a try and see how it can enhance your development workflow?