This text is a work in progress—highly subject to change—and may not accurately describe any released version of the Apache™ Subversion® software. Bookmarking or otherwise referring others to this page is probably not such a smart idea. Please visit http://www.svnbook.com/ for stable versions of this book.

Creating and Configuring Your Repository

Earlier in this chapter (in the section called “Strategies for Repository Deployment”), we looked at some of the important decisions that should be made before creating and configuring your Subversion repository. Now we finally get our hands dirty! In this section, we'll see how to actually create a Subversion repository and configure it to perform custom actions when special repository events occur.

Creating the Repository

Subversion repository creation is an incredibly simple task. The svnadmin utility that comes with Subversion provides a subcommand (svnadmin create) for doing just that.

$ # Create a repository
$ svnadmin create /var/svn/repos
$

Assuming that the parent directory /var/svn exists and that you have sufficient permissions to modify that directory, the previous command creates a new repository in the directory /var/svn/repos with the default filesystem data store (FSFS). You can explicitly choose the filesystem type using the --fs-type option, which accepts either fsfs or bdb as its value.

$ # Create an FSFS-backed repository
$ svnadmin create --fs-type fsfs /var/svn/repos
$
$ # Create a legacy Berkeley-DB-backed repository
$ svnadmin create --fs-type bdb /var/svn/repos
$
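If you provision repositories from a script, a thin wrapper keeps the --fs-type choice explicit and validated. The following is a minimal Python sketch, assuming svnadmin is on the PATH of whoever runs it; svnadmin_create_argv and create_repository are hypothetical helper names, not part of any Subversion API.

```python
import subprocess

# The two data store types that --fs-type accepts, as described above.
SUPPORTED_FS_TYPES = ("fsfs", "bdb")


def svnadmin_create_argv(repos_path, fs_type="fsfs"):
    """Build the argument vector for `svnadmin create` (hypothetical helper)."""
    if fs_type not in SUPPORTED_FS_TYPES:
        raise ValueError(f"unsupported filesystem type: {fs_type!r}")
    return ["svnadmin", "create", "--fs-type", fs_type, repos_path]


def create_repository(repos_path, fs_type="fsfs"):
    """Run svnadmin; raises subprocess.CalledProcessError if creation fails."""
    subprocess.run(svnadmin_create_argv(repos_path, fs_type), check=True)
```

A call such as create_repository("/var/svn/repos") mirrors the first command above. Passing an argument list rather than a shell string sidesteps quoting problems with unusual paths.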

After running this simple command, you have a Subversion repository. Depending on how users will access this new repository, you might need to fiddle with its filesystem permissions. But since basic system administration is rather outside the scope of this text, we'll leave further exploration of that topic as an exercise for the reader.

[Tip] Tip

The path argument to svnadmin is just a regular filesystem path and not a URL like the svn client program uses when referring to repositories. Both svnadmin and svnlook are considered server-side utilities—they are used on the machine where the repository resides to examine or modify aspects of the repository, and are in fact unable to perform tasks across a network. A common mistake made by Subversion newcomers is trying to pass URLs (even local file:// ones) to these two programs.
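Tooling that wraps svnadmin or svnlook can catch the URL mistake before the tool does. A minimal sketch, assuming Python's urllib.parse is an adequate judge of "looks like a URL"; require_local_path is a hypothetical helper, and its treatment of one-letter schemes as Windows drive letters is our own heuristic.

```python
from urllib.parse import urlparse


def require_local_path(arg):
    """Return arg unchanged if it looks like a filesystem path; reject URLs.

    svnadmin and svnlook operate on local paths only, so even a file:// URL
    is a mistake here. (Hypothetical helper, not part of Subversion.)
    """
    scheme = urlparse(arg).scheme
    # A one-letter "scheme" is really a Windows drive letter, as in C:\repos.
    if len(scheme) > 1:
        raise ValueError(f"{arg!r} looks like a URL; pass a local path instead")
    return arg
```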

Present in the db/ subdirectory of your repository is the implementation of the versioned filesystem. Your new repository's versioned filesystem begins life at revision 0, which is defined to consist of nothing but the top-level root (/) directory. Initially, revision 0 also has a single revision property, svn:date, set to the time at which the repository was created.

Now that you have a repository, it's time to customize it.

[Warning] Warning

While some parts of a Subversion repository—such as the configuration files and hook scripts—are meant to be examined and modified manually, you shouldn't (and shouldn't need to) tamper with the other parts of the repository by hand. The svnadmin tool should be sufficient for any changes necessary to your repository, or you can look to third-party tools for tweaking relevant subsections of the repository. Do not attempt manual manipulation of your version control history by poking and prodding around in your repository's data store files!

Implementing Repository Hooks

A hook is a program triggered by some repository event, such as the creation of a new revision or the modification of an unversioned property. Some hooks (the so-called pre hooks) run in advance of a repository operation and provide a means by which to both report what is about to happen and prevent it from happening at all. Other hooks (the post hooks) run after the completion of a repository event and are useful for performing tasks that examine—but don't modify—the repository. Each hook is handed enough information to tell what that event is (or was), the specific repository changes proposed (or completed), and the username of the person who triggered the event.

The hooks subdirectory is, by default, filled with templates for various repository hooks:

$ ls repos/hooks/
post-commit.tmpl          post-unlock.tmpl  pre-revprop-change.tmpl
post-lock.tmpl            pre-commit.tmpl   pre-unlock.tmpl
post-revprop-change.tmpl  pre-lock.tmpl     start-commit.tmpl
$

There is one template for each hook that the Subversion repository supports; by examining the contents of those template scripts, you can see what triggers each script to run and what data is passed to that script. Many of these templates also include examples of how one might use that script, in conjunction with other Subversion-supplied programs, to perform common useful tasks. To actually install a working hook, you need only place into the repos/hooks directory an executable program or script that can be run under the name of the hook (such as start-commit or post-commit).

On Unix platforms, this means supplying a script or program (which could be a shell script, a Python program, a compiled C binary, or any number of other things) named exactly like the name of the hook. Of course, the template files are present for more than just informational purposes—the easiest way to install a hook on Unix platforms is to simply copy the appropriate template file to a new file that lacks the .tmpl extension, customize the hook's contents, and ensure that the script is executable. Windows, however, uses file extensions to determine whether a program is executable, so you would need to supply a program whose basename is the name of the hook and whose extension is one of the special extensions recognized by Windows for executable programs, such as .exe for programs and .bat for batch files.
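On Unix, the copy-and-make-executable routine just described is easy to script. A minimal Python sketch; install_hook_from_template is a hypothetical helper, and it deliberately refuses to overwrite a hook that is already installed. As noted above, you still need to customize the copied script before relying on it.

```python
import os
import shutil
import stat


def install_hook_from_template(hooks_dir, hook_name):
    """Copy hooks/<name>.tmpl to hooks/<name> and mark it executable (Unix)."""
    template = os.path.join(hooks_dir, hook_name + ".tmpl")
    target = os.path.join(hooks_dir, hook_name)
    if os.path.exists(target):
        # Never clobber a hook someone has already installed and customized.
        raise FileExistsError(target)
    shutil.copyfile(template, target)  # copies contents only, not permissions
    mode = os.stat(target).st_mode
    # Add execute permission for owner, group, and others.
    os.chmod(target, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target
```

For example, install_hook_from_template("/var/svn/repos/hooks", "pre-commit") leaves an executable pre-commit next to pre-commit.tmpl, ready for editing.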

Subversion executes hooks as the same user who owns the process that is accessing the Subversion repository. In most cases, the repository is accessed via a Subversion server, so this user is the one the server runs as on the system. The hooks themselves will need OS-level permissions that allow that user to execute them. This also means that any programs or files (including the Subversion repository) that the hook accesses, directly or indirectly, will be accessed as that same user. In other words, be alert to permission-related problems that could prevent the hook from performing the tasks it is designed to perform.
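To catch such problems before a commit does, you can check a hook file from the account the server runs as. A minimal sketch, assuming you run it as that account (for example via sudo -u with whatever user your server actually uses); check_hook_runnable is a hypothetical helper, and os.access answers for the real user ID of the process running it.

```python
import os


def check_hook_runnable(path):
    """Return a list of problems that would stop the current user running a hook."""
    problems = []
    if not os.path.isfile(path):
        problems.append(f"{path} does not exist or is not a regular file")
    elif not os.access(path, os.X_OK):
        problems.append(f"{path} is not executable by this user")
    return problems
```

An empty list means this particular check passed; the hook may still fail on files or programs it touches, which is the broader point of the paragraph above.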

There are several hooks implemented by the Subversion repository, and you can get details about each of them in Subversion Repository Hook Reference. As a repository administrator, you'll need to decide which hooks you wish to implement (by way of providing an appropriately named and permissioned hook program), and how. When you make this decision, keep in mind the big picture of how your repository is deployed. For example, if you are using server configuration to determine which users are permitted to commit changes to your repository, you don't need to do this sort of access control via the hook system.

Hook script environment configuration

By default, Subversion executes hook scripts with an empty environment—that is, no environment variables are set at all, not even $PATH (or %PATH%, under Windows). Because of this, many administrators are baffled when their hook program runs fine by hand, but doesn't work when invoked by Subversion. Administrators have historically worked around this problem by manually setting all the environment variables their hook scripts need in the scripts themselves.
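The historical workaround amounts to never trusting the inherited environment. A minimal Python sketch of that approach; HOOK_ENV, build_hook_env, and run_tool are hypothetical names, and the values simply mirror Example 5.1.

```python
import os
import subprocess

# Hypothetical values; adjust them to wherever your tools actually live.
HOOK_ENV = {
    "PATH": "/usr/local/svn/tools:/usr/bin",
    "LANG": "en_US.UTF-8",
}


def build_hook_env(inherited):
    """Start from whatever environment was inherited, then force our settings."""
    env = dict(inherited)  # typically empty when Subversion runs the hook
    env.update(HOOK_ENV)
    return env


def run_tool(argv):
    """Run a helper program; on POSIX, its name is looked up using env's PATH."""
    return subprocess.run(argv, env=build_hook_env(os.environ),
                          capture_output=True, text=True, check=False)
```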

Subversion 1.8 introduces a new way to manage the environment of Subversion-executed hook scripts: the hook script environment configuration file. If a Subversion server finds a file named hooks-env in the repository's conf/ subdirectory, it parses that file as an INI-formatted configuration file and applies the option names and values found therein to the hook script's execution environment as environment variables.

The syntax of the hooks-env file is pretty straightforward: each section name is the name of a hook script (such as pre-commit or post-revprop-change), and the configuration items inside that section are treated as mappings of environment variable names to desired values. Additionally, there is a special [default] section, which can be used to configure environment variable mappings that should be applied to all hook scripts (unless explicitly overridden by per-hook-script settings). See Example 5.1, “hooks-env (custom hook script environment configuration)” for a sample hooks-env configuration file.

Example 5.1. hooks-env (custom hook script environment configuration)

# All scripts should use a UTF-8 locale and have our hook script
# utilities directory on the search path.

[default]
LANG = en_US.UTF-8
PATH = /usr/local/svn/tools:/usr/bin


# The post-commit and post-revprop-change scripts want to run
# programs from our custom synctools replication software suite, too.

[post-commit]
PATH = /usr/local/synctools-1.1/bin:%(PATH)s

[post-revprop-change]
PATH = /usr/local/synctools-1.1/bin:%(PATH)s

[Note] Note

Example 5.1, “hooks-env (custom hook script environment configuration)” also demonstrates the nifty string substitution syntax found in Subversion's configuration file parser. In this example, the value of the PATH option—pulled from the [default] section of the file—is substituted in place of the %(PATH)s placeholder text in the per-hook sections. For more about this special syntax, see the README.txt file which lives in the Subversion runtime configuration directory. (And for more information about that directory, see the section called “Runtime Configuration Area”.)
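To check which environment a given hook would end up with, the lookup rules described above can be mimicked in a few lines of Python. This is a minimal sketch of the documented behavior only (per-hook values override [default], and %(NAME)s is filled in from [default]); it is not Subversion's own parser, which may differ in edge cases such as case handling or nested substitutions. hook_environment is a hypothetical name.

```python
import configparser
import re

# Matches the %(NAME)s placeholder syntax shown in Example 5.1.
_PLACEHOLDER = re.compile(r"%\((\w+)\)s")


def hook_environment(hooks_env_text, hook_name):
    """Return the variables a given hook would receive under hooks-env (sketch)."""
    parser = configparser.ConfigParser(
        interpolation=None,            # placeholders are expanded by hand below
        default_section="__unused__",  # treat [default] as an ordinary section
    )
    parser.optionxform = str           # keep variable names case-sensitive
    parser.read_string(hooks_env_text)

    defaults = dict(parser["default"]) if parser.has_section("default") else {}
    env = dict(defaults)
    if parser.has_section(hook_name):
        for name, value in parser[hook_name].items():
            # Per-hook settings override [default]; %(NAME)s pulls from [default].
            env[name] = _PLACEHOLDER.sub(
                lambda m: defaults.get(m.group(1), m.group(0)), value)
    return env
```

Fed the text of Example 5.1, this yields a post-commit PATH of /usr/local/synctools-1.1/bin:/usr/local/svn/tools:/usr/bin, while a hook with no section of its own (pre-commit, say) gets the [default] values unchanged.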

Of course, keeping exact duplicates of your custom hook script environment configuration file in every single repository's conf/ directory could get cumbersome, especially when you need to make changes to them all. So Subversion's servers allow you to specify an alternate (possibly shared) location for this configuration information: svnserve accepts a --hooks-env option, and mod_dav_svn provides an SVNHooksEnv directive.

Common uses for hook scripts

Repository hook scripts can offer a wide range of utility, but most tend to fall into a few basic categories: notification, validation, and replication.

Notification scripts are those which tell someone that something happened. The most common of these found in a Subversion service offering involve programs which send commit and revision property change notification emails to project members, driven by the post-commit and post-revprop-change hooks, respectively. There are numerous other notification approaches, from issue tracker integration scripts to scripts which operate as IRC bots to announce that something's changed in the repository.
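To show the moving parts of such a notifier, here is a minimal post-commit sketch in Python. It gathers the author and log message with svnlook and mails them via a local SMTP relay; the addresses, the relay host, and the helper names are all hypothetical. For production use, the ready-made scripts in the Subversion distribution's tools/hook-scripts/ directory are likely a better starting point.

```python
import os
import smtplib
import subprocess
import sys
from email.message import EmailMessage

SENDER = "svn@example.com"             # hypothetical addresses
RECIPIENTS = ["dev-list@example.com"]
SMTP_HOST = "localhost"                # hypothetical mail relay


def svnlook(subcommand, repos_path, revision):
    """Return the output of e.g. `svnlook author -r REV REPOS`."""
    result = subprocess.run(
        ["svnlook", subcommand, "-r", str(revision), repos_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def build_commit_email(repos_name, revision, author, log_message):
    """Compose a plain-text notification for one committed revision."""
    msg = EmailMessage()
    msg["Subject"] = f"[{repos_name}] r{revision} committed by {author}"
    msg["From"] = SENDER
    msg["To"] = ", ".join(RECIPIENTS)
    msg.set_content(log_message)
    return msg


def main(argv):
    # post-commit receives REPOS-PATH and REVISION as its first two arguments.
    repos_path, revision = argv[1], argv[2]
    msg = build_commit_email(
        os.path.basename(os.path.normpath(repos_path)),
        revision,
        svnlook("author", repos_path, revision),
        svnlook("log", repos_path, revision),
    )
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(msg)


# Only act when invoked with hook arguments, so the module can be imported safely.
if __name__ == "__main__" and len(sys.argv) >= 3:
    main(sys.argv)
```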

On the validation side of things, the start-commit and pre-commit hooks are widely used to allow or disallow commits based on various criteria: the author of the commit, the formatting and/or content of the log message which describes the commit, and even the low-level details of the changes made to files and directories in the commit. Likewise, the pre-revprop-change hook acts as the gateway to revision property changes, which is an especially valuable role considering the fact that revision properties are not themselves versioned, and can therefore only be modified destructively.
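As a concrete instance of log-message validation, the following pre-commit sketch reads the pending log message with svnlook log -t and rejects the commit if it is empty or too short. The length threshold is a hypothetical site policy. The hook signals rejection with a nonzero exit status, and whatever it writes to stderr is passed back to the committing user.

```python
import subprocess
import sys

MIN_LOG_LENGTH = 10  # hypothetical site policy


def log_message_problem(message):
    """Return a description of what is wrong with a log message, or None."""
    text = message.strip()
    if not text:
        return "Empty log messages are not allowed."
    if len(text) < MIN_LOG_LENGTH:
        return f"Log messages must be at least {MIN_LOG_LENGTH} characters long."
    return None


def main(argv):
    # pre-commit receives REPOS-PATH and TXN-NAME.
    repos_path, txn_name = argv[1], argv[2]
    message = subprocess.run(
        ["svnlook", "log", "-t", txn_name, repos_path],
        capture_output=True, text=True, check=True,
    ).stdout
    problem = log_message_problem(message)
    if problem:
        sys.stderr.write(problem + "\n")  # relayed to the committing user
        return 1                          # nonzero exit rejects the commit
    return 0


# Only act when invoked with hook arguments, so the module can be imported safely.
if __name__ == "__main__" and len(sys.argv) >= 3:
    sys.exit(main(sys.argv))
```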

One special class of change validation that has seen widespread use since Subversion 1.5 was released is validation of the committing client software itself. When Subversion's merge tracking feature (described extensively in Chapter 4, Branching and Merging) was introduced in that release, Subversion administrators needed a way to ensure that, once users of their repositories started using the new feature, all of their merges were tracked. To reduce the chance of someone committing an untracked merge to the repository, they used start-commit hooks to examine the feature capabilities string advertised by Subversion clients. If the committing client didn't advertise support for merge tracking, the commit was denied with instructions to the user to immediately update their Subversion client! Example 5.2, “start-commit hook to require merge tracking support” provides an example of a start-commit script which does precisely this.

Example 5.2. start-commit hook to require merge tracking support

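#!/usr/bin/env python
import sys

# start-commit is invoked with REPOS-PATH, USER, and CAPABILITIES
# (Subversion 1.8 and later also pass TXN-NAME).  CAPABILITIES,
# in sys.argv[3], is a colon-delimited list of client capabilities.
capabilities = sys.argv[3].split(':')

if 'mergeinfo' not in capabilities:
    # Text written to stderr is relayed to the committing user.
    sys.stderr.write("""\
ERROR: Commits to this repository must be made using Subversion
clients which support the merge tracking feature.  Please upgrade
your client to at least Subversion 1.5.0.
""")
    # A nonzero exit status tells Subversion to refuse the commit.
    sys.exit(1)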

Beginning in Subversion 1.8, clients committing against a Subversion 1.8 server still provide the feature capabilities string, but they also provide additional information about themselves by way of ephemeral transaction properties. Ephemeral transaction properties are essentially revision properties that the client sets on the commit transaction at the earliest opportunity while committing, but that the server automatically removes immediately before the transaction becomes a finalized revision. You can inspect these properties with the same tools you would use to inspect any other unversioned property set on a commit transaction, during the window in which the start-commit and pre-commit hook scripts run.

The following are the ephemeral transaction properties which Subversion currently provides and implements:

svn:txn-client-compat-version

Carries the Subversion library version string with which the committing client claims compatibility. This is useful for deciding whether the client supports the minimal feature set required for proper handling of the repository data.

svn:txn-user-agent

Carries the user agent string which describes the committing client program. Subversion's libraries define the initial portion of this string, but third-party consumers of the API (GUI clients, etc.) can append custom information to it.
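A version gate built on svn:txn-client-compat-version might look like the following sketch. It assumes the property holds a dotted version string such as 1.8.0, reads it with svnlook propget --revprop -t, and uses a hypothetical minimum version as site policy. In 1.8 and later, start-commit also receives the transaction name (after the capabilities argument), and pre-commit receives it as its second argument, so the same helper can serve both hooks.

```python
import re
import subprocess

MIN_COMPAT_VERSION = (1, 8, 0)  # hypothetical site policy


def parse_version(version_string):
    """Turn '1.8.0' into (1, 8, 0); missing components count as 0."""
    numbers = [int(part) for part in re.findall(r"\d+", version_string)[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))


def txn_revprop(repos_path, txn_name, propname):
    """Read a property from a commit transaction; None if it is not set (yet)."""
    result = subprocess.run(
        ["svnlook", "propget", "--revprop", "-t", txn_name, repos_path, propname],
        capture_output=True, text=True,
    )
    return result.stdout.strip() if result.returncode == 0 else None


def client_too_old(compat_version):
    """An absent property (None) is not a failure, so pre-commit can re-check."""
    if compat_version is None:
        return False
    return parse_version(compat_version) < MIN_COMPAT_VERSION
```

Comparing tuples of integers rather than raw strings matters here: as strings, "1.10.0" would sort before "1.8.0".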

[Note] Note

While most clients will transmit ephemeral transaction properties early enough in the commit process that the start-commit hook script can inspect them, some configurations of Subversion will not set those properties on the transaction until later in the commit process. Administrators should therefore consider performing any validation based on ephemeral transaction properties in both the start-commit and pre-commit hooks: the former to rule out invalid clients before those clients transmit the commit payload, the latter in case the start-commit hook could not perform the checks.

As noted before, ephemeral transaction properties are removed from the transaction just before it is promoted to a new revision. Some administrators may wish to preserve the information in those properties indefinitely. We suggest that you do so by using the pre-commit hook script to copy the values of those properties to new property names. In fact, the Subversion source code distribution provides a persist-ephemeral-txnprops.py script (in the tools/hook-scripts/ subdirectory) for doing precisely that.
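The renaming step at the heart of that suggestion is small. The sketch below only decides which new property names to create; the local: prefix is a hypothetical namespace, and actually writing the copies onto the transaction is not shown, since that is the job persist-ephemeral-txnprops.py takes care of.

```python
EPHEMERAL_PREFIX = "svn:txn-"
PERSIST_PREFIX = "local:"  # hypothetical namespace for the preserved copies


def persisted_copies(txn_props):
    """Map each ephemeral svn:txn-* property to a durable name with the same value."""
    return {
        PERSIST_PREFIX + name[len(EPHEMERAL_PREFIX):]: value
        for name, value in txn_props.items()
        if name.startswith(EPHEMERAL_PREFIX)
    }
```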

The third common type of hook script usage is for the purpose of replication. Whether you are driving a simple backup process or a more involved remote repository mirroring scenario, hook scripts can be critical. See the section called “Repository Backup” and the section called “Repository Replication” for more information about these aspects of repository maintenance.
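For mirroring, a common pattern is a post-commit hook that pushes new revisions to a mirror with svnsync synchronize. A minimal sketch, assuming the mirror at the hypothetical MIRROR_URL has already been prepared with svnsync initialize and that credentials are handled elsewhere. The child's output is discarded so that the hook itself can exit promptly rather than holding the commit open while replication runs.

```python
import subprocess

MIRROR_URL = "svn://mirror.example.com/repos"  # hypothetical, already initialized


def sync_command(mirror_url):
    """Argument vector that replays new revisions onto the mirror."""
    return ["svnsync", "synchronize", mirror_url]


def trigger_sync(mirror_url=MIRROR_URL):
    """Start svnsync in the background from post-commit and return immediately."""
    return subprocess.Popen(
        sync_command(mirror_url),
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # detach from the hook's session (POSIX)
    )
```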

Finding hook scripts or rolling your own

As you might imagine, there is no shortage of Subversion hook programs and scripts that are freely available either from the Subversion community itself or elsewhere. In fact, the Subversion distribution provides several commonly used hook scripts in its tools/hook-scripts/ subdirectory. However, if you are unable to find one that meets your specific needs, you might consider writing your own. See Chapter 8, Embedding Subversion for information about developing software using Subversion's public APIs.

[Warning] Warning

Hook scripts can do almost anything, but hook script authors should show restraint. It might be tempting to, say, use hook scripts to automatically correct errors, shortcomings, or policy violations present in the files being committed. Unfortunately, doing so can cause problems. Subversion keeps client-side caches of certain bits of repository data, and if you change a commit transaction in this way, those caches become undetectably stale, leading to surprising and unexpected behavior. While it is generally okay to add new commit transaction properties via a hook script, essentially everything else about a commit transaction should be considered read-only. Instead of modifying a transaction to polish its payload, simply validate the transaction in the pre-commit hook and reject the commit if it does not meet the desired requirements. As a bonus, your users will learn the value of careful, compliance-minded work habits.
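In that validate-and-reject spirit, the sketch below inspects the paths a transaction adds or modifies (via svnlook changed -t) and refuses the commit if any match a forbidden pattern, instead of deleting the offending files. The forbidden suffixes are a hypothetical policy, and the parser assumes the usual svnlook changed layout: an action letter in the first column and the path starting at the fifth character, with directories ending in a slash.

```python
import subprocess
import sys

FORBIDDEN_SUFFIXES = (".class", ".o")  # hypothetical policy: no build products


def changed_files(svnlook_changed_output):
    """Yield files added (A) or updated (U), skipping directories and deletions."""
    for line in svnlook_changed_output.splitlines():
        if len(line) < 5:
            continue
        action, path = line[0], line[4:]
        if action in ("A", "U") and not path.endswith("/"):
            yield path


def policy_violations(paths):
    """Paths the policy forbids; the hook reports them rather than fixing them."""
    return [path for path in paths if path.endswith(FORBIDDEN_SUFFIXES)]


def main(argv):
    # pre-commit receives REPOS-PATH and TXN-NAME.
    repos_path, txn_name = argv[1], argv[2]
    output = subprocess.run(
        ["svnlook", "changed", "-t", txn_name, repos_path],
        capture_output=True, text=True, check=True,
    ).stdout
    bad = policy_violations(changed_files(output))
    if bad:
        sys.stderr.write("Commit rejected; please remove these build products:\n")
        for path in bad:
            sys.stderr.write(f"  {path}\n")
        return 1  # reject; the transaction itself is left untouched
    return 0


# Only act when invoked with hook arguments, so the module can be imported safely.
if __name__ == "__main__" and len(sys.argv) >= 3:
    sys.exit(main(sys.argv))
```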

FSFS Configuration

As of Subversion 1.6, FSFS filesystems have several configurable parameters which an administrator can use to fine-tune the performance or disk usage of their repositories. You can find these options—and the documentation for them—in the db/fsfs.conf file in the repository.
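Because fsfs.conf is an INI-style file, an audit script can read it with an ordinary INI parser. A minimal sketch; the [rep-sharing] section and its enable-rep-sharing option are ones we believe fsfs.conf documents, and treating an unset option as enabled is an assumption, so verify both against the comments in your own file.

```python
import configparser


def rep_sharing_enabled(fsfs_conf_text):
    """Report the [rep-sharing] enable-rep-sharing setting from fsfs.conf text.

    Assumption: an unset option means the feature is on.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(fsfs_conf_text)  # lines starting with '#' are comments
    return parser.getboolean("rep-sharing", "enable-rep-sharing", fallback=True)
```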