This documentation was written to describe the 1.6.x series of Subversion. If you are running a different version of Subversion, you are strongly encouraged to visit http://www.svnbook.com/ and instead consult the version of this documentation appropriate for your version of Subversion.

httpd, the Apache HTTP Server

The Apache HTTP Server is a heavy-duty network server that Subversion can leverage. Through a custom module, httpd makes Subversion repositories available to clients via the WebDAV/DeltaV[47] protocol, which is an extension to HTTP 1.1. This protocol takes HTTP, the ubiquitous protocol at the core of the World Wide Web, and adds writing capabilities (specifically, versioned writing). The result is a standardized, robust system that is conveniently packaged as part of the Apache 2.0 software, is supported by numerous operating systems and third-party products, and doesn't require network administrators to open up yet another custom port.[48] While an Apache-Subversion server has more features than svnserve, it's also a bit more difficult to set up. With flexibility often comes more complexity.

Much of the following discussion includes references to Apache configuration directives. While some examples are given of the use of these directives, describing them in full is outside the scope of this chapter. The Apache team maintains excellent documentation, publicly available on their web site at http://httpd.apache.org. For example, a general reference for the configuration directives is located at http://httpd.apache.org/docs-2.0/mod/directives.html.

Also, as you make changes to your Apache setup, you will likely make a mistake somewhere along the way. If you are not already familiar with Apache's logging subsystem, you should become aware of it. Your httpd.conf file contains directives that specify the on-disk locations of the access and error logs generated by Apache (the CustomLog and ErrorLog directives, respectively). Subversion's mod_dav_svn uses Apache's error logging interface as well. You can always browse the contents of those files for information that might reveal the source of a problem that is not otherwise apparent.
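As a rough illustration of locating those logs, the following Python sketch scans configuration text for ErrorLog and CustomLog lines. The helper name find_log_paths and the sample excerpt are assumptions made for this example. A real httpd.conf may spread these directives across included files and virtual hosts, and may quote paths that contain spaces, all of which this sketch ignores.

```python
def find_log_paths(conf_text):
    """Return the log file named by each ErrorLog/CustomLog line found."""
    canonical = {"errorlog": "ErrorLog", "customlog": "CustomLog"}
    found = {}
    for raw in conf_text.splitlines():
        parts = raw.split()
        # Skip blank lines, comments, and unrelated directives.
        if len(parts) < 2 or parts[0].lower() not in canonical:
            continue
        # Directive names are case-insensitive; strip optional quotes.
        found[canonical[parts[0].lower()]] = parts[1].strip('"')
    return found


# Hypothetical excerpt; relative paths are resolved against the server root.
SAMPLE_CONF = """
# Logging
ErrorLog logs/error_log
CustomLog logs/access_log common
"""

if __name__ == "__main__":
    for directive, path in find_log_paths(SAMPLE_CONF).items():
        print(directive, "->", path)
```

Once you know where the error log lives, that file is the first place to look when a Subversion request fails unexpectedly.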

Prerequisites

To network your repository over HTTP, you basically need four components, available in two packages. You'll need Apache httpd 2.0 or newer, the mod_dav DAV module that comes with it, Subversion, and the mod_dav_svn filesystem provider module distributed with Subversion. Once you have all of those components, the process of networking your repository is as simple as:

  • Getting httpd up and running with the mod_dav module

  • Installing the mod_dav_svn backend to mod_dav, which uses Subversion's libraries to access the repository

  • Configuring your httpd.conf file to export (or expose) the repository

You can accomplish the first two items either by compiling httpd and Subversion from source code or by installing prebuilt binary packages of them on your system. For the most up-to-date information on how to compile Subversion for use with the Apache HTTP Server, as well as how to compile and configure Apache itself for this purpose, see the INSTALL file in the top level of the Subversion source code tree.

Basic Apache Configuration

Once you have all the necessary components installed on your system, all that remains is the configuration of Apache via its httpd.conf file. Instruct Apache to load the mod_dav_svn module using the LoadModule directive. This directive must precede any other Subversion-related configuration items. If your Apache was installed using the default layout, your mod_dav_svn module should have been installed in the modules subdirectory of the Apache install location (often /usr/local/apache2). The LoadModule directive has a simple syntax, mapping a named module to the location of a shared library on disk:

LoadModule dav_svn_module     modules/mod_dav_svn.so

Apache interprets the LoadModule configuration item's library path as relative to its own server root. If configured as previously shown, Apache will look for the Subversion DAV module shared library in its own modules/ subdirectory. Depending on how Subversion was installed on your system, you might need to specify a different path for this library altogether, perhaps even an absolute path such as in the following example:

LoadModule dav_svn_module     C:/Subversion/lib/mod_dav_svn.so

Note that if mod_dav was compiled as a shared object (instead of statically linked directly to the httpd binary), you'll need a similar LoadModule statement for it, too. Be sure that it comes before the mod_dav_svn line:

LoadModule dav_module         modules/mod_dav.so
LoadModule dav_svn_module     modules/mod_dav_svn.so
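The ordering requirement can be checked mechanically. The following Python sketch is only an illustration: the function names are hypothetical, and it looks at a single block of configuration text without following Include directives.

```python
def module_load_order(conf_text):
    """Return module identifiers in the order their LoadModule lines appear."""
    order = []
    for raw in conf_text.splitlines():
        parts = raw.split()
        # A commented-out line starts with "#", so parts[0] will not match.
        if len(parts) >= 3 and parts[0].lower() == "loadmodule":
            order.append(parts[1])
    return order


def dav_loaded_before_dav_svn(conf_text):
    """True if dav_svn_module is loaded and dav_module does not follow it."""
    order = module_load_order(conf_text)
    if "dav_svn_module" not in order:
        return False
    if "dav_module" not in order:
        # mod_dav may be statically linked into httpd, in which case
        # no LoadModule line is needed for it.
        return True
    return order.index("dav_module") < order.index("dav_svn_module")
```

A call such as dav_loaded_before_dav_svn(open("httpd.conf").read()) would report whether the two lines appear in a workable order, assuming both live in that one file.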

At a later location in your configuration file, you now need to tell Apache where you keep your Subversion repository (or repositories). The Location directive has an XML-like notation, starting with an opening tag and ending with a closing tag, with various other configuration directives in the middle. The purpose of the Location directive is to instruct Apache to do something special when handling requests that are directed at a given URL or one of its children. In the case of Subversion, you want Apache to simply hand off support for URLs that point at versioned resources to the DAV layer. You can instruct Apache to delegate the handling of all URLs whose path portions (the part of the URL that follows the server's name and the optional port number) begin with /repos/ to a DAV provider whose repository is located at /var/svn/repository using the following httpd.conf syntax:

<Location /repos>
  DAV svn
  SVNPath /var/svn/repository
</Location>

If you plan to support multiple Subversion repositories that will reside in the same parent directory on your local disk, you can use an alternative directive—SVNParentPath—to indicate that common parent directory. For example, if you know you will be creating multiple Subversion repositories in a directory /var/svn that would be accessed via URLs such as http://my.server.com/svn/repos1, http://my.server.com/svn/repos2, and so on, you could use the httpd.conf configuration syntax in the following example:

<Location /svn>
  DAV svn

  # Automatically map any "/svn/foo" URL to repository /var/svn/foo
  SVNParentPath /var/svn
</Location>

Using this syntax, Apache will delegate the handling of all URLs whose path portions begin with /svn/ to the Subversion DAV provider, which will then assume that any items in the directory specified by the SVNParentPath directive are actually Subversion repositories. This is a particularly convenient syntax in that, unlike the use of the SVNPath directive, you don't have to restart Apache to add or remove hosted repositories.
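To make the mapping concrete, here is a small Python model of how a URL path under the /svn location corresponds to a repository directory under /var/svn. It is a conceptual sketch only, not mod_dav_svn's actual implementation; the function name resolve_repository is hypothetical, and the sketch does no validation of the repository name.

```python
import posixpath


def resolve_repository(url_path, location="/svn", parent_path="/var/svn"):
    """Split a URL path into (repository directory, path inside repository).

    Returns None when the URL path lies outside the location or does not
    name a repository.
    """
    prefix = location.rstrip("/") + "/"
    if not url_path.startswith(prefix):
        return None
    rest = url_path[len(prefix):]
    # The first path segment after the location names the repository.
    repo_name, _, inner = rest.partition("/")
    if not repo_name:
        return None
    return posixpath.join(parent_path, repo_name), "/" + inner
```

Under these assumptions, the URL path /svn/repos1/trunk/README.txt resolves to the repository directory /var/svn/repos1 and the in-repository path /trunk/README.txt. Because the lookup happens per request, a repository created later in /var/svn becomes reachable without any configuration change.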

Be sure that when you define your new Location, it doesn't overlap with other exported locations. For example, if your main DocumentRoot is exported to /www, do not export a Subversion repository in <Location /www/repos>. If a request comes in for the URI /www/repos/foo.c, Apache won't know whether to look for a file repos/foo.c in the DocumentRoot, or whether to delegate mod_dav_svn to return foo.c from the Subversion repository. The result is often an error from the server of the form 301 Moved Permanently.

At this stage, you should strongly consider the question of permissions. If you've been running Apache for some time now as your regular web server, you probably already have a collection of content—web pages, scripts, and such. These items have already been configured with a set of permissions that allows them to work with Apache, or more appropriately, that allows Apache to work with those files. Apache, when used as a Subversion server, will also need the correct permissions to read and write to your Subversion repository.

You will need to determine a permission system setup that satisfies Subversion's requirements without messing up any previously existing web page or script installations. This might mean changing the permissions on your Subversion repository to match those in use by other things that Apache serves for you, or it could mean using the User and Group directives in httpd.conf to specify that Apache should run as the user and group that owns your Subversion repository. There is no single correct way to set up your permissions, and each administrator will have different reasons for doing things a certain way. Just be aware that permission-related problems are perhaps the most common oversight when configuring a Subversion repository for use with Apache.

Authentication Options

At this point, if you configured httpd.conf to contain something such as the following:

<Location /svn>
  DAV svn
  SVNParentPath /var/svn
</Location>

your repository is anonymously accessible to the world. Until you configure some authentication and authorization policies, the Subversion repositories that you make available via the Location directive will be generally accessible to everyone. In other words:

  • Anyone can use a Subversion client to check out a working copy of a repository URL (or any of its subdirectories).

  • Anyone can interactively browse the repository's latest revision simply by pointing a web browser to the repository URL.

  • Anyone can commit to the repository.

Of course, you might have already set up a pre-commit hook script to prevent commits (see the section called “Implementing Repository Hooks”). But as you read on, you'll see that it's also possible to use Apache's built-in methods to restrict access in specific ways.

Tip

Requiring authentication defends against invalid users directly accessing the repository, but does not guard the privacy of valid users' network activity. See the section called “Protecting network traffic with SSL” for how to configure your server to support SSL encryption, which can provide that extra layer of protection.

Basic authentication

The easiest way to authenticate a client is via the HTTP Basic authentication mechanism, which simply uses a username and password to verify a user's identity. Apache provides the htpasswd utility[49] for managing files containing usernames and passwords.

Warning

Basic authentication is extremely insecure, because it sends passwords over the network in very nearly plain text. See the section called “Digest authentication” for details on using the much safer Digest mechanism.
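To see why Basic credentials count as "very nearly plain text", consider this Python sketch. It builds the value of an Authorization header the way the Basic scheme defines it (base64 of username, a colon, and the password) and then recovers the password from it. The username and password are made-up sample values, and the function names are hypothetical.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value used by HTTP Basic."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth_header(value):
    """Recover (username, password) from a Basic header value."""
    scheme, _, token = value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


if __name__ == "__main__":
    header = basic_auth_header("harry", "secret")
    print(header)
    print(decode_basic_auth_header(header))
```

Base64 is an encoding, not encryption: anyone who can observe the request can reverse it without knowing any key.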

First, create a password file and grant access to users Harry and Sally:

$ ### First time: use -c to create the file
$ ### Use -m to use MD5 encryption of the password, which is more secure
$ htpasswd -c -m /etc/svn-auth.htpasswd harry
New password: *****
Re-type new password: *****
Adding password for user harry
$ htpasswd -m /etc/svn-auth.htpasswd sally
New password: *******
Re-type new password: *******
Adding password for user sally
$

Next, ensure that Apache has access to the modules which provide the Basic authentication and related functionality: mod_auth_basic, mod_authn_file, and mod_authz_user. In many cases, these modules are compiled into httpd itself, but if not, you might need to explicitly load one or more of them using the LoadModule directive:

LoadModule auth_basic_module   modules/mod_auth_basic.so
LoadModule authn_file_module   modules/mod_authn_file.so
LoadModule authz_user_module   modules/mod_authz_user.so

After ensuring that Apache has access to the required functionality, you'll need to add some more directives inside the <Location> block to tell Apache what type of authentication you wish to use, and just how to do so:

<Location /svn>
  DAV svn
  SVNParentPath /var/svn

  # Authentication: Basic
  AuthName "Subversion repository"
  AuthType Basic
  AuthBasicProvider file
  AuthUserFile /etc/svn-auth.htpasswd
</Location>

These directives work as follows:

  • AuthName is an arbitrary name that you choose for the authentication domain. Most browsers display this name in the dialog box when prompting for username and password.

  • AuthType specifies the type of authentication to use.

  • AuthBasicProvider specifies the Basic authentication provider to use for the location. In our example, we wish to consult a local password file.

  • AuthUserFile specifies the location of the password file to use.

However, this <Location> block doesn't yet do anything useful. It merely tells Apache that if authorization were required, it should challenge the Subversion client for a username and password. (When authorization is required, Apache requires authentication as well.) What's missing here, however, are directives that tell Apache which sorts of client requests require authorization; currently, none do. The simplest thing is to specify that all requests require authorization by adding Require valid-user to the block:

<Location /svn>
  DAV svn
  SVNParentPath /var/svn

  # Authentication: Basic
  AuthName "Subversion repository"
  AuthType Basic
  AuthBasicProvider file
  AuthUserFile /etc/svn-auth.htpasswd

  # Authorization: Authenticated users only
  Require valid-user
</Location>

Refer to the section called “Authorization Options” for more detail on the Require directive and other ways to set authorization policies.

Note

The default value of the AuthBasicProvider option is file, so we won't bother including it in future examples. Just know that if in a broader context you've set this value to something else, you'll need to explicitly reset it to file within your Subversion <Location> block in order to get that behavior.

Digest authentication

Digest authentication is an improvement on Basic authentication which allows the server to verify a client's identity without sending the password over the network unprotected. Both client and server create a non-reversible MD5 hash of the username, password, requested URI, and a nonce (number used once) provided by the server and changed each time authentication is required. The client sends its hash to the server, and the server then verifies that the hashes match.
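The exchange described above can be sketched in Python. This follows the basic form of the calculation in RFC 2617, without the optional qop extension (which adds a request counter and a client-chosen nonce, and which real clients and servers commonly negotiate). The function names and all sample values are assumptions made for illustration.

```python
import hashlib


def md5_hex(text):
    """Hex-encoded MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute a Digest response (RFC 2617 form without qop)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # identity and secret
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being requested
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # bound to the server nonce


if __name__ == "__main__":
    # Hypothetical values; the server chooses the nonce.
    args = ("harry", "Subversion repository", "secret",
            "GET", "/svn/repos1/", "example-nonce")
    client_side = digest_response(*args)
    server_side = digest_response(*args)
    print("match:", client_side == server_side)
```

Only the final hash crosses the network. Because the nonce changes from one challenge to the next, a captured response is of little use for answering a later challenge.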

Configuring Apache to use Digest authentication is straightforward. You'll need to ensure that the mod_auth_digest module is available (instead of mod_auth_basic), and then make a few small variations on our prior example:

<Location /svn>
  DAV svn
  SVNParentPath /var/svn

  # Authentication: Digest
  AuthName "Subversion repository"
  AuthType Digest
  AuthDigestProvider file
  AuthUserFile /etc/svn-auth.htdigest

  # Authorization: Authenticated users only
  Require valid-user
</Location>

Notice that AuthType is now set to Digest, and we specify a different path for AuthUserFile. Digest authentication uses a different file format than Basic authentication, created and managed using Apache's htdigest utility[50] rather than htpasswd. Digest authentication also has the additional concept of a realm, which must match the value of the AuthName directive.

Note

For Digest authentication, the authentication provider is selected using the AuthDigestProvider directive, as shown in the previous example. As was the case with the AuthBasicProvider directive, file is the default value of AuthDigestProvider, so this line is not strictly required unless you need to override a different value inherited from a broader configuration context.

The password file can be created as follows:

$ ### First time: use -c to create the file
$ htdigest -c /etc/svn-auth.htdigest "Subversion repository" harry
Adding password for harry in realm Subversion repository.
New password: *****
Re-type new password: *****
$ htdigest /etc/svn-auth.htdigest "Subversion repository" sally
Adding user sally in realm Subversion repository
New password: *******
Re-type new password: *******
$
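Each line that htdigest writes has the form user:realm:hash, where the hash is the MD5 digest of the string user:realm:password. The Python sketch below builds such a line and lists the users stored for a given realm, which is a quick way to confirm that the realm in the file matches your AuthName value. The function names and the passwords are hypothetical, and the sketch assumes usernames and realms contain no colons.

```python
import hashlib


def htdigest_line(username, realm, password):
    """Build one password-file line in the user:realm:hash layout."""
    secret = f"{username}:{realm}:{password}".encode("utf-8")
    return f"{username}:{realm}:{hashlib.md5(secret).hexdigest()}"


def users_for_realm(file_text, auth_name):
    """List users whose entries belong to the realm named by AuthName."""
    users = []
    for line in file_text.splitlines():
        fields = line.strip().split(":")
        if len(fields) == 3 and fields[1] == auth_name:
            users.append(fields[0])
    return users


if __name__ == "__main__":
    realm = "Subversion repository"
    text = "\n".join([
        htdigest_line("harry", realm, "harryssecret"),  # sample password
        htdigest_line("sally", realm, "sallyssecret"),  # sample password
    ])
    print(users_for_realm(text, realm))
```

If users_for_realm returns an empty list for the realm you configured, the file was probably created with a different realm string, and authentication will fail even with correct passwords.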

Authorization Options

At this point, you've configured authentication, but not authorization. Apache is able to challenge clients and confirm identities, but it has not been told how to allow or restrict access to the clients bearing those identities. This section describes two strategies for controlling access to your repositories.

Blanket access control

The simplest form of access control is to authorize certain users for either read-only access to a repository or read/write access to a repository.

You can restrict access on all repository operations by adding Require valid-user directly inside the <Location> block. The example from the section called “Digest authentication” allows only clients that successfully authenticate to do anything with the Subversion repository:

<Location /svn>
  DAV svn
  SVNParentPath /var/svn

  # Authentication: Digest
  AuthName "Subversion repository"
  AuthType Digest
  AuthUserFile /etc/svn-auth.htdigest

  # Authorization: Authenticated users only
  Require valid-user
</Location>

Sometimes you don't need to run such a tight ship. For example, Subversion's own source code repository at http://svn.collab.net/repos/svn allows anyone in the world to perform read-only repository tasks (such as checking out working copies and browsing the repository), but restricts write operations to authenticated users. The Limit and LimitExcept directives allow for this type of selective restriction. Like the Location directive, these blocks have starting and ending tags, and you would nest them inside your <Location> block.

The parameters present on the Limit and LimitExcept directives are HTTP request types that are affected by that block. For example, to allow anonymous read-only operations, you would use the LimitExcept directive (passing the GET, PROPFIND, OPTIONS, and REPORT request type parameters) and place the previously mentioned Require valid-user directive inside the <LimitExcept> block instead of just inside the <Location> block.

<Location /svn>
  DAV svn
  SVNParentPath /var/svn

  # Authentication: Digest
  AuthName "Subversion repository"
  AuthType Digest
  AuthUserFile /etc/svn-auth.htdigest

  # Authorization: Authenticated users only for non-read-only
  #                (write) operations; allow anonymous reads
  <LimitExcept GET PROPFIND OPTIONS REPORT>
    Require valid-user
  </LimitExcept>
</Location>
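The effect of this block can be modeled in a few lines of Python. This is a sketch of the decision only, under the assumption that the configuration is exactly the one shown above; the function name is hypothetical. It also assumes that HEAD is treated like GET, as Apache's documentation states for the Limit directive.

```python
# Methods listed on the <LimitExcept> line: these are left unrestricted.
READ_ONLY_METHODS = frozenset({"GET", "PROPFIND", "OPTIONS", "REPORT"})


def requires_valid_user(method, excepted=READ_ONLY_METHODS):
    """True if a request with this HTTP method must be authenticated."""
    if method == "HEAD":
        # Assumption: HEAD is handled together with GET.
        method = "GET"
    # Method names are case-sensitive, so no normalization is applied.
    return method not in excepted
```

With this model, requires_valid_user("PROPFIND") is False, while methods that a commit relies on, such as MKACTIVITY, PUT, and MERGE, return True and so trigger an authentication challenge.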

These are only a few simple examples. For more in-depth information about Apache access control and the Require directive, take a look at the Security section of the Apache documentation's tutorials collection at http://httpd.apache.org/docs-2.0/misc/tutorials.html.

Per-directory access control

It's possible to set up finer-grained permissions using mod_authz_svn. This Apache module grabs the various opaque URLs passing from client to server, asks mod_dav_svn to decode them, and then possibly vetoes requests based on access policies defined in a configuration file.

If you've built Subversion from source code, mod_authz_svn is automatically built and installed alongside mod_dav_svn. Many binary distributions install it automatically as well. To verify that it's installed correctly, make sure it comes right after mod_dav_svn's LoadModule directive in httpd.conf:

LoadModule dav_module         modules/mod_dav.so
LoadModule dav_svn_module     modules/mod_dav_svn.so
LoadModule authz_svn_module   modules/mod_authz_svn.so

To activate this module, you need to configure your <Location> block to use the AuthzSVNAccessFile directive, which specifies a file containing the permissions policy for paths within your repositories. (In a moment, we'll discuss the format of that file.)

Apache is flexible, so you have the option to configure your block in one of three general patterns. To begin, choose one of these basic configuration patterns. (The following examples are very simple; look at Apache's own documentation for much more detail on Apache authentication and authorization options.)

The most open approach is to allow access to everyone. This means Apache never sends authentication challenges, and all users are treated as anonymous. (See Example 6.2, “A sample configuration for anonymous access”.)

Example 6.2. A sample configuration for anonymous access

<Location /repos>
  DAV svn
  SVNParentPath /var/svn

  # Authentication: None

  # Authorization: Path-based access control
  AuthzSVNAccessFile /path/to/access/file
</Location>

On the opposite end of the paranoia scale, you can configure Apache to authenticate all clients. This block unconditionally requires authentication via the Require valid-user directive, and defines a means to authenticate valid users. (See Example 6.3, “A sample configuration for authenticated access”.)

Example 6.3. A sample configuration for authenticated access

<Location /repos>
  DAV svn
  SVNParentPath /var/svn

  # Authentication: Digest
  AuthName "Subversion repository"
  AuthType Digest
  AuthUserFile /etc/svn-auth.htdigest

  # Authorization: Path-based access control; authenticated users only
  AuthzSVNAccessFile /path/to/access/file
  Require valid-user
</Location>

A third very popular pattern is to allow a combination of authenticated and anonymous access. For example, many administrators want to allow anonymous users to read certain repository directories, but restrict access to more sensitive areas to authenticated users. In this setup, all users start out accessing the repository anonymously. If your access control policy demands a real username at any point, Apache will demand authentication from the client. To do this, use both the Satisfy Any and Require valid-user directives. (See Example 6.4, “A sample configuration for mixed authenticated/anonymous access”.)

Example 6.4. A sample configuration for mixed authenticated/anonymous access

<Location /repos>
  DAV svn
  SVNParentPath /var/svn

  # Authentication: Digest
  AuthName "Subversion repository"
  AuthType Digest
  AuthUserFile /etc/svn-auth.htdigest

  # Authorization: Path-based access control; try anonymous access
  #                first, but authenticate if necessary
  AuthzSVNAccessFile /path/to/access/file
  Satisfy Any
  Require valid-user
</Location>

The next step is to create the authorization file containing access rules for particular paths within the repository. We describe how later in this chapter, in the section called “Path-Based Authorization”.

Disabling path-based checks

The mod_dav_svn module goes through a lot of work to make sure that data you've marked unreadable doesn't get accidentally leaked. This means it needs to closely monitor all of the paths and file-contents returned by commands such as svn checkout and svn update. If these commands encounter a path that isn't readable according to some authorization policy, the path is typically omitted altogether. In the case of history or rename tracing—for example, running a command such as svn cat -r OLD foo.c on a file that was renamed long ago—the rename tracking will simply halt if one of the object's former names is determined to be read-restricted.

All of this path checking can sometimes be quite expensive, especially in the case of svn log. When retrieving a list of revisions, the server looks at every changed path in each revision and checks it for readability. If an unreadable path is discovered, it's omitted from the list of the revision's changed paths (normally seen with the --verbose (-v) option), and the whole log message is suppressed. Needless to say, this can be time-consuming on revisions that affect a large number of files. This is the cost of security: even if you haven't configured a module such as mod_authz_svn at all, the mod_dav_svn module is still asking Apache httpd to run authorization checks on every path. The mod_dav_svn module has no idea what authorization modules have been installed, so all it can do is ask Apache to invoke whatever might be present.

On the other hand, there's also an escape hatch of sorts, which allows you to trade security features for speed. If you're not enforcing any sort of per-directory authorization (i.e., not using mod_authz_svn or similar module), you can disable all of this path checking. In your httpd.conf file, use the SVNPathAuthz directive as shown in Example 6.5, “Disabling path checks altogether”.

Example 6.5. Disabling path checks altogether

<Location /repos>
  DAV svn
  SVNParentPath /var/svn

  SVNPathAuthz off
</Location>

The SVNPathAuthz directive is on by default. When set to off, all path-based authorization checking is disabled; mod_dav_svn stops invoking authorization checks on every path it discovers.

Protecting network traffic with SSL

Connecting to a repository via http:// means that all Subversion activity is sent across the network in the clear. This means that actions such as checkouts, commits, and updates could potentially be intercepted by an unauthorized party sniffing network traffic. Encrypting traffic using SSL is a good way to protect potentially sensitive information over the network.

If a Subversion client is compiled to use OpenSSL, it gains the ability to speak to an Apache server via https:// URLs, so all traffic is encrypted with a per-connection session key. The WebDAV library used by the Subversion client is not only able to verify server certificates, but can also supply client certificates when challenged by the server.

Subversion server SSL certificate configuration

It's beyond the scope of this book to describe how to generate client and server SSL certificates and how to configure Apache to use them. Many other references, including Apache's own documentation, describe the process.

Tip

SSL certificates from well-known entities generally cost money, but at a bare minimum, you can configure Apache to use a self-signed certificate generated with a tool such as OpenSSL (http://openssl.org).[51]

Subversion client SSL certificate management

When connecting to Apache via https://, a Subversion client can receive two different types of responses:

  • A server certificate

  • A challenge for a client certificate

Server certificate

When the client receives a server certificate, it needs to verify that the server is who it claims to be. OpenSSL does this by examining the signer of the server certificate, or certificate authority (CA). If OpenSSL is unable to automatically trust the CA, or if some other problem occurs (such as an expired certificate or hostname mismatch), the Subversion command-line client will ask you whether you want to trust the server certificate anyway:

$ svn list https://host.example.com/repos/project

Error validating server certificate for 'https://host.example.com:443':
 - The certificate is not issued by a trusted authority.  Use the
   fingerprint to validate the certificate manually!
Certificate information:
 - Hostname: host.example.com
 - Valid: from Jan 30 19:23:56 2004 GMT until Jan 30 19:23:56 2006 GMT
 - Issuer: CA, example.com, Sometown, California, US
 - Fingerprint: 7d:e1:a9:34:33:39:ba:6a:e9:a5:c4:22:98:7b:76:5c:92:a0:9c:7b

(R)eject, accept (t)emporarily or accept (p)ermanently?

This dialogue is essentially the same question you may have seen coming from your web browser (which is just another HTTP client like Subversion). If you choose the (p)ermanent option, Subversion will cache the server certificate in your private runtime auth/ area, just as your username and password are cached (see the section called “Caching credentials”), and will automatically trust the certificate in the future.
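If you want to check the fingerprint manually before accepting a certificate, you need an independent copy of the certificate from the server's administrator. The Python sketch below formats a fingerprint in the colon-separated style shown in the prompt. It assumes the fingerprint is the SHA-1 digest of the DER-encoded certificate, which is consistent with the 20-byte value displayed above but is not stated by the prompt itself; the function names are hypothetical.

```python
import hashlib
import ssl


def sha1_fingerprint(der_bytes):
    """Colon-separated SHA-1 digest of DER-encoded certificate bytes."""
    digest = hashlib.sha1(der_bytes).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def sha1_fingerprint_from_pem(pem_text):
    """Same as above, starting from a PEM-encoded certificate string."""
    return sha1_fingerprint(ssl.PEM_cert_to_DER_cert(pem_text))
```

Compare the result for the trusted copy with the value Subversion prints; accept the certificate permanently only if they are identical.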

Your runtime servers file also gives you the ability to make your Subversion client automatically trust specific CAs, either globally or on a per-host basis. Simply set the ssl-authority-files variable to a semicolon-separated list of PEM-encoded CA certificates:

[global]
ssl-authority-files = /path/to/CAcert1.pem;/path/to/CAcert2.pem

Many OpenSSL installations also have a predefined set of default CAs that are nearly universally trusted. To make the Subversion client automatically trust these standard authorities, set the ssl-trust-default-ca variable to true.
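As a small illustration of the ssl-authority-files layout, this Python sketch reads the option from servers-file text and splits it on semicolons. It uses Python's configparser, which is only an approximation of Subversion's own configuration parser; the function name is hypothetical.

```python
import configparser


def authority_files(servers_text, section="global"):
    """Return the CA certificate paths listed in ssl-authority-files."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(servers_text)
    raw = parser.get(section, "ssl-authority-files", fallback="")
    # The value is a semicolon-separated list of PEM file paths.
    return [path.strip() for path in raw.split(";") if path.strip()]


SAMPLE_SERVERS = """
[global]
ssl-authority-files = /path/to/CAcert1.pem;/path/to/CAcert2.pem
"""

if __name__ == "__main__":
    print(authority_files(SAMPLE_SERVERS))
```

A check like this can catch a list accidentally separated with commas or spaces, which would be read as a single (nonexistent) path.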

Client certificate challenge

If the client receives a challenge for a certificate, the server is asking the client to prove its identity. The client must send back a certificate signed by a CA that the server trusts, along with a challenge response which proves that the client owns the private key associated with the certificate. The private key and certificate are usually stored in an encrypted format on disk, protected by a passphrase. When Subversion receives this challenge, it will ask you for the path to the encrypted file and the passphrase that protects it:

$ svn list https://host.example.com/repos/project

Authentication realm: https://host.example.com:443
Client certificate filename: /path/to/my/cert.p12
Passphrase for '/path/to/my/cert.p12':  ********

Notice that the client credentials are stored in a .p12 file. To use a client certificate with Subversion, it must be in PKCS#12 format, which is a portable standard. Most web browsers are able to import and export certificates in that format. Another option is to use the OpenSSL command-line tools to convert existing certificates into PKCS#12.

The runtime servers file also allows you to automate this challenge on a per-host basis. If you set the ssl-client-cert-file and ssl-client-cert-password variables, Subversion can automatically respond to a client certificate challenge without prompting you:

[groups]
examplehost = host.example.com

[examplehost]
ssl-client-cert-file = /path/to/my/cert.p12
ssl-client-cert-password = somepassword

More security-conscious folk might want to exclude ssl-client-cert-password to avoid storing the passphrase in the clear on disk.
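The two-step lookup in that file (hostname to group, then group to settings) can be sketched in Python as follows. The sketch assumes that a group's value is a comma-separated list of glob patterns and uses configparser and fnmatch as stand-ins for Subversion's own parsing and matching; the function name is hypothetical.

```python
import configparser
from fnmatch import fnmatchcase


def client_cert_for_host(servers_text, hostname):
    """Return the ssl-client-cert-file configured for hostname, if any."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep group names exactly as written
    parser.read_string(servers_text)
    if not parser.has_section("groups"):
        return None
    for group, patterns in parser.items("groups"):
        # Assumption: the value is a comma-separated list of glob patterns.
        if any(fnmatchcase(hostname, p.strip()) for p in patterns.split(",")):
            return parser.get(group, "ssl-client-cert-file", fallback=None)
    return None


SAMPLE_SERVERS = """
[groups]
examplehost = host.example.com

[examplehost]
ssl-client-cert-file = /path/to/my/cert.p12
"""

if __name__ == "__main__":
    print(client_cert_for_host(SAMPLE_SERVERS, "host.example.com"))
```

The sample deliberately omits ssl-client-cert-password, matching the more cautious setup in which Subversion still prompts for the passphrase.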

Extra Goodies

We've covered most of the authentication and authorization options for Apache and mod_dav_svn. But there are a few other nice features that Apache provides.

Repository browsing

One of the most useful benefits of an Apache/WebDAV configuration for your Subversion repository is that your versioned files and directories are immediately available for viewing via a regular web browser. Since Subversion uses URLs to identify versioned resources, those URLs used for HTTP-based repository access can be typed directly into a web browser. Your browser will issue an HTTP GET request for that URL; based on whether that URL represents a versioned directory or file, mod_dav_svn will respond with a directory listing or with file contents.

URL syntax

If the URLs do not contain any information about which version of the resource you wish to see, mod_dav_svn will answer with the youngest version. This functionality has the wonderful side effect that you can pass around Subversion URLs to your peers as references to documents, and those URLs will always point at the latest manifestation of that document. Of course, you can even use the URLs as hyperlinks from other web sites, too.

As of Subversion 1.6, mod_dav_svn supports a public URI syntax for examining older revisions of both files and directories. The syntax uses the query string portion of the URL to specify either or both of a peg revision and operative revision, which Subversion will then use to determine which version of the file or directory to display to your web browser. Add the query string name/value pair p=PEGREV, where PEGREV is a revision number, to specify the peg revision you wish to apply to the request. Use r=REV, where REV is a revision number, to specify an operative revision.

For example, if you wish to see the latest version of a README.txt file located in your project's /trunk, point your web browser to that file's repository URL, which might look something like the following:

http://host.example.com/repos/project/trunk/README.txt

If you now wish to see some older version of that file, add an operative revision to the URL's query string:

http://host.example.com/repos/project/trunk/README.txt?r=1234

What if the thing you're trying to view no longer exists in the youngest revision of the repository? That's where a peg revision is handy:

http://host.example.com/repos/project/trunk/deleted-thing.txt?p=321

And of course, you can combine peg revision and operative revision specifiers to fine-tune the exact item you wish to view:

http://host.example.com/repos/project/trunk/renamed-thing.txt?p=123&r=21

The previous URL would display revision 21 of the object which, in revision 123, was located at /trunk/renamed-thing.txt in the repository. See the section called “Peg and Operative Revisions” for a detailed explanation of these peg revision and operative revision concepts. They can be a bit tricky to wrap your head around.
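As a rough illustration of the query string rules above, here is a small shell sketch that assembles such a URL. The function name browse_url is our own invention and not part of Subversion; it simply appends the p= and r= pairs in the order used in the examples.

```shell
# browse_url URL [PEGREV] [REV]
# Hypothetical helper: print URL with an optional peg revision (p=)
# and an optional operative revision (r=) in its query string.
browse_url() {
  url=$1
  peg=${2:-}
  rev=${3:-}
  query=""
  if [ -n "$peg" ]; then
    query="p=$peg"
  fi
  if [ -n "$rev" ]; then
    # Join the two pairs with "&" only when a peg revision is present.
    query="${query:+$query&}r=$rev"
  fi
  # Add the "?" separator only when there is something to append.
  echo "$url${query:+?$query}"
}

browse_url http://host.example.com/repos/project/trunk/renamed-thing.txt 123 21
```

If you paste a combined URL into a shell command yourself, remember to quote it, because the shell treats an unquoted & as a request to run the command in the background.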

As a reminder, this feature of mod_dav_svn offers only a limited repository browsing experience. You can see directory listings and file contents, but no revision properties (such as commit log messages) or file/directory properties. For folks who require more extensive browsing of repositories and their history, there are several third-party software packages which offer this. Some examples include ViewVC (http://viewvc.tigris.org), Trac (http://trac.edgewall.org) and WebSVN (http://websvn.info). These third-party tools don't affect mod_dav_svn's built-in browseability, and generally offer a much wider set of features, including the display of the aforementioned property sets, display of content differences between file revisions, and so on.

Proper MIME type

When browsing a Subversion repository, the web browser gets a clue about how to render a file's contents by looking at the Content-Type: header returned in Apache's response to the HTTP GET request. The value of this header is some sort of MIME type. By default, Apache will tell the web browsers that all repository files are of the default MIME type, typically text/plain. This can be frustrating, however, if a user wishes repository files to render as something more meaningful—for example, it might be nice to have a foo.html file in the repository actually render as HTML when browsing.

To make this happen, you need only to make sure that your files have the proper svn:mime-type set. We discuss this in more detail in the section called “File Content Type”, and you can even configure your client to automatically attach proper svn:mime-type properties to files entering the repository for the first time; see the section called “Automatic Property Setting”.

Continuing our example, if one were to set the svn:mime-type property to text/html on file foo.html, Apache would properly tell your web browser to render the file as HTML. One could also attach proper image/* MIME-type properties to image files and ultimately get an entire web site to be viewable directly from a repository! There's generally no problem with this, as long as the web site doesn't contain any dynamically generated content.
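For files that are already versioned, the property can be set by hand with svn propset. The following is only a sketch: the helper names and the short extension-to-type table are our assumptions, and the commands are printed rather than executed so that you can review them first.

```shell
# guess_mime_type FILE
# Hypothetical lookup covering a handful of common extensions;
# returns nonzero for anything it does not recognize.
guess_mime_type() {
  case "$1" in
    *.html|*.htm) echo "text/html" ;;
    *.css)        echo "text/css" ;;
    *.png)        echo "image/png" ;;
    *.jpg|*.jpeg) echo "image/jpeg" ;;
    *.gif)        echo "image/gif" ;;
    *)            return 1 ;;
  esac
}

# print_propset_commands FILE...
# Dry run: print one "svn propset" command per recognized file.
print_propset_commands() {
  for file in "$@"; do
    if mime=$(guess_mime_type "$file"); then
      echo "svn propset svn:mime-type $mime $file"
    fi
  done
}

print_propset_commands foo.html logo.png notes.txt
```

Files with unrecognized extensions (notes.txt here) are skipped and keep whatever type they already have. The property changes take effect for browsers only after they are committed.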

Customizing the look

You generally will get more use out of URLs to versioned files—after all, that's where the interesting content tends to lie. But you might have occasion to browse a Subversion directory listing, where you'll quickly note that the generated HTML used to display that listing is very basic, and certainly not intended to be aesthetically pleasing (or even interesting). To enable customization of these directory displays, Subversion provides an XML index feature. A single SVNIndexXSLT directive in your repository's Location block of httpd.conf will instruct mod_dav_svn to generate XML output when displaying a directory listing, and to reference the XSLT stylesheet of your choice:

<Location /svn>
  DAV svn
  SVNParentPath /var/svn
  SVNIndexXSLT "/svnindex.xsl"
  …
</Location>

Using the SVNIndexXSLT directive and a creative XSLT stylesheet, you can make your directory listings match the color schemes and imagery used in other parts of your web site. Or, if you'd prefer, you can use the sample stylesheets provided in the Subversion source distribution's tools/xslt/ directory. Keep in mind that the path provided to the SVNIndexXSLT directive is actually a URL path; browsers need to be able to read your stylesheets to make use of them!

Listing repositories

If you're serving a collection of repositories from a single URL via the SVNParentPath directive, then it's also possible to have Apache display all available repositories to a web browser. Just activate the SVNListParentPath directive:

<Location /svn>
  DAV svn
  SVNParentPath /var/svn
  SVNListParentPath on
  …
</Location>

If a user now points her web browser to the URL http://host.example.com/svn/, she'll see a list of all Subversion repositories sitting in /var/svn. Obviously, this can be a security problem, so this feature is turned off by default.

Apache logging

Because Apache is an HTTP server at heart, it contains fantastically flexible logging features. It's beyond the scope of this book to discuss all of the ways logging can be configured, but we should point out that even the most generic httpd.conf file will cause Apache to produce two logs: error_log and access_log. These logs may appear in different places, but are typically created in the logging area of your Apache installation. (On Unix, they often live in /usr/local/apache2/logs/.)

The error_log describes any internal errors that Apache runs into as it works. The access_log file records every incoming HTTP request received by Apache. This makes it easy to see, for example, which IP addresses Subversion clients are coming from, how often particular clients use the server, which users are authenticating properly, and which requests succeed or fail.

Unfortunately, because HTTP is a stateless protocol, even the simplest Subversion client operation generates multiple network requests. It's very difficult to look at the access_log and deduce what the client was doing—most operations look like a series of cryptic PROPPATCH, GET, PUT, and REPORT requests. To make things worse, many client operations send nearly identical series of requests, so it's even harder to tell them apart.

mod_dav_svn, however, can come to your aid. By activating an operational logging feature, you can ask mod_dav_svn to create a separate log file describing what sort of high-level operations your clients are performing.

To do this, you need to make use of Apache's CustomLog directive (which is explained in more detail in Apache's own documentation). Be sure to invoke this directive outside your Subversion Location block:

<Location /svn>
  DAV svn
  …
</Location>

CustomLog logs/svn_logfile "%t %u %{SVN-ACTION}e" env=SVN-ACTION

In this example, we're asking Apache to create a special logfile, svn_logfile, in the standard Apache logs directory. The %t and %u variables are replaced by the time and username of the request, respectively. The really important parts are the two instances of SVN-ACTION. The first, %{SVN-ACTION}e, is replaced by the value of the SVN-ACTION environment variable, which is automatically set by mod_dav_svn whenever it detects a high-level client action. The second, env=SVN-ACTION, tells Apache to write an entry to this log only for requests in which that variable is set.

So, instead of having to interpret a traditional access_log like this:

[26/Jan/2007:22:25:29 -0600] "PROPFIND /svn/calc/!svn/vcc/default HTTP/1.1" 207 398
[26/Jan/2007:22:25:29 -0600] "PROPFIND /svn/calc/!svn/bln/59 HTTP/1.1" 207 449
[26/Jan/2007:22:25:29 -0600] "PROPFIND /svn/calc HTTP/1.1" 207 647
[26/Jan/2007:22:25:29 -0600] "REPORT /svn/calc/!svn/vcc/default HTTP/1.1" 200 607
[26/Jan/2007:22:25:31 -0600] "OPTIONS /svn/calc HTTP/1.1" 200 188
[26/Jan/2007:22:25:31 -0600] "MKACTIVITY /svn/calc/!svn/act/e6035ef7-5df0-4ac0-b811-4be7c823f998 HTTP/1.1" 201 227
…

you can peruse a much more intelligible svn_logfile like this:

[26/Jan/2007:22:24:20 -0600] - get-dir /tags r1729 props
[26/Jan/2007:22:24:27 -0600] - update /trunk r1729 depth=infinity
[26/Jan/2007:22:25:29 -0600] - status /trunk/foo r1729 depth=infinity
[26/Jan/2007:22:25:31 -0600] sally commit r1730

In addition to the SVN-ACTION environment variable, mod_dav_svn also populates the SVN-REPOS and SVN-REPOS-NAME variables, which carry the filesystem path to the repository and the basename thereof, respectively. You might wish to include references to one or both of these variables in your CustomLog format string, too, especially if you are combining usage information from multiple repositories into a single log file.

For an exhaustive list of all actions logged, see the section called “High-level Logging”.
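Because each line of the operational log has a fixed layout, it is easy to summarize with standard tools. The sketch below assumes exactly the format string used in the earlier CustomLog example ("%t %u %{SVN-ACTION}e"), so that the username is the third whitespace-separated field and the action name is the fourth; the function name summarize_svn_log is our own.

```shell
# summarize_svn_log
# Read an operational log on standard input and print one line per
# user/action pair in the form "COUNT USER ACTION".
# Assumes the "%t %u %{SVN-ACTION}e" layout: %t occupies fields 1-2.
summarize_svn_log() {
  awk '{ count[$3 " " $4]++ }
       END { for (key in count) print count[key], key }' |
    LC_ALL=C sort -k2   # fixed locale keeps the ordering predictable
}

# Typical use (the log path depends on your Apache installation):
#   summarize_svn_log < logs/svn_logfile
```

If you add SVN-REPOS-NAME or other variables to the format string, the field numbers in the awk program need to be adjusted to match.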

Write-through proxying

One of the nice advantages of using Apache as a Subversion server is that it can be set up for simple replication. For example, suppose that your team is distributed across four offices around the globe. The Subversion repository can exist only in one of those offices, which means the other three offices will not enjoy accessing it—they're likely to experience significantly slower traffic and response times when updating and committing code. A powerful solution is to set up a system consisting of one master Apache server and several slave Apache servers. If you place a slave server in each office, users can check out a working copy from whichever slave is closest to them. All read requests go to their local slave. Write requests get automatically routed to the single master server. When the commit completes, the master then automatically pushes the new revision to each slave server using the svnsync replication tool.

This configuration creates a huge perceptual speed increase for your users, because Subversion client traffic is typically 80–90% read requests. And if those requests are coming from a local server, it's a huge win.

In this section, we'll walk you through a standard setup of this single-master/multiple-slave system. However, keep in mind that your servers must be running at least Apache 2.2.0 (with mod_proxy loaded) and Subversion 1.5 (mod_dav_svn).

Configure the servers

First, configure your master server's httpd.conf file in the usual way. Make the repository available at a certain URI location, and configure authentication and authorization however you'd like. After that's done, configure each of your slave servers in the exact same way, but add the special SVNMasterURI directive to the block:

<Location /svn>
  DAV svn
  SVNPath /var/svn/repos
  SVNMasterURI http://master.example.com/svn
  …
</Location>

This new directive tells a slave server to forward all write requests to the master. (This is done automatically via Apache's mod_proxy module.) Ordinary read requests, however, are still serviced by the slaves. Be sure that your master and slave servers all have matching authentication and authorization configurations; if they fall out of sync, it can lead to big headaches.

Next, we need to deal with the problem of infinite recursion. With the current configuration, imagine what will happen when a Subversion client performs a commit to the master server. After the commit completes, the server uses svnsync to replicate the new revision to each slave. But because svnsync appears to be just another Subversion client performing a commit, the slave will immediately attempt to proxy the incoming write request back to the master! Hilarity ensues.

The solution to this problem is to have the master push revisions to a different <Location> on the slaves. This location is configured to not proxy write requests at all, but to accept normal commits from (and only from) the master's IP address:

<Location /svn-proxy-sync>
  DAV svn
  SVNPath /var/svn/repos
  Order deny,allow
  Deny from all
  # Only let the server's IP address access this Location:
  Allow from 10.20.30.40
  …
</Location>

Set up replication

Now that you've configured your Location blocks on master and slaves, you need to configure the master to replicate to the slaves. This is done the usual way: with svnsync. If you're not familiar with this tool, see the section called “Repository Replication” for details.

First, make sure that each slave repository has a pre-revprop-change hook script which allows remote revision property changes. (This is standard procedure for being on the receiving end of svnsync.) Then log into the master server and configure each of the slave repository URIs to receive data from the master repository on the local disk:

$ svnsync init http://slave1.example.com/svn-proxy-sync file:///var/svn/repos
Copied properties for revision 0.
$ svnsync init http://slave2.example.com/svn-proxy-sync file:///var/svn/repos
Copied properties for revision 0.
$ svnsync init http://slave3.example.com/svn-proxy-sync file:///var/svn/repos
Copied properties for revision 0.

# Perform the initial replication

$ svnsync sync http://slave1.example.com/svn-proxy-sync
Transmitting file data ....
Committed revision 1.
Copied properties for revision 1.
Transmitting file data .......
Committed revision 2.
Copied properties for revision 2.
…

$ svnsync sync http://slave2.example.com/svn-proxy-sync
Transmitting file data ....
Committed revision 1.
Copied properties for revision 1.
Transmitting file data .......
Committed revision 2.
Copied properties for revision 2.
…

$ svnsync sync http://slave3.example.com/svn-proxy-sync
Transmitting file data ....
Committed revision 1.
Copied properties for revision 1.
Transmitting file data .......
Committed revision 2.
Copied properties for revision 2.
…

After this is done, we configure the master server's post-commit hook script to invoke svnsync on each slave server:

#!/bin/sh
# Post-commit script to replicate newly committed revision to slaves

svnsync sync http://slave1.example.com/svn-proxy-sync > /dev/null 2>&1 &
svnsync sync http://slave2.example.com/svn-proxy-sync > /dev/null 2>&1 &
svnsync sync http://slave3.example.com/svn-proxy-sync > /dev/null 2>&1 &

The extra bits on the end of each line (> /dev/null 2>&1 &) aren't necessary, but they're a sneaky way to discard the output and run the sync commands in the background so that the Subversion client isn't left waiting for every slave to be updated before the commit finishes. In addition to this post-commit hook, you'll need a post-revprop-change hook as well so that when a user, say, modifies a log message, the slave servers get that change also:

#!/bin/sh
# Post-revprop-change script to replicate revprop-changes to slaves

REV=${2}
svnsync copy-revprops http://slave1.example.com/svn-proxy-sync ${REV} > /dev/null 2>&1 &
svnsync copy-revprops http://slave2.example.com/svn-proxy-sync ${REV} > /dev/null 2>&1 &
svnsync copy-revprops http://slave3.example.com/svn-proxy-sync ${REV} > /dev/null 2>&1 &
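Both hook scripts repeat one line per slave, which becomes tedious as slaves are added. One possible variation, sketched here, keeps the slave list in a single place. The names SLAVES, SVNSYNC, and replicate are hypothetical, and unlike the scripts above this version contacts the slaves one after another rather than in parallel.

```shell
#!/bin/sh
# Sketch of a helper shared by the master's post-commit and
# post-revprop-change hooks. Only the svnsync subcommands come from
# the scripts above; everything else is illustrative.

SLAVES="http://slave1.example.com/svn-proxy-sync
http://slave2.example.com/svn-proxy-sync
http://slave3.example.com/svn-proxy-sync"

# Set SVNSYNC="echo svnsync" for a dry run that only prints the commands.
SVNSYNC=${SVNSYNC:-svnsync}

# replicate SUBCOMMAND [ARGS...]
# Run "svnsync SUBCOMMAND SLAVE ARGS..." for each slave in turn.
replicate() {
  subcommand=$1
  shift
  for slave in $SLAVES; do
    $SVNSYNC "$subcommand" "$slave" "$@"
  done
}

# In post-commit:          replicate sync > /dev/null 2>&1 &
# In post-revprop-change:  replicate copy-revprops "$2" > /dev/null 2>&1 &
```

Putting the whole loop in the background, as in the two comment lines, still lets the client's commit return promptly; the trade-off is that a slow slave delays the ones listed after it.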

The only thing we've left out here is what to do about user-level locks (of the svn lock variety). Locks are enforced by the master server during commit operations; but all the information about locks is distributed during read operations such as svn update and svn status by the slave server. As such, a fully functioning proxy setup needs to perfectly replicate lock information from the master server to the slave servers. Unfortunately, most of the mechanisms that one might employ to accomplish this replication fall short in one way or another[52]. Many teams don't use Subversion's locking features at all, so this may be a nonissue for you. Sadly, for those teams which do use locks, we have no recommendations on how to gracefully work around this shortcoming.

Caveats

Your master/slave replication system should now be ready to use. A couple of words of warning are in order, however. Remember that this replication isn't entirely robust in the face of computer or network crashes. For example, if one of the automated svnsync commands fails to complete for some reason, the slaves will begin to fall behind. Your remote users will then see that they've committed revision 100, but when they run svn update, their local server will tell them that revision 100 doesn't yet exist! Of course, the problem will be automatically fixed the next time another commit happens and the subsequent svnsync is successful, because the sync will replicate all waiting revisions. But still, you may want to set up some sort of out-of-band monitoring to notice synchronization failures and force svnsync to run when things go wrong.
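What such monitoring looks like depends on your environment. As one minimal sketch, a periodic job on the master could compare its youngest revision with the revision a slave reports. The function name check_slave_lag and its output format are our own, and the commented usage assumes that svn info prints an English "Revision:" line and that the slave serves reads at /svn as configured earlier.

```shell
# check_slave_lag MASTER_REV SLAVE_REV [MAX_LAG]
# Hypothetical check: succeed when the slave is at most MAX_LAG
# revisions (default 0) behind the master, fail otherwise.
check_slave_lag() {
  master_rev=$1
  slave_rev=$2
  max_lag=${3:-0}
  lag=$((master_rev - slave_rev))
  if [ "$lag" -gt "$max_lag" ]; then
    echo "BEHIND lag=$lag"
    return 1
  fi
  echo "OK lag=$lag"
}

# Possible use from a scheduled job on the master (not run here):
#   master_rev=$(svnlook youngest /var/svn/repos)
#   slave_rev=$(svn info http://slave1.example.com/svn | sed -n 's/^Revision: //p')
#   check_slave_lag "$master_rev" "$slave_rev" ||
#     svnsync sync http://slave1.example.com/svn-proxy-sync
```

A small nonzero MAX_LAG avoids false alarms when the check happens to run while a background sync from a recent commit is still in progress.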

Other Apache features

Several of the features already provided by Apache in its role as a robust web server can be leveraged for increased functionality or security in Subversion as well. The Subversion client is able to use SSL (the Secure Sockets Layer, discussed earlier). If your Subversion client is built to support SSL, it can access your Apache server using https:// and enjoy a high-quality encrypted network session.

Equally useful are other features of the Apache and Subversion relationship, such as the ability to specify a custom port (instead of the default HTTP port 80) or a virtual domain name by which the Subversion repository should be accessed, or the ability to access the repository through an HTTP proxy.

Finally, because mod_dav_svn is speaking a subset of the WebDAV/DeltaV protocol, it's possible to access the repository via third-party DAV clients. Most modern operating systems (Win32, OS X, and Linux) have the built-in ability to mount a DAV server as a standard network shared folder. This is a complicated topic, but also wondrous when implemented. For details, read Appendix C, WebDAV and Autoversioning.

Note that there are a number of other small tweaks one can make to mod_dav_svn that are too obscure to mention in this chapter. For a complete list of all httpd.conf directives that mod_dav_svn responds to, see the section called “Directives”.



[48] They really hate doing that.

[51] While self-signed certificates are still vulnerable to a man-in-the-middle attack, such an attack is much more difficult for a casual observer to pull off, compared to sniffing unprotected passwords.