The ASF Content Management System (ASF CMS) is used to edit, manage, and deploy web content for many Apache projects. Please see ASF Content Management System for general information about the design, layout, and motivation for the CMS, and read on here for a CMS Reference.

A video tutorial on how to use the CMS is available at http://s.apache.org/cms-tutorial.

There is a list of Apache projects using the CMS to manage their websites. The CMS wiki page contains additional useful information.

Markdown format used by CMS pages

The mdtext files that CMS users edit are converted to HTML by Python Markdown before publishing.

The general markup syntax is described in http://daringfireball.net/projects/markdown/syntax

The following extensions are enabled for the CMS, which extend the markdown syntax available:

The extensions are part of the 'Markdown Extra' syntax documented in http://michelf.com/projects/php-markdown/extra/

Information flow

A user creates an mdtext file with the webclient in their own workspace, or in a working copy of the site's source. After a commit the following stages occur:

Staging

Staging is initiated by a commit to the svn repository.

  • the mdtext file is converted to html by the build and written to the staging site.
  • the converted html file is committed from the staging site to the staging repo.

Publishing

Publishing is initiated by clicking the [publish] link in the webclient (refresh page if using back button from your webclient), or by running http://s.apache.org/cms-cli from the command line.

  • the changes from the staging site are cheap-copied into the production URL by the POST to the publish link.
  • subsequently svnpubsub notifies the main webserver of the change and the webserver svn up's the site.

Control Flow of the Standard Build System

At a high level, the CMS is driven by a number of "views" that generate content, a process for selecting a "view" for a given changed file, and an optional set of additional named parameters to pass to the "view" so that "view" logic can be reused more easily.

Simply put, a "view" is a Perl method that is given the name of the changed file and is expected to generate the actual HTML. All view methods are defined in the lib/view.pm file or in a base class thereof. Yes, essentially this means projects "bring their own code" to generate content in this CMS; all by itself the CMS doesn't "know" how to do that for you. Fortunately this isn't a difficult task and many other projects have managed to carry it out successfully, so a basic familiarity with Perl and some study of the code of existing sites will get you what you need.

The process of selecting the correct view method is achieved through setting up an array of regular expressions (patterns) to match the desired paths that should be used for a particular view method. This is done in the lib/path.pm file, which should be thought of as the "build's config file" for the site.

When configuring the pattern-to-view pair in the lib/path.pm file, it is also possible to supply an optional Perl hashref of named parameters which will be passed to the view method allowing you to alter the view method logic as desired and more easily reuse view methods.

Detailed walkthrough

When a commit triggers a build, the script that gets invoked is build_svn.pl which will update the local source tree checkout and keep track of what files have changed. Depending on what has changed, that script will call out to either build_site.pl or build_file.pl which will build the corresponding resources. In either case, the core logic for building changed source files within content/ is documented in the next section.
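
The choice between the two build scripts can be pictured with a small sketch. The rule used here is an assumption pieced together from this page (a change under lib/ or templates/ forces a full site build, anything else is built per file), and build_mode is a hypothetical helper, not code taken from build_svn.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: decide which build script would handle a set of changed paths.
# Assumption: any change under lib/ or templates/ requires a full site build.
sub build_mode {
    my @changed = @_;
    return "build_site.pl" if grep { m!^(?:lib|templates)/! } @changed;
    return "build_file.pl";
}

print build_mode("content/index.mdtext"), "\n";
print build_mode("content/index.mdtext", "lib/path.pm"), "\n";
```

The real script also tracks which files changed so that only the affected resources are rebuilt; that bookkeeping is left out here.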

Note: none of the build scripts run with taint checks -T enabled, because typically none of the code is being exposed to third-party data. Everything other than ASF::Value generated objects revolves around committer-approved material, not end-user input, so taint-checks are unnecessary. As those ASF::Value objects aren't typically evaluated for their data until template-execution time, they should not pose any additional risks to committer-authored Perl code. Furthermore the CMS build host is tightly locked down with both egress and ingress firewalls to ensure no uncontrolled data winds up being processed by a build.

Core logic for building files

The core logic for building individual files, common to both build_site.pl and build_file.pl, is contained in the following block of Perl:

 1  for my $p (@path::patterns) {
 2      my ($re, $method, $args) = @$p;
 3      next unless $path =~ $re;
 4      my $sub = view->can($method) or die "Can't locate method: $method\n";
 5      my ($content, $ext) = $sub->(path => $path, %$args);
 6      $src = "$target_file.$ext";
 7      open my $fh, ">", "$target_base/$src"
 8          or die "Can't open $target_base/$src: $!\n";
 9      print $fh $content;
10      $matched = 1;
11      last;
12  }
13  unless ($matched) {
14      copy_if_newer $src, "$target_base/$src";
15  }

The gist of this code block is that it loops over the @patterns array in lib/path.pm looking for a regex $re (the first item in a @patterns entry) that matches the given source $path (rooted in content/). When it finds a matching entry (line 3), it does an OO lookup (line 4) of the corresponding $method name (the second item in the entry) in lib/view.pm, and then invokes that subroutine (line 5) with the matching $path and the additional %$args (the third item in the entry) configured on the matching line of @patterns. (A Perl => is called a 'fat comma'; it simply wraps quotes around its left-side argument, converting barewords into strings.) The generated $content is then written to the target file (lines 7-9) with the adjusted extension (line 6). The loop stops after the first match; if no matching @patterns entry is found, the $src file is simply copied verbatim (line 14) to the target tree. Since the @patterns array is processed in order, it is important that more specific patterns appear earlier than less specific ones.
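
The first-match-wins behaviour can be tried in isolation with a minimal sketch. The two @patterns entries, the view names index_page and narrative, and the select_view helper are all hypothetical; only the shape of an entry (regex, method name, args hashref) comes from the block above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for lib/path.pm: [ regex, view method name, args hashref ].
# The more specific entry comes first because only the first match is used.
my @patterns = (
    [ qr!^/docs/index\.mdtext$!, "index_page", { template => "index.html" } ],
    [ qr!\.mdtext$!,             "narrative",  { template => "skeleton.html" } ],
);

# Return the name of the view method that would handle $path, or undef
# when no entry matches (the build would then copy the file verbatim).
sub select_view {
    my ($path) = @_;
    for my $p (@patterns) {
        my ($re, $method, $args) = @$p;
        return $method if $path =~ $re;
    }
    return undef;
}

print "$_ => ", (select_view($_) // "(copied verbatim)"), "\n"
    for "/docs/index.mdtext", "/docs/faq.mdtext", "/images/logo.png";
```

Swapping the two entries would send /docs/index.mdtext to narrative as well, which is why ordering matters.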

Typically the view $method will pass the %$args along to the template responsible for rendering the resulting content. In fact most view methods do nothing other than enrich their arguments with suitably generated data to pass along to the template. The upshot is that reuse of the $method code in lib/view.pm becomes much more feasible, since different paths within content/ can share the same $method by supplying different %$args in the @patterns array in lib/path.pm. All you need to do is ensure the underlying template (which itself can be configured as an argument in %$args) recognizes the various arguments. Indeed, the whole point of the ASF::Value:: packages is to provide Apache-related arguments from an %$args named parameter in @patterns; they represent cheaply constructed, lazily evaluated, dynamic template object variables (cheap construction matters because lib/path.pm is required for all builds and is expected to load quickly).
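
As a rough illustration of that reuse, the sketch below calls one view method with two different argument sets. The render sub is a placeholder for the real template engine, and this single_narrative is a simplified stand-in written for the example, not the one found in any site's view.pm.

```perl
use strict;
use warnings;

# Placeholder for the template engine: lists the variables it was given.
sub render {
    my %vars = @_;
    return join ", ", map { "$_=$vars{$_}" } sort keys %vars;
}

# Hypothetical view method: enriches its arguments, then hands them to the template.
sub single_narrative {
    my %args = @_;
    $args{content} = "text of $args{path}";   # data generated by the view
    return (render(%args), "html");           # (content, extension), as the build loop expects
}

# Two @patterns-style argument sets sharing the same view method.
my ($foo_html, $foo_ext) = single_narrative(path => "/foo/a.mdtext", header => "foo-top.html");
my ($bar_html, $bar_ext) = single_narrative(path => "/bar/b.mdtext", header => "bar-top.html");
print "$foo_html\n$bar_html\n";
```

The view itself never looks at header; it only needs the template to understand that variable.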

Examples

First example

our @patterns = (
    [ qr!^/foo/.*\.mdtext$!, single_narrative => {
         header => "foo-top.html",
         footer => "foo-bottom.html",
       template => "skeleton.html",
    } ],
...
);

This lib/path.pm array specifies that for a given content/foo dir, all markdown files will use the single_narrative view method with the following arguments passed to it: header, footer, and template. Typically the single_narrative view is a generic view that can be cribbed from various active view.pm files, else it can be derived from the ASF::View base class. The point here is that all the action is specified in the arguments: custom header/footer template files that are recognized by templates/skeleton.html perhaps via {% include header %} and {% include footer %} directives. The header and footer templates would naturally reside within templates/ alongside the skeleton.html main template. Similar functionality can provide for custom navigation.

Second example

in view.pm:
sub dir_wrapper {
    my %args = @_;
    my $dir = $1; # here we expect the regexp in @patterns to capture the proper path component
    my ($nav, $hdr, $ftr) = map "$dir/$_.html", qw/navigation header footer/;
    return view->can($args{view})->(dir => $dir, navigation => $nav, header => $hdr, footer => $ftr, %args);
}

in path.pm:
our @patterns = (
   [ qr!/([^/]+)/.*\.mdtext$!, dir_wrapper => {
        view => "single_narrative",
        template => "subdir_skeleton.html", # perhaps this simply extends "skeleton.html"?
        footer => "footer.html", # overrides the dir_wrapper-generated "foo/footer.html" with a base footer.html for all subdirs
    } ]
);

This code specifies a dir_wrapper view method that augments %args with additional variables before handing them to the specified view method. Here it is used to invoke the single_narrative view, which simply passes those arguments along to the configured template. The dir argument specifies the top-level subdirectory of a given path within content/, whereas navigation, header, and footer will be interpreted as corresponding paths one subdirectory deep within templates. These arguments can be overridden by specifying them in the original %$args hashref in @patterns, as is done here with the footer.
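
The override works because of how Perl builds a hash from a list: when a key appears twice, the later pair wins. The reduced sketch below makes that visible. show_args is a hypothetical stand-in for the target view, and main->can is used only because the example lives in a single file rather than in a view package.

```perl
use strict;
use warnings;

# Stand-in for the target view; returns its arguments so they can be inspected.
sub show_args { my %args = @_; return \%args }

sub dir_wrapper {
    my %args = @_;
    my $dir = $1;   # capture from the caller's most recent successful match
    my ($nav, $hdr, $ftr) = map "$dir/$_.html", qw/navigation header footer/;
    # Generated defaults first, caller-supplied %args last: later pairs win.
    return main->can($args{view})->(dir => $dir, navigation => $nav,
                                    header => $hdr, footer => $ftr, %args);
}

my $path = "/foo/bar.mdtext";
my $result;
if ($path =~ qr!/([^/]+)/.*\.mdtext$!) {
    $result = dir_wrapper(path => $path, view => "show_args", footer => "footer.html");
}
print "$_ => $result->{$_}\n" for sort keys %$result;
```

Note that dir_wrapper must read $1 before running any regex of its own, since a later successful match would replace the capture.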

Setting up your content

The easiest way to get set up quickly is to use the boilerplate site setup. It has the right files in the right structure. Just export it and add it to svn.

svn export https://svn.apache.org/repos/infra/websites/cms/template site
svn add site
svn ci site -m "initial setup for the cms"

The location of your 'site' dir in svn is what Infra needs to get you set up in the CMS. Make sure to include this in your JIRA when requesting to use the CMS.

Once set up, you'll want to change the site/trunk/templates/skeleton.html file to use the look and feel you want for your project.

Incubator podlings will need to make adjustments to the setup (mainly placing all content within content/$project) to comply with the layout for incubating projects.

Note that you will need to file a JIRA on the INFRA project requesting that your micro-site be incorporated into the main web-site or under the incubator site. You need to include a pointer to your site in SVN and the desired base URL.

Once your site has been set up, you need to follow up by making a trivial change to one of your Perl modules in /lib to trigger a full site build. After a successful full build, proceed to publishing the site using the CMS webgui.

Organization of the webgui pages

For those of you who intend to go poking around in the CMS webgui, a little primer on the link organization is in order. There are essentially two main sets of links: those that come from the browse view and those that come from the edit view. The browse links are there to give you different perspectives on the content, whereas the edit links are there to let you alter it somehow. When viewing a directory the only differences between the browse and edit views are the sets of links presented, so it can sometimes be confusing which of those two views is operational. Just look for browse or edit in the title to keep them straight; eventually you will get used to recognizing the different link collections.

When you act on some content by POSTing form data, you will be given a brief list of links to select from. The links are ordered loosely in the order you are expected to click on them, and the UI isn't shy about expecting you to use your browser's back button to return to the CMS. One of the upsides of REST compliance is that the site is designed to facilitate state management in your web browser, not through server-side contraptions. Feel free to navigate back and forth to various pages you have visited and reuse them (with whatever embedded state they contain) at will.

Single REST exception regarding commits

There is a single exception here that you should be aware of: if you POST form data to commit something, and your server-side working copy has a stale version of that resource, the CMS will automatically update your working copy and retry the commit. If this happens, your prior editing session will be out of sync with the content you actually committed, as some external commit information may have been merged in the process.

The CMS will indicate this has happened by notifying you it had to merge external content on the post-commit page. Hence avoid going back to a prior editing session once you've committed and simply use the [Edit] link provided on the post-commit view to ensure you are working with the latest version of the resource. If you are really in the habit of going back anyway at this point, when you see the merge notification message do yourself and your project a favor and refresh the edit page before you start hacking on the content again.

Note also that anonymous working copies are updated daily between 0:00 and 0:15 UTC, which can cause non-REST-like behavior if the files on the server are updated while an anonymous user is working on them. This tradeoff is worthwhile because anonymous users, being infrequent users, tend to care more that the content is current than that the CMS is fully REST compliant.

FAQ - Anonymous/Non-Committer Use

I'm not a committer on $project, which uses the CMS. Can I still use the CMS?

If you are currently an Apache committer and want to submit a patch to the $project dev list using the CMS, you can do that right now. Simply use the bookmarklet and pull up an Edit session for the page in question and submit your changes (avoiding Quick Commit). Then view the subsequent Diff link, which will provide a link to the Mail Diff feature. Fill in the form and your changes will be mailed to any apache.org address you choose (but only one address per mailout please).

More enlightened projects like Apache Lucy already grant any Apache committer write-access to their site's source tree (but maintain tighter controls for publication), so go ahead and try committing first before bothering with the Mail Diff feature.

Non-committers can still checkout the site in svn and submit patches to the list. Otherwise they may access the CMS using username anonymous and an empty password and use the Mail Diff feature (or Quick Mail) mentioned above to send patches to the list from the CMS. Anonymous working copies created in the CMS may be cloned by any project committer, so it is relatively trivial for committers to apply such diffs using the CMS webgui. The workflow of an anonymous user follows the typical workflow of a committer- the only difference is that they'll be mailing off diffs to the dev list instead of directly committing their changes. In other words they follow the Review-Then-Commit model, whereas committers follow Commit-Then-Review. In particular the first thing a non-committer should do is install the bookmarklet using the anonymous passwordless account and start browsing the project's live site looking for pages to change.

A video tutorial by Rob Weir for anonymous users is available at http://s.apache.org/cms-anonymous-tutorial.

NOTICE: unless indicated otherwise on the pages in question, all editable content available from cms.apache.org is presumed to be licensed under the Apache License (AL) version 2.0 and hence all submissions to cms.apache.org are treated as formal Contributions under the license terms.

Information for committers working with anonymous contributions

Do keep in mind however that cloning (snapshots of) working copies creates an unmanaged dependency on that working copy, so if there comes a time to destroy that working copy, a committer's clone will be destroyed as well. If you are a committer planning to do extensive editing, have recently (within the past 3 days) worked on a cloned working copy, and would like to avoid this situation, you can "force" a new working copy (hence destroying an existing clone) by visiting https://cms.apache.org/ and clicking on the appropriate link. You can tell if your working copy was cloned from another user if the url has another user's id in it right after yours.

Committers should also be aware of the fact that cloning a working copy necessarily involves XSS risks, so you should carefully review the relevant diffs (e.g. by searching diff pages for "script", "src" and "href" entries) to ensure nothing nefarious is being introduced into your browser (though there is little available to a hacker to exploit as none of the cookies from cms.apache.org contain sensitive information). In particular be wary of any unusual or unexpected password prompts appearing while visiting a cloned working copy: you won't be prompted for your password again by the CMS once you have first logged in, unless you have changed your LDAP password elsewhere during your browser session.

If you do see an anomalous password prompt, it probably wasn't from us, so be sure to report it to your dev list and/or infrastructure@. The normal cleanup procedure involves nuking the bogus working copy either by sysadmin intervention or by someone logging in as anonymous and forcing a new working copy (which achieves the same effect).

I'm having trouble committing changes from an anonymous clone after applying changes which added files or directories.

The workflow is slightly more complicated when dealing with anonymous user additions to the site, as opposed to modifications of existing pages. Once the anonymous user notices their additions have been applied, they need to [Revert] and [Update] those resources (in either order) before continuing on with any further edits, otherwise svn will mark their tree as conflicted and prevent future commits to that subtree from any subsequent clones.

FAQ - Editing Tools

What is Quick Commit and why would I use it?

If you are making a simple edit to a single page, Quick Commit will save you the trouble, after you've submitted your edits, of pulling up the Commit screen, filling in the Log Message, and hitting Submit. It is a convenient feature that probably should be used for 80% of your work in the CMS.

Why doesn't the bookmarklet take me directly to the Edit screen for a page?

The default CMS bookmarklet takes you to a Browse screen for 2 reasons:

  1. to get people familiar with the features of the CMS
  2. because the bookmarklet sometimes doesn't take you to the resource you want.

An example of the second reason is the following url: http://www.apache.org/licenses/LICENSE-2.0. Instead of taking you to the .mdtext representation it takes you to the .txt one, because httpd's mod_negotiation prefers the shorter representation when other factors are equal.

However if you still prefer to have the bookmarklet take you to the Edit screen, simply hack the bookmarklet to add an action=edit parameter to the query string. Like this for example: ASF CMS.

Non-committers can still use the bookmarklet.

How come the publish link doesn't show diffs of newly added files?

svn diff doesn't show diffs of copied files, which is what the [Publish Site] link does with added files. There is an option available in subversion 1.7 that enables diffs of copied files.

As the CMS is a REST application, no HTTP GET operations change server state- you will have to POST form data for that to happen. Hence any links presented in the webgui are safe to click on. The Static links just provide you a raw view of the underlying source files. Staged shows the current view of the file on the staging site, and Production shows the current view of the file on the live production site.

How do I force the CMS to do a full site build?

Simply make a trivial change to a file in the templates/ or lib/ directory.

How do I add a new file or directory to my site?

To add a resource you first edit a directory by clicking on the Edit link to a directory. Then you add the name of the new file or directory to the form input field at the top of the page; directories are signified by a trailing '/' in their name. Hit enter and a form will be provided to create the new file or directory.

What revision of the site is currently being served?

This answer pertains to any svnpubsub site, not only to CMS sites, and as such is given in the ASF svnpubsub instance documentation.

We've completely restructured our site and now there are a lot of leftover files on the live site. How do we clean things up?

Simply delete the content and cgi-bin subdirs of your staging tree within https://svn.apache.org/repos/infra/websites/staging and THEN trigger a full site build. Order counts- please do not delete those directories while a build is underway! Publishing your changes will clean things up on the live site.

FAQ - Markdown and Content Questions

What version of Django's template library does the CMS use?

The CMS uses a fork of Dotiac::DTL version 0.8, which seems to implement this version of django templates.

How do I use the code highlighter?

Some questions have arisen regarding the syntax highlighter on code blocks. Its guessing algorithm isn't very good, so help it by telling it explicitly which lexer to use, chosen from the list of short names here:

http://pygments.org/docs/lexers/

Prefix your code/text block with an indented ":::shortname" or "#!shortname" line, where shortname is the name of the lexer to use. Often you want to use ":::text" just to tell the highlighter to do nothing special to the indented block. The ":::shortname" variant will not supply line numbers, whereas the "#!shortname" variant will.

How come my nested lists don't render properly?

The python implementation of markdown insists on 4-space indents per nesting level.

How do I verify the markdown -> html conversion by the python implementation, before committing my changes?

All you need to do is Browse your changes after submitting your edits (in this case you want to avoid using Quick Commit).

The main idea is to {% include navigation %} in your site template, where navigation is a Django variable name. How you construct this variable and pass it along to the site template is up to you- you could break out separate paths in path.pm and add the per-path navigation to the argument list of each path. Alternately you could use view.pm to process the path and generate a navigation argument using regexp captures from path.pm. See the first and second examples for additional details.

Note: %path::dependencies provides support for page dependencies that may be used to reflect navigation links for things like directory index pages. These are made available as a sorted arrayref in the deps template argument for the standard ASF::View views. Loops in the dependency graph are supported with careful application of the quick_deps argument to short-circuit the full dependency builds. Complex dependencies can be created through the walk_content_tree sub in ASF::Util.

The /mail URL is implemented by a symlink to /home/apmail/public-arch/$project.apache.org. As the CMS does not natively support the use of symlinks, you will need to add this symlink to your production site and add a corresponding line: "mail", to the extpaths.txt file in your source tree (mentioned earlier). Here are the manual steps:

% svn co --depth=immediates https://svn.apache.org/repos/infra/websites/production/$project/content
% cd content
% ln -s /home/apmail/public-arch/$project.apache.org mail
% svn add mail
% svn commit -m 'add mail link' mail
% cd ..
% rm -rf content
% svn co --depth=immediates https://svn.apache.org/repos/asf/$project/site/trunk/content
% cd content
% echo mail >> extpaths.txt
% svn add extpaths.txt
% svn commit -m 'add mail to extpaths.txt' extpaths.txt

If you have time you may continue on to publish the extpaths.txt file, but it isn't strictly required so long as the document correctly appears on the staging site.

FAQ - Build Tools

How do I run the CMS build scripts?

There are two separate folders to be aware of when running the build scripts. The first is the root folder of your CMS site that stores your content, the folder which has "content" and "cgi-bin" as immediate child directories. For example, for the incubating Apache JSPWiki, it would be http://svn.apache.org/repos/asf/incubator/jspwiki/site/trunk/. The second folder is the one housing the build scripts used to build your site, which can be checked out from https://svn.apache.org/repos/infra/websites/cms/build/ and stored wherever desired on your local computer.

The Perl script that generates the *.html site files from *.mdtext is build_site.pl in the build scripts folder. The build_site.pl script has some Python dependencies. The easiest way to install these is to use the Python setuptools. First check your version of Python (python --version), then follow the instructions at http://pypi.python.org/pypi/setuptools to install the appropriate set of tools.

After the python setuptools are installed, other python dependencies can be installed as follows:

$ sudo easy_install Pygments
$ sudo easy_install ElementTree   # possibly not needed
$ sudo easy_install Markdown

Depending on your Perl installation, you may need to install some CPAN modules first. A few users have had to install the following CPAN modules due to the use of the ASF::Value module:

  • install XML::Atom::Feed
  • install XML::RSS::Parser::Lite
  • install XML::Parser::Lite
  • install YAML::XS
  • install SVN::Client

You may also need to do a force install depending on your setup.

Once everything is installed, navigate to the root folder of your CMS site and run the following command:

$ export MARKDOWN_SOCKET=`pwd`/markdown.socket PYTHONPATH=`pwd`

Next, navigate back to the build script directory (preferably in the same command window, as the above environment variables must be set) and run:

$ python markdownd.py

to get the markdown processing daemon up. Note: the local mdx_elementid.py markdown extension we use has now been updated to py-markdown 2.x, so earlier versions of the markdown module won't work any more.

Next, you'll need to update your PERL5LIB environment variable to point to the lib subdirectory of the CMS content root folder (not the lib in the build scripts directory). If this is not set properly, you may get "Can't locate path.pm in @INC" error messages when running build_site.pl. Run perl -V from a command line and check that this directory is listed under @INC.

Then finally run build_site.pl from the scripts folder, for example:

$ perl build_site.pl --source-base /content/root/folder --target-base /desired/output/folder

How do I use the CMS with an external build system?

Ensure your build system is supported on the cms.apache.org host and supply the proper @system_args to build_external.pl. The following build systems are already supported (@system_args for them has been determined): Ant, Forrest, Maven, and arbitrary build_cms.sh shell scripts.

The default Perl-based build system (view.pm/path.pm) is not considered an external build system.

How do I verify compatibility if I write a client app for the CMS?

Assuming you're using json, you can verify API compatibility by checking https://cms.apache.org/compat?version=$VERSION where $VERSION is the API version of the CMS that your code was written against.

How do I publish generated docs (eg. doxygen)?

First place all your generated docs into a local directory, let's call it foo/. Next create a compressed archive of foo/ by running

$ tar -czf foo.tar.gz foo

Then in the CMS navigate to the directory you'd like to add that directory to. Pull up the Edit screen and type "foo/" into the form field. Then hit enter. Fill in the file upload widget by pointing it at foo.tar.gz, click on "Quick Commit", add a log message, and hit Submit. Wait while the staging site builds the docs, and when that process is completed click Publish Site and Submit. Please note that the name of the local directory you archived must match the name of the directory to be added to the CMS.

The way to maintain this setup is to use the CMS to delete stale doc trees and replace them with fresh versions.

More sophisticated projects should be capable of scripting this into their release packaging process. The requests to the CMS would look something like this (in Perl):

use LWP::UserAgent;

## $user, $passwd and $urlencoded_uri_of_website_doctree must be set before this point.
$ua = LWP::UserAgent->new(requests_redirectable => ["GET"]);
$ua->default_header(Accept => "application/json");
$ua->credentials("cms.apache.org:443", "ASF Committers", $user, $passwd);
## HEAD requests are not redirectable here, so the Location header can be read directly.
$response = $ua->head("https://cms.apache.org/redirect?action=delete;uri=$urlencoded_uri_of_website_doctree");
$delete_uri = $response->header("Location")
    or die "Missing Location header";
$delete_uri .= "/" unless $delete_uri =~ m!/$!;
## be careful here, if you screwed up the uri above you could wind up deleting your entire site.
$delete_uri =~ m!/trunk/content/$! and die "Won't delete entire site!";
$response = $ua->post($delete_uri, [ submit => "Submit" ]);
die $response->status_line unless $response->is_success;
$commit_uri = $delete_uri;
$commit_uri =~ s!/delete/!/commit/!;
$response = $ua->post($commit_uri, [ submit => "Submit", message => "deleting stale doc tree" ]);
die $response->status_line unless $response->is_success;
$add_uri = $delete_uri;
$add_uri =~ s!/delete/!/add/!;
## put new doctree into foo.tar.gz
$response = $ua->post($add_uri, Content_Type => 'form-data', Content => [
   submit => "Submit",
   message => "new doctree",
   commit => 1,
   file => [ "foo.tar.gz" ],
]);
die $response->status_line unless $response->is_success;

Then use the http://s.apache.org/cms-cli script to publish the site (once the staging build has completed).

On the other hand, if you find that after a while of doing this your CMS source tree seems bogged down in generated documentation trees, making dealing with the site unwieldy, there is a handy gadget here: simply create a content/extpaths.txt (or content/resources/extpaths.txt for maven builds) file which lists the paths (relative to your docroot) of the bases of the documentation trees, then after committing that file to your tree you may purge your source tree of those documentation trees and publish your site. The publishing process will keep track of those trees to ensure they remain accessible on your site. If you ever tire of publishing one of those trees simply remove the corresponding line from extpaths.txt, commit and republish. See http://www.apache.org/extpaths.txt for an example+description of the file format. Note that once added to the file, such urls are essentially frozen unless/until either the entry is removed or someone commits a change directly to its corresponding production tree in svn.

What is extpaths.txt?

There are 3 trees in play in the CMS: the source tree in asf/project/site, the staging tree in infra/websites/staging/project, and the production tree in infra/websites/production/project. extpaths.txt lists paths to preserve in the production tree after they are deleted from the source tree and subsequently deleted from the staging tree by buildbot.

Is there any way to turn off all these annoying buildbot commit messages appearing on our commit lists?

No. They are a core part of the CMS's internal bookkeeping and cannot be routed elsewhere. However they are not essential reading for projects whose site builds are well-understood, so feel free to route those messages to /dev/null.

Help! I made a massive commit to my CMS source tree and now I cannot use the CMS webgui, not even to publish the site!

The CMS's internal working copies are updated daily; if your working copy has not pulled in the changes you will need to Update your tree, which will take some time depending on the nature of the changes you need to pull in. (Note: you do not need to update your working copy in order to publish the site.)

The CMS's publish operation has been recoded to be based on svnmucc and is now much faster. If you think having massive trees is a problem for svn, investigate svn checkout --depth=immediates and svn update --parents --set-depth=infinity. You can checkout the base dir of a very large tree quickly and simply and modify/commit any files directly contained therein. Let the CMS's build system do its job and worry about managing the full site sources so you don't have to.

Running svn log on a file in the production tree gives too many results!

Don't do that. Run log against either the source site (consulting the cms:source-revision property to determine the associated revision) or the staging/ tree (consulting svn log -v on the production tree to determine the associated staging revision).

Running svn log on the production tree shows entries from staging.

svn log crosses copies, including the copy operation that publish is now implemented by. Run log one level up: on production/${project} rather than production/${project}/content. (The log entries won't be very interesting; consider running log on the staging tree instead.)