7140764 pkg.sysrepo should support p5p files
author Tim Foster <tim.s.foster@oracle.com>
Wed, 23 May 2012 09:49:43 +1200
changeset 2677 7f1c7dd5254f
parent 2676 4ea2dbd3337e
child 2678 5386f65ff099
doc/system_repository.txt
src/man/pkg.sysrepo.1m
src/modules/p5p.py
src/pkg/external_deps.txt
src/pkg/manifests/developer:opensolaris:pkg5.p5m
src/pkg/manifests/package:pkg.p5m
src/pkg/manifests/package:pkg:system-repository.p5m
src/setup.py
src/sysrepo.py
src/tests/api/t_p5p.py
src/tests/cli/t_sysrepo.py
src/util/apache2/sysrepo/README.txt
src/util/apache2/sysrepo/sysrepo_httpd.conf.mako
src/util/apache2/sysrepo/sysrepo_p5p.py
--- a/doc/system_repository.txt	Thu May 17 19:00:24 2012 +0100
+++ b/doc/system_repository.txt	Wed May 23 09:49:43 2012 +1200
@@ -46,21 +46,22 @@
 configuration and image configuration.  Currently, the only image configuration
 it contains is the publisher search order for the provided publishers, but other
 information may be added to the response as needed.  In addition to the basic
-collection of publisher information, the p5s file also contains a list of urls
+collection of publisher information, the p5s file also contains a list of URIs
 which the pkg client should proxy to via the system depot instead of contacting
-them directly.  When creating a p5s file, the urls for origins and mirrors can
-be transformed.  HTTPS urls are transformed to HTTP urls since the system depot
-will be doing the SSL communication, not the pkg client.  File urls are
-transformed into HTTP urls with a special format.  The urls contain the special
-token "<sysrepo>" which the p5s parser knows to replace with the url of the zone
-proxy client.  The rest of the url contains the prefix of the publisher, then
+them directly.  When creating a p5s file, the URIs for origins and mirrors can
+be transformed.  HTTPS URIs are transformed to HTTP URIs since the system depot
+will be doing the SSL communication, not the pkg client.  File URIs are
+transformed into HTTP URIs with a special format.  The URIs contain the special
+token "<sysrepo>" which the p5s parser knows to replace with the URI of the zone
+proxy client.  The rest of the URI contains the prefix of the publisher, then
 the sha1 hash of the global zone path to the file repository.
 
 The information for the syspub/0 response comes from the global zone's image's
-configuration.  The pkg/sysrepo service is responsible transforming the image
-configuration into an Apache configuration file and causing the system depot to
-reread its configuration.  The global zone pkg client restarts the pkg/sysrepo
-service whenever the image's publisher configuration changes.
+configuration.  The application/pkg/system-repository service is responsible for
+transforming the image configuration into an Apache configuration file and
+causing the system depot to reread its configuration.  The global zone
+pkg client restarts the application/pkg/system-repository service whenever the
+image's publisher configuration changes.
 
 The Apache configuration file, written by sysrepo.py using the Mako template
 sysrepo_httpd.conf.mako does two things:
@@ -72,10 +73,10 @@
  * adds a series of mod_rewrite RewriteRule and Alias directives to allow
    the Apache instance to gain access to configured file:// publishers.
 
-The Apache urls are accessible from the system repository configured at
+The Apache URIs accessible from the system repository configured at
 '<sysrepo>' for a given publisher '<publisher>' are below.
 
-We serve static responses to the following URLs, the content being either
+We serve static responses to the following URIs, the content being either
 a known versions file, or the p5s file mentioned above:
 
   http://<sysrepo>/versions/0/
@@ -84,8 +85,8 @@
 For access to file:// repositories, we use rewrite rules and Alias directives to
 access the file repository contents.  In order to allow repositories with
 different paths, yet prevent exposing those paths to system repository clients,
-we use an SHA-1 hash of the path, and include that in the URL (below, this hash
-is represented by "HASH")  The URLs we accept for file:// repositories are:
+we use an SHA-1 hash of the path, and include that in the URI (below, this hash
+is represented by "HASH").  The URIs we accept for file:// repositories are:
 
   http://<sysrepo>/<publisher>/HASH/file/1/<file hash>
   http://<sysrepo>/<publisher>/HASH/manifest/0/<package name@version>
@@ -93,3 +94,13 @@
   http://<sysrepo>/<publisher>/HASH/versions/0
 
 The system publisher also responds to 'OPTIONS * HTTP/1.0' requests.
+
+When we detect that a request to a given <publisher>/HASH refers to a GZ
+publisher within a .p5p archive, that request is rewritten so that a custom WSGI
+application, sysrepo_p5p.py, handles the request.  That application accepts URIs
+of the form:
+
+  http://<sysrepo>/wsgi_p5p?pub=<publisher>&hash=HASH&path=<path>
+
+where path is the remainder of the original system repository file:// URI, after
+the <publisher> and HASH components have been removed.
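Reviewer sketch, not part of the changeset: the mapping from a system repository file:// request to the wsgi_p5p form described above can be illustrated in a few lines of Python 3. The helper names `uri_hash` and `to_wsgi_p5p_uri` are hypothetical, and in the real system Apache rewrite rules perform this step rather than Python code. The hash follows what test_13_changing_p5p does (SHA-1 of the repository URI with trailing slashes stripped).

```python
import hashlib
from urllib.parse import parse_qs, urlencode, urlsplit


def uri_hash(uri):
    """SHA-1 hex digest of a repository URI, ignoring trailing slashes.

    Mirrors the computation used in the sysrepo tests; hypothetical helper.
    """
    return hashlib.sha1(uri.rstrip("/").encode("utf-8")).hexdigest()


def to_wsgi_p5p_uri(sysrepo, request_path):
    """Turn '/<publisher>/HASH/<path>' into the wsgi_p5p query form.

    Raises ValueError if the request path has fewer than three components.
    """
    pub, repo_hash, rest = request_path.lstrip("/").split("/", 2)
    query = urlencode({"pub": pub, "hash": repo_hash, "path": rest})
    return "http://%s/wsgi_p5p?%s" % (sysrepo, query)


if __name__ == "__main__":
    h = uri_hash("file:///export/repo/archive.p5p")
    print(to_wsgi_p5p_uri("localhost:1008", "/test1/%s/versions/0" % h))
```

Note that `urlencode` percent-encodes the slashes in the path component; whether the actual rewrite rules escape the path this way is not shown in this diff.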
--- a/src/man/pkg.sysrepo.1m	Thu May 17 19:00:24 2012 +0100
+++ b/src/man/pkg.sysrepo.1m	Wed May 23 09:49:43 2012 +1200
@@ -23,7 +23,7 @@
 The system repository is primarily used in the global zone to allow non-global zones to access the repositories configured in the global zone. The SMF services \fBsvc:/application/pkg/zones-proxyd\fR and \fBsvc:/application/pkg/zones-proxy-client\fR are responsible for providing the transport between non-global zones and the global zone. This transport is only used by \fBpkg\fR(5).
 .sp
 .LP
-Note that only http, https, and v4 file repositories are supported. p5p-based file repositories or older file repository formats are not supported. See \fBpkgrepo\fR(1) for more information about repository versions.
+Note that only http, https, p5p and v4 file repositories are supported. Older file repository formats are not supported. See \fBpkgrepo\fR(1) for more information about repository versions.
 .SH OPTIONS
 .sp
 .LP
--- a/src/modules/p5p.py	Thu May 17 19:00:24 2012 +0100
+++ b/src/modules/p5p.py	Wed May 23 09:49:43 2012 +1200
@@ -35,6 +35,7 @@
 import pkg.client.api_errors as apx
 import pkg.client.publisher
 import pkg.fmri
+import pkg.manifest
 import pkg.misc
 import pkg.portable
 import pkg.p5i
@@ -293,7 +294,7 @@
         CURRENT_VERSION = 0
         COMPATIBLE_VERSIONS = (0,)
 
-        def __init__(self, pathname, mode="r"):
+        def __init__(self, pathname, mode="r", archive_index=None):
                 """'pathname' is the absolute path of the archive file to create
                 or read from.
 
@@ -301,6 +302,11 @@
                 opened for reading or writing, which is indicated by 'r' and 'w'
                 respectively.  An archive opened for writing may not be used for
                 any extraction operations, and must not already exist.
+
+                'archive_index', if supplied, is the dictionary returned by
+                self.get_index(), allowing multiple Archive objects to be open,
+                sharing the same index object, for efficient use of memory.
+                Using an existing archive_index requires mode='r'.
                 """
 
                 assert os.path.isabs(pathname)
@@ -323,6 +329,8 @@
                 if "w" in mode:
                         # Don't allow overwrite of existing archive.
                         assert not os.path.exists(self.__arc_name)
+                        # Ensure we're not sharing an index object.
+                        assert not archive_index
 
                 try:
                         self.__arc_file = open(self.__arc_name, arc_mode,
@@ -363,6 +371,14 @@
                                 # Archive is empty.
                                 raise InvalidArchive(self.__arc_name)
 
+                        # If we have an archive_index use that and return
+                        # immediately.  We assume that the caller has obtained
+                        # the index from an existing Archive object,
+                        # and will have validated the version of that archive.
+                        if archive_index:
+                                self.__extract_offsets = archive_index
+                                return
+
                         if not member.name.startswith(self.__idx_pfx) or \
                             not member.name.endswith(self.__idx_sfx):
                                 return
@@ -402,9 +418,10 @@
 
                         # Load archive index.
                         try:
-                                self.__index = ArchiveIndex(idxfn, mode="r",
-                                    version=self.__idx_ver)
-                                for name, offset in self.__index.offsets():
+                                self.__index = ArchiveIndex(idxfn,
+                                    mode="r", version=self.__idx_ver)
+                                for name, offset in \
+                                    self.__index.offsets():
                                         self.__extract_offsets[name] = \
                                             index_offset + offset
                         except InvalidArchiveIndex:
@@ -1008,8 +1025,13 @@
                         self.__arc_file.seek(offset)
                         tfile.offset = offset
 
-                        # Get the tarinfo object needed to extract the file.
-                        member = tf.TarInfo.fromtarfile(tfile)
+                        try:
+                                # Get the tarinfo object needed to extract the
+                                # file.
+                                member = tf.TarInfo.fromtarfile(tfile)
+                        except tf.TarError:
+                                # Read error encountered.
+                                raise InvalidArchive(self.__arc_name)
                 elif self.__extract_offsets:
                         # Assume there is no such archive member if extract
                         # offsets are known, but the item can't be found.
@@ -1024,6 +1046,17 @@
                 except KeyError:
                         raise UnknownArchiveFiles(self.__arc_name, [src])
 
+        def get_index(self):
+                """Returns the index, and extract_offsets from an Archive
+                opened in read-only mode, allowing additional Archive objects
+                to reuse the index, in a memory-efficient manner."""
+                assert not self.__closed and "r" in self.__mode
+                if not self.__extract_offsets:
+                        # If the extraction index doesn't exist, scan the
+                        # complete archive and build one.
+                        self.__find_extract_offsets()
+                return self.__extract_offsets
+
         def get_package_file(self, fhash, pub=None):
                 """Returns the first package file matching the given hash as a
                 file-like object. The file-like object is read-only and provides
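Reviewer sketch, not part of the changeset: the idea behind `archive_index` and `get_index()` is that the name-to-offset table is built once and then handed to further readers of the same archive. The Python 3 example below shows the pattern with the standard `tarfile` module only. `Reader`, `build_tar` and `build_index` are hypothetical stand-ins and do not reproduce the pkg.p5p API or its on-disk index format.

```python
import io
import tarfile


def build_tar(members):
    """Create an in-memory tar archive from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def build_index(raw):
    """Scan the whole archive once, recording where each member's data lives."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r") as tf:
        return {m.name: (m.offset_data, m.size) for m in tf.getmembers()}


class Reader:
    """Read-only view of an archive that may reuse an existing index."""

    def __init__(self, raw, index=None):
        self._raw = raw
        # Reuse the caller's index when given; otherwise pay for a full scan.
        self._index = index if index is not None else build_index(raw)

    def get_index(self):
        return self._index

    def read(self, name):
        offset, size = self._index[name]
        return self._raw[offset:offset + size]


if __name__ == "__main__":
    raw = build_tar({"a": b"hello", "b": b"world"})
    first = Reader(raw)
    second = Reader(raw, index=first.get_index())  # no second scan
    print(second.read("b"))
```

As in the changeset, a reader given a shared index trusts it; a stale or corrupted index only shows up when a member is actually read.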
--- a/src/pkg/external_deps.txt	Thu May 17 19:00:24 2012 +0100
+++ b/src/pkg/external_deps.txt	Wed May 23 09:49:43 2012 +1200
@@ -43,4 +43,5 @@
     pkg:/text/locale
     pkg:/text/tidy
     pkg:/web/server/apache-22
+    pkg:/web/server/apache-22/module/apache-wsgi-26
     pkg:/web/wget
--- a/src/pkg/manifests/developer:opensolaris:pkg5.p5m	Thu May 17 19:00:24 2012 +0100
+++ b/src/pkg/manifests/developer:opensolaris:pkg5.p5m	Wed May 23 09:49:43 2012 +1200
@@ -50,3 +50,4 @@
 depend type=require fmri=pkg:/text/locale
 depend type=require fmri=pkg:/text/tidy
 depend type=require fmri=pkg:/web/server/apache-22
+depend type=require fmri=pkg:/web/server/apache-22/module/apache-wsgi-26
--- a/src/pkg/manifests/package:pkg.p5m	Thu May 17 19:00:24 2012 +0100
+++ b/src/pkg/manifests/package:pkg.p5m	Wed May 23 09:49:43 2012 +1200
@@ -34,9 +34,11 @@
 dir  path=$(PYDIR)
 dir  path=$(PYDIRVP)
 dir  path=$(PYDIRVP)/pkg
+dir  path=$(PYDIRVP)/pkg/64
 file path=$(PYDIRVP)/pkg-0.1-py2.6.egg-info
 file path=$(PYDIRVP)/pkg/__init__.py
 file path=$(PYDIRVP)/pkg/_varcet.so
+file path=$(PYDIRVP)/pkg/64/_varcet.so
 dir  path=$(PYDIRVP)/pkg/actions
 file path=$(PYDIRVP)/pkg/actions/__init__.py
 file path=$(PYDIRVP)/pkg/actions/_actions.so
@@ -55,6 +57,12 @@
 file path=$(PYDIRVP)/pkg/actions/signature.py
 file path=$(PYDIRVP)/pkg/actions/unknown.py
 file path=$(PYDIRVP)/pkg/actions/user.py
+# We ship a 64-bit version of _actions.so because it's
+# needed by the sysrepo_p5p which is a mod_wsgi application
+# that runs in a 64-bit apache instance.
+dir  path=$(PYDIRVP)/pkg/actions/64
+file path=$(PYDIRVP)/pkg/actions/64/_actions.so
+file path=$(PYDIRVP)/pkg/actions/64/_common.so
 file path=$(PYDIRVP)/pkg/altroot.py
 file path=$(PYDIRVP)/pkg/api_common.py
 file path=$(PYDIRVP)/pkg/arch.so
--- a/src/pkg/manifests/package:pkg:system-repository.p5m	Thu May 17 19:00:24 2012 +0100
+++ b/src/pkg/manifests/package:pkg:system-repository.p5m	Wed May 23 09:49:43 2012 +1200
@@ -18,7 +18,7 @@
 #
 # CDDL HEADER END
 #
-# Copyright (c) 2010, 2011, Oracle and/or its affiliates. All rights reserved.
+# Copyright (c) 2010, 2012, Oracle and/or its affiliates. All rights reserved.
 #
 set name=pkg.fmri value=pkg:/package/pkg/system-repository@$(PKGVERS)
 set name=pkg.summary value="IPS System Repository"
@@ -32,6 +32,7 @@
 dir  path=etc/pkg/sysrepo
 file path=etc/pkg/sysrepo/sysrepo_httpd.conf.mako
 file path=etc/pkg/sysrepo/sysrepo_publisher_response.mako
+file path=etc/pkg/sysrepo/sysrepo_p5p.py pkg.tmp.autopyc=false
 dir  path=lib
 dir  path=lib/svc
 dir  path=lib/svc/manifest
@@ -61,3 +62,5 @@
 # The manual dependency on apache results from our calling apachectl from
 # our method script, and can't be detected by pkgdepend.
 depend type=require fmri=web/server/apache-22
+# p5p support in the system repository requires mod_wsgi
+depend type=require fmri=web/server/apache-22/module/apache-wsgi-26
--- a/src/setup.py	Thu May 17 19:00:24 2012 +0100
+++ b/src/setup.py	Wed May 23 09:49:43 2012 +1200
@@ -335,6 +335,7 @@
         'util/publish/transforms/smf-manifests'
         ]
 sysrepo_files = [
+        'util/apache2/sysrepo/sysrepo_p5p.py',
         'util/apache2/sysrepo/sysrepo_httpd.conf.mako',
         'util/apache2/sysrepo/sysrepo_publisher_response.mako',
         ]
@@ -893,6 +894,8 @@
                 output_dir = os.path.join(cwd, os.path.dirname(output_filename))
                 output_filename = os.path.basename(output_filename)
                 nargs = args[:2] + (output_filename,) + args[3:]
+                if not os.path.exists(output_dir):
+                        os.mkdir(output_dir, 0755)
                 os.chdir(output_dir)
 
                 UnixCCompiler.link(self, *nargs, **kwargs)
@@ -914,9 +917,46 @@
 
         def initialize_options(self):
                 _build_ext.initialize_options(self)
+                self.build64 = False
+
                 if osname == 'sunos':
                         self.compiler = 'myunix'
 
+        def build_extension(self, ext):
+                # Build 32-bit
+                log.info("building 32-bit extension")
+                _build_ext.build_extension(self, ext)
+
+                # Set up for 64-bit
+                old_build_temp = self.build_temp
+                d, f = os.path.split(self.build_temp)
+
+                # store our 64-bit extensions elsewhere
+                self.build_temp = d + "/temp64.%s" % \
+                    os.path.basename(self.build_temp).replace("temp.", "")
+                ext.extra_compile_args += ["-m64"]
+                ext.extra_link_args += ["-m64"]
+                self.build64 = True
+
+                # Build 64-bit
+                log.info("building 64-bit extension")
+                _build_ext.build_extension(self, ext)
+
+                # Reset to 32-bit
+                self.build_temp = old_build_temp
+                ext.extra_compile_args.remove("-m64")
+                ext.extra_link_args.remove("-m64")
+                self.build64 = False
+
+        def get_ext_fullpath(self, ext_name):
+                path = _build_ext.get_ext_fullpath(self, ext_name)
+                if not self.build64:
+                        return path
+
+                dpath, fpath = os.path.split(path)
+                return os.path.join(dpath, "64", fpath)
+
+
 class build_py_func(_build_py):
 
         def __init__(self, dist):
@@ -1378,3 +1418,19 @@
     ext_package = 'pkg',
     ext_modules = ext_modules,
     )
+
+# We don't support 64-bit yet, but 64-bit _actions.so, _common.so and _varcet.so
+# are needed for a system repository mod_wsgi application, sysrepo_p5p.py.
+# Remove the others.
+remove_libs = ["arch.so",
+    "elf.so",
+    "pspawn.so",
+    "solver.so",
+    "syscallat.so"
+]
+pkg_64_path = os.path.join(root_dir, "usr/lib/python2.6/vendor-packages/pkg/64")
+for lib in remove_libs:
+        rm_path = os.path.join(pkg_64_path, lib)
+        if os.path.exists(rm_path):
+                log.info("Removing unnecessary 64-bit library: %s" % lib)
+                os.unlink(rm_path)
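Reviewer sketch, not part of the changeset: the two path transformations used by the 64-bit build pass above can be checked in isolation. The function names below are hypothetical; the logic is copied from `build_extension` and `get_ext_fullpath`, using `posixpath` so the result does not depend on the host platform. The example directory names are assumptions.

```python
import posixpath


def temp64_dir(build_temp):
    """Derive the 64-bit temp dir: 'build/temp.X' becomes 'build/temp64.X'."""
    d, _ = posixpath.split(build_temp)
    return d + "/temp64.%s" % posixpath.basename(build_temp).replace(
        "temp.", "")


def ext_fullpath_64(path):
    """Place a 64-bit extension in a '64' subdirectory next to the 32-bit one."""
    dpath, fpath = posixpath.split(path)
    return posixpath.join(dpath, "64", fpath)


if __name__ == "__main__":
    print(temp64_dir("build/temp.solaris-2.11-i86pc-2.6"))
    print(ext_fullpath_64("build/lib/pkg/actions/_actions.so"))
```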
--- a/src/sysrepo.py	Thu May 17 19:00:24 2012 +0100
+++ b/src/sysrepo.py	Wed May 23 09:49:43 2012 +1200
@@ -48,6 +48,7 @@
 import pkg.client.api_errors as apx
 import pkg.misc as misc
 import pkg.portable as portable
+import pkg.p5p as p5p
 
 logger = global_settings.logger
 orig_cwd = None
@@ -252,26 +253,30 @@
                     http_timeout)
 
                 for uri in uri_list:
-                        # we don't support p5p archives, only directory-based
-                        # repositories.  We also don't support file repositories
-                        # of < version 4.
+                        # we only support p5p files and directory-based
+                        # repositories of >= version 4.
                         if uri.startswith("file:"):
                                 urlresult = urllib2.urlparse.urlparse(uri)
                                 if not os.path.exists(urlresult.path):
                                         raise SysrepoException(
                                             _("file repository %s does not "
                                             "exist or is not accessible") % uri)
-                                if not os.path.isdir(urlresult.path):
-                                        raise SysrepoException(
-                                            _("p5p-based file repository %s "
-                                            "cannot be proxied.") % uri)
-                                if not os.path.exists(os.path.join(
+                                if os.path.isdir(urlresult.path) and \
+                                    not os.path.exists(os.path.join(
                                     urlresult.path, "pkg5.repository")):
                                         raise SysrepoException(
                                             _("file repository %s cannot be "
                                             "proxied. Only file "
                                             "repositories of version 4 or "
                                             "later are supported.") % uri)
+                                if not os.path.isdir(urlresult.path):
+                                        try:
+                                                p5p.Archive(urlresult.path)
+                                        except p5p.InvalidArchive:
+                                                raise SysrepoException(
+                                                    _("unable to read p5p "
+                                                    "archive file at %s") %
+                                                    urlresult.path)
 
                         hash = _uri_hash(uri)
                         cert = repo_uri.ssl_cert
@@ -368,13 +373,19 @@
 
                 httpd_conf_template_path = os.path.join(template_dir,
                     SYSREPO_HTTP_TEMPLATE)
+
+                # we're disabling unicode here because we want Mako to
+                # pass through any filesystem path names, whatever the
+                # original encoding.
                 httpd_conf_template = Template(
-                    filename=httpd_conf_template_path)
+                    filename=httpd_conf_template_path,
+                    disable_unicode=True)
 
                 # our template expects cache size expressed in Kb
                 httpd_conf_text = httpd_conf_template.render(
                     sysrepo_log_dir=log_dir,
                     sysrepo_runtime_dir=runtime_dir,
+                    sysrepo_template_dir=template_dir,
                     uri_pub_map=uri_pub_map,
                     ipv6_addr="::1",
                     host=host,
@@ -385,7 +396,7 @@
                     https_proxy=https_proxy)
                 httpd_conf_path = os.path.join(runtime_dir,
                     SYSREPO_HTTP_FILENAME)
-                httpd_conf_file = file(httpd_conf_path, "w")
+                httpd_conf_file = file(httpd_conf_path, "wb")
                 httpd_conf_file.write(httpd_conf_text)
                 httpd_conf_file.close()
         except socket.gaierror, err:
@@ -436,10 +447,10 @@
                 # build a version of our uri_pub_map, keyed by publisher
                 pub_uri_map = {}
                 for uri in uri_pub_map:
-                        for (pub, key, cert, hash) in uri_pub_map[uri]:
+                        for (pub, cert, key, hash) in uri_pub_map[uri]:
                                 if pub not in pub_uri_map:
                                         pub_uri_map[pub] = []
-                                pub_uri_map[pub].append((uri, key, cert, hash))
+                                pub_uri_map[pub].append((uri, cert, key, hash))
 
                 publisher_template_path = os.path.join(template_dir,
                     SYSREPO_PUB_TEMPLATE)
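Reviewer sketch, not part of the changeset: the reordered checks for file: URIs in sysrepo.py amount to a three-way decision. The Python 3 example below restates that decision on its own. `SysrepoError` and `classify_file_uri` are hypothetical names, and the sketch treats any non-directory path as an archive candidate, whereas the real code goes on to open it with `p5p.Archive` and rejects it on `InvalidArchive`.

```python
import os
import tempfile
from urllib.parse import urlparse


class SysrepoError(Exception):
    """Hypothetical stand-in for SysrepoException."""


def classify_file_uri(uri):
    """Return 'directory' for a v4 repository or 'archive' for a plain file."""
    path = urlparse(uri).path
    if not os.path.exists(path):
        raise SysrepoError(
            "file repository %s does not exist or is not accessible" % uri)
    if os.path.isdir(path):
        # Directory repositories must carry the v4 marker file.
        if not os.path.exists(os.path.join(path, "pkg5.repository")):
            raise SysrepoError(
                "file repository %s cannot be proxied" % uri)
        return "directory"
    # Anything else is assumed to be a p5p archive; validation is omitted here.
    return "archive"


if __name__ == "__main__":
    repo = tempfile.mkdtemp()
    open(os.path.join(repo, "pkg5.repository"), "w").close()
    print(classify_file_uri("file://" + repo))
```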
--- a/src/tests/api/t_p5p.py	Thu May 17 19:00:24 2012 +0100
+++ b/src/tests/api/t_p5p.py	Wed May 23 09:49:43 2012 +1200
@@ -21,7 +21,7 @@
 #
 
 #
-# Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved.
+# Copyright (c) 2011, 2012, Oracle and/or its affiliates. All rights reserved.
 #
 
 import testutils
@@ -486,11 +486,13 @@
                 dm.set_content(pathname=target, signatures=True)
                 self.assertEqualDiff(sm.signatures, dm.signatures)
 
-        def __verify_extract(self, repo, arc_path, hashes, ext_dir):
+        def __verify_extract(self, repo, arc_path, hashes, ext_dir,
+            archive_index=None):
                 """Helper method to test extraction and retrieval functionality.
                 """
 
-                arc = pkg.p5p.Archive(arc_path, mode="r")
+                arc = pkg.p5p.Archive(arc_path, mode="r",
+                    archive_index=archive_index)
 
                 #
                 # Verify behaviour of extract_package_manifest().
@@ -539,7 +541,8 @@
                 # Verify behaviour of extract_package_files().
                 #
                 arc.close()
-                arc = pkg.p5p.Archive(arc_path, mode="r")
+                arc = pkg.p5p.Archive(arc_path, mode="r",
+                    archive_index=archive_index)
                 shutil.rmtree(ext_dir)
 
                 # Test unknown hashes.
@@ -571,7 +574,8 @@
                 # Verify behaviour of extract_to().
                 #
                 arc.close()
-                arc = pkg.p5p.Archive(arc_path, mode="r")
+                arc = pkg.p5p.Archive(arc_path, mode="r",
+                    archive_index=archive_index)
                 shutil.rmtree(ext_dir)
 
                 # Test unknown file.
@@ -626,7 +630,8 @@
                 # Verify behaviour of get_file().
                 #
                 arc.close()
-                arc = pkg.p5p.Archive(arc_path, mode="r")
+                arc = pkg.p5p.Archive(arc_path, mode="r",
+                    archive_index=archive_index)
 
                 # Test behaviour for non-existent file.
                 self.assertRaisesStringify(pkg.p5p.UnknownArchiveFiles,
@@ -643,7 +648,8 @@
                 # Verify behaviour of get_package_file().
                 #
                 arc.close()
-                arc = pkg.p5p.Archive(arc_path, mode="r")
+                arc = pkg.p5p.Archive(arc_path, mode="r",
+                    archive_index=archive_index)
 
                 # Test behaviour when specifying publisher.
                 nullf = open(os.devnull, "wb")
@@ -664,7 +670,8 @@
                 # Verify behaviour of get_package_manifest().
                 #
                 arc.close()
-                arc = pkg.p5p.Archive(arc_path, mode="r")
+                arc = pkg.p5p.Archive(arc_path, mode="r",
+                    archive_index=archive_index)
 
                 # Test bad FMRI.
                 self.assertRaises(pkg.fmri.IllegalFmri,
@@ -689,7 +696,8 @@
                 # Verify behaviour of extract_catalog1().
                 #
                 arc.close()
-                arc = pkg.p5p.Archive(arc_path, mode="r")
+                arc = pkg.p5p.Archive(arc_path, mode="r",
+                    archive_index=archive_index)
                 ext_tmp_dir = tempfile.mkdtemp(dir=self.test_root)
                 def verify_catalog(pub, pfmris):
                         for pname in ("catalog.attrs", "catalog.base.C",
@@ -815,7 +823,7 @@
                 arc = ptf.PkgTarFile(name=arc_path, mode="r")
                 arc.extractall(ext_dir)
                 arc.close()
-       
+
                 # Now verify archive can still be used when index file
                 # is omitted.
                 os.unlink(arc_path)
@@ -831,6 +839,30 @@
                 arc = self.__verify_extract(repo, arc_path, hashes, ext_dir)
                 arc.close()
 
+                # Save an index for later.
+                arc = pkg.p5p.Archive(arc_path, mode="r")
+                saved_index = arc.get_index()
+                arc.close()
+
+                # Verify we can extract the archive reusing an index.
+                arc = self.__verify_extract(repo, arc_path, hashes, ext_dir,
+                    archive_index=saved_index)
+                arc.close()
+
+                # Verify we assert when sharing an index in write mode.
+                self.assertRaisesStringify(AssertionError, pkg.p5p.Archive,
+                    arc_path, mode="w", archive_index=saved_index)
+
+                # Verify we can't extract archive members using a corrupted
+                # index.
+                arc = pkg.p5p.Archive(arc_path, mode="r",
+                    archive_index={"cats": 1234L})
+                self.assertRaisesStringify(pkg.p5p.ArchiveErrors,
+                    arc.extract_catalog1, "catalog.attrs", ext_dir)
+                self.assertRaisesStringify(pkg.p5p.ArchiveErrors,
+                    arc.extract_package_files, hashes, ext_dir)
+                arc.close()
+
         def test_05_invalid(self):
                 """Verify that pkg(5) archive class handles broken archives
                 and items that aren't archives as expected."""
@@ -983,6 +1015,14 @@
                     pkg.p5p.Archive, arc_path, mode="r")
                 os.unlink(arc_path)
 
+        def test_06_get_index(self):
+                """Verify we can't retrieve an index from an archive opened
+                in write-mode."""
+                arc_path = os.path.join(self.test_root, "index.p5p")
+                arc = pkg.p5p.Archive(arc_path, mode="w")
+                self.assertRaisesStringify(AssertionError, arc.get_index)
+                arc.close()
+                os.unlink(arc_path)
 
 if __name__ == "__main__":
         unittest.main()
--- a/src/tests/cli/t_sysrepo.py	Thu May 17 19:00:24 2012 +0100
+++ b/src/tests/cli/t_sysrepo.py	Wed May 23 09:49:43 2012 +1200
@@ -1,4 +1,5 @@
 #!/usr/bin/python
+# -*- coding: utf-8 -*-
 #
 # CDDL HEADER START
 #
@@ -29,8 +30,11 @@
 
 import errno
 import hashlib
+import imp
 import os
 import os.path
+import pkg.p5p
+import shutil
 import unittest
 import urllib2
 import shutil
@@ -207,6 +211,11 @@
             add file tmp/sample_file mode=0444 owner=root group=bin path=/usr/bin/sample
             close"""
 
+        new_pkg = """
+            open [email protected],5.11-0
+            add file tmp/sample_file mode=0444 owner=root group=bin path=/usr/bin/new
+            close"""
+
         misc_files = ["tmp/sample_file"]
 
         def setUp(self):
@@ -385,7 +394,8 @@
                 self.sc.stop()
 
         def test_8_file_publisher(self):
-                """A proxied file publisher works as a normal file publisher."""
+                """A proxied file publisher works as a normal file publisher,
+                including package archives"""
                 #
                 # The standard system publisher client code does not use the
                 # "publisher/0" response, so we need this test to exercise that.
@@ -397,9 +407,16 @@
                 urlresult = urllib2.urlparse.urlparse(self.rurl1)
                 symlink_path = os.path.join(self.test_root, "repo_symlink")
                 os.symlink(urlresult.path, symlink_path)
-                symlinked_url="file://%s" % symlink_path
+                symlinked_url = "file://%s" % symlink_path
 
-                for file_url in [self.rurl1, symlinked_url]:
+                # create a p5p archive
+                p5p_path = os.path.join(self.test_root,
+                    "test_8_file_publisher_archive.p5p")
+                p5p_url = "file://%s" % p5p_path
+                self.pkgrecv(server_url=self.durl1, command="-a -d %s sample" %
+                    p5p_path)
+
+                for file_url in [self.rurl1, symlinked_url, p5p_url]:
                         self.image_create(prefix="test1", repourl=self.durl1)
                         self.pkg("set-publisher -g %s test1" % file_url)
                         self.sysrepo("")
@@ -411,23 +428,20 @@
                         self.pkg_image_create(prefix="test1", repourl=url)
                         self.pkg("install sample")
                         self.pkg("contents -rm sample")
-                        # the sysrepo doesn't support search operations for file repos
+                        # the sysrepo doesn't support search ops for file repos
                         self.pkg("search -r sample", exit=1)
                         self.sc.stop()
 
         def test_9_unsupported_publishers(self):
-                """Ensure we fail when asked to proxy p5p or < v4 file repos"""
+                """Ensure we fail when asked to proxy < v4 file repos"""
 
                 v3_repo_root = os.path.join(self.test_root, "sysrepo_test_9")
                 os.mkdir(v3_repo_root)
                 v3_repo_path = os.path.join(v3_repo_root, "repo")
-                p5a_path = os.path.join(v3_repo_root, "archive.p5p")
-                self.pkgrecv(server_url=self.durl1, command="-a -d %s sample" %
-                    p5a_path)
 
                 self.pkgrepo("create --version 3 %s" % v3_repo_path)
                 self.pkgrepo("-s %s set publisher/prefix=foo" % v3_repo_path)
-                for path in [p5a_path, v3_repo_path]:
+                for path in [v3_repo_path]:
                         self.image_create(repourl="file://%s" % path)
                         self.sysrepo("-R %s" % self.img_path(), exit=1)
 
@@ -519,5 +533,319 @@
 
                 self.sc.stop()
 
+        def test_13_changing_p5p(self):
+                """Ensure that when a p5p file changes from beneath us, or
+                disappears, the system repository and any pkg(5) clients
+                react correctly."""
+
+                # create a p5p archive
+                p5p_path = os.path.join(self.test_root,
+                    "test_12_changing_p5p_archive.p5p")
+                p5p_url = "file://%s" % p5p_path
+                self.pkgrecv(server_url=self.durl1, command="-a -d %s sample" %
+                    p5p_path)
+
+                # configure an image from which to generate a sysrepo config
+                self.image_create(prefix="test1", repourl=self.durl1)
+                self.pkg("set-publisher -g %s test1" % p5p_url)
+                self.sysrepo("")
+                self._start_sysrepo()
+
+                # create an image which uses the system publisher
+                hash = hashlib.sha1(p5p_url.rstrip("/")).hexdigest()
+                url = "http://localhost:%(port)s/test1/%(hash)s/" % \
+                    {"port": self.sysrepo_port, "hash": hash}
+
+                self.debug("using %s as repo url" % url)
+                self.pkg_image_create(prefix="test1", repourl=url)
+                self.pkg("install sample")
+
+                # modify the p5p file - publish a new package and an
+                # update of the existing package, then recreate the p5p file.
+                self.pkgsend_bulk(self.durl1, self.new_pkg)
+                self.pkgsend_bulk(self.durl1, self.sample_pkg)
+                os.unlink(p5p_path)
+                self.pkgrecv(server_url=self.durl1,
+                    command="-a -d %s sample new" % p5p_path)
+
+                # ensure we can install our new packages through the system
+                # publisher url
+                self.pkg("install new")
+                self.pkg("publisher")
+
+                # remove the p5p file, which should still allow us to uninstall
+                renamed_p5p_path = p5p_path + ".renamed"
+                os.rename(p5p_path, renamed_p5p_path)
+                self.pkg("uninstall new")
+
+                # ensure we can't install the packages or perform operations
+                # that require the p5p file to be present
+                self.pkg("install new", exit=1)
+                self.pkg("contents -rm new", exit=1)
+
+                # replace the p5p file, and ensure the client can install again
+                os.rename(renamed_p5p_path, p5p_path)
+                self.pkg("install new")
+                self.pkg("contents -rm new")
+
+                self.sc.stop()
+
+        def test_13_bad_input(self):
+                """Tests the system repository with some bad input: wrong
+                paths, unicode in urls, and some very long urls to ensure
+                the responses are as expected."""
+                # create a p5p archive
+                p5p_path = os.path.join(self.test_root,
+                    "test_13_bad_input.p5p")
+                p5p_url = "file://%s" % p5p_path
+                self.pkgrecv(server_url=self.durl1, command="-a -d %s sample" %
+                    p5p_path)
+                p5p_hash = hashlib.sha1(p5p_url.rstrip("/")).hexdigest()
+                file_url = self.dcs[2].get_repo_url()
+                file_hash = hashlib.sha1(file_url.rstrip("/")).hexdigest()
+
+                # configure an image from which to generate a sysrepo config
+                self.image_create(prefix="test1", repourl=self.durl1)
+
+                self.pkg("set-publisher -p %s" % file_url)
+                self.pkg("set-publisher -g %s test1" % p5p_url)
+                self.sysrepo("")
+                self._start_sysrepo()
+
+                # some incorrect urls
+                queries_404 = [
+                    "noodles",
+                    "/versions/1",
+                    "/"
+                ]
+
+                # a place to store some long urls
+                queries_414 = []
+
+                # add urls and some unicode.  We test a file repository,
+                # which makes sure Apache can deal with the URLs appropriately,
+                # as well as a p5p repository, exercising our mod_wsgi app.
+                for pub, hsh in [("test1", p5p_hash), ("test2", file_hash)]:
+                        queries_404.append("%s/%s/catalog/1/ΰŇﺇ⊂⏣⊅ℇ" %
+                            (pub, hsh))
+                        queries_404.append("%s/%s/catalog/1/%s" %
+                            (pub, hsh, "f" + "u" * 1000))
+                        queries_414.append("%s/%s/catalog/1/%s" %
+                            (pub, hsh, "f" * 900000 + "u"))
+
+                def test_response(part, code):
+                        """Given a url substring and an expected error code,
+                        check that the system repository returns that code
+                        for a url constructed from that part."""
+                        url = "http://localhost:%s/%s" % \
+                            (self.sysrepo_port, part)
+                        try:
+                                resp = urllib2.urlopen(url, None, None)
+                        except urllib2.HTTPError, e:
+                                if e.code != code:
+                                        self.assert_(False,
+                                            "url %s returned: %s" % (url, e))
+
+                for url_part in queries_404:
+                        test_response(url_part, 404)
+                for url_part in queries_414:
+                        test_response(url_part, 414)
+                self.sc.stop()
+
+        def test_14_unicode(self):
+                """Tests the system repository with some unicode paths to p5p
+                files."""
+                unicode_str = "ΰŇﺇ⊂⏣⊅ℇ"
+                unicode_dir = os.path.join(self.test_root, unicode_str)
+                os.mkdir(unicode_dir)
+
+                # create paths to p5p files, using unicode dir or file names
+                p5p_unicode_dir = os.path.join(unicode_dir,
+                    "test_14_unicode.p5p")
+                p5p_unicode_file = os.path.join(self.test_root,
+                    "%s.p5p" % unicode_str)
+
+                for p5p_path in [p5p_unicode_dir, p5p_unicode_file]:
+                        p5p_url = "file://%s" % p5p_path
+                        self.pkgrecv(server_url=self.durl1,
+                            command="-a -d %s sample" % p5p_path)
+                        p5p_hash = hashlib.sha1(p5p_url.rstrip("/")).hexdigest()
+
+                        self.image_create()
+                        self.pkg("set-publisher -p %s" % p5p_url)
+
+                        self.sysrepo("")
+                        self._start_sysrepo()
+
+                        # ensure we can get content from the p5p file
+                        for path in ["catalog/1/catalog.attrs",
+                            "catalog/1/catalog.base.C",
+                            "file/1/f5da841b7c3601be5629bb8aef928437de7d534e"]:
+                                url = "http://localhost:%s/test1/%s/%s" % \
+                                    (self.sysrepo_port, p5p_hash, path)
+                                resp = urllib2.urlopen(url, None, None)
+                                self.debug(resp.readlines())
+
+                        self.sc.stop()
+
+class TestP5pWsgi(pkg5unittest.SingleDepotTestCase):
+        """A class to directly exercise the p5p mod_wsgi application outside
+        of Apache and the system repository itself.
+
+        By calling the web application directly, we have a little more
+        flexibility when writing tests.  Other system-repository tests will
+        exercise much of the mod_wsgi configuration and framework, but these
+        tests will be easier to debug and faster to run.
+
+        Note that since we call the web application directly, the web app can
+        intentionally emit some tracebacks to stderr, which will be seen by
+        the test framework."""
+
+        persistent_setup = False
+
+        sample_pkg = """
+            open [email protected],5.11-0
+            add file tmp/sample_file mode=0444 owner=root group=bin path=/usr/bin/sample
+            close"""
+
+        new_pkg = """
+            open [email protected],5.11-0
+            add file tmp/sample_file mode=0444 owner=root group=bin path=/usr/bin/new
+            close"""
+
+        misc_files = { "tmp/sample_file": "carrots" }
+
+        def setUp(self):
+                pkg5unittest.SingleDepotTestCase.setUp(self, start_depot=True)
+                self.image_create()
+
+                # we have to dynamically load the mod_wsgi webapp, since it
+                # lives outside our normal search path
+                mod_name = "sysrepo_p5p"
+                src_name = "%s.py" % mod_name
+                sysrepo_p5p_file = file(os.path.join(self.template_dir,
+                    src_name))
+                self.sysrepo_p5p = imp.load_module(mod_name, sysrepo_p5p_file,
+                    src_name, ("py", "r", imp.PY_SOURCE))
+
+                # now create a simple p5p file that we can use in our tests
+                self.make_misc_files(self.misc_files)
+                self.pkgsend_bulk(self.durl, self.sample_pkg)
+                self.pkgsend_bulk(self.durl, self.new_pkg)
+
+                self.p5p_path = os.path.join(self.test_root,
+                    "mod_wsgi_archive.p5p")
+
+                self.pkgrecv(server_url=self.durl,
+                    command="-a -d %s sample new" % self.p5p_path)
+                self.http_status = ""
+
+        def test_queries(self):
+                """Ensure that we return proper HTTP response codes."""
+
+                def start_response(status, response_headers, exc_info=None):
+                        """A dummy response function, used to capture output"""
+                        self.http_status = status
+
+                environ = {}
+                hsh = "123abcdef"
+                environ["SYSREPO_RUNTIME_DIR"] = self.test_root
+                environ["PKG5_TEST_ENV"] = "True"
+                environ[hsh] = self.p5p_path
+
+                def test_query_responses(queries, code, expect_content=False):
+                        """Given a list of queries, and a string we expect to
+                        appear in each response, invoke the wsgi application
+                        with each query and check response codes.  Also check
+                        that content was returned or not."""
+
+                        for query in queries:
+                                seen_content = False
+                                environ["QUERY_STRING"] = urllib2.unquote(query)
+                                self.http_status = ""
+                                for item in self.sysrepo_p5p.application(
+                                    environ, start_response):
+                                        seen_content = item
+
+                                self.assert_(code in self.http_status,
+                                    "Query %s response did not contain %s: %s" %
+                                    (query, code, self.http_status))
+                                if expect_content:
+                                        self.assert_(seen_content,
+                                            "No content returned for %s" %
+                                            query)
+                                else:
+                                        self.assertFalse(seen_content,
+                                            "Unexpected content for %s" % query)
+
+                # the easiest way to get the name of one of the manifests
+                # in the archive is to look for it in the index
+                archive = pkg.p5p.Archive(self.p5p_path)
+                idx = archive.get_index()
+                mf = None
+                for item in idx.keys():
+                        if item.startswith("publisher/test/pkg/new/"):
+                                mf = item.replace(
+                                    "publisher/test/pkg/new/", "new@")
+                archive.close()
+
+                queries_200 = [
+                    # valid file, matches the hash of the content in misc_files
+                    "pub=test&hash=%s&path=file/1/f890d49474e943dc07a766c21d2bf35d6e527e89" % hsh,
+                    # valid catalog parts
+                    "pub=test&hash=%s&path=catalog/1/catalog.attrs" % hsh,
+                    "pub=test&hash=%s&path=catalog/1/catalog.base.C" % hsh,
+                    # valid manifest
+                    "pub=test&hash=%s&path=manifest/0/%s" % (hsh, mf)
+                ]
+
+                queries_404 = [
+                    # wrong path
+                    "pub=test&hash=%s&path=catalog/1/catalog.attrsX" % hsh,
+                    # invalid publisher
+                    "pub=WRONG&hash=%s&path=catalog/1/catalog.attrs" % hsh,
+                    # incorrect path
+                    "pub=test&hash=%s&path=file/1/12u3yt123123" % hsh,
+                    # incorrect path (where the first path component is unknown)
+                    "pub=test&hash=%s&path=carrots/1/12u3yt123123" % hsh,
+                    # incorrect manifest, with an unknown package name
+                    "pub=test&hash=%s&path=manifest/0/foo%s" % (hsh, mf),
+                    # incorrect manifest, with an illegal FMRI
+                    "pub=test&hash=%s&path=manifest/0/%sfoo" % (hsh, mf)
+                ]
+
+                queries_400 = [
+                    # missing publisher (while p5p files can return content
+                    # despite no publisher, our mod_wsgi app requires a
+                    # publisher)
+                    "hash=%s&path=catalog/1/catalog.attrs" % hsh,
+                    # missing path
+                    "pub=test&hash=%s" % hsh,
+                    # malformed query
+                    "&&???&&&",
+                    # no hash key
+                    "pub=test&hashX=%s&path=catalog/1/catalog.attrs" % hsh,
+                    # unknown hash value
+                    "pub=test&hash=carrots&path=catalog/1/catalog.attrs"
+                ]
+
+                test_query_responses(queries_200, "200", expect_content=True)
+                test_query_responses(queries_400, "400")
+                test_query_responses(queries_404, "404")
+
+                # generally we try to shield users from internal server errors,
+                # however in the case of a missing p5p file on the server
+                # this seems like the right thing to do, rather than to return
+                # a 404.
+                # For the pkg client, the end result of a 500 or a 404 code is
+                # the same, but the former will result in more useful
+                # information in the system-repository error_log.
+                os.unlink(self.p5p_path)
+                queries_500 = queries_200 + queries_404
+                test_query_responses(queries_500, "500")
+                # despite the missing p5p file, we should still get 400 errors
+                test_query_responses(queries_400, "400")
+
+
 if __name__ == "__main__":
         unittest.main()
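
The TestP5pWsgi tests above call the WSGI callable directly and capture the status passed to start_response. A minimal sketch of that technique follows, written for Python 3 rather than the Python 2.6 used in this changeset. toy_application and call_app are hypothetical names for illustration only; toy_application is a stand-in and does not reproduce the behaviour of sysrepo_p5p.application.

```python
def toy_application(environ, start_response):
    """Hypothetical stand-in for a WSGI app; not the real sysrepo_p5p code."""
    headers = [("content-type", "application/binary")]
    # Assumption for this sketch: a query without a path is a bad request.
    if "path=" not in environ.get("QUERY_STRING", ""):
        start_response("400 Bad Request", headers)
        return []
    start_response("200 OK", headers)
    return [b"carrots"]


def call_app(app, query):
    """Invoke a WSGI callable directly, capturing its status and body."""
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        # A dummy start_response that only records the status line.
        captured["status"] = status

    body = b"".join(app({"QUERY_STRING": query}, start_response))
    return captured["status"], body


ok_status, ok_body = call_app(toy_application, "pub=test&path=file/1/abc")
bad_status, bad_body = call_app(toy_application, "pub=test")
```

Because no HTTP server is involved, a failing query can be debugged with an ordinary Python traceback instead of the Apache error_log.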
--- a/src/util/apache2/sysrepo/README.txt	Thu May 17 19:00:24 2012 +0100
+++ b/src/util/apache2/sysrepo/README.txt	Wed May 23 09:49:43 2012 +1200
@@ -33,3 +33,7 @@
 					reference_httpd.conf file in this
 					directory.
 
+./sysrepo_p5p.py			A WSGI application, used to serve
+                                        the contents of .p5p archives to
+                                        system_repository clients.
+
--- a/src/util/apache2/sysrepo/sysrepo_httpd.conf.mako	Thu May 17 19:00:24 2012 +0100
+++ b/src/util/apache2/sysrepo/sysrepo_httpd.conf.mako	Wed May 23 09:49:43 2012 +1200
@@ -4,7 +4,9 @@
 # file.
 #
 </%doc>
-<%      context.write("""
+<%
+      import os.path
+      context.write("""
 #
 # This is an automatically generated file for the IPS system publisher, and
 # should not be modified directly.  Changes made to this file will be
@@ -58,6 +60,18 @@
 LoadModule alias_module libexec/64/mod_alias.so
 LoadModule rewrite_module libexec/64/mod_rewrite.so
 
+LoadModule env_module libexec/64/mod_env.so
+LoadModule wsgi_module libexec/64/mod_wsgi-2.6.so
+# We only alias a specific script, not all files in ${sysrepo_template_dir}
+WSGIScriptAlias /wsgi_p5p ${sysrepo_template_dir}/sysrepo_p5p.py
+WSGIDaemonProcess sysrepo processes=1 threads=21 user=pkg5srv group=pkg5srv display-name=pkg5_sysrepo inactivity-timeout=120
+WSGIProcessGroup sysrepo
+WSGISocketPrefix ${sysrepo_runtime_dir}/wsgi
+# don't accept requests over 100k
+LimitRequestBody 102400
+# ensure our wsgi application can get its runtime directory
+SetEnv SYSREPO_RUNTIME_DIR ${sysrepo_runtime_dir}
+
 #
 # If you wish httpd to run as a different user or group, you must run
 # httpd as root initially and it will switch.
@@ -159,6 +173,14 @@
 
 </Directory>
 
+# Allow access to wsgi scripts under ${sysrepo_template_dir}
+<Directory ${sysrepo_template_dir}>
+    SetHandler wsgi-script
+    WSGIProcessGroup sysrepo
+    Options ExecCGI
+    Allow from 127.0.0.1
+</Directory>
+
 #
 # DirectoryIndex: sets the file that Apache will serve if a directory
 # is requested.
@@ -338,12 +360,27 @@
                         # publisher-specific publisher/0, response, then stop.
                         </%doc>
 <%
+                        # File and p5p-based repositories get our static
+                        # versions and publisher responses
                         context.write("RewriteRule ^/%(pub)s/%(hash)s/versions/0 "
                             "/versions/0/index.html [L,NE]\n" % locals())
                         context.write("RewriteRule ^/%(pub)s/%(hash)s/publisher/0 "
-                            "/%(pub)s/%(hash)s/publisher/0/index.html [L,NE]" % locals())
+                            "/%(pub)s/%(hash)s/publisher/0/index.html [L,NE]\n" % locals())
+                        # A p5p archive repository
+                        if os.path.isfile(uri.replace("file:", "")):
+
+                                repo_path = "/%s" % uri.replace("file:", "").lstrip("/")
+                                context.write("# %s %s\n" % (uri, hash))
+                                # We 'passthrough' (PT), letting our
+                                # WSGIScriptAlias pick up the request from here.
+                                context.write("RewriteRule /%(pub)s/%(hash)s/(.*) "
+                                    "/wsgi_p5p?pub=%(pub)s&hash=%(hash)s&path=$1 [NE,PT]\n" %
+                                    locals())
+                                context.write("SetEnv %(hash)s %(repo_path)s\n" %
+                                    locals())
+                                continue
 %><%doc>
-
+                        # We have a file-based repository
                         # Modify the catalog and manifest URLs, then
                         # 'passthrough' (PT), letting the Alias below rewrite
                         # the URL instead.
@@ -401,34 +438,35 @@
         % endfor uri
 % endfor pub
 
-# any non-file-based repositories get our local versions and syspub responses
-RewriteRule ^.*/versions/0/?$ - [L]
-RewriteRule ^.*/syspub/0/?$ - [L]
-# allow for 'OPTIONS * HTTP/1.0' requests
-RewriteCond %{REQUEST_METHOD} OPTIONS [NC]
-RewriteRule \* - [L] 
-# catch all, denying everything
-RewriteRule ^.*$ - [R=404]
-
 % for uri in reversed(sorted(uri_pub_map.keys())):
         % for pub, cert_path, key_path, hash in uri_pub_map[uri]:
                 <%doc>
                 # Create an alias for the file repository under ${pub}
                 </%doc>
-                % if uri.startswith("file:"):
-                        <% repo_path = uri.replace("file:", "") %>
-# a file repository alias to serve ${uri} content.
-<Directory "${repo_path}">
-    AllowOverride None
-    Order allow,deny
-    Allow from 127.0.0.1
-</Directory>
-                                % if cache_dir != None:
+                % if uri.startswith("file:") and os.path.isdir(uri.replace("file:", "")):
+<%
+                      repo_path = "/%s" % uri.replace("file:", "").lstrip("/")
+                      context.write("# a file repository alias to serve %(uri)s content.\n"
+                          "<Directory \"%(repo_path)s\">\n"
+                          "    AllowOverride None\n"
+                          "    Order allow,deny\n"
+                          "    Allow from 127.0.0.1\n"
+                          "</Directory>\n" % locals())
+%>
+                      % if cache_dir != None:
 CacheDisable /${pub}/${hash}/publisher/0
 CacheDisable /${pub}/${hash}/versions/0
-                                % endif
+                      % endif
 Alias /${pub}/${hash} ${repo_path}
                 % endif
         % endfor uri
 % endfor pub
 
+# any non-file-based repositories get our local versions and syspub responses
+RewriteRule ^.*/versions/0/?$ - [L]
+RewriteRule ^.*/syspub/0/?$ - [L]
+# allow for 'OPTIONS * HTTP/1.0' requests
+RewriteCond %{REQUEST_METHOD} OPTIONS [NC]
+RewriteRule \* - [L]
+# catch all, denying everything
+RewriteRule ^.*$ - [R=404]
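
For p5p-backed publishers, the template above emits a RewriteRule that turns /<pub>/<hash>/<path> into a /wsgi_p5p query string, where <hash> is the SHA-1 of the repository URI as computed in the tests. The following is a rough Python 3 sketch of that mapping, not the Apache implementation. repo_hash and rewrite_p5p_request are hypothetical helper names, the archive location is invented, and re.escape is an addition of this sketch (the generated rule interpolates pub and hash unescaped).

```python
import hashlib
import re


def repo_hash(uri):
    """SHA-1 hex digest of the repository URI, ignoring any trailing slash."""
    return hashlib.sha1(uri.rstrip("/").encode("utf-8")).hexdigest()


def rewrite_p5p_request(pub, hsh, request_path):
    """Approximate the generated RewriteRule for one p5p-backed publisher.

    Returns the rewritten /wsgi_p5p target, or None when the rule would
    not match the request path.
    """
    # Like the generated rule, the pattern is not anchored at the start.
    pattern = "/%s/%s/(.*)" % (re.escape(pub), re.escape(hsh))
    match = re.search(pattern, request_path)
    if match is None:
        return None
    return "/wsgi_p5p?pub=%s&hash=%s&path=%s" % (pub, hsh, match.group(1))


example_uri = "file:///tmp/archive.p5p"  # hypothetical archive location
example_hash = repo_hash(example_uri)
target = rewrite_p5p_request(
    "test1", example_hash,
    "/test1/%s/catalog/1/catalog.attrs" % example_hash)
```

The WSGI application then looks up the hash in its environment (set by the matching SetEnv line) to find the archive path on disk.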
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/src/util/apache2/sysrepo/sysrepo_p5p.py	Wed May 23 09:49:43 2012 +1200
@@ -0,0 +1,452 @@
+#!/usr/bin/python2.6
+#
+# CDDL HEADER START
+#
+# The contents of this file are subject to the terms of the
+# Common Development and Distribution License (the "License").
+# You may not use this file except in compliance with the License.
+#
+# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
+# or http://www.opensolaris.org/os/licensing.
+# See the License for the specific language governing permissions
+# and limitations under the License.
+#
+# When distributing Covered Code, include this CDDL HEADER in each
+# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
+# If applicable, add the following below this CDDL HEADER, with the
+# fields enclosed by brackets "[]" replaced with your own identifying
+# information: Portions Copyright [yyyy] [name of copyright owner]
+#
+# CDDL HEADER END
+#
+# Copyright (c) 2012, Oracle and/or its affiliates. All rights reserved.
+
+import pkg.p5p
+
+import httplib
+import os
+import shutil
+import simplejson
+import sys
+import threading
+import traceback
+
+# redirecting stdout for proper WSGI portability
+sys.stdout = sys.stderr
+
+SERVER_OK_STATUS = "%s %s" % (httplib.OK, httplib.responses[httplib.OK])
+SERVER_ERROR_STATUS = "%s %s" % (httplib.INTERNAL_SERVER_ERROR,
+    httplib.responses[httplib.INTERNAL_SERVER_ERROR])
+SERVER_NOTFOUND_STATUS = "%s %s" % (httplib.NOT_FOUND,
+    httplib.responses[httplib.NOT_FOUND])
+SERVER_BADREQUEST_STATUS = "%s %s" % (httplib.BAD_REQUEST,
+    httplib.responses[httplib.BAD_REQUEST])
+
+response_headers = [("content-type", "application/binary")]
+
+p5p_indices = {}
+
+# A lock to prevent two threads from rebuilding our catalog parts cache
+# at the same time.
+p5p_update_lock = threading.Lock()
+
+class UnknownPathException(Exception):
+        """An exception thrown when a client requests a path within a p5p file
+        which does not exist."""
+        def __init__(self, path):
+                self.path = path
+
+        def __str__(self):
+                return "Unknown path: %s" % self.path
+
+
+class MalformedQueryException(Exception):
+        """An exception thrown when this wsgi application cannot parse a query
+        from the client."""
+        def __init__(self, query, reason):
+                self.query = query
+                self.reason = reason
+
+        def __str__(self):
+                return "Malformed query %s: %s" % (self.query, self.reason)
+
+
+class MissingArchiveException(Exception):
+        """An exception thrown when the p5p file referred to by the
+        configuration does not exist."""
+        def __init__(self, path):
+                self.path = path
+
+        def __str__(self):
+                return "Missing p5p archive: %s" % (self.path)
+
+
+class SysrepoP5p(object):
+        """An object to handle a request for p5p file contents from the
+        system repository."""
+
+        def __init__(self, environ, start_response):
+                self.environ = environ
+                self.start_response = start_response
+                self.p5p_path = None
+                self.p5p = None
+
+                self.query = self.environ["QUERY_STRING"]
+                self.runtime_dir = self.environ["SYSREPO_RUNTIME_DIR"]
+
+        def close(self):
+                """Release any resources we have used."""
+                if self.p5p:
+                        self.p5p.close()
+
+        def log_exception(self, status=SERVER_ERROR_STATUS):
+                """Print some information in the Apache log that will help
+                determine what went wrong as well as updating the client
+                response code.  The WSGI spec says we can call
+                start_response multiple times, but must include exc_info
+                if we do so."""
+
+                # we only want error_log output if our status is not 4xx
+                if status != SERVER_NOTFOUND_STATUS and \
+                    status != SERVER_BADREQUEST_STATUS:
+                        print traceback.format_exc()
+                self.start_response(status, response_headers,
+                    sys.exc_info())
+
+        def need_update(self, pub, hsh):
+                """Determine if we need to update our cached catalog and
+                reload the index by comparing the last modification time of a
+                file we create per p5p archive, and the p5p archive itself."""
+
+                htdocs_path = os.path.join(self.runtime_dir, "htdocs")
+                timestamp_path = \
+                    "%(htdocs_path)s/%(pub)s/%(hsh)s/sysrepo.timestamp" % \
+                    locals()
+
+                update = False
+
+                # Locking here is quite basic: we want to ensure that no two
+                # threads simultaneously decide that they need to rebuild our
+                # local catalog cache, stepping on each other's toes.  It is
+                # possible that while processing a single query, a user will
+                # replace the p5p file on the server after this method has been
+                # called, causing stale data to be returned at best, and an HTTP
+                # 500 response at worst (as the p5p index used by this web
+                # application will not match the one in the new archive)
+                p5p_update_lock.acquire()
+                try:
+                        # don't write a timestamp if we're testing
+                        if self.environ.get("PKG5_TEST_ENV") == "True":
+                                return True
+
+                        try:
+                                st_p5p = os.stat(self.p5p_path)
+                        except OSError, e:
+                                if e.errno == os.errno.ENOENT:
+                                        raise MissingArchiveException(
+                                            self.p5p_path)
+                        try:
+                                st_ts = os.stat(timestamp_path)
+                                if st_ts.st_mtime < st_p5p.st_mtime:
+                                        open(timestamp_path, "wb").close()
+                                        update = True
+                        except OSError, e:
+                                if e.errno == os.errno.ENOENT:
+                                        open(timestamp_path, "wb").close()
+                                        update = True
+
+                except MissingArchiveException, e:
+                        raise
+                except Exception, e:
+                        self.log_exception()
+                finally:
+                        p5p_update_lock.release()
+                return update
+
+        def _file_response(self, path, pub):
+                """Process our file query."""
+
+                # use the basename of the path, which is the pkg(5) hash
+                self.start_response(SERVER_OK_STATUS, response_headers)
+                try:
+                        return self.p5p.get_package_file(os.path.basename(path),
+                            pub=pub)
+                except pkg.p5p.UnknownArchiveFiles, e:
+                        self.log_exception(status=SERVER_NOTFOUND_STATUS)
+                except Exception, e:
+                        self.log_exception()
+
+        def _catalog_response(self, path, pub, hsh):
+                """Process our catalog query"""
+
+                cat_part = os.path.basename(path)
+                htdocs_path = os.path.join(self.runtime_dir, "htdocs")
+                cat_path = \
+                    "%(htdocs_path)s/%(pub)s/%(hsh)s/catalog/1/%(cat_part)s" % \
+                    locals()
+                self.start_response(SERVER_OK_STATUS, response_headers)
+                if os.path.exists(cat_path):
+                        return open(cat_path, "rb")
+
+                # this is unlikely to happen: it implies a catalog part has been
+                # requested that wasn't listed in the catalog.attrs file
+                # extracted during _precache_catalog() or the file has been
+                # removed on the server.  Do our best to return the content.
+                try:
+                        cat_dir = os.path.dirname(cat_path)
+                        p5p_update_lock.acquire()
+                        try:
+                                if not os.path.exists(cat_dir):
+                                        os.makedirs(cat_dir, 0755)
+                                self.p5p.extract_catalog1(cat_part, cat_dir,
+                                    pub=pub)
+                                return open(cat_path, "rb")
+                        except (pkg.p5p.UnknownArchiveFiles, IOError), e:
+                                self.log_exception(
+                                    status=SERVER_NOTFOUND_STATUS)
+                        except Exception, e:
+                                self.log_exception()
+                        finally:
+                                p5p_update_lock.release()
+                except OSError, e:
+                        if e.errno == os.errno.ENOENT:
+                                return open(cat_path, "rb")
+                        else:
+                                raise
+
+        def _manifest_response(self, path, pub):
+                """Return our manifest response."""
+
+                pkg_name = path.replace("manifest/0/", "")
+                fmri = "pkg://%s/%s" % (pub, pkg_name)
+                mf = None
+                self.start_response(SERVER_OK_STATUS, response_headers)
+                try:
+                        mf = self.p5p.get_package_manifest(fmri, raw=True)
+                        return mf
+                except pkg.p5p.UnknownPackageManifest, e:
+                        self.log_exception(status=SERVER_NOTFOUND_STATUS)
+                except pkg.fmri.IllegalFmri, e:
+                        self.log_exception(status=SERVER_NOTFOUND_STATUS)
+                except Exception, e:
+                        self.log_exception()
+
+        def _precache_catalog(self, pub, hsh):
+                """Extract this publisher's catalog parts to our htdocs cache."""
+
+                htdocs_path = os.path.join(self.runtime_dir, "htdocs")
+                cat_dir = "%(htdocs_path)s/%(pub)s/%(hsh)s/catalog/1" % \
+                    locals()
+
+                if os.path.exists(cat_dir):
+                        shutil.rmtree(cat_dir)
+
+                os.makedirs(cat_dir)
+                try:
+                        self.p5p.extract_catalog1("catalog.attrs", cat_dir,
+                            pub=pub)
+                        with open(os.path.join(cat_dir, "catalog.attrs"),
+                            "rb") as catalog_attrs:
+                                json = simplejson.load(catalog_attrs)
+                                for part in json["parts"]:
+                                        self.p5p.extract_catalog1(part, cat_dir,
+                                            pub=pub)
+
+                except pkg.p5p.UnknownArchiveFiles, e:
+                        # if the catalog part is unavailable,
+                        # we ignore this for now.  It will be
+                        # reported later anyway.
+                        pass
+
+        def _parse_query(self):
+                """Parse our query, returning publisher, hash, and path
+                values."""
+
+                keyvals = self.query.split("&")
+                attrs = {}
+                for keyval in keyvals:
+                        try:
+                                key, val = keyval.split("=", 1)
+                                attrs[key] = val
+                        except ValueError:
+                                raise MalformedQueryException(self.query,
+                                    "missing key=value pair for %s." % keyval)
+
+                pub = attrs.get("pub")
+                hsh = attrs.get("hash")
+                path = attrs.get("path")
+
+                if not hsh:
+                        raise MalformedQueryException(self.query,
+                            "missing hash.")
+                if hsh not in self.environ:
+                        raise MalformedQueryException(self.query,
+                            "unknown hash %s." % hsh)
+                if not pub:
+                        raise MalformedQueryException(self.query,
+                            "missing publisher.")
+                if not path:
+                        raise MalformedQueryException(self.query,
+                            "missing path.")
+                return pub, hsh, path
+
+        def execute(self):
+                """Process a query of the form:
+
+                pub=<publisher>&hash=<hash>&path=<path>
+
+                where:
+                    <publisher>    the name of the publisher from the p5p file
+                    <hash>         the sha1 hash of the location of the p5p file
+                    <path>         the path of the pkg(5) client request
+
+                In the environment of this WSGI application, apart from the
+                default WSGI values, defined in PEP333, we expect:
+
+                "SYSREPO_RUNTIME_DIR", a location pointing to the runtime
+                directory, allowing us to serve static html from beneath a
+                "htdocs" subdir.
+
+                <hash>, a key named after the sha1 hash of the p5p archive
+                path, whose value is the path itself, which is not visible to
+                clients.
+                """
+
+                buf = []
+                try:
+                        pub, hsh, path = self._parse_query()
+                        self.p5p_path = self.environ[hsh]
+                        # In order to keep only one copy of the p5p index in
+                        # memory, we cache it locally, and reuse it any time
+                        # we're opening the same p5p file.  Before doing
+                        # so, we need to ensure the p5p file hasn't been
+                        # modified since we last looked at it.
+                        if self.need_update(pub, hsh) or \
+                            self.p5p_path not in p5p_indices:
+                                p5p_update_lock.acquire()
+                                try:
+                                        self.p5p = pkg.p5p.Archive(
+                                            self.p5p_path)
+                                        p5p_indices[self.p5p_path] = \
+                                            self.p5p.get_index()
+                                        self._precache_catalog(pub, hsh)
+                                finally:
+                                        p5p_update_lock.release()
+                        else:
+                                self.p5p = pkg.p5p.Archive(self.p5p_path,
+                                    archive_index=p5p_indices[self.p5p_path])
+
+                        if path.startswith("file"):
+                                buf = self._file_response(path, pub)
+                        elif path.startswith("catalog/1/"):
+                                buf = self._catalog_response(path, pub, hsh)
+                        elif path.startswith("manifest/0"):
+                                buf = self._manifest_response(path, pub)
+                        else:
+                                raise UnknownPathException(path)
+                except OSError, e:
+                        if e.errno == os.errno.ENOENT:
+                                self.log_exception(
+                                    status=SERVER_NOTFOUND_STATUS)
+                        else:
+                                self.log_exception()
+                except UnknownPathException, e:
+                        self.log_exception(status=SERVER_NOTFOUND_STATUS)
+                except MalformedQueryException, e:
+                        self.log_exception(status=SERVER_BADREQUEST_STATUS)
+                except MissingArchiveException, e:
+                        self.log_exception()
+                except Exception, e:
+                        self.log_exception()
+                return buf
+
+
+#
+# CloseGenerator, AppWrapper and _application together form an idiom
+# described at
+# http://code.google.com/p/modwsgi/wiki/RegisteringCleanupCode
+# They exist to ensure that we close any server-side resources used by
+# our application at the end of the request (i.e. after the client has
+# received it).
+#
+
+def _application(environ, start_response):
+        sysrepo = SysrepoP5p(environ, start_response)
+        result = sysrepo.execute()
+        return result, sysrepo
+
+
+class CloseGenerator(object):
+        """A wrapper class to ensure we have a close() method on the iterable
+        returned from the mod_wsgi application, see PEP333."""
+
+        def __init__(self, iterable, closeable):
+                self.__iterable = iterable
+                self.__closeable = closeable
+
+        def __iter__(self):
+                # if we haven't produced an iterable, that's
+                # likely because of an exception. Do nothing.
+                if not self.__iterable:
+                        return
+                for item in self.__iterable:
+                        yield item
+
+        def close(self):
+                try:
+                        if hasattr(self.__iterable, "close"):
+                                self.__iterable.close()
+                finally:
+                        self.__closeable.close()
+
+
+class AppWrapper(object):
+        """Wrap a callable application with this class in order for its results
+        to be handled by CloseGenerator when that callable is called."""
+
+        def __init__(self, application):
+                self.__application = application
+
+        def __call__(self, environ, start_response):
+                result, closeable = self.__application(environ, start_response)
+                return CloseGenerator(result, closeable)
+
+
+application = AppWrapper(_application)
+
+if __name__ == "__main__":
+        """A simple main function that allows us to test any given query/env"""
+        import urllib
+
+        def start_response(status, response_headers, exc_info=None):
+                """A dummy response function."""
+                print "responding with %s" % status
+                if exc_info:
+                        print "".join(traceback.format_exception(*exc_info))
+
+        if len(sys.argv) != 3:
+                query = \
+                ("'pub=test&hash=de5acae11333890c457665379eec812a67f78dd3"
+                "&path=manifest/0/[email protected]%2C5.11-1%3A20110617T204846Z'")
+                alias = \
+                "de5acae11333890c457665379eec812a67f78dd3=/tmp/archive.p5p"
+                print "usage: sysrepo_p5p <query> <hash>=<path to p5p file>"
+                print "eg: ./sysrepo_p5p.py %s %s" % (query, alias)
+                sys.exit(2)
+
+        environ = {}
+
+        # unquote the url, so that we can easily copy/paste entries from
+        # Apache logs when testing.
+        environ["QUERY_STRING"] = urllib.unquote(sys.argv[1])
+        environ["SYSREPO_RUNTIME_DIR"] = os.environ["PWD"]
+        environ["PKG5_TEST_ENV"] = "True"
+        hsh, path = sys.argv[2].split("=", 1)
+        environ[hsh] = path
+
+        for response in application(environ, start_response):
+                if isinstance(response, basestring):
+                        print response.rstrip()
+                elif response:
+                        for line in response.readlines():
+                                print line.rstrip()
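
The query handling in execute() and _parse_query() above can be tried in isolation. The following is a minimal standalone sketch, not part of the changeset: parse_query and known_hashes are hypothetical names, ValueError stands in for MalformedQueryException, and a plain dict stands in for the WSGI environ that maps each hash to its p5p archive path.

```python
# Standalone sketch of the pub/hash/path query parsing; not part of the patch.
# parse_query and known_hashes are hypothetical names, and ValueError stands
# in for the patch's MalformedQueryException.

def parse_query(query, known_hashes):
    """Return (pub, hsh, path) from a 'pub=..&hash=..&path=..' query."""
    attrs = {}
    for keyval in query.split("&"):
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, val = keyval.partition("=")
        if not sep:
            raise ValueError("missing key=value pair for %s." % keyval)
        attrs[key] = val

    pub = attrs.get("pub")
    hsh = attrs.get("hash")
    path = attrs.get("path")

    # Same order of checks as the patch: hash, known hash, publisher, path.
    if not hsh:
        raise ValueError("missing hash.")
    if hsh not in known_hashes:
        raise ValueError("unknown hash %s." % hsh)
    if not pub:
        raise ValueError("missing publisher.")
    if not path:
        raise ValueError("missing path.")
    return pub, hsh, path


# The hash key maps to the archive path, as the WSGI environ does in the patch.
env = {"de5acae11333890c457665379eec812a67f78dd3": "/tmp/archive.p5p"}
result = parse_query(
    "pub=test&hash=de5acae11333890c457665379eec812a67f78dd3"
    "&path=catalog/1/catalog.attrs", env)
```

Because the hash is only accepted when it is already a key in the environment, a client cannot name an arbitrary file on the server; it can only select among the archives the administrator configured.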