src/modules/file_layout/layout.py
author Rich Burridge <rich.burridge@sun.com>
Mon, 30 Nov 2009 13:01:40 -0800
changeset 1516 8c950a3b4171
parent 1452 bd6ffa78fed9
permissions -rw-r--r--
10485 move pkg(5) to Python 2.6
10482 upgrade to cherrypy 3.1.2
11836 shebang line for python modules should be python version-agnostic
11950 ldtp used by pkg build process not setup to easily use Python 2.6
11989 pkg python dependency analysis tests fail
1516
8c950a3b4171 10485 move pkg(5) to Python 2.6
Rich Burridge <rich.burridge@sun.com>
parents: 1452
#!/usr/bin/python
1452
bd6ffa78fed9 7960 client and depot need different organization of files
Brock Pytlik <bpytlik@sun.com>
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
# Copyright 2009 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.

"""object to map content hashes to file paths

The Layout class hierarchy encapsulates bijective mappings between a hash
(or file name since those are equivalent in our system) and a relative path
that describes where to place that file in the file system.  This bijective
relation should hold when the union of all layouts is considered as a single
set of mappings.  In practical terms, this means that only one layout may
potentially deposit a hash into any particular location.  This is not a
difficult requirement to satisfy since each layout may append a unique
identifier to the file name or choose to carve out its own namespace at some
level of directory hierarchy.

The V1Layout places each file into a single layer of 256 directories.  A
fanout of 256 provides good performance compared to the other layouts
tested.  It also allows over 8M files to be stored even with filesystems
which limit the number of files in a directory to 65k.

The V0Layout uses two layers of directories; the first has a fanout
of 256 while the second has a fanout of 16.7M.  This layout has the problem
that for the sizes of images (on the order of 300-500k files) and repos (on
the order of 1M files), each second-level directory usually contains a single
file.  This imposes a substantial penalty for removing or resyncing the
directories because a readdir(3C) must be done for each directory and
readdir is two orders of magnitude slower than the open or read ZFS
operations, and one order of magnitude slower than ZFS remove.  Reducing
the number of directories used to hold the downloaded files was a goal for
the next layout.

To evaluate a layout, it is necessary to measure the insertion time, the
removal time, and the time to open a random file.  The insertion time
affects the publication speed.  The removal time affects the time a client
may take to clear its download cache.  The access time affects how quickly
a server can open a file to serve it.  File sizes from 1 to 10M were used
to assess the scalability of the different layouts."""


import os

class Layout(object):
        """This class is the parent class to all layouts. It defines the
        interface which those subclasses must satisfy."""

        def lookup(self, hashval):
                """Return the path to the file with name "hashval"."""
                raise NotImplementedError

        def path_to_hash(self, path):
                """Return the hash which would map to "path"."""
                raise NotImplementedError

        def contains(self, rel_path, file_name):
                """Return whether this layout would place a file named
                "file_name" at "rel_path"."""
                return self.lookup(file_name) == rel_path


class V0Layout(Layout):
        """This class implements the original layout used.  It uses a 256-way
        split (2 hex digits) followed by a 16.7M-way split (6 hex digits)."""

        def lookup(self, hashval):
                """Return the path to the file with name "hashval"."""
                return os.path.join(hashval[0:2], hashval[2:8], hashval)

        def path_to_hash(self, path):
                """Return the hash which would map to "path"."""
                return os.path.basename(path)


class V1Layout(Layout):
        """This class implements the new layout approach which is a single
        256-way fanout using the first two hex digits of the hash."""

        def lookup(self, hashval):
                """Return the path to the file with name "hashval"."""
                return os.path.join(hashval[0:2], hashval)

        def path_to_hash(self, path):
                """Return the hash which would map to "path"."""
                return os.path.basename(path)


def get_default_layouts():
        """This function describes the default order in which to use the
        layouts defined above."""

        return [V1Layout(), V0Layout()]

def get_preferred_layout():
        """This function returns the single preferred layout to use."""

        return V1Layout()
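
The sketch below illustrates how the two layouts map one hash to different relative paths, and how a caller might try the layouts in the order given by get_default_layouts() to locate a file that was stored under the older layout. It is an illustration only, not code from this module: EXAMPLE_HASH, v0_lookup, v1_lookup, find_existing, and demo are hypothetical names, and the two lookup functions simply restate V0Layout.lookup and V1Layout.lookup so that the block runs on its own.

```python
import os
import shutil
import tempfile

# Arbitrary 40-character hex string, used only for illustration.
EXAMPLE_HASH = "0123456789abcdef0123456789abcdef01234567"

def v0_lookup(hashval):
        """Restates V0Layout.lookup: 2 hex digits, then 6, then the hash."""
        return os.path.join(hashval[0:2], hashval[2:8], hashval)

def v1_lookup(hashval):
        """Restates V1Layout.lookup: 2 hex digits, then the hash."""
        return os.path.join(hashval[0:2], hashval)

def find_existing(root, hashval, lookups=(v1_lookup, v0_lookup)):
        """Hypothetical helper: return the first path under "root" at
        which "hashval" exists, trying each layout in order, or None."""
        for lookup in lookups:
                path = os.path.join(root, lookup(hashval))
                if os.path.exists(path):
                        return path
        return None

def demo():
        """Store a file the way a V0 cache would, then find it again."""
        root = tempfile.mkdtemp()
        try:
                old_path = os.path.join(root, v0_lookup(EXAMPLE_HASH))
                os.makedirs(os.path.dirname(old_path))
                open(old_path, "w").close()
                # The V1 location does not exist, so the search falls
                # back to the V0 location.
                return find_existing(root, EXAMPLE_HASH) == old_path
        finally:
                shutil.rmtree(root)
```

In both layouts the final path component is the hash itself, which is why path_to_hash can be implemented with os.path.basename. Because V1 places the file directly under the two-digit directory while V0 places it one directory deeper, under a six-digit directory name, the two layouts can share a root without colliding.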