author | Rich Burridge <rich.burridge@sun.com> |
Mon, 30 Nov 2009 13:01:40 -0800 | |
changeset 1516 | 8c950a3b4171 |
parent 1452 | bd6ffa78fed9 |
permissions | -rw-r--r-- |
1516
8c950a3b4171
10485 move pkg(5) to Python 2.6
Rich Burridge <rich.burridge@sun.com>
parents:
1452
#!/usr/bin/python
1452
bd6ffa78fed9
7960 client and depot need different organization of files
Brock Pytlik <bpytlik@sun.com>
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
# Copyright 2009 Sun Microsystems, Inc. All rights reserved.
# Use is subject to license terms.

"""object to map content hashes to file paths

The Layout class hierarchy encapsulates bijective mappings between a hash
(or file name since those are equivalent in our system) and a relative path
that describes where to place that file in the file system. This bijective
relation should hold when the union of all layouts is considered as a single
set of mappings. In practical terms, this means that only one layout may
potentially deposit a hash into any particular location. This is not a
difficult requirement to satisfy since each layout may append a unique
identifier to the file name or choose to carve out its own namespace at some
level of directory hierarchy.

The V1Layout places each file into a single layer of 256 directories. A
fanout of 256 provides good performance compared to the other layouts
tested. It also allows over 8M files to be stored even with filesystems
which limit the number of files in a directory to 65k.

The V0Layout uses two layers of directories; the first has a fanout
of 256 while the second has a fanout of 16M. This layout has the problem
that for the sizes of images (on the order of 300-500k files) and repos (on
the order of 1M files), the second directory level usually contains a single
file. This imposes a substantial penalty for removing or resyncing the
directories because a readdir(3C) must be done for each directory and
readdir is two orders of magnitude slower than the open or read ZFS
operations, and one order of magnitude slower than ZFS remove. Reducing
the number of directories used to hold the downloaded files was a goal for
the next layout.

To evaluate a layout, it is necessary to measure the insertion time, the
removal time, and the time to open a random file. The insertion time
affects the publication speed. The removal time affects the time a client
may take to clear its download cache. The access time affects how quickly
a server can open a file to serve it. File sizes from 1 to 10M were used
to assess the scalability of the different layouts."""


import os

class Layout(object):
    """This class is the parent class to all layouts. It defines the
    interface which those subclasses must satisfy."""

    def lookup(self, hashval):
        """Return the path to the file with name "hashval"."""
        raise NotImplementedError

    def path_to_hash(self, path):
        """Return the hash which would map to "path"."""
        raise NotImplementedError

    def contains(self, rel_path, file_name):
        """Returns whether this layout would place a file named
        "file_name" at "rel_path"."""
        return self.lookup(file_name) == rel_path


class V0Layout(Layout):
    """This class implements the original layout used. It uses a 256 way
    split (2 hex digits) followed by a 16.7M way split (6 hex digits)."""

    def lookup(self, hashval):
        """Return the path to the file with name "hashval"."""
        return os.path.join(hashval[0:2], hashval[2:8], hashval)

    def path_to_hash(self, path):
        """Return the hash which would map to "path"."""
        return os.path.basename(path)


class V1Layout(Layout):
    """This class implements the new layout approach which is a single 256
    way fanout using the first two digits of the hash."""

    def lookup(self, hashval):
        """Return the path to the file with name "hashval"."""
        return os.path.join(hashval[0:2], hashval)

    def path_to_hash(self, path):
        """Return the hash which would map to "path"."""
        return os.path.basename(path)


def get_default_layouts():
    """This function describes the default order in which to use the
    layouts defined above."""

    return [V1Layout(), V0Layout()]

def get_preferred_layout():
    """This function returns the single preferred layout to use."""

    return V1Layout()
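
The module docstring says that only one layout may deposit a hash at any particular location. The sketch below is one way to see that property for the two layouts in this file. The classes are restated in compact form so that the example runs on its own; `find_owning_layouts` and `EXAMPLE_HASH` are hypothetical names introduced for illustration and are not part of the module.

```python
import os


class Layout(object):
    """Compact restatement of the base class from the listing."""

    def lookup(self, hashval):
        raise NotImplementedError

    def path_to_hash(self, path):
        raise NotImplementedError

    def contains(self, rel_path, file_name):
        return self.lookup(file_name) == rel_path


class V0Layout(Layout):
    """Two directory levels: 2 hex digits, then the next 6 hex digits."""

    def lookup(self, hashval):
        return os.path.join(hashval[0:2], hashval[2:8], hashval)

    def path_to_hash(self, path):
        return os.path.basename(path)


class V1Layout(Layout):
    """One directory level keyed on the first 2 hex digits."""

    def lookup(self, hashval):
        return os.path.join(hashval[0:2], hashval)

    def path_to_hash(self, path):
        return os.path.basename(path)


def find_owning_layouts(rel_path, layouts):
    """Hypothetical helper: list the layouts that would place a file at rel_path."""
    owners = []
    for layout in layouts:
        # Recover the hash from the path, then ask whether this layout
        # would map that hash back to exactly the same relative path.
        hashval = layout.path_to_hash(rel_path)
        if layout.contains(rel_path, hashval):
            owners.append(layout)
    return owners


# Made-up 40-character hex digest, used only for illustration.
EXAMPLE_HASH = "a1b2c3d4e5f60718293a4b5c6d7e8f9012345678"

layouts = [V1Layout(), V0Layout()]
v1_path = layouts[0].lookup(EXAMPLE_HASH)  # <first 2 digits>/<hash>
v0_path = layouts[1].lookup(EXAMPLE_HASH)  # <first 2 digits>/<next 6 digits>/<hash>

for path in (v1_path, v0_path):
    names = [type(layout).__name__ for layout in find_owning_layouts(path, layouts)]
    print(path, names)
```

Each path is claimed by exactly one of the two layouts because V0Layout adds a second directory level that V1Layout never produces, which is the "carve out its own namespace" approach the docstring mentions.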