author | Shawn Walker <shawn.walker@oracle.com> |
Sat, 16 Jul 2011 08:45:13 -0700 | |
changeset 2468 | ce77b64883c4 |
parent 429 | 6c9cbb6e6600 |
permissions | -rw-r--r-- |
429
6c9cbb6e6600
983 pkg search returns just one action per package/token-type combo
Brock Pytlik <bpytlik@sun.com>
pkg
SEARCH

1. Goals

i. Provide relevant information
ii. Provide a consistently fast response
iii. Make responses consistent between local and remote search
iv. Provide the user with a good interface to the information
v. Allow seamless recovery when search fails
vi. Ensure the index is (almost) always in a consistent state

2. Approach

From a high level, there are two components to search: the indexer,
which maintains the information needed for search, and the query engine,
which actually performs a search of that information. The indexer is
responsible for creating and updating the indexes and ensuring they're
always in a consistent state. It does this by maintaining a set of
inverted indexes as text files (details of which can be found in the
comments at the top of indexer.py). On the server side, it's hooked into
the publishing code so that the index is updated each time a package is
published. If indexing is already happening when packages are published,
they're queued and another update to the indexes happens once the current
run is finished. On the client side, it's hooked into the install,
image-update, and uninstall code so that each of those actions is
reflected in the index.
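
To make the idea of an inverted index stored as a text file concrete, here
is a minimal sketch. It is not the format used by indexer.py (see the
comments in that file for the real one); the function names, the
"VERSION: n" header, and the one-token-per-line layout are all assumptions
made for illustration.

```python
from collections import defaultdict


def build_inverted_index(packages):
    """Map each token to the sorted list of packages that contain it.

    `packages` is a hypothetical dict of package name -> iterable of tokens.
    """
    index = defaultdict(set)
    for pkg_name, tokens in packages.items():
        for token in tokens:
            index[token].add(pkg_name)
    return {token: sorted(names) for token, names in index.items()}


def format_index(index, version):
    """Render the index as text: a version line, then one token per line.

    The layout is invented for this sketch, not taken from indexer.py.
    """
    lines = ["VERSION: %d" % version]
    for token in sorted(index):
        lines.append("%s %s" % (token, " ".join(index[token])))
    return "\n".join(lines) + "\n"


# Two made-up packages that share the token "bin".
packages = {
    "web/browser": ["firefox", "bin"],
    "editor/vim": ["vim", "bin"],
}
index = build_inverted_index(packages)
text = format_index(index, 1)
```

Looking up a token is then a single dictionary access, which is what makes
a consistently fast response plausible even for a large catalog.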

The query engine is responsible for processing the text from the user,
searching for the resulting token in its information, and giving the
client code the information needed for a reasonable response to the user.
It must ensure that the information it uses is in a consistent state. On
the server, an engine is created during server initialization. It reads
in the files it needs and stores the data internally. When the server
gets a search request from a client, it hands the search token to the
query engine. The query engine ensures that it has the most recent
information (locking and rereading the files from disk if necessary) and
then searches for the token in its dictionaries. On the client, the
process is the same except that the indexes are read from disk each time
instead of being kept in memory, because a new instance of pkg is started
for each search.
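
The server-side behavior described above (keep the data in memory, and
reread it under a lock only when the files have changed) might look
roughly like the following sketch. The class name, the single-file layout,
the file name tokens.txt, and the "VERSION: n" first line are assumptions
for illustration and are not taken from the pkg source.

```python
import os
import tempfile
import threading


class QueryEngine(object):
    """Hypothetical engine that caches one token file in memory."""

    def __init__(self, path):
        self._path = path
        self._lock = threading.Lock()
        self._version = None
        self._tokens = {}

    def _refresh(self):
        # Reread the file only if its first-line version has changed.
        with open(self._path) as f:
            version = f.readline().strip()
            if version == self._version:
                return
            tokens = {}
            for line in f:
                parts = line.split()
                if parts:
                    tokens[parts[0]] = parts[1:]
        self._version, self._tokens = version, tokens

    def search(self, token):
        # The lock keeps concurrent requests from seeing a half-loaded dict.
        with self._lock:
            self._refresh()
            return list(self._tokens.get(token, []))


# Demonstration with a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "tokens.txt")
with open(path, "w") as f:
    f.write("VERSION: 1\nvim editor/vim\n")
engine = QueryEngine(path)
first = engine.search("vim")

# A later index update bumps the version; the next search picks it up.
with open(path, "w") as f:
    f.write("VERSION: 2\nvim editor/vim editor/gvim\n")
second = engine.search("vim")
```

A client-side search would skip the cache entirely, since each pkg
invocation starts with nothing in memory.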

3. Details

Search reserves the $ROOT/index directory for its use on both the client
and the server. It also creates a TMP directory inside index, in which it
stores indexes until it's ready to migrate them to the proper directory.

indexer.py contains detailed information about the files used to store the
index and their formats.

3.1 Locking

The indexes use a version locking protocol. The requirements for the
protocol are:
    the writer never blocks on readers
    any number of readers are allowed
    readers must always have consistent data regardless of the writer's
    actions
To implement these features, several conventions must be observed. The
writer is responsible for updating the index files in another location,
then moving them on top of the existing files so that, from a reader's
perspective, file updates are always atomic. Each file in the index has
a version in the first line. The writer is responsible for ensuring that
each time it updates the index, the files all have the same version
number and that version number has not been previously used. The writer
is not responsible for moving multiple files atomically, but it should
make an effort to have files in $ROOT/index be out of sync for as short
a time as possible.
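
The writer conventions above can be sketched as follows. This is an
illustration under stated assumptions, not the code in indexer.py: the
function name write_index, the file names, and the "VERSION: n" header are
hypothetical. The sketch uses os.replace (Python 3.3+), which overwrites
the destination in one step; on POSIX this is atomic when source and
destination are on the same filesystem, which holds here because TMP lives
inside the index directory.

```python
import os
import tempfile


def write_index(index_dir, files, version):
    """Stage files in index_dir/TMP, then move each one into place.

    `files` is a hypothetical dict of file name -> list of body lines.
    """
    tmp_dir = os.path.join(index_dir, "TMP")
    if not os.path.isdir(tmp_dir):
        os.makedirs(tmp_dir)

    # Do all the slow writing away from the files readers are using.
    for name, lines in files.items():
        with open(os.path.join(tmp_dir, name), "w") as f:
            f.write("VERSION: %d\n" % version)
            for line in lines:
                f.write(line + "\n")

    # Move the files back to back so they are out of sync only briefly.
    # Each individual move is atomic; the set as a whole is not.
    for name in files:
        os.replace(os.path.join(tmp_dir, name),
                   os.path.join(index_dir, name))


index_dir = tempfile.mkdtemp()
contents = {
    "tokens.txt": ["vim editor/vim"],
    "packages.txt": ["editor/vim"],
}
write_index(index_dir, contents, 1)
write_index(index_dir, contents, 2)  # a later update uses a fresh version
```

Because the moves are not atomic as a group, a reader can still observe
files with two different versions for a moment, which is why readers have
to check versions themselves.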

The readers are responsible for ensuring that the files they're reading
the indexes from are a consistent set (have identical version
numbers). consistent_open in search_storage takes care of this
functionality.
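
The reader side of the protocol can be sketched like this. It is not the
real consistent_open from search_storage, whose signature and behavior are
not shown here; the function name open_consistent_set, its parameters, the
retry policy, and the "VERSION: n" header are assumptions for illustration.

```python
import os
import tempfile
import time


def open_consistent_set(index_dir, names, retries=5, delay=0.01):
    """Return (version, handles) if every file reports the same version.

    Retries a few times, since a writer may be partway through moving
    files into place. Returns None if no consistent set was seen.
    """
    for _ in range(retries):
        handles = []
        versions = set()
        try:
            for name in names:
                f = open(os.path.join(index_dir, name))
                handles.append(f)
                versions.add(f.readline().strip())
        except IOError:
            versions = set()  # a missing file counts as inconsistent
        if len(versions) == 1:
            return versions.pop(), handles
        for f in handles:
            f.close()
        time.sleep(delay)
    return None


index_dir = tempfile.mkdtemp()
names = ["tokens.txt", "packages.txt"]
for name in names:
    with open(os.path.join(index_dir, name), "w") as f:
        f.write("VERSION: 1\n")

version, handles = open_consistent_set(index_dir, names)
for f in handles:
    f.close()

# Simulate a writer that has replaced only one of the two files so far.
with open(os.path.join(index_dir, "tokens.txt"), "w") as f:
    f.write("VERSION: 2\n")
mismatch = open_consistent_set(index_dir, names, retries=2, delay=0)
```

On POSIX systems an open handle keeps referring to the file it was opened
on even after the writer moves a new file over that name, so once a
matching set of handles is open the reader never blocks the writer and is
not disturbed by it.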