doc/search.txt
author Shawn Walker <shawn.walker@oracle.com>
Sat, 16 Jul 2011 08:45:13 -0700
changeset 2468 ce77b64883c4
parent 429 6c9cbb6e6600
permissions -rw-r--r--
18710 conditional dependencies can cause install and uninstall failure when dependency cannot be installed
429
6c9cbb6e6600 983 pkg search returns just one action per package/token-type combo
Brock Pytlik <bpytlik@sun.com>
pkg
SEARCH

1. Goals

   i.   Provide relevant information
   ii.  Provide a consistently fast response
   iii. Make responses consistent between local and remote search
   iv.  Provide the user with a good interface to the information
   v.   Allow seamless recovery when search fails
   vi.  Ensure the index is (almost) always in a consistent state

2. Approach

   From a high level, there are two components to search: the
   indexer, which maintains the information needed for search, and the
   query engine, which actually performs a search of the information
   provided. The indexer is responsible for creating and updating the
   indexes and ensuring they're always in a consistent state. It does this
   by maintaining a set of inverted indexes as text files (details of which
   can be found in the comments at the top of indexer.py). On the server
   side, it's hooked into the publishing code so that the index is updated
   each time a package is published. If indexing is already happening when
   packages are published, they're queued and another update to the indexes
   happens once the current run is finished. On the client side, it's
   hooked into the install, image-update, and uninstall code so that each
   of those actions is reflected in the index.
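
   As a rough illustration of the inverted-index idea only (the real on-disk
   format is described in indexer.py and is not reproduced here), a minimal
   sketch might look like the following. The names build_inverted_index and
   search, and the sample package data, are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(packages):
    """Map each token to the set of package names that mention it.

    `packages` is a hypothetical dict of package name -> list of tokens;
    the real indexer derives its tokens from package actions.
    """
    index = defaultdict(set)
    for pkg_name, tokens in packages.items():
        for token in tokens:
            index[token].add(pkg_name)
    return index


def search(index, token):
    """Return the sorted package names associated with `token`."""
    # .get() avoids creating an empty entry for an unknown token.
    return sorted(index.get(token, ()))


# Made-up sample data, for illustration only.
packages = {
    "pkg:/example/editor@1.0": ["bin", "editor", "vim"],
    "pkg:/example/shell@2.1": ["bin", "shell"],
}
index = build_inverted_index(packages)
```

   Looking a token up is then a single dictionary access, which is what
   makes a consistently fast response plausible.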

   The query engine is responsible for processing the text from the user,
   searching for that token in its information, and giving the client code
   the information needed for a reasonable response to the user. It must
   ensure that the information it uses is in a consistent state. On the
   server, an engine is created during server initialization. It reads
   in the files it needs and stores the data internally. When the server gets
   a search request from a client, it hands the search token to the query
   engine. The query engine ensures that it has the most recent information
   (locking and rereading the files from disk if necessary) and then searches
   for the token in its dictionaries. On the client, the process is the same
   except that the indexes are read from disk each time instead of being kept
   in memory, because a new instance of pkg is started for each search.
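
   The server-side behaviour described above (keep the data in memory, and
   reread it only when the files on disk have changed) could be sketched
   roughly as follows. This is not the actual query engine: the QueryEngine
   class, the write_index helper, the "VERSION: n" header, the one-line-per-
   token layout and the file name are all assumptions made for the example,
   and the locking mentioned above is left out.

```python
import os
import tempfile


def write_index(path, version, entries):
    """Write a toy index file: a version header, then 'token pkg1,pkg2'."""
    with open(path, "w") as f:
        f.write("VERSION: %d\n" % version)
        for token, pkgs in sorted(entries.items()):
            f.write("%s %s\n" % (token, ",".join(pkgs)))


class QueryEngine:
    """Caches a token -> packages dictionary, rereading it when stale."""

    def __init__(self, index_path):
        self.index_path = index_path
        self.version = None   # nothing loaded yet
        self.data = {}

    def _read_version(self):
        with open(self.index_path) as f:
            return f.readline().strip()

    def _load(self):
        with open(self.index_path) as f:
            version = f.readline().strip()
            data = {}
            for line in f:
                token, _, pkgs = line.rstrip("\n").partition(" ")
                data[token] = pkgs.split(",") if pkgs else []
        self.version, self.data = version, data

    def search(self, token):
        # Reread from disk only if the on-disk version has moved on.
        if self._read_version() != self.version:
            self._load()
        return self.data.get(token, [])


index_file = os.path.join(tempfile.mkdtemp(), "tokens.txt")  # made-up name
write_index(index_file, 1, {"vim": ["editor"]})
engine = QueryEngine(index_file)
first = engine.search("vim")
write_index(index_file, 2, {"vim": ["editor", "vim-full"]})
second = engine.search("vim")
```

   A client-side search would skip the cache entirely and load the files on
   every invocation, since each pkg process starts fresh.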

3. Details

   Search reserves the $ROOT/index directory for its use on both the client
   and the server. It also creates a TMP directory inside index, where it
   stores indexes until it's ready to migrate them to the proper directory.

   indexer.py contains detailed information about the files used to store the
   index and their formats.

   3.1 Locking

       The indexes use a version locking protocol. The requirements for the
       protocol are:
		the writer never blocks on readers
		any number of readers are allowed
		readers must always have consistent data regardless of the
			writer's actions
       To meet these requirements, several conventions must be observed. The
       writer is responsible for updating the index files in another location,
       then moving them on top of existing files so that from a reader's
       perspective, file updates are always atomic. Each file in the index has
       a version in the first line. The writer is responsible for ensuring that
       each time it updates the index, the files all have the same version
       number and that version number has not been previously used. The writer
       is not responsible for moving multiple files atomically, but it should
       make an effort to have files in $ROOT/index be out of sync for as short
       a time as possible.
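
       The writer's side of these conventions might be sketched as below.
       This is an illustration under assumptions, not the indexer's code:
       the function name write_index_files, the "VERSION: n" header and the
       file names are hypothetical, and the caller is trusted to pass a
       version number that has not been used before.

```python
import os
import tempfile


def write_index_files(index_dir, version, files):
    """Stage every index file with `version`, then move each into place.

    `files` maps a file name to the list of lines that follow the header.
    """
    tmp_dir = os.path.join(index_dir, "TMP")
    os.makedirs(tmp_dir, exist_ok=True)

    # Write complete copies first so a reader never sees a partial file.
    for name, lines in files.items():
        with open(os.path.join(tmp_dir, name), "w") as f:
            f.write("VERSION: %d\n" % version)
            f.writelines(line + "\n" for line in lines)

    # Each rename is atomic on POSIX (same filesystem); the group of
    # renames is not, so this loop is kept as short as possible.
    for name in files:
        os.replace(os.path.join(tmp_dir, name),
                   os.path.join(index_dir, name))


index_dir = tempfile.mkdtemp()
write_index_files(index_dir, 1, {"tokens.txt": ["vim editor"],
                                 "packages.txt": ["editor"]})
write_index_files(index_dir, 2, {"tokens.txt": ["vim editor,vim-full"],
                                 "packages.txt": ["editor", "vim-full"]})
```

       Between the first and last rename the files in the index directory
       briefly carry different versions, which is why readers must check
       for a consistent set themselves.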

       The readers are responsible for ensuring that the files they're reading
       the indexes from are a consistent set (have identical version
       numbers). consistent_open in search_storage takes care of this
       functionality.
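
       The reader's check could be approximated as follows. This is a
       hypothetical stand-in written for illustration, not the real
       consistent_open; the name open_consistent_set, the retry count, the
       delay and the "VERSION: n" header are all assumptions.

```python
import os
import tempfile
import time


def open_consistent_set(index_dir, names, retries=10, delay=0.01):
    """Open all files in `names`; return (version, handles) once they agree.

    Raises RuntimeError if no consistent set is seen within `retries`.
    """
    for _ in range(retries):
        handles = [open(os.path.join(index_dir, n)) for n in names]
        versions = {h.readline().strip() for h in handles}
        if len(versions) == 1:
            # Each handle is now positioned just past its version line.
            return versions.pop(), handles
        # The writer was probably mid-update: drop these handles and retry.
        for h in handles:
            h.close()
        time.sleep(delay)
    raise RuntimeError("index files never reached a consistent version")


demo_dir = tempfile.mkdtemp()
for name in ("tokens.txt", "packages.txt"):
    with open(os.path.join(demo_dir, name), "w") as f:
        f.write("VERSION: 3\n")
version, handles = open_consistent_set(demo_dir,
                                       ["tokens.txt", "packages.txt"])
for h in handles:
    h.close()
```

       Because the writer replaces files by renaming over them, handles that
       are already open keep referring to the old contents on POSIX systems,
       so a set that passed the check stays consistent while it is read.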