08-05-2009, 11:16 PM
Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., and Gruber, R. E. Bigtable: a distributed storage system for structured data. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (Seattle, Washington, November 06 - 08, 2006). 205-218.
* * *
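Bigtable, a key component of Google's software stack, builds a table-like abstraction on top of the underlying file system (GFS). The paper describes the data model as a sparse, distributed, persistent, multi-dimensional sorted map indexed by row key, column key, and timestamp.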
To handle large data sets, Bigtable partitions each table by row range into shards called tablets, which are managed by tablet servers. The tablet data is stored on GFS as SSTables (sorted, immutable maps from string keys to string values). Chubby (http://baijia.info/showthread.php?tid=59) stores some metadata (such as schema information) and arbitrates a few critical concurrent operations, such as ensuring that there is at most one active master. A "cluster management system for scheduling jobs" controls the computation tasks in Bigtable operations. This cluster management system presumably serves a purpose similar to the Workqueue mentioned in Sawzall (http://baijia.info/showthread.php?tid=60).
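To make the row-range partitioning concrete, here is a minimal Python sketch of how a sorted list of split keys can map a row key to the tablet that owns it. This is only an illustration: `TabletMap`, `locate`, the server names, and the split keys are all made up for this note, and the real system finds tablets through a multi-level metadata hierarchy rather than a single in-memory list.

```python
import bisect


class TabletMap:
    """Toy row-range partitioning (hypothetical, not Bigtable's actual API).

    Each tablet owns a contiguous range of row keys in sorted order.
    """

    def __init__(self, split_keys, servers):
        # split_keys[i] is the last row key (inclusive) owned by tablet i.
        # The final tablet is open-ended, so there is one more server
        # than there are split keys.
        if len(servers) != len(split_keys) + 1:
            raise ValueError("need exactly len(split_keys) + 1 servers")
        self.split_keys = sorted(split_keys)
        self.servers = list(servers)

    def locate(self, row_key):
        """Return (tablet_index, server) for the tablet owning row_key."""
        # bisect_left finds the first split key >= row_key, which is the
        # index of the tablet whose range contains row_key.
        idx = bisect.bisect_left(self.split_keys, row_key)
        return idx, self.servers[idx]


# Assumed example data: reversed-domain row keys, three tablets.
tablets = TabletMap(
    split_keys=["com.cnn.www", "com.google.www"],
    servers=["ts-a", "ts-b", "ts-c"],
)
print(tablets.locate("com.example.www"))
```

Because rows are kept in sorted order, a range scan over neighbouring row keys touches only a small number of tablets, which is why the choice of row key matters for locality.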
Bigtable guarantees atomic reads and writes on a single row, but does not support general transactions across multiple rows, as discussed in Section 3.
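The single-row guarantee can be illustrated with a small Python sketch, assuming a per-row lock is enough to model it. `ToyTable`, `mutate_row`, and `read_row` are hypothetical names invented for this note and say nothing about how Bigtable implements row atomicity internally. The point is only that a read-modify-write on one row is safe, while nothing coordinates changes across two rows.

```python
import threading


class ToyTable:
    """Toy table with per-row atomic mutations (hypothetical sketch)."""

    def __init__(self):
        self._rows = {}    # row_key -> {column: value}
        self._locks = {}   # row_key -> lock guarding that row
        self._guard = threading.Lock()  # protects the two dicts above

    def _entry(self, row_key):
        # Create the row and its lock on first use, under a global guard.
        with self._guard:
            if row_key not in self._rows:
                self._rows[row_key] = {}
                self._locks[row_key] = threading.Lock()
            return self._locks[row_key], self._rows[row_key]

    def mutate_row(self, row_key, fn):
        """Apply fn to the row's column dict atomically w.r.t. that row."""
        lock, row = self._entry(row_key)
        with lock:
            fn(row)

    def read_row(self, row_key):
        """Return a consistent snapshot of one row."""
        lock, row = self._entry(row_key)
        with lock:
            return dict(row)


def increment(row):
    # Read-modify-write on a single row; safe because the row lock is held.
    row["hits"] = row.get("hits", 0) + 1


table = ToyTable()
workers = [
    threading.Thread(
        target=lambda: [table.mutate_row("com.cnn.www", increment) for _ in range(1000)]
    )
    for _ in range(4)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(table.read_row("com.cnn.www"))
```

An update that had to change two rows together would need two separate `mutate_row` calls here, and a reader could observe the state in between, which is the limitation the paper accepts.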
Table 2 lists a few Bigtable instances, the largest being an 800 TB table for Crawl.