The OtterServer includes two pattern databases:
- Reference Data – These patterns are dynamically determined from the individuals defined in the OWL documents loaded when the server comes up. These patterns are stored in memory and are read-only.
- Master Data – These patterns are also created dynamically. They are created as updates are applied to the repository.
Reference Data
The Reference Data is read-only: it can be changed only by editing the individuals within their respective OWL documents. For the same reason, object properties cannot be made between Reference Data and Master Data. Creating an object property link updates both the domain individual and the range individual, and the Reference Data side of the link cannot be updated.
Master Data
The Master Data patterns resemble tables in a relational database. Each pattern has a fixed set of columns, with each containing an object property link or data property value.
There are some significant differences between patterns and relational database tables.
- Every column in a pattern contains a value, whereas relational database tables may have missing values.
- In a relational database, database administrators define the tables, monitor performance, and apply changes as the business requires. In the pattern-based DB of Otter, the patterns are automatically created and removed as the content directs.
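The idea that patterns emerge from the content can be illustrated with a minimal sketch. It assumes, purely for illustration, that a pattern is identified by the set of property names an individual carries; the function name `group_by_pattern` and the sample data are hypothetical and are not part of Otter.

```python
from collections import defaultdict


def group_by_pattern(individuals):
    """Group individuals by the exact set of properties they carry.

    Assumption: a pattern is identified by its set of property names.
    Every member of a group has a value for every column of that group,
    so no column is ever empty.
    """
    patterns = defaultdict(dict)
    for ind_id, props in individuals.items():
        key = frozenset(props)  # property names only, values ignored
        patterns[key][ind_id] = props
    return dict(patterns)


# Hypothetical sample individuals
individuals = {
    "p1": {"name": "Ann", "email": "ann@example.org"},
    "p2": {"name": "Bob"},
    "p3": {"name": "Cy", "email": "cy@example.org"},
}
patterns = group_by_pattern(individuals)
```

Here no one defines a table in advance: two patterns appear because the three individuals carry two distinct sets of properties.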
Storage & Update
The individuals in each pattern are stored in a sequential file. For queries, this form allows multiple processes to access individuals asynchronously. As found in the IBM report referenced in the post, “Grouping Individuals in Patterns”, large files are cut down in size, so the resulting files are in most cases relatively small. These files are accessed with minimal I/O in comparison to reading a relational database table.
The file is organized in sequential blocks. When the file is first created, the odd-numbered blocks are used to store the individuals. When an update is processed, the odd-numbered blocks are read sequentially and copied, with the changes applied, to the even-numbered blocks. After the update, all reads go to the even-numbered blocks. When the next update is applied, the even-numbered blocks are read sequentially and the odd-numbered blocks are written sequentially; the odd-numbered blocks then serve the reads. In short, each update switches which set of blocks the following reads use.
The space allocated for a pattern is used to support the two files described above. Each file is given alternate blocks for storing data, as shown in the example below:
In this diagram, File 1 sequentially processes blocks 1, 3, and 5. File 2 sequentially processes blocks 2, 4, and 6.
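The alternating layout can be sketched in a few lines. This is only an illustration of the odd/even scheme described above, not Otter's actual code; the names `blocks_for` and `PatternSpace` are hypothetical.

```python
def blocks_for(file_no, total_blocks):
    """Return the 1-based block numbers owned by file 1 (odd) or file 2 (even)."""
    if file_no not in (1, 2):
        raise ValueError("file_no must be 1 or 2")
    return list(range(file_no, total_blocks + 1, 2))


class PatternSpace:
    """Hypothetical model of the space allocated to one pattern."""

    def __init__(self, total_blocks):
        self.total_blocks = total_blocks
        self.active = 1  # the file currently served to readers

    def read_blocks(self):
        # Blocks that queries read right now
        return blocks_for(self.active, self.total_blocks)

    def write_blocks(self):
        # Blocks that the next update writes to
        return blocks_for(3 - self.active, self.total_blocks)

    def commit_update(self):
        # After a successful update, readers switch to the other file
        self.active = 3 - self.active


space = PatternSpace(total_blocks=6)
first_read = space.read_blocks()   # odd-numbered blocks
space.commit_update()
second_read = space.read_blocks()  # even-numbered blocks
```

A design consequence of this sketch is that an update never overwrites the blocks readers are using; the switch happens only once the new copy is complete.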
The individual records are considered to be variable in length. Consequently, an individual's data will not fit evenly within a block and may be split across multiple blocks. Also, the individuals are ordered by their unique ids, so the update process is actually a simple match / merge operation.
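A match / merge over two id-ordered sequences might look like the sketch below. It assumes records are `(id, data)` pairs, that both inputs are sorted by id, and that an update carrying `None` as its data means "delete"; these conventions and the name `match_merge` are assumptions made for the example, not details taken from Otter.

```python
DELETE = None  # assumed marker: an update whose data is None deletes the id


def match_merge(old, updates):
    """Merge sorted old records with sorted updates in one sequential pass."""
    result = []
    i = j = 0
    while i < len(old) and j < len(updates):
        old_id, _ = old[i]
        upd_id, upd_data = updates[j]
        if old_id < upd_id:
            # No update for this individual: copy it unchanged
            result.append(old[i])
            i += 1
        elif old_id > upd_id:
            # Update for an id not yet stored: insert it (unless a delete)
            if upd_data is not DELETE:
                result.append(updates[j])
            j += 1
        else:
            # Ids match: replace the old record, or drop it on delete
            if upd_data is not DELETE:
                result.append(updates[j])
            i += 1
            j += 1
    result.extend(old[i:])  # remaining unchanged records
    result.extend(u for u in updates[j:] if u[1] is not DELETE)
    return result


old = [(1, "a"), (3, "c"), (5, "e")]
updates = [(2, "b"), (3, DELETE), (5, "E"), (7, "g")]
merged = match_merge(old, updates)
```

Because both inputs are consumed strictly in order, the old file is read once and the new file is written once, which matches the sequential block access described earlier.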
During an update, an individual may change so that its pattern no longer fits its current storage pattern. When this occurs, the individual is removed from its current pattern and moved to its new pattern. If the new pattern does not exist, then the new pattern is created. If all of the individuals within a pattern are deleted, then the pattern is removed.
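The move-create-remove behavior can be sketched with an in-memory model. As before, it assumes a pattern is keyed by the set of property names an individual carries; the class `PatternStore` and its methods are hypothetical and ignore the on-disk file handling entirely.

```python
class PatternStore:
    """Hypothetical in-memory model of dynamically managed patterns."""

    def __init__(self):
        self.patterns = {}  # frozenset of property names -> {id: props}
        self.location = {}  # id -> key of the pattern currently holding it

    def upsert(self, ind_id, props):
        new_key = frozenset(props)
        old_key = self.location.get(ind_id)
        if old_key is not None and old_key != new_key:
            # The individual no longer fits its current pattern: move it
            self._remove(ind_id, old_key)
        # setdefault creates the target pattern if it does not exist yet
        self.patterns.setdefault(new_key, {})[ind_id] = dict(props)
        self.location[ind_id] = new_key

    def delete(self, ind_id):
        old_key = self.location.pop(ind_id, None)
        if old_key is not None:
            self._remove(ind_id, old_key)

    def _remove(self, ind_id, key):
        members = self.patterns[key]
        del members[ind_id]
        if not members:
            # Last individual gone: the pattern itself is removed
            del self.patterns[key]


store = PatternStore()
store.upsert("p1", {"name": "Ann"})
store.upsert("p2", {"name": "Bob"})
store.upsert("p1", {"name": "Ann", "email": "ann@example.org"})  # p1 moves
store.delete("p2")  # the {"name"} pattern becomes empty and disappears
```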
This dynamic change of Master Data patterns allows the database to automatically adapt to changes in the individuals stored. Not only does this significantly reduce administration, it can also provide a better understanding of the data. The patterns may in fact represent archetypes within the domain of the patterns.
ACID Transactions
Master Data updates meet the requirements of ACID:
- Atomicity – A single graph update can apply all changes needed to complete a transaction.
- Consistency – All patterns in an update must be successful before changes are committed. If one pattern is inconsistent, then the entire update is rolled back.
- Isolation – The database locks patterns for both read and update. A pattern locked for read cannot be updated, yet there may be multiple parallel reads. A pattern locked for update cannot be read or updated by another process.
- Durability – All database status information is stored on external storage. If there is a system failure, all data persists.
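The isolation rule above (many parallel reads, exclusive updates) can be sketched as a small lock table. This is a single-threaded illustration of the rule only: the class `PatternLocks` is hypothetical, it returns `False` instead of blocking, and a real server would have to protect the table itself against concurrent access.

```python
class PatternLocks:
    """Hypothetical lock table: shared read locks, exclusive update locks."""

    def __init__(self):
        self.readers = {}      # pattern -> number of active readers
        self.updating = set()  # patterns locked for update

    def try_read(self, pattern):
        if pattern in self.updating:
            return False  # locked for update: cannot be read
        self.readers[pattern] = self.readers.get(pattern, 0) + 1
        return True

    def release_read(self, pattern):
        self.readers[pattern] -= 1
        if self.readers[pattern] == 0:
            del self.readers[pattern]

    def try_update(self, patterns):
        """Lock every pattern of an update, or none of them."""
        patterns = set(patterns)
        if any(p in self.updating or p in self.readers for p in patterns):
            return False
        self.updating |= patterns
        return True

    def release_update(self, patterns):
        self.updating -= set(patterns)


locks = PatternLocks()
two_readers = locks.try_read("A") and locks.try_read("A")  # parallel reads
blocked_update = locks.try_update(["A", "B"])              # A is being read
locks.release_read("A")
locks.release_read("A")
granted_update = locks.try_update(["A", "B"])
blocked_read = locks.try_read("B")                         # B is being updated
```

Acquiring all patterns of an update together, or none, mirrors the requirement that an update touching several patterns either commits as a whole or is rolled back.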
Summary
The OtterServer separates static information (Reference Data) from dynamic information (Master Data). Static information is derived from the OWL documents, whereas dynamic information is kept on external storage.
Master Data may be accessed simultaneously by multiple queries and updates. Updates meet all the requirements of ACID transactions.
As a note: the current implementation of both the Reference Data and the Master Data utilizes a common DB interface. This allows other forms of databases to be added for use with the OtterServer.