![]() |
![]() |
ALLBASE/SQL Database Administration Guide: HP 9000 Computer Systems > Chapter 2 Logical Design![]() Arranging Data in Tables |
|
The basic data structure in an ALLBASE/SQL database is a table. Data is stored within a table as rows divided into columns. Indexes and views are created using the columns of a table. To design the tables in your database from normalized data, you must:
To design tables you need to know how the applications use data in SELECT, DELETE, INSERT, and UPDATE operations. Data that is deleted, inserted, or updated at the same time should be put in the same table. Use these criteria to start grouping your data into tables. To determine the composition of your tables, keep the following guidelines in mind:
The following paragraphs are a few more guidelines to help you determine table design. A row is the smallest unit you can delete or insert at a time. This means two columns should be placed in different tables if they can be inserted or deleted at different times. Assume a part name is never inserted or deleted unless a corresponding part number is inserted or deleted. Part name and part number should be in the same table. On the other hand, a vendor name can be inserted or deleted independent of, say, the order number. Therefore, the vendor name and order number should be placed in separate tables. Interactively you can only use the UPDATE statement on one table at a time. This means if you use a single statement to update one column based on the value of another column, both columns must be in the same table. The unit price of a part is updated as it pertains to a particular part number column; therefore, the unit price and part number columns should be in the same table. A join is a query that selects columns from at least two tables. If many applications request information that can logically reside in two tables, you may want to place the information in a single table to improve performance. A table with a large number of columns can impair performance as will several smaller tables that are joined frequently. The WHERE clause of the SELECT statement is used to specify the condition(s) under which rows are joined. ALLBASE/SQL allows joins on compatible data types, but for maximum efficiency, joins should be performed on identical columns.
Tables can be joined as Cartesian products where each row of one table is joined with every row in another table. However, in order for tables to be meaningfully joined, they must share a common column. If two tables do not share a common column, then a third table containing common columns for both tables must be introduced into the join, as shown in Figure 2-6 “Common Columns for Joins”. Several of the sample database tables must be joined to provide the required data to the users and applications. However, the normalized data does not allow some tables to be joined directly. Refer to Figure 2-3. The OrderItems table cannot be meaningfully joined with the Vendors table because there is no common column in the tables. To facilitate retrieval for those applications that use the Vendors and OrderItems tables, the VendorNumber column was added to the Orders table, as shown in Table 2-1. This allows a join to be made between the Vendors and OrderItems tables using the Orders table. Adding the VendorNumber column, of course, violates the third normal form. However, normalization is just a tool that helps you design efficient tables. It is up to the DBA to alter the table design to meet the needs of the users. For additional information on joins, including Cartesian products, refer to the "SQL Queries" chapter of the ALLBASE/SQL Reference Manual. The normalized data from the purchase order form produced the Vendors, Orders, OrderItems, and SupplyPrice tables. Integrated Peripherals, Inc. also designed a Parts table and an Inventory table to keep track of the internal parts. As a result, a total of six tables were designed from the defining and normalization phase. Columns are added to some of the tables to fit the user's needs. A VendorRemarks column is added to Vendors to keep comments for that vendor. A ReceivedQty column is added to the OrderItems table to denote when a shipment arrives. The resulting six tables are shown in Table 2-1. Table 2-1 Sample Database Tables
Each attribute of each entity is defined in ALLBASE/SQL as a column in a table. The "Data Types" chapter in the ALLBASE/SQL Reference Manual contains basic information on column names and data types. Use the information presented there and the following guidelines to define columns. A column is defined by specifying:
Be precise in your names and descriptions so no user can misunderstand a data element or its meaning. Each column name can be as long as 20 characters. Try to choose meaningful, unambiguous names. For example, a name of "Qty" in the OrderItems table could refer to order quantity or received quantity while "OrderQty" can refer only to order quantity. Long names, however, may become unwieldy in a report. In the sample database, the column and table names have been made as unambiguous as possible. A data type tells ALLBASE/SQL what type of data can be stored in the column and what can be done with the column. Thus, a column with an INTEGER data type can only have integers. A column with a numeric data type (FLOAT, DECIMAL, INTEGER, or SMALLINT) can be used in arithmetic operations. A column with an alphanumeric data type (CHAR or VARCHAR) can appear in a string operation such as a comparison using the LIKE predicate. Some guidelines for data types are:
Columns of type VARCHAR or VARBINARY may cause an extra tuple header to be stored. A tuple header is a description of the rows on a DBEFile page. If all rows on the page are the same, the header can be shared. A VARCHAR or VARBINARY column may be a different length in each row thus requiring each row to have its own tuple header. The calculations in Chapter 3, "Physical Design", assume that each row has its own tuple header. Refer to the ALLBASE/SQL Reference Manual chapter on "Data Types" for further explanation. The use of VARCHAR or VARBINARY data types can result in page shifting, that is, movement of data from one page to another when the size of the data changes. This can be a drain on performance. You should always define your columns to be large enough to hold the largest piece of information you expect it will ever hold. This helps to avoid restructuring of tables at a later time. Column size affects both physical design and logging. Rows are stored in 4096-byte DBEFile pages. After the space for a page header is used, 3996 bytes are left for data storage. During physical design row length is used in calculating the number of DBEFile pages needed to store data for a table. The sum of all columns plus a two byte overhead for each column is equal to the total row length:
where RL is row length, SC is the sum of the length of all columns in the table, and NC is the number of columns in the table. Since a page has 3996 bytes for storage, a row of 2000 to 3000 bytes is going to waste space by taking one half to three quarters of a page and leaving the rest of the page empty. Column size can be adjusted to use pages more efficiently, or a large table can be broken into two smaller tables to improve page use. This is also discussed under "Calculating Row Length" in the "Physical Design" chapter in this guide. Large columns that are modified frequently create proportionately large log records and consequently use more log file space. When you determine log file size in the "Physical Design" chapter, keep in mind the size of the columns that are being modified. Columns that might not always have data available should be allowed to contain null values. A column containing a null value does not store any data. Null values are distinguished from zeros and blank character data. If the UnitPrice column contains a null value, the price is considered unavailable. However, if the UnitPrice column contains a zero, the price is $0.00 and the item is free. The default when defining columns is to allow null values. Columns not permitted to contain null values must be created with the NOT NULL option. Some guidelines for null values are:
There are application and physical design implications with the use of null values. Application programs that use null values must declare special variables called indicator variables to handle the null values. Refer to the ALLBASE/SQL Application Programming Guide for the language you are using for details on using indicator variables. The use of null values can result in page shifting when the size of the row changes. This can be a drain on performance. For performance improvement, use the NOT NULL option on a column definition whenever possible. If a column is potentially null, SQLCore uses a two-byte overhead per column during query processing to check the null status of every selected column. Although SQLCore also needs to check the status for every inserted and updated value if the column is defined as NOT NULL, performance is better than if the column allowed null values. In addition, a two-byte tuple header is stored on disk for each tuple which has a NULL value if the inserted tuple's header does not match the first shared header on the DBEFile page. The data types and sizes of the columns for the OrderItems and Vendors tables are shown in Table 2-2. Table 2-2 Column Attributes for Two Tables
|