|
|
Chapter 1: Excerpted from SQL/400 Developer's Guide, (c) Paul Conte and Mike Cravitz, 29th Street Press, 2000. Chapter Overview
Databases and Database Management Systems This book teaches you how to use Structured Query Language, or SQL, which is a computer language used to define and manipulate databases. A database is a set of computer files for storing information that's used by a business or other organization. A typical business might keep information about customers and their orders, suppliers of materials, and employees who work for the company. Storing this type of information in computer files enables easy retrieval and updating as well as flexible analysis of the raw data to produce management reports, such as sales trends or average employee-benefit costs. Of course, how "easy" and "flexible" it is to work with the data is determined largely by how well the database has been set up and by the capabilities of the database management system (DBMS), which provides the software to store and update database contents. In this chapter, we look at the building blocks of DB2 Universal Database for AS/400, or UDB/400, the DBMS that runs on IBM AS/400 computer systems. In subsequent chapters, we'll explore how to use SQL to create and manipulate a UDB/400 database. UDB/400 is an integrated part of OS/400, the AS/400's operating system1, which means that you don't have to buy UDB/400 as a separate software product and that any AS/400 application you write can take advantage of UDB/400 features. UDB/400 is also the only relational DBMS that runs on the AS/400.2 Figure 1.1 shows a simplified view of how UDB/400 fits into the AS/400 architecture. As you can see, UDB/400 provides a DBMS layer that all high-level language (HLL) programs (i.e., RPG IV, Cobol, C/C++, and Java) use to access application data stored in UDB/400 files. You can also see that all utilities and remote applications for example, a Java applet running in a Web browser and using Java Database Connectivity (JDBC) or a Windows application running on a PC and using Open Database Connectivity (ODBC) must go through UDB/400 to access the AS/400 database. This integrated, uniform interface provides a high degree of consistency and control for AS/400 application developers. If you're familiar with other operating systems and DBMS products, you'll find that UDB/400 has features of both conventional operating systems' file-management facilities and relational DBMS products. For example, like a conventional file-management facility, UDB/400 lets you use built-in HLL input/output (I/O) operations, such as Cobol's Read and Write verbs, to access data. Like other relational DBMS products, UDB/400 lets you access the same data using SQL. This "dual" nature of UDB/400 is even expressed in the nomenclature: UDB/400 documentation for conventional file access uses the terms "file," "record," and "field," while documentation for SQL access uses the comparable terms "table," "record," and "column," respectively. In this chapter, we look briefly at both sides of UDB/400, considering it as a conventional file system and as an SQL-based relational DBMS. The remainder of the book concentrates on just the SQL perspective. At the end of this chapter, you'll find a list of additional resources you can use to learn more about other aspects of UDB/400. The AS/400 Integrated File System (IFS) The UDB/400 database is arguably the most important and widely used way to store data on an AS/400, but it's not the only way. Everything stored on an AS/400 is stored in the AS/400's integrated file system (IFS). The IFS is organized as a hierarchical directory structure that includes 11 distinct file systems, as depicted in Figure 1.2. One of these file systems, QSYS.LIB, contains record-structured files managed by UDB/400, as well as other types of OS/400 objects. These record-structured files can contain text, numeric, and other forms of data and can be read and updated by HLL programs and SQL. As a shorthand, we use the term "QSys files" to mean those database files in file system QSYS.LIB. Creating and manipulating QSys files with SQL is the main focus of this book. The "root" file system in the IFS provides a Windows-like directory structure for stream files that is, files that contain a sequence (or stream) of bytes that aren't organized by the operating system into separate records, as the bytes in QSys files are. Like QSys files, stream files can contain text or numeric data, but stream files are not the main database files used by UDB/400. Files in the root file system and some of the other non-QSYS.LIB file systems can be accessed by PCs and other computers connected to the AS/400; for the most part, these files have specialized purposes not directly related to UDB/400. As you'll see in Chapter 3, however, UDB/400 files can reference the contents of files in the non-QSYS.LIB file systems3, typically for image files and other types of data managed by non-AS/400 applications. You can learn more about the IFS and its file systems by consulting the resources listed at the end of this chapter. OS/400 Objects, Libraries, and User Profiles Because UDB/400 is an integral part of OS/400, it helps to have a general understanding of how OS/400 is organized and where UDB/400 fits in. Everything in the OS/400 operating system, including the database, is organized as objects. OS/400 identifies more than 80 types of objects that can be stored in the QSYS.LIB file system, including libraries, programs, database files, display and printer files, user profiles, message queues, SQL packages, and so forth.4 OS/400 also has a few object types for files outside the QSYS.LIB file system, but these object types aren't of central importance in developing typical AS/400 business applications. In the following discussion, we concentrate just on QSYS.LIB objects and, most important, on database files, which are what most AS/400 applications use. Like other operating systems (e.g., Unix), OS/400 stores program instructions, application data, and other system components on disk, loading them into main memory as needed. But, unlike most other operating systems, OS/400 doesn't let you get at the bytes on disk or in memory directly. Instead, you must always use specific commands or other system interfaces that are valid for each type of object. For example, you can't execute a database file or perform a file-update operation on a program object.5 The system prevents any attempt to use an invalid operation on an object altogether rather than let it proceed and possibly cause damage or produce undesirable results. This protection is uniform across all AS/400 operations, including user commands, application code, and operating-system operations. Object encapsulation is so integral to the AS/400 that there are no "back doors" that a system programmer can use to subvert the integrity of an OS/400 object.6 As we progress through UDB/400's capabilities, you'll learn about various types of OS/400 objects and the way to use them. QSys Objects, Libraries, and User Profiles OS/400 controls how you use an object by storing some descriptive information with the actual content of the object. Figure 1.3 depicts the storage layout (disk or memory) for an OS/400 object. As you can see, all objects have a header, object-specific contents (e.g., program instructions, file data), and an area known as the associated space, where system or user programs store miscellaneous data related to the object. The object header has a standard part and a type-specific part. All objects in the QSYS.LIB file system include at least the following information in the standard part of their header:
An AS/400 library is an object that contains other objects. Think of an AS/400 library as a Unix or MS-DOS directory or a Windows 9x/NT folder; the only difference is that you can't generally nest AS/400 libraries as you can directories or folders that is, a library can't contain another library object. There's one exception to this rule, however the QSys library, which contains all other library objects in the QSYS.LIB file system (as well as some additional objects).7 As a result, QSYS.LIB is organized as shown in Figure 1.4. In the discussion that follows, we'll consider only QSys objects that is, those objects contained in library QSys or in one of the libraries contained in QSys. (QSys objects are simply those objects in the QSYS.LIB file system.) QSys object names are generally up to 10 alphanumeric characters, beginning with a letter or a national character (e.g., $, #, @ in the United States). Thus, Customer might be the name of the customer master file object, and AppDta might be the name of the library that contains your application database files. A QSys object's qualified name is the combination of the name of the library that contains the object and the object's unqualified name, separated with a forward slash (/). For example, AppDta/Customer would be the qualified name of the customer master file if the file were stored in library AppDta. We've already touched on the notion of object type, and in QSys each object type is designated by a special value, such as *Pgm for program or *File for file. Some object types are further broken down into subtypes. The *File object type, for instance, includes physical files, logical files, printer files, display files, and communications files all of which are kinds of record-oriented sources or targets for external program data. In the rest of this chapter, we generally use the simple term "file" when it's clear from the context that we're talking about a physical or logical file. When it's necessary to refer to another specific type of file, we use an unambiguous term, such as "printer file." A QSys object is uniquely identified by the combination of its qualified name and its object type, which means OS/400 allows only one object on an AS/400 that has a given combination of library name, object name, and object type. As a result, you can have a Customer *File object in the AppDta library and another Customer *File object in the TstDta library, but you can't have two files with the same name in the same library. You can have two objects with the same name in the same library as long as they aren't the same type, but this practice is not a wise one, because of the potential for confusion. In general, you should give unique names to all the objects in the same library, regardless of their types. Each QSys object is owned by a user profile another type of AS/400 object. Each user profile stores information about a system user, including the user's name, password, and authority to access data or use system functions. Whenever you sign on, you supply a user profile name and password, and this lets OS/400 control your use of the system, including access to the database. Chapter 9 covers database security in more detail; the main things to know as you learn about creating and accessing database files are that each file object has a user profile designated as its owner and all access to the database is controlled based on the authority granted to one or more user profiles. Files, Record Formats, and Members Let's recap a couple of fundamental points we've covered: Everything on the AS/400 is stored in one of the 11 IFS file systems, and file system QSYS.LIB is where UDB/400 database file objects are stored. In simple terms, a database file object is a named collection of records. In the typical example depicted in Figure 1.5, the Customer file would contain a record for each of the company's customers. A record is a collection of fields, which are named items of data such as CustID (customer ID) and Name (customer name) that represent attributes of some item (e.g., a customer) of interest to the organization. (In a moment, you'll see how UDB/400 uses database files to implement SQL tables.) Your applications store business data in records in database files and subsequently read and update those records as needed. An OS/400 file object contains the object-header information we discussed previously. For files, this header is often referred to as the file description and includes a description of the file's record format the byte-by-byte layout of all the fields in the file's records.8 (Some kinds of logical files, which we discuss in the next section, have multiple record formats.) Figure 1.6 depicts a simple record format for the Customer file, along with a sample record. The record format is how your applications know where specific elements of data should be placed in a record that's to be added to the file or where specific elements of data should be retrieved from a record that's been read from the file. Once you've defined the record format for a file, your applications can reference all fields by their field names (e.g., Status); you don't have to worry about specific byte locations or other low-level storage details. You can organize the data in a particular file into one or more file members, each with its own member name.9 Single-member files are the most common organization you'll encounter in AS/400 applications, and typically the one member has the same name as the file (and, of course, holds all the file's data). When you use SQL to create a UDB/400 file, the file always has a single member with the same name as the file. Although multimember files are not commonly used with SQL applications, it's not uncommon to encounter older AS/400 applications that use multimember files and access these files with conventional HLL I/O statements (rather than SQL). One example of how a multimember file might be used is to store sales data in a file named Sale with one member for each year (e.g., Sale1999, Sale2000, and so on). UDB/400 provides features that let an application specify which file member should be used when the application is run. Thus, an application to display sales data could be run on the data for any particular year, just by specifying which file member to use. Despite the fact that you can't create a multimember file with SQL, you can use SQL to access any member of an existing multimember file. In a typical SQL application, however, you would normally partition data by using multiple (single-member) files rather than multiple members. For example, you might have files named Sale1999, Sale2000, and so forth. The effect is more or less equivalent to using multimember files. Physical and Logical Files As we mentioned earlier, there are two types of UDB/400 files:
You store application data in physical files. This is where the actual bytes are that represent numbers, text, and other kinds of information. UDB/400 takes care of low-level details such as reading and writing disk sectors. Application programs and database utility programs see the data in a physical file member as a sequence of records, as in Figure 1.7. Each record occupies a unique location in a member, and the records are not necessarily in any order based on their content. A record's location is identified by its relative record number (RRN), which starts at 1 for the first record in the member and increases by 1 for each location. When you delete a record, UDB/400 sets on an internal "deleted record" flag in the record's location. When you insert a new record, UDB/400 puts it either in the first available location with a "deleted record" flag set or after the last record in the file. As more space is needed for additional records, UDB/400 dynamically expands the file member size. A physical file always has just one record format, and all records in the same physical file (regardless of how many members the file has) have the same record layout. For many business applications, all the fields in a record will have a fixed length, and the resulting record layout consequently has a fixed length as well. UDB/400 also supports variable-length fields and records, a topic we'll take up in Chapter 3. Logical files provide an alternative way to access data in one or more physical files. You can use a logical file to
Figure 1.8 provides a conceptual view of the relationship between a logical file and a physical file. It's important to understand that logical files have no data in them; data is always stored in physical file members. Logical files do have members, however. For each logical file member, you specify which physical file member (or members) it spans. Logical files also have record formats. Although most logical files have a single record format, logical files can have multiple formats. Multiformat logical files aren't widely used any more, and SQL doesn't support them, so we don't cover them in this book. Subsequent chapters describe the relationship between logical and physical files in more detail in particular, the relationship between SQL views (which are logical files) and tables (which are physical files). File and Field Descriptions One thing that distinguishes UDB/400 from a traditional operating-system file-management facility is that every UDB/400 file object contains a description of itself. The file description includes the following items:
When you compile an AS/400 HLL program, the compiler reads the file descriptions for any files you declare in your programs. AS/400 HLLs have extended I/O-related statements or functions that take advantage of the fact that the compiler has this file information. For example, you don't have to declare a record layout in your RPG input specifications or your Cobol Data Division with the appropriate file declaration in your program, the compiler automatically generates RPG or Cobol source code for the file's record layout. As another example, the compiler can automatically use the correct fields for keyed record access (e.g., by customer ID) based on the key field (or fields) you define for a file. In addition, utility programs (e.g., report generators) can use file descriptions to determine a file's layout and keyed sequence (if any) without requiring the end user to enter anything other than the file's name. One of the most useful parts of the file description is the record format, which stores the following information for each field in the record:
A file's descriptive information is actually stored in two places: the file object's header and the SQL catalog, which is a set of system files. The SQL catalog is the ANSI-standard means of storing descriptions of database objects and was added to UDB/400 several releases after the AS/400 was first introduced. In case you're wondering, this is the reason UDB/400 supports two, somewhat redundant, mechanisms to store a file's descriptive information. UDB/400 makes sure that no matter how you create or change a file definition, consistent information is maintained in both places. Access Paths and Indexes So far, we've seen that UDB/400 stores data as a sequence of records in a physical file. UDB/400 also provides a variety of ways to access records. The two most important concepts are
An access path describes the order in which records can be retrieved. There are two types of UDB/400 access paths: arrival-sequence access path and keyed-sequence access path. An arrival-sequence access path is the order of records as they're stored in the database (i.e., by relative record number).
A keyed-sequence access path is the order of records based on ascending or descending values in one or more key fields that you specify when you create a physical file or when you create a keyed logical file based on the physical file. UDB/400 also supports access paths that select a subset of the records in a physical file. For example, an access path might include just those customer records with a ShipCity value of "Seattle". Internally, UDB/400 maintains one index per file member for any physical or logical file that has key fields, as well as for some logical files that specify record selection. Internal indexes are stored as part of an OS/400 file object. The file description includes the index description, if the file has an index.
In simple terms, an index includes an entry for each record in the file, and each entry has the record's key-field value (or values) and RRN. UDB/400 stores index entries in a way that makes it very fast to look up a key value and then use the associated RRN to retrieve the record. UDB/400 can also step through an index in order of the key values, using the series of associated RRNs to retrieve records in key sequence.10
As you might expect, your programs can always use an arrival-sequence access path to access records in a physical file; no special coding is required when you create the file. You can also define one keyed access path as part of a physical file, and UDB/400 will create an index for each member of the physical file. (Note that the definition of the keyed access path is the same for all a physical file's members, but each member has its own index.)
Using logical files, you can have multiple access paths for the same data in a physical file and, thus, can access it in various ways. Each logical file can have one keyed access-path definition. So, for example, if you wanted to retrieve customers in order by name or by address, you could use two logical files (assuming you didn't use either of these fields for the physical file's keyed access path, which would eliminate the need for one of the logical files). Like physical files, each logical file member has its own internal index, if the file has a keyed access path. Using SQL, you aren't required to create an index to retrieve records in a particular order. If no appropriate index exists, UDB/400 will either create a temporary index or sort the records on the fly.
You can read and write UDB/400 data with either the sequential access method or the direct access method.11 With sequential access, your program essentially performs a series of "read next record" operations to retrieve records. If you use an arrival-sequence access path, your program receives records in their physical order (UDB/400 automatically skips "deleted record" locations). If you use a keyed access path, your program receives records in the order defined by the key fields. With direct access, you specify either an RRN or a specific key value, and UDB/400 returns the specific record you've identified (if one exists, of course).
As you insert, delete, or update records in a physical file member, UDB/400 maintains the necessary entries in any indexes that exist for keyed access paths (this includes logical file members over the physical file member). Although you can tune database performance by choosing from several alternative methods of index maintenance, in general your applications can count on all keyed access paths reflecting the current contents of the database. Creating Files
The AS/400 provides three main ways to create UDB/400 files:12
Creating Files with SQL
With SQL, you can use the following statements to create and revise UDB/400 files: Figure 1.9 (below) shows an SQL Create Table statement to create a Customer table. Figure 1.9
In standard SQL terminology, a base table (or simply table) is the database object that actually contains the data. In UDB/400, an SQL table is a single-member physical file. An SQL view provides an alternative way to access data in one or more tables and in UDB/400 is a single-member logical file. An SQL index provides a keyed access path that can be used to improve data-access performance. In UDB/400, an SQL index is also a single-member logical file. (Chapter 3 explains in more detail how SQL tables, views, and indexes correspond to physical and logical files.)
The Create View and Create Index SQL statements create the respective objects. You use the Alter Table statement to change a table definition; to change views and indexes, you simply re-create them. The Comment On and Label On statements let you add comments and labels for tables, views, and indexes.
Consistent with the table-oriented terminology, SQL refers to records as rows and to fields as columns. Figure 1.10 shows a conceptual perspective of an SQL table.
Not surprisingly, this looks just like the table-like presentation of the Customer physical file in Figure 1.5, except for the use of "Columns" and "Rows" instead of "Fields" and "Records." Once we dive into SQL in the next chapter, we'll generally use the table-oriented terminology. In this chapter, we've mostly discussed the underlying AS/400 and UDB/400 architecture using file-oriented terminology because the AS/400 was originally designed with file objects, and the file-oriented terminology is still used today by many OS/400 Control Language (CL) commands and much of the documentation. Even when you're working with SQL, it's important to understand UDB/400 file concepts.
You can enter SQL statements in several ways. If you have the DB2 Query Manager and SQL Development Kit for AS/400 (SQL Development Kit for short) product installed, you can execute SQL statements either interactively using the Interactive SQL (ISQL) utility that comes with the SQL Development Kit or as embedded statements compiled into an HLL program (again using a feature of the SQL Development Kit). The latter approach presents an interesting aspect of UDB/400's support for SQL. Although you must buy the SQL Development Kit to get the interactive SQL interface and the facility that lets you embed SQL statements in HLL programs, all AS/400s include in UDB/400 the ability to run compiled HLL programs that were created using embedded SQL. You do not need the SQL Development Kit product on an AS/400 just to run an application that uses SQL. Creating Files with Operations Navigator Beginning with V4R4, IBM's Client Access Express AS/400-to-PC connectivity product also provides a Windows-based tool, Operations Navigator, that has a graphical interface for creating new database objects. Figure 1.11 shows a sample Operations Navigator New Table dialog box.
We cover Operations Navigator features in a bit more detail in Chapter 2. Creating Files with DDS
When the AS/400 was first introduced, IBM provided the proprietary Data Description Specifications (DDS) language to, as its name suggests, describe file data. You use a source code editor, such as
Source Entry Utility (SEU) (covered in Appendix D), to enter DDS statements into a source file member.13 Figure 1.12 (below) shows some of the DDS source code for the Customer physical file. Figure 1.12
The syntax of DDS is fairly simple. Each line is split into a number of fixed-width columns. The last column (positions 45-80) provides a free-format area where you can place keyworded entries. You can learn about DDS facilities to define physical and logical files by consulting the resources listed at the end of this chapter.
After you've entered the DDS to define a file, you execute one of the following OS/400 CL commands to create the file object:
Figure 1.13 illustrates the process of creating a new file object from DDS. To change an existing file definition, you edit its DDS source and then execute a ChgPf (Change Physical File) command. Most AS/400 development installations also have the IBM Application Development ToolSet product installed, which provides a utility known as Programming Development Manager (PDM). PDM provides a list-based, interactive interface for working with AS/400 libraries, objects, and source file members. PDM options are available to simplify execution of the CrtPf and CrtLf commands. Appendix C provides an introduction to PDM.
Whether you create a database file with SQL, Operations Navigator, or DDS, you get the same type of OS/400 object a physical or logical file.14 What's more, you can create a file with Operations Navigator or DDS and then read and update the contents with SQL or create a file with SQL or Operations Navigator and read and update the contents with built-in HLL I/O operations. Because UDB/400 includes as part of the "native" file support features that SQL requires, there is no need to add a separate DBMS layer just to support applications written with SQL. Accessing Files from HLL Programs
When you want to access UDB/400 data, your application performs three steps: In some cases, you use explicit HLL statements, such as Open and Close in RPG or Cobol, to open and close a file member. In other cases, such as with RPG's built-in cycle, the HLL runtime does this for you automatically.
Note:
Many types of SQL statements for example, some forms of the Update statement open and close the appropriate file member implicitly. SQL also provides a construct known as a cursor, which your application opens and closes explicitly, again with the result that a file member is opened and closed.
Whether you open a file member implicitly or explicitly, UDB/400 creates a temporary internal control structure known as an open data path (ODP), which your program uses to access the records in a file member. UDB/400 uses information from the file description (e.g., the record layout and access path description) to set up an ODP. Then, during your program's execution, HLL or SQL runtime routines and UDB/400 use the ODP for purposes such as keeping track of your position in the file and locking records to avoid conflicting updates by multiple users. To understand some of UDB/400's more advanced features, you need to understand the role of ODPs; however, for most of the topics covered in this book, the ODP is taken for granted, and we can treat UDB/400 files as if HLL and SQL operations operated on them directly, rather than through the ODP. Chapter Summary
A database is a set of computer files used to store business information. A database management system (DBMS) is the system software for creating database files and updating their contents. DB2 Universal Database for AS/400 (UDB/400) is the AS/400's integrated DBMS and is part of OS/400, the AS/400's operating system.
Everything stored on an AS/400 is stored in the AS/400's integrated file system (IFS), which is organized as a hierarchical directory structure that includes 11 distinct file systems. The QSYS.LIB file system contains all AS/400 programs and the database files used by UDB/400.
Everything in OS/400 is an object. More than 80 OS/400 object types exist, including programs and database files. An object contains a header and the actual content (e.g., program instructions, file data) of the object. OS/400 lets you use objects only through commands or other system interfaces that are valid for the specific type of object. Library objects (which are used only in the QSYS.LIB file system) contain other types of OS/400 objects, except other library objects. User profile objects own objects. In the QSYS.LIB file system, an object is uniquely identified by its library name, object name, and object type.
Two types of UDB/400 database file objects exist: physical files, which contain data, and logical files, which provide alternative ways to access data in physical files. File objects contain a file description, which includes the record format (or formats) of the records in the file. The record format describes the type, length, and other attributes of the fields in a record. The data in a physical file is organized into one or more members, all of which have the same record format. Logical files also have members; each logical file member provides access to the data in one or more underlying physical file members.
UDB/400 has two types of access paths: arrival sequence, which orders records by their relative location in a physical file member, and keyed, which orders records by values in the records' key fields. You can read and write records sequentially (in order of the access path) or directly (by specific relative record number or key value).
You can create database files using SQL statements, Operations Navigator dialog boxes, or Data Description Specifications (DDS). Files created in any of these ways can generally be used interchangeably.
SQL uses a table-oriented terminology to describe database objects. A table contains data, a view provides an alternative way to access data in one or more tables, and an index provides a keyed access path that can be used to improve data-access performance. On the AS/400, when you create an SQL table, UDB/400 creates a physical file; for a view or SQL index, UDB/400 creates a logical file.
To access UDB/400 data from a high-level language (HLL) program, you open a file member, process the records, and then close the file member. When you open a file member, UDB/400 creates an open data path (ODP) that your program and UDB/400 use to keep track of the file position and other runtime information.
Exercises
Footnotes
If you have a question about our products or about an order you have placed, please contact the customer service department at (800) 650-1804 or (970) 203-2914 or via e-mail. If you have a technical question about this site, please contact us. Terms & Conditions | Privacy Policy Copyright © 2009 Penton Technology Media Loveland. |