Chapter 15 - Backup Basics

The most valuable component of any computer system isn’t the hardware or software that runs the computer but, rather, the data that resides on the system. If a system failure or disaster occurs, you can replace the computer hardware and software that runs your business. Your company’s data, however, is irreplaceable. For this reason, it’s critical to have a good backup and recovery strategy. Companies go out of business when their data can’t be recovered.

What should you be backing up? The simple answer to this question is that you should back up everything. A basic rule of backup and recovery is that if you don’t save it, it doesn’t get restored. However, you may have some noncritical data (e.g., test data) on your system that doesn’t need to be restored and can be omitted from your backup.

When and how often do you need to back up? Ideally, saving your entire system every night is the simplest and safest backup strategy. This approach also gives you the simplest and safest strategy for recovery. Realistically, though, when and how you run your backup, as well as what you back up, depend on the size of your backup window: the amount of time your system can be unavailable to users while you perform a backup. To simplify recovery, you need to back up when your system is at a known point and your data isn’t changing.

When you design a backup strategy, you need to balance the time it takes to save your data with the value of the data you might lose and the amount of time it may take to recover. Always keep your recovery strategy in mind as you design your backup strategy. If your system is so critical to your business that you don’t have a manageable backup window, you probably can’t afford an unscheduled outage either. If this is your situation, you should seriously evaluate the availability options of the iSeries, including dual systems.
For more information about these options, see “Availability Options.”

Designing and Implementing a Backup Strategy

You should design your backup strategy based on the size of your backup window. At the same time you design your backup strategy, you should also design your recovery strategy to ensure that your backup strategy meets your system recovery needs. The final step in designing a backup strategy is to test a full system recovery. This is the only way to verify that you’ve designed a good backup strategy that will meet your system recovery needs. Your business may depend on your ability to recover your system. You should test your recovery strategy at your recovery services provider’s location.

When designing your backup and recovery strategy, think of it as a puzzle: The fewer pieces you have in the puzzle, the more quickly you can put the pieces of the puzzle together. The fewer pieces needed in your backup strategy, the more quickly you can recover the pieces.

Your backup strategy will typically be one of three types: simple, medium, or complex.
A simple way to ensure you have a good backup of your system is to use the options provided on menu SAVE (Figure 15.1), which you can reach by typing Go Save on a command line. This command presents you with additional menus that make it easy either to back up your entire system or to split your entire system backup into two parts: system data and user data. In the following discussion of backup strategies, the menu options I refer to are from menu SAVE.

Implementing a Simple Backup Strategy

The simplest backup strategy is to save everything daily whenever there is no system activity. You can use SAVE menu option 21 (Entire system) to completely back up your system (with the exception of queue entries such as spooled files). You should also consider using this option to back up the entire system after installing a new release, applying PTFs, or installing a new licensed program product. As an alternative, you can use SAVE menu option 22 (System data only) to save just the system data after applying PTFs or installing a new licensed program product.

Option 21 offers the significant advantage that you can schedule the backup to run unattended (with no operator intervention). Keep in mind that unattended save operations require you to have a tape device capable of holding all your data. (For more information about backup media, see “Preparing and Managing Your Backup Media.”)

Even if you don’t have enough time or enough tape-device capability to perform an unattended save using option 21, you can still implement a simple backup strategy:

Daily backup: Back up only user data that changes frequently.

A simple backup strategy may also involve SAVE menu option 23 (All user data). This option saves user data that can change frequently. You can also schedule option 23 to run without operator intervention.

If your system has a long period of inactivity on weekends, your backup strategy might look like this:

Friday night: Entire system (option 21)

Implementing a Medium Backup Strategy

You may not have a large enough backup window to implement a simple backup strategy. For example, you may have large batch jobs that take a long time to run at night or a considerable amount of data that takes a long time to back up. If this is your situation, you’ll need to implement a backup and recovery strategy of medium complexity.

When developing a medium backup strategy, keep in mind that the more often your data changes, the more often you need to back it up. You’ll therefore need to evaluate in detail how often your data changes. Several methods are available to you in developing a medium backup strategy: saving changed objects; journaling objects and saving the journal receivers; and saving groups of user libraries, folders, or directories.
You can use one or a combination of these methods.

Saving changed objects. Several commands let you save only the data that has changed since your last save operation or since a particular date and time.

You can use the SavChgObj (Save Changed Objects) command to save only those objects that have changed since a library or group of libraries was last saved or since a particular date and time. This approach can be useful if you have a system environment in which program objects and data files exist in the same library. Typically, data files change very frequently, while program objects change infrequently. Using the SavChgObj command, you can save just the data files that have changed.

The SavDLO (Save Document Library Objects) command lets you save documents and folders that have changed since the last save or since a particular date and time. You can use SavDLO to save changed documents and folders in all your user auxiliary storage pools (ASPs) or in a specific user ASP.

You can use the Sav (Save) command to save only those objects in directories that have changed since the last save or since a particular date or time.

You can also choose to save only your changed data, using a combination of the SavChgObj, SavDLO, and Sav commands, if the batch workload on your system is heavier on specific days of the week. For example:
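As a hedged illustration of the three changed-object commands described above, the following sketch saves whatever has changed since the previous save. The tape device name TAP01 is a placeholder, and the choice of Lib(*AllUsr) and of the omitted file systems is an assumption made for illustration, not a recommendation from the text.

```cl
/* Sketch only: TAP01 is a placeholder tape device name.         */

/* Objects in user libraries that changed since each library     */
/* was last saved with SavLib (RefDate(*SavLib) is the default)  */
SavChgObj  Obj(*All) Lib(*AllUsr) Dev(TAP01) +
             RefDate(*SavLib)

/* Documents and folders that changed since the last complete    */
/* SavDLO save                                                   */
SavDLO     DLO(*Chg) Dev(TAP01)

/* Objects in directories that changed since they were last      */
/* saved with UpdHst(*Yes)                                       */
Sav        Dev('/QSYS.LIB/TAP01.DEVD') +
             Obj(('/*') +
                 ('/QSYS.LIB' *Omit) +
                 ('/QDLS' *Omit)) +
             ChgPeriod(*LastSave)
```

Because the Sav command here leaves UpdHst at its default of *No, each run picks up everything changed since the last save that did update the history, which keeps recovery to one full save plus one changed-object save. Adding UpdHst(*Yes) would instead make each run capture only the changes since the previous run, which shortens the save but adds pieces to the recovery.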
Journaling objects and saving the journal receivers. If your save operations take too long because your files are large, saving changed objects may not help in your system environment. For instance, if you have a file member with 100,000 records and one record changes, the SavChgObj command saves the entire file member. In this environment, journaling your database files and saving the journal receivers regularly may be a better solution. However, keep in mind that this approach will make your recovery more complex. When you journal a database file, the system writes a copy of every changed record to a journal receiver. When you save a journal receiver, you’re saving only the changed records in the file, not the entire file. If you journal your database files and have a batch workload that varies, your backup strategy might look like this:
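A minimal sketch of the journaling approach described above might look like the following. All object names (APPDATA, ORDERS, JRNLIB, RCVLIB, APPJRN, APPRCV0001) and the device TAP01 are hypothetical, and the receivers are placed in their own library purely as an illustration choice.

```cl
/* Sketch only: every library, file, journal, receiver, and      */
/* device name below is hypothetical.                            */

/* One-time setup: create a receiver and a journal, then start   */
/* journaling a physical file                                    */
CrtJrnRcv  JrnRcv(RCVLIB/APPRCV0001)
CrtJrn     Jrn(JRNLIB/APPJRN) JrnRcv(RCVLIB/APPRCV0001)
StrJrnPF   File(APPDATA/ORDERS) Jrn(JRNLIB/APPJRN) +
             Images(*Both)

/* Regularly: detach the current receiver, attach a new          */
/* system-named one, then save the receivers                     */
ChgJrn     Jrn(JRNLIB/APPJRN) JrnRcv(*Gen)
SavObj     Obj(*All) Lib(RCVLIB) Dev(TAP01) +
             ObjType(*JrnRcv)
```

After starting journaling, save the journaled file itself once, so that the saved journal entries can later be applied to a restored copy of the file.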
To take full advantage of journaling protection, you should detach and save the journal receivers regularly. The frequency with which you save the journal receivers depends on the number of journaled changes that occur on your system. Saving the journal receivers several times during the day may be appropriate for your system environment. The way in which you save journal receivers depends on whether they reside in a library with other objects. Depending on your environment, you’ll use either the SavLib (Save Library) command or the SavObj (Save Object) command. It’s best to keep your journal receivers isolated from other objects so that your save/restore functions are simpler.

Be aware that you must save a new member of a database file before you can apply journal entries to the file. If your applications regularly add new file members, you should consider using the SavChgObj strategy either by itself or in combination with journaling.

Saving groups of user libraries, folders, or directories. Many applications are set up with data files and program objects in different libraries. This design simplifies your backup and recovery procedures. Data files change frequently, and, on most systems, program objects change infrequently. If your system environment is set up like this, you may want to save only the libraries with data files on a daily basis. You can also save, on a daily basis, groups of folders and directories that change frequently.

Implementing a Complex Backup Strategy

If you have a very short backup window that requires a complex strategy for backup and for recovery, you can use some of the same techniques described for a medium backup strategy, but with a greater level of detail. For example, you may need to save specific critical files at specific times of the day or week.

Several other methods are available to you in developing a complex backup strategy. You can use one or a combination of these methods: saving data concurrently using multiple tape devices, saving data in parallel using multiple tape devices, and save-while-active.
Before you use any of these methods, you must have a complete backup of your entire system.

Saving data concurrently using multiple tape devices. You can reduce the amount of time your system is unavailable by performing save operations on more than one tape device at a time. For example, you can save libraries to one tape device, folders to another tape device, and directories to a third tape device. Or you can save different sets of libraries, objects, folders, or directories to different tape devices. Later, I provide more information about saving data concurrently using multiple tape devices.

Saving data in parallel using multiple tape devices. Starting with V4R4, you can perform a parallel save using multiple tape devices. A parallel save is intended for very large objects or libraries. With this method, the system “spreads” the data in the object or library across multiple tape devices. (This function is implemented with IBM’s Backup, Recovery and Media Services product; for more information about it, see “Backup, Recovery and Media Services (BRMS) Overview” [Chapter 16].)

Save-While-Active. The save-while-active process can significantly reduce the amount of time your system is unavailable during a backup. If you choose to use save-while-active, make sure you understand the process and monitor for any synchronization checkpoints before making your objects available for use. I provide more details about save-while-active later.

An Alternative Backup Strategy

Another option available to help implement your backup strategy is the Backup, Recovery and Media Services licensed program product. BRMS is IBM’s strategic OS/400 backup and recovery product on the iSeries and AS/400. BRMS is a comprehensive tool for managing the backup, archiving, and recovery environment for one or more servers in a site or across a network in which data exchange by tape is required.
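For more information about using BRMS to implement your backup strategy, see “Backup, Recovery and Media Services (BRMS) Overview.” [Chapter 16]

The Inner Workings of Menu SAVE

Menu SAVE contains many options for saving your data, but four are primary:

Option 20: Define save system and user data defaults
Option 21: Entire system
Option 22: System data only
Option 23: All user data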
You can use these menu options to back up your system. Or, if your installation requires a more complex backup strategy, you can use OS/400’s save commands in a CL program to customize your backup. To help you make your decision, as well as to provide skeleton code that you can use as a guideline for your own backup programs, this section provides a look at some of the inner workings of these primary save options. For detailed instructions and a checklist on using these options, refer to OS/400 Backup and Recovery (SC41-5304).

Figure 15.2 illustrates the save commands and the SAVE menu options you can use to save the parts of the system and the entire system.

Entire System (Option 21)

SAVE menu option 21 lets you perform a complete backup of all the data on your system, with the exception of backing up spooled files (I cover spooled file backup later). This option puts the system into a restricted state. This means no users can access your system while the backup is running. It’s best to run this option overnight for a small system or during the weekend for a larger system.

Option 21 runs program QMNSave. The following CL program extract represents the significant processing that option 21 performs:
EndSbs   Sbs(*All) Option(*Immed)
ChgMsgQ  MsgQ(QSysOpr) +
           Dlvry(*Break)    /* or Dlvry(*Notify) */
SavSys
SavLib   Lib(*NonSys) AccPth(*Yes)
SavDLO   DLO(*All) Flr(*Any)
Sav      Dev('/QSYS.LIB/TapeDeviceName.DEVD') +
           Obj(('/*') +
               ('/QSYS.LIB' *Omit) +
               ('/QDLS' *Omit)) +
           UpdHst(*Yes)
StrSbs   SbsD(ControllingSubsystem)
Note: The Sav command omits the QSys.Lib file system because the SavSys (Save System) command and the SavLib Lib(*NonSys) command save QSys.Lib. The Sav command also omits the QDLS file system because the SavDLO command saves QDLS.

System Data Only (Option 22)

Option 22 saves only your system data. It does not save any user data. You should run this option (or option 21) after applying PTFs or installing a new licensed program product. Like option 21, option 22 puts the system into a restricted state.

Option 22 runs program QSRSavI. The following program extract represents the significant processing that option 22 performs:
EndSbs   Sbs(*All) Option(*Immed)
ChgMsgQ  MsgQ(QSysOpr) +
           Dlvry(*Break)    /* or Dlvry(*Notify) */
SavSys
SavLib   Lib(*IBM) AccPth(*Yes)
Sav      Dev('/QSYS.LIB/TapeDeviceName.DEVD') +
           Obj(('/QIBM/ProdData') +
               ('/QOpenSys/QIBM/ProdData')) +
           UpdHst(*Yes)
StrSbs   SbsD(ControllingSubsystem)
All User Data (Option 23)

Option 23 saves all user data, including files, user-written programs, and all other user data on the system. This option also saves user profiles, security data, and configuration data. Like options 21 and 22, option 23 places the system in restricted state.

Option 23 runs program QSRSavU. The following program extract represents the significant processing that option 23 performs:
EndSbs     Sbs(*All) Option(*Immed)
ChgMsgQ    MsgQ(QSysOpr) +
             Dlvry(*Break)    /* or Dlvry(*Notify) */
SavSecDta
SavCfg
SavLib     Lib(*AllUsr) AccPth(*Yes)
SavDLO     DLO(*All) Flr(*Any)
Sav        Dev('/QSYS.LIB/TapeDeviceName.DEVD') +
             Obj(('/*') +
                 ('/QSYS.LIB' *Omit) +
                 ('/QDLS' *Omit) +
                 ('/QIBM/ProdData' *Omit) +
                 ('/QOpenSys/QIBM/ProdData' *Omit)) +
             UpdHst(*Yes)
StrSbs     SbsD(ControllingSubsystem)
Note: The Sav command omits the QSys.Lib file system because the SavSys command, the SavSecDta (Save Security Data) command, and the SavCfg (Save Configuration) command save QSys.Lib. The Sav command also omits the QDLS file system because the SavDLO command saves QDLS. In addition, the Sav command executed by option 23 omits the /QIBM/ProdData and /QOpenSys/QIBM/ProdData directories because these directories contain IBM-supplied objects.

Setting Save Option Defaults

When you save information using option 21, 22, or 23, you can specify default values for some of the commands used by the save process. Figure 15.3 shows the Specify Command Defaults panel values used by these options. You can use SAVE menu option 20 (Define save system and user data defaults) to change the default values displayed on this panel for menu options 21, 22, and 23. Changing the defaults simplifies the task of setting up your backups. To change the defaults, you must have *Change authority to both library QUsrSys and the QSRDflts data area in QUsrSys.

When you select option 20, the system displays the default parameter values for options 21, 22, and 23. The first time you use option 20, the system displays the IBM-supplied default parameter values. You can change any or all of the parameter values to meet your needs. For example, you can specify additional tape devices or change the message queue delivery default. The system saves the new default values in data area QSRDflts in library QUsrSys for future use (the system creates QSRDflts only after you change the IBM-supplied default values).

Once you’ve defined new default values, you no longer need to worry about which, if any, options to change on subsequent backups. You can simply review the new default options and then press Enter to start the backup using the new default parameters.
If you have multiple, distributed systems with the same save parameters on each system, option 20 offers an additional benefit: You can simply define your default parameters using option 20 on one system and then save data area QSRDflts in library QUsrSys, distribute the saved data area to the other systems, and restore it.

Printing System Information

When you perform save operations using option 21, 22, or 23 from menu SAVE, you can optionally request a series of reports with system information that can be useful during system recovery. The Specify Command Defaults panel presented by these options provides a prompt for printing system information. You can also use command PrtSysInf (Print System Information) to print the system information. This information is especially useful if you can’t use your SavSys media to recover and must use your distribution media.

Printing the system information requires *AllObj, *IOSysCfg, and *JobCtl authority and produces many spooled file listings. You probably don’t need to print the information every time you perform a backup. However, you should print it whenever important information about your system changes.

The following lists and reports are generated when you print the system information (the respective CL commands are noted in parentheses):
Saving Data Concurrently Using Multiple Tape Devices

As I mentioned earlier, one way to reduce the amount of time required for a complex backup strategy is to perform save operations to multiple tape devices at once. You can save data concurrently using multiple tape devices by saving libraries to one tape device, folders to another tape device, and directories to a third tape device. Or, you can save different sets of libraries, objects, folders, or directories to different tape devices.

Concurrent Saves of Libraries and Objects

You can run multiple save commands concurrently against multiple libraries. When you run multiple save commands, the system processes the request in several stages that overlap, improving save performance. To perform concurrent save operations to different tape devices, you can use the OmitLib (Omit library) parameter with generic naming. For example:

SavLib  Lib(*AllUsr) +
          Dev(FirstTapeDevice) +
          OmitLib(A* B* $* #* @* ... L*)

SavLib  Lib(*AllUsr) +
          Dev(SecondTapeDevice) +
          OmitLib(M* N* ... Z*)
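The two commands above overlap only if they run in separate jobs. One way to arrange that, offered here as an assumption rather than something the text prescribes, is to submit each save as its own batch job with the SbmJob (Submit Job) command. TAP01 and TAP02 are placeholder device names, the job names are hypothetical, and the generic names are written out in full so that each job omits exactly the libraries the other job saves.

```cl
/* Sketch only: device and job names are placeholders.           */

/* Job 1 omits A through L plus $, #, and @, so it saves the     */
/* user libraries whose names start with M through Z             */
SbmJob     Cmd(SavLib Lib(*AllUsr) Dev(TAP01) +
                 OmitLib(A* B* C* D* E* F* G* H* I* J* K* L* +
                         $* #* @*)) +
             Job(SAVLIB1)

/* Job 2 omits M through Z, so it saves the remaining user       */
/* libraries                                                     */
SbmJob     Cmd(SavLib Lib(*AllUsr) Dev(TAP02) +
                 OmitLib(M* N* O* P* Q* R* S* T* U* V* W* X* +
                         Y* Z*)) +
             Job(SAVLIB2)
```

This sketch assumes the system is not in a restricted state, since submitted batch jobs need an active subsystem, and that the job queue receiving the jobs allows at least two of them to be active at once. Otherwise the second save simply waits for the first to finish.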
You can also save a single library concurrently to multiple tape devices by using the SavObj or SavChgObj command. This technique lets you issue multiple save operations using multiple tape devices to save objects from one large library. For example, you can save generic objects from one large library to one tape device and concurrently issue another SavObj command against the same library to save a different set of generic objects to another tape device.

You can use generic naming on the Obj (Object) parameter while performing concurrent SavChgObj operations to multiple tape devices against a single library. For example:

SavChgObj  Obj(A* B* C* $* #* ... L*) +
             Dev(FirstTapeDevice) +
             Lib(LibraryName)

SavChgObj  Obj(M* N* O* ... Z*) +
             Dev(SecondTapeDevice) +
             Lib(LibraryName)
Concurrent Saves of DLOs (Folders)

You can run multiple SavDLO commands concurrently for DLO objects that reside in the same ASP. This technique allows concurrent saves of DLOs to multiple tape devices. You can use the command’s Flr (Folder) parameter with generic naming to perform concurrent save operations to different tape devices. For example:

SavDLO  DLO(*All) +
          Flr(DEPT*) +
          Dev(FirstTapeDevice) +
          OmitFlr(DEPT2*)

SavDLO  DLO(*All) +
          Flr(DEPT2*) +
          Dev(SecondTapeDevice)
In this example, the system saves to the first tape device all folders starting with DEPT except those that start with DEPT2. Folders that start with DEPT2 are saved to the second tape device.

Note: Parameter OmitFlr is allowed only when you specify DLO(*All) or DLO(*Chg).

Concurrent Saves of Objects in Directories

You can also run multiple Sav commands concurrently against objects in directories. This technique allows concurrent saves of objects in directories to multiple tape devices. You can use the Sav command’s Obj (Object) parameter with generic naming to perform concurrent save operations to different tape devices. For example:

Sav  Dev('/QSYS.LIB/FirstTapeDevice.DEVD') +
       Obj(('/DIRA*')) +
       UpdHst(*Yes)

Sav  Dev('/QSYS.LIB/SecondTapeDevice.DEVD') +
       Obj(('/DIRB*')) +
       UpdHst(*Yes)
Save-While-Active

To either reduce or eliminate the amount of time your system is unavailable for use during a backup (your backup outage), you can use the save-while-active process on particular save operations along with your other backup and recovery procedures. Save-while-active lets you use the system during part or all of the backup process. In contrast, other save operations permit either no access or only read access to objects during the backup.

How Does Save-While-Active Work?

OS/400 objects consist of units of storage called pages. When you use save-while-active to save an object, the system creates two images of the pages of the object. The first image contains the updates to the object with which normal system activity works. The second image is a “snapshot” of the object as it exists at a single point in time called a checkpoint. The save-while-active job uses this image, called the checkpoint image, to save the object.

When an application makes changes to an object during a save-while-active job, the system uses one image of the object’s pages to make the changes and, at the same time, uses the other image to save the object to tape. The system locks objects as it obtains the checkpoint images, and you can’t change objects during the checkpoint processing. After the system has obtained the checkpoint images, applications can once again change the objects.

The image that the system saves doesn’t include any changes made during the save-while-active job. The image on the tape is an image of the object as it existed when the system reached the checkpoint. Rather than maintain two complete images of the object being saved, the system maintains two images only for the pages of the objects that are being changed as the save is performed.

Synchronization. When you back up more than one object using the save-while-active process, you must choose when the objects will reach a checkpoint in relation to each other, a concept called synchronization.
There are three kinds of synchronization:
How you use save-while-active in your backup strategy depends on whether you choose to reduce or eliminate the time your system is unavailable during a backup. Reducing the backup outage is much simpler and more common than eliminating it. It’s also the recommended way to use save-while-active. When you use save-while-active to reduce your backup outage, your system recovery process is exactly the same as if you performed a standard backup operation. Also, using save-while-active this way doesn’t require you to implement journaling or commitment control. To use save-while-active to reduce your backup outage, you can end any applications that change objects or end the subsystems in which these applications are run. After the system reaches a checkpoint for those objects, you can restart the applications. One save-while-active option lets you have the system send a message notification when it completes the checkpoint processing. Once you know checkpoint processing is completed, it’s safe to start your applications or subsystems again. Using save-while-active this way can significantly reduce your backup outage. Typically, when you choose to reduce your backup outage with save-while-active, the time during which your system is unavailable for use ranges anywhere from 10 minutes to 60 minutes. It’s highly recommended that you use save-while-active to reduce your backup outage unless you absolutely cannot have your system unavailable for this time frame. You should use save-while-active to eliminate your backup outage only if you have absolutely no tolerance for any backup outage. You should use this approach only to back up objects that you’re protecting with journaling or commitment control. When you use save-while-active to eliminate your backup outage, you don’t end the applications that modify the objects or end the subsystems in which the applications are run. However, this method affects the performance and response time of your applications. 
Keep in mind that eliminating your backup outage with save-while-active requires much more complex recovery procedures. You’ll need to include these procedures in your disaster recovery plans.

Save Commands That Support the Save-While-Active Option

The following save commands support the save-while-active option:
The following parameters are available on the save commands for the save-while-active process:
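As a hedged sketch of how save-while-active parameters might be combined to reduce (not eliminate) a backup outage, consider the following. The parameter names SavAct, SavActWait, and SavActMsgQ come from the OS/400 save commands rather than from the text above. The subsystem APPSBS, the libraries APPDATA1 and APPDATA2, the device TAP01, the job name, and the 120-second wait are all placeholders.

```cl
/* Sketch only: subsystem, library, device, and job names are    */
/* hypothetical.                                                 */

/* Stop the applications that update the libraries               */
EndSbs     Sbs(APPSBS) Option(*Immed)

/* Submit the save so that this job is free to restart the       */
/* applications. SavAct(*SyncLib) gives both libraries one       */
/* common checkpoint, SavActWait(120) waits up to 120 seconds    */
/* for a lock on an object that is in use, and                   */
/* SavActMsgQ(QSysOpr) names the queue that receives the         */
/* checkpoint notification.                                      */
SbmJob     Cmd(SavLib Lib(APPDATA1 APPDATA2) Dev(TAP01) +
                 SavAct(*SyncLib) +
                 SavActWait(120) +
                 SavActMsgQ(QSysOpr)) +
             Job(SWASAVE)

/* Run this only after the checkpoint notification has arrived   */
/* on QSysOpr; the save job keeps writing to tape afterward      */
StrSbs     SbsD(APPSBS)
```

The StrSbs command is shown in sequence for readability only. In practice an operator or a monitoring program would issue it once the notification appears, because restarting the applications before the checkpoint is reached defeats the purpose of ending them.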
For complete details about using the save-while-active process to either reduce or eliminate your backup outage, visit IBM’s iSeries Information Center at http://publib.boulder.ibm.com/pubs/html/as400/infocenter.htm.

Backing Up Spooled Files

When you save an output queue, its description is saved but not its contents (the spooled files). With a combination of spooled file APIs, user space APIs, and list APIs, you can back up spooled files, including their associated advanced function attributes (if any).

The spooled file APIs perform the real work of backing up spooled files. These APIs include
These APIs let you copy spooled file information to a user space for save purposes and copy the information back from the user space to a spooled file. Once you’ve copied spooled file information to user spaces, you can save the user spaces. For more information about these APIs, see System API Reference (SC41-5801).

One common misconception is that you can use the CpySplF (Copy Spooled File) command to back up spooled files. This command does let you copy information from a spooled file to a database file, but you shouldn’t rely on this method for spooled file backup. CpySplF copies only textual data and not advanced function attributes such as graphics and variable fonts. CpySplF also does nothing to preserve print attributes such as spacing.

IBM does offer support for saving and restoring spooled files in its BRMS product. BRMS maintains all the advanced function attributes associated with the spooled files. For more information about BRMS, see “Backup, Recovery and Media Services (BRMS) Overview.” [Chapter 16]

Recovering Your System

Although the iSeries is very stable and disasters are rare, there are times when some type of recovery may be necessary. The extent of recovery required and the processes you follow will vary greatly depending on the nature of your failure. The sheer number of possible failures precludes a one-size-fits-all answer to recovery. Instead, you must examine the details of your failure and recover accordingly.

To help determine the best way to recover your system, you should refer to “Selecting the Right Recovery Strategy” in OS/400 Backup and Recovery, which categorizes failures and their associated recovery processes and provides checklists of recovery steps. Before beginning your recovery, be sure to do the following:
Starting with V4R5, the OS/400 Backup and Recovery manual includes a new appendix called “Recovering your AS/400 system,” which provides step-by-step instructions for completely recovering your entire system to the same system (i.e., restoring to a system with the same serial number). You can use these steps only if you saved your entire system using either option 21 from menu SAVE or the equivalent SavSys, SavLib, SavDLO, and Sav commands. Continue to use the checklist titled “Recovering your entire system after a complete system loss (Checklist 17)” in Chapter 3 of OS/400 Backup and Recovery to completely recover your system in any of the following situations:
One piece of advice warrants repeating: Test as many of the procedures in your recovery plan as you possibly can before disaster strikes. If any surprises await you, it’s far better to uncover them in a test situation than during a disaster.

This article is excerpted from the book Starter Kit for the IBM iSeries and AS/400 by Gary Guthrie and Wayne Madden (29th Street Press, 2001). For more information about the book, see http://www.iseriesnetwork.com/str/books/uniquebook2.cfm?NextBook=187.

Debbie Saugen is the technical owner of iSeries 400 and AS/400 Backup and Recovery in IBM’s Rochester, Minnesota, Development Lab. She is also a senior recovery specialist with IBM Business Continuity and Recovery Services. Debbie enjoys sharing her knowledge by speaking at COMMON, iSeries 400 and AS/400e Technical Conferences, and Business Continuity and Recovery Services Conferences and writing for various iSeries and AS/400e magazines and Web sites.
Copyright © 2009 - Penton Technology Media. System i is a trademark of International Business Machines Corporation and is used by Penton Media, Inc., under license. SystemiNetwork.com is published independently of International Business Machines Corporation, which is not responsible in any way for the content. Penton Media, Inc., is solely responsible for the editorial content and control of the System iNetwork.