Starter Kit - Chapter 15, System iNetwork (formerly iSeries Network)
  AS/400-iSeries Starter Kit


Chapter 15 - Backup Basics

The most valuable component of any computer system isn’t the hardware or software that runs the computer but, rather, the data that resides on the system. If a system failure or disaster occurs, you can replace the computer hardware and software that runs your business. Your company’s data, however, is irreplaceable. For this reason, it’s critical to have a good backup and recovery strategy. Companies go out of business when their data can’t be recovered.

What should you be backing up? The simple answer to this question is that you should back up everything. A basic rule of backup and recovery is that if you don’t save it, it doesn’t get restored. However, you may have some noncritical data (e.g., test data) on your system that doesn’t need to be restored and can be omitted from your backup.

When and how often do you need to back up? Ideally, saving your entire system every night is the simplest and safest backup strategy. This approach also gives you the simplest and safest strategy for recovery. Realistically, though, when and how you run your backup, as well as what you back up, depend on the size of your backup window — the amount of time your system can be unavailable to users while you perform a backup. To simplify recovery, you need to back up when your system is at a known point and your data isn’t changing.

When you design a backup strategy, you need to balance the time it takes to save your data with the value of the data you might lose and the amount of time it may take to recover. Always keep your recovery strategy in mind as you design your backup strategy.

If your system is so critical to your business that you don’t have a manageable backup window, you probably can’t afford an unscheduled outage either. If this is your situation, you should seriously evaluate the availability options of the iSeries, including dual systems. For more information about these options, see “Availability Options.”

Designing and Implementing a Backup Strategy

You should design your backup strategy based on the size of your backup window. At the same time you design your backup strategy, you should also design your recovery strategy to ensure that your backup strategy meets your system recovery needs. The final step in designing a backup strategy is to test a full system recovery. This is the only way to verify that you’ve designed a good backup strategy that will meet your system recovery needs. Your business may depend on your ability to recover your system. You should test your recovery strategy at your recovery services provider’s location.

When designing your backup and recovery strategy, think of it as a puzzle: The fewer pieces the puzzle has, the more quickly you can put it together. Likewise, the fewer pieces your backup strategy requires, the more quickly you can recover them.

Your backup strategy will typically be one of three types:

  • Simple — You have a large backup window, such as an 8- to 12-hour block of time available daily with no system activity.
  • Medium — You have a medium backup window, such as a 4- to 6-hour block of time available daily with no system activity.
  • Complex — You have a short backup window, with little or no time of system inactivity.

A simple way to ensure you have a good backup of your system is to use the options provided on menu SAVE (Figure 15.1), which you can reach by typing Go Save on a command line. This command presents you with additional menus that make it easy either to back up your entire system or to split your entire system backup into two parts: system data and user data. In the following discussion of backup strategies, the menu options I refer to are from menu SAVE.

Implementing a Simple Backup Strategy

The simplest backup strategy is to save everything daily whenever there is no system activity. You can use SAVE menu option 21 (Entire system) to completely back up your system (with the exception of queue entries such as spooled files). You should also consider using this option to back up the entire system after installing a new release, applying PTFs, or installing a new licensed program product. As an alternative, you can use SAVE menu option 22 (System data only) to save just the system data after applying PTFs or installing a new licensed program product.

Option 21 offers the significant advantage that you can schedule the backup to run unattended (with no operator intervention). Keep in mind that unattended save operations require you to have a tape device capable of holding all your data. (For more information about backup media, see “Preparing and Managing Your Backup Media.”)

Even if you don’t have enough time or enough tape-device capability to perform an unattended save using option 21, you can still implement a simple backup strategy:

Daily backup: Back up only user data that changes frequently.
Weekly backup: Back up the entire system.

A simple backup strategy may also involve SAVE menu option 23 (All user data). This option saves user data that can change frequently. You can also schedule option 23 to run without operator intervention.

If your system has a long period of inactivity on weekends, your backup strategy might look like this:

Friday night: Entire system (option 21)
Monday night: All user data (option 23)
Tuesday night: All user data (option 23)
Wednesday night: All user data (option 23)
Thursday night: All user data (option 23)
Friday night: Entire system (option 21)

Implementing a Medium Backup Strategy

You may not have a large enough backup window to implement a simple backup strategy. For example, you may have large batch jobs that take a long time to run at night or a considerable amount of data that takes a long time to back up. If this is your situation, you’ll need to implement a backup and recovery strategy of medium complexity.

When developing a medium backup strategy, keep in mind that the more often your data changes, the more often you need to back it up. You’ll therefore need to evaluate in detail how often your data changes.

Several methods are available to you in developing a medium backup strategy:

  • saving changed objects
  • journaling objects and saving the journal receivers
  • saving groups of user libraries, folders, or directories

You can use one or a combination of these methods.

Saving changed objects. Several commands let you save only the data that has changed since your last save operation or since a particular date and time.

You can use the SavChgObj (Save Changed Objects) command to save only those objects that have changed since a library or group of libraries was last saved or since a particular date and time. This approach can be useful if you have a system environment in which program objects and data files exist in the same library. Typically, data files change very frequently, while program objects change infrequently. Using the SavChgObj command, you can save just the data files that have changed.

The SavDLO (Save Document Library Objects) command lets you save documents and folders that have changed since the last save or since a particular date and time. You can use SavDLO to save changed documents and folders in all your user auxiliary storage pools (ASPs) or in a specific user ASP.

You can use the Sav (Save) command to save only those objects in directories that have changed since the last save or since a particular date or time.

You can also choose to save only your changed data, using a combination of the SavChgObj, SavDLO, and Sav commands, if the batch workload on your system is heavier on specific days of the week. For example:

Day/time          Batch workload   Save operation
Friday night      Light            Entire system (option 21)
Monday night      Heavy            Changed data only*
Tuesday night     Light            All user data (option 23)
Wednesday night   Heavy            Changed data only*
Thursday night    Heavy            Changed data only*
Friday night      Light            Entire system (option 21)
* Use a combination of the SavChgObj, SavDLO, and Sav commands.
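
As a rough sketch of a "changed data only" night in the schedule above, the three commands might be combined as follows. TapeDeviceName is a placeholder, and the reference points chosen here (changes since the last SavLib, since the last full SavDLO, and since the last Sav that updated the save history) are one reasonable assumption rather than the only option.

```cl
/* Objects in all user libraries changed since the last SavLib   */
SavChgObj Obj(*All) Lib(*AllUsr)          +
          Dev(TapeDeviceName)             +
          RefDate(*SavLib)
/* Documents and folders changed since the last full SavDLO      */
SavDLO DLO(*Chg) Dev(TapeDeviceName)
/* Directory objects changed since they were last saved with     */
/* UpdHst(*Yes), as options 21 and 23 do                         */
Sav Dev('/QSYS.LIB/TapeDeviceName.DEVD')  +
    Obj(('/*')                            +
        ('/QSYS.LIB' *Omit)               +
        ('/QDLS' *Omit))                  +
    ChgPeriod(*LastSave)
```

Because this Sav command doesn't update the save history, each changed-data save in this sketch is cumulative back to the last option 21 or option 23 save, so recovery needs only that full save plus the most recent changed-data save.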

Journaling objects and saving the journal receivers. If your save operations take too long because your files are large, saving changed objects may not help in your system environment. For instance, if you have a file member with 100,000 records and one record changes, the SavChgObj command saves the entire file member. In this environment, journaling your database files and saving the journal receivers regularly may be a better solution. However, keep in mind that this approach will make your recovery more complex.

When you journal a database file, the system writes a copy of every changed record to a journal receiver. When you save a journal receiver, you’re saving only the changed records in the file, not the entire file.

If you journal your database files and have a batch workload that varies, your backup strategy might look like this:

Day/time          Batch workload   Save operation
Friday night      Light            Entire system (option 21)
Monday night      Heavy            Journal receivers only
Tuesday night     Light            All user data (option 23)
Wednesday night   Heavy            Journal receivers only
Thursday night    Heavy            Journal receivers only
Friday night      Light            Entire system (option 21)

To take full advantage of journaling protection, you should detach and save the journal receivers regularly. The frequency with which you save the journal receivers depends on the number of journaled changes that occur on your system. Saving the journal receivers several times during the day may be appropriate for your system environment.

The way in which you save journal receivers depends on whether they reside in a library with other objects. Depending on your environment, you’ll use either the SavLib (Save Library) command or the SavObj (Save Object) command. It’s best to keep your journal receivers isolated from other objects so that your save/restore functions are simpler. Be aware that you must save a new member of a database file before you can apply journal entries to the file. If your applications regularly add new file members, you should consider using the SavChgObj strategy either by itself or in combination with journaling.
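
A minimal sketch of detaching and saving receivers might look like the following. It assumes the receivers are isolated in their own library; JournalLibrary, JournalName, ReceiverLibrary, and TapeDeviceName are placeholders, not names from any real system.

```cl
/* Detach the current receiver and attach a new, system-named one */
ChgJrn Jrn(JournalLibrary/JournalName) JrnRcv(*Gen)
/* Save the journal receivers from their dedicated library        */
SavObj Obj(*All) Lib(ReceiverLibrary)     +
       Dev(TapeDeviceName)                +
       ObjType(*JrnRcv)
```

If the receiver library holds nothing but receivers, SavLib Lib(ReceiverLibrary) would do the same job with one fewer parameter.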

Saving groups of user libraries, folders, or directories. Many applications are set up with data files and program objects in different libraries. This design simplifies your backup and recovery procedures. Data files change frequently, and, on most systems, program objects change infrequently. If your system environment is set up like this, you may want to save only the libraries with data files on a daily basis. You can also save, on a daily basis, groups of folders and directories that change frequently.
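
For illustration, a daily save of only the frequently changing groups could be sketched as follows. DataLibrary1, DataLibrary2, FolderName, DirectoryName, and TapeDeviceName are all placeholders for names from your own environment.

```cl
/* Daily save of only the libraries that hold data files */
SavLib Lib(DataLibrary1 DataLibrary2)     +
       Dev(TapeDeviceName)
/* Daily save of a folder that changes frequently        */
SavDLO DLO(*All) Flr(FolderName)          +
       Dev(TapeDeviceName)
/* Daily save of a directory that changes frequently     */
Sav Dev('/QSYS.LIB/TapeDeviceName.DEVD')  +
    Obj(('/DirectoryName/*'))
```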

Implementing a Complex Backup Strategy

If you have a very short backup window that requires a complex strategy for backup and for recovery, you can use some of the same techniques described for a medium backup strategy, but with a greater level of detail. For example, you may need to save specific critical files at specific times of the day or week.

Several other methods are available to you in developing a complex backup strategy. You can use one or a combination of these methods:

  • save data concurrently using multiple tape devices
  • save data in parallel using multiple tape devices
  • use the save-while-active process

Before you use any of these methods, you must have a complete backup of your entire system.

Saving data concurrently using multiple tape devices. You can reduce the amount of time your system is unavailable by performing save operations on more than one tape device at a time. For example, you can save libraries to one tape device, folders to another tape device, and directories to a third tape device. Or you can save different sets of libraries, objects, folders, or directories to different tape devices. Later, I provide more information about saving data concurrently using multiple tape devices.

Saving data in parallel using multiple tape devices. Starting with V4R4, you can perform a parallel save using multiple tape devices. A parallel save is intended for very large objects or libraries. With this method, the system “spreads” the data in the object or library across multiple tape devices. (This function is implemented with IBM’s Backup, Recovery and Media Services product; for more information about it, see “Backup, Recovery and Media Services (BRMS) Overview” [Chapter 16].)

Save-While-Active. The save-while-active process can significantly reduce the amount of time your system is unavailable during a backup. If you choose to use save-while-active, make sure you understand the process and monitor for any synchronization checkpoints before making your objects available for use. I provide more details about save-while-active later.

An Alternative Backup Strategy

Another option available to help implement your backup strategy is the Backup, Recovery and Media Services licensed program product. BRMS is IBM’s strategic OS/400 backup and recovery product on the iSeries and AS/400.

BRMS is a comprehensive tool for managing the backup, archiving, and recovery environment for one or more servers in a site or across a network in which data exchange by tape is required. For more information about using BRMS to implement your backup strategy, see “Backup, Recovery and Media Services (BRMS) Overview.” [Chapter 16]

The Inner Workings of Menu SAVE

Menu SAVE contains many options for saving your data, but four are primary:

  • 20 — Define save system and user data defaults
  • 21 — Entire system
  • 22 — System data only
  • 23 — All user data

You can use these menu options to back up your system. Or, if your installation requires a more complex backup strategy, you can use OS/400’s save commands in a CL program to customize your backup.

To help you make your decision, as well as to provide skeleton code that you can use as a guideline for your own backup programs, this section provides a look at some of the inner workings of these primary save options. For detailed instructions and a checklist on using these options, refer to OS/400 Backup and Recovery (SC41-5304). Figure 15.2 illustrates the save commands and the SAVE menu options you can use to save the parts of the system and the entire system.

Entire System (Option 21)

SAVE menu option 21 lets you perform a complete backup of all the data on your system, with the exception of spooled files (I cover spooled file backup later). This option puts the system into a restricted state, which means no users can access your system while the backup is running. It’s best to run this option overnight for a small system or during the weekend for a larger system.

Option 21 runs program QMNSave. The following CL program extract represents the significant processing that option 21 performs:

EndSbs Sbs(*All) Option(*Immed)
ChgMsgQ MsgQ(QSysOpr)                    +
        Dlvry(*Break or *Notify)
SavSys
SavLib Lib(*NonSys) AccPth(*Yes)
SavDLO DLO(*All) Flr(*Any)
Sav Dev('/QSYS.LIB/TapeDeviceName.DEVD') +
    Obj(('/*')                           +
        ('/QSYS.LIB' *Omit)              +
        ('/QDLS' *Omit))                 +
    UpdHst(*Yes)
StrSbs SbsD(ControllingSubsystem)

Note: The Sav command omits the QSys.Lib file system because the SavSys (Save System) command and the SavLib Lib(*NonSys) command save QSys.Lib. The Sav command also omits the QDLS file system because the SavDLO command saves QDLS.

System Data Only (Option 22)

Option 22 saves only your system data. It does not save any user data. You should run this option (or option 21) after applying PTFs or installing a new licensed program product. Like option 21, option 22 puts the system into a restricted state.

Option 22 runs program QSRSavI. The following program extract represents the significant processing that option 22 performs:

EndSbs Sbs(*All) Option(*Immed)
ChgMsgQ MsgQ(QSysOpr)                    +
        Dlvry(*Break or *Notify)
SavSys
SavLib Lib(*IBM) AccPth(*Yes)
Sav Dev('/QSYS.LIB/TapeDeviceName.DEVD') +
    Obj(('/QIBM/ProdData')               +
        ('/QOpenSys/QIBM/ProdData'))     +
    UpdHst(*Yes)
StrSbs SbsD(ControllingSubsystem)

All User Data (Option 23)

Option 23 saves all user data, including files, user-written programs, and all other user data on the system. This option also saves user profiles, security data, and configuration data. Like options 21 and 22, option 23 places the system in a restricted state.

Option 23 runs program QSRSavU. The following program extract represents the significant processing that option 23 performs:

EndSbs Sbs(*All) Option(*Immed)
ChgMsgQ MsgQ(QSysOpr)                      +
        Dlvry(*Break or *Notify)
SavSecDta
SavCfg
SavLib Lib(*AllUsr) AccPth(*Yes)
SavDLO DLO(*All) Flr(*Any)
Sav Dev('/QSYS.LIB/TapeDeviceName.DEVD')   +
    Obj(('/*')                             +
        ('/QSYS.LIB' *Omit)                +
        ('/QDLS' *Omit)                    +
        ('/QIBM/ProdData' *Omit)           +
        ('/QOpenSys/QIBM/ProdData' *Omit)) +
    UpdHst(*Yes)
StrSbs SbsD(ControllingSubsystem)

Note: The Sav command omits the QSys.Lib file system because the SavSecDta (Save Security Data) command, the SavCfg (Save Configuration) command, and the SavLib Lib(*AllUsr) command save the user data in QSys.Lib. The Sav command also omits the QDLS file system because the SavDLO command saves QDLS. In addition, the Sav command executed by option 23 omits the /QIBM/ProdData and /QOpenSys/QIBM/ProdData directories because these directories contain IBM-supplied objects.

Setting Save Option Defaults

When you save information using option 21, 22, or 23, you can specify default values for some of the commands used by the save process. Figure 15.3 shows the Specify Command Defaults panel values used by these options. You can use SAVE menu option 20 (Define save system and user data defaults) to change the default values displayed on this panel for menu options 21, 22, and 23. Changing the defaults simplifies the task of setting up your backups. To change the defaults, you must have *Change authority to both library QUsrSys and the QSRDflts data area in QUsrSys.

When you select option 20, the system displays the default parameter values for options 21, 22, and 23. The first time you use option 20, the system displays the IBM-supplied default parameter values. You can change any or all of the parameter values to meet your needs. For example, you can specify additional tape devices or change the message queue delivery default. The system saves the new default values in data area QSRDflts in library QUsrSys for future use (the system creates QSRDflts only after you change the IBM-supplied default values).

Once you’ve defined new default values, you no longer need to worry about which, if any, options to change on subsequent backups. You can simply review the new default options and then press Enter to start the backup using the new default parameters.

If you have multiple, distributed systems with the same save parameters on each system, option 20 offers an additional benefit: You can simply define your default parameters using option 20 on one system and then save data area QSRDflts in library QUsrSys, distribute the saved data area to the other systems, and restore it.
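
The distribution step just described might be sketched like this, assuming the data area travels between systems on tape (TapeDeviceName is a placeholder):

```cl
/* On the system where the defaults were defined with option 20 */
SavObj Obj(QSRDflts) Lib(QUsrSys)         +
       Dev(TapeDeviceName)                +
       ObjType(*DtaAra)
/* On each of the other systems, with the saved tape mounted    */
RstObj Obj(QSRDflts) SavLib(QUsrSys)      +
       Dev(TapeDeviceName)                +
       ObjType(*DtaAra)
```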

Printing System Information

When you perform save operations using option 21, 22, or 23 from menu SAVE, you can optionally request a series of reports with system information that can be useful during system recovery. The Specify Command Defaults panel presented by these options provides a prompt for printing system information. You can also use command PrtSysInf (Print System Information) to print the system information. This information is especially useful if you can’t use your SavSys media to recover and must use your distribution media.

Printing the system information requires *AllObj, *IOSysCfg, and *JobCtl authority and produces many spooled file listings. You probably don’t need to print the information every time you perform a backup. However, you should print it whenever important information about your system changes.

The following lists and reports are generated when you print the system information (the respective CL commands are noted in parentheses):

  • a library backup list with information about each library in the system, including which backup schedules include the library and when the library was last backed up (DspBckupL *Lib)
  • a folder backup list with the same information for all folders in the system (DspBckupL *Flr)
  • a list of all system values (DspSysVal)
  • a list of network attributes (DspNetA)
  • a list of edit descriptions (DspEdtD)
  • a list of PTF details (DspPTF)
  • a list of reply list entries (WrkRpyLE)
  • a report of access-path relationships (DspRcyAP)
  • a list of service attributes (DspSrvA)
  • a list of network server storage spaces (DspNwSStg)
  • a report showing the power on/off schedule (DspPwrScd)
  • a list of hardware features on your system (DspHdwRsc)
  • a list of distribution queues (DspDstSrv)
  • a list of all subsystems (DspSbsD)
  • a list of the IBM software licenses installed on your machine (DspSfwRsc)
  • a list of journal object descriptions for all journals (DspObjD)
  • a report showing journal attributes for all journals (WrkJrnA)
  • a report showing cleanup operations (ChgClnup)
  • a list of all user profiles (DspUsrPrf)
  • a report of all job descriptions (DspJobD)

Saving Data Concurrently Using Multiple Tape Devices

As I mentioned earlier, one way to reduce the amount of time required for a complex backup strategy is to perform save operations to multiple tape devices at once. You can save data concurrently using multiple tape devices by saving libraries to one tape device, folders to another tape device, and directories to a third tape device. Or, you can save different sets of libraries, objects, folders, or directories to different tape devices.

Concurrent Saves of Libraries and Objects

You can run multiple save commands concurrently against multiple libraries. When you run multiple save commands, the system processes the request in several stages that overlap, improving save performance.

To perform concurrent save operations to different tape devices, you can use the OmitLib (Omit library) parameter with generic naming. For example:

SavLib Lib(*AllUsr)                   +
       Dev(FirstTapeDevice)           +
       OmitLib(A* B* $* #* @* ... L*)
SavLib Lib(*AllUsr)                   +
       Dev(SecondTapeDevice)          +
       OmitLib(M* N* ... Z*)

You can also save a single library concurrently to multiple tape devices by using the SavObj or SavChgObj command. This technique lets you issue multiple save operations using multiple tape devices to save objects from one large library. For example, you can save generic objects from one large library to one tape device and concurrently issue another SavObj command against the same library to save a different set of generic objects to another tape device.

You can use generic naming on the Obj (Object) parameter while performing concurrent SavChgObj operations to multiple tape devices against a single library. For example:

SavChgObj Obj(A* B* C* $* #* ... L*) +
          Dev(FirstTapeDevice)       +
          Lib(LibraryName)
SavChgObj Obj(M* N* O* ... Z*)       +
          Dev(SecondTapeDevice)      +
          Lib(LibraryName)
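
Commands in a single CL program run one after another, so the two saves above overlap only if each runs in its own job. One way to arrange that, sketched here under assumptions, is to submit each save with SbmJob. The job names SAVE1 and SAVE2 are hypothetical, and the generic lists are shortened for illustration.

```cl
/* Each SbmJob starts a separate batch job, so the saves overlap */
SbmJob Cmd(SavChgObj Obj(A* B* C*)           +
                     Dev(FirstTapeDevice)    +
                     Lib(LibraryName))       +
       Job(SAVE1)
SbmJob Cmd(SavChgObj Obj(M* N* O*)           +
                     Dev(SecondTapeDevice)   +
                     Lib(LibraryName))       +
       Job(SAVE2)
```

The two jobs run at the same time only if the job queue and subsystem they are submitted to allow more than one active job.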

Concurrent Saves of DLOs (Folders)

You can run multiple SavDLO commands concurrently for DLOs that reside in the same ASP. This technique allows concurrent saves of DLOs to multiple tape devices.

You can use the command’s Flr (Folder) parameter with generic naming to perform concurrent save operations to different tape devices. For example:

SavDLO DLO(*All)             +
       Flr(DEPT*)            +
       Dev(FirstTapeDevice)  +
       OmitFlr(DEPT2*)
SavDLO DLO(*All)             +
       Flr(DEPT2*)           +
       Dev(SecondTapeDevice)

In this example, the system saves to the first tape device all folders starting with DEPT except those that start with DEPT2. Folders that start with DEPT2 are saved to the second tape device.

Note: Parameter OmitFlr is allowed only when you specify DLO(*All) or DLO(*Chg).

Concurrent Saves of Objects in Directories

You can also run multiple Sav commands concurrently against objects in directories. This technique allows concurrent saves of objects in directories to multiple tape devices.

You can use the Sav command’s Obj (Object) parameter with generic naming to perform concurrent save operations to different tape devices. For example:

Sav Dev('/QSYS.LIB/FirstTapeDevice.DEVD')  +
    Obj(('/DIRA*'))                        +
    UpdHst(*Yes)
Sav Dev('/QSYS.LIB/SecondTapeDevice.DEVD') +
    Obj(('/DIRB*'))                        +
    UpdHst(*Yes)

Save-While-Active

To either reduce or eliminate the amount of time your system is unavailable for use during a backup (your backup outage), you can use the save-while-active process on particular save operations along with your other backup and recovery procedures. Save-while-active lets you use the system during part or all of the backup process. In contrast, other save operations permit either no access or only read access to objects during the backup.

How Does Save-While-Active Work?

OS/400 objects consist of units of storage called pages. When you use save-while-active to save an object, the system creates two images of the pages of the object. The first image contains the updates to the object with which normal system activity works. The second image is a “snapshot” of the object as it exists at a single point in time called a checkpoint. The save-while-active job uses this image — called the checkpoint image — to save the object. When an application makes changes to an object during a save-while-active job, the system uses one image of the object’s pages to make the changes and, at the same time, uses the other image to save the object to tape.

The system locks objects as it obtains the checkpoint images, and you can’t change objects during the checkpoint processing. After the system has obtained the checkpoint images, applications can once again change the objects.

The image that the system saves doesn’t include any changes made during the save-while-active job. The image on the tape is an image of the object as it existed when the system reached the checkpoint. Rather than maintain two complete images of the object being saved, the system maintains two images only for the pages of the objects that are being changed as the save is performed.

Synchronization. When you back up more than one object using the save-while-active process, you must choose when the objects will reach a checkpoint in relationship to each other — a concept called synchronization. There are three kinds of synchronization:

  • With full synchronization, the checkpoints for all the objects occur at the same time, during a time period in which no changes can occur to the objects. It’s strongly recommended that you use full synchronization, even when you’re saving objects in only one library.
  • With library synchronization, the checkpoints for all the objects in a library occur at the same time.
  • With system-defined synchronization, the system decides when the checkpoints for the objects occur. The checkpoints may occur at different times, resulting in a more complex recovery procedure.

How you use save-while-active in your backup strategy depends on whether you choose to reduce or eliminate the time your system is unavailable during a backup. Reducing the backup outage is much simpler and more common than eliminating it. It’s also the recommended way to use save-while-active.

When you use save-while-active to reduce your backup outage, your system recovery process is exactly the same as if you performed a standard backup operation. Also, using save-while-active this way doesn’t require you to implement journaling or commitment control.

To use save-while-active to reduce your backup outage, you can end any applications that change objects or end the subsystems in which these applications are run. After the system reaches a checkpoint for those objects, you can restart the applications. One save-while-active option lets you have the system send a message notification when it completes the checkpoint processing. Once you know checkpoint processing is completed, it’s safe to start your applications or subsystems again. Using save-while-active this way can significantly reduce your backup outage.

Typically, when you choose to reduce your backup outage with save-while-active, the time during which your system is unavailable for use ranges anywhere from 10 minutes to 60 minutes. It’s highly recommended that you use save-while-active to reduce your backup outage unless you absolutely cannot have your system unavailable for this time frame.

You should use save-while-active to eliminate your backup outage only if you have absolutely no tolerance for any backup outage. You should use this approach only to back up objects that you’re protecting with journaling or commitment control.

When you use save-while-active to eliminate your backup outage, you don’t end the applications that modify the objects or end the subsystems in which the applications are run. However, this method affects the performance and response time of your applications.

Keep in mind that eliminating your backup outage with save-while-active requires much more complex recovery procedures. You’ll need to include these procedures in your disaster recovery plans.

Save Commands That Support the Save-While-Active Option

The following save commands support the save-while-active option:

Command     Function
SavLib      Save library
SavObj      Save object
SavChgObj   Save changed objects
SavDLO      Save document library objects
Sav         Save objects in directories

The following parameters are available on the save commands for the save-while-active process:

Parameter                               Description
SavAct (Save-while-active)              You must decide whether you're going to use full synchronization, library synchronization, or system-defined synchronization. It's highly recommended that you use full synchronization in most cases.
SavActWait (Save active wait time)      You can specify the maximum number of seconds that the save-while-active operation will wait to allocate an object during checkpoint processing.
SavActMsgQ (Save active message queue)  You can specify whether the system sends you a message when it reaches a checkpoint.
SavActOpt (Save-while-active options)   This parameter has values that are specific to the Sav command.
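
To make these parameters concrete, here is a hedged sketch of using save-while-active to reduce (not eliminate) a backup outage. On the SavLib command, full synchronization corresponds to SavAct(*SyncLib), library synchronization to SavAct(*Lib), and system-defined synchronization to SavAct(*SysDfn). ApplicationSubsystem, DataLibrary1, DataLibrary2, and TapeDeviceName are placeholders, and the 120-second wait is only an example value.

```cl
/* 1. End the subsystem in which the applications that change    */
/*    the objects run                                            */
EndSbs Sbs(ApplicationSubsystem) Option(*Immed)
/* 2. Save with full synchronization; wait up to 120 seconds to  */
/*    allocate each object and send the checkpoint message to    */
/*    the QSysOpr message queue                                  */
SavLib Lib(DataLibrary1 DataLibrary2)     +
       Dev(TapeDeviceName)                +
       SavAct(*SyncLib)                   +
       SavActWait(120)                    +
       SavActMsgQ(QSysOpr)
/* 3. Issued from another job once the checkpoint message has    */
/*    arrived: restart the applications                          */
StrSbs SbsD(ApplicationSubsystem)
```

Step 3 is shown inline only for readability. The SavLib command doesn't return until the save finishes, so to shorten the outage the restart has to come from a different job (or an operator) that watches the message queue while the save is still writing to tape.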

For complete details about using the save-while-active process to either reduce or eliminate your backup outage, visit IBM’s iSeries Information Center at http://publib.boulder.ibm.com/pubs/html/as400/infocenter.htm.

Backing Up Spooled Files

When you save an output queue, its description is saved but not its contents (the spooled files). With a combination of spooled file APIs, user space APIs, and list APIs, you can back up spooled files, including their associated advanced function attributes (if any).

The spooled file APIs perform the real work of backing up spooled files. These APIs include

  • QUSLSpl (List Spooled Files)
  • QUSRSplA (Retrieve Spooled File Attributes)
  • QSpOpnSp (Open Spooled File)
  • QSpCrtSp (Create Spooled File)
  • QSpGetSp (Get Spooled File Data)
  • QSpPutSp (Put Spooled File Data)
  • QSpCloSp (Close Spooled File)

These APIs let you copy spooled file information to a user space for save purposes and copy the information back from the user space to a spooled file. Once you’ve copied spooled file information to user spaces, you can save the user spaces. For more information about these APIs, see System API Reference (SC41-5801).

One common misconception is that you can use the CpySplF (Copy Spooled File) command to back up spooled files. This command does let you copy information from a spooled file to a database file, but you shouldn’t rely on this method for spooled file backup. CpySplF copies only textual data and not advanced function attributes such as graphics and variable fonts. CpySplF also does nothing to preserve print attributes such as spacing.

IBM does offer support for saving and restoring spooled files in its BRMS product. BRMS maintains all the advanced function attributes associated with the spooled files. For more information about BRMS, see “Backup, Recovery and Media Services (BRMS) Overview.” [Chapter 16]

Recovering Your System

Although the iSeries is very stable and disasters are rare, there are times when some type of recovery may be necessary. The extent of recovery required and the processes you follow will vary greatly depending on the nature of your failure.

The sheer number of possible failures precludes a one-size-fits-all answer to recovery. Instead, you must examine the details of your failure and recover accordingly. To help determine the best way to recover your system, you should refer to “Selecting the Right Recovery Strategy” in OS/400 Backup and Recovery, which categorizes failures and their associated recovery processes and provides checklists of recovery steps.

Before beginning your recovery, be sure to do the following:

  • If you have to back up and recover because of some system problem, make sure you understand how the problem occurred so you can choose the correct recovery procedures.
  • Plan your recovery.
  • Make a copy of the OS/400 Backup and Recovery checklist you’re using, and check off each step as you complete it. Keep the checklist for future reference. If you need help later, this record will be invaluable.
  • If your problem requires hardware or software service, make sure you understand exactly what the service representative does. Don’t be afraid to ask questions.

Starting with V4R5, the OS/400 Backup and Recovery manual includes a new appendix called “Recovering your AS/400 system,” which provides step-by-step instructions for completely recovering your entire system to the same system (i.e., restoring to a system with the same serial number). You can use these steps only if you saved your entire system using either option 21 from menu SAVE or the equivalent SavSys, SavLib, SavDLO, and Sav commands.

Continue to use the checklist titled “Recovering your entire system after a complete system loss (Checklist 17)” in Chapter 3 of OS/400 Backup and Recovery to completely recover your system in any of the following situations:

  • Your system has logical partitions.
  • Your system uses the Alternate Installation Device Setup feature that you can define through Dedicated Service Tools (DST) for a manual IPL from tape.
  • Your system had mounted user-defined file systems before the save.
  • You’re recovering to a different system (a system with a different serial number).

One piece of advice warrants repeating: Test as many of the procedures in your recovery plan as you possibly can before disaster strikes. If any surprises await you, it’s far better to uncover them in a test situation than during a disaster.

This article is excerpted from the book Starter Kit for the IBM iSeries and AS/400 by Gary Guthrie and Wayne Madden (29th Street Press, 2001). For more information about the book, see http://www.iseriesnetwork.com/str/books/uniquebook2.cfm?NextBook=187.

Debbie Saugen is the technical owner of iSeries 400 and AS/400 Backup and Recovery in IBM’s Rochester, Minnesota, Development Lab. She is also a senior recovery specialist with IBM Business Continuity and Recovery Services. Debbie enjoys sharing her knowledge by speaking at COMMON, iSeries 400 and AS/400e Technical Conferences, and Business Continuity and Recovery Services Conferences and writing for various iSeries and AS/400e magazines and Web sites.


Availability Options

Availability options are a complement to a backup strategy, not a replacement. These options can significantly reduce the time it takes you to recover after a failure. In some cases, availability options can prevent the need for recovery. To justify the cost of using availability options, you need to understand the following:

  • the value of the data on your system
  • the cost of a scheduled or unscheduled outage
  • your availability requirements

The following availability options can complement your backup strategy:

  • journal management
  • access-path protection
  • auxiliary storage pools
  • device parity protection
  • mirrored protection
  • dual systems
  • clustered systems

You should compare these options and decide which are best suited to your business needs. For details about availability options, their benefits versus costs, and how to implement them, refer to IBM's iSeries Information Center at http://publib.boulder.ibm.com/pubs/html/as400/infocenter.htm.

We'll look more closely at each availability option in a moment, but first, it's helpful to be acquainted with the following terms, which are often used in discussing system availability:

  • An outage is a period of time during which the system is unavailable to users. During a scheduled outage, you deliberately make your system unavailable to users. You might use a scheduled outage to run batch work, back up your system, or apply PTFs. An unscheduled outage is usually caused by a failure of some type.
  • High availability means that the system has no unscheduled outages.
  • In continuous operations, the system has no scheduled outages.
  • Continuous availability means that the system has neither scheduled nor unscheduled outages.
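
The relationship among these terms can be summarized in a few lines of Python. This is only a sketch of the definitions above; the function name and the idea of counting outages over some observation period are assumptions made for illustration.

```python
def classify_availability(scheduled_outages, unscheduled_outages):
    """Map outage counts for a period to the availability terms defined above."""
    if scheduled_outages == 0 and unscheduled_outages == 0:
        return "continuous availability"
    if unscheduled_outages == 0:
        return "high availability"
    if scheduled_outages == 0:
        return "continuous operations"
    return "none of these"


# A system that is taken down for weekly backups but never fails
label = classify_availability(scheduled_outages=52, unscheduled_outages=0)
print(label)
```

Note that continuous availability is simply high availability and continuous operations achieved together.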

Journal Management for Backup and Recovery

You can use journal management (often referred to as journaling a file or an access path) to recover the changes to database files (or other objects) that have occurred since your last complete backup. You use a journal to define which files and access paths you want to protect. A journal receiver contains the entries (called journal entries) that the system adds when events occur that are journaled, such as changes to database files, changes to other journaled objects, or security-related events.
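
To make the recovery idea concrete, here is a toy Python model of restoring a backup and then applying the journal entries recorded after it. The entry layout (seq, op, key, value), the customer keys, and the function name are all hypothetical; real journal entries and the commands that apply them are considerably richer than this.

```python
def recover(backup, journal_entries, last_saved_sequence):
    """Rebuild current data from a restored backup plus later journal entries."""
    data = dict(backup)  # start from the restored copy
    for entry in sorted(journal_entries, key=lambda e: e["seq"]):
        if entry["seq"] <= last_saved_sequence:
            continue  # this change is already reflected in the backup
        if entry["op"] == "put":
            data[entry["key"]] = entry["value"]
        elif entry["op"] == "delete":
            data.pop(entry["key"], None)
    return data


backup = {"CUST001": 100, "CUST002": 250}
journal = [
    {"seq": 1, "op": "put", "key": "CUST001", "value": 100},
    {"seq": 2, "op": "put", "key": "CUST002", "value": 250},
    {"seq": 3, "op": "put", "key": "CUST001", "value": 175},
    {"seq": 4, "op": "delete", "key": "CUST002"},
    {"seq": 5, "op": "put", "key": "CUST003", "value": 40},
]
recovered = recover(backup, journal, last_saved_sequence=2)
print(recovered)
```

The point of the sketch is that the backup alone returns you to the time of the save; the journal entries carry you forward from there.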

You can use the remote journal function to set up journals and journal receivers on a remote iSeries system. These journals and journal receivers are associated with journals and journal receivers on the source system. The remote journal function lets you replicate journal entries from the source system to the remote system.

Access-Path Protection

An access path describes the order in which the records in a database file are processed. Because different programs may need to access the file’s records in different sequences, a file can have multiple access paths. Access paths in use at the time of a system failure are at risk of corruption. If access paths become corrupted, the system must rebuild them before you can use the files again. This can be a very time-consuming process.

You should consider an access-path protection plan to limit the time required to recover corrupted access paths. The system offers two methods of access-path protection:

  • system-managed access-path protection (SMAPP)
  • explicit journaling of access paths

You can use these methods independently or together.

By using journal management to record changes to access paths, you can greatly reduce the amount of time it takes to recover access paths should doing so become necessary. Using journal entries, the system can recover access paths without the need for a complete rebuild. This can result in considerable time savings.

With SMAPP, you can let the system determine which access paths to protect. The system makes this determination based on access-path target recovery times that you specify. SMAPP provides a simple way to reduce recovery time after a system failure, managing the required environment for you.

You can use explicit journaling, even when using SMAPP, to ensure that certain access paths critical to your business are protected. The system evaluates the protected and unprotected access paths to develop its strategy for meeting your access-path recovery targets.

Auxiliary Storage Pools

Your system may have many disk units attached to it for auxiliary storage of your data. To the system, these units look like a single unit of storage. When the system writes data to disk, it spreads the data across all of these units.

You can divide your disk units into logical subsets known as auxiliary storage pools (ASPs), which don’t necessarily correspond to the physical arrangement of disks. You can then assign objects to particular ASPs, isolating them on particular disk units. When the system then writes to these objects, it spreads the information across only the units within the ASP.

ASPs provide a recovery advantage if the system experiences a disk unit failure that results in data loss. In such a case, recovery is required only for the objects in the ASP containing the failed disk unit. System objects and user objects in other ASPs are protected from the disk failure.
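
The following Python sketch models that recovery advantage in the simplest possible way. The ASP numbers, disk unit numbers, and library names are invented for illustration, and the model assumes the failed unit has no parity or mirrored protection, so its data is lost.

```python
def objects_needing_recovery(asps, failed_unit):
    """Return the objects stored in the ASP that contains the failed disk unit."""
    for asp in asps.values():
        if failed_unit in asp["units"]:
            return sorted(asp["objects"])
    return []  # the unit isn't assigned to any ASP in this model


# Hypothetical configuration: two ASPs with separate disk units
asps = {
    1: {"units": {1, 2, 3}, "objects": {"QSYS", "PAYROLL"}},
    2: {"units": {4, 5}, "objects": {"JRNRCV"}},
}
affected = objects_needing_recovery(asps, failed_unit=5)
print(affected)
```

Objects in the other ASP never appear in the result, which is the isolation the text describes.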

In addition to the protection that isolating objects to particular ASPs provides, the use of ASPs provides a certain level of flexibility. When you assign the disk units on your system to more than one ASP, each ASP can have different strategies for availability, backup and recovery, and performance.

Device Parity Protection

Device parity protection is a hardware availability function that protects against data loss due to disk unit failure or damage to a disk. To protect data, the disk controller or input/output processor (IOP) calculates and saves a parity value for each bit of data. The disk controller or IOP computes the parity value from the data at the same location on each of the other disk units in the device parity set. When a disk failure occurs, the data can be reconstructed by using the parity value and the values of the bits in the same locations on the other disks. The system continues to run while the data is being reconstructed. The overall goal of device parity protection is to provide high availability and to protect data as inexpensively as possible.
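
The reconstruction step can be sketched in Python, assuming the parity value is a bitwise XOR of the data at the same location on each disk. XOR is the usual choice in parity-based schemes, but the text above doesn’t specify the calculation, so treat this as an illustration of the principle rather than a description of the hardware. The three small byte strings stand in for disk contents.

```python
from functools import reduce


def parity(blocks):
    """XOR the bytes at each position across equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical contents of the same location on three disk units
disk_a = b"\x12\x34\x56"
disk_b = b"\xab\xcd\xef"
disk_c = b"\x01\x02\x03"
stored_parity = parity([disk_a, disk_b, disk_c])

# Simulate losing disk_b: rebuild it from the surviving data and the parity
rebuilt_b = parity([disk_a, disk_c, stored_parity])
print(rebuilt_b == disk_b)
```

Because XOR is its own inverse, combining the surviving data with the stored parity yields exactly the missing data.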

If possible, you should protect all the disk units on your system with either device parity protection or mirrored protection (covered next). In many cases, your system remains operational during repairs.

Device parity protection is designed to prevent system failure and to speed the recovery process for certain types of failures, not as a substitute for a good backup and recovery strategy. Device parity protection doesn’t protect you if you have a site disaster or user error. It also doesn’t protect against system outages caused by failures in other disk-related hardware (e.g., disk controllers, disk I/O).

Mirrored Protection

Mirrored protection is a software availability function that protects against data loss due to failure or damage to a disk-related component. The system protects your data by maintaining two copies of the data on two separate disk units. When a disk-related component fails, the system continues to operate without interruption, using the mirrored copy of the data until repairs are complete on the failed component.

When you start mirrored protection or add disk units to an ASP that has mirrored protection, the system creates mirrored pairs using disk units that have identical capacities. The goal is to protect as many disk-related components as possible. To provide maximum hardware redundancy and protection, the system tries to pair disk units from different controllers, IOPs, and buses.

Different levels of mirrored protection are possible, depending on the duplicated hardware. For instance, you can duplicate

  • disk units
  • disk controllers
  • disk IOPs
  • a bus

If a duplicate exists for the failing component and attached hardware components, the system remains available during the failure.

Remote mirroring support lets you have one mirrored unit within a mirrored pair at the local site and the second mirrored unit at a remote site. For some systems, standard DASD mirroring will remain the best choice; for others, remote DASD mirroring provides important additional capabilities.

Dual Systems

System installations with very high availability requirements use a dual-systems approach, in which two systems maintain some or all data. If the primary system fails, the secondary system can take over critical application programs.

The most common way to maintain data on the secondary system is through journaling. The primary system transmits journal entries to the secondary system, where a user-written program uses them to update files and other journaled objects in order to replicate the application environments of the primary system. Users sometimes implement this by transmitting journal entries at the application layer. The remote journal function improves on this technique by transmitting journal entries to a duplicate journal receiver on the secondary system at the licensed internal code layer. Several software packages are available from independent software vendors to support dual systems.

Clustered Systems

A cluster is a group of one or more systems that work together as a single system. The cluster is identified by name and consists of one or more cluster nodes. Clustering lets you efficiently group your systems together to create an environment that approaches 100 percent availability.


Preparing and Managing Your Backup Media

OS/400’s save commands support different types of devices (including save file, tape, diskette, and optical). For a backup strategy, you should always back up to a tape device. Choose a tape device and tape media that have the performance capabilities and density capacity to meet your backup window and any requirements you have for running an unattended backup.

Preparing and managing your tape media is an important part of your backup operations. You need to be able to easily locate the correct media to perform a successful system recovery.

You’ll need to use sets of tapes and implement a rotation schedule. An important part of a good backup strategy is to have more than one set of backup media. When you perform a system recovery, you may need to go back to an older set of tape media if your most recent set is damaged or if you discover a programming error that has affected data on your most recent backup media.

At a minimum, you should rotate three sets of media, as follows:

Backup Media set
Backup 1 Set 1
Backup 2 Set 2
Backup 3 Set 3
Backup 4 Set 1
Backup 5 Set 2
Backup 6 Set 3
     . .
     . .
     . .

You may find that the easiest method is to have a different set of media for each day of the week. This strategy makes it easier for the operator to know which set to mount for backup.
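
The three-set rotation shown in the table is simple modular arithmetic. Here is a minimal Python sketch of it; the function name is made up for illustration, and the default of three sets mirrors the minimum recommended above.

```python
def media_set_for_backup(backup_number, set_count=3):
    """Return which media set (numbered from 1) to mount for a given backup."""
    if backup_number < 1 or set_count < 1:
        raise ValueError("backup_number and set_count must be at least 1")
    return (backup_number - 1) % set_count + 1


schedule = [(n, media_set_for_backup(n)) for n in range(1, 7)]
for backup, media_set in schedule:
    print(f"Backup {backup}: Set {media_set}")
```

Passing set_count=7 gives the one-set-per-day variation, with each day of the week always using the same set.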

Cleaning Your Tape Devices

It’s important to clean your tape devices regularly. The read-write heads can collect dust and other material that can cause errors when reading or writing to tape media. If you’re using new tapes, it’s especially important to clean the device because new tapes tend to deposit more material on the read-write heads. For specific recommendations, refer to your tape drive’s manual.

Preparing Your Tapes for Use

To prepare tape media for use, you’ll need to use the InzTap (Initialize Tape) command. (Some tapes come pre-initialized.) When you initialize tapes, you’re required to give each tape a new-volume identifier (using the InzTap command’s NewVol parameter) and a density (Density parameter). The new-volume identifier identifies the tape as a standard-labeled tape that can be used by the system for backups. The density specifies the format in which to write the data on the tape based on the tape device you’re using. You can use the special value *DevType to easily specify that the format be based on the type of tape device being used.

When initializing new tapes, you should also specify Check(*No); otherwise, the system tries to read labels from the volume on the specified tape device until the tape completely rewinds.

Here’s a sample command to initialize a new tape volume:

InzTap Dev(Tap01)        +
       NewVol(A23001)    +
       Check(*No)        +
       Density(*DevType)

Tip: It’s important to initialize each tape only once in its lifetime and give each tape volume a different volume identifier so tape-volume error statistics can be tracked.

Naming and Labeling Your Tapes

Initializing each tape volume with a volume identifier helps ensure that your operators load the correct tape for the backup. It’s a good idea to choose volume-identifier names that help identify tape-volume contents and the volume set to which each tape belongs. The following table illustrates how you might initialize your tape volumes and label them externally in a simple backup strategy. Each volume name consists of a letter that indicates the day of the week (A for Monday, B for Tuesday, and so on), a code for the backup operation (the option number from menu SAVE), and the number of the media set with which the tape volume is associated.

Volume Naming — Part of a Simple Backup Strategy
Volume name External label
B23001 Tuesday-Menu SAVE, option 23-Media set 1
B23002 Tuesday-Menu SAVE, option 23-Media set 2
B23003 Tuesday-Menu SAVE, option 23-Media set 3
E21001 Friday-Menu SAVE, option 21-Media set 1
E21002 Friday-Menu SAVE, option 21-Media set 2
E21003 Friday-Menu SAVE, option 21-Media set 3

Volume names and labels for a medium backup strategy might look like this:

Volume Naming — Part of a Medium Backup Strategy
Volume name External label
E21001 Friday-Menu SAVE, option 21-Media set 1
E21002 Friday-Menu SAVE, option 21-Media set 2
AJR001 Monday-Save journal receivers-Media set 1
AJR002 Monday-Save journal receivers-Media set 2
ASC001 Monday-Save changed data-Media set 1
ASC002 Monday-Save changed data-Media set 2
BJR001 Tuesday-Save journal receivers-Media set 1
BJR002 Tuesday-Save journal receivers-Media set 2
B23001 Tuesday-Menu SAVE, option 23-Media set 1
B23002 Tuesday-Menu SAVE, option 23-Media set 2
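
The naming convention used in both tables can be generated mechanically. The Python sketch below is one way to do it; the function name is hypothetical, the letters for Saturday and Sunday extend the "and so on" in the text and are assumptions, and the six-character check simply matches the length of the names shown in the tables.

```python
# A = Monday ... E = Friday follow the tables; F and G are assumed extensions
DAY_LETTERS = {
    "Monday": "A", "Tuesday": "B", "Wednesday": "C", "Thursday": "D",
    "Friday": "E", "Saturday": "F", "Sunday": "G",
}


def volume_name(day, operation_code, media_set):
    """Build a volume identifier: day letter + operation code + 3-digit set number."""
    name = f"{DAY_LETTERS[day]}{operation_code}{media_set:03d}"
    if len(name) != 6:
        raise ValueError(f"{name!r} is not six characters like the names in the tables")
    return name


names = [
    volume_name("Tuesday", "23", 1),  # menu SAVE, option 23
    volume_name("Friday", "21", 3),   # menu SAVE, option 21
    volume_name("Monday", "JR", 2),   # save journal receivers
    volume_name("Monday", "SC", 1),   # save changed data
]
print(names)
```

Generating the names from the day, operation, and set keeps the identifiers written by InzTap consistent with the external labels.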

Tip: If your tapes don’t come prelabeled, you should put an external label on each tape volume. The label should show the volume-identifier name and the most recent date the tape was used for a backup. Color-coded labels can help you locate and store your media — for example, yellow for set 1, red for set 2, and so on.

Verifying Your Tapes

Good backup procedures dictate that you verify you’re using the correct tape volumes. Depending on your system environment, you can choose to manually verify your tapes or have the system verify your tapes:

  • Manual verification — If you use the default value of *Mounted on the Vol (Volume) parameter of the save commands, telling the system to use the currently mounted volume, the operator must manually verify that the correct tape volumes are loaded in the correct order.
  • System verification — By specifying a list of volume identifiers on the save commands, you can have the system verify that the correct tape volumes are loaded in the correct order. If the tape volumes aren’t loaded correctly, the system will send a message telling the operator to load the correct volumes.

Another way to verify that the correct tape volumes are used is to specify expiration dates on the media files. If you rely on your operators to verify tape volumes, you can use the ExpDate (Expiration date) parameter and specify the value *Perm (permanent) for your save operations. This will prevent someone from writing over a file on the tape volume by mistake. When you’re ready to use the tape volume again, specify Clear(*All) on the save operations.

If you want the system to verify your tape volumes, specify an ExpDate value that ensures you don’t use the media again too soon. For example, if you rotate five sets of media for daily saves, specify an expiration date of the current day plus four on the save operation. Specify Clear(*None) on save operations so the system doesn’t write over unexpired files.
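
The "current day plus four" rule generalizes to the number of media sets minus one. A short Python sketch of that arithmetic follows; the function name and the sample save date are illustrative only, and the result still has to be supplied on the save command’s ExpDate parameter in the format your system expects.

```python
from datetime import date, timedelta


def expiration_date(save_date, media_sets):
    """Return the last day a set must stay protected in a daily rotation."""
    if media_sets < 1:
        raise ValueError("media_sets must be at least 1")
    return save_date + timedelta(days=media_sets - 1)


# Five sets rotated daily: protect each set through the save date plus four days
expires = expiration_date(date(2001, 3, 5), media_sets=5)
print(expires.isoformat())
```

With this value, the set becomes available again on the very day it is next due in the rotation, and not before.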

Caution: It’s important to try to avoid situations in which an operator must regularly respond to (and ignore) messages such as “Unexpired files on the media.” If operators get into the habit of ignoring routine messages, they may miss important messages.

Storing Your Tapes

An important part of any recovery strategy is storing the tape volumes in a safe but accessible location. Ensure the tape volumes have external labels and are organized well so you can locate them easily.

To enable disaster recovery, you should store a complete set of your backups at a safe, accessible location away from your site. Consider contracting with a vendor that will pick up and store your tapes. When choosing off-site storage, consider how quickly you can retrieve the tapes. Also, consider whether you’ll have access to your tapes on weekends and during holidays.

A complete recovery strategy keeps one set of tapes close at hand for immediate data recovery and keeps a duplicate set of tapes in off-site storage for disaster recovery purposes. To duplicate your tape volumes for off-site storage, you can use the DupTap (Duplicate Tape) command.

Handling Tape Media Errors

When you’re saving or restoring to tape, it’s normal for some tape read/write errors to occur. Tape read/write errors fall into one of three categories:

  • Recoverable errors: Some tape devices support recovering from read/write errors. The system repositions the tape automatically and tries the save or restore operation again.
  • Unrecoverable errors, processing can continue: In some instances, the system can’t continue to use the current tape but can continue processing on a new tape. The system will ask you to load another tape. You can still use the tape with the unrecoverable error for restore operations.
  • Unrecoverable errors, processing cannot continue: In some cases, an unrecoverable read/write error will cause the system to end the save operation.

Tapes physically wear out after extended use. You can determine whether a tape is wearing out by periodically printing the error log. Use the PrtErrLog (Print Error Log) command, and specify Type(*VolStat). The printed output provides statistics about each tape volume. If you’ve used unique volume-identifier names for your tapes and you’ve initialized each volume only once, you can determine which tapes have excessive read/write errors. Refer to your tape-volume documentation to determine the error threshold for the tape volumes you’re using. You should discard any bad tape volumes.
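
Once you’ve transcribed the per-volume statistics, comparing them with a threshold is straightforward. The Python sketch below assumes a hand-built dictionary of error counts; the field names, the sample counts, and the threshold of 50 are all invented for illustration and don’t reflect the actual layout of the PrtErrLog report or any vendor’s recommended limit.

```python
def worn_volumes(volume_stats, error_threshold):
    """Return volume identifiers whose combined read/write errors exceed the threshold."""
    return sorted(
        volume
        for volume, counts in volume_stats.items()
        if counts["read_errors"] + counts["write_errors"] > error_threshold
    )


# Hypothetical counts copied by hand from a volume statistics report
stats = {
    "B23001": {"read_errors": 2, "write_errors": 1},
    "B23002": {"read_errors": 40, "write_errors": 35},
    "E21001": {"read_errors": 0, "write_errors": 0},
}
suspect = worn_volumes(stats, error_threshold=50)
print(suspect)
```

This comparison is meaningful only if each volume has a unique identifier and was initialized once, as recommended earlier.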

If you think you have a bad tape volume, you can use the DspTap (Display Tape) or the DupTap command to check the tape’s integrity. Both of these commands read the entire tape volume and will detect any objects on the tape that the system can’t read.




© 2010 Penton Media, Inc.
System i is a trademark of International Business Machines Corporation and is used by Penton Media, Inc., under license. SystemiNetwork.com is published independently of International Business Machines Corporation, which is not responsible in any way for the content. Penton Media, Inc., is solely responsible for the editorial content and control of the System iNetwork.