HP-UX Memory Management: White Paper > Chapter 1 MEMORY MANAGEMENT

SWAP SPACE MANAGEMENT
Swap space is an area on a high-speed storage device (almost always a disk drive), reserved for use by the virtual memory system for deactivation and paging processes. At least one swap device (primary swap) must be present on the system.

During system startup, the location (disk block number) and size of each swap device are displayed in 512-KB blocks. The swapper reserves swap space at process creation time, but does not allocate swap space from the disk until pages need to go out to disk. Reserving swap at process creation protects the swapper from running out of swap space. You can add or remove swap as needed (that is, dynamically) while the system is running, without having to regenerate the kernel.

HP-UX uses both physical and pseudo swap to enable efficient execution of programs.

System memory used for swap space is called pseudo-swap space. It allows users to execute processes in memory without allocating physical swap. Pseudo-swap is controlled by an operating-system parameter; by default, swapmem_on is set to 1, enabling pseudo-swap.

Typically, when the system executes a process, swap space is reserved for the entire process, in case it must be paged out. According to this model, to run one gigabyte of processes, the system would have to have one gigabyte of configured swap space. Although this protects the system from running out of swap space, disk space reserved for swap is under-utilized if minimal or no swapping occurs.

To avoid such waste of resources, HP-UX is configured to access up to three-quarters of system memory capacity as pseudo-swap. This means that system memory serves two functions: as process-execution space and as swap space. By using pseudo-swap space, a system with one gigabyte of memory and one gigabyte of swap can run up to 1.75 GB of processes (1 GB of physical swap plus 0.75 GB of pseudo-swap). As before, if a process attempts to grow or be created beyond this extended threshold, it will fail.
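The 1.75 GB figure follows from simple arithmetic: physical swap plus three-quarters of system memory. The sketch below reproduces that calculation. The function name and its parameters are illustrative only and are not HP-UX interfaces; only the three-quarters ratio and the swapmem_on switch come from the text.

```python
def max_process_space_mb(memory_mb, physical_swap_mb, swapmem_on=True):
    """Upper bound (in MB) on process space that can be reserved.

    Illustrative model only: pseudo-swap is taken to be exactly
    three-quarters of system memory when swapmem_on is set.
    """
    pseudo_swap_mb = memory_mb * 3 // 4 if swapmem_on else 0
    return physical_swap_mb + pseudo_swap_mb


# 1 GB of memory and 1 GB of physical swap, with and without pseudo-swap.
with_pseudo = max_process_space_mb(1024, 1024)
without_pseudo = max_process_space_mb(1024, 1024, swapmem_on=False)
print(with_pseudo, without_pseudo)
```

With pseudo-swap enabled the bound is 1792 MB (1.75 GB); with it disabled the bound falls back to the 1024 MB of configured physical swap.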
When pseudo-swap is used for swap, the pages are locked; as the amount of pseudo-swap increases, the amount of lockable memory decreases.

For factory-floor systems (such as controllers), which perform best when the entire application is resident in memory, pseudo-swap space can be used to enhance performance: you can either lock the application in memory or make sure the total size of the processes created does not exceed three-quarters of system memory.

Pseudo-swap space is set to a maximum of three-quarters of system memory because the system can begin paging once three-quarters of available system memory has been used. The unused quarter of memory provides a buffer between the system and the swapper to give the system computational flexibility.

When the number of processes created approaches capacity, the system might exhibit thrashing and degraded response time. If necessary, you can disable pseudo-swap space by setting the tunable parameter swapmem_on in /usr/conf/master.d/core-hpux to zero.

Regions that have pseudo-swap allocated are kept on a null-terminated, doubly linked list whose head is called pswaplist.

There are two kinds of physical swap space: device swap and file-system swap.

Device swap space resides in its own reserved area (an entire disk or logical volume of an LVM disk) and is faster than file-system swap because the system can write an entire request (256 KB) to a device at once.

File-system swap space is located on a mounted file system and can vary in size with the system's swapping activity. However, its throughput is slower than device swap, because free file-system blocks may not always be contiguous; therefore, separate read/write requests must be made for each file-system block.

To optimize system performance, file-system swap space is allocated and de-allocated in swchunk-sized chunks. swchunk is a configurable operating system parameter; its default is 2048 KB (2 MB).
Once a chunk of file-system space is no longer being used by the paging system, it is released for file-system use, unless it has been preallocated with swapon.

When the system swaps to file-system swap space, each chunk of swap space is a file in the file-system swap directory, and has a name constructed from the system name and the swaptab index (such as becky.6 for swaptab[6] on a system named becky).

Several configurable parameters deal with swapping.

Table 1-24 Configurable swap-space parameters
When the kernel is initialized, conf.c includes globals.h, which contains numerous characteristics related to swap space, shown in the next table. The most important to swap space reservation are swapspc_cnt, swapspc_max, swapmem_cnt, swapmem_max, and sys_mem.

Table 1-25 Swap-space characteristics in globals.h
System swap space values are calculated as follows:
In HP-UX, only data area growth (using sbrk()) or stack growth will cause a process to die for lack of swap space. Program text does not use swap.

Swap reservation is a numbers game. The system has a finite number of pages of physical swap space. By decrementing the appropriate counters, HP-UX reserves space for its processes.

Most UNIX systems and UNIX-like systems allocate swap when needed. However, if the system runs out of swap space but needs to write a process's pages to a swap device, it has no alternative but to kill the process. To alleviate this problem, HP-UX reserves swap at the time the process is forked or exec'd. When a new process is forked or executed, if insufficient swap space is available to reserve for the entire process, the process may not execute.

At system startup, swapspc_cnt and swapmem_cnt are initialized to the total amount of swap space and pseudo-swap available.

Whenever the swapon() call is made for a device or file system, the amount of swap newly enabled is converted to units of pages and added to the two global swap-reservation counters swapspc_max (total enabled swap) and swapspc_cnt (available swap space).

Each time swap space is reserved for a process (that is, at process creation or growth time), swapspc_cnt is decremented by the number of pages required. The kernel does not actually assign disk blocks until needed.

Once swap space is exhausted (that is, swapspc_cnt == 0), any subsequent request to reserve swap causes the system to allocate an additional chunk of file-system swap space. If successful, both swapspc_max and swapspc_cnt are updated and the current and subsequent requests can be satisfied. If a file-system chunk cannot be allocated, the request fails, unless pseudo-swap is available.

When swap space is no longer needed (due to process termination or shrinkage), swapspc_cnt is incremented by the number of pages freed. swapspc_cnt never exceeds swapspc_max and is always greater than or equal to zero.
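The counter arithmetic described above can be modeled in a few lines. This is a minimal sketch, not kernel code: the class, its methods, and the grow_fs_chunk callback are hypothetical names introduced for illustration; only swapspc_max and swapspc_cnt come from the text. As a simplification, the model tries to add a file-system chunk whenever the available count is too small for the request, rather than only when the count reaches exactly zero.

```python
class PhysicalSwap:
    """Toy model of the physical swap reservation counters (units: pages)."""

    def __init__(self):
        self.swapspc_max = 0   # total enabled swap
        self.swapspc_cnt = 0   # swap still available for reservation

    def swapon(self, pages):
        # Enabling a device or file system adds to both counters.
        self.swapspc_max += pages
        self.swapspc_cnt += pages

    def reserve(self, pages, grow_fs_chunk=None):
        """Reserve pages; return True on success.

        grow_fs_chunk is a hypothetical callback that returns the number
        of pages in a newly allocated file-system chunk, or 0 on failure.
        """
        while self.swapspc_cnt < pages:
            added = grow_fs_chunk() if grow_fs_chunk else 0
            if added == 0:
                return False   # caller would fall back to pseudo-swap
            self.swapon(added)
        self.swapspc_cnt -= pages
        return True

    def release(self, pages):
        # The available count never exceeds the enabled total.
        self.swapspc_cnt = min(self.swapspc_cnt + pages, self.swapspc_max)


swap = PhysicalSwap()
swap.swapon(1024)                        # enable 1024 pages of device swap
ok_first = swap.reserve(1000)            # succeeds, 24 pages remain
ok_second = swap.reserve(100)            # fails, no chunk source supplied
ok_third = swap.reserve(100, grow_fs_chunk=lambda: 512)  # adds one chunk
```

No disk blocks are assigned anywhere in this model, which mirrors the point in the text: reservation only moves counters, and allocation happens later when pages actually go out to disk.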
If a chunk of file-system swap is no longer needed, it is released back to the file system and swapspc_max and swapspc_cnt are updated.

If no device or file-system swap space is available, the system uses pseudo-swap as a last resort. It decrements swapmem_cnt and locks the pages into memory. Pseudo-swap is either free or allocated; it is never reserved.

Approximately 3/4 of available system memory is available as pseudo-swap space if the tunable parameter swapmem_on is set to 1. Pseudo-swap is tracked in the global pseudo-swap reservation counters swapmem_max (enabled pseudo-swap) and swapmem_cnt (currently available pseudo-swap).

If physical swap space is exhausted and no additional file-system swap can be acquired, pseudo-swap space is reserved for the process by decrementing swapmem_cnt.

For example, on a 64 MB system, swapmem_max and swapmem_cnt track approximately 48 MB of pseudo-swap space, the remainder tracked by the global sys_mem, which represents the number of pages reserved for system use only.

Processes track the number of pseudo-swap pages allocated to them by incrementing a per-region counter r_swapmem. All regions using pseudo-swap are linked on the pseudo-swap list pswaplist.

Once pseudo-swap is exhausted (that is, swapmem_cnt == 0), attempts at process creation or growth will fail.

Because the swapper competes with the operating system for use of memory, swapmem_cnt can also be decremented by the operating system for any dynamically allocated memory. Once swapmem_cnt is exhausted, subsequent requests for swap space fail; however, the operating system can still reserve memory out of the malloc pool.

Once a process no longer needs its allocated pseudo-swap space, swapmem_cnt is incremented by the amount released and r_swapmem is updated. If the system returns the pseudo-swap space used for dynamically allocated kernel memory, the amount being released is first added to sys_mem.
Once sys_mem grows to its maximum value, any additional pages returned are used to update swapmem_cnt. swapmem_cnt must be less than or equal to swapmem_max and greater than or equal to zero.

Because pseudo-swap is shared by the swapper and memory allocation routines, it is used sparingly. The operating system periodically checks to see if physical swap space has been recently freed. If it has, the system attempts to migrate processes that are using only pseudo-swap to the available physical swap by walking the doubly linked list of pseudo-swap regions. swapspc_cnt is decremented by the r_swapmem value for each region on the list until either swapspc_cnt drops to zero or no other regions utilize pseudo-swap. swapmem_cnt is then incremented by the amount of pseudo-swap successfully migrated.

Because pseudo-swap is related to system memory usage, the swap reservation scheme reflects lockable memory policies. Although the system is not necessarily allocating additional memory when a process locks itself into memory, locked pages are no longer available for general use. This causes swapmem_cnt to be decremented to account for the pages. swapmem_cnt is also decremented by the size of the entire process if that process gets plocked in memory.

All swap devices and file systems enabled for swap have an associated priority, ranging from 0 to 10, indicating the order in which swap space from a device or file system is used. System administrators can specify swap-space priority using a parameter of the swapon(1M) command.

Swapping rotates among both devices and file systems of equal priority. Given equal priority, however, devices are swapped to by the operating system before file systems, because devices make more efficient use of CPU time.

We recommend that you assign the same swapping priority to most swap devices, unless a device is significantly slower than the rest. Assigning equal priorities limits disk head movement, which improves swapping performance.
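The pseudo-swap fallback and the later migration back to physical swap, described earlier in this section, can be sketched as follows. This is an illustrative model under stated assumptions, not the kernel's code: the class and method names are hypothetical, a request is satisfied entirely from one pool (physical or pseudo), and a region may be migrated partially when less physical swap is free than it holds. The counter names swapspc_cnt, swapmem_cnt, r_swapmem, and pswaplist come from the text.

```python
from dataclasses import dataclass


@dataclass(eq=False)          # compare regions by identity, not by field values
class Region:
    name: str
    r_swapmem: int = 0        # pages of pseudo-swap held by this region


class SwapAccounting:
    """Toy model of physical and pseudo-swap counters (units: pages)."""

    def __init__(self, physical_pages, memory_pages, swapmem_on=True):
        self.swapspc_max = self.swapspc_cnt = physical_pages
        pseudo = memory_pages * 3 // 4 if swapmem_on else 0
        self.swapmem_max = self.swapmem_cnt = pseudo
        self.pswaplist = []   # regions that currently hold pseudo-swap

    def reserve(self, region, pages):
        if self.swapspc_cnt >= pages:          # physical swap first
            self.swapspc_cnt -= pages
            return True
        if self.swapmem_cnt >= pages:          # pseudo-swap as a last resort
            self.swapmem_cnt -= pages
            region.r_swapmem += pages
            if region not in self.pswaplist:
                self.pswaplist.append(region)
            return True
        return False                           # creation or growth fails

    def release_physical(self, pages):
        self.swapspc_cnt = min(self.swapspc_cnt + pages, self.swapspc_max)

    def migrate(self):
        """Move pseudo-swap holdings onto freed physical swap."""
        migrated = 0
        for region in list(self.pswaplist):
            moved = min(region.r_swapmem, self.swapspc_cnt)
            if moved == 0:
                break                          # physical swap is used up
            self.swapspc_cnt -= moved
            region.r_swapmem -= moved
            migrated += moved
            if region.r_swapmem == 0:
                self.pswaplist.remove(region)
        self.swapmem_cnt += migrated           # pseudo-swap becomes free again
        return migrated


acct = SwapAccounting(physical_pages=100, memory_pages=400)
a, b = Region("a"), Region("b")
acct.reserve(a, 100)          # consumes all physical swap
acct.reserve(b, 50)           # falls back to pseudo-swap
acct.release_physical(80)     # some physical swap is freed later
moved = acct.migrate()        # region b moves back to physical swap
```

After the migration in this example, region b no longer holds pseudo-swap, so the memory it had pinned becomes available again, which is why the text says pseudo-swap is used sparingly.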
Swapping is accomplished on HP-UX using the following data structures:
The following table details the elements of the struct swdevt. Table 1-26 Device swap table (struct swdevt)
The following table details the principal elements of the struct fswdevt. Table 1-27 File system swap table (struct fswdevt)
Two structures track swap space. Each entry in the swaptab[] array tracks a chunk of swap space. swapmap entries hold swap information on a per-page level. By default, a swaptab entry tracks a 2 MB chunk of space and its swapmap tracks each page within that 2 MB chunk.

Each entry in the swaptab[] array has a pointer (called st_swpmp) to a unique swapmap. swapmap entries have backward pointers to the swaptab index. There is one entry in the swapmap for each page represented by the swaptab entry (default 2 MB, or 512 pages); that is, swapmap conforms in size to swchunk.

A linked list of free swap pages begins at the swaptab entry's st_free and uses each free swapmap entry's sm_next.

When a page of swap is needed, the kernel walks the structures (using the getswap() routine in vm_swalloc.c), which calls other routines that actually locate the chunk, and so forth.
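The relationship between a swaptab entry and its swapmap, and the free list threaded through st_free and sm_next, can be modeled as below. This is a simplified sketch, not the getswap() implementation: the classes, the alloc_page and free_page helpers, and the use of -1 as an end-of-list marker are assumptions made for illustration. Only the field names st_swpmp, st_free, and sm_next and the 512-page default come from the text.

```python
PAGES_PER_CHUNK = 512   # default from the text: one 2 MB chunk is 512 pages
END_OF_LIST = -1        # assumed marker for "no more free pages"


class SwapMapEntry:
    def __init__(self, sm_next):
        self.sm_next = sm_next          # index of the next free page


class SwapTabEntry:
    def __init__(self, npages=PAGES_PER_CHUNK):
        # st_swpmp: one swapmap entry per page in the chunk,
        # initially chained together as a single free list.
        self.st_swpmp = [SwapMapEntry(i + 1) for i in range(npages)]
        self.st_swpmp[-1].sm_next = END_OF_LIST
        self.st_free = 0                # head of the free list


def alloc_page(swaptab):
    """Return (swaptab index, page index) of a free page, or None."""
    for index, chunk in enumerate(swaptab):
        if chunk.st_free != END_OF_LIST:
            page = chunk.st_free
            chunk.st_free = chunk.st_swpmp[page].sm_next   # unlink the head
            chunk.st_swpmp[page].sm_next = END_OF_LIST
            return index, page
    return None


def free_page(swaptab, index, page):
    """Push a page back onto the front of its chunk's free list."""
    chunk = swaptab[index]
    chunk.st_swpmp[page].sm_next = chunk.st_free
    chunk.st_free = page


swaptab = [SwapTabEntry(npages=2)]      # a tiny chunk keeps the example short
first = alloc_page(swaptab)             # (0, 0)
second = alloc_page(swaptab)            # (0, 1)
third = alloc_page(swaptab)             # None: the chunk is full
```

The pair returned by alloc_page corresponds to the information the text says is needed to find a page again: which chunk it lives in and which page within that chunk.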
Now all information needed to retrieve the page from swap has been stored. Table 1-28 Swap table entry (struct swaptab)
Table 1-29 swap map entry (struct swapmap)
Since vhand() is tuned to be nice regarding I/O usage and CPU usage, it allows the pager to fault out swapped processes. The swapper marks the process to be swapped for deactivation, which takes it off the run queue. Since the process cannot run, once its pages are aged they cannot be referenced again. When the steal hand comes around, it steals all the pages in the region.

When memory pressure is high, sched() selects a process to swap using the routine choose_deactivate(). This routine is biased to choose non-interactive processes over interactive ones, sleeping processes over running ones, and long-running processes over newer ones.

Once a process has been chosen to be deactivated, the following actions occur:
A process that has been inactive long enough for all its pages to have been aged and stolen is virtually swapped out already. The global deactprocs points to the head of a list of inactive processes, its chain running through the pregion element p_nextdeact. If the average number of free pages drops below lotsfree, these pages are swapped out.

When memory pressure eases, a deactivated process is reactivated. The choose_reactivate() routine is biased to choose interactive processes over non-interactive ones, runnable processes over sleeping ones, and processes that have been deactivated longest over those more recently deactivated.
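The selection biases of choose_deactivate() and choose_reactivate() can be illustrated with sort keys. The text states only the direction of each bias, so the strict lexicographic ordering used here (interactivity first, then sleep state, then age) is an assumption, as are the Proc fields and the function names; the real routines may weigh these factors differently.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    interactive: bool
    sleeping: bool
    run_time: int = 0          # hypothetical: how long the process has run
    deactivated_for: int = 0   # hypothetical: how long it has been deactivated


def deactivation_key(p):
    # Smallest key is chosen first: non-interactive, sleeping, long-running.
    return (p.interactive, not p.sleeping, -p.run_time)


def reactivation_key(p):
    # Smallest key is chosen first: interactive, runnable, longest deactivated.
    return (not p.interactive, p.sleeping, -p.deactivated_for)


def pick_deactivate(procs):
    return min(procs, key=deactivation_key, default=None)


def pick_reactivate(deactivated):
    return min(deactivated, key=reactivation_key, default=None)


active = [
    Proc(1, interactive=True, sleeping=True, run_time=100),
    Proc(2, interactive=False, sleeping=False, run_time=50),
    Proc(3, interactive=False, sleeping=True, run_time=10),
    Proc(4, interactive=False, sleeping=True, run_time=500),
]
inactive = [
    Proc(5, interactive=False, sleeping=False, deactivated_for=90),
    Proc(6, interactive=True, sleeping=True, deactivated_for=10),
    Proc(7, interactive=True, sleeping=False, deactivated_for=5),
    Proc(8, interactive=True, sleeping=False, deactivated_for=40),
]
victim = pick_deactivate(active)
revived = pick_reactivate(inactive)
```

In this example the victim is process 4 (non-interactive, sleeping, longest running) and the revived process is 8 (interactive, runnable, deactivated longest among its peers).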