HP-UX Memory Management: White Paper > Chapter 1 MEMORY MANAGEMENT

MAINTAINING PAGE AVAILABILITY
Two computational elements maintain page availability:
- vhand monitors free pages to keep their number above a threshold and ensure sufficient memory for demand paging. vhand governs the overall state of the paging system.
- sched becomes operative when the number of pages available in memory falls below a certain level.

vhand and sched are described in the context of their work shortly.
Memory management uses paging thresholds that trigger various paging activities. The figure shows the full range of available memory and indicates what paging activity occurs when the memory level falls below each paging threshold. The value termed freemem represents the total number of free pages in the phead linked list, which includes all memory available in a system after kernel initialization. Three tunable paging thresholds are initialized by the setmemthresholds() routine.

Table 1-20 setmemthresholds() paging thresholds
The gpgslim paging threshold is the point at which vhand starts paging. gpgslim adjusts dynamically according to the needs of the system, oscillating between an upper bound called lotsfree and a lower bound called desfree. Both lotsfree and desfree are calculated when the system boots and are based on the size of system memory.

When the system boots, gpgslim is set 1/4 of the distance between desfree and lotsfree (desfree + (lotsfree - desfree)/4). As the system runs, this value fluctuates between desfree and lotsfree. When the sum of available memory and the number of pages scheduled for I/O (soon to be freed) falls below gpgslim, vhand() begins aging and stealing little-used pages in an attempt to raise available memory above this threshold. The system wants to keep memory at gpgslim. If the system is not stressed, gpgslim rises, because the system does not need many more pages freed. As memory becomes scarcer, the system tries to maintain the pool of free memory, causing gpgslim to fall. If gpgslim decreases to minfree, the system starts to deactivate entire processes.

Performance testing has shown that memory usage differs between servers and workstations. Workstations typically run a few large applications, whereas servers typically run many applications of varying size. Consequently, the paging and deactivation thresholds on workstations are a smaller fraction of memory than on servers. In a typical workstation environment, applications start up requiring a large number of pages, which eventually reduce to a smaller working set of pages. By allowing applications to claim more memory before paging or deactivating, the working set is more likely to stay in memory. Paging and deactivation algorithms take these and other differences into account.

Depending on the physical memory size of the system, the paging thresholds are initialized to either a "small memory" or "large memory" set of values.
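As a worked illustration of the boot-time initialization above, the following sketch (plain C, not kernel code; the variable names simply mirror the text) computes the initial gpgslim from lotsfree and desfree:

```c
#include <assert.h>

/* Sketch only: seeds gpgslim one quarter of the way from desfree up to
 * lotsfree, as described in the text. All values are page counts. */
long init_gpgslim(long lotsfree, long desfree)
{
    return desfree + (lotsfree - desfree) / 4;
}
```

With illustrative values of lotsfree = 8192 pages and desfree = 1024 pages, the initial gpgslim would be 2816 pages, a quarter of the way up the desfree-to-lotsfree range; the running system then lets it drift within that range as described above.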
For small memory systems (that is, systems with 32 MB or less of freemem), the paging thresholds are set to a smaller fraction of total memory to allow applications to utilize more memory before the system begins paging and deactivating. The paging thresholds are set as follows:

Table 1-21 Small-memory paging thresholds
For large memory systems (that is, systems with greater than 32 MB of freemem), the paging thresholds are set to a larger fraction of memory so that vhand() starts paging earlier and can efficiently walk a (potentially) longer active pregions list. This also helps sched() process a potentially longer active process list by starting process deactivation earlier. The paging thresholds are set as follows:

Table 1-22 Large-memory paging thresholds
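Both threshold sets scale with memory size, and a minimal sketch may help fix the idea. It assumes the 1/16 ratio implied by a 32 MB cap reached at 512 MB of freemem (discussed next); this is an illustration of the linear-then-fixed shape, not the kernel's actual code:

```c
#include <assert.h>

/* Illustrative only: lotsfree grows linearly with freemem and holds at
 * 32 MB once freemem reaches 512 MB. The 1/16 slope is an assumption
 * inferred from 32 MB / 512 MB. Sizes are in megabytes for clarity. */
long lotsfree_mb(long freemem_mb)
{
    long v = freemem_mb / 16;   /* linear region */
    if (v > 32)
        v = 32;                 /* fixed beyond 512 MB of freemem */
    return v;
}
```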
These settings result in a linear increase of the paging thresholds up to a certain memory size, after which the thresholds remain fixed. For example, lotsfree increases linearly and reaches its maximum value of 32 MB when freemem is 512 MB. For memory sizes beyond 512 MB, lotsfree remains fixed at 32 MB. As a result, the system pages earlier on smaller memory configurations and later on larger ones. When physical memory exceeds 2 GB, all the paging thresholds are increased to a larger set of fixed values.

Programmatically, vhand is awakened periodically by schedpaging() to maintain recently referenced pages and to move pages out when memory is tight. vhand operates on the basis of vhandargs_t, which consists of a pointer to the target pregion, a count of the physical pages visited, and a nice value for preferential aging.

vhand can also be awakened by allocpfd2() (in vm_page.c), a routine that allocates a single page of memory. If all the pages on the free memory list (phead) are locked, or the routine has been called while using the interrupt control stack (ICS) and all pages on the free list are also in the page cache (phash), allocpfd2() cannot get any pages. If on the ICS without any available pages, allocpfd2() wakes the page daemon. Regardless of which stack the system is running on, allocpfd2() then wakes unhashdaemon, which removes pages from the page cache. If on the ICS, allocpfd2() returns NULL; if not on the ICS, allocpfd2() sleeps waiting for a page to become available, and then retries.

A doubly linked list of pregions, termed the active pregion list, is used by vhand to examine memory availability. Conceptually, the pregions can be visualized as linked in a circle, in the center of which are two clock-like hands: a steal hand and an age hand.
The kernel automatically keeps an appropriate distance between the hands, based on the available paging bandwidth, the number of pages that need to be stolen, the number of pages already scheduled to be freed, and the frequency with which vhand runs. The two hands cycle through the active pregion linked lists of physical memory, looking for memory pages that have not been referenced recently and moving them to secondary storage - the swap space. Pages that have not been referenced between the time the age hand passes and the time the steal hand passes are pushed out of memory. The hands rotate at a variable rate determined by the demand for memory.

The vhand daemon decides when to start paging by determining how much free memory is available. Once free memory drops below the gpgslim threshold, paging occurs. vhand attempts to free enough pages to bring the supply of memory back up to gpgslim. Between gpgslim and lotsfree, the page daemon continues to age pages (that is, clear their reference bits) but no longer steals pages. vhand responds to various workloads, transient situations, and memory configurations. When aging and stealing from regions, vhand:
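The two-hand cycle above can be modeled at the page level (a deliberately simplified toy with invented names; the real hands walk pregion lists, not a flat page array). The age hand clears a page's reference bit; the trailing steal hand frees any page whose bit is still clear, that is, any page untouched between the two passes:

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 8

/* Toy per-page state: was the page referenced since last aged, and is it
 * still resident in memory? */
bool referenced[NPAGES];
bool resident[NPAGES] = { true, true, true, true,
                          true, true, true, true };

/* Age hand: clear the reference bit so future use can be detected. */
void age_page(int i)  { referenced[i] = false; }

/* Steal hand: free the page only if it was not referenced since aging.
 * Returns 1 if the page was stolen, 0 otherwise. */
int steal_page(int i)
{
    if (resident[i] && !referenced[i]) {
        resident[i] = false;
        return 1;
    }
    return 0;
}
```

A page touched between the age pass and the steal pass keeps its reference bit set and survives; the kernel's variable hand spacing controls how long that grace window is.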
When the age hand arrives at a pregion, it ages some constant fraction of pages before moving to the next pregion (by default 1/16 of the pregion's total pages). The p_agescan field enables the age hand to resume at the location within a pregion where it left off during its previous pass, while p_ageremain tracks how many pages must still be aged to fill the 1/16 quota before moving on to the next pregion. The steal hand uses the pregion field p_stealscan to locate itself within a pregion and resume taking pages that have not been referenced since last aged. If no valid pages remain, vhand pushes the vfd/dbd pairs associated with the region out of memory. How much to age and steal depends on several factors:
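The quota-and-resume behavior of the age hand might be sketched as follows, assuming p_agescan and the 1/16 quota work as described (the struct and function are invented for illustration; the real pregion has many more fields):

```c
#include <assert.h>

/* Invented miniature of a pregion, keeping only the fields the text
 * describes for the age hand's bookkeeping. */
struct pregion_sketch {
    int p_count;    /* total pages in the pregion */
    int p_agescan;  /* page index where the next pass resumes */
};

/* One aging pass over a pregion: age up to 1/16 of its pages starting at
 * p_agescan, leave p_agescan pointing at the next unvisited page, and
 * return how many pages were aged. A minimum of one page is an assumption
 * for very small pregions. */
int age_pass(struct pregion_sketch *prp)
{
    int quota = prp->p_count / 16;
    if (quota == 0)
        quota = 1;
    int aged;
    for (aged = 0; aged < quota; aged++)
        prp->p_agescan = (prp->p_agescan + 1) % prp->p_count;
    return aged;
}
```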
vhand is biased against threads that have nice priorities: the nicer a thread, the more likely vhand is to steal its pages. The pregion field p_bestnice reflects the best (numerically smallest) nice value of all threads sharing a region. Refer to the table that follows for explanations of the vhand variables.
Table 1-23 Variables affecting vhand
Once vhand establishes its criteria, it proceeds to traverse the linked list of pregions. Continuing in the clock-hands analogy, vhand is ready to move its hands.
Note that the steal hand is moved first to keep it behind the age hand, preventing a page from being aged and stolen in the same cycle.

The sched() routine (colloquially termed "the swapper") handles the deactivation and reactivation of processes when free memory falls below minfree, or when the system appears to be thrashing.
Deactivation occurs when sched() determines the system:
Reactivation occurs when the system is no longer low on memory or thrashing. Deactivation and reactivation are determined by:
The swapper deactivates processes and prevents them from running, thus reducing the rate at which new pages are accessed. Once the swapper detects that available memory has risen above minfree and the system is not thrashing, it reactivates the deactivated processes and continues monitoring memory availability.

Figure 1-27 sched() chooses processes to deactivate based on size, nice priority, and how long it has been running.

sched() walks the chain of active processes, examining each and deciding the best candidate for deactivation based on size, nice priority, and how long the process has been running. Programmatically, sched() deactivates and reactivates processes. If the system appears to be thrashing or experiencing memory pressure, the sched routine walks through the active process list, calculating each process's deactivation priority based on type, state, size, length of time in memory, and how long it has been sleeping. (Batch processes and processes marked for serialization by the serialize() command are more likely to be deactivated than interactive processes.) The best candidate is then marked for deactivation.

If the system is not thrashing or experiencing memory pressure, the sched routine walks through the active proc list, calculating each deactivated process's reactivation priority based on how long it has been deactivated, its size, state, and type. Batch processes and those marked by the serialize() command are less likely to be reactivated than an interactive process. Once the most deserving process has been determined, it is reactivated. Once a process and its pregions are marked for deactivation, sched():
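The candidate selection can be imagined as a scoring pass over the active process list. The weights below are purely illustrative assumptions to show the stated biases (larger, nicer, longer-resident, and batch/serialized processes make better victims); HP-UX's actual priority formula is not given in this text:

```c
#include <assert.h>

/* Invented candidate record: only the factors the text names. */
struct cand {
    long pages;        /* resident size; bigger frees more memory */
    int  nice;         /* larger (nicer) value => better victim */
    long secs_in_mem;  /* longer resident => better victim */
    int  batch;        /* nonzero for batch/serialized processes */
};

/* Higher score = better deactivation candidate. Weights are arbitrary
 * illustrations of the direction of each bias, not real kernel values. */
long deact_score(const struct cand *c)
{
    long s = c->pages + 64L * c->nice + c->secs_in_mem;
    if (c->batch)
        s += 100000;   /* prefer batch/serialized over interactive */
    return s;
}
```

sched() would then mark the highest-scoring process for deactivation, as described above.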
Eventually, vhand pushes the deactivated process's pages to secondary storage. Processes stay deactivated until the system has freed up enough memory and the paging rate has slowed sufficiently to return processes to the run queue. The process with the highest reactivation priority is then returned to the run queue. Once a process and its pregions are marked for reactivation, sched():
Earlier HP-UX implementations did not permit a process to be swapped out if it was holding a lock, doing I/O, or was not at a signalable priority. Even if its priority made it the most likely candidate for deactivation, the swapper bypassed the process. Now, if the most deserving process cannot be deactivated immediately, it is marked for self-deactivation; that is, the process sets a self-deactivation flag. The next time the process must fault in a page, it deactivates itself.
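The self-deactivation handshake might be sketched as follows; the flag and function names are invented for illustration, since the text does not name the actual kernel fields:

```c
#include <assert.h>
#include <stdbool.h>

/* Invented miniature of a process for this example. */
struct proc_sketch {
    bool self_deact;  /* set by the swapper when direct deactivation fails */
    bool active;      /* still eligible to run */
};

/* Page-fault entry point (sketch): a process carrying the deferred
 * request deactivates itself before the fault is serviced. */
void on_page_fault(struct proc_sketch *p)
{
    if (p->self_deact) {
        p->active = false;   /* honor the swapper's deferred request */
        p->self_deact = false;
    }
    /* ... otherwise service the fault normally ... */
}
```

The point of the scheme is that the process deactivates itself at a moment when it is safe to do so (a page fault), rather than the swapper forcing it out while it holds a lock or is mid-I/O.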
Thrashing is defined as low CPU usage combined with a high paging rate. Thrashing might occur when several processes are running, several processes are waiting for I/O to complete, or active processes have been marked for serialization. On systems with very demanding memory needs (for example, systems that run many large processes), the paging daemons can become so busy deactivating, reactivating, and swapping pages in and out that the system spends too much time paging and not enough time running processes. When this happens, system performance degrades rapidly, sometimes to such a degree that nothing seems to be happening. At this point, the system is said to be thrashing, because it is spending more time on overhead than on productive work. If your working set is larger than physical memory, the system will thrash. To solve the problem:
If you are left with one huge process constrained by physical memory and the system still thrashes, you will need to rewrite the application so that it uses fewer pages simultaneously, for example by grouping data structures according to access.

All processes marked by the serialize command are run serially. This functionality unjams the bottleneck (recognizable by process throughput degradation) caused by groups of large processes contending for the CPU. By running large processes one at a time, the system can make more efficient use of the CPU as well as system memory, since each process does not end up constantly faulting in its working set, only to have the pages stolen when another process starts running.

As long as there is enough memory in the system, processes marked by serialize() behave no differently from other processes in the system. However, once memory becomes tight, processes marked by serialize are run one at a time in priority order. Each process runs for a finite interval of time before another serialized process may run. The user cannot enforce an execution order on serialized processes. serialize() can be run from the command line or with a PID value. serialize() also has a timeshare option that returns the specified PID to normal timeshare scheduling algorithms. If serialization is insufficient to eliminate thrashing, you will need to add more main memory to the system.
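The one-at-a-time behavior under memory pressure can be sketched as a selection routine (names are invented, and "lower value runs first" priority semantics are assumed for illustration):

```c
#include <assert.h>

/* Sketch: among n serialized processes with the given priorities, decide
 * which may run. When memory is tight, only the highest-priority one
 * (lowest value, by assumption) is eligible; otherwise all are eligible,
 * matching the behavior described in the text. Returns the index of the
 * highest-priority process, or -1 if n == 0. */
int pick_serialized(const int *prio, int n, int memory_tight, int *eligible)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        eligible[i] = 1;                       /* plenty of memory: all run */
        if (best < 0 || prio[i] < prio[best])
            best = i;
    }
    if (memory_tight)                          /* tight: one at a time */
        for (int i = 0; i < n; i++)
            eligible[i] = (i == best);
    return best;
}
```

In the real system the chosen process also runs only for a finite interval before the next serialized process gets its turn, which this static sketch does not model.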