HP-UX Memory Management: White Paper > Chapter 1 MEMORY MANAGEMENT

MEMORY-RELEVANT PORTIONS OF THE PROCESSOR
|
The figure above and the table that follows name the principal processor components. Of these, the registers, the translation lookaside buffer, and the cache are crucial to memory management and are discussed in greater detail following the table.

Table 1-2 Processor Architecture, components and purposes
The operating system maintains a table in memory called the page directory (PDIR), which keeps track of all pages currently in memory. When a page is mapped into some virtual address space, it is allocated an entry in the PDIR; the PDIR is what links a physical page in memory to its virtual address. The PDIR is implemented as a memory-resident table of software structures called page directory entries (PDEs), which contain virtual addresses. The PDIR maps the entire physical memory, with one entry for every page of physical memory; each entry contains a 48- or 64-bit virtual address. When the processor needs to find a physical page not indexed in the TLB, it can search the PDIR for a matching virtual address. The PDIR is a hash table with collision chains: the virtual address is hashed into one of the buckets in the table, and the corresponding chain is searched until a chain entry with a matching virtual address is found.

A trap occurs when a translation is missing from the translation lookaside buffer (TLB, discussed shortly). If the processor can find the missing translation in the PDIR, it installs it in the TLB and allows execution to continue. If not, a page fault occurs. A page fault is a trap taken when the address needed by a process is missing from main memory; this occurrence is also known as a PDIR miss. A PDIR miss indicates that the page is either on the free list, in the page cache, or on disk; the memory management system must then find the requested page on the swap device or in the file system and bring it into main memory. Conversely, a PDIR hit indicates that a translation exists in the PDIR for the virtual address; the translation is installed in the TLB and execution continues.

Each PDE contains the virtual-to-physical address translation, along with other information necessary for the management of each page of virtual memory. The structural elements of the hashed page directory for PA-RISC 1.1 are shown in the following table.
Table 1-3 struct hpde, the hashed page directory
A word-oriented hpde structure (struct whpde) is implemented for faster manipulation and is documented in /usr/include/machine/pde.h. The pde.h header file also contains definitions for space manipulation, the maximum number of entries in the PDIR hash table, constants for field positions within the PDE structure, access rights (which are now granted on a per-region basis), and a second hashed page directory structure (struct hpde2_0) for PA-RISC 2.0.
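The hashed lookup described above can be sketched in C. This is a minimal illustration of a hash table with collision chains, not the kernel's implementation; the structure fields, table size, and hash function below are simplified stand-ins for the real struct hpde in pde.h.

```c
#include <stddef.h>
#include <stdint.h>

#define PDIR_BUCKETS 1024          /* illustrative size, not the real table */

/* Simplified stand-in for a page directory entry (not the real struct hpde). */
typedef struct pde {
    uint64_t    vaddr;             /* virtual page address used as the key   */
    uint64_t    ppn;               /* physical page number it maps to        */
    struct pde *next;              /* collision chain within one bucket      */
} pde_t;

static pde_t *pdir_hash[PDIR_BUCKETS];

/* Hash the virtual address into a bucket index (page-offset bits dropped). */
static size_t pdir_bucket(uint64_t vaddr)
{
    return (vaddr >> 12) % PDIR_BUCKETS;
}

/* Walk the bucket's collision chain until an entry with a matching
 * virtual address is found; NULL means a PDIR miss (page fault path). */
pde_t *pdir_lookup(uint64_t vaddr)
{
    for (pde_t *p = pdir_hash[pdir_bucket(vaddr)]; p != NULL; p = p->next)
        if (p->vaddr == vaddr)
            return p;
    return NULL;                   /* PDIR miss */
}

/* Install a translation at the head of its bucket's chain. */
void pdir_insert(pde_t *entry)
{
    size_t b = pdir_bucket(entry->vaddr);
    entry->next = pdir_hash[b];
    pdir_hash[b] = entry;
}
```

Inserting at the head of the chain keeps installation constant-time; a lookup cost then depends on chain length, which is why the hash function spreads virtual addresses across many buckets.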
The translation lookaside buffer (TLB) translates virtual addresses to physical addresses. Address translation is handled from the top of the memory hierarchy, hitting the fastest components first (such as the TLB on the processor), then moving on to the page directory table (the PDIR in main memory), and lastly to secondary storage. Depending on model, the TLB may be organized on the processor in one of two ways:
At one time many systems were designed with a split data TLB (DTLB) and instruction TLB (ITLB), to account for the different characteristics of data and instruction locality and type of access (frequent random access of data versus relatively sequential, single-use access of instructions). Falling costs have since allowed the inclusion of much larger TLBs on processors, which has lessened the disadvantages of a unified TLB; as a result, many newer processors have unified TLBs.

In addition to the standard TLB, which maps each entry to a single page of memory, many processors also have a block TLB. The block TLB maps entries to virtual address ranges larger than a single page, that is, to multiple hpdes. Block TLB entries are used to reference kernel memory that remains resident: because the operating system moves data in and out of memory by pages, a range of pages referenced by a block TLB entry is locked in memory and cannot be paged out. Addressing blocks of pages thus increases both the overall address range of the TLB and the speed with which large transactions can be serviced, so the block TLB may be thought of as a hardware implementation of large pages. The block TLB is typically used for graphics, whose data is accessed in huge chunks; it is also used for mapping other static areas such as kernel text and data.

The TLB looks up the translation for a virtual page number (VPN) and returns the physical page number (PPN) used to reference physical memory. Ideally the TLB would be large enough to hold translations for every page of physical memory, but that is prohibitively expensive; instead the TLB holds a subset of entries from the page directory (PDIR) in memory, speeding up examination of the PDIR by caching copies of its most recently used translations. Because the purpose of the TLB is to satisfy virtual-to-physical address translation, the TLB is searched only when memory is accessed in virtual mode.
This condition is indicated by the D-bit in the PSW (or the I-bit for instruction access). Since the TLB translates virtual to physical addresses, each entry contains both the Virtual Page Number (VPN) and the Physical Page Number (PPN). Entries also contain Access Rights, an Access Identifier, and five flags. Table 1-4 TLB flags (PA 2.x architecture)
The T, D, and B flags are present only in data or unified TLBs. In the PA 1.x architecture, an E ("valid") bit indicates that the TLB entry reflects the current attributes of the physical page in memory.

Cache is fast, associative memory on the processor module that stores recently accessed instructions and data. From it, the processor learns whether it has immediate access to data or must go out to (slower) main memory for it. Cacheable data going to the CPU from main memory passes through the cache; conversely, the cache serves as the means by which the CPU passes data to and from main memory. The cache reduces the time required for the CPU to access data by maintaining a copy of the most recently requested data and instructions. A cache improves system performance because most memory accesses are to addresses that are very close to, or the same as, previously accessed addresses; the cache takes advantage of this locality by bringing in a block of data whenever the CPU requests an address. Although the hit rate depends on the size of the cache, its associativity, and the workload, performance measurements show that the vast majority of references find their data already in the cache. Depending on the model, PA-RISC processors are equipped with either a unified cache or separate caches for instructions and data (for better locality and faster performance). In multiprocessing systems, each processor has its own cache, and a cache controller maintains consistency. Cache memory itself is organized as follows:
When a process executes, it stores its code (text) and data in processor registers for referencing. If the data or code is not present in the registers, the CPU supplies the virtual address of the desired data to the TLB and to the cache controller. Depending on implementation, caches can be direct mapped, set associative, or fully associative; recent PA implementations use direct-mapped caches and fully associative TLBs. Virtual addresses can be sent in parallel to the TLB and the cache because the cache is virtually indexed. A physical page may not be referenced by more than one virtual page, and a virtual address cannot translate to two different physical addresses; that is, PA-RISC does not support hardware address aliasing, although HP-UX implements software address aliasing for text only in EXEC_MAGIC executables.

The cache controller uses the low-order bits of the virtual address to index into the direct-mapped cache. Each index in the cache finds a cache tag containing a physical page number (PPN) and a cache line of data. If the cache controller finds an entry at the cache location, it checks whether the line is the right one by comparing the PPN in the cache tag with the one returned by the TLB, because blocks from many different locations in main memory can legitimately map to a given cache location. If the data is not in the cache but the page is translated, the resulting data cache miss is handled entirely by the hardware. A TLB miss occurs if the page is not translated in the TLB; if the translation is also not in the PDIR, HP-UX uses the page fault code to fault it in. If the data and code are not in RAM, they might have to be paged in from disk, in which case a disk-to-memory transaction must be performed. On a more detailed level, the next figure demonstrates the mapping of virtual and physical address components. The sequence followed by the processor as it validates addresses is one of "hit or miss."
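The hit-or-miss sequence can be sketched as a small software model. This is a simplification for illustration only: it assumes a tiny direct-mapped TLB and a linear stand-in for the hashed PDIR search; real hardware geometries and the kernel's trap handlers are far more involved.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_SLOTS  8      /* tiny illustrative TLB, not a real geometry */
#define PDIR_SLOTS 16     /* stand-in for the memory-resident PDIR      */

typedef struct { uint64_t vpn, ppn; bool valid; } xlate_t;

static xlate_t tlb[TLB_SLOTS];
static xlate_t pdir[PDIR_SLOTS];

typedef enum { TLB_HIT, TLB_MISS_PDIR_HIT, PAGE_FAULT } result_t;

/* Follow the hit-or-miss sequence: try the TLB first, then the PDIR.
 * On a TLB miss with a PDIR hit, install the translation in the TLB
 * (as the trap handler does) and continue; a PDIR miss is a page fault. */
result_t translate(uint64_t vpn, uint64_t *ppn)
{
    xlate_t *slot = &tlb[vpn % TLB_SLOTS];
    if (slot->valid && slot->vpn == vpn) {           /* TLB hit */
        *ppn = slot->ppn;
        return TLB_HIT;
    }
    for (int i = 0; i < PDIR_SLOTS; i++) {           /* search the PDIR */
        if (pdir[i].valid && pdir[i].vpn == vpn) {
            *slot = pdir[i];                         /* install in TLB */
            *ppn = pdir[i].ppn;
            return TLB_MISS_PDIR_HIT;
        }
    }
    return PAGE_FAULT;                               /* PDIR miss */
}
```

Note that a second reference to the same page hits in the TLB, which is exactly the caching effect the TLB provides over the slower in-memory PDIR search.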
There are five TLB miss handlers (instruction, data, non-access instruction, non-access data, and dirty), located in locore.s; the pde.h header file has the TLB/PDIR structure definitions. In addition to assisting in virtual address translation, the translation lookaside buffer (TLB) serves a security function on behalf of the processor, controlling access and ensuring that a user process sees only data for which it has privilege rights. The TLB contains access rights and protection identifiers. PA-RISC allows up to four protection IDs to be associated with each process; these IDs are held in control registers CR-8, CR-9, CR-12, and CR-13. Table 1-5 Security checks in the TLB
Figure 1-11 “Access control to virtual pages” shows the checkpoints for controlling access to a page of data through the TLB. Two checks are performed: protection check and access rights check. If both checks pass, access is granted to the page referenced by the TLB.
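The two checks described above can be modeled in C as follows. The field widths, encodings, and names here are illustrative assumptions only, not the actual PA-RISC bit formats.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of a TLB entry's protection fields
 * (illustrative layout, not the real PA-RISC encoding). */
typedef struct {
    uint32_t access_id;     /* protection identifier stored in the entry     */
    int      max_priv;      /* least-privileged level allowed (0=kernel ...
                               3=user), assumed encoding for this sketch     */
} tlb_prot_t;

/* Protection check: one of the (up to four) protection IDs held in
 * control registers CR-8, CR-9, CR-12, and CR-13 must match the entry. */
static bool protection_ok(const uint32_t pids[4], uint32_t access_id)
{
    for (int i = 0; i < 4; i++)
        if (pids[i] == access_id)
            return true;
    return false;
}

/* Access rights check: the current privilege level must be at least
 * as privileged as (numerically no greater than) the entry allows. */
static bool rights_ok(int cur_priv, int max_priv)
{
    return cur_priv <= max_priv;
}

/* Access is granted only when both checks pass. */
bool tlb_access_allowed(const uint32_t pids[4], int cur_priv, tlb_prot_t e)
{
    return protection_ok(pids, e.access_id) && rights_ok(cur_priv, e.max_priv);
}
```

A user-level reference to a kernel-only page fails the rights check even when the protection ID matches, and a reference to another process's page fails the protection check even at a sufficient privilege level; both must pass.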
If the two PPNs do not match (assuming a TLB hit), the cache line is loaded, because the bytes referenced on the virtual page are not yet in the cache. The time it takes to service a cache miss varies depending on whether the data already present in the cache is clean or dirty. (When the cache line is dirty, the old contents are written out to memory before the new contents are read in.) If the cache line is "clean" (that is, not modified), it does not have to be written back to main memory, and the penalty is fewer instruction cycles than if the line is dirty and must be written back. All PA-RISC machines use a write-back cache policy, meaning that main memory is updated only when the cache line is replaced.

PA-RISC allows for privilege level promotion through the GATEWAY instruction, which performs an interspace branch to increase the privilege level. The most common example of this in HP-UX is a system call, which changes the privilege level from user to kernel.

Registers, high-speed memory in the processor's CPU, are used by software as storage elements that hold data for instruction control flow, computations, interruption processing, protection mechanisms, and virtual memory management. All computations are performed between registers, or between a register and a constant embedded in an instruction, which minimizes the need to access main memory; this register-intensive approach accelerates the performance of a PA-RISC system. Register memory is much faster than conventional main memory but also much more expensive, and is therefore reserved for processor-specific purposes. Registers are classified as privileged or non-privileged, depending on the privilege level of the instruction being executed. Table 1-6 Types of Registers
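The clean-versus-dirty replacement cost under a write-back policy can be sketched as follows. The line size, structure layout, and transfer counting are illustrative assumptions, not a model of any particular PA-RISC cache.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 32   /* illustrative line size, not a real geometry */

/* Simplified model of one line in a write-back cache. */
typedef struct {
    uint64_t tag;               /* which memory block the line holds    */
    bool     valid;
    bool     dirty;             /* modified since it was brought in     */
    uint8_t  data[LINE_BYTES];
} cache_line_t;

/* Replace a cache line under a write-back policy.  Main memory is
 * updated only when a dirty line is evicted; replacing a clean line
 * skips the write-back and costs fewer cycles.  Returns the number of
 * memory transfers performed (1 for a clean line, 2 for a dirty one). */
int replace_line(cache_line_t *line, uint64_t new_tag,
                 const uint8_t new_data[LINE_BYTES],
                 uint8_t memory[][LINE_BYTES])
{
    int transfers = 0;
    if (line->valid && line->dirty) {
        /* write-back: flush the old contents to memory first */
        memcpy(memory[line->tag], line->data, LINE_BYTES);
        transfers++;
    }
    memcpy(line->data, new_data, LINE_BYTES);   /* read in the new line */
    line->tag   = new_tag;
    line->valid = true;
    line->dirty = false;
    transfers++;
    return transfers;
}
```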