Hal's Software

Processor and Memory API Report


Chapter 1. Introduction

Basic Overview

Traditionally, systems with multiple processors have been designed using the symmetric multiprocessor (SMP) model. In this model, no individual processor or group of processors has faster access to system memory or I/O devices than any other processor in the system. Unfortunately, the scalability of an SMP setup is limited by the finite bandwidth of the single processor bus used to connect the processors to each other and to the other devices on the system. To overcome the scalability problems associated with SMP designs, systems with a large number of processors often use a design which features non-uniform memory access (NUMA). In a NUMA system, CPUs are grouped into smaller subsystems called nodes. Each node may contain a small number of processors, memory and I/O devices and is connected to other nodes in the system using an interconnect bus. As a result of the physical system topology, each processor and I/O device has faster access to memory located on the same or in nearby nodes and slower access to memory located in nodes which are farther away. Most NUMA systems use cache-coherent NUMA (ccNUMA) designs which ensure that each processor has a globally consistent view of system memory.

Operating systems and other software vendors have developed various APIs to allow user-level programs to examine system topology, control thread scheduling and optimize memory allocation. Although these different APIs often share many similarities, they are not governed by industry standards and differ from each other in significant ways. Additionally, each vendor provides administrative control programs which differ in both interface and functionality from similar programs by other vendors.

Program Optimizations

When a variety of different applications with differing resource needs are run on the same system, it is often best to rely on the default behavior of the operating system to reactively schedule threads on different processors and allocate memory for those threads appropriately. However, many applications may realize performance benefits through the preemptive use of APIs which control thread scheduling and memory allocation.

Applications with many processes or threads which operate on different subsets of a large dataset may benefit from the use of APIs which control thread scheduling and memory allocation on NUMA systems. Thread scheduling APIs can be used to force threads to run on processors with fast access times to the dataset(s) which the threads are accessing frequently. Likewise, the application can change the memory allocation policies of the operating system to help ensure that relevant partitions of the dataset(s) are placed in memory near the processors which are executing the threads from which data will be most frequently accessed.

In general, thread scheduling APIs should be used to ensure that threads which are heavily dependent on shared data run on the same node. Additionally, shared data should be partitioned and/or replicated whenever possible such that the data is available locally to the threads running on each node which must access the data frequently.
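As an illustration of the partitioning advice above, the following Python sketch splits a dataset of a given length into contiguous per-node ranges so that the threads on each node could work on their own slice. The function name partition_by_node and the node count are hypothetical. The sketch only computes the ranges; actually placing each slice on its node requires one of the platform-specific APIs discussed in Chapter 2.

```python
def partition_by_node(n_items, n_nodes):
    """Split range(n_items) into n_nodes contiguous, near-equal ranges."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    base, extra = divmod(n_items, n_nodes)
    ranges = []
    start = 0
    for node in range(n_nodes):
        # The first `extra` nodes each take one additional item.
        size = base + (1 if node < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges
```

For example, partition_by_node(10, 4) yields ranges of 3, 3, 2 and 2 items.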

Applications which execute large I/O requests may benefit from the use of APIs which control memory allocation on NUMA systems supporting multipath I/O (MPIO). Multipath I/O refers to a setup in which a physical device such as a disk is attached to more than one adapter in the system, possibly in different nodes, thus creating multiple access paths to the device. Unless specific information regarding the device topology is available, memory used in large I/O requests should be distributed across all available system nodes to maximize performance.
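The idea of distributing a buffer across all nodes can be pictured as a round-robin assignment of pages to nodes. The following Python sketch shows only that assignment; the function name interleave_pages and the node identifiers are hypothetical, and real interleaved placement must be requested through a platform-specific memory placement API.

```python
def interleave_pages(n_pages, nodes):
    """Assign each page index to a node in round-robin order."""
    if not nodes:
        raise ValueError("at least one node is required")
    return [nodes[page % len(nodes)] for page in range(n_pages)]
```

With two nodes, interleave_pages(5, [0, 1]) places the five pages on nodes 0, 1, 0, 1 and 0.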

When an application accesses a portion of its address space, the processor must translate the requested virtual memory address into a physical memory address. This translation process is expensive, but the processor can cache the results on a page-by-page basis. In general, the cache can only hold information on a fixed number of translations. When an application frequently accesses more memory pages than can be held in the translation cache, its performance may be significantly degraded. Such applications will likely benefit from the use of APIs which control memory page size. By increasing the size of memory pages used to manage specific portions of the application's address space it is possible to significantly increase the application's performance.
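The effect of page size can be made concrete with a small calculation. The amount of memory that a full translation cache can cover is the number of cached translations multiplied by the page size. The Python sketch below uses a hypothetical 64-entry cache and hypothetical 4 KiB and 4 MiB page sizes; actual values depend on the processor and operating system.

```python
def tlb_reach(entries, page_size):
    """Bytes of address space covered by a fully populated translation cache."""
    return entries * page_size

KIB = 1024
MIB = 1024 * KIB

small = tlb_reach(64, 4 * KIB)   # 262144 bytes (256 KiB)
large = tlb_reach(64, 4 * MIB)   # 268435456 bytes (256 MiB)
```

Under these assumptions, the larger page size lets the same cache cover 1024 times as much memory.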

Informational Interfaces

Almost all vendors provide APIs which at least allow a program to determine the number of processors available to the program on the system. Some also allow programs to access additional system topology information. On such systems, optimizations which take advantage of the hierarchical organization of the system can prove surprisingly fruitful. Although operating systems generally do a reasonable job at scheduling processes and allocating memory to take advantage of system configuration, there are many instances where better resource utilization can be achieved by asserting control over scheduling and memory allocation using program specific knowledge.

Control Interfaces

Available control interfaces can be broken down into several categories: processor-binding APIs, processor-set APIs, and APIs to control memory allocation and placement.

Processor Binding APIs

Processor binding APIs associate a collection of processors with each thread and/or process and only allow the thread or process to be scheduled to run on processors in the set. If the system allows the set to consist of more than one processor then it is usually represented by a bit-mask of processors. On some systems, the binding can be made advisory instead of mandatory.
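A bit-mask representation can be sketched as follows, assuming the common convention that bit n of the mask stands for processor n. The helper names cpus_to_mask and mask_to_cpus are hypothetical and do not belong to any vendor API.

```python
def cpus_to_mask(cpus):
    """Build an integer mask with bit n set for each processor n in cpus."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("processor numbers must be non-negative")
        mask |= 1 << cpu
    return mask

def mask_to_cpus(mask):
    """Return the sorted list of processor numbers whose bits are set."""
    cpus = []
    cpu = 0
    while mask:
        if mask & 1:
            cpus.append(cpu)
        mask >>= 1
        cpu += 1
    return cpus
```

For example, processors 0, 2 and 3 correspond to the mask 0b1101 (decimal 13).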

Processor-Set APIs

Processor sets are global collections of processors to which a thread and/or process can be assigned. Once assigned, the thread or process will only be scheduled to run on processors within the set. Exclusive processor sets allow an individual processor to belong to only one set at a time.

Memory Allocation and Placement APIs

The capabilities provided for memory allocation control vary widely. Some systems provide the capability to control the set of nodes from which memory is allocated. Also, some systems allow threads to request that parts of their address space be placed on specific nodes.

Chapter 2. System Support

The following sections discuss the relevant interfaces provided by specific systems.

Microsoft Windows

Basic System Information

The GetSystemInfo function can be used to retrieve the number of processors on the system, the list of active processors and the memory page size.

NUMA System Information

Windows Server 2003 introduced NUMA support. System topology information is provided by the GetLogicalProcessorInformation, GetNumaHighestNodeNumber, GetNumaNodeProcessorMask, GetNumaProcessorNode and GetNumaAvailableMemoryNode functions. The current processor can be determined using the GetCurrentProcessorNumber function. The provided topology information can also include information about SMT (Hyperthreaded) logical processors.

Advisory Processor Association

Starting with Windows NT 4.0, support was introduced for a thread to have a preferred processor using the SetThreadIdealProcessor function. A thread's current ideal processor is stored in the thread's KTHREAD structure ([NTIFS]).

Processor Binding

All systems starting with Windows NT 3.5 (and Windows 95) support thread scope binding using the SetThreadAffinityMask function. Additionally, these systems support process scope binding in the form of the GetProcessAffinityMask function. The characteristics of these functions differ slightly on the Windows 95 family of operating systems as these systems do not include multi-processor support.

Windows NT 4.0 and higher systems provide for the manipulation of the process affinity mask using the SetProcessAffinityMask function. The current thread affinity mask can be queried on Windows 2000, Windows XP and Windows Server 2003 systems using the undocumented thread information class ThreadBasicInformation with the NtQueryInformationThread function to get a copy of the thread's THREAD_BASIC_INFORMATION structure ([NTDLL]).

Sun Solaris

Basic System Information

The sysconf function can be used with the _SC_CPUID_MAX, _SC_NPROCESSORS_CONF, _SC_NPROCESSORS_MAX flags to determine basic system processor configuration. The current state of a processor can be determined using the processor_info function.

The getpagesize function returns the default memory page size. A list of available memory page sizes can be retrieved using the getpagesizes function.

NUMA System Information

Newer versions of Solaris Operating Environment 9 (starting December 2002) support the Memory Placement Optimization (MPO) API. This API allows a thread to determine its current processor using the getcpuid function and its current home latency group (current node) using the gethomelgroup function.

Additionally, newer versions of the Solaris Operating Environment support the Locality Group API. This API is exported through the lgrp library. NUMA topology information can be retrieved using the lgrp_view, lgrp_children, lgrp_parents, lgrp_root, lgrp_nlgrps, lgrp_cpus and lgrp_mem_size functions.

Advisory Processor Association

The lgrp_affinity_get and lgrp_affinity_set functions can be used to get and set a thread's locality group association. Each process and thread is assigned a home group used for default scheduling purposes. The home group can be retrieved using the lgrp_home function.

System Control

The Solaris Operating Environment allows online processor status changes using the p_online function. This function allows a processor to be switched between online and offline states. Solaris supports the concept of a non-interruptible online processor state in which the processor is schedulable but will not be used to process external I/O events.

Processor Binding

The Solaris Operating Environment supports the concept of process and thread processor binding. The function processor_bind can be used to set or query a process's or thread's processor binding.

Global Processor Sets

The Solaris Operating Environment also supports exclusive processor sets. These sets are created and manipulated using the pset_create, pset_destroy, pset_assign and pset_setattr functions. Information regarding a given processor set can be retrieved using the pset_info and pset_getattr functions. The processor set binding can be set or queried using the pset_bind function. System load information specific to a given processor set can be retrieved using the pset_getloadavg function. A list of processor sets can be retrieved using the pset_list function.

Page Size Manipulation

Before Solaris 9, allocating memory using a large page size could be done using the SHM_SHARE_MMU flag with the shmat function. This feature is known as intimate shared memory (ISM). On Solaris 9 and later systems, specific page sizes can be requested through the use of the MC_HAT_ADVISE flag with the memcntl function.

Memory Migration and Placement

Information regarding the associated latency group of memory regions can be obtained using the meminfo function with the MEMINFO_VLGRP or MEMINFO_VREPL_LGRP flags. Memory region placement can also be optimized using the madvise function with the MADV_ACCESS_LWP, MADV_ACCESS_MANY or MADV_ACCESS_DEFAULT flags. Specifically, using the MADV_ACCESS_LWP flag, a given thread can claim a region of virtual memory for placement to optimize access from the thread's current latency group.

HP-UX

Basic System Information

The number of processors on the system can be determined using the MPC_GETNUMSPUS_SYS flag with the mpctl function. Each processor on the system has a unique ID. The IDs of all system processors can be enumerated using the MPC_GETFIRSTSPU_SYS and MPC_GETNEXTSPU_SYS flags with the mpctl function. The number of processors available to a particular thread can be determined using the pthread_num_processors_np function or using the MPC_GETNUMSPUS flag with the mpctl function. The IDs of processors available to a particular thread can be determined using the PTHREAD_GETFIRSTSPU_NP and PTHREAD_GETNEXTSPU_NP flags with the pthread_processor_bind_np function or using the MPC_GETFIRSTSPU and MPC_GETNEXTSPU flags with the mpctl function.

NUMA System Information

The _SC_CCNUMA_SUPPORT flag can be used with the sysconf function to determine if the system has NUMA support. The MPC_GETNUMLDOMS_SYS, MPC_GETFIRSTLDOM_SYS, MPC_GETNEXTLDOM_SYS, MPC_LDOMSPUS_SYS and MPC_SPUTOLDOM flags can be used with the mpctl function to retrieve system topology information. A process can determine its current processor using the MPC_GETCURRENTSPU flag with the mpctl function. The number of NUMA nodes available to a particular thread can be determined using the pthread_num_ldoms_np function or using the MPC_GETNUMLDOMS flag with the mpctl function. The ID of each node available to a particular thread can be determined using the PTHREAD_GETFIRSTLDOM_NP and PTHREAD_GETNEXTLDOM_NP flags with the pthread_ldom_id_np function or using the MPC_GETFIRSTLDOM and MPC_GETNEXTLDOM flags with the mpctl function. The number of available processors within each such node can be determined using the pthread_num_ldomprocs_np function or using the MPC_LDOMSPUS flag with the mpctl function.

Advisory Processor Association

The MPC_SETPROCESS flag can be used with the mpctl function to associate a process with a particular processor. The PTHREAD_BIND_ADVISORY_NP flag can be used with the pthread_processor_bind_np function to associate a thread with a particular processor.

Processor Binding

A process can be bound to a specific processor using the MPC_SETPROCESS_FORCE flag with the mpctl function. Similarly, a process can be bound to a specific NUMA node using the MPC_SETLDOM flag with the mpctl function. A thread can be bound to a specific processor using the PTHREAD_BIND_FORCED_NP flag with the pthread_processor_bind_np function. A thread can be bound to a specific NUMA node using the pthread_ldom_bind_np function.

Global Processor Sets

Support for exclusive processor sets and processor binding has been included since HP-UX 11i version 1.6. Processor sets are created and manipulated using the pset_create, pset_destroy, pset_assign and pset_setattr functions. Information regarding a given processor set can be retrieved using the pset_ctl and pset_getattr functions. The pset_ctl function can also be used to obtain a list of current system processor sets and processor topology information. Support for processor sets can be queried by using the _SC_PSET_SUPPORT flag with the sysconf function.

The pset_bind function is used to bind a particular process to a processor set. A thread can be bound to a processor set using the pthread_pset_bind_np function.

Memory Migration and Placement

The mmap and shmget functions have been enhanced to accept the (MAP|IPC)_MEM_INTERLEAVED, (MAP|IPC)_MEM_LOCAL and (MAP|IPC)_MEM_FIRST_TOUCH flags.

IBM AIX

Basic System Information

The number of system processors or online system processors can be determined using the sysconf function with the _SC_NPROCESSORS_CONF or _SC_NPROCESSORS_ONLN flags respectively. Alternatively, the variables _system_configuration.ncpus or _system_configuration.max_ncpus can be used.

The default memory page size can be determined by using the _SC_PAGESIZE flag with the sysconf function. The _SC_LARGE_PAGESIZE flag can be used with the sysconf function to determine the size of large memory pages.

NUMA System Information

The number of NUMA nodes available to a particular program can be determined using the rs_numrads function. Information about a specific node can be retrieved using the rs_getinfo function. Detailed topology information can be obtained through the use of the rs_getassociativity function.

Configuration Change Notification

Processes which are bound to a processor will be notified by the SIGRECONFIG signal if the state of the processor is scheduled to change. Within the signal handler, the application should call the dr_reconfig function to obtain the details of the impending change. Applications with appropriate credentials can cancel the change before it goes into effect.

Processor Binding

Applications and/or threads can bind to a processor using the bindprocessor function. Kernel thread IDs can be retrieved using the thread_self function. Additionally, the mapping between kernel thread IDs and pthread handles can be accessed using the functions pthdb_pthread_tid and pthdb_tid_pthread. Threads bound to a CPU which is being deallocated will be sent the SIGCPUFAIL signal.

The ra_attachrset function or the rs_setpartition function can be used to bind a process to a specified NUMA node. The ra_exec and ra_fork functions allow processes to be created with such bindings.

Global Processor Sets

AIX allows for arbitrary groupings of processors and memory units known as resource sets. The rs_op function is used in combination with the rs_init, rs_alloc and rs_getrad functions to create and manipulate arbitrary resource sets. Global resource sets are assigned names with the rs_setnameattr function and retrieved using the rs_getnamedrset function.

Page Size Manipulation

An application can allocate memory using the large page size by using the SHM_LGPAGE and SHM_PIN flags with the shmget function. Large page support must be enabled by the system administrator using the vmtune command.

Memory Migration and Placement

As AIX resource sets can include memory units, it is possible to use resource sets to restrict the set of NUMA nodes from which application memory is allocated. Also, it is possible to force migration of application memory from one set of nodes to another set using resource-set bindings.

SGI IRIX

Basic System Information

Basic configuration information can be retrieved using the MP_NPROCS, MP_NAPROCS and MP_STAT flags with the sysmp function. The current default memory-management policy set can be accessed using the pm_getdefault function.

NUMA System Information

IRIX provides a namespace for processors and nodes under the /hw directory. Nodes are identified by, for example, /hw/module/1/slot/n1/node, with aliases as, for example, /hw/nodenum/0 -> /hw/module/1/slot/n1/node. Processors are identified by, for example, /hw/module/1/slot/n1/node/cpu/a where cpus are named either a or b within their nodes. Processor aliases are available as, for example, /hw/cpunum/0 -> /hw/module/1/slot/n1/node/cpu/a. The hardware graph filesystem is not guaranteed to be mounted under /hw, so it may be necessary to search for the mount point of hwgfs using the getmntent function.

System Control

The sysmp function allows a processor's state to be changed to exclude it from running processes not specifically bound to it. It is also possible to assign the processor used to manage the system clock. The MP_ISOLATE flag allows a processor to delay cache synchronization until system services are requested. The MP_NONPREEMPTIVE and MP_WARDRTC flags allow a processor not to process clock and other timer events.

Advisory Processor Association

When a memory locality domain is linked to a process using the process_mldlink function with the RQMODE_ADVISORY flag, the binding is considered advisory.

Processor Binding

Using the sysmp function with the MP_MUSTRUN or MP_MUSTRUN_PID flags, it is possible to bind a process to an isolated processor. The function pthread_setrunon_np allows binding of a thread with PTHREAD_SCOPE_SYSTEM or PTHREAD_SCOPE_BOUND_NP scope to a particular isolated processor.

Binding to NUMA nodes can be accomplished using memory locality domains (MLDs). When a process is attached to an MLD using the process_cpulink function, the scheduler attempts to schedule the process on a processor where the MLD has been placed. Memory locality domain binding with process_cpulink is not available for pthread-enabled applications.

Global Processor Sets

IRIX allows processors to be excluded from the running of unbound processes. These processors will only run processes which are explicitly bound to them. Thus, each excluded processor can be thought of as a single-element exclusive processor set.

Page Size Manipulation

IRIX supports the concept of a memory-management policy set. The memory page size is one element which can be included in the policy.

Memory Allocation and Placement

IRIX supports the concept of memory locality domains. The function mld_create is used to create a memory locality domain of a given size. The function numa_acreate will create a memory arena for memory allocation within the memory locality domain.

The memory locality domains are grouped into memory locality domain sets. A set is created using the mldset_create function. The function mldset_place must be used to place a memory locality domain set using a given topology and resource affinity set. It is possible to request specific topological arrangements close to a given device or file. The function migr_range_migrate along with migr_policy_args_init can be used to migrate a memory range into a given memory locality domain.

Different memory management strategies can be associated with different regions in a program's address space. Policy information can include a memory locality domain set used preferentially for placement and allocation. A policy set is created using the pm_create function and attached to a given memory region using the pm_attach function. A policy set can be designated the default policy set using the pm_setdefault function. The current policy set for a given memory region is accessed using the pm_getall function. Policy set parameters can be extracted from a policy set handle using the pm_getstat function.

The __pm_get_page_info and __mld_to_node functions can be used to determine placement information for a given memory region.

Tru64 UNIX

Basic System Information

The number of system processors and the number of active system processors can be determined using the sysconf function with the _SC_NPROCESSORS_CONF or _SC_NPROCESSORS_ONLN flag respectively. The default memory page size can be determined using the _SC_PAGESIZE flag with the sysconf function. System processor and basic topology information is available through the getsysinfo function.

NUMA System Information

Starting with Tru64 UNIX version 5.1, the system supports the concept of Resource Affinity Domains (RADs) and CPU sets. CPU set utility functions are cpuaddset, cpuandset, cpucopyset, cpucountset, cpudelset, cpudiffset, cpuemptyset, cpufillset, cpuisemptyset, cpuismember, cpuorset, cpusetcreate, cpusetdestroy and cpuxorset. The CPUs in a CPU set can be enumerated using the cpu_foreach function. The current CPU can be retrieved using the cpu_get_current function. System CPU information can be retrieved using the cpu_get_info, cpu_get_num and cpu_get_max functions.

The function cpu_get_rad can be used to get the RAD associated with a given CPU. Topology information is available through the nloc function. The CPU set of a given RAD can be determined using the rad_get_cpus function. RAD memory information can be retrieved using the rad_get_freemem and rad_get_physmem functions. The online/offline state of a RAD can be queried using the rad_get_state function. Basic system RAD information can be retrieved using the rad_get_num and rad_get_max functions. RAD set utility functions are radaddset, radandset, radcopyset, radcountset, raddelset, raddiffset, rademptyset, radfillset, radisemptyset, radismember, radorset, radsetcreate, radsetdestroy and radxorset.

Advisory Processor Association

A process can be associated with a given RAD using the rad_attach_pid function. A thread can be associated with a given RAD using the pthread_rad_attach function.

Processor Binding

Process CPU binding is supported on Tru64 UNIX using the bind_to_cpu and bind_to_cpu_id functions. A thread can be bound to a specific processor using the pthread_use_only_cpu function.

A process can be bound to a given RAD using the rad_bind_pid function. A thread can be bound to a given RAD using the pthread_rad_bind function. The current RAD can be determined using the rad_get_current_home function.

The function nfork can be used to copy the calling process or thread and assign the child process to a different RAD. Also, the rad_fork function can be used to fork the current process or thread specifying the RAD of the child process.

Tru64 UNIX supports the concept of a NUMA Scheduling Group which allows processes to be bound to specific system nodes. A scheduling group can be created or accessed using the nsg_init function. The status of a scheduling group is queried using the nsg_get function. Processes are attached to a given scheduling group using the nsg_attach_pid function. A list of configured NUMA scheduling groups can be retrieved using the nsg_get_nsgs function. A list of processes attached to a given scheduling group can be retrieved using the nsg_get_pids function. The owner and permissions of a scheduling group can be set using the nsg_set function. Threads can be attached to a scheduling group using the pthread_nsg_attach function and detached using the pthread_nsg_detach function. A list of threads attached to a given scheduling group can be retrieved using the pthread_nsg_get function. Process and RAD binding information for a process can be retrieved using the numa_query_pid function.

Global Processor Sets

Processor sets are created using the create_pset function. Processors are assigned to processor sets using the assign_cpu_to_pset function. Processes are assigned to a given processor set using the assign_pid_to_pset function.

Memory Migration and Placement

The memory allocation policy for a given memory region can be queried using the memalloc_attr function. The system can be advised as to the expected access pattern of a given memory region using the function nmadvise. The nmmap function allows an open file to be mapped into the process's address space specifying the memory allocation policy. The nshmget function allows a shared memory region to be created or accessed while specifying the memory allocation policy.

Linux

Basic System Information

Most Linux systems support the retrieval of basic system processor information using the sysconf function with the _SC_NPROCESSORS_CONF and _SC_NPROCESSORS_ONLN flags.
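As a minimal sketch, the same sysconf queries can be issued from Python, whose os.sysconf wrapper accepts the names without the leading underscore. The function name processor_counts is hypothetical, and the sketch assumes a platform on which both names are defined.

```python
import os

def processor_counts():
    """Return (configured, online) processor counts reported by sysconf."""
    configured = os.sysconf("SC_NPROCESSORS_CONF")
    online = os.sysconf("SC_NPROCESSORS_ONLN")
    return configured, online
```

The two values can differ when some processors have been taken offline.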

NUMA System Information

Linux 2.5.46 and later systems provide topology information through the sysfs file system namespace (sysfs was also known as driverfs in some earlier releases). Nodes are exported in the class/node/devices directory such that, for example, the directory class/node/devices/0 -> ../../../root/sys/node0 has a file named cpumap which contains a bitmap of CPUs in a given node. Also class/cpu/devices contains symlinks to CPUs, for example, class/cpu/devices/0 -> ../../../root/sys/cpu0. Memory block topology is also represented under the class/memblk/devices directory tree such that, for example, class/memblk/devices/0 -> ../../../root/sys/memblk0 and root/sys/node0/memblk0 -> ../../memblk0 exist. The getmntent function can be used to locate the mount point for sysfs.
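Once a cpumap file has been read, its contents must be decoded into processor numbers. The Python sketch below assumes the textual format used by current kernels, which is a hexadecimal bitmap written as comma-separated 32-bit words with the most significant word first; the format in the 2.5-series kernels described above may differ. The function name parse_cpumap and the sample strings are hypothetical.

```python
def parse_cpumap(text):
    """Turn a cpumap string such as '00000000,0000000f' into a sorted CPU list."""
    # Dropping the commas leaves one hexadecimal number whose bit n is CPU n.
    mask = int(text.strip().replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]
```

For example, the string "00000000,0000000f" decodes to processors 0 through 3.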

SGI cpumemsets

SGI provides Linux based systems which have been extended to support the cpumemsets API. The system uses CPU maps to map system CPU identifiers to process CPU identifiers. The CPU map for a given process can be retrieved using the cmsQueryCMM function and set using the cmsSetCMM function. The function cpu2node is used to determine the node containing a given CPU. Basic system configuration information is provided by the numnodes, numcpus and lubcpunum functions. The functions nodebind, cpu2node, numnodes, numcpus, and lubcpunum depend on the existence of a valid /var/cpuset/cpu-node-map file which should be created at boot time by a system initialization script.

Processor Binding

Linux 2.5.8 and later systems support the sched_getaffinity and sched_setaffinity functions which can be used to get and set the processor affinity of a given process. The processor affinity is represented as a list of processors on which a given process or thread is allowed to run.
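A minimal sketch of these two calls, using the wrappers of the same names in Python's os module (available only on platforms that provide the underlying functions, such as Linux). The helper names get_affinity and set_affinity are hypothetical.

```python
import os

def get_affinity(pid=0):
    """Return the set of CPUs on which `pid` may run (0 means the caller)."""
    return os.sched_getaffinity(pid)

def set_affinity(cpus, pid=0):
    """Restrict `pid` to `cpus` and return the affinity now in effect."""
    os.sched_setaffinity(pid, cpus)
    return os.sched_getaffinity(pid)
```

A typical use is to read the current set with get_affinity and then pass a subset of it to set_affinity.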

LinuxThreads

On systems with the LinuxThreads pthread implementation, each thread is given its own PID and thus can be controlled in the same manner as any other system process. The thread debugging interface provides the necessary functions to map a thread handle into a system PID and a system PID into a thread handle using the functions td_thr_get_info and td_ta_map_lwp2thr respectively (the thread debugging library was designed to emulate the thread debugging library on Solaris). In the LinuxThreads implementation the LWPID of type lwpid_t maps to the system PID of type pid_t which instantiates the thread.

NPTL

On systems using the NPTL (Native POSIX Thread Library) pthread implementation, the function pthread_setaffinity_np or pthread_attr_setaffinity_np can be used to bind a thread to a set of processors. The function pthread_getaffinity_np or pthread_attr_getaffinity_np can be used to retrieve the given processor set bindings. The thread debugging library can be used with the NPTL implementation as with the LinuxThreads implementation if necessary.
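Python does not expose pthread_setaffinity_np, so the following sketch illustrates per-thread binding by a different route: on Linux, sched_setaffinity also accepts a kernel thread ID, in which case only that thread is affected. This is a substitute technique shown for illustration, not the pthread interface itself, and the function name run_pinned is hypothetical.

```python
import os
import threading

def run_pinned(cpu, func):
    """Run func() in a new thread restricted to `cpu`; return (affinity, result)."""
    outcome = {}

    def worker():
        tid = threading.get_native_id()      # kernel thread ID of this thread
        os.sched_setaffinity(tid, {cpu})     # affects only this thread
        outcome["affinity"] = os.sched_getaffinity(tid)
        outcome["result"] = func()

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome["affinity"], outcome["result"]
```

The affinity of the calling thread is left unchanged, since the new mask is applied only to the worker's thread ID.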

SGI cpumemsets

Each process has a cpumemset which contains a list of processors on which the given process can run and a list of nodes on which a given set of CPUs can allocate memory. Different areas in the process's address space can also be assigned a cpumemset to control memory allocation. CPU maps and cpumemsets returned by the query functions must be freed using the cmsFreeCMM and cmsFreeCMS functions respectively. The cmsGetCpu function returns the current CPU. Several utility binding functions are also provided: runon, cpubind and nodebind.

RTLinux

RTLinux, a Linux based system produced by FSMLabs, supports binding threads to a given processor using the pthread_attr_setcpu_np function and querying the CPU binding using the pthread_attr_getcpu_np function. The current CPU can be determined by using the rtl_getcpuid function.

Memory Migration and Placement

SGI cpumemsets

Using the SGI cpumemsets API, it is possible to assign a different cpumemset to each region of a process's address space. This allows an application to designate different topological policies for different address-space regions.

QNX

Basic System Information

On a QNX system, information regarding the system topology is accessed using the _syspage_ptr global structure instance accessed using the SYSPAGE_ENTRY(entry) macro.

Processor Binding

Processor affinity can be set using the _NTO_TCTL_RUNMASK flag with the ThreadCtl or ThreadCtl_r function.

BSD Derivatives

Basic System Information

BSD Derivatives include FreeBSD, OpenBSD, NetBSD and MacOS X (Darwin). Newer BSD derivative systems support the _SC_NPROCESSORS_CONF and _SC_NPROCESSORS_ONLN flags to the sysconf function. The number of system processors can also be obtained using the CTL_HW and HW_NCPU flags with the sysctl function.

Chapter 3. Acknowledgments

  • Windows is a registered trademark of Microsoft Corporation in the United States and other countries.

  • Solaris is a trademark or registered trademark of Sun Microsystems, Inc. in the United States and other countries.

  • HP-UX is a registered trademark of Hewlett-Packard Company in the United States and other countries.

  • AIX is a registered trademark of International Business Machines Corporation in the United States and other countries.

  • IRIX is a registered trademark of Silicon Graphics, Inc. in the United States and other countries.

  • Tru64 is a registered trademark of Compaq Computer Corporation in the United States and other countries.

  • UNIX is a registered trademark of The Open Group in the United States and other countries.

  • Linux is a registered trademark of Linus Torvalds in the United States and other countries.

  • RTLinux is a registered trademark of Finite State Machine Labs, Inc.

  • QNX is a registered trademark of QNX Software Systems, Ltd.

  • BSD is a registered trademark of Berkeley Software Design, Inc.

  • FreeBSD is a registered trademark of Wind River Systems, Inc.

  • NetBSD is a registered trademark of the NetBSD Foundation.

  • MacOS and Darwin are registered trademarks of Apple Computer, Inc.

  • All other trademarks and product names are the property of their respective owners.

References

Official Documentation

[MSDN] “Processes and Threads”. Microsoft Developer Network. Microsoft Corporation.

[SolMan] “System Calls”. Solaris 9 Reference Manual Collection. Sun Microsystems, Inc.

[SolMulPage] “Supporting Multiple Page Sizes in the Solaris Operating System”. Richard McDougall. Sun BluePrints Online. Sun Microsystems, Inc. March 2004.

[SolPSet] “Solaris Processor Sets Made Easy”. Dr. Matthias Laux. Solaris Technical Articles and Tips. Sun Microsystems, Inc. June 2001.

[HPUXMan] “System Calls”. HP-UX Reference. Hewlett-Packard Development Company.

[AIXManK] “Kernel and Subsystems”. AIX Technical Reference. IBM, Inc.

[AIXMan] “General Programming Concepts”. Writing and Debugging Programs. AIX Technical Reference. IBM, Inc.

[AIXRA] “How to Control Resource Affinity on Multiple MCM or SCM pSeries Architecture in an HPC Environment”. Pascal Vezolle, Francois Thomas, and Jean-Armand Broyelle. Redbooks Paper. IBM, Inc. 2004.

[AIXPage] “AIX Support for Large Pages”. Michael Mall. AIX Whitepapers. IBM, Inc. April 2002.

[IRIXMan] “System Calls”. IRIX Manual Pages. SGI, Inc.

[Tru64Man] Tru64 UNIX Reference Pages. Compaq Corporation.

Additional Sources

[NTDLL] Undocumented functions of NTDLL. Tomasz Nowak.

[NTIFS] ntifs.h. Open source version. Bo Brantén.

Index

A

assign_cpu_to_pset, Global Processor Sets
assign_pid_to_pset, Global Processor Sets

B

bindprocessor, Processor Binding
bind_to_cpu, Processor Binding
bind_to_cpu_id, Processor Binding

G

getcpuid, NUMA System Information
GetCurrentProcessorNumber, NUMA System Information
gethomelgroup, NUMA System Information
GetLogicalProcessorInformation, NUMA System Information
getmntent, NUMA System Information, NUMA System Information
GetNumaAvailableMemoryNode, NUMA System Information
GetNumaHighestNodeNumber, NUMA System Information
GetNumaNodeProcessorMask, NUMA System Information
GetNumaProcessorNode, NUMA System Information
getpagesize, Basic System Information
getpagesizes, Basic System Information
GetProcessAffinityMask, Processor Binding
getsysinfo, Basic System Information
GetSystemInfo, Microsoft Windows

I

IPC_MEM_FIRST_TOUCH, Memory Migration and Placement
IPC_MEM_INTERLEAVED, Memory Migration and Placement
IPC_MEM_LOCAL, Memory Migration and Placement

M

madvise, Memory Migration and Placement
MADV_ACCESS_DEFAULT, Memory Migration and Placement
MADV_ACCESS_LWP, Memory Migration and Placement
MADV_ACCESS_MANY, Memory Migration and Placement
MAP_MEM_FIRST_TOUCH, Memory Migration and Placement
MAP_MEM_INTERLEAVED, Memory Migration and Placement
MAP_MEM_LOCAL, Memory Migration and Placement
MC_HAT_ADVISE, Page Size Manipulation
memalloc_attr, Memory Migration and Placement
memcntl, Page Size Manipulation
meminfo, Memory Migration and Placement
memory page size, Program Optimizations
migr_policy_args_init, Memory Allocation and Placement
migr_range_migrate, Memory Allocation and Placement
mldset_create, Memory Allocation and Placement
mldset_place, Memory Allocation and Placement
mld_create, Memory Allocation and Placement
mmap, Memory Migration and Placement
mpctl, Basic System Information
MPC_GETCURRENTSPU, NUMA System Information
MPC_GETFIRSTLDOM, NUMA System Information
MPC_GETFIRSTLDOM_SYS, NUMA System Information
MPC_GETFIRSTSPU, Basic System Information
MPC_GETFIRSTSPU_SYS, Basic System Information
MPC_GETNEXTLDOM, NUMA System Information
MPC_GETNEXTLDOM_SYS, NUMA System Information
MPC_GETNEXTSPU, Basic System Information
MPC_GETNEXTSPU_SYS, Basic System Information
MPC_GETNUMLDOMS, NUMA System Information
MPC_GETNUMLDOMS_SYS, NUMA System Information
MPC_GETNUMSPUS, Basic System Information
MPC_GETNUMSPUS_SYS, Basic System Information
MPC_LDOMSPUS, NUMA System Information
MPC_LDOMSPUS_SYS, NUMA System Information
MPC_SETLDOM, Processor Binding
MPC_SETPROCESS, Advisory Processor Association
MPC_SETPROCESS_FORCE, Processor Binding
MPC_SPUTOLDOM, NUMA System Information
MPIO, Program Optimizations
MP_ISOLATE, System Control
MP_MUSTRUN, Processor Binding
MP_MUSTRUN_PID, Processor Binding
MP_NAPROCS, Basic System Information
MP_NONPREEMPTIVE, System Control
MP_NPROCS, Basic System Information
MP_STAT, Basic System Information
MP_WARDRTC, System Control
multipath I/O, Program Optimizations

O

optimization, Program Optimizations

P

page size, Program Optimizations
pm_attach, Memory Allocation and Placement
pm_create, Memory Allocation and Placement
pm_getall, Memory Allocation and Placement
pm_getdefault, Basic System Information
pm_getstat, Memory Allocation and Placement
pm_setdefault, Memory Allocation and Placement
processor_bind, Processor Binding
processor_info, Basic System Information
process_cpulink, Processor Binding
process_mldlink, Advisory Processor Association, Memory Allocation and Placement
pset_assign, Global Processor Sets, Global Processor Sets
pset_bind, Global Processor Sets, Global Processor Sets
pset_create, Global Processor Sets, Global Processor Sets
pset_ctl, Global Processor Sets
pset_destroy, Global Processor Sets, Global Processor Sets
pset_getattr, Global Processor Sets, Global Processor Sets
pset_getloadavg, Global Processor Sets
pset_info, Global Processor Sets
pset_list, Global Processor Sets
pset_setattr, Global Processor Sets, Global Processor Sets
pthdb_pthread_tid, Processor Binding
pthdb_tid_pthread, Processor Binding
pthread_attr_getcpu_np, RTLinux
pthread_attr_setaffinity_np, NPTL
pthread_attr_setcpu_np, RTLinux
PTHREAD_BIND_ADVISORY_NP, Advisory Processor Association
PTHREAD_BIND_FORCED_NP, Processor Binding
pthread_getaffinity_np, NPTL
PTHREAD_GETFIRSTLDOM_NP, NUMA System Information
PTHREAD_GETFIRSTSPU_NP, Basic System Information
PTHREAD_GETNEXTLDOM_NP, NUMA System Information
PTHREAD_GETNEXTSPU_NP, Basic System Information
pthread_ldom_bind_np, Processor Binding
pthread_ldom_id_np, NUMA System Information
pthread_nsg_attach, Processor Binding
pthread_nsg_detach, Processor Binding
pthread_nsg_get, Processor Binding
pthread_num_ldoms_np, NUMA System Information
pthread_num_processors_np, Basic System Information
pthread_processor_bind_np, Processor Binding
pthread_pset_bind_np, Global Processor Sets
pthread_rad_attach, Advisory Processor Association
pthread_rad_bind, Processor Binding
PTHREAD_SCOPE_BOUND_NP, Processor Binding
PTHREAD_SCOPE_SYSTEM, Processor Binding
pthread_setaffinity_np, NPTL
pthread_setrunon_np, Processor Binding
pthread_use_only_cpu, Processor Binding
p_online, System Control

R

radaddset, NUMA System Information
radandset, NUMA System Information
radcopyset, NUMA System Information
radcountset, NUMA System Information
raddelset, NUMA System Information
raddiffset, NUMA System Information
rademptyset, NUMA System Information
radfillset, NUMA System Information
radisemptyset, NUMA System Information
radismember, NUMA System Information
radorset, NUMA System Information
radsetcreate, NUMA System Information
radsetdestroy, NUMA System Information
radxorset, NUMA System Information
rad_attach_pid, Advisory Processor Association
rad_bind_pid, Processor Binding
rad_fork, Processor Binding
rad_get_cpus, NUMA System Information
rad_get_current_home, Processor Binding
rad_get_freemem, NUMA System Information
rad_get_max, NUMA System Information
rad_get_num, NUMA System Information
rad_get_physmem, NUMA System Information
rad_get_state, NUMA System Information
ra_attachrset, Processor Binding
ra_exec, Processor Binding
ra_fork, Processor Binding
RQMODE_ADVISORY, Advisory Processor Association
rs_alloc, Global Processor Sets
rs_getassociativity, NUMA System Information
rs_getinfo, NUMA System Information
rs_getnamedrset, Global Processor Sets
rs_getrad, Global Processor Sets
rs_init, Global Processor Sets
rs_numrads, NUMA System Information
rs_op, Global Processor Sets
rs_setnameattr, Global Processor Sets
rs_setpartition, Processor Binding
rtl_getcpuid, RTLinux
runon, SGI cpumemsets

T

td_ta_map_lwp2thr, LinuxThreads
td_thr_get_info, LinuxThreads
ThreadBasicInformation, Processor Binding
ThreadCtl, Processor Binding
ThreadCtl_r, Processor Binding
THREAD_BASIC_INFORMATION, Processor Binding
thread_self, Processor Binding

W

Windows, Microsoft Windows
