d into the kernel the OS can address a maximum of 4GB of RAM.
With 2.4 kernels (with a large memory configuration) a single process can
address up to the total amount of RAM in the machine minus 1GB
(reserved for the kernel), to a maximum of 3GB.
By default the kernel reserves 1GB for its own use; however, I think that
this is a tunable parameter, so if we have 4GB of RAM in a box we can tune it
so that most of that should be available to the processes (?).
Some of the URLs from which I gathered this information:
http://www.uwsg.indiana.edu/hypermail/linux/kernel/index.html
http://www.prism.gatech.edu/~gte213x/LinuxMM/rpt.html
http://www.cyberport.com/~tangent/ix/pae.html
http://uwsg.iu.edu/hypermail/linux/kernel/0005.3/0835.html
http://uwsg.iu.edu/hypermail/linux/kernel/0105.1/1169.html
Responses from my post to the Linux Kernel mailing list:
From: Albert D. Cahalan
Subject: Re: What is the truth about Linux 2.4's RAM limitations?
> we had problems with processes dying right around 2.3GB (according to top).
Out of 3 GB, you had 2.3 GB used and 0.7 GB of tiny chunks of
memory that were smaller than what you tried to allocate.
> * What is the maximum amount of RAM that a *single* process can address
> under a 2.4 kernel, with PAE enabled? Without?
Just the same: 3 GB.
> * What (if any) parameters can affect this (recompiling the app etc)?
There is a kernel patch that will get you to 2.0 or 3.5 GB.
The limit is 4 GB minus a power of two big enough for the kernel.
> Linux 2.4 does support greater than 4GB of RAM with these caveats ...
>
> * It does this by supporting Intel's PAE (Physical Address Extension)
> features which are in all Pentium Pro and newer CPUs.
> * The PAE extensions allow up to a maximum of 64GB of RAM that the OS
> (not a process) can address.
> * It does this via indirect pointers to the higher memory locations, so
> there is a CPU and RAM hit for using this.
Sort of. It maps and unmaps memory to access it.
You suffer this with the 4 GB option as well.
> * Benchmarks seem to indicate around 3-6% CPU hit just for using the PAE
> extensions (ie. it applies regardless of whether you are actually
> accessing memory locations greater than 4GB).
> * If the kernel is compiled to use PAE, Linux will not boot on a computer
> whose hardware doesn't support PAE.
> * PAE does not increase Linux's ability for *single* processes to see
> greater than 3GB of RAM (see below).
>
> So what are the limits without using PAE? Here I'm still having a little
> problem finding definitive answers but ...
>
> * Without PAE compiled into the kernel the OS can address a maximum of 4GB
> of RAM.
The 4 GB limit is really less, depending on your hardware and BIOS.
Your BIOS will create a memory hole below 4 GB large enough for all
your PCI devices. This hole might be 1 or 2 GB.
> * With 2.4 kernels (with a large memory configuration) a single process
> can address up to the total amount of RAM in the machine minus 1GB
> (reserved for the kernel), to a maximum of 3GB.
> * By default the kernel reserves 1GB for its own use, however I think
> that this is a tunable parameter so if we have 4GB of RAM in a box we
> can tune it so that most of that should be available to the processes (?).
Yes. Then you suffer more map/unmap overhead.
From: Jonathan Lundell
Subject: Re: What is the truth about Linux 2.4's RAM limitations?
At 1:01 PM -0700 2001-07-09, Adam Shand wrote:
>So what are the limits without using PAE? Here I'm still having a little
>problem finding definitive answers but ...
>
> * With 2.4 kernels (with a large memory configuration) a single process
> can address up to the total amount of RAM in the machine minus 1GB
> (reserved for the kernel), to a maximum of 3GB.
> * By default the kernel reserves 1GB for its own use, however I think
> that this is a tunable parameter so if we have 4GB of RAM in a box we
> can tune it so that most of that should be available to the processes (?).
include/asm-i386/page.h has the key to this partitioning:
/*
 * This handles the memory map. We could make this a config
 * option, but too many people screw it up, and too few need it.
 *
 * A __PAGE_OFFSET of 0xC0000000 means that the kernel has
 * a virtual address space of one gigabyte, which limits the
 * amount of physical memory you can use to about 950MB.
 *
 * If you want more physical memory than this then see the CONFIG_HIGHMEM4G
 * and CONFIG_HIGHMEM64G options in the kernel configuration.
 */
#define __PAGE_OFFSET (0xC0000000)
Whether you could simply bump __PAGE_OFFSET up to (say) 0xE0000000
and get 3.5GB of user-addressable memory I have no idea, but this is
where you'd have to start.
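Note: the arithmetic behind this split is easy to check. The following is a minimal sketch, not something from the thread. It assumes a 32-bit virtual address space in which everything below __PAGE_OFFSET belongs to the process and everything above it to the kernel. The function name split_for_page_offset is made up for illustration.

```python
# Sketch: how __PAGE_OFFSET divides a 32-bit virtual address space.
# Assumption: user space is [0, __PAGE_OFFSET) and the kernel gets the rest.
ADDRESS_SPACE = 1 << 32  # 4 GB of virtual addresses on 32-bit x86
GB = 1 << 30


def split_for_page_offset(page_offset):
    """Return (user_bytes, kernel_bytes) for a given __PAGE_OFFSET."""
    if not 0 < page_offset < ADDRESS_SPACE:
        raise ValueError("page_offset must lie inside the 32-bit address space")
    return page_offset, ADDRESS_SPACE - page_offset


# 0xC0000000 is the stock value; the other two are hypothetical alternatives.
for offset in (0x80000000, 0xC0000000, 0xE0000000):
    user, kernel = split_for_page_offset(offset)
    print(f"0x{offset:08X}: user {user / GB:.1f} GB, kernel {kernel / GB:.1f} GB")
```

This only shows the address-space arithmetic. Whether a kernel built with a different __PAGE_OFFSET actually works is a separate question, as the message above says.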
Also keep in mind the distinction between virtual and physical
addresses. A process has virtual addresses that must fit into 32
bits, so 4GB is the most that can be addressed without remapping part
of virtual space to some other physical space.
Also of interest is Chapter 3 of "IA-32 Intel Architecture Software
Developer's Manual Volume 3: System Programming Guide", which you can
find at http://developer.intel.com/design/PentiumIII/manuals/
Keep in mind that Linux uses the flat (unsegmented) model.
PAE extends physical addresses only (to 36 bits), and does nothing
for virtual space.
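Note: as a quick check of the two address widths mentioned here, this small sketch (the helper name addressable_gb is an assumption of mine) converts a number of address bits into gigabytes.

```python
# Sketch: address-space sizes implied by the bit widths mentioned above.
GB = 1 << 30


def addressable_gb(bits):
    """Number of gigabytes addressable with `bits` address bits."""
    return (1 << bits) // GB


print("32-bit virtual addresses :", addressable_gb(32), "GB")
print("36-bit physical addresses:", addressable_gb(36), "GB")
```

The 32-bit figure is the per-process ceiling, and the 36-bit figure is the 64GB of physical RAM that PAE lets the OS reach.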
From: Andi Kleen
Subject: Re: What is the truth about Linux 2.4's RAM limitations?
> * What (if any) parameters can affect this (recompiling the app etc)?
The kernel parameter is a constant called __PAGE_OFFSET which you can set.
You also need to edit arch/i386/vmlinux.lds
The reason why your simulation stopped at 2.3GB is likely that the malloc
allocation hit the shared libraries (check with /proc/<pid>/maps). Ways
around that are telling malloc to use mmap more aggressively (see the
malloc documentation in info libc) or moving the shared libraries up
by changing a kernel constant called TASK_UNMAPPED_BASE.
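Note: to see how the shared libraries cut the address space into pieces, one can parse /proc/<pid>/maps and look for the largest unmapped hole. The sketch below is only an illustration. The sample listing is invented, the 3GB user limit is assumed, and the names parse_maps and largest_gap are hypothetical.

```python
# Sketch: find the largest unmapped gap in a /proc/<pid>/maps listing.
# Assumption: 3 GB of user address space, ending at 0xC0000000.
USER_LIMIT = 0xC0000000

# Invented sample in the usual maps layout: "start-end perms offset dev inode path".
SAMPLE = """\
08048000-0804c000 r-xp 00000000 03:01 1234 /bin/app
0804c000-0804d000 rw-p 00003000 03:01 1234 /bin/app
40000000-40015000 r-xp 00000000 03:01 5678 /lib/ld-2.2.so
bfffe000-c0000000 rwxp fffff000 00:00 0
"""


def parse_maps(text):
    """Return sorted (start, end) address pairs from maps-style text."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        regions.append((start, end))
    return sorted(regions)


def largest_gap(regions, limit=USER_LIMIT):
    """Size in bytes of the biggest hole between mappings below `limit`."""
    best, cursor = 0, 0
    for start, end in regions:
        if start >= limit:
            break
        best = max(best, start - cursor)
        cursor = max(cursor, end)
    return max(best, limit - cursor)


gap = largest_gap(parse_maps(SAMPLE))
print(f"largest contiguous hole: {gap / (1 << 30):.2f} GB")
```

For a live process, pass the contents of /proc/<pid>/maps instead of SAMPLE. A single allocation cannot be larger than the biggest hole, however much address space is free in total.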
-------------------------------------------------------------------------------
"Processor type and features", "High Memory Support" can be set to
one of the following: "off", "4GB", "64GB" which changes CONFIG_NOHIGHMEM.
Linux kernel configuration help for CONFIG_NOHIGHMEM in 2.4.4 is as follows:
CONFIG_NOHIGHMEM:
Linux can use up to 64 Gigabytes of physical memory on x86 systems.
However, the address space of 32-bit x86 processors is only 4
Gigabytes large. This means that, if you have a large amount of
physical memory, not all of it can be "permanently mapped" by the
kernel. The physical memory that's not permanently mapped is called
"high memory".
If you are compiling a kernel which will never run on a machine with
more than 1 Gigabyte total physical RAM, answer "off" here (default
choice and suitable for most users). This will result in a "3GB/1GB"
split: 3GB are mapped so that each process sees a 3GB virtual memory
space and the remaining part of the 4GB virtual memory space is used
by the kernel to permanently map as much physical memory as
possible.
If the machine has between 1 and 4 Gigabytes physical RAM, then
answer "4GB" here.
If more than 4 Gigabytes is used then answer "64GB" here. This
selection turns Intel PAE (Physical Address Extension) mode on.
PAE implements 3-level paging on IA32 processors. PAE is fully
supported by Linux, PAE mode is implemented on all recent Intel
processors (Pentium Pro and better). NOTE: If you say "64GB" here,
then the kernel will not boot on CPUs that don't support PAE!
The actual amount of total physical memory will either be
auto detected or can be forced by using a kernel command line option
such as "mem=256M". (Try "man bootparam" or see the documentation of
your boot loader (lilo or loadlin) about how to pass options to the
kernel at boot time.)
If unsure, say "off".
-------------------------------------------------------------------------------
Re: "3.5GB user address space" option.
From: Rik van Riel (riel@conectiva.com.br)
Date: Wed Oct 17 2001 - 17:27:26 EST
In reply to: Oleg A. Yurlov: ""3.5GB user address space" option."
On Thu, 18 Oct 2001, Oleg A. Yurlov wrote:
> How I can use 3.5GB in my apps ? I try malloc() and get
> error on 2G bounce... :-(
You may want to use the 'hoard' memory allocation library,
it seems a bit smarter than glibc's malloc in getting all
the address space your program wants.
1) install libhoard
2) export LD_PRELOAD=libhoard.so
3) run the program
cheers,
Rik
Re: "3.5GB user address space" option.
From: Andrea Arcangeli (andrea@suse.de)
Date: Thu Oct 18 2001 - 04:09:14 EST
In reply to: H. Peter Anvin: "Re: "3.5GB user address space" option."
Actually 3.5G per-process is theoretically possible using a careful
userspace as Rik suggested with -aa after enabling the proper
compile time configuration option. So for apps that need say 3G
per-process it should work just fine. But of course for anything that
needs more than that 64-bit is the right way to go :)
Andrea
-------------------------------------------------------------------------------
On Friday Mar 22 2002, Wim Coekaerts <wim.coekaerts@oracle.com> wrote:
Linux with PAE support linked into the kernel provides a pagecache of up
to 64GB. It doesn't help alleviate in any way whatsoever the 32-bit
x86 architecture's 4GB process space limitation.
PAE support just means you can have 64GB worth of dirty pages in the page cache.
Oracle allocates a SHARED memory segment, so if you have a 2GB shared memory
segment, then that is only allocated ONCE in the page cache (NB: you share
those 2GB worth of pages) so you end up with:
63GB - 2GB = A mere 61GB left over for other private, per-process data.
If the per-process overhead (non shared) is 100MB/process, then that means
61GB / 100MB == 624 processes.
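Note: the same calculation as a small sketch, assuming binary units (1GB = 1024MB) and counting the shared segment only once. The function name max_processes is mine, not Oracle's.

```python
# Sketch of the arithmetic in the message above (binary units assumed).
MB_PER_GB = 1024


def max_processes(pagecache_gb, shared_segment_gb, per_process_mb):
    """Whole processes that fit once the shared segment is counted once."""
    private_mb = (pagecache_gb - shared_segment_gb) * MB_PER_GB
    return private_mb // per_process_mb


# Figures taken from the message: 63GB, a 2GB shared segment, 100MB per process.
print(max_processes(63, 2, 100))
```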
Remember that on NT, the Oracle instance is composed of threads which
share a single process's address space. Processes on NT can't share memory,
at least not efficiently. The Oracle instance on NT exists as a
bunch of threads within a single process.
On Linux, Oracle allocates a shared memory segment and even though we run
separate processes, they all share the shared memory segment.
The VLM feature is an entirely different way of doing things.
With the VLM feature in Oracle release 9.2, we just mmap a
virtual file that lives in a virtual filesystem which can be larger than 4GB.
We still have the 4GB userprocess limit.
On NT the situation isn't exactly the same, but somewhat similar conceptually.
So don't confuse PAE support with the VLM feature.
Wim
-------------------------------------------------------------------------------
VLM (Very Large Memory) Support in Oracle 9i Release 2 for Linux.
EBC (Extended Buffer Cache) Support in Oracle 9i Release 2 for Linux.
TLA (Three Letter Acronym) Support in Oracle 9i Release 2 for Linux?
---------------------------------------------------------------------
Oracle9i can allocate and use more than 4 GB of memory for the database buffer
cache. This section describes the limitations and requirements of the extended
buffer cache support on Linux.
See Also: Oracle9i Database Concepts for more information on
the extended cache feature.
In-Memory File System
---------------------
To use the extended buffer cache support on Linux, an in-memory file
system must be mounted on the /dev/shm mount point. It must be equal in
size or larger than the amount of memory that you intend to use for the
database buffer cache.
For example, for Linux to create an 8 GB shmfs file system on the
/dev/shm mount point, enter the following command as the root user:
# mount -t shm shmfs -o size=8g /dev/shm
When Oracle9i starts with the extended buffer cache feature enabled,
it creates a file in the /dev/shm directory that corresponds to the
Oracle buffer cache.
Note: If an in-memory file system is already mounted on the /dev/shm
mount point, ensure that it is equal to or larger than the amount
of memory that is used for the database buffer cache.
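Note: a minimal sketch of that size check, assuming Python is available on the box. shutil.disk_usage is a standard-library call. The helper names and the 8 GB figure are illustrative only.

```python
# Sketch: check that the in-memory file system can hold the buffer cache.
import shutil

GB = 1 << 30


def shm_is_large_enough(shm_total_bytes, buffer_cache_bytes):
    """True if the file system is at least as large as the buffer cache."""
    return shm_total_bytes >= buffer_cache_bytes


def check_dev_shm(buffer_cache_bytes, mount_point="/dev/shm"):
    """Compare the mounted file system's total size with the cache size."""
    total = shutil.disk_usage(mount_point).total
    return shm_is_large_enough(total, buffer_cache_bytes)


# Example with fixed numbers: an 8 GB shmfs and an 8 GB buffer cache.
print(shm_is_large_enough(8 * GB, 8 * GB))
```

On a real system, call check_dev_shm(8 * GB) before starting the instance.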
USE_INDIRECT_DATA_BUFFERS Parameter
-----------------------------------
To enable the extended buffer cache feature, set the USE_INDIRECT_DATA_BUFFERS
parameter to true in the initSID.ora file. Doing this allows Oracle9i to
specify a larger buffer cache.
Dynamic Cache Parameters
------------------------
Do not use any of the following dynamic cache parameters while
the extended cache feature is enabled:
DB_CACHE_SIZE
DB_2K_CACHE_SIZE
DB_4K_CACHE_SIZE
DB_8K_CACHE_SIZE
DB_16K_CACHE_SIZE
DB_32K_CACHE_SIZE
If the extended cache feature is enabled, use the DB_BLOCK_BUFFERS parameter
to specify the database cache size.
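Note: DB_BLOCK_BUFFERS is a count of buffers, not a byte size, so the value follows from the cache size divided by the database block size. A minimal sketch, assuming an 8 KB block size purely for the example:

```python
# Sketch: derive DB_BLOCK_BUFFERS from a target cache size.
# Assumption: the database block size is 8 KB; substitute your own value.
KB = 1 << 10
GB = 1 << 30


def db_block_buffers(cache_bytes, block_size_bytes):
    """Number of whole buffers of block_size_bytes that fit in cache_bytes."""
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    return cache_bytes // block_size_bytes


print(db_block_buffers(8 * GB, 8 * KB))
```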
Limitations
-----------
The following limitations apply to the extended buffer cache feature on Linux:
- You cannot change the size of the buffer cache while the instance is running.
- You cannot create or use tablespaces with non-standard block sizes.
See Also: Oracle9i SQL Reference for information on the standard block
size used by the CREATE TABLESPACE command.
VLM_WINDOW_SIZE environment variable:
-------------------------------------
If you have already "lowered" the SGA address according to the steps in the
Metalink document entitled "Lowering the Oracle SGA address on Linux",
then you can increase the setting for the VLM_WINDOW_SIZE.
Doing this may slightly increase performance under certain conditions
because a larger indirect window reduces the overhead of mapping an indirect
buffer into Oracle's address space.
To increase the indirect window size, set the environment
variable VLM_WINDOW_SIZE to the window size in bytes before
starting up the Oracle instance. For example:
export VLM_WINDOW_SIZE=1073741824
to set the indirect window size to 1GB. The default is 512MB.
Any value set should be a multiple of 64KB.
Note: It doesn't always help to increase the VLM_WINDOW_SIZE.
Keep in mind that increasing the VLM_WINDOW_SIZE reduces
the amount of SGA that can be allocated to other heavily-used
memory areas that might be needed (e.g. locks on RAC).
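Note: a small sketch for checking a candidate VLM_WINDOW_SIZE against the 64KB rule before exporting it. The helper names are made up, and the only rule encoded is the multiple-of-64KB requirement stated above.

```python
# Sketch: validate or round a candidate VLM_WINDOW_SIZE (in bytes).
ALIGNMENT = 64 * 1024  # values should be a multiple of 64KB


def is_valid_window(size_bytes):
    """True if size_bytes is positive and a multiple of 64KB."""
    return size_bytes > 0 and size_bytes % ALIGNMENT == 0


def round_down_to_64k(size_bytes):
    """Largest multiple of 64KB that does not exceed size_bytes."""
    return size_bytes - size_bytes % ALIGNMENT


print(is_valid_window(1073741824))    # the 1GB example above
print(round_down_to_64k(1000000000))  # a round decimal number is not aligned
```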
-------------------------------------------------------------------------------
Metalink document "Lowering the Oracle SGA address on Linux"
------------------------------------------------------------
Goal Outline:
To increase the size of the address space that Oracle can use for its
SGA on Linux in a 32-bit environment to allow for more database buffers or a
larger indirect data buffer window.
The current shipping version of Oracle is able to
use about (my note: 0xBF000000-0x50000000=0x6F000000)
1.7GB of address space for its SGA. To increase
this size, Oracle needs to be relinked with a lower
SGA base and Linux needs to have the mapped base lowered
for processes running Oracle.
Solution Description:
Currently, a solution exists only when running Oracle 9iR2 on
Red Hat 7.2 Advanced Server. Red Hat provides an adjustable parameter
mapped_base in the /proc filesystem under the process ID directory
to allow more useable address space in processes.
First, the SGA base address that Oracle uses must be lowered
by relinking Oracle. Currently, Oracle ships with this
base address set at 0x50000000 so that it is compatible with
the defaults set by most distributions of Linux. Lowering
this address allows Oracle to use more of the address space
in the process, but it is important to note that the newly relinked
Oracle binary will no longer work unless a corresponding
modification is also made to Linux (Red Hat 7.2 Advanced Server
provides a way to do this at runtime). Follow these steps to
complete the first part of the solution:
1. Shut down all instances of Oracle
2. cd $ORACLE_HOME/lib
3. cp -a libserver9.a libserver9.a.org (to make a backup copy)
4. cd $ORACLE_HOME/bin
5. cp -a oracle oracle.org (to make a backup copy)
6. cd $ORACLE_HOME/rdbms/lib
7. genksms -s 0x15000000 >ksms.s (lower SGA base to 0x15000000)
8. make -f ins_rdbms.mk ksms.o (compile in new SGA base address)
9. make -f ins_rdbms.mk ioracle (relink)
The relinked Oracle binary now has a lower SGA base and
is now able to use (my note: 0xBF000000-0x15000000=0xAA000000)
about 2.65GB of address space if Linux is also modified to
support this.
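Note: both "my note" figures can be reproduced with the sketch below. It takes 0xBF000000 as the upper bound, as in my notes above. That ceiling is my own working assumption, not a value stated in the Metalink text.

```python
# Sketch: address space available to the SGA for a given SGA base.
# Assumption: usable space ends at 0xBF000000, as in the notes above.
SGA_CEILING = 0xBF000000
GB = 1 << 30


def sga_address_space(sga_base, ceiling=SGA_CEILING):
    """Bytes of address space between the SGA base and the ceiling."""
    if sga_base >= ceiling:
        raise ValueError("SGA base must be below the ceiling")
    return ceiling - sga_base


for base in (0x50000000, 0x15000000):
    size = sga_address_space(base)
    print(f"base 0x{base:08X}: 0x{size:08X} bytes = {size / GB:.2f} GB")
```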
Next, the Linux kernel's mapped base needs to be lowered below
Oracle's new SGA base. Red Hat 7.2 Advanced Server has a
parameter in /proc that lowers the kernel's mapped base for
each process. This parameter is not a system-wide parameter.
It is a per-process parameter, but it is inherited by
child processes. This parameter can only be modified by root.
The following steps document how to lower the mapped base
for 1 bash terminal session. Once this session has been
modified with the lower mapped base, this session (window)
will need to be used for all Oracle commands so that Oracle
processes use the inherited (lower) mapped base:
1. Shut down the instance of Oracle.
2. Open a terminal session (Oracle session).
3. Open a second terminal session and su to root (root session).
4. Find out the process id for the Oracle session. For example,
do "echo $$" in the Oracle session.
5. Now lower the mapped base for the Oracle session to 0x10000000.
From the root session, echo 268435456 >/proc/<pid>/mapped_base,
where <pid> is the process id determined in step 4.
6. Increase the value of shmmax so that Oracle will allocate
the SGA in one segment. From the root session,
echo 3000000000 >/proc/sys/kernel/shmmax
7. From the Oracle terminal session, startup the Oracle instance.
The SGA now begins at a lower address, so more of the address
space can be used by Oracle.
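Note: /proc/<pid>/mapped_base takes a decimal number, so the hex address has to be converted first. The sketch below does the conversion and checks that the mapped base sits below the relinked SGA base, as required above. The function name is hypothetical.

```python
# Sketch: produce the decimal value to echo into /proc/<pid>/mapped_base.
def mapped_base_value(mapped_base, sga_base):
    """Return mapped_base as a decimal string, checking it is below sga_base."""
    if mapped_base >= sga_base:
        raise ValueError("mapped base must be below the SGA base")
    return str(mapped_base)


# Values from the steps above: mapped base 0x10000000, SGA base 0x15000000.
print(mapped_base_value(0x10000000, 0x15000000))
```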
Now you can increase the init.ora values of db_cache_size or
db_block_buffers to increase the size of the database buffer cache.
If you are running with the init.ora parameter
'use_indirect_data_buffers=true' and already have a large buffer
cache, you can use the above solution to increase the indirect
buffer window size. Doing this may slightly increase performance
under certain conditions because a larger indirect window reduces
the overhead of mapping an indirect buffer into Oracle's address
space. To increase the indirect window size, set the environment
variable VLM_WINDOW_SIZE to the window size in bytes before
starting up the Oracle instance. For example:
export VLM_WINDOW_SIZE=1073741824
to set the indirect window size to 1GB. The default is 512MB.
Any value set should be a multiple of 64KB.
Notes:
1. Increasing the buffer cache size (or the indirect window size)
too much can cause Oracle attach errors while starting up.
2. If you try to use an Oracle binary that has a lower SGA base
but did not lower the /proc/<pid>/mapped_base value, you will
experience unpredictable results such as ORA-3113 errors or
attach errors while starting up.
3. If you don't increase the shmmax value, you could get
attach errors while starting up.
4. If you lower the SGA base and your SGA size is too small,
you may get attach errors.
5. It doesn't always help to increase VLM_WINDOW_SIZE. Also,
keep in mind that increasing VLM_WINDOW_SIZE reduces the
amount of SGA that can be allocated for other memory
areas that might be needed (e.g. locks on RAC).
6. If you get attach errors while starting up, you will probably
need to clean up the shared memory segments by running 'ipcs'
and then removing segments via 'ipcrm shm XXX' or 'ipcrm sem XXX'.
-------------------------------------------------------------------------------
*************************************************************
This article is being delivered in Draft form and may contain
errors. Please use the MetaLink "Feedback" button to advise
Oracle of any issues related to this article.
*************************************************************