linux-2.6

Archived

Author	SHA1	Message	Date
Anton Blanchard	9f14c42d75	powerpc: Randomise mmap start address Randomise mmap start address - 8MB on 32bit and 1GB on 64bit tasks. Until ppc32 uses the mmap.c functionality, this is ppc64 specific. Before: # ./test & cat /proc/${!}/maps\|tail -2\|head -1 f75fe000-f7fff000 rw-p f75fe000 00:00 0 f75fe000-f7fff000 rw-p f75fe000 00:00 0 f75fe000-f7fff000 rw-p f75fe000 00:00 0 f75fe000-f7fff000 rw-p f75fe000 00:00 0 f75fe000-f7fff000 rw-p f75fe000 00:00 0 After: # ./test & cat /proc/${!}/maps\|tail -2\|head -1 f718b000-f7b8c000 rw-p f718b000 00:00 0 f7551000-f7f52000 rw-p f7551000 00:00 0 f6ee7000-f78e8000 rw-p f6ee7000 00:00 0 f74d4000-f7ed5000 rw-p f74d4000 00:00 0 f6e9d000-f789e000 rw-p f6e9d000 00:00 0 Similar for 64bit, but with 1GB of scatter: # ./test & cat /proc/${!}/maps\|tail -2\|head -1 fffb97b5000-fffb97b6000 rw-p fffb97b5000 00:00 0 fffce9a3000-fffce9a4000 rw-p fffce9a3000 00:00 0 fffeaaf2000-fffeaaf3000 rw-p fffeaaf2000 00:00 0 fffd88ac000-fffd88ad000 rw-p fffd88ac000 00:00 0 fffbc62e000-fffbc62f000 rw-p fffbc62e000 00:00 0 Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 15:53:07 +11:00
Anton Blanchard	13a2cb3694	powerpc: Rearrange mmap.c Rearrange mmap.c to better match the x86 version. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 15:53:06 +11:00
Nathan Fontenot	0f16ef7fd3	powerpc/numa: Cleanup hot_add_scn_to_nid This patch reworks the hot_add_scn_to_nid and its supporting functions to make them easier to understand. There are no functional changes in this patch and has been tested on machine with memory represented in the device tree as memory nodes and in the ibm,dynamic-memory property. My previous patch that introduced support for hotplug memory add on systems whose memory was represented by the ibm,dynamic-memory property of the device tree only left the code more unintelligible. This will hopefully makes things easier to understand. Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 15:53:04 +11:00
Anton Blanchard	13870b6575	powerpc/mm: Reduce hashtable size when using 64kB pages At the moment we size the hashtable based on 4kB pages / 2, even on a 64kB kernel. This results in a hashtable that is much larger than it needs to be. Grab the real page size and size the hashtable based on that Note: This only has effect on non hypervisor machines. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 10:48:58 +11:00
Benjamin Herrenschmidt	3b7faeb49e	Merge commit 'kumar/next' into next	2009-02-18 13:23:30 +11:00
Benjamin Herrenschmidt	82a0a1cc8f	Merge commit 'origin/master' into next Manual merge of: arch/powerpc/include/asm/pgtable-ppc32.h	2009-02-18 13:19:25 +11:00
Dave Hansen	06eccea6c3	powerpc/mm: Fix numa reserve bootmem page selection Fix the powerpc NUMA reserve bootmem page selection logic. commit `8f64e1f2d1` (powerpc: Reserve in bootmem lmb reserved regions that cross NUMA nodes) changed the logic for how the powerpc LMB reserved regions were converted to bootmen reserved regions. As the folowing discussion reports, the new logic was not correct. mark_reserved_regions_for_nid() goes through each LMB on the system that specifies a reserved area. It searches for active regions that intersect with that LMB and are on the specified node. It attempts to bootmem-reserve only the area where the active region and the reserved LMB intersect. We can not reserve things on other nodes as they may not have bootmem structures allocated, yet. We base the size of the bootmem reservation on two possible things. Normally, we just make the reservation start and stop exactly at the start and end of the LMB. However, the LMB reservations are not aware of NUMA nodes and on occasion a single LMB may cross into several adjacent active regions. Those may even be on different NUMA nodes and will require separate calls to the bootmem reserve functions. So, the bootmem reservation must be trimmed to fit inside the current active region. That's all fine and dandy, but we trim the reservation in a page-aligned fashion. That's bad because we start the reservation at a non-page-aligned address: physbase. The reservation may only span 2 bytes, but that those bytes may span two pfns and cause a reserve_size of 2PAGE_SIZE. Take the case where you reserve 0x2 bytes at 0x0fff and where the active region ends at 0x1000. You'll jump into that if() statment, but node_ar.end_pfn=0x1 and start_pfn=0x0. You'll end up with a reserve_size=0x1000, and then call reserve_bootmem_node(node, physbase=0xfff, size=0x1000); 0x1000 may not be on the same node as 0xfff. Oops. In almost all the vm code, end_<anything> is not inclusive. If you have an end_pfn of 0x1234, page 0x1234 is not included in the range. Using PFN_UP instead of the (>> >> PAGE_SHIFT) will make this consistent with the other VM code. We also need to do math for the reserved size with physbase instead of start_pfn. node_ar.end_pfn << PAGE_SHIFT is precisely* the end of the node. However, (start_pfn << PAGE_SHIFT) is NOT precisely the beginning of the reserved area. That is, of course, physbase. If we don't use physbase here, the reserve_size can be made too large. From: Dave Hansen <dave@linux.vnet.ibm.com> Tested-by: Geoff Levand <geoffrey.levand@am.sony.com> Tested on PS3. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-13 16:37:45 +11:00
Kumar Gala	96a8bac589	powerpc/fsl-booke: Fix compile warning arch/powerpc/mm/fsl_booke_mmu.c: In function 'adjust_total_lowmem': arch/powerpc/mm/fsl_booke_mmu.c:221: warning: format '%ld' expects type 'long int', but argument 3 has type 'phys_addr_t' Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-02-12 16:54:53 -06:00
Kumar Gala	d66c82ea45	powerpc/fsl-booke: Add new ISA 2.06 page sizes and MAS defines The Power ISA 2.06 added power of two page sizes to the embedded MMU architecture. Its done it such a way to be code compatiable with the existing HW. Made the minor code changes to support both power of two and power of four page sizes. Also added some new MAS bits and macros that are defined as part of the 2.06 ISA. Renamed some things to use the 'Book-3e' concept to convey the new MMU that is based on the Freescale Book-E MMU programming model. Note, its still invalid to try and use a page size that isn't supported by cpu. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-02-12 16:37:11 -06:00
Kumar Gala	f99fb8a2cb	powerpc/mm: Fix _PAGE_COHERENT support on classic ppc32 HW The following commit: commit `64b3d0e812` Author: Benjamin Herrenschmidt <benh@kernel.crashing.org> Date: Thu Dec 18 19:13:51 2008 +0000 powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED broke setting of the _PAGE_COHERENT bit in the PPC HW PTE. Since we now actually set _PAGE_COHERENT in the Linux PTE we shouldn't be clearing it out before we propogate it to the PPC HW PTE. Reported-by: Martyn Welch <martyn.welch@gefanuc.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-11 16:07:02 +11:00
Benjamin Herrenschmidt	8d30c14cab	powerpc/mm: Rework I$/D$ coherency (v3) This patch reworks the way we do I and D cache coherency on PowerPC. The "old" way was split in 3 different parts depending on the processor type: - Hash with per-page exec support (64-bit and >= POWER4 only) does it at hashing time, by preventing exec on unclean pages and cleaning pages on exec faults. - Everything without per-page exec support (32-bit hash, 8xx, and 64-bit < POWER4) does it for all page going to user space in update_mmu_cache(). - Embedded with per-page exec support does it from do_page_fault() on exec faults, in a way similar to what the hash code does. That leads to confusion, and bugs. For example, the method using update_mmu_cache() is racy on SMP where another processor can see the new PTE and hash it in before we have cleaned the cache, and then blow trying to execute. This is hard to hit but I think it has bitten us in the past. Also, it's inefficient for embedded where we always end up having to do at least one more page fault. This reworks the whole thing by moving the cache sync into two main call sites, though we keep different behaviours depending on the HW capability. The call sites are set_pte_at() which is now made out of line, and ptep_set_access_flags() which joins the former in pgtable.c The base idea for Embedded with per-page exec support, is that we now do the flush at set_pte_at() time when coming from an exec fault, which allows us to avoid the double fault problem completely (we can even improve the situation more by implementing TLB preload in update_mmu_cache() but that's for later). If for some reason we didn't do it there and we try to execute, we'll hit the page fault, which will do a minor fault, which will hit ptep_set_access_flags() to do things like update _PAGE_ACCESSED or _PAGE_DIRTY if needed, we just make this guys also perform the I/D cache sync for exec faults now. This second path is the catch all for things that weren't cleaned at set_pte_at() time. For cpus without per-pag exec support, we always do the sync at set_pte_at(), thus guaranteeing that when the PTE is visible to other processors, the cache is clean. For the 64-bit hash with per-page exec support case, we keep the old mechanism for now. I'll look into changing it later, once I've reworked a bit how we use _PAGE_EXEC. This is also a first step for adding _PAGE_EXEC support for embedded platforms Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-11 16:00:10 +11:00
Anton Blanchard	91b0f5ec53	powerpc/mm: Move 64-bit unmapped_area to top of address space We currently place mmaps just below the stack on 32bit, but leave them in the middle of the address space on 64bit: 00100000-00120000 r-xp 00100000 00:00 0 [vdso] 10000000-10010000 r-xp 00000000 08:06 179534 /tmp/sleep 10010000-10020000 rw-p 00000000 08:06 179534 /tmp/sleep 10020000-10130000 rw-p 10020000 00:00 0 [heap] 40000000000-40000030000 r-xp 00000000 08:06 440743 /lib64/ld-2.9.so 40000030000-40000040000 rw-p 00020000 08:06 440743 /lib64/ld-2.9.so 40000050000-400001f0000 r-xp 00000000 08:06 440671 /lib64/libc-2.9.so 400001f0000-40000200000 r--p 00190000 08:06 440671 /lib64/libc-2.9.so 40000200000-40000220000 rw-p 001a0000 08:06 440671 /lib64/libc-2.9.so 40000220000-40008230000 rw-p 40000220000 00:00 0 fffffbc0000-fffffd10000 rw-p fffffeb0000 00:00 0 [stack] Right now it isn't an issue, but at some stage we will run into mmap or hugetlb allocation issues. Using the same layout as 32bit gives us a some breathing room. This matches what x86-64 is doing too. 00100000-00103000 r-xp 00100000 00:00 0 [vdso] 10000000-10001000 r-xp 00000000 08:06 554894 /tmp/test 10010000-10011000 r--p 00000000 08:06 554894 /tmp/test 10011000-10012000 rw-p 00001000 08:06 554894 /tmp/test 10012000-10113000 rw-p 10012000 00:00 0 [heap] fffefdf7000-ffff7df8000 rw-p fffefdf7000 00:00 0 ffff7df8000-ffff7f97000 r-xp 00000000 08:06 130591 /lib64/libc-2.9.so ffff7f97000-ffff7fa6000 ---p 0019f000 08:06 130591 /lib64/libc-2.9.so ffff7fa6000-ffff7faa000 r--p 0019e000 08:06 130591 /lib64/libc-2.9.so ffff7faa000-ffff7fc0000 rw-p 001a2000 08:06 130591 /lib64/libc-2.9.so ffff7fc0000-ffff7fc4000 rw-p ffff7fc0000 00:00 0 ffff7fc4000-ffff7fec000 r-xp 00000000 08:06 130663 /lib64/ld-2.9.so ffff7fee000-ffff7ff0000 rw-p ffff7fee000 00:00 0 ffff7ffa000-ffff7ffb000 rw-p ffff7ffa000 00:00 0 ffff7ffb000-ffff7ffc000 r--p 00027000 08:06 130663 /lib64/ld-2.9.so ffff7ffc000-ffff7fff000 rw-p 00028000 08:06 130663 /lib64/ld-2.9.so ffff7fff000-ffff8000000 rw-p ffff7fff000 00:00 0 fffffc59000-fffffc6e000 rw-p ffffffeb000 00:00 0 [stack] Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-11 16:00:07 +11:00
Milton Miller	8b16cd238d	powerpc/numa: Remove redundant find_cpu_node() Use of_get_cpu_node, which is a superset of numa.c's find_cpu_node in a less restrictive section (text vs cpuinit). Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-11 13:37:59 +11:00
Milton Miller	20fcefe5a0	powerpc/numa: Avoid possible reference beyond prop. length in find_min_common_depth() find_min_common_depth() was checking the property length incorrectly. The value is in bytes not cells, and it is using the second entry. Signed-off-By: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-11 13:37:58 +11:00
Benjamin Herrenschmidt	edbc29d76d	Merge commit 'kumar/next' into next	2009-02-11 13:37:44 +11:00
Kumar Gala	6c24b17453	powerpc/fsl-booke: Fix mapping functions to use phys_addr_t Fixed v_mapped_by_tlbcam() and p_mapped_by_tlbcam() to use phys_addr_t instead of unsigned long. In 36-bit physical mode we really need these functions to deal with phys_addr_t when trying to match a physical address or when returning one. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-02-09 21:11:55 -06:00
Trent Piepho	96051465fd	powerpc/fsl-booke: Make CAM entries used for lowmem configurable On booke processors, the code that maps low memory only uses up to three CAM entries, even though there are sixteen and nothing else uses them. Make this number configurable in the advanced options menu along with max low memory size. If one wants 1 GB of lowmem, then it's typically necessary to have four CAM entries. Signed-off-by: Trent Piepho <tpiepho@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-01-28 18:16:54 -06:00
Trent Piepho	c8f3570b7e	powerpc/fsl-booke: Allow larger CAM sizes than 256 MB The code that maps kernel low memory would only use page sizes up to 256 MB. On E500v2 pages up to 4 GB are supported. However, a page must be aligned to a multiple of the page's size. I.e. 256 MB pages must aligned to a 256 MB boundary. This was enforced by a requirement that the physical and virtual addresses of the start of lowmem be aligned to 256 MB. Clearly requiring 1GB or 4GB alignment to allow pages of that size isn't acceptable. To solve this, I simply have adjust_total_lowmem() take alignment into account when it decides what size pages to use. Give it PAGE_OFFSET = 0x7000_0000, PHYSICAL_START = 0x3000_0000, and 2GB of RAM, and it will map pages like this: PA 0x3000_0000 VA 0x7000_0000 Size 256 MB PA 0x4000_0000 VA 0x8000_0000 Size 1 GB PA 0x8000_0000 VA 0xC000_0000 Size 256 MB PA 0x9000_0000 VA 0xD000_0000 Size 256 MB PA 0xA000_0000 VA 0xE000_0000 Size 256 MB Because the lowmem mapping code now takes alignment into account, PHYSICAL_ALIGN can be lowered from 256 MB to 64 MB. Even lower might be possible. The lowmem code will work down to 4 kB but it's possible some of the boot code will fail before then. Poor alignment will force small pages to be used, which combined with the limited number of TLB1 pages available, will result in very little memory getting mapped. So alignments less than 64 MB probably aren't very useful anyway. Signed-off-by: Trent Piepho <tpiepho@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-01-28 18:16:53 -06:00
Trent Piepho	f88747e7f6	powerpc/fsl-booke: Remove code duplication in lowmem mapping The code to map lowmem uses three CAM aka TLB[1] entries to cover it. The size of each is stored in three globals named __cam0, __cam1, and __cam2. All the code that uses them is duplicated three times for each of the three variables. We have these things called arrays and loops.... Once converted to use an array, it will be easier to make the number of CAMs configurable. Signed-off-by: Trent Piepho <tpiepho@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-01-28 18:16:51 -06:00
Gerhard Pircher	4c456a67f5	powerpc/mm: Fix handling of _PAGE_COHERENT in BAT setup code _PAGE_COHERENT is now always set in _PAGE_RAM resp. PAGE_KERNEL. Thus it has to be masked out, if the BAT mapping should be non cacheable or CPU_FTR_NEED_COHERENT is not set. This will work on normal SMP setups because we force-set CPU_FTR_NEED_COHERENT as part of CPU_FTR_COMMON on SMP. Signed-off-by: Gerhard Pircher <gerhard_pircher@gmx.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-28 17:15:52 +11:00
Dave Kleikamp	9ba0fdbfae	powerpc: is_hugepage_only_range() must account for both 4kB and 64kB slices powerpc: is_hugepage_only_range() must account for both 4kB and 64kB slices The subpage_prot syscall fails on second and subsequent calls for a given region, because is_hugepage_only_range() is mis-identifying the 4 kB slices when the process has a 64 kB page size. Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-16 16:15:16 +11:00
Ingo Molnar	fe333321e2	powerpc: Change u64/s64 to a long long integer type Convert arch/powerpc/ over to long long based u64: -#ifdef __powerpc64__ -# include <asm-generic/int-l64.h> -#else -# include <asm-generic/int-ll64.h> -#endif +#include <asm-generic/int-ll64.h> This will avoid reoccuring spurious warnings in core kernel code that comes when people test on their own hardware. (i.e. x86 in ~98% of the cases) This is what x86 uses and it generally helps keep 64-bit code 32-bit clean too. [Adjusted to not impact user mode (from paulus) - sfr] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-13 14:47:59 +11:00
Benjamin Herrenschmidt	30aae739a9	Merge commit 'kumar/kumar-next' into next	2009-01-13 13:59:03 +11:00
Anton Vorontsov	7021d86afa	powerpc/mm: Make clear_fixmap() actually work The clear_fixmap() routine issues map_page() with flags set to 0. Currently this causes a BUG_ON() inside the map_page(), as it assumes that a PTE should be clear before mapping. This patch makes the map_page() to trigger the BUG_ON() only if the flags were set. Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-08 16:25:17 +11:00
Benjamin Herrenschmidt	4a0826824b	powerpc: Fix missing semicolons in mmu_decl.h This is a brown paper bag from one of my earlier patches that breaks build on 40x and 8xx. And yes, I've now added 40x and 8xx to my list of test configs :-) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-08 16:25:17 +11:00
Dave Liu	d6a09e0cd6	powerpc: Remove the redundant _tlbil_pid at SMP case Signed-off-by: Dave Liu <daveliu@freescale.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-08 16:25:13 +11:00
Dave Hansen	893473df78	powerpc/mm: Cleanup careful_allocation(): consolidate memset() Both users of careful_allocation() immediately memset() the result. So, just do it in one place. Also give careful_allocation() a 'z' prefix to bring it in line with kzmalloc() and friends. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-08 16:25:09 +11:00
Dave Hansen	0be210fd66	powerpc/mm: Make careful_allocation() return virtual addrs Since we memset() the result in both of the uses here, just make careful_alloc() return a virtual address. Also, add a separate variable to store the physial address that comes back from the lmb_alloc() functions. This makes it less likely that someone will screw it up forgetting to convert before returning since the vaddr is always in a void* and the paddr is always in an unsigned long. I admit this is arbitrary since one of its users needs a paddr and one a vaddr, but it does remove a good number of casts. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-08 16:25:08 +11:00
Dave Hansen	5d21ea2b0e	powerpc/mm:: Cleanup careful_allocation(): bootmem already panics If we fail a bootmem allocation, the bootmem code itself panics. No need to redo it here. Also change the wording of the other panic. We don't strictly have to allocate memory on the specified node. It is just a hint and that node may not even have any memory on it. In that case we can and do fall back to other nodes. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-08 16:25:08 +11:00
Dave Hansen	c555e520ef	powerpc/mm: Add better comment on careful_allocation() The behavior in careful_allocation() really confused me at first. Add a comment to hopefully make it easier on the next doofus that looks at it. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-08 16:25:08 +11:00
Trent Piepho	6fd8be4bf7	powerpc/fsl-booke: Remove num_tlbcam_entries This is a global variable defined in fsl_booke_mmu.c with a value that gets initialized in assembly code in head_fsl_booke.S. It's never used. If some code ever does want to know the number of entries in TLB1, then "numcams = mfspr(SPRN_TLB1CFG) & 0xfff", is a whole lot simpler than a global initialized during kernel boot from assembly. Signed-off-by: Trent Piepho <tpiepho@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-01-07 15:33:07 -06:00
Trent Piepho	19f5465e82	powerpc/fsl-booke: Don't hard-code size of struct tlbcam Some assembly code in head_fsl_booke.S hard-coded the size of struct tlbcam to 20 when it indexed the TLBCAM table. Anyone changing the size of struct tlbcam would not know to expect that. The kernel already has a system to get the size of C structures into assembly language files, asm-offsets, so let's use it. The definition of the struct gets moved to a header, so that asm-offsets.c can include it. Signed-off-by: Trent Piepho <tpiepho@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-01-07 15:33:06 -06:00
Gary Hade	c04fc586c1	mm: show node to memory section relationship with symlinks in sysfs Show node to memory section relationship with symlinks in sysfs Add /sys/devices/system/node/nodeX/memoryY symlinks for all the memory sections located on nodeX. For example: /sys/devices/system/node/node1/memory135 -> ../../memory/memory135 indicates that memory section 135 resides on node1. Also revises documentation to cover this change as well as updating Documentation/ABI/testing/sysfs-devices-memory to include descriptions of memory hotremove files 'phys_device', 'phys_index', and 'state' that were previously not described there. In addition to it always being a good policy to provide users with the maximum possible amount of physical location information for resources that can be hot-added and/or hot-removed, the following are some (but likely not all) of the user benefits provided by this change. Immediate: - Provides information needed to determine the specific node on which a defective DIMM is located. This will reduce system downtime when the node or defective DIMM is swapped out. - Prevents unintended onlining of a memory section that was previously offlined due to a defective DIMM. This could happen during node hot-add when the user or node hot-add assist script onlines _all_ offlined sections due to user or script inability to identify the specific memory sections located on the hot-added node. The consequences of reintroducing the defective memory could be ugly. - Provides information needed to vary the amount and distribution of memory on specific nodes for testing or debugging purposes. Future: - Will provide information needed to identify the memory sections that need to be offlined prior to physical removal of a specific node. Symlink creation during boot was tested on 2-node x86_64, 2-node ppc64, and 2-node ia64 systems. Symlink creation during physical memory hot-add tested on a 2-node x86_64 system. Signed-off-by: Gary Hade <garyhade@us.ibm.com> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-06 15:59:00 -08:00
Mel Gorman	3340289ddf	mm: report the MMU pagesize in /proc/pid/smaps The KernelPageSize entry in /proc/pid/smaps is the pagesize used by the kernel to back a VMA. This matches the size used by the MMU in the majority of cases. However, one counter-example occurs on PPC64 kernels whereby a kernel using 64K as a base pagesize may still use 4K pages for the MMU on older processor. To distinguish, this patch reports MMUPageSize as the pagesize used by the MMU in /proc/pid/smaps. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-06 15:58:58 -08:00
Linus Torvalds	3c92ec8ae9	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (144 commits) powerpc/44x: Support 16K/64K base page sizes on 44x powerpc: Force memory size to be a multiple of PAGE_SIZE powerpc/32: Wire up the trampoline code for kdump powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M powerpc/32: Allow __ioremap on RAM addresses for kdump kernel powerpc/32: Setup OF properties for kdump powerpc/32/kdump: Implement crash_setup_regs() using ppc_save_regs() powerpc: Prepare xmon_save_regs for use with kdump powerpc: Remove default kexec/crash_kernel ops assignments powerpc: Make default kexec/crash_kernel ops implicit powerpc: Setup OF properties for ppc32 kexec powerpc/pseries: Fix cpu hotplug powerpc: Fix KVM build on ppc440 powerpc/cell: add QPACE as a separate Cell platform powerpc/cell: fix build breakage with CONFIG_SPUFS disabled powerpc/mpc5200: fix error paths in PSC UART probe function powerpc/mpc5200: add rts/cts handling in PSC UART driver powerpc/mpc5200: Make PSC UART driver update serial errors counters powerpc/mpc5200: Remove obsolete code from mpc5200 MDIO driver powerpc/mpc5200: Add MDMA/UDMA support to MPC5200 ATA driver ... Fix trivial conflict in drivers/char/Makefile as per Paul's directions	2008-12-28 16:54:33 -08:00
Ilya Yanok	ca9153a3a2	powerpc/44x: Support 16K/64K base page sizes on 44x This adds support for 16k and 64k page sizes on PowerPC 44x processors. The PGDIR table is much smaller than a page when using 16k or 64k pages (512 and 32 bytes respectively) so we allocate the PGDIR with kzalloc() instead of __get_free_pages(). One PTE table covers rather a large memory area when using 16k or 64k pages (32MB or 512MB respectively), so we can easily put FIXMAP and PKMAP in the area covered by one PTE table. Signed-off-by: Yuri Tikhonov <yur@emcraft.com> Signed-off-by: Vladimir Panfilov <pvr@emcraft.com> Signed-off-by: Ilya Yanok <yanok@emcraft.com> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-29 09:53:25 +11:00
James Morris	cbacc2c7f0	Merge branch 'next' into for-linus	2008-12-25 11:40:09 +11:00
Dale Farnsworth	ccdcef72c2	powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M Add the ability for a classic ppc kernel to be loaded at an address of 32MB. This done by fixing a few places that assume we are loaded at address 0, and by changing several uses of KERNELBASE to use PAGE_OFFSET, instead. Signed-off-by: Dale Farnsworth <dale@farnsworth.org> Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-23 15:13:29 +11:00
Anton Vorontsov	01695a9687	powerpc/32: Allow __ioremap on RAM addresses for kdump kernel While for debugging it is good to catch bogus users of ioremap, though for kdump support it is more convenient to use __ioremap for copy_oldmem_page() (exactly as we do for PPC64 currently). Note that copy_oldmem_page() calls __ioremap with flags set to '0', so it should be safe with the regard to the caches. The other option is to use kmap_atomic_pfn()[1], but it will not work for kernels compiled without HIGHMEM. That is, on a board with 256MB RAM and crashkernel=64M@32M case, the !HIGHMEM capturing kernel maps 0-96M range, which does not include all the memory needed to capture the dump. And, obviously, accessing anything upper than 96M will cause faults. [1] http://ozlabs.org/pipermail/linuxppc-dev/2007-November/046747.html Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-23 15:13:29 +11:00
Benjamin Herrenschmidt	a14953597b	powerpc: Fix missing 'blr' in _tlbia() Rework to MMU code dropped a much missed 'blr' instruction. Brown-Paper-Bag-Worn-By: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2008-12-21 02:54:25 -07:00
Benjamin Herrenschmidt	64b3d0e812	powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED Currently, we never set _PAGE_COHERENT in the PTEs, we just OR it in in the hash code based on some CPU feature bit. We also manipulate _PAGE_NO_CACHE and _PAGE_GUARDED by hand in all sorts of places. This changes the logic so that instead, the PTE now contains _PAGE_COHERENT for all normal RAM pages thay have I = 0 on platforms that need it. The hash code clears it if the feature bit is not set. It also adds some clean accessors to setup various valid combinations of access flags and change various bits of code to use them instead. This should help having the PTE actually containing the bit combinations that we really want. I also removed _PAGE_GUARDED from _PAGE_BASE on 44x and instead set it explicitely from the TLB miss. I will ultimately remove it completely as it appears that it might not be needed after all but in the meantime, having it in the TLB miss makes things a lot easier. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt	7752035180	powerpc/mm: Runtime allocation of mmu context maps for nohash CPUs This makes the MMU context code used for CPUs with no hash table (except 603) dynamically allocate the various maps used to track the state of contexts. Only the main free map and CPU 0 stale map are allocated at boot time. Other CPU maps are allocated when those CPUs are brought up and freed if they are unplugged. This also moves the initialization of the MMU context management slightly later during the boot process, which should be fine as it's really only needed when userland if first started anyways. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt	760ec0e02d	powerpc/44x: No need to mask MSR:CE, ME or DE in _tlbil_va on 440 The handlers for Critical, Machine Check or Debug interrupts will save and restore MMUCR nowadays, thus we only need to disable normal interrupts when invalidating TLB entries. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt	2a4aca1144	powerpc/mm: Split low level tlb invalidate for nohash processors Currently, the various forms of low level TLB invalidations are all implemented in misc_32.S for 32-bit processors, in a fairly scary mess of #ifdef's and with interesting duplication such as a whole bunch of code for FSL _tlbie and _tlbia which are no longer used. This moves things around such that _tlbie is now defined in hash_low_32.S and is only used by the 32-bit hash code, and all nohash CPUs use the various _tlbil_* forms that are now moved to a new file, tlb_nohash_low.S. I moved all the definitions for that stuff out of include/asm/tlbflush.h as they are really internal mm stuff, into mm/mmu_decl.h The code should have no functional changes. I kept some variants inline for trivial forms on things like 40x and 8xx. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt	f048aace29	powerpc/mm: Add SMP support to no-hash TLB handling This commit moves the whole no-hash TLB handling out of line into a new tlb_nohash.c file, and implements some basic SMP support using IPIs and/or broadcast tlbivax instructions. Note that I'm using local invalidations for D->I cache coherency. At worst, if another processor is trying to execute the same and has the old entry in its TLB, it will just take a fault and re-do the TLB flush locally (it won't re-do the cache flush in any case). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt	7c03d653cd	powerpc/mm: Introduce MMU features We're soon running out of CPU features and I need to add some new ones for various MMU related bits, so this patch separates the MMU features from the CPU features. I moved over the 32-bit MMU related ones, added base features for MMU type families, but didn't move over any 64-bit only feature yet. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt	2ca8cf7389	powerpc/mm: Rework context management for CPUs with no hash table This reworks the context management code used by 4xx,8xx and freescale BookE. It adds support for SMP by implementing a concept of stale context map to lazily flush the TLB on processors where a context may have been invalidated. This also contains the ground work for generalizing such lazy TLB flushing by just picking up a new PID and marking the old one stale. This will be implemented later. This is a first implementation that uses a global spinlock. Ideally, we should try to get at least the fast path (context ID already assigned) lockless or limited to a per context lock, but for now this will do. I tried to keep the UP case reasonably simple to avoid adding too much overhead to 8xx which does a lot of context stealing since it effectively has only 16 PIDs available. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:15 +11:00
Benjamin Herrenschmidt	5e696617c4	powerpc/mm: Split mmu_context handling This splits the mmu_context handling between 32-bit hash based processors, 64-bit hash based processors and everybody else. This is preliminary work for adding SMP support for BookE processors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:15 +11:00
Benjamin Herrenschmidt	f63837f058	powerpc/mm: Remove flush_HPTE() The function flush_HPTE() is used in only one place, the implementation of DEBUG_PAGEALLOC on ppc32. It's actually a dup of flush_tlb_page() though it's -slightly- more efficient on hash based processors. We remove it and replace it by a direct call to the hash flush code on those processors and to flush_tlb_page() for everybody else. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-16 15:53:34 +11:00
Benjamin Herrenschmidt	e41e811a79	powerpc/mm: Rename tlb_32.c and tlb_64.c to tlb_hash32.c and tlb_hash64.c This renames the files to clarify the fact that they are used by the hash based family of CPUs (the 603 being an exception in that family but is still handled by that code). This paves the way for the new tlb_nohash.c coming via a subsequent commit. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-16 15:53:30 +11:00

1 2 3 4 5 ...

460 commits