dect
/
linux-2.6
Archived
13
0
Fork 0
Commit Graph

640 Commits

Author SHA1 Message Date
Michael Hanselmann e01af0384f [PATCH] powermac: Combined fixes for backlight code
This patch fixes several problems:
- pmac_backlight_key() is called under interrupt context, and therefore
  can't use mutexes or semaphores, so defer the backlight level for
  later, as it's not critical (original code by Aristeu S. Rozanski F.
  <aris@valeta.org>).
- Add exports for functions that might be called from modules
- Fix Kconfig depdencies on PMAC_BACKLIGHT.
- Fix locking issues on calls from inside the driver (reported by
  Aristeu S. Rozanski F., too)
- Fix wrong calculation of backlight values in some of the drivers
- Replace pmac_backlight_key_up/down by inline functions

[akpm@osdl.org: fix function prototypes]
Signed-off-by: Michael Hanselmann <linux-kernel@hansmi.ch>
Acked-by: Aristeu S. Rozanski F. <aris@valeta.org>
Acked-by: Rene Nussbaumer <linux-kernel@killerfox.forkbomb.ch>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:20 -07:00
Benjamin Herrenschmidt 6e99e45828 [PATCH] powerpc: fix trigger handling in the new irq code
This patch slightly reworks the new irq code to fix a small design error.  I
removed the passing of the trigger to the map() calls entirely, it was not a
good idea to have one call do two different things.  It also fixes a couple of
corner cases.

Mapping a linux virtual irq to a physical irq now does only that.  Setting the
trigger is a different action which has a different call.

The main changes are:

- I no longer call host->ops->map() for an already mapped irq, I just return
  the virtual number that was already mapped.  It was called before to give an
  opportunity to change the trigger, but that was causing issues as that could
  happen while the interrupt was in use by a device, and because of the
  trigger change, map would potentially muck around with things in a racy way.
   That was causing much burden on a given's controller implementation of
  map() to get it right.  This is much simpler now.  map() is only called on
  the initial mapping of an irq, meaning that you know that this irq is _not_
  being used.  You can initialize the hardware if you want (though you don't
  have to).

- Controllers that can handle different type of triggers (level/edge/etc...)
  now implement the standard irq_chip->set_type() call as defined by the
  generic code.  That means that you can use the standard set_irq_type() to
  configure an irq line manually if you wish or (though I don't like that
  interface), pass explicit trigger flags to request_irq() as defined by the
  generic kernel interfaces.  Also, using those interfaces guarantees that
  your controller set_type callback is called with the descriptor lock held,
  thus providing locking against activity on the same interrupt (including
  mask/unmask/etc...) automatically.  A result is that, for example, MPIC's
  own map() implementation calls irq_set_type(NONE) to configure the hardware
  to the default triggers.

- To allow the above, the irq_map array entry for the new mapped interrupt
  is now set before map() callback is called for the controller.

- The irq_create_of_mapping() (also used by irq_of_parse_and_map()) function
  for mapping interrupts from the device-tree now also call the separate
  set_irq_type(), and only does so if there is a change in the trigger type.

- While I was at it, I changed pci_read_irq_line() (which is the helper I
  would expect most archs to use in their pcibios_fixup() to get the PCI
  interrupt routing from the device tree) to also handle a fallback when the
  DT mapping fails consisting of reading the PCI_INTERRUPT_PIN to know wether
  the device has an interrupt at all, and the the PCI_INTERRUPT_LINE to get an
  interrupt number from the device.  That number is then mapped using the
  default controller, and the trigger is set to level low.  That default
  behaviour works for several platforms that don't have a proper interrupt
  tree like Pegasos.  If it doesn't work for your platform, then either
  provide a proper interrupt tree from the firmware so that fallback isn't
  needed, or don't call pci_read_irq_line()

- Add back a bit that got dropped by my main rework patch for properly
  clearing pending IPIs on pSeries when using a kexec

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:20 -07:00
Linus Torvalds e2a3d40258 power: improve inline asm memory constraints
Use "+m" rather than a combination of "=m" and "m" for improved
clarity and consistency.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-08 15:00:28 -07:00
Jeremy Kerr b5a1a9abe1 [POWERPC] Use const qualifiers for prom parsing utilites
The of_bus callbacks map and get_flags can be constified, as they don't
alter the range or addr arguments. of_dump_addr and of_read_addr can
also be constified.

Built for 32- and 64-bit powerpc

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-07-07 20:19:16 +10:00
Benjamin Herrenschmidt 26c5032eaa [POWERPC] Add briq support to CHRP
The support for Briq machines has been floating around as patches for
ages. This cleans it up and adds it once for all.

Some of this is based on initial code provided by Karsten Jeppesen
<karsten@jeppesens.com> and mostly rewritten from scratch by me.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-07-07 20:19:15 +10:00
David Woodhouse a8e0c51c71 [PATCH] powerpc: implement missing jiffies64_to_cputime64()
asm-powerpc/cputime.h doesn't declare jiffies64_to_cputime64() or
cputime64_sub(), and due to CONFIG_VIRT_CPU_ACCOUNTING it's not picking
up the definition from asm-generic like x86-64 & friends do.

Cc: Dave Jones <davej@redhat.com>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-05 09:39:54 -07:00
Linus Torvalds 6fa0cb1141 Merge git://git.infradead.org/hdrinstall-2.6
* git://git.infradead.org/hdrinstall-2.6:
  Remove export of include/linux/isdn/tpam.h
  Remove <linux/i2c-id.h> and <linux/i2c-algo-ite.h> from userspace export
  Restrict headers exported to userspace for SPARC and SPARC64
  Add empty Kbuild files for 'make headers_install' in remaining arches.
  Add Kbuild file for Alpha 'make headers_install'
  Add Kbuild file for SPARC 'make headers_install'
  Add Kbuild file for IA64 'make headers_install'
  Add Kbuild file for S390 'make headers_install'
  Add Kbuild file for i386 'make headers_install'
  Add Kbuild file for x86_64 'make headers_install'
  Add Kbuild file for PowerPC 'make headers_install'
  Add generic Kbuild files for 'make headers_install'
  Basic implementation of 'make headers_check'
  Basic implementation of 'make headers_install'
2006-07-04 12:55:45 -07:00
Linus Torvalds 912b2539e1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  powerpc: add defconfig for Freescale MPC8349E-mITX board
  powerpc: Add base support for the Freescale MPC8349E-mITX eval board
  Documentation: correct values in MPC8548E SEC example node
  [POWERPC] Actually copy over i8259.c to arch/ppc/syslib this time
  [POWERPC] Add new interrupt mapping core and change platforms to use it
  [POWERPC] Copy i8259 code back to arch/ppc
  [POWERPC] New device-tree interrupt parsing code
  [POWERPC] Use the genirq framework
  [PATCH] genirq: Allow fasteoi handler to retrigger disabled interrupts
  [POWERPC] Update the SWIM3 (powermac) floppy driver
  [POWERPC] Fix error handling in detecting legacy serial ports
  [POWERPC] Fix booting on Momentum "Apache" board (a Maple derivative)
  [POWERPC] Fix various offb and BootX-related issues
  [POWERPC] Add a default config for 32-bit CHRP machines
  [POWERPC] fix implicit declaration on cell.
  [POWERPC] change get_property to return void *
2006-07-03 15:28:34 -07:00
Ingo Molnar de30a2b355 [PATCH] lockdep: irqtrace subsystem, core
Accurate hard-IRQ-flags and softirq-flags state tracing.

This allows us to attach extra functionality to IRQ flags on/off
events (such as trace-on/off).

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:03 -07:00
Ingo Molnar 61f4c3d6db [PATCH] lockdep: remove RWSEM_DEBUG remnants
RWSEM_DEBUG used to be a printk based 'tracing' facility, probably used for
very early prototypes of the rwsem code.  Remove it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:01 -07:00
Ingo Molnar a875a69f8b [PATCH] lockdep: add per_cpu_offset()
Add the per_cpu_offset() generic method. (used by the lock validator)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:00 -07:00
Benjamin Herrenschmidt 0ebfff1491 [POWERPC] Add new interrupt mapping core and change platforms to use it
This adds the new irq remapper core and removes the old one.  Because
there are some fundamental conflicts with the old code, like the value
of NO_IRQ which I'm now setting to 0 (as per discussions with Linus),
etc..., this commit also changes the relevant platform and driver code
over to use the new remapper (so as not to cause difficulties later
in bisecting).

This patch removes the old pre-parsing of the open firmware interrupt
tree along with all the bogus assumptions it made to try to renumber
interrupts according to the platform. This is all to be handled by the
new code now.

For the pSeries XICS interrupt controller, a single remapper host is
created for the whole machine regardless of how many interrupt
presentation and source controllers are found, and it's set to match
any device node that isn't a 8259.  That works fine on pSeries and
avoids having to deal with some of the complexities of split source
controllers vs. presentation controllers in the pSeries device trees.

The powerpc i8259 PIC driver now always requests the legacy interrupt
range. It also has the feature of being able to match any device node
(including NULL) if passed no device node as an input. That will help
porting over platforms with broken device-trees like Pegasos who don't
have a proper interrupt tree.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-07-03 21:36:01 +10:00
Benjamin Herrenschmidt cc9fd71c62 [POWERPC] New device-tree interrupt parsing code
Adds new routines to prom_parse to walk the device-tree for interrupt
information. This includes both direct mapping of interrupts and low
level parsing functions for use with partial trees.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-07-03 19:55:24 +10:00
Benjamin Herrenschmidt b9e5b4e6a9 [POWERPC] Use the genirq framework
This adapts the generic powerpc interrupt handling code, and all of
the platforms except for the embedded 6xx machines, to use the new
genirq framework.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-07-03 19:55:12 +10:00
Jeremy Kerr a1af5b2fd4 [POWERPC] change get_property to return void *
Change the get_property() function to return a void *. This allows us
to later remove the cast done in the majority of callers.

Built for pseries, iseries, pmac32, cell, cbesim, g5, systemsim, maple,
and mpc* defconfigs

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-07-03 08:56:33 +10:00
Thomas Gleixner 6714465e83 [PATCH] irq-flags: POWERPC: Use the new IRQF_ constants
Use the new IRQF_ constants and remove the SA_INTERRUPT define

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-02 13:58:47 -07:00
Adrian Bunk b3c2ffd534 typo fixes: mecanism -> mechanism
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 18:20:44 +02:00
Catherine Zhang 877ce7c1b3 [AF_UNIX]: Datagram getpeersec
This patch implements an API whereby an application can determine the
label of its peer's Unix datagram sockets via the auxiliary data mechanism of
recvmsg.

Patch purpose:

This patch enables a security-aware application to retrieve the
security context of the peer of a Unix datagram socket.  The application
can then use this security context to determine the security context for
processing on behalf of the peer who sent the packet.

Patch design and implementation:

The design and implementation is very similar to the UDP case for INET
sockets.  Basically we build upon the existing Unix domain socket API for
retrieving user credentials.  Linux offers the API for obtaining user
credentials via ancillary messages (i.e., out of band/control messages
that are bundled together with a normal message).  To retrieve the security
context, the application first indicates to the kernel such desire by
setting the SO_PASSSEC option via getsockopt.  Then the application
retrieves the security context using the auxiliary data mechanism.

An example server application for Unix datagram socket should look like this:

toggle = 1;
toggle_len = sizeof(toggle);

setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, &toggle_len);
recvmsg(sockfd, &msg_hdr, 0);
if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
    cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
    if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
        cmsg_hdr->cmsg_level == SOL_SOCKET &&
        cmsg_hdr->cmsg_type == SCM_SECURITY) {
        memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
    }
}

sock_setsockopt is enhanced with a new socket option SOCK_PASSSEC to allow
a server socket to receive security context of the peer.

Testing:

We have tested the patch by setting up Unix datagram client and server
applications.  We verified that the server can retrieve the security context
using the auxiliary data mechanism of recvmsg.

Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com>
Acked-by: Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 16:58:06 -07:00
Linus Torvalds 3aa590c6b7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (43 commits)
  [POWERPC] Use little-endian bit from firmware ibm,pa-features property
  [POWERPC] Make sure smp_processor_id works very early in boot
  [POWERPC] U4 DART improvements
  [POWERPC] todc: add support for Time-Of-Day-Clock
  [POWERPC] Make lparcfg.c work when both iseries and pseries are selected
  [POWERPC] Fix idr locking in init_new_context
  [POWERPC] mpc7448hpc2 (taiga) board config file
  [POWERPC] Add tsi108 pci and platform device data register function
  [POWERPC] Add general support for mpc7448hpc2 (Taiga) platform
  [POWERPC] Correct the MAX_CONTEXT definition
  powerpc: minor cleanups for mpc86xx
  [POWERPC] Make sure we select CONFIG_NEW_LEDS if ADB_PMU_LED is set
  [POWERPC] Simplify the code defining the 64-bit CPU features
  [POWERPC] powerpc: kconfig warning fix
  [POWERPC] Consolidate some of kernel/misc*.S
  [POWERPC] Remove unused function call_with_mmu_off
  [POWERPC] update asm-powerpc/time.h
  [POWERPC] Clean up it_lp_queue.h
  [POWERPC] Skip the "copy down" of the kernel if it is already at zero.
  [POWERPC] Add the use of the firmware soft-reset-nmi to kdump.
  ...
2006-06-29 11:32:34 -07:00
Linus Torvalds 1903ac54f8 Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6:
  [PATCH] i386: export memory more than 4G through /proc/iomem
  [PATCH] 64bit Resource: finally enable 64bit resource sizes
  [PATCH] 64bit Resource: convert a few remaining drivers to use resource_size_t where needed
  [PATCH] 64bit resource: change pnp core to use resource_size_t
  [PATCH] 64bit resource: change pci core and arch code to use resource_size_t
  [PATCH] 64bit resource: change resource core to use resource_size_t
  [PATCH] 64bit resource: introduce resource_size_t for the start and end of struct resource
  [PATCH] 64bit resource: fix up printks for resources in misc drivers
  [PATCH] 64bit resource: fix up printks for resources in arch and core code
  [PATCH] 64bit resource: fix up printks for resources in pcmcia drivers
  [PATCH] 64bit resource: fix up printks for resources in video drivers
  [PATCH] 64bit resource: fix up printks for resources in ide drivers
  [PATCH] 64bit resource: fix up printks for resources in mtd drivers
  [PATCH] 64bit resource: fix up printks for resources in pci core and hotplug drivers
  [PATCH] 64bit resource: fix up printks for resources in networks drivers
  [PATCH] 64bit resource: fix up printks for resources in sound drivers
  [PATCH] 64bit resource: C99 changes for struct resource declarations

Fixed up trivial conflict in drivers/ide/pci/cmd64x.c (the printk that
was changed by the 64-bit resources had been deleted in the meantime ;)
2006-06-29 10:49:17 -07:00
Ingo Molnar c0ad90a32f [PATCH] genirq: add ->retrigger() irq op to consolidate hw_irq_resend()
Add ->retrigger() irq op to consolidate hw_irq_resend() implementations.
(Most architectures had it defined to NOP anyway.)

NOTE: ia64 needs testing. i386 and x86_64 tested.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29 10:26:23 -07:00
Ingo Molnar 0d7012a968 [PATCH] genirq: cleanup: turn ARCH_HAS_IRQ_PER_CPU into CONFIG_IRQ_PER_CPU
Cleanup: change ARCH_HAS_IRQ_PER_CPU into a Kconfig method.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29 10:26:23 -07:00
Ingo Molnar d1bef4ed5f [PATCH] genirq: rename desc->handler to desc->chip
This patch-queue improves the generic IRQ layer to be truly generic, by adding
various abstractions and features to it, without impacting existing
functionality.

While the queue can be best described as "fix and improve everything in the
generic IRQ layer that we could think of", and thus it consists of many
smaller features and lots of cleanups, the one feature that stands out most is
the new 'irq chip' abstraction.

The irq-chip abstraction is about describing and coding and IRQ controller
driver by mapping its raw hardware capabilities [and quirks, if needed] in a
straightforward way, without having to think about "IRQ flow"
(level/edge/etc.) type of details.

This stands in contrast with the current 'irq-type' model of genirq
architectures, which 'mixes' raw hardware capabilities with 'flow' details.
The patchset supports both types of irq controller designs at once, and
converts i386 and x86_64 to the new irq-chip design.

As a bonus side-effect of the irq-chip approach, chained interrupt controllers
(master/slave PIC constructs, etc.) are now supported by design as well.

The end result of this patchset intends to be simpler architecture-level code
and more consolidation between architectures.

We reused many bits of code and many concepts from Russell King's ARM IRQ
layer, the merging of which was one of the motivations for this patchset.

This patch:

rename desc->handler to desc->chip.

Originally i did not want to do this, because it's a big patch.  But having
both "desc->handler", "desc->handle_irq" and "action->handler" caused a
large degree of confusion and made the code appear alot less clean than it
truly is.

I have also attempted a dual approach as well by introducing a
desc->chip alias - but that just wasnt robust enough and broke
frequently.

So lets get over with this quickly.  The conversion was done automatically
via scripts and converts all the code in the kernel.

This renaming patch is the first one amongst the patches, so that the
remaining patches can stay flexible and can be merged and split up
without having some big monolithic patch act as a merge barrier.

[akpm@osdl.org: build fix]
[akpm@osdl.org: another build fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29 10:26:21 -07:00
Mark A. Greer c220153654 [POWERPC] todc: add support for Time-Of-Day-Clock
This is a resubmit with a proper subject and with all comments addressed.
Applies cleanly to powerpc.git 649e857972

Mark
--

The todc code from arch/ppc supports many todc/rtc chips and is needed
in arch/powerpc.  This patch adds the todc code to arch/powerpc.

Signed-off-by: Mark A. Greer <mgreer@mvista.com>
--

 arch/powerpc/Kconfig         |    7
 arch/powerpc/sysdev/Makefile |    1
 arch/powerpc/sysdev/todc.c   |  392 ++++++++++++++++++++++++++++++++++
 include/asm-powerpc/todc.h   |  487 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 887 insertions(+)
--
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-29 16:22:46 +10:00
Zang Roy-r61911 2b9d7467a6 [POWERPC] Add tsi108 pci and platform device data register function
Add Tundra Semiconductor tsi108 pci and platform device data register
function support.

Signed-off-by: Alexandre Bounine <alexandreb@tundra.com>
Signed-off-by: Roy Zang	<tie-fei.zang@freescale.com>

 ---
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-29 16:20:36 +10:00
Paul Mackerras 1729dc7833 [POWERPC] Correct the MAX_CONTEXT definition
When we increased the address space per process to 2^44 bytes, the
number of contexts that we could actually use reduced, but we forgot
to decrease the MAX_CONTEXT definition.  (Fortunately this would only
cause problems if we actually had more than 512k user processes
running.)  This patch corrects the definition.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-29 16:16:15 +10:00
Paul Mackerras 489244498e Merge branch 'for_paulus' of master.kernel.org:/pub/scm/linux/kernel/git/galak/powerpc 2006-06-28 16:10:53 +10:00
Kumar Gala 9ad494f624 powerpc: minor cleanups for mpc86xx
* Remove duplicated cputable entry for 8641 (matches w/7448)
* Removed __init from function prototypes in mpc86xx.h
* Moved pci fixups into board specific code
* Moved mpc86xx_exclude_device to generic mpc86xx pci code
* Fixed sparse warnings in mpc86xx_smp.c
* Removed board specific header include from asm-powerpc/mpc86xx.h

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2006-06-28 00:37:45 -05:00
Paul Mackerras 3965f8c597 [POWERPC] Simplify the code defining the 64-bit CPU features
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-28 15:19:03 +10:00
Stephen Rothwell 9b47569a9d [POWERPC] update asm-powerpc/time.h
If we ever build a combined kernel including iSeries, then this will
be needed.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-28 15:18:56 +10:00
Stephen Rothwell 612f02d6d6 [POWERPC] Clean up it_lp_queue.h
No more StudlyCaps.
Remove from a couple of places it is no longer needed.
Use C style comments.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-28 15:18:55 +10:00
David Wilder c0ce7d0886 [POWERPC] Add the use of the firmware soft-reset-nmi to kdump.
With this patch, kdump uses the firmware soft-reset NMI for two purposes:
1) Initiate the kdump (take a crash dump) by issuing a soft-reset.
2) Break a CPU out of a deadlock condition that is detected during kdump
processing.

When a soft-reset is initiated each CPU will enter
system_reset_exception() and set its corresponding bit in the global
bit-array cpus_in_sr then call die(). When die() finds the CPU's bit set
in cpu_in_sr crash_kexec() is called to initiate a crash dump. The first
CPU to enter crash_kexec() is called the "crashing CPU". All other CPUs
are "secondary CPUs". The secondary CPU's pass through to
crash_kexec_secondary() and sleep. The crashing CPU waits for all CPUs
to enter via soft-reset then boots the kdump kernel (see
crash_soft_reset_check())

When the system crashes due to a panic or exception, crash_kexec() is
called by panic() or die(). The crashing CPU sends an IPI to all other
CPUs to notify them of the pending shutdown. If a CPU is in a deadlock
or hung state with interrupts disabled, the IPI will not be delivered.
The result being, that the kdump kernel is not booted. This problem is
solved with the use of a firmware generated soft-reset. After the
crashing_cpu has issued the IPI, it waits for 10 sec for all CPUs to
enter crash_ipi_callback(). A CPU signifies its entry to
crash_ipi_callback() by setting its corresponding bit in the
cpus_in_crash bit array. After 10 sec, if one or more CPUs have not set
their bit in cpus_in_crash we assume that the CPU(s) is deadlocked. The
operator is then prompted to generate a soft-reset to break the
deadlock. Each CPU enters the soft reset handler as described above.

Two conditions must be handled at this point:
1) The system crashed because the operator generated a soft-reset. See
2) The system had crashed before the soft-reset was generated ( in the
case of a Panic or oops).

The first CPU to enter crash_kexec() uses the state of the kexec_lock to
determine this state. If kexec_lock is already held then condition 2 is
true and crash_kexec_secondary() is called, else; this CPU is flagged as
the crashing CPU, the kexec_lock is acquired and crash_kexec() proceeds
as described above.

Each additional CPUs responding to the soft-reset will pass through
crash_kexec() to kexec_secondary(). All secondary CPUs call
crash_ipi_callback() readying them self's for the shutdown. When ready
they clear their bit in cpus_in_sr. The crashing CPU waits in
kexec_secondary() until all other CPUs have cleared their bits in
cpus_in_sr. The kexec kernel boot is then started.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-28 15:18:52 +10:00
Michael Ellerman cc46bb98c0 [POWERPC] Add udbg support for RTAS console
Add udbg hooks for the RTAS console, based on the RTAS put-term-char
and get-term-char calls. Along with my previous patches, this should
enable debugging as soon as early_init_dt_scan_rtas() is called.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-28 11:59:48 +10:00
Michael Ellerman 458148c00b [POWERPC] Setup RTAS values earlier, to enable rtas_call() earlier
Althought RTAS is instantiated when we enter the kernel, we can't actually
call into it until we know its entry point address. Currently we grab that
in rtas_initialize(), however that's quite late in the boot sequence.

To enable rtas_call() earlier, we can grab the RTAS entry etc. values while
we're scanning the flattened device tree. There's existing code to retrieve
the values from /chosen, however we don't store them there anymore, so remove
that code.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-28 11:59:48 +10:00
Michael Ellerman aa98c50dcb [POWERPC] Make kexec_setup() a regular initcall
There's no reason kexec_setup() needs to be called explicitly from
setup_system(), it can just be a regular initcall.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-28 11:59:47 +10:00
Michael Ellerman 7d0daae4ae [POWERPC] powerpc: Initialise ppc_md htab pointers earlier
Initialise the ppc_md htab callbacks earlier, in the probe routines. This
allows us to call htab_finish_init() from htab_initialize(), and makes it
private to hash_utils_64.c. Move htab_finish_init() and make_bl() above
htab_initialize() to avoid forward declarations.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-28 11:59:47 +10:00
Haren Myneni 5f50867b4f [POWERPC] kdump: Reserve the existing TCE mappings left by the first kernel
During kdump boot, noticed some machines checkstop on dma protection
fault for ongoing DMA left in the first kernel. Instead of initializing
TCE entries in iommu_init() for the kdump boot, this patch fixes this
issue by walking through the each TCE table and checks whether the
entries are in use by the first kernel. If so, reserve those entries by
setting the corresponding bit in tbl->it_map such that these entries
will not be available for the kdump boot.

However it could be possible that all TCE entries might be used up due
to the driver bug that does continuous mapping. My observation is around
1700 TCE  entries are used on some systems (Ex: P4) at some point of
time during kdump boot and saving dump (either write into the disk or
sending to remote machine). Hence, this patch will make sure that
minimum of 2048 entries will be available such that kdump boot could be
successful in some cases.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-28 11:59:46 +10:00
Jon Loeliger f93d6d071f [POWERPC] Remove obsolete #include <linux/config.h>.
Signed-off-by: Jon Loeliger <jdl@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-28 11:52:22 +10:00
Siddha, Suresh B 5c45bf279d [PATCH] sched: mc/smt power savings sched policy
sysfs entries 'sched_mc_power_savings' and 'sched_smt_power_savings' in
/sys/devices/system/cpu/ control the MC/SMT power savings policy for the
scheduler.

Based on the values (1-enable, 0-disable) for these controls, sched groups
cpu power will be determined for different domains.  When power savings
policy is enabled and under light load conditions, scheduler will minimize
the physical packages/cpu cores carrying the load and thus conserving
power(with a perf impact based on the workload characteristics...  see OLS
2005 CMP kernel scheduler paper for more details..)

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Con Kolivas <kernel@kolivas.org>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:45 -07:00
Greg Kroah-Hartman e31dd6e452 [PATCH] 64bit resource: change pci core and arch code to use resource_size_t
Based on a patch series originally from Vivek Goyal <vgoyal@in.ibm.com>

Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-27 09:24:00 -07:00
Anil S Keshavamurthy e6f47f978b [PATCH] Notify page fault call chain
With this patch Kprobes now registers for page fault notifications only when
their is an active probe registered.  Once all the active probes are
unregistered their is no need to be notified of page faults and kprobes
unregisters itself from the page fault notifications.  Hence we will have ZERO
side effects when no probes are active.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:22 -07:00
Anil S Keshavamurthy 4f9e87c045 [PATCH] Notify page fault call chain for powerpc
Overloading of page fault notification with the notify_die() has performance
issues(since the only interested components for page fault is kprobes and/or
kdb) and hence this patch introduces the new notifier call chain exclusively
for page fault notifications their by avoiding notifying unnecessary
components in the do_page_fault() code path.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:22 -07:00
Paul Mackerras bfe5d83419 [PATCH] Define __raw_get_cpu_var and use it
There are several instances of per_cpu(foo, raw_smp_processor_id()), which
is semantically equivalent to __get_cpu_var(foo) but without the warning
that smp_processor_id() can give if CONFIG_DEBUG_PREEMPT is enabled.  For
those architectures with optimized per-cpu implementations, namely ia64,
powerpc, s390, sparc64 and x86_64, per_cpu() turns into more and slower
code than __get_cpu_var(), so it would be preferable to use __get_cpu_var
on those platforms.

This defines a __raw_get_cpu_var(x) macro which turns into per_cpu(x,
raw_smp_processor_id()) on architectures that use the generic per-cpu
implementation, and turns into __get_cpu_var(x) on the architectures that
have an optimized per-cpu implementation.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:01 -07:00
Matt Mackall afedfd016a [PATCH] random: remove SA_SAMPLE_RANDOM from floppy driver
The floppy driver is already calling add_disk_randomness as it should, so this
was redundant.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:00 -07:00
Michael Hanselmann 5474c120aa [PATCH] Rewritten backlight infrastructure for portable Apple computers
This patch contains a total rewrite of the backlight infrastructure for
portable Apple computers.  Backward compatibility is retained.  A sysfs
interface allows userland to control the brightness with more steps than
before.  Userland is allowed to upload a brightness curve for different
monitors, similar to Mac OS X.

[akpm@osdl.org: add needed exports]
Signed-off-by: Michael Hanselmann <linux-kernel@hansmi.ch>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:00:59 -07:00
Linus Torvalds 45c091bb2d Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (139 commits)
  [POWERPC] re-enable OProfile for iSeries, using timer interrupt
  [POWERPC] support ibm,extended-*-frequency properties
  [POWERPC] Extra sanity check in EEH code
  [POWERPC] Dont look for class-code in pci children
  [POWERPC] Fix mdelay badness on shared processor partitions
  [POWERPC] disable floating point exceptions for init
  [POWERPC] Unify ppc syscall tables
  [POWERPC] mpic: add support for serial mode interrupts
  [POWERPC] pseries: Print PCI slot location code on failure
  [POWERPC] spufs: one more fix for 64k pages
  [POWERPC] spufs: fail spu_create with invalid flags
  [POWERPC] spufs: clear class2 interrupt status before wakeup
  [POWERPC] spufs: fix Makefile for "make clean"
  [POWERPC] spufs: remove stop_code from struct spu
  [POWERPC] spufs: fix spu irq affinity setting
  [POWERPC] spufs: further abstract priv1 register access
  [POWERPC] spufs: split the Cell BE support into generic and platform dependant parts
  [POWERPC] spufs: dont try to access SPE channel 1 count
  [POWERPC] spufs: use kzalloc in create_spu
  [POWERPC] spufs: fix initial state of wbox file
  ...

Manually resolved conflicts in:
	drivers/net/phy/Makefile
	include/asm-powerpc/spu.h
2006-06-22 22:11:30 -07:00
Bjorn Helgaas 4f1bcaf094 [PATCH] vgacon: make VGA_MAP_MEM take size, remove extra use
VGA_MAP_MEM translates to ioremap() on some architectures.  It makes sense
to do this to vga_vram_base, because we're going to access memory between
vga_vram_base and vga_vram_end.

But it doesn't really make sense to map starting at vga_vram_end, because
we aren't going to access memory starting there.  On ia64, which always has
to be different, ioremapping vga_vram_end gives you something completely
incompatible with ioremapped vga_vram_start, so vga_vram_size ends up being
nonsense.

As a bonus, we often know the size up front, so we can use ioremap()
correctly, rather than giving it a zero size.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-22 15:05:58 -07:00
Anton Blanchard 1e92a550e8 [POWERPC] Fix mdelay badness on shared processor partitions
On partitioned PPC64 systems where a partition is given 1/10 of a
processor, we have seen mdelay() delaying for 10 times longer than it
should.  The reason is that the generic mdelay(n) does n delays of 1
millisecond each.  However, with 1/10 of a processor, we only get a
one-millisecond timeslice every 10ms.  Thus each 1 millisecond delay
loop ends up taking 10ms elapsed time.

The solution is just to use the PPC64 udelay function, which uses the
timebase to ensure that the delay is based on elapsed time rather than
how much processing time the partition has been given.  (Yes, the
generic mdelay uses the PPC64 udelay, but the problem is that the
start time gets reset every millisecond, and each time it gets reset
we lose another 9ms.)

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Andrew Morton <akpm@osdl.org>
2006-06-21 15:01:33 +10:00
Arnd Bergmann ddf5f75a16 [POWERPC] disable floating point exceptions for init
Floating point exceptions should not be enabled by default,
as this setting impacts the performance on some CPUs, in
particular the Cell BE. Since the bits are inherited from
parent processes, the place to change the default is the
thread struct used for init.

glibc sets this up correctly per thread in its fesetenv
function, so user space should not be impacted by this
setting. None of the other common libc implementations
(uClibc, dietlibc, newlib, klibc) has support for fp
exceptions, so they are unlikely to be hit by this either.

There is a small risk that somebody wrote their own
application that manually sets the fpscr bits instead
of calling fesetenv, without changing the MSR bits as well.
Those programs will break with this change.

It probably makes sense to change glibc in the future
to be more clever about FE bits, so that when running
on a CPU where this is expensive, it disables exceptions
ASAP, while it keeps them enabled on CPUs where running
with exceptions on is cheaper than changing the state
often.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-21 15:01:33 +10:00
Andreas Schwab 72abd54035 [POWERPC] Unify ppc syscall tables
Avoid duplication of the syscall table for the cell platform.  Based on an
idea from David Woodhouse.

Signed-off-by: Andreas Schwab <schwab@suse.de>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-21 15:01:32 +10:00
Mark A. Greer 868ea0c925 [POWERPC] mpic: add support for serial mode interrupts
On Tue, Jun 20, 2006 at 02:01:26PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2006-06-19 at 13:08 -0700, Mark A. Greer wrote:
> > MPC10x-style interrupt controllers have a serial mode that allows
> > several interrupts to be clocked in through one INT signal.
> >
> > This patch adds the software support for that mode.
>
> You hard code the clock ratio... why not add a separate call to be
> called after mpic_init,
> something like mpic_set_serial_int(int mpic, int enable, int
> clock_ratio) ?

How's this?
--

MPC10x-style interrupt controllers have a serial mode that allows
several interrupts to be clocked in through one INT signal.

This patch adds the software support for that mode.

Signed-off-by: Mark A. Greer <mgreer@mvista.com>
--

 arch/powerpc/sysdev/mpic.c |   20 ++++++++++++++++++++
 include/asm-powerpc/mpic.h |   10 ++++++++++
 2 files changed, 30 insertions(+)
--
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-21 15:01:32 +10:00
arnd@arndb.de 379507181a [POWERPC] spufs: one more fix for 64k pages
The SPU context save/restore code is currently built
for a 4k page size and we provide a _shipped version
of it since most people don't have the spu toolchain
that is needed to rebuild that code.

This patch hardcodes the data structures to a 64k
page alignment, which also guarantees 4k alignment
but unfortunately wastes 60k of memory per SPU
context that is created in the running system.

We will follow up on this with another patch to
reduce that overhead or maybe redo the context
save/restore logic to do this part entirely different,
but for now it should make experimental systems
work with either page size.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-21 15:01:32 +10:00
Masato Noguchi 2eabbbd33e [POWERPC] spufs: remove stop_code from struct spu
This patch remove 'stop_code' -- discarded member of struct spu.
It is written at initialize and interrupt, but never read
in current implementation.

Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-21 15:01:31 +10:00
Geoff Levand a91942ae7e [POWERPC] spufs: fix spu irq affinity setting
This changes the hypervisor abstraction of setting cpu affinity to a
higher level to avoid platform dependent interrupt controller
routines.  I replaced spu_priv1_ops:spu_int_route_set() with a
new routine spu_priv1_ops:spu_cpu_affinity_set().

As a by-product, this change eliminated what looked like an
existing bug in the set affinity code where spu_int_route_set()
mistakenly called int_stat_get().

Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-21 15:01:31 +10:00
Geoff Levand 540270d82d [POWERPC] spufs: further abstract priv1 register access
To support muti-platform binaries the spu hypervisor accessor
routines must have runtime binding.

I removed the existing statically linked routines in spu.h
and spu_priv1_mmio.c and created new accessor routines in spu_priv1.h
that operate indirectly through an ops struct spu_priv1_ops.
spu_priv1_mmio.c contains the instance of the accessor routines
for running on raw hardware.

Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-21 15:01:31 +10:00
Jeremy Kerr 1d64093f66 [POWERPC] cell: register SPUs as sysdevs
SPUs are registered as system devices, exposing attributes through
sysfs. Since the sysdev includes a kref, we can remove the one in
struct spu (it isn't used at the moment anyway).

Currently only the interrupt source and numa node attributes are added.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-21 15:01:29 +10:00
Benjamin Herrenschmidt acf7d76827 [POWERPC] cell: add RAS support
This is a first version of support for the Cell BE "Reliability,
Availability and Serviceability" features.

It doesn't yet handle some of the RAS interrupts (the ones described in
iic_is/iic_irr), I'm still working on a proper way to expose these. They
are essentially a cascaded controller by themselves (sic !) though I may
just handle them locally to the iic driver. I need also to sync with
David Erb on the way he hooked in the performance monitor interrupt.

So that's all for 2.6.17 and I'll do more work on that with my rework of
the powerpc interrupt layer that I'm hacking on at the moment.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-21 15:01:29 +10:00
Jon Loeliger 6b54340405 [POWERPC] Add 8641 Register space and IRQ definitions.
Signed-off-by: Jeff Brown <Jeff.Brown@freescale.com>
Signed-off-by: Xianghua Xiao <x.xiao@freescale.com>
Signed-off-by: Jon Loeliger <jdl@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-21 15:01:28 +10:00
Linus Torvalds cee4cca740 Merge git://git.infradead.org/hdrcleanup-2.6
* git://git.infradead.org/hdrcleanup-2.6: (63 commits)
  [S390] __FD_foo definitions.
  Switch to __s32 types in joystick.h instead of C99 types for consistency.
  Add <sys/types.h> to headers included for userspace in <linux/input.h>
  Move inclusion of <linux/compat.h> out of user scope in asm-x86_64/mtrr.h
  Remove struct fddi_statistics from user view in <linux/if_fddi.h>
  Move user-visible parts of drivers/s390/crypto/z90crypt.h to include/asm-s390
  Revert include/media changes: Mauro says those ioctls are only used in-kernel(!)
  Include <linux/types.h> and use __uXX types in <linux/cramfs_fs.h>
  Use __uXX types in <linux/i2o_dev.h>, include <linux/ioctl.h> too
  Remove private struct dx_hash_info from public view in <linux/ext3_fs.h>
  Include <linux/types.h> and use __uXX types in <linux/affs_hardblocks.h>
  Use __uXX types in <linux/divert.h> for struct divert_blk et al.
  Use __u32 for elf_addr_t in <asm-powerpc/elf.h>, not u32. It's user-visible.
  Remove PPP_FCS from user view in <linux/ppp_defs.h>, remove __P mess entirely
  Use __uXX types in user-visible structures in <linux/nbd.h>
  Don't use 'u32' in user-visible struct ip_conntrack_old_tuple.
  Use __uXX types for S390 DASD volume label definitions which are user-visible
  S390 BIODASDREADCMB ioctl should use __u64 not u64 type.
  Remove unneeded inclusion of <linux/time.h> from <linux/ufs_fs.h>
  Fix private integer types used in V4L2 ioctls.
  ...

Manually resolve conflict in include/linux/mtd/physmap.h
2006-06-20 15:10:08 -07:00
David Woodhouse 6a87a86d06 Add Kbuild file for PowerPC 'make headers_install'
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-06-18 12:17:47 +01:00
Arnd Bergmann ce221982e0 [PATCH] powerpc: enable CPU_FTR_CI_LARGE_PAGE for cell
Reflect the fact that the Cell Broadband Engine supports 64k
pages by adding the bit to the CPU features.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-17 10:56:24 -07:00
Dave C Boutcher 368a6ba5d1 [POWERPC] check firmware state before suspending
Currently the kernel blindly halts all the processors and calls the
ibm,suspend-me rtas call.  If the firmware is not in the correct
state, we then re-start all the processors and return.  It is much
smarter to first check the firmware state, and only if it is waiting,
call the ibm,suspend-me call.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-15 19:31:27 +10:00
Anton Blanchard 9e6e3c2c79 [POWERPC] Fix HV bit handling on non partitioned machines
On non partitioned machines we currently set the HV bit in kernel space
only. It turns out we are supposed to maintain the HV bit in both user
and kernel space.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-15 19:31:26 +10:00
Anton Blanchard ca1588e71b [POWERPC] node local IOMMU tables
Allocate IOMMU tables local to the relevant node.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-15 19:31:26 +10:00
Anton Blanchard 357518fa34 [POWERPC] pcibus_to_node fixes
of_node_to_nid returns -1 if the associativity cannot be found. This
means pcibus_to_cpumask has to be careful not to pass a negative index into
node_to_cpumask.

Since pcibus_to_node could be used a lot, and of_node_to_nid is slow (it
walks a list doing strcmps), lets also cache the node in the
pci_controller struct.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-15 19:31:26 +10:00
Anton Blanchard 227318bbde [POWERPC] Remove stale 64bit on 32bit kernel code
Remove some stale POWER3/POWER4/970 on 32bit kernel support.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-15 19:31:26 +10:00
Anton Blanchard 3a2c48cfc9 [POWERPC] 64bit FPSCR support
Forthcoming machines will extend the FPSCR to 64 bits.  We already
had a 64-bit save area for the FPSCR, but we need to use a new form
of the mtfsf instruction.  Fortunately this new form is decoded as
an ordinary mtfsf by existing 64-bit processors.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-15 19:31:25 +10:00
Jake Moilanen 204face4fb [POWERPC] MSI abstraction
Instead of trying to make PPC64 MSI fit in a Intel-centric MSI layer, a
simple short-term solution is to hook the pci_{en/dis}able_msi() calls
and make a machdep call.

The rest of the MSI functions are superfluous for what is needed at this
time.  Many of which can have machdep calls added as needed.

Ben and Michael Ellerman are looking into rewrite the MSI layer to be
more generic.  However, in the meantime this works as a interim
solution.

Signed-off-by: Jake Moilanen <moilanen@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-15 19:31:25 +10:00
Paul Mackerras bf72aeba2f powerpc: Use 64k pages without needing cache-inhibited large pages
Some POWER5+ machines can do 64k hardware pages for normal memory but
not for cache-inhibited pages.  This patch lets us use 64k hardware
pages for most user processes on such machines (assuming the kernel
has been configured with CONFIG_PPC_64K_PAGES=y).  User processes
start out using 64k pages and get switched to 4k pages if they use any
non-cacheable mappings.

With this, we use 64k pages for the vmalloc region and 4k pages for
the imalloc region.  If anything creates a non-cacheable mapping in
the vmalloc region, the vmalloc region will get switched to 4k pages.
I don't know of any driver other than the DRM that would do this,
though, and these machines don't have AGP.

When a region gets switched from 64k pages to 4k pages, we do not have
to clear out all the 64k HPTEs from the hash table immediately.  We
use the _PAGE_COMBO bit in the Linux PTE to indicate whether the page
was hashed in as a 64k page or a set of 4k pages.  If hash_page is
trying to insert a 4k page for a Linux PTE and it sees that it has
already been inserted as a 64k page, it first invalidates the 64k HPTE
before inserting the 4k HPTE.  The hash invalidation routines also use
the _PAGE_COMBO bit, to determine whether to look for a 64k HPTE or a
set of 4k HPTEs to remove.  With those two changes, we can tolerate a
mix of 4k and 64k HPTEs in the hash table, and they will all get
removed when the address space is torn down.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-15 10:45:18 +10:00
Paul Mackerras 4306443128 powerpc: Remove unused paca->pgdir field
The pgdir field in the paca was a leftover from the dynamic VSIDs
patch, and is not used in the current kernel code.  This removes it.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-12 18:38:21 +10:00
Paul Mackerras e9370ae15d [PATCH] powerpc: Implement PR_[GS]ET_UNALIGN prctls for powerpc
This gives the ability to control whether alignment exceptions get
fixed up or reported to the process as a SIGBUS, using the existing
PR_SET_UNALIGN and PR_GET_UNALIGN prctls.  We do not implement the
option of logging a message on alignment exceptions.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-09 21:24:16 +10:00
Paul Mackerras fab5db97e4 [PATCH] powerpc: Implement support for setting little-endian mode via prctl
This adds the PowerPC part of the code to allow processes to change
their endian mode via prctl.

This also extends the alignment exception handler to be able to fix up
alignment exceptions that occur in little-endian mode, both for
"PowerPC" little-endian and true little-endian.

We always enter signal handlers in big-endian mode -- the support for
little-endian mode does not amount to the creation of a little-endian
user/kernel ABI.  If the signal handler returns, the endian mode is
restored to what it was when the signal was delivered.

We have two new kernel CPU feature bits, one for PPC little-endian and
one for true little-endian.  Most of the classic 32-bit processors
support PPC little-endian, and this is reflected in the CPU feature
table.  There are two corresponding feature bits reported to userland
in the AT_HWCAP aux vector entry.

This is based on an earlier patch by Anton Blanchard.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-09 21:24:15 +10:00
Michael Neuling e78dbc800c [PATCH] powerpc: oprofile support for POWER6
POWER6 moves some of the MMCRA bits and also requires some bits to be
cleared each PMU interrupt.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-09 21:24:05 +10:00
Christoph Hellwig 8eb6c6e3b9 [PATCH] powerpc: node-aware dma allocations
Make sure dma_alloc_coherent allocates memory from the local node.  This
is important on Cell where we avoid going through the slow cpu
interconnect.

Note:  I could only test this patch on Cell, it should be verified on
some pseries machine by those that have the hardware.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-09 21:24:01 +10:00
Christoph Hellwig 318facbee0 [PATCH] powerpc: implement pcibus_to_node and pcibus_to_cpumask
On 64bit powerpc we can find out what node a pci bus hangs off, so
implement the topology.h macros that export this information.

For 32bit this seems a little more difficult, but I don't know of 32bit
powerpc NUMA machines either, so let's leave it out for now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-09 21:21:08 +10:00
John Rose 507279db18 [PATCH] powerpc: reorg RTAS delay code
This patch attempts to handle RTAS "busy" return codes in a more simple
and consistent manner.  Typical callers of RTAS shouldn't have to
manage wait times and delay calls.

This patch also changes the kernel to use msleep() rather than udelay()
when a runtime delay is necessary.  This will avoid CPU soft lockups
for extended delay conditions.

Signed-off-by: John Rose <johnrose@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-09 21:21:06 +10:00
Benjamin Herrenschmidt c5cf0e30bf [PATCH] powerpc: Fix buglet with MMU hash management
Our MMU hash management code would not set the "C" bit (changed bit) in
the hardware PTE when updating a RO PTE into a RW PTE. That would cause
the hardware to possibly to a write back to the hash table to set it on
the first store access, which in addition to being a performance issue,
might also hit a bug when running with native hash management (non-HV)
as our code is specifically optimized for the case where no write back
happens.

Thus there is a very small therocial window were a hash PTE can become
corrupted if that HPTE has just been upgraded to read write, a store
access happens on it, and that races with another processor evicting
that same slot. Since eviction (caused by an almost full hash) is
extremely rare, the bug is very unlikely to happen fortunately.

This fixes by allowing the updating of the protection bits in the native
hash handling to also set (but not clear) the "C" bit, and, in order to
also improve performances in the general case, by always setting that
bit on newly inserted hash PTE so that writeback really never happens.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-09 21:20:59 +10:00
Benjamin Herrenschmidt a5bba930d8 [PATCH] powerpc vdso updates
This patch cleans up some locking & error handling in the ppc vdso and
moves the vdso base pointer from the thread struct to the mm context
where it more logically belongs. It brings the powerpc implementation
closer to Ingo's new x86 one and also adds an arch_vma_name() function
allowing to print [vsdo] in /proc/<pid>/maps if Ingo's x86 vdso patch is
also applied.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-09 21:20:57 +10:00
Renzo Davoli 98a90c0279 [PATCH] powerpc: enable PPC_PTRACE_[GS]ETREGS on ppc32
I have tested PPC_PTRACE_GETREGS and PPC_PTRACE_SETREGS on umview.

I do not understand why historically these tags has been defined as
PPC_PTRACE_GETREGS and PPC_PTRACE_SETREGS instead of simply
PTRACE_[GS]ETREGS. The other "originality" is that the address must be
put into the "addr" field instead of the "data" field as stated in the
manual.

Signed-off-by: renzo davoli <renzo@cs.unibo.it>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-09 21:20:51 +10:00
Paul Mackerras c029cc66cb Merge branch 'merge' 2006-06-01 19:05:23 +10:00
Paul Mackerras 6bf08cb246 [PATCH] Add CMSPAR to termbits.h for powerpc and alpha
Some driver wants to use CMSPAR, but it was missing on alpha and powerpc.
This adds it, with the same value as every other architecture uses.

(akpm: fixes the build of an upcoming gregkh USB patch)

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-26 11:55:46 -07:00
David Woodhouse 66643de455 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	include/asm-powerpc/unistd.h
	include/asm-sparc/unistd.h
	include/asm-sparc64/unistd.h

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-24 09:22:21 +01:00
Jon Mason 0a9cb46a73 [PATCH] remove powerpc bitops in favor of existing generic bitops
There already exists a big endian safe bitops implementation in
lib/find_next_bit.c.  The code in it is 90%+ common with the powerpc
specific version, so the powerpc version is redundant.  This patch
makes the necessary changes to use the generic bitops in powerpc, and
removes the powerpc specific version.

Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-05-24 16:08:58 +10:00
Stephen Rothwell 403fac4f83 [PATCH] powerpc: remove LogicalSlot from pci_dn
As we now store enough information in the device_node.

Also the Flags field was not used either, do remove that.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-05-24 16:08:56 +10:00
Stephen Rothwell b025279316 [PATCH] powerpc: remove Irq from pci_dn
As we now store enough information in the device_node to allocate the
irq number in pcibios_final_fixup.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-05-24 16:08:56 +10:00
Stephen Rothwell 96ff6afaf1 [PATCH] powerpc: remove iSeries_Global_Device_List
We can now scan the list of device nodes instead.  This also allows us
to remove the Device_list member of struct pci_dn.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-05-24 16:08:56 +10:00
David Woodhouse 0f04108237 [PATCH] powerpc: wire up sys_[gs]et_robust_list
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-23 10:35:32 -07:00
Jeremy Kerr d4ad66faec [PATCH] powerpc: Add of_parse_dma_window()
Add a function for generic parsing of dma-window properties (ie,
ibm,dma-window and ibm,my-dma-window) of pci and virtual device nodes.

This function will also be used by cell.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-05-19 15:02:21 +10:00
jimix@watson.ibm.com 8ae5b2801a [PATCH] powerpc: udbg_printf() formatting attribute
This patch allows the compiler to catch any printf-like mismatches for
udbg_printf().  After some brute force building I've only found issues
with my own code and lparcfg.c It could break some developers, but
IMHO that would be goodness.

Signed-off-by: Jimi Xenidis <jimix@watson.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-05-19 15:02:19 +10:00
Michael Ellerman 35dd54326e [PATCH] powerpc: Move crashkernel= handling into the kernel.
This was missing a quilt ref.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-05-19 15:02:18 +10:00
Michael Ellerman 473104134b [PATCH] powerpc: Kdump header cleanup
We need to know the base address of the kdump kernel even when we're not a
kdump kernel, so add a #define for it. Move the logic that sets the kdump
kernelbase into kdump.h instead of page.h.

Rename kdump_setup() to setup_kdump_trampoline() to make it clearer what it's
doing, and add an empty definition for the !CRASH_DUMP case to avoid a

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-05-19 15:02:16 +10:00
Michael Ellerman 2babf5c2ec [PATCH] powerpc: Unify mem= handling
We currently do mem= handling in three seperate places. And as benh pointed out
I wrote two of them. Now that we parse command line parameters earlier we can
clean this mess up.

Moving the parsing out of prom_init means the device tree might be allocated
above the memory limit. If that happens we'd have to move it. As it happens
we already have logic to do that for kdump, so just genericise it.

This also means we might have reserved regions above the memory limit, if we
do the bootmem allocator will blow up, so we have to modify
lmb_enforce_memory_limit() to truncate the reserves as well.

Tested on P5 LPAR, iSeries, F50, 44p. Tested moving device tree on P5 and
44p and F50.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-05-19 15:02:15 +10:00
Michael Neuling d6b89a196d [PATCH] powerpc: whitespace cleanup in reg.h
In reg.h we mostly have #define<space> but there are a few #define<tab>
around.  Clean these up so we use space exclusively.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-05-19 14:35:25 +10:00
David Woodhouse 5047f09b56 Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-06 19:59:18 +01:00
Paul Mackerras f18fc729cd Merge ../linux-2.6 2006-05-05 15:45:48 +10:00
David Woodhouse 5ee882f153 Use __u32 for elf_addr_t in <asm-powerpc/elf.h>, not u32. It's user-visible.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-04 12:39:12 +01:00
Paul Mackerras 6bfd93c32a powerpc: Fix incorrect might_sleep in __get_user/__put_user on kernel addresses
We have a case where __get_user and __put_user can validly be used
on kernel addresses in interrupt context - namely, the alignment
exception handler, as our get/put_unaligned just do a single access
and rely on the alignment exception handler to fix things up in the
rare cases where the cpu can't handle it in hardware.  Thus we can
get alignment exceptions in the network stack at interrupt level.
The alignment exception handler does a __get_user to read the
instruction and blows up in might_sleep().

Since a __get_user on a kernel address won't actually ever sleep,
this makes the might_sleep conditional on the address being less
than PAGE_OFFSET.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-05-03 23:06:46 +10:00
Jeremy Kerr 8261aa6009 [PATCH] powerpc: cell: Add numa id to struct spu
Add an nid member to the spu structure, and store the numa id of the spu there
on creation.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-01 18:17:46 -07:00
Jeremy Kerr 953039c8df [PATCH] powerpc: Allow devices to register with numa topology
Change of_node_to_nid() to traverse the device tree, looking for a numa id.
Cell uses this to assign ids to SPUs, which are children of the CPU node.
Existing users of of_node_to_nid() are altered to use of_node_to_nid_single(),
which doesn't do the traversal.

Export an attach_sysdev_to_node() function, allowing system devices (eg.
SPUs) to link themselves into the numa topology in sysfs.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-01 18:17:46 -07:00
David Woodhouse b07019f293 Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-04-30 20:34:39 +01:00
Olof Johansson bc97ce951c [PATCH] powerpc: kill union tce_entry
It's been long overdue to kill the union tce_entry in the pSeries/iSeries
TCE code, especially since I asked the Summit guys to do it on the code
they copied from us.

Also, while I was at it, I cleaned up some whitespace.

Built and booted on pSeries, built on iSeries.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-29 18:07:54 +10:00
Stephen Rothwell c7f0e8cb56 [PATCH] powerpc: merge the rest of the vio code
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-29 18:02:02 +10:00
Stephen Rothwell dd721ffd95 [PATCH] powerpc: use a common vio_match_device routine
This requires the compatible properties having vaules that are empty
strings instead of just being empty properties.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-29 18:02:01 +10:00
Stephen Rothwell e10fa77368 [PATCH] powerpc: use the device tree for the iSeries vio bus probe
As an added bonus, since every vio_dev now has a device_node
associated with it, hotplug now works.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-29 18:02:00 +10:00
Paul Mackerras 29f147d746 Merge branch 'merge' 2006-04-29 16:15:57 +10:00
Anton Blanchard 03054d51a7 [PATCH] powerpc: Add cputable entry for POWER6
Add a cputable entry for the POWER6 processor.

The SIHV and SIPR bits in the mmcra have moved in POWER6, so disable
support for that until oprofile is fixed.

Also tell firmware that we know about POWER6.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-29 10:56:58 +10:00
David Woodhouse 5614253686 Remove unneeded _syscallX macros from user view in asm-*/unistd.h
These aren't needed by glibc or klibc, and they're broken in some cases
anyway. The uClibc folks are apparently switching over to stop using
them too (now that we agreed that they should be dropped, at least).

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-04-29 01:51:47 +01:00
David Woodhouse d6754b401a Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2006-04-29 01:42:26 +01:00
Andreas Schwab 2833c28aa0 [PATCH] powerpc: Wire up *at syscalls
Wire up *at syscalls.

This patch has been tested on ppc64 (using glibc's testsuite, both 32bit
and 64bit), and compile-tested for ppc32 (I have currently no ppc32 system
available, but I expect no problems).

Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-28 21:04:59 +10:00
David Woodhouse 1269277a5e [PATCH] powerpc: Use check_legacy_ioport() on ppc32 too.
Some people report that we die on some Macs when we are expecting to
catch machine checks after poking at some random I/O address. I'd seen
it happen on my dual G4 with serial ports until we fixed those to use
OF, but now other users are reporting it with i8042.

This expands the use of check_legacy_ioport() to avoid that situation
even on 32-bit kernels.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-28 21:04:55 +10:00
David Gibson f10a04c034 [PATCH] powerpc: Fix pagetable bloat for hugepages
At present, ARCH=powerpc kernels can waste considerable space in
pagetables when making large hugepage mappings.  Hugepage PTEs go in
PMD pages, but each PMD page maps 256M and so contains only 16
hugepage PTEs (128 bytes of data), but takes up a 1024 byte
allocation.  With CONFIG_PPC_64K_PAGES enabled (64k base page size),
the situation is worse.  Now hugepage PTEs are at the PTE page level
(also mapping 256M), so we store 16 hugepage PTEs in a 64k allocation.

The PowerPC MMU already means that any 256M region is either all
hugepage, or all normal pages.  Thus, with some care, we can use a
different allocation for the hugepage PTE tables and only allocate the
128 bytes necessary.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-28 15:02:51 +10:00
David Woodhouse 62c4f0a2d5 Don't include linux/config.h from anywhere else in include/
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-04-26 12:56:16 +01:00
Jens Axboe 912d35f867 [PATCH] Add support for the sys_vmsplice syscall
sys_splice() moves data to/from pipes with a file input/output. sys_vmsplice()
moves data to a pipe, with the input being a user address range instead.

This uses an approach suggested by Linus, where we can hold partial ranges
inside the pages[] map. Hopefully this will be useful for network
receive support as well.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-26 10:59:21 +02:00
David Woodhouse dd02ec3ac2 Remove user-visible references to PAGE_SIZE in include/asm-powerpc/elf.h
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-04-25 13:51:52 +01:00
Paul Mackerras 916a3d5729 Merge branch 'merge' 2006-04-23 10:55:56 +10:00
Paul Mackerras d0e15bed84 powerpc: Fix define_machine so machine_is() works from modules
machine_is() was always returning 0 when used in a module, because
we weren't exporting the machine definitions.  This was why sound
wasn't working on powermacs when CONFIG_SND_POWERMAC=m.  Original
fix from Ben Herrenschmidt, further fixed by me.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-23 10:42:04 +10:00
Linas Vepstas ac325acd50 [PATCH] powerpc/pseries: clear PCI failure counter if no new failures
The current PCI error recovery system keeps track of the number of PCI card
resets, and refuses to bring a card back up if this number is too large.
The goal of doing this was to avoid an infinite loop of resets if a card is
obviously dead.  However, if the failures are rare, but the machine has a
high uptime, this mechanism might still be triggered; this is too harsh.

This patch will avoids this problem by decrementing the fail count after an
hour.  Thus, as long as a pci card BSOD's less than 6 times an hour, it
will continue to be reset indefinitely.  If it's failure rate is greater
than that, it will be taken off-line permanently.

This patch is larger than it might otherwise be because it changes
indentation by removing a pointless while-loop.  The while loop is not
needed, as the handler is invoked once fo each event (by schedule_work());
the loop is leftover cruft from an earlier implementation.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-22 18:46:13 +10:00
Anton Blanchard c256f4b959 [PATCH] powerpc: remove io_page_mask
Cleanup patch which removes the io_page_mask.  It fixes the reset on
some e1000 devices which is needed for clean kexec reboots.  The legacy
devices which broke with this patch (parallel port and PC speaker) have
now been fixed in Linus' tree.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-22 18:45:05 +10:00
Olof Johansson 7daa411b81 [PATCH] powerpc: IOMMU support for honoring dma_mask
Some devices don't support full 32-bit DMA address space, which we currently
assume. Add the required mask-passing to the IOMMU allocators.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-21 22:28:55 +10:00
Linus Torvalds 6fbe85f914 Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge:
  powerpc: Use correct sequence for putting CPU into nap mode
  [PATCH] spufs: fix context-switch decrementer code
  [PATCH] powerpc32: Set cpu explicitly in kernel compiles
  [PATCH] powerpc/pseries: bugfix: balance calls to pci_device_put
  [PATCH] powerpc: Fix machine detection in prom_init.c
  [PATCH] ppc32: Fix string comparing in platform_notify_map
  [PATCH] powerpc: Avoid __initcall warnings
  [PATCH] powerpc: Ensure runlatch is off in the idle loop
  powerpc: Fix CHRP booting - needs a define_machine call
  powerpc: iSeries has only 256 IRQs
2006-04-18 10:34:24 -07:00
Paul Mackerras f39224a8c1 powerpc: Use correct sequence for putting CPU into nap mode
We weren't using the recommended sequence for putting the CPU into
nap mode.  When I changed the idle loop, for some reason 7447A cpus
started hanging when we put them into nap mode.  Changing to the
recommended sequence fixes that.

The complexity here is that the recommended sequence is a loop that
keeps putting the cpu back into nap mode.  Clearly we need some way
to break out of the loop when an interrupt (external interrupt,
decrementer, performance monitor) occurs.  Here we use a bit in
the thread_info struct to indicate that we need this, and the exception
entry code notices this and arranges for the exception to return
to the value in the link register, thus breaking out of the loop.
We use a new `local_flags' field in the thread_info which we can
alter without needing to use an atomic update sequence.

The PPC970 has the same recommended sequence, so we do the same thing
there too.

This also fixes a bug in the kernel stack overflow handling code on
32-bit, since it was causing a value that we needed in a register to
get trashed.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-18 21:49:11 +10:00
Jens Axboe 70524490ee [PATCH] splice: add support for sys_tee()
Basically an in-kernel implementation of tee, which uses splice and the
pipe buffers as an intelligent way to pass data around by reference.

Where the user space tee consumes the input and produces a stdout and
file output, this syscall merely duplicates the data inside a pipe to
another pipe. No data is copied, the output just grabs a reference to the
input pipe data.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 15:51:17 +02:00
Yasunori Goto c80d79d746 [PATCH] Configurable NODES_SHIFT
Current implementations define NODES_SHIFT in include/asm-xxx/numnodes.h for
each arch.  Its definition is sometimes configurable.  Indeed, ia64 defines 5
NODES_SHIFT values in the current git tree.  But it looks a bit messy.

SGI-SN2(ia64) system requires 1024 nodes, and the number of nodes already has
been changeable by config.  Suitable node's number may be changed in the
future even if it is other architecture.  So, I wrote configurable node's
number.

This patch set defines just default value for each arch which needs multi
nodes except ia64.  But, it is easy to change to configurable if necessary.

On ia64 the number of nodes can be already configured in generic ia64 and SN2
config.  But, NODES_SHIFT is defined for DIG64 and HP'S machine too.  So, I
changed it so that all platforms can be configured via CONFIG_NODES_SHIFT.  It
would be simpler.

See also: http://marc.theaimsgroup.com/?l=linux-kernel&m=114358010523896&w=2

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:39 -07:00
Stephen Rothwell 7d01c88085 powerpc: iSeries has only 256 IRQs
The iSeries Hypervisor only allows us to specify IRQ numbers up to 255 (it
has a u8 field to pass it in).  This patch allows platforms to specify a
maximum to the virtual IRQ numbers we will use and has iSeries set that
to 255.  If not set, the maximum is NR_IRQS - 1 (as before).

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2006-04-04 14:49:48 +10:00
Nathan Fontenot 794e085e56 [PATCH] powerpc/pseries: EEH Cleanup
This patch removes unnecessary exports, marks functions as static when
possible, and simplifies some list-related code.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-01 22:37:11 +11:00
Heiko J Schick b13a96cfb0 [PATCH] powerpc: Extends HCALL interface for InfiniBand usage
This extends the HCALL interface for InfiniBand usage. I've
made the patch against the linux-2.6 git tree and Segher's patch:
[PATCH] Change H_StudlyCaps to H_SHOUTING_CAPS

We moved this into the common powerpc code based on comments we
got after posting the first eHCA InfiniBand device driver patch.

Signed-off-by: Heiko j Schick <schickhj@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-01 22:37:00 +11:00
Segher Boessenkool 706c8c93ba [PATCH] powerpc/pseries: Change H_StudlyCaps to H_SHOUTING_CAPS
Also cleans up some nearby whitespace problems.

Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-01 22:36:57 +11:00
Linus Torvalds 4b75679f60 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [NET]: Allow skb headroom to be overridden
  [TCP]: Kill unused extern decl for tcp_v4_hash_connecting()
  [NET]: add SO_RCVBUF comment
  [NET]: Deinline some larger functions from netdevice.h
  [DCCP]: Use NULL for pointers, comfort sparse.
  [DECNET]: Fix refcount
2006-03-31 12:52:30 -08:00
Anton Blanchard 025be81e83 [NET]: Allow skb headroom to be overridden
Previously we added NET_IP_ALIGN so an architecture can override the
padding done to align headers. The next step is to allow the skb
headroom to be overridden.

We currently always reserve 16 bytes to grow into, meaning all DMAs
start 16 bytes into a cacheline. On ppc64 we really want DMA writes to
start on a cacheline boundary, so we increase that headroom to one
cacheline.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-31 02:27:06 -08:00
Jens Axboe 5274f052e7 [PATCH] Introduce sys_splice() system call
This adds support for the sys_splice system call. Using a pipe as a
transport, it can connect to files or sockets (latter as output only).

From the splice.c comments:

   "splice": joining two ropes together by interweaving their strands.

   This is the "extended pipe" functionality, where a pipe is used as
   an arbitrary in-memory buffer. Think of a pipe as a small kernel
   buffer that you can use to transfer data from one end to the other.

   The traditional unix read/write is extended with a "splice()" operation
   that transfers data buffers to or from a pipe buffer.

   Named by Larry McVoy, original implementation from Linus, extended by
   Jens to support splicing to files and fixing the initial implementation
   bugs.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-30 12:28:18 -08:00
Anton Blanchard 15e812ad84 [PATCH] powerpc: Remove oprofile spinlock backtrace code
Remove oprofile spinlock backtrace code now we have proper calltrace
support. Also make MMCRA sihv and sipr bits a variable since they may
change in future cpus. Finally, MMCRA should be a 64bit quantity.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-29 13:44:16 +11:00
Brian Rogan 6c6bd754bf [PATCH] powerpc: Add oprofile calltrace support
Add oprofile calltrace support to powerpc. Disable spinlock backtracing
now we can use calltrace info.

(Updated to work on both 32bit and 64bit by me).

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-29 13:44:16 +11:00
KAMEZAWA Hiroyuki 0e5519548f [PATCH] for_each_possible_cpu: powerpc
for_each_cpu() actually iterates across all possible CPUs.  We've had mistakes
in the past where people were using for_each_cpu() where they should have been
iterating across only online or present CPUs.  This is inefficient and
possibly buggy.

We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
future.

This patch replaces for_each_cpu with for_each_possible_cpu.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-29 13:44:15 +11:00
Paul Mackerras bac30d1a78 Merge ../linux-2.6 2006-03-29 13:24:50 +11:00
Benjamin Herrenschmidt e8222502ee [PATCH] powerpc: Kill _machine and hard-coded platform numbers
This removes statically assigned platform numbers and reworks the
powerpc platform probe code to use a better mechanism.  With this,
board support files can simply declare a new machine type with a
macro, and implement a probe() function that uses the flattened
device-tree to detect if they apply for a given machine.

We now have a machine_is() macro that replaces the comparisons of
_machine with the various PLATFORM_* constants.  This commit also
changes various drivers to use the new macro instead of looking at
_machine.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28 23:15:54 +11:00
Andrew Morton 872345b715 [PATCH] git-powerpc: WARN was a dumb idea
There are at least 14 different implementations of WARN() in the tree already.
The build fails all over the place.

Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28 20:48:54 +11:00
Stephen Rothwell b239cbe957 [PATCH] powerpc: make ISA floppies work again
We used to assume that a DMA mapping request with a NULL dev was for
ISA DMA.  This assumption was broken at some point.  Now we explicitly
pass the detected ISA PCI device in the floppy setup.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28 16:45:36 +11:00
Ryan S. Arnold 45d607ed92 [PATCH] powerpc: hvc_console updates
These are some updates from both Ryan and Arnd for the hvc_console
driver:

The main point is to enable the inclusion of a console driver
for rtas, which is currrently needed for the cell platform.

Also shuffle around some data-type declarations and moves some
functions out of include/asm-ppc64/hvconsole.h and into a new
drivers/char/hvc_console.h file.

Signed-off-by: "Ryan S. Arnold" <rsa@us.ibm.com>
Signed-off-by: Arnd Bergmann <abergman@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28 16:45:26 +11:00
Michael Ellerman d0160bf0b3 [PATCH] powerpc: Rename and export ppc64_firmware_features
We need to export ppc64_firmware_features for modules. Before we do that
I think we should probably rename it to powerpc_firmware_features.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28 16:45:20 +11:00
Anton Blanchard 2f25194dbe [PATCH] powerpc: export validate_sp for oprofile calltrace
Export validate_sp so we can use it in the oprofile calltrace code.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28 16:19:52 +11:00
Anton Blanchard 72533db012 [PATCH] powerpc: Remove some ifdefs in oprofile_impl.h
- No one uses op_counter_config.valid, so remove it
- No need to ifdef around function protypes.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28 16:19:49 +11:00
Paul Mackerras 0a26b1364f ppc: Remove CHRP, POWER3 and POWER4 support from arch/ppc
32-bit CHRP machines are now supported only in arch/powerpc, as are
all 64-bit PowerPC processors.  This means that we don't use
Open Firmware on any platform in arch/ppc any more.

This makes PReP support a single-platform option like every other
platform support option in arch/ppc now, thus CONFIG_PPC_MULTIPLATFORM
is gone from arch/ppc.  CONFIG_PPC_PREP is the option that selects
PReP support and is generally what has replaced
CONFIG_PPC_MULTIPLATFORM within arch/ppc.

_machine is all but dead now, being #defined to 0.

Updated Makefiles, comments and Kconfig options generally to reflect
these changes.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28 10:22:10 +11:00
Alan Stern e041c68341 [PATCH] Notifier chain update: API changes
The kernel's implementation of notifier chains is unsafe.  There is no
protection against entries being added to or removed from a chain while the
chain is in use.  The issues were discussed in this thread:

    http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2

We noticed that notifier chains in the kernel fall into two basic usage
classes:

	"Blocking" chains are always called from a process context
	and the callout routines are allowed to sleep;

	"Atomic" chains can be called from an atomic context and
	the callout routines are not allowed to sleep.

We decided to codify this distinction and make it part of the API.  Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name).  New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain.  The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.

With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed.  For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections.  (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)

There are some limitations, which should not be too hard to live with.  For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem.  Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain.  (This did happen in a couple of places and the code
had to be changed to avoid it.)

Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization.  Instead we use RCU.  The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent that calling a chain.

Here is the list of chains that we adjusted and their classifications.  None
of them use the raw API, so for the moment it is only a placeholder.

  ATOMIC CHAINS
  -------------
arch/i386/kernel/traps.c:		i386die_chain
arch/ia64/kernel/traps.c:		ia64die_chain
arch/powerpc/kernel/traps.c:		powerpc_die_chain
arch/sparc64/kernel/traps.c:		sparc64die_chain
arch/x86_64/kernel/traps.c:		die_chain
drivers/char/ipmi/ipmi_si_intf.c:	xaction_notifier_list
kernel/panic.c:				panic_notifier_list
kernel/profile.c:			task_free_notifier
net/bluetooth/hci_core.c:		hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c:	ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c:	ip_conntrack_expect_chain
net/ipv6/addrconf.c:			inet6addr_chain
net/netfilter/nf_conntrack_core.c:	nf_conntrack_chain
net/netfilter/nf_conntrack_core.c:	nf_conntrack_expect_chain
net/netlink/af_netlink.c:		netlink_chain

  BLOCKING CHAINS
  ---------------
arch/powerpc/platforms/pseries/reconfig.c:	pSeries_reconfig_chain
arch/s390/kernel/process.c:		idle_chain
arch/x86_64/kernel/process.c		idle_notifier
drivers/base/memory.c:			memory_chain
drivers/cpufreq/cpufreq.c		cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c		cpufreq_transition_notifier_list
drivers/macintosh/adb.c:		adb_client_list
drivers/macintosh/via-pmu.c		sleep_notifier_list
drivers/macintosh/via-pmu68k.c		sleep_notifier_list
drivers/macintosh/windfarm_core.c	wf_client_list
drivers/usb/core/notify.c		usb_notifier_list
drivers/video/fbmem.c			fb_notifier_list
kernel/cpu.c				cpu_chain
kernel/module.c				module_notify_list
kernel/profile.c			munmap_notifier
kernel/profile.c			task_exit_notifier
kernel/sys.c				reboot_notifier_list
net/core/dev.c				netdev_chain
net/decnet/dn_dev.c:			dnaddr_chain
net/ipv4/devinet.c:			inetaddr_chain

It's possible that some of these classifications are wrong.  If they are,
please let us know or submit a patch to fix them.  Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)

The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.

[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:50 -08:00
Ingo Molnar 8f17d3a504 [PATCH] lightweight robust futexes updates
- fix: initialize the robust list(s) to NULL in copy_process.

- doc update

- cleanup: rename _inuser to _inatomic

- __user cleanups and other small cleanups

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:49 -08:00
Ingo Molnar e9056f13bf [PATCH] lightweight robust futexes: arch defaults
This patchset provides a new (written from scratch) implementation of robust
futexes, called "lightweight robust futexes".  We believe this new
implementation is faster and simpler than the vma-based robust futex solutions
presented before, and we'd like this patchset to be adopted in the upstream
kernel.  This is version 1 of the patchset.

  Background
  ----------

What are robust futexes?  To answer that, we first need to understand what
futexes are: normal futexes are special types of locks that in the
noncontended case can be acquired/released from userspace without having to
enter the kernel.

A futex is in essence a user-space address, e.g.  a 32-bit lock variable
field.  If userspace notices contention (the lock is already owned and someone
else wants to grab it too) then the lock is marked with a value that says
"there's a waiter pending", and the sys_futex(FUTEX_WAIT) syscall is used to
wait for the other guy to release it.  The kernel creates a 'futex queue'
internally, so that it can later on match up the waiter with the waker -
without them having to know about each other.  When the owner thread releases
the futex, it notices (via the variable value) that there were waiter(s)
pending, and does the sys_futex(FUTEX_WAKE) syscall to wake them up.  Once all
waiters have taken and released the lock, the futex is again back to
'uncontended' state, and there's no in-kernel state associated with it.  The
kernel completely forgets that there ever was a futex at that address.  This
method makes futexes very lightweight and scalable.

"Robustness" is about dealing with crashes while holding a lock: if a process
exits prematurely while holding a pthread_mutex_t lock that is also shared
with some other process (e.g.  yum segfaults while holding a pthread_mutex_t,
or yum is kill -9-ed), then waiters for that lock need to be notified that the
last owner of the lock exited in some irregular way.

To solve such types of problems, "robust mutex" userspace APIs were created:
pthread_mutex_lock() returns an error value if the owner exits prematurely -
and the new owner can decide whether the data protected by the lock can be
recovered safely.

There is a big conceptual problem with futex based mutexes though: it is the
kernel that destroys the owner task (e.g.  due to a SEGFAULT), but the kernel
cannot help with the cleanup: if there is no 'futex queue' (and in most cases
there is none, futexes being fast lightweight locks) then the kernel has no
information to clean up after the held lock!  Userspace has no chance to clean
up after the lock either - userspace is the one that crashes, so it has no
opportunity to clean up.  Catch-22.

In practice, when e.g.  yum is kill -9-ed (or segfaults), a system reboot is
needed to release that futex based lock.  This is one of the leading
bugreports against yum.

To solve this problem, 'Robust Futex' patches were created and presented on
lkml: the one written by Todd Kneisel and David Singleton is the most advanced
at the moment.  These patches all tried to extend the futex abstraction by
registering futex-based locks in the kernel - and thus give the kernel a
chance to clean up.

E.g.  in David Singleton's robust-futex-6.patch, there are 3 new syscall
variants to sys_futex(): FUTEX_REGISTER, FUTEX_DEREGISTER and FUTEX_RECOVER.
The kernel attaches such robust futexes to vmas (via
vma->vm_file->f_mapping->robust_head), and at do_exit() time, all vmas are
searched to see whether they have a robust_head set.

Lots of work went into the vma-based robust-futex patch, and recently it has
improved significantly, but unfortunately it still has two fundamental
problems left:

 - they have quite complex locking and race scenarios.  The vma-based
   patches had been pending for years, but they are still not completely
   reliable.

 - they have to scan _every_ vma at sys_exit() time, per thread!

The second disadvantage is a real killer: pthread_exit() takes around 1
microsecond on Linux, but with thousands (or tens of thousands) of vmas every
pthread_exit() takes a millisecond or more, also totally destroying the CPU's
L1 and L2 caches!

This is very much noticeable even for normal process sys_exit_group() calls:
the kernel has to do the vma scanning unconditionally!  (this is because the
kernel has no knowledge about how many robust futexes there are to be cleaned
up, because a robust futex might have been registered in another task, and the
futex variable might have been simply mmap()-ed into this process's address
space).

This huge overhead forced the creation of CONFIG_FUTEX_ROBUST, but worse than
that: the overhead makes robust futexes impractical for any type of generic
Linux distribution.

So it became clear to us, something had to be done.  Last week, when Thomas
Gleixner tried to fix up the vma-based robust futex patch in the -rt tree, he
found a handful of new races and we were talking about it and were analyzing
the situation.  At that point a fundamentally different solution occured to
me.  This patchset (written in the past couple of days) implements that new
solution.  Be warned though - the patchset does things we normally dont do in
Linux, so some might find the approach disturbing.  Parental advice
recommended ;-)

  New approach to robust futexes
  ------------------------------

At the heart of this new approach there is a per-thread private list of robust
locks that userspace is holding (maintained by glibc) - which userspace list
is registered with the kernel via a new syscall [this registration happens at
most once per thread lifetime].  At do_exit() time, the kernel checks this
user-space list: are there any robust futex locks to be cleaned up?

In the common case, at do_exit() time, there is no list registered, so the
cost of robust futexes is just a simple current->robust_list != NULL
comparison.  If the thread has registered a list, then normally the list is
empty.  If the thread/process crashed or terminated in some incorrect way then
the list might be non-empty: in this case the kernel carefully walks the list
[not trusting it], and marks all locks that are owned by this thread with the
FUTEX_OWNER_DEAD bit, and wakes up one waiter (if any).

The list is guaranteed to be private and per-thread, so it's lockless.  There
is one race possible though: since adding to and removing from the list is
done after the futex is acquired by glibc, there is a few instructions window
for the thread (or process) to die there, leaving the futex hung.  To protect
against this possibility, userspace (glibc) also maintains a simple per-thread
'list_op_pending' field, to allow the kernel to clean up if the thread dies
after acquiring the lock, but just before it could have added itself to the
list.  Glibc sets this list_op_pending field before it tries to acquire the
futex, and clears it after the list-add (or list-remove) has finished.

That's all that is needed - all the rest of robust-futex cleanup is done in
userspace [just like with the previous patches].

Ulrich Drepper has implemented the necessary glibc support for this new
mechanism, which fully enables robust mutexes.  (Ulrich plans to commit these
changes to glibc-HEAD later today.)

Key differences of this userspace-list based approach, compared to the vma
based method:

 - it's much, much faster: at thread exit time, there's no need to loop
   over every vma (!), which the VM-based method has to do.  Only a very
   simple 'is the list empty' op is done.

 - no VM changes are needed - 'struct address_space' is left alone.

 - no registration of individual locks is needed: robust mutexes dont need
   any extra per-lock syscalls.  Robust mutexes thus become a very lightweight
   primitive - so they dont force the application designer to do a hard choice
   between performance and robustness - robust mutexes are just as fast.

 - no per-lock kernel allocation happens.

 - no resource limits are needed.

 - no kernel-space recovery call (FUTEX_RECOVER) is needed.

 - the implementation and the locking is "obvious", and there are no
   interactions with the VM.

  Performance
  -----------

I have benchmarked the time needed for the kernel to process a list of 1
million (!) held locks, using the new method [on a 2GHz CPU]:

 - with FUTEX_WAIT set [contended mutex]: 130 msecs
 - without FUTEX_WAIT set [uncontended mutex]: 30 msecs

I have also measured an approach where glibc does the lock notification [which
it currently does for !pshared robust mutexes], and that took 256 msecs -
clearly slower, due to the 1 million FUTEX_WAKE syscalls userspace had to do.

(1 million held locks are unheard of - we expect at most a handful of locks to
be held at a time.  Nevertheless it's nice to know that this approach scales
nicely.)

  Implementation details
  ----------------------

The patch adds two new syscalls: one to register the userspace list, and one
to query the registered list pointer:

 asmlinkage long
 sys_set_robust_list(struct robust_list_head __user *head,
                     size_t len);

 asmlinkage long
 sys_get_robust_list(int pid, struct robust_list_head __user **head_ptr,
                     size_t __user *len_ptr);

List registration is very fast: the pointer is simply stored in
current->robust_list.  [Note that in the future, if robust futexes become
widespread, we could extend sys_clone() to register a robust-list head for new
threads, without the need of another syscall.]

So there is virtually zero overhead for tasks not using robust futexes, and
even for robust futex users, there is only one extra syscall per thread
lifetime, and the cleanup operation, if it happens, is fast and
straightforward.  The kernel doesnt have any internal distinction between
robust and normal futexes.

If a futex is found to be held at exit time, the kernel sets the highest bit
of the futex word:

	#define FUTEX_OWNER_DIED        0x40000000

and wakes up the next futex waiter (if any). User-space does the rest of
the cleanup.

Otherwise, robust futexes are acquired by glibc by putting the TID into the
futex field atomically.  Waiters set the FUTEX_WAITERS bit:

	#define FUTEX_WAITERS           0x80000000

and the remaining bits are for the TID.

  Testing, architecture support
  -----------------------------

I've tested the new syscalls on x86 and x86_64, and have made sure the parsing
of the userspace list is robust [ ;-) ] even if the list is deliberately
corrupted.

i386 and x86_64 syscalls are wired up at the moment, and Ulrich has tested the
new glibc code (on x86_64 and i386), and it works for his robust-mutex
testcases.

All other architectures should build just fine too - but they wont have the
new syscalls yet.

Architectures need to implement the new futex_atomic_cmpxchg_inuser() inline
function before writing up the syscalls (that function returns -ENOSYS right
now).

This patch:

Add placeholder futex_atomic_cmpxchg_inuser() implementations to every
architecture that supports futexes.  It returns -ENOSYS.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:49 -08:00
KAMEZAWA Hiroyuki 659e35051b [PATCH] unify pfn_to_page: powerpc pfn_to_page
PowerPC can use generic ones.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:44 -08:00
Paul Mackerras 9618edab82 powerpc: Fix event-scan code for 32-bit CHRP
On CHRP machines we are supposed to call into firmware (RTAS)
periodically, to give it a chance to check for errors and other
events.  Under ppc we had some special code in timer_interrupt
to do this, but that didn't get transferred over to arch/powerpc.
Instead, we use an array of timer_list structs, one per CPU,
and use add_timer_on to make sure each one gets called on the
appropriate CPU.

With this we can remove the heartbeat_* elements of the ppc_md
struct.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27 21:48:57 +11:00
Paul Mackerras fbd7740fdf powerpc: Simplify pSeries idle loop
Since pSeries only wants to do something different in the idle loop when
there is no work to do, we can simplify the code by implementing
ppc_md.power_save functions instead of complete idle loops.  There are
two versions: one for shared-processor partitions and one for dedicated-
processor partitions.

With this we also do a cede_processor() call on dedicated processor
partitions if the poll_pending() call indicates that the hypervisor
has work it wants to do.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27 15:06:20 +11:00
Paul Mackerras a0652fc9a2 powerpc: Unify the 32 and 64 bit idle loops
This unifies the 32-bit (ARCH=ppc and ARCH=powerpc) and 64-bit idle
loops.  It brings over the concept of having a ppc_md.power_save
function from 32-bit to ARCH=powerpc, which lets us get rid of
native_idle().  With this we will also be able to simplify the idle
handling for pSeries and cell.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27 15:03:03 +11:00
Anton Blanchard 4df20460a3 [PATCH] powerpc: Allow non zero boot cpuids
We currently have a hack to flip the boot cpu and its secondary thread
to logical cpuid 0 and 1. This means the logical - physical mapping will
differ depending on which cpu is boot cpu. This is most apparent on
kexec, where we might kexec on any cpu and therefore change the mapping
from boot to boot.

The patch below does a first pass early on to work out the logical cpuid
of the boot thread. We then fix up some paca structures to match.

Ive also removed the boot_cpuid_phys variable for ppc64, to be
consistent we use get_hard_smp_processor_id(boot_cpuid) everywhere.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27 14:48:48 +11:00