linux-2.6

Archived

Author	SHA1	Message	Date
Christoph Hellwig	92198f7eaa	[PATCH] pass iocb to dio_iodone_t XFS will have to look at iocb->private to fix aio+dio. No other filesystem is using the blockdev_direct_IO* end_io callback. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:19 -07:00
David Howells	3e30148c3d	[PATCH] Keys: Make request-key create an authorisation key The attached patch makes the following changes: (1) There's a new special key type called ".request_key_auth". This is an authorisation key for when one process requests a key and another process is started to construct it. This type of key cannot be created by the user; nor can it be requested by kernel services. Authorisation keys hold two references: (a) Each refers to a key being constructed. When the key being constructed is instantiated the authorisation key is revoked, rendering it of no further use. (b) The "authorising process". This is either: (i) the process that called request_key(), or: (ii) if the process that called request_key() itself had an authorisation key in its session keyring, then the authorising process referred to by that authorisation key will also be referred to by the new authorisation key. This means that the process that initiated a chain of key requests will authorise the lot of them, and will, by default, wind up with the keys obtained from them in its keyrings. (2) request_key() creates an authorisation key which is then passed to /sbin/request-key in as part of a new session keyring. (3) When request_key() is searching for a key to hand back to the caller, if it comes across an authorisation key in the session keyring of the calling process, it will also search the keyrings of the process specified therein and it will use the specified process's credentials (fsuid, fsgid, groups) to do that rather than the calling process's credentials. This allows a process started by /sbin/request-key to find keys belonging to the authorising process. (4) A key can be read, even if the process executing KEYCTL_READ doesn't have direct read or search permission if that key is contained within the keyrings of a process specified by an authorisation key found within the calling process's session keyring, and is searchable using the credentials of the authorising process. This allows a process started by /sbin/request-key to read keys belonging to the authorising process. (5) The magic KEY_SPEC__KEYRING key IDs when passed to KEYCTL_INSTANTIATE or KEYCTL_NEGATE will specify a keyring of the authorising process, rather than the process doing the instantiation. (6) One of the process keyrings can be nominated as the default to which request_key() should attach new keys if not otherwise specified. This is done with KEYCTL_SET_REQKEY_KEYRING and one of the KEY_REQKEY_DEFL_ constants. The current setting can also be read using this call. (7) request_key() is partially interruptible. If it is waiting for another process to finish constructing a key, it can be interrupted. This permits a request-key cycle to be broken without recourse to rebooting. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-Off-By: Benoit Boissinot <benoit.boissinot@ens-lyon.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:19 -07:00
David Howells	7888e7ff4e	[PATCH] Keys: Pass session keyring to call_usermodehelper() The attached patch makes it possible to pass a session keyring through to the process spawned by call_usermodehelper(). This allows patch 3/3 to pass an authorisation key through to /sbin/request-key, thus permitting better access controls when doing just-in-time key creation. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:18 -07:00
David Howells	76d8aeabfe	[PATCH] keys: Discard key spinlock and use RCU for key payload The attached patch changes the key implementation in a number of ways: (1) It removes the spinlock from the key structure. (2) The key flags are now accessed using atomic bitops instead of write-locking the key spinlock and using C bitwise operators. The three instantiation flags are dealt with with the construction semaphore held during the request_key/instantiate/negate sequence, thus rendering the spinlock superfluous. The key flags are also now bit numbers not bit masks. (3) The key payload is now accessed using RCU. This permits the recursive keyring search algorithm to be simplified greatly since no locks need be taken other than the usual RCU preemption disablement. Searching now does not require any locks or semaphores to be held; merely that the starting keyring be pinned. (4) The keyring payload now includes an RCU head so that it can be disposed of by call_rcu(). This requires that the payload be copied on unlink to prevent introducing races in copy-down vs search-up. (5) The user key payload is now a structure with the data following it. It includes an RCU head like the keyring payload and for the same reason. It also contains a data length because the data length in the key may be changed on another CPU whilst an RCU protected read is in progress on the payload. This would then see the supposed RCU payload and the on-key data length getting out of sync. I'm tempted to drop the key's datalen entirely, except that it's used in conjunction with quota management and so is a little tricky to get rid of. (6) Update the keys documentation. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:18 -07:00
David S. Miller	a8acfbac75	[TCP]: Need to declare 'tcp_reno' in net/tcp.h Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 23:45:02 -07:00
Thomas Graf	d675c989ed	[PKT_SCHED]: Packet classification based on textsearch (ematch) Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 21:00:58 -07:00
Thomas Graf	3fc7e8a6d8	[NET]: skb_find_text() - Find a text pattern in skb data Finds a pattern in the skb data according to the specified textsearch configuration. Use textsearch_next() to retrieve subsequent occurrences of the pattern. Returns the offset to the first occurrence or UINT_MAX if no match was found. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 21:00:17 -07:00
Thomas Graf	677e90eda3	[NET]: Zerocopy sequential reading of skb data Implements sequential reading for both linear and non-linear skb data at zerocopy cost. The data is returned in chunks of arbitary length, therefore random access is not possible. Usage: from := 0 to := 128 state := undef data := undef len := undef consumed := 0 skb_prepare_seq_read(skb, from, to, &state) while (len = skb_seq_read(consumed, &data, &state)) != 0 do /* do something with 'data' of length 'len' / if abort then / abort read if we don't wait for * skb_seq_read() to return 0 / skb_abort_seq_read(&state) return endif / not necessary to consume all of 'len' */ consumed += len done Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:59:51 -07:00
Thomas Graf	6408f79cce	[LIB]: Naive finite state machine based textsearch A finite state machine consists of n states (struct ts_fsm_token) representing the pattern as a finite automation. The data is read sequentially on a octet basis. Every state token specifies the number of recurrences and the type of value accepted which can be either a specific character or ctype based set of characters. The available type of recurrences include 1, (0\|1), [0 n], and [1 n]. The algorithm differs between strict/non-strict mode specyfing whether the pattern has to start at the first octect. Strict mode is enabled by default and can be disabled by inserting TS_FSM_HEAD_IGNORE as the first token in the chain. The runtime performance of the algorithm should be around O(n), however while in strict mode the average runtime can be better. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:59:16 -07:00
Thomas Graf	2de4ff7bd6	[LIB]: Textsearch infrastructure. The textsearch infrastructure provides text searching facitilies for both linear and non-linear data. Individual search algorithms are implemented in modules and chosen by the user. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:49:30 -07:00
Stephen Hemminger	5f8ef48d24	[TCP]: Allow choosing TCP congestion control via sockopt. Allow using setsockopt to set TCP congestion control to use on a per socket basis. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:37:36 -07:00
Stephen Hemminger	51b0bdedb8	[NET]: Separate two usages of netdev_max_backlog. Separate out the two uses of netdev_max_backlog. One controls the upper bound on packets processed per softirq, the new name for this is netdev_budget; the other controls the limit on packets queued via netif_rx. Increase the max_backlog default to account for faster processors. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:14:40 -07:00
Stephen Hemminger	31aa02c53c	[NET]: Eliminate netif_rx massive packet drops. Eliminate the throttling behaviour when the netif receive queue fills because it behaves badly when using high speed networks under load. The throttling cause multiple packet drops that cause TCP to go into slow start mode. The same effective patch has been part of BIC TCP and H-TCP as well as part of Web100. The existing code drops 100's of packets when the queue fills; this changes it to individual packet drop-tail. Signed-off-by: Stephen Hemmminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:12:48 -07:00
Stephen Hemminger	34008d8c63	[NET]: Remove obsolete netif_rx congestion sensing mechanism. Remove the congestion sensing mechanism from netif_rx, and always return either full or empty. Almost no driver checks the return value from netif_rx, and those that do only use it for debug messages. The original design of netif_rx was to do flow control based on the receive queue, but NAPI has supplanted this and no driver uses the feedback. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:10:00 -07:00
Stephen Hemminger	c1ebcdb8c4	[NET]: Remove obsolete fastroute stats. Remove last vestiages of fastroute code that is no longer used. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:08:59 -07:00
Linus Torvalds	16822e6205	Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6	2005-06-23 17:27:24 -07:00
David Mosberger-Tang	e608a8072b	[IA64] Fix pfn_to_nid() so the kernel compiles again for !CONFIG_DISCONTIGMEM. Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-06-23 14:52:51 -07:00
Stephen Hemminger	056ede6cfa	[TCP]: Report congestion control algorithm in tcp_diag. Enhancement to the tcp_diag interface used by the iproute2 ss command to report the tcp congestion control being used by a socket. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:21:28 -07:00
Stephen Hemminger	317a76f9a4	[TCP]: Add pluggable congestion control algorithm infrastructure. Allow TCP to have multiple pluggable congestion control algorithms. Algorithms are defined by a set of operations and can be built in or modules. The legacy "new RENO" algorithm is used as a starting point and fallback. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:19:55 -07:00
Adrian Bunk	4749f32da9	[PATCH] better USB_MON dependencies This makes the USB_MON less confusing. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 10:04:15 -07:00
Linus Torvalds	24665cd00d	Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/paulus/ppc64-2.6	2005-06-23 09:49:55 -07:00
Alexey Dobriyan	bfb07599da	[PATCH] Introduce tty_unregister_ldisc() It's a bit strange to see tty_register_ldisc call in modules' exit functions. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:35 -07:00
Benjamin LaHaise	c43dc2fd88	[PATCH] aio: make wait_queue ->task ->private In the upcoming aio_down patch, it is useful to store a private data pointer in the kiocb's wait_queue. Since we provide our own wake up function and do not require the task_struct pointer, it makes sense to convert the task pointer into a generic private pointer. Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:34 -07:00
Christoph Hellwig	9a59f452ab	[PATCH] remove <linux/xattr_acl.h> This file duplicates <linux/posix_acl_xattr.h>, using slightly different names. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:33 -07:00
Christoph Hellwig	f9fd27a253	[PATCH] acl endianess annotations Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:33 -07:00
Christoph Lameter	45778ca819	[PATCH] Remove f_error field from struct file The following patch removes the f_error field and all checks of f_error. Trond said: f_error was introduced for NFS, and made sense when we were guaranteed always to have a file pointer around when write errors occurred. Since then, we have (for various reasons) had to introduce the nfs_open_context in order to track the file read/write state, and it made sense to move our f_error tracking there too. Signed-off-by: Christoph Lameter <christoph@lameter.com> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:33 -07:00
Arnd Bergmann	bb93e3a52f	[PATCH] block: add unlocked_ioctl support for block devices This patch allows block device drivers to convert their ioctl functions to unlocked_ioctl() like character devices and other subsystems. All functions that were called with the BKL held before are still used that way, but I would not be surprised if it could be removed from the ioctl functions in drivers/block/ioctl.c themselves. As a side note, I found that compat_blkdev_ioctl() acquires the BKL as well, which looks like a bug. I have checked that every user of disk->fops->compat_ioctl() in the current git tree gets the BKL itself, so it could easily be removed from compat_blkdev_ioctl(). Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:32 -07:00
Stephen Rothwell	0d77e5a2c2	[PATCH] compat: introduce compat_time_t This patch is based on work by Carlos O'Donell and Matthew Wilcox. It introduces/updates the compat_time_t type and uses it for compat siginfo structures. I have built this on ppc64 and x86_64. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:32 -07:00
Daniel Ritz	fa912bcb06	[PATCH] yenta TI: turn off interrupts during card power-on #2 - make boot-up card recognition more reliable (ie. redo interrogation always if there is no valid 'card inserted' state) (and yes, i saw it happening on an o2micro controller that both CB_CBARD and CB_16BITCARD bits were set at the same time) - also redo interrogation before probing the ISA interrupts. it's safer to do the probing with the socket in a clean state. - make card insert detect more reliable. yenta_get_status() now returns SS_PENDING as long as the card is not completley inserted and one of the voltage bits is set. also !CB_CBARD doesn't mean CB_16BITCARD. there is CB_NOTACARD as well, so make an explicit check for CB_16BITCARD. - for TI bridges: disable IRQs during power-on. in all-serial and tied interrupt mode the interrupts are always disabled for single-slot controllers. for two-slot contollers the disabling is only done when the other slot is empty. to force disabling there is a new module parameter now: pwr_irqs_off=Y (which is a regression for working setups. that's why it's an option, only use when required) - modparm to disable ISA interrupt probing (isa_probe, defaults to on) - remove unneeded code/cleanups (ie. merge yenta_events() into yenta_interrupts()) Signed-off-by: Daniel Ritz <daniel.ritz@gmx.ch> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:31 -07:00
Peter Osterlund	46c271bedd	[PATCH] Improve CD/DVD packet driver write performance This patch improves write performance for the CD/DVD packet writing driver. The logic for switching between reading and writing has been changed so that streaming writes are no longer interrupted by read requests. Signed-off-by: Peter Osterlund <petero2@telia.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:30 -07:00
Jan Beulich	11c80c8367	[PATCH] adjust per_cpu definition in non-SMP case Fix (in the architectures I'm actually building for) the UP definition of per_cpu so that the cpu specified may be any expression, not just an identifier or a suffix expression. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:28 -07:00
Yoav Zach	ef3daeda7b	[PATCH] Don't force O_LARGEFILE for 32 bit processes on ia64 In ia64 kernel, the O_LARGEFILE flag is forced when opening a file. This is problematic for execution of 32 bit processes, which are not largefile aware, either by SW emulation or by HW execution. For such processes, the problem is two-fold: 1) When trying to open a file that is larger than 4G the operation should fail, but it's not 2) Writing to offset larger than 4G should fail, but it's not The proposed patch takes advantage of the way 32 bit processes are identified in ia64 systems. Such processes have PER_LINUX32 for their personality. With the patch, the ia64 kernel will not enforce the O_LARGEFILE flag if the current process has PER_LINUX32 set. The behavior for all other architectures remains unchanged. Signed-off-by: Yoav Zach <yoav.zach@intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:28 -07:00
Alan Cox	d6e7114481	[PATCH] setuid core dump Add a new `suid_dumpable' sysctl: This value can be used to query and set the core dump mode for setuid or otherwise protected/tainted binaries. The modes are 0 - (default) - traditional behaviour. Any process which has changed privilege levels or is execute only will not be dumped 1 - (debug) - all processes dump core when possible. The core dump is owned by the current user and no security is applied. This is intended for system debugging situations only. Ptrace is unchecked. 2 - (suidsafe) - any binary which normally would not be dumped is dumped readable by root only. This allows the end user to remove such a dump but not access it directly. For security reasons core dumps in this mode will not overwrite one another or other files. This mode is appropriate when adminstrators are attempting to debug problems in a normal environment. (akpm: > > +EXPORT_SYMBOL(suid_dumpable); > > EXPORT_SYMBOL_GPL? No problem to me. > > if (current->euid == current->uid && current->egid == current->gid) > > current->mm->dumpable = 1; > > Should this be SUID_DUMP_USER? Actually the feedback I had from last time was that the SUID_ defines should go because its clearer to follow the numbers. They can go everywhere (and there are lots of places where dumpable is tested/used as a bool in untouched code) > Maybe this should be renamed to `dump_policy' or something. Doing that > would help us catch any code which isn't using the #defines, too. Fair comment. The patch was designed to be easy to maintain for Red Hat rather than for merging. Changing that field would create a gigantic diff because it is used all over the place. ) Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:26 -07:00
Prasanna S Panchamukhi	ea32c65cc2	[PATCH] kprobes: Temporary disarming of reentrant probe In situations where a kprobes handler calls a routine which has a probe on it, then kprobes_handler() disarms the new probe forever. This patch removes the above limitation by temporarily disarming the new probe. When the another probe hits while handling the old probe, the kprobes_handler() saves previous kprobes state and handles the new probe without calling the new kprobes registered handlers. kprobe_post_handler() restores back the previous kprobes state and the normal execution continues. However on x86_64 architecture, re-rentrancy is provided only through pre_handler(). If a routine having probe is referenced through post_handler(), then the probes on that routine are disarmed forever, since the exception stack is gets changed after the processor single steps the instruction of the new probe. This patch includes generic changes to support temporary disarming on reentrancy of probes. Signed-of-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:24 -07:00
Anil S Keshavamurthy	1674eafcbd	[PATCH] Kprobes IA64: cmp ctype unc support The current Kprobes when patching the original instruction with the break instruction tries to retain the original qualifying predicate(qp), however for cmp.crel.ctype where ctype == unc, which is a special instruction always needs to be executed irrespective of qp. Hence, if the instruction we are patching is of this type, then we should not copy the original qp to the break instruction, this is because we always want the break fault to happen so that we can emulate the instruction. This patch is based on the feedback given by David Mosberger Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:23 -07:00
Rusty Lynch	8bc76772ad	[PATCH] Kprobes ia64 cleanup A cleanup of the ia64 kprobes implementation such that all of the bundle manipulation logic is concentrated in arch_prepare_kprobe(). With the current design for kprobes, the arch specific code only has a chance to return failure inside the arch_prepare_kprobe() function. This patch moves all of the work that was happening in arch_copy_kprobe() and most of the work that was happening in arch_arm_kprobe() into arch_prepare_kprobe(). By doing this we can add further robustness checks in arch_arm_kprobe() and refuse to insert kprobes that will cause problems. Signed-off-by: Rusty Lynch <Rusty.lynch@intel.com> Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:23 -07:00
Anil S Keshavamurthy	cd2675bf65	[PATCH] Kprobes/IA64: support kprobe on branch/call instructions This patch is required to support kprobe on branch/call instructions. Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:23 -07:00
Anil S Keshavamurthy	b2761dc262	[PATCH] Kprobes/IA64: architecture specific JProbes support This patch adds IA64 architecture specific JProbes support on top of Kprobes Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Rusty Lynch <Rusty.lynch@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:22 -07:00
Anil S Keshavamurthy	fd7b231ff9	[PATCH] Kprobes/IA64: arch specific handling This is an IA64 arch specific handling of Kprobes Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Rusty Lynch <Rusty.lynch@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:22 -07:00
Anil S Keshavamurthy	7213b25218	[PATCH] Kprobes/IA64: kdebug die notification mechanism As many of you know that kprobes exist in the main line kernel for various architecture including i386, x86_64, ppc64 and sparc64. Attached patches following this mail are a port of Kprobes and Jprobes for IA64. I have tesed this patches for kprobes and Jprobes and this seems to work fine. I have tested this patch by inserting kprobes on various slots and various templates including various types of branch instructions. I have also tested this patch using the tool http://marc.theaimsgroup.com/?l=linux-kernel&m=111657358022586&w=2 and the kprobes for IA64 works great. Here is list of TODO things and pathes for the same will appear soon. 1) Support kprobes on "mov r1=ip" type of instruction 2) Support Kprobes and Jprobes to exist on the same address 3) Support Return probes 3) Architecture independent cleanup of kprobes This patch adds the kdebug die notification mechanism needed by Kprobes. For break instruction on Branch type slot, imm21 is ignored and value zero is placed in IIM register, hence we need to handle kprobes for switch case zero. Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Rusty Lynch <Rusty.lynch@intel.com> From: Rusty Lynch <rusty.lynch@intel.com> At the point in traps.c where we recieve a break with a zero value, we can not say if the break was a result of a kprobe or some other debug facility. This simple patch changes the informational string to a more correct "break 0" value, and applies to the 2.6.12-rc2-mm2 tree with all the kprobes patches that were just recently included for the next mm cut. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:22 -07:00
Hien Nguyen	0aa55e4d7d	[PATCH] kprobes: moves lock-unlock to non-arch kprobe_flush_task This patch moves the lock/unlock of the arch specific kprobe_flush_task() to the non-arch specific kprobe_flusk_task(). Signed-off-by: Hien Nguyen <hien@us.ibm.com> Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:21 -07:00
Rusty Lynch	7e1048b11c	[PATCH] Move kprobe [dis]arming into arch specific code The architecture independent code of the current kprobes implementation is arming and disarming kprobes at registration time. The problem is that the code is assuming that arming and disarming is a just done by a simple write of some magic value to an address. This is problematic for ia64 where our instructions look more like structures, and we can not insert break points by just doing something like: p->addr = BREAKPOINT_INSTRUCTION; The following patch to 2.6.12-rc4-mm2 adds two new architecture dependent functions: void arch_arm_kprobe(struct kprobe p) void arch_disarm_kprobe(struct kprobe *p) and then adds the new functions for each of the architectures that already implement kprobes (spar64/ppc64/i386/x86_64). I thought arch_[dis]arm_kprobe was the most descriptive of what was really happening, but each of the architectures already had a disarm_kprobe() function that was really a "disarm and do some other clean-up items as needed when you stumble across a recursive kprobe." So... I took the liberty of changing the code that was calling disarm_kprobe() to call arch_disarm_kprobe(), and then do the cleanup in the block of code dealing with the recursive kprobe case. So far this patch as been tested on i386, x86_64, and ppc64, but still needs to be tested in sparc64. Signed-off-by: Rusty Lynch <rusty.lynch@intel.com> Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:21 -07:00
Rusty Lynch	73649dab0f	[PATCH] x86_64 specific function return probes The following patch adds the x86_64 architecture specific implementation for function return probes. Function return probes is a mechanism built on top of kprobes that allows a caller to register a handler to be called when a given function exits. For example, to instrument the return path of sys_mkdir: static int sys_mkdir_exit(struct kretprobe_instance i, struct pt_regs regs) { printk("sys_mkdir exited\n"); return 0; } static struct kretprobe return_probe = { .handler = sys_mkdir_exit, }; <inside setup function> return_probe.kp.addr = (kprobe_opcode_t ) kallsyms_lookup_name("sys_mkdir"); if (register_kretprobe(&return_probe)) { printk(KERN_DEBUG "Unable to register return probe!\n"); / do error path / } <inside cleanup function> unregister_kretprobe(&return_probe); The way this works is that: At system initialization time, kernel/kprobes.c installs a kprobe on a function called kretprobe_trampoline() that is implemented in the arch/x86_64/kernel/kprobes.c (More on this later) * When a return probe is registered using register_kretprobe(), kernel/kprobes.c will install a kprobe on the first instruction of the targeted function with the pre handler set to arch_prepare_kretprobe() which is implemented in arch/x86_64/kernel/kprobes.c. * arch_prepare_kretprobe() will prepare a kretprobe instance that stores: - nodes for hanging this instance in an empty or free list - a pointer to the return probe - the original return address - a pointer to the stack address With all this stowed away, arch_prepare_kretprobe() then sets the return address for the targeted function to a special trampoline function called kretprobe_trampoline() implemented in arch/x86_64/kernel/kprobes.c * The kprobe completes as normal, with control passing back to the target function that executes as normal, and eventually returns to our trampoline function. * Since a kprobe was installed on kretprobe_trampoline() during system initialization, control passes back to kprobes via the architecture specific function trampoline_probe_handler() which will lookup the instance in an hlist maintained by kernel/kprobes.c, and then call the handler function. * When trampoline_probe_handler() is done, the kprobes infrastructure single steps the original instruction (in this case just a top), and then calls trampoline_post_handler(). trampoline_post_handler() then looks up the instance again, puts the instance back on the free list, and then makes a long jump back to the original return instruction. So to recap, to instrument the exit path of a function this implementation will cause four interruptions: - A breakpoint at the very beginning of the function allowing us to switch out the return address - A single step interruption to execute the original instruction that we replaced with the break instruction (normal kprobe flow) - A breakpoint in the trampoline function where our instrumented function returned to - A single step interruption to execute the original instruction that we replaced with the break instruction (normal kprobe flow) Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:21 -07:00
Hien Nguyen	b94cce926b	[PATCH] kprobes: function-return probes This patch adds function-return probes to kprobes for the i386 architecture. This enables you to establish a handler to be run when a function returns. 1. API Two new functions are added to kprobes: int register_kretprobe(struct kretprobe rp); void unregister_kretprobe(struct kretprobe rp); 2. Registration and unregistration 2.1 Register To register a function-return probe, the user populates the following fields in a kretprobe object and calls register_kretprobe() with the kretprobe address as an argument: kp.addr - the function's address handler - this function is run after the ret instruction executes, but before control returns to the return address in the caller. maxactive - The maximum number of instances of the probed function that can be active concurrently. For example, if the function is non- recursive and is called with a spinlock or mutex held, maxactive = 1 should be enough. If the function is non-recursive and can never relinquish the CPU (e.g., via a semaphore or preemption), NR_CPUS should be enough. maxactive is used to determine how many kretprobe_instance objects to allocate for this particular probed function. If maxactive <= 0, it is set to a default value (if CONFIG_PREEMPT maxactive=max(10, 2 * NR_CPUS) else maxactive=NR_CPUS) For example: struct kretprobe rp; rp.kp.addr = /* entrypoint address / rp.handler = /return probe handler / rp.maxactive = / e.g., 1 or NR_CPUS or 0, see the above explanation */ register_kretprobe(&rp); The following field may also be of interest: nmissed - Initialized to zero when the function-return probe is registered, and incremented every time the probed function is entered but there is no kretprobe_instance object available for establishing the function-return probe (i.e., because maxactive was set too low). 2.2 Unregister To unregiter a function-return probe, the user calls unregister_kretprobe() with the same kretprobe object as registered previously. If a probed function is running when the return probe is unregistered, the function will return as expected, but the handler won't be run. 3. Limitations 3.1 This patch supports only the i386 architecture, but patches for x86_64 and ppc64 are anticipated soon. 3.2 Return probes operates by replacing the return address in the stack (or in a known register, such as the lr register for ppc). This may cause __builtin_return_address(0), when invoked from the return-probed function, to return the address of the return-probes trampoline. 3.3 This implementation uses the "Multiprobes at an address" feature in 2.6.12-rc3-mm3. 3.4 Due to a limitation in multi-probes, you cannot currently establish a return probe and a jprobe on the same function. A patch to remove this limitation is being tested. This feature is required by SystemTap (http://sourceware.org/systemtap), and reflects ideas contributed by several SystemTap developers, including Will Cohen and Ananth Mavinakayanahalli. Signed-off-by: Hien Nguyen <hien@us.ibm.com> Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> Signed-off-by: Frederik Deweerdt <frederik.deweerdt@laposte.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:21 -07:00
Christoph Hellwig	84de856ed3	[PATCH] quota: consolidate code surrounding vfs_quota_on_mount Move some code duplicated in both callers into vfs_quota_on_mount Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:20 -07:00
Neil Horman	ac20427ef6	[PATCH] add check to /proc/devices read routines Patch to add check to get_chrdev_list and get_blkdev_list to prevent reads of /proc/devices from spilling over the provided page if more than 4096 bytes of string data are generated from all the registered character and block devices in a system Signed-off-by: Neil Horman <nhorman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:19 -07:00
Jesper Juhl	dcd497f99a	[PATCH] streamline preempt_count type across archs The preempt_count member of struct thread_info is currently either defined as int, unsigned int or __s32 depending on arch. This patch makes the type of preempt_count an int on all archs. Having preempt_count be an unsigned type prevents the catching of preempt_count < 0 bugs, and using int on some archs and __s32 on others is not exactely "neat" - much nicer when it's just int all over. A previous version of this patch was already ACK'ed by Robert Love, and the only change in this version of the patch compared to the one he ACK'ed is that this one also makes sure the preempt_count member is consistently commented. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:19 -07:00
Nick Piggin	35a82d1a53	[PATCH] optimise loop driver a bit Looks like locking can be optimised quite a lot. Increase lock widths slightly so lo_lock is taken fewer times per request. Also it was quite trivial to cover lo_pending with that lock, and remove the atomic requirement. This also makes memory ordering explicitly correct, which is nice (not that I particularly saw any mem ordering bugs). Test was reading 4 250MB files in parallel on ext2-on-tmpfs filesystem (1K block size, 4K page size). System is 2 socket Xeon with HT (4 thread). intel:/home/npiggin# umount /dev/loop0 ; mount /dev/loop0 /mnt/loop ; /usr/bin/time ./mtloop.sh Before: 0.24user 5.51system 0:02.84elapsed 202%CPU (0avgtext+0avgdata 0maxresident)k 0.19user 5.52system 0:02.88elapsed 198%CPU (0avgtext+0avgdata 0maxresident)k 0.19user 5.57system 0:02.89elapsed 198%CPU (0avgtext+0avgdata 0maxresident)k 0.22user 5.51system 0:02.90elapsed 197%CPU (0avgtext+0avgdata 0maxresident)k 0.19user 5.44system 0:02.91elapsed 193%CPU (0avgtext+0avgdata 0maxresident)k After: 0.07user 2.34system 0:01.68elapsed 143%CPU (0avgtext+0avgdata 0maxresident)k 0.06user 2.37system 0:01.68elapsed 144%CPU (0avgtext+0avgdata 0maxresident)k 0.06user 2.39system 0:01.68elapsed 145%CPU (0avgtext+0avgdata 0maxresident)k 0.06user 2.36system 0:01.68elapsed 144%CPU (0avgtext+0avgdata 0maxresident)k 0.06user 2.42system 0:01.68elapsed 147%CPU (0avgtext+0avgdata 0maxresident)k Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:18 -07:00
Paulo Marques	543537bd92	[PATCH] create a kstrdup library function This patch creates a new kstrdup library function and changes the "local" implementations in several places to use this function. Most of the changes come from the sound and net subsystems. The sound part had already been acknowledged by Takashi Iwai and the net part by David S. Miller. I left UML alone for now because I would need more time to read the code carefully before making changes there. Signed-off-by: Paulo Marques <pmarques@grupopie.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:18 -07:00
Alexander Viro	991114c6fa	[PATCH] fix for prune_icache()/forced final iput() races Based on analysis and a patch from Russ Weight <rweight@us.ibm.com> There is a race condition that can occur if an inode is allocated and then released (using iput) during the ->fill_super functions. The race condition is between kswapd and mount. For most filesystems this can only happen in an error path when kswapd is running concurrently. For isofs, however, the error can occur in a more common code path (which is how the bug was found). The logic here is "we want final iput() to free inode now instead of letting it sit in cache if fs is going down or had not quite come up". The problem is with kswapd seeing such inodes in the middle of being killed and happily taking over. The clean solution would be to tell kswapd to leave those inodes alone and let our final iput deal with them. I.e. add a new flag (I_FORCED_FREEING), set it before write_inode_now() there and make prune_icache() leave those alone. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:17 -07:00
Oleg Nesterov	fd450b7318	[PATCH] timers: introduce try_to_del_timer_sync() This patch splits del_timer_sync() into 2 functions. The new one, try_to_del_timer_sync(), returns -1 when it hits executing timer. It can be used in interrupt context, or when the caller hold locks which can prevent completion of the timer's handler. NOTE. Currently it can't be used in interrupt context in UP case, because ->running_timer is used only with CONFIG_SMP. Should the need arise, it is possible to kill #ifdef CONFIG_SMP in set_running_timer(), it is cheap. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:16 -07:00
Oleg Nesterov	55c888d6d0	[PATCH] timers fixes/improvements This patch tries to solve following problems: 1. del_timer_sync() is racy. The timer can be fired again after del_timer_sync have checked all cpus and before it will recheck timer_pending(). 2. It has scalability problems. All cpus are scanned to determine if the timer is running on that cpu. With this patch del_timer_sync is O(1) and no slower than plain del_timer(pending_timer), unless it has to actually wait for completion of the currently running timer. The only restriction is that the recurring timer should not use add_timer_on(). 3. The timers are not serialized wrt to itself. If CPU_0 does mod_timer(jiffies+1) while the timer is currently running on CPU 1, it is quite possible that local interrupt on CPU_0 will start that timer before it finished on CPU_1. 4. The timers locking is suboptimal. __mod_timer() takes 3 locks at once and still requires wmb() in del_timer/run_timers. The new implementation takes 2 locks sequentially and does not need memory barriers. Currently ->base != NULL means that the timer is pending. In that case ->base.lock is used to lock the timer. __mod_timer also takes timer->lock because ->base can be == NULL. This patch uses timer->entry.next != NULL as indication that the timer is pending. So it does __list_del(), entry->next = NULL instead of list_del() when the timer is deleted. The ->base field is used for hashed locking only, it is initialized in init_timer() which sets ->base = per_cpu(tvec_bases). When the tvec_bases.lock is locked, it means that all timers which are tied to this base via timer->base are locked, and the base itself is locked too. So __run_timers/migrate_timers can safely modify all timers which could be found on ->tvX lists (pending timers). When the timer's base is locked, and the timer removed from ->entry list (which means that _run_timers/migrate_timers can't see this timer), it is possible to set timer->base = NULL and drop the lock: the timer remains locked. This patch adds lock_timer_base() helper, which waits for ->base != NULL, locks the ->base, and checks it is still the same. __mod_timer() schedules the timer on the local CPU and changes it's base. However, it does not lock both old and new bases at once. It locks the timer via lock_timer_base(), deletes the timer, sets ->base = NULL, and unlocks old base. Then __mod_timer() locks new_base, sets ->base = new_base, and adds this timer. This simplifies the code, because AB-BA deadlock is not possible. __mod_timer() also ensures that the timer's base is not changed while the timer's handler is running on the old base. __run_timers(), del_timer() do not change ->base anymore, they only clear pending flag. So del_timer_sync() can test timer->base->running_timer == timer to detect whether it is running or not. We don't need timer_list->lock anymore, this patch kills it. We also don't need barriers. del_timer() and __run_timers() used smp_wmb() before clearing timer's pending flag. It was needed because __mod_timer() did not lock old_base if the timer is not pending, so __mod_timer()->list_add() could race with del_timer()->list_del(). With this patch these functions are serialized through base->lock. One problem. TIMER_INITIALIZER can't use per_cpu(tvec_bases). So this patch adds global struct timer_base_s { spinlock_t lock; struct timer_list *running_timer; } __init_timer_base; which is used by TIMER_INITIALIZER. The corresponding fields in tvec_t_base_s struct are replaced by struct timer_base_s t_base. It is indeed ugly. But this can't have scalability problems. The global __init_timer_base.lock is used only when __mod_timer() is called for the first time AND the timer was compile time initialized. After that the timer migrates to the local CPU. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Renaud Lienhart <renaud.lienhart@free.fr> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:16 -07:00
Tejun Heo	f7d37d028d	[PATCH] blk: remove BLK_TAGS_{PER_LONG\|MASK} Replace BLK_TAGS_PER_LONG with BITS_PER_LONG and remove unused BLK_TAGS_MASK. Signed-off-by: Tejun Heo <htejun@gmail.com> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:15 -07:00
Tejun Heo	fa72b903f7	[PATCH] blk: remove blk_queue_tag->real_max_depth optimization blk_queue_tag->real_max_depth was used to optimize out unnecessary allocations/frees on tag resize. However, the whole thing was very broken - tag_map was never allocated to real_max_depth resulting in access beyond the end of the map, bits in [max_depth..real_max_depth] were set when initializing a map and copied when resizing resulting in pre-occupied tags. As the gain of the optimization is very small, well, almost nill, remove the whole thing. Signed-off-by: Tejun Heo <htejun@gmail.com> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:15 -07:00
Vincent Hanquez	e9129e56e9	[PATCH] xen: x86_64: Add macro for debugreg Add 2 macros to set and get debugreg on x86_64. This is useful for Xen because it will need only to redefine each macro to a hypervisor call. Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk> Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:14 -07:00
Vincent Hanquez	fa1e1bdf78	[PATCH] xen: x86: Rename usermode macro Rename user_mode to user_mode_vm and add a user_mode macro similar to the x86-64 one. This is useful for Xen because the linux xen kernel does not runs on the same priviledge that a vanilla linux kernel, and with this we just need to redefine user_mode(). Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk> Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:14 -07:00
Vincent Hanquez	f5012310e3	[PATCH] xen: x86: add macro for debugreg Add 2 macros to set and get debugreg on x86. This is useful for Xen because it will need only to redefine each macro to a hypervisor call. Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk> Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:13 -07:00
Jan Beulich	32ecd42b6f	[PATCH] eliminate duplicate rdpmc definition Eliminate duplicate definition of rdpmc in x86-64's mtrr.h. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:13 -07:00
Andrew Morton	a3a255e744	[PATCH] x86: cpu_khz type fix x86_64's cpu_khz is unsigned int and there is no reason why x86 needs to use unsigned long. So make cpu_khz unsigned int on x86 as well. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:11 -07:00
Alexey Dobriyan	a9ed881796	[PATCH] x86: #include asm/uaccess.h in asm/checksum.h csum_and_copy_to_user is static inline and uses VERIFY_WRITE. Patch allows to remove asm/uaccess.h from i386_ksyms.c without dependency surprises. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:11 -07:00
Christoph Lameter	b5d23e5b8c	[PATCH] ia64: Selectable Timer Interrupt Frequency It allows a selectable timer interrupt frequency of 100, 250 and 1000 HZ. Reducing the timer frequency may have important performance benefits on large systems. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:10 -07:00
Christoph Lameter	5912100372	[PATCH] i386: Selectable Frequency of the Timer Interrupt Make the timer frequency selectable. The timer interrupt may cause bus and memory contention in large NUMA systems since the interrupt occurs on each processor HZ times per second. Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Shai Fultheim <shai@scalex86.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:10 -07:00
Natalie Protasevich	ca05fea6db	[PATCH] Do not enforce unique IO_APIC_ID check for xAPIC systems (i386) This patch is per Andi's request to remove NO_IOAPIC_CHECK from genapic and use heuristics to prevent unique I/O APIC ID check for systems that don't need it. The patch disables unique I/O APIC ID check for Xeon-based and other platforms that don't use serial APIC bus for interrupt delivery. Andi stated that AMD systems don't need unique IO_APIC_IDs either. Signed-off-by: Natalie Protasevich <Natalie.Protasevich@unisys.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:09 -07:00
Christoph Lameter	1946089a10	[PATCH] NUMA aware block device control structure allocation Patch to allocate the control structures for for ide devices on the node of the device itself (for NUMA systems). The patch depends on the Slab API change patch by Manfred and me (in mm) and the pcidev_to_node patch that I posted today. Does some realignment too. Signed-off-by: Justin M. Forbes <jmforbes@linuxtx.org> Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Pravin Shelar <pravin@calsoftinc.com> Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:09 -07:00
Christoph Lameter	8c5a09082f	[PATCH] x86/x86_64: pcibus_to_node Define pcibus_to_node to be able to figure out which NUMA node contains a given PCI device. This defines pcibus_to_node(bus) in include/linux/topology.h and adjusts the macros for i386 and x86_64 that already provided a way to determine the cpumask of a pci device. x86_64 was changed to not build an array of cpumasks anymore. Instead an array of nodes is build which can be used to generate the cpumask via node_to_cpumask. Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:08 -07:00
Christoph Lameter	e164f5573b	[PATCH] ppc64: pcibus_to_node fix asm-generic/topology.h must also be included if CONFIG_NUMA is set in order to provide the fall back pcibus_to_node function. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:08 -07:00
Hirokazu Takata	4a35293667	[PATCH] m32r: build fix for asm-m32r/topology.h Use asm-generic/topology.h to fix yet another pcibus_to_node() build error. Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:08 -07:00
Venkatesh Pallipadi	8a9e1b0f56	[PATCH] Platform SMIs and their interferance with tsc based delay calibration Issue: Current tsc based delay_calibration can result in significant errors in loops_per_jiffy count when the platform events like SMIs (System Management Interrupts that are non-maskable) are present. This could lead to potential kernel panic(). This issue is becoming more visible with 2.6 kernel (as default HZ is 1000) and on platforms with higher SMI handling latencies. During the boot time, SMIs are mostly used by BIOS (for things like legacy keyboard emulation). Description: The psuedocode for current delay calibration with tsc based delay looks like (0) Estimate a value for loops_per_jiffy (1) While (loops_per_jiffy estimate is accurate enough) (2) wait for jiffy transition (jiffy1) (3) Note down current tsc (tsc1) (4) loop until tsc becomes tsc1 + loops_per_jiffy (5) check whether jiffy changed since jiffy1 or not and refine loops_per_jiffy estimate Consider the following cases Case 1: If SMIs happen between (2) and (3) above, we can end up with a loops_per_jiffy value that is too low. This results in shorted delays and kernel can panic () during boot (Mostly at IOAPIC timer initialization timer_irq_works() as we don't have enough timer interrupts in a specified interval). Case 2: If SMIs happen between (3) and (4) above, then we can end up with a loops_per_jiffy value that is too high. And with current i386 code, too high lpj value (greater than 17M) can result in a overflow in delay.c:__const_udelay() again resulting in shorter delay and panic(). Solution: The patch below makes the calibration routine aware of asynchronous events like SMIs. We increase the delay calibration time and also identify any significant errors (greater than 12.5%) in the calibration and notify it to user. Patch below changes both i386 and x86-64 architectures to use this new and improved calibrate_delay_direct() routine. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:08 -07:00
Matt Tolentino	bbfceef47f	[PATCH] add x86-64 specific support for sparsemem This patch adds in the necessary support for sparsemem such that x86-64 kernels may use sparsemem as an alternative to discontigmem for NUMA kernels. Note that this does no preclude one from continuing to build NUMA kernels using discontigmem, but merely allows the option to build NUMA kernels with sparsemem. Interestingly, the use of sparsemem in lieu of discontigmem in NUMA kernels results in reduced text size for otherwise equivalent kernels as shown in the example builds below: text data bss dec hex filename 2371036 765884 1237108 4374028 42be0c vmlinux.discontig 2366549 776484 1302772 4445805 43d66d vmlinux.sparse Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:07 -07:00
Matt Tolentino	2b97690f4c	[PATCH] reorganize x86-64 NUMA and DISCONTIGMEM config options In order to use the alternative sparsemem implmentation for NUMA kernels, we need to reorganize the config options. This patch effectively abstracts out the CONFIG_DISCONTIGMEM options to CONFIG_NUMA in most cases. Thus, the discontigmem implementation may be employed as always, but the sparsemem implementation may be used alternatively. Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:06 -07:00
Andy Whitcroft	145e664231	[PATCH] ppc64: sparsemem memory model Provide the architecture specific implementation for SPARSEMEM for PPC64 systems. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Mike Kravetz <kravetz@us.ibm.com> (in part) Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:06 -07:00
Andy Whitcroft	510f8fa7ba	[PATCH] ppc64: add early_pfn_to_nid Provide an implementation of early_pfn_to_nid for PPC64. This is used by memory models to determine the node from which to take allocations before the memory allocators are fully initialised. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:05 -07:00
Andy Whitcroft	29751f6991	[PATCH] sparsemem hotplug base Make sparse's initalization be accessible at runtime. This allows sparse mappings to be created after boot in a hotplug situation. This patch is separated from the previous one just to give an indication how much of the sparse infrastructure is just for hotplug memory. The section_mem_map doesn't really store a pointer. It stores something that is convenient to do some math against to get a pointer. It isn't valid to just do section_mem_map, so I don't think it should be stored as a pointer. There are a couple of things I'd like to store about a section. First of all, the fact that it is !NULL does not mean that it is present. There could be such a combination where section_mem_map is* NULL, but the math gets you properly to a real mem_map. So, I don't think that check is safe. Since we're storing 32-bit-aligned structures, we have a few bits in the bottom of the pointer to play with. Use one bit to encode whether there's really a mem_map there, and the other one to tell whether there's a valid section there. We need to distinguish between the two because sometimes there's a gap between when a section is discovered to be present and when we can get the mem_map for it. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:05 -07:00
Andy Whitcroft	641c767389	[PATCH] sparsemem swiss cheese numa layouts The part of the sparsemem patch which modifies memmap_init_zone() has recently become a problem. It changes behavior so that there is a call to pfn_to_page() for each individual page inside of a node's range: node_start_pfn through node_end_pfn. It used to simply do this once, at the beginning of the node, but having sparsemem's non-contiguous mem_map[]s inside of a node made it necessary to change. Mike Kravetz recently wrote a patch which made the NUMA code accept some new kinds of layouts. The system's memory was laid out like this, with node 0's memory in two pieces: one before and one after node 1's memory: Node 0: +++++ +++++ Node 1: +++++ Previous behavior before Mike's patch was to assign nodes like this: Node 0: 00000 XXXXX Node 1: 11111 Where the 'X' areas were simply thrown away. The new behavior was to make the pg_data_t span node 0 across all of its areas, including areas that are really node 1's: Node 0: 000000000000000 Node 1: 11111 This wastes a little bit of mem_map space, but ends up being OK, and more fully utilizes the system's memory. memmap_init_zone() initializes all of the "struct page"s for node 0, even for the "hole", but those never get used, because there is no pfn_to_page() that resolves to those pages. However, only calling pfn_to_page() once, memmap_init_zone() always uses the pages that were allocated for node0->node_mem_map because: struct page *start = pfn_to_page(start_pfn); // effectively start = &node->node_mem_map[0] for (page = start; page < (start + size); page++) { init_page_here();... page++; } Slow, and wasteful, but generally harmless. But, modify that to call pfn_to_page() for each loop iteration (like sparsemem does): for (pfn = start_pfn; pfn < < (start_pfn + size); pfn++++) { page = pfn_to_page(pfn); } And you end up trying to initialize node 1's pages too early, along with bogus data from node 0. This patch checks for those weird layouts and declines to touch the pages, making the more frequent pfn_to_page() calls OK to do. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:05 -07:00
Andy Whitcroft	05b79bdcb4	[PATCH] sparsemem memory model for i386 Provide the architecture specific implementation for SPARSEMEM for i386 SMP and NUMA systems. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:05 -07:00
Andy Whitcroft	d41dee369b	[PATCH] sparsemem memory model Sparsemem abstracts the use of discontiguous mem_maps[]. This kind of mem_map[] is needed by discontiguous memory machines (like in the old CONFIG_DISCONTIGMEM case) as well as memory hotplug systems. Sparsemem replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually become a complete replacement. A significant advantage over DISCONTIGMEM is that it's completely separated from CONFIG_NUMA. When producing this patch, it became apparent in that NUMA and DISCONTIG are often confused. Another advantage is that sparse doesn't require each NUMA node's ranges to be contiguous. It can handle overlapping ranges between nodes with no problems, where DISCONTIGMEM currently throws away that memory. Sparsemem uses an array to provide different pfn_to_page() translations for each SECTION_SIZE area of physical memory. This is what allows the mem_map[] to be chopped up. In order to do quick pfn_to_page() operations, the section number of the page is encoded in page->flags. Part of the sparsemem infrastructure enables sharing of these bits more dynamically (at compile-time) between the page_zone() and sparsemem operations. However, on 32-bit architectures, the number of bits is quite limited, and may require growing the size of the page->flags type in certain conditions. Several things might force this to occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of memory), an increase in the physical address space, or an increase in the number of used page->flags. One thing to note is that, once sparsemem is present, the NUMA node information no longer needs to be stored in the page->flags. It might provide speed increases on certain platforms and will be stored there if there is room. But, if out of room, an alternate (theoretically slower) mechanism is used. This patch introduces CONFIG_FLATMEM. It is used in almost all cases where there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM often have to compile out the same areas of code. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:04 -07:00
Andy Whitcroft	b159d43fbf	[PATCH] generify early_pfn_to_nid Provide a default implementation for early_pfn_to_nid returning node 0. Allow architectures to override this with their own implementation out of asm/mmzone.h. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:04 -07:00
Dave Hansen	93b7504e3e	[PATCH] Introduce new Kconfig option for NUMA or DISCONTIG There is some confusion that arose when working on SPARSEMEM patch between what is needed for DISCONTIG vs. NUMA. Multiple pg_data_t's are needed for DISCONTIGMEM or NUMA, independently. All of the current NUMA implementations require an implementation of DISCONTIG. Because of this, quite a lot of code which is really needed for NUMA is actually under DISCONTIG #ifdefs. For SPARSEMEM, we changed some of these #ifdefs to CONFIG_NUMA, but that broke the DISCONTIG=y and NUMA=n case. Introducing this new NEED_MULTIPLE_NODES config option allows code that is needed for both NUMA or DISCONTIG to be separated out from code that is specific to DISCONTIG. One great advantage of this approach is that it doesn't require every architecture to be converted over. All of the current implementations should "just work", only the ones implementing SPARSEMEM will have to be fixed up. The change to free_area_init() makes it work inside, or out of the new config option. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:03 -07:00
Dave Hansen	5b505b90b2	[PATCH] sparsemem base: teach discontig about sparse ranges discontig.c has some assumptions that mem_map[]s inside of a node are contiguous. Teach it to make sure that each region that it's bringing online is actually made up of valid ranges of ram. Written-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:01 -07:00
Dave Hansen	348f8b6c48	[PATCH] sparsemem base: reorganize page->flags bit operations Generify the value fields in the page_flags. The aim is to allow the location and size of these fields to be varied. Additionally we want to move away from fixed allocations per field whilst still enforcing the overall bit utilisation limits. We rely on the compiler to spot and optimise the accessor functions. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:01 -07:00
Dave Hansen	6f167ec721	[PATCH] sparsemem base: simple NUMA remap space allocator Introduce a simple allocator for the NUMA remap space. This space is very scarce, used for structures which are best allocated node local. This mechanism is also used on non-NUMA ia64 systems with a vmem_map to keep the pgdat->node_mem_map initialized in a consistent place for all architectures. Issues: o alloc_remap takes a node_id where we might expect a pgdat which was intended to allow us to allocate the pgdat's using this mechanism; which we do not yet do. Could have alloc_remap_node() and alloc_remap_nid() for this purpose. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:01 -07:00
Dave Hansen	408fde81c1	[PATCH] remove non-DISCONTIG use of pgdat->node_mem_map This patch effectively eliminates direct use of pgdat->node_mem_map outside of the DISCONTIG code. On a flat memory system, these fields aren't currently used, neither are they on a sparsemem system. There was also a node_mem_map(nid) macro on many architectures. Its use along with the use of ->node_mem_map itself was not consistent. It has been removed in favor of two new, more explicit, arch-independent macros: pgdat_page_nr(pgdat, pagenr) nid_page_nr(nid, pagenr) I called them "pgdat" and "nid" because we overload the term "node" to mean "NUMA node", "DISCONTIG node" or "pg_data_t" in very confusing ways. I believe the newer names are much clearer. These macros can be overridden in the sparsemem case with a theoretically slower operation using node_start_pfn and pfn_to_page(), instead. We could make this the only behavior if people want, but I don't want to change too much at once. One thing at a time. This patch removes more code than it adds. Compile tested on alpha, alpha discontig, arm, arm-discontig, i386, i386 generic, NUMAQ, Summit, ppc64, ppc64 discontig, and x86_64. Full list here: http://sr71.net/patches/2.6.12/2.6.12-rc1-mhp2/configs/ Boot tested on NUMAQ, x86 SMP and ppc64 power4/5 LPARs. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin J. Bligh <mbligh@aracnet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:00 -07:00
John Rose	dad32bbf43	[PATCH] pSeries - read irqs dynamically For I/O DLPAR to work properly, the kernel needs to allow for dynamic assignment of the irq field of the pci_dev structure upon dynamic bus addition. This patch moves the assignment of that field from pSeries_final_fixup() to pcibios_fixup_bus(), which enables dynamic assignment for the children of a newly added bus. Currently, pci_devs receive their irq numbers in one of two ways. The irq line is either read at boot for all pci_devs, or read by the rpaphp module at slot enable time. The latter is no longer sufficient for DLPAR addition of slots that don't qualify as PCI-hotplug capable. This solution handles the cases of boot and dynamic add. Signed-off-by: John Rose <johnrose@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2005-06-23 17:09:54 +10:00
Linus Torvalds	060de20e82	Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	2005-06-22 23:11:50 -07:00
Shaun Pereira	ebc3f64b86	[X25]: Fast select with no restriction on response This patch is a follow up to patch 1 regarding "Selective Sub Address matching with call user data". It allows use of the Fast-Select-Acceptance optional user facility for X.25. This patch just implements fast select with no restriction on response (NRR). What this means (according to ITU-T Recomendation 10/96 section 6.16) is that if in an incoming call packet, the relevant facility bits are set for fast-select-NRR, then the called DTE can issue a direct response to the incoming packet using a call-accepted packet that contains call-user-data. This patch allows such a response. The called DTE can also respond with a clear-request packet that contains call-user-data. However, this feature is currently not implemented by the patch. How is Fast Select Acceptance used? By default, the system does not allow fast select acceptance (as before). To enable a response to fast select acceptance, After a listen socket in created and bound as follows socket(AF_X25, SOCK_SEQPACKET, 0); bind(call_soc, (struct sockaddr *)&locl_addr, sizeof(locl_addr)); but before a listen system call is made, the following ioctl should be used. ioctl(call_soc,SIOCX25CALLACCPTAPPRV); Now the listen system call can be made listen(call_soc, 4); After this, an incoming-call packet will be accepted, but no call-accepted packet will be sent back until the following system call is made on the socket that accepts the call ioctl(vc_soc,SIOCX25SENDCALLACCPT); The network (or cisco xot router used for testing here) will allow the application server's call-user-data in the call-accepted packet, provided the call-request was made with Fast-select NRR. Signed-off-by: Shaun Pereira <spereira@tusc.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:16:17 -07:00
Shaun Pereira	cb65d506c3	[X25]: Selective sub-address matching with call user data. From: Shaun Pereira <spereira@tusc.com.au> This is the first (independent of the second) patch of two that I am working on with x25 on linux (tested with xot on a cisco router). Details are as follows. Current state of module: A server using the current implementation (2.6.11.7) of the x25 module will accept a call request/ incoming call packet at the listening x.25 address, from all callers to that address, as long as NO call user data is present in the packet header. If the server needs to choose to accept a particular call request/ incoming call packet arriving at its listening x25 address, then the kernel has to allow a match of call user data present in the call request packet with its own. This is required when multiple servers listen at the same x25 address and device interface. The kernel currently matches ALL call user data, if present. Current Changes: This patch is a follow up to the patch submitted previously by Andrew Hendry, and allows the user to selectively control the number of octets of call user data in the call request packet, that the kernel will match. By default no call user data is matched, even if call user data is present. To allow call user data matching, a cudmatchlength > 0 has to be passed into the kernel after which the passed number of octets will be matched. Otherwise the kernel behavior is exactly as the original implementation. This patch also ensures that as is normally the case, no call user data will be present in the Call accepted / call connected packet sent back to the caller Future Changes on next patch: There are cases however when call user data may be present in the call accepted packet. According to the X.25 recommendation (ITU-T 10/96) section 5.2.3.2 call user data may be present in the call accepted packet provided the fast select facility is used. My next patch will include this fast select utility and the ability to send up to 128 octets call user data in the call accepted packet provided the fast select facility is used. I am currently testing this, again with xot on linux and cisco. Signed-off-by: Shaun Pereira <spereira@tusc.com.au> (With a fix from Alexey Dobriyan <adobriyan@gmail.com>) Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:15:01 -07:00
Jeff Moyer	fbeec2e155	[NETPOLL]: allow multiple netpoll_clients to register against one interface This patch provides support for registering multiple netpoll clients to the same network device. Only one of these clients may register an rx_hook, however. In practice, this restriction has not been problematic. It is worth mentioning, though, that the current design can be easily extended to allow for the registration of multiple rx_hooks. The basic idea of the patch is that the rx_np pointer in the netpoll_info structure points to the struct netpoll that has rx_hook filled in. Aside from this one case, there is no need for a pointer from the struct net_device to an individual struct netpoll. A lock is introduced to protect the setting and clearing of the np_rx pointer. The pointer will only be cleared upon netpoll client module removal, and the lock should be uncontested. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:05:59 -07:00
Jeff Moyer	115c1d6e61	[NETPOLL]: Introduce a netpoll_info struct This patch introduces a netpoll_info structure, which the struct net_device will now point to instead of pointing to a struct netpoll. The reason for this is two-fold: 1) fields such as the rx_flags, poll_owner, and poll_lock should be maintained per net_device, not per netpoll; and 2) this is a first step in providing support for multiple netpoll clients to register against the same net_device. The struct netpoll is now pointed to by the netpoll_info structure. As such, the previous behaviour of the code is preserved. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:05:31 -07:00
Jeff Moyer	6ca4f65e6b	[NETPOLL]: Set poll_owner to -1 before unlocking in netpoll_poll_unlock() This trivial patch moves the assignment of poll_owner to -1 inside of the lock. This fixes a potential SMP race in the code. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:04:55 -07:00
Arnd Bergmann	fef1c772fa	[PATCH] ppc64: add BPA platform type This adds the basic support for running on BPA machines. So far, this is only the IBM workstation, and it will not run on others without a little more generalization. It should be possible to configure a kernel for any combination of CONFIG_PPC_BPA with any of the other multiplatform targets. Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2005-06-23 09:43:37 +10:00
Utz Bacher	5f5b4e669a	[PATCH] ppc64: add a minimal nvram driver The firmware provides the location and size of the nvram in the device tree, so it does not really contain any hardware specific bits and could be used on other machines as well. Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2005-06-23 09:43:31 +10:00
Arnd Bergmann	6566c6f1f1	[PATCH] ppc64: pSeries_progress -> rtas_progress The pSeries_progress function is called from some places in the rtas code, which may also be used by non-pSeries platforms. Though pSeries is currently the only platform type that implements display-character, the code is actually generic enough to be part of the rtas subsystem. I hit a bug here because the generic rtas code tried calling ppc_md.progress, which points to an __init function on most platforms. We could also clear the ppc_md.progress pointer when freeing the init memory to make it more explicit that ppc_md.progress must not be called after bootup. Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2005-06-23 09:43:28 +10:00
Arnd Bergmann	773bf9c469	[PATCH] ppc64: rename pSeries rtc functions into rtas_* The rtc rtas functions are not pSeries specific but can also be used by BPA and other SLOF based platforms Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2005-06-23 09:43:18 +10:00
Arnd Bergmann	10f7e7c15e	[PATCH] ppc64: consolidate calibrate_decr implementations pSeries and maple have almost the same code for calibrate_decr, and BPA would need yet another copy. Instead, I'm moving the code to arch/ppc64/kernel/time.c. Some of the related declarations were missing from header files, so I'm moving those as well. It makes sense to merge this with the pmac function of the same name, so we end up having just one implemetation for iSeries and one for Open Firmware based machines. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paul Mackerras <paulus@samba.org>	2005-06-23 09:43:07 +10:00
Linus Torvalds	a493604400	Merge master.kernel.org:/home/rmk/linux-2.6-arm	2005-06-22 14:51:06 -07:00
Linus Torvalds	9092131f7e	Merge rsync://client.linux-nfs.org/pub/linux/nfs-2.6	2005-06-22 14:32:15 -07:00
Trond Myklebust	8d0a8a9d0e	[PATCH] NFSv4: Clean up nfs4 lock state accounting Ensure that lock owner structures are not released prematurely. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:42 -04:00
Trond Myklebust	ecdbf769b2	[PATCH] NLM: fix a client-side race on blocking locks. If the lock blocks, the server may send us a GRANTED message that races with the reply to our LOCK request. Make sure that we catch the GRANTED by queueing up our request on the nlm_blocked list before we send off the first LOCK rpc call. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:42 -04:00
Trond Myklebust	3da28eb1c6	[PATCH] NFS: Replace nfs_page insertion sort with a radix sort Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:39 -04:00
Trond Myklebust	c6a556b88a	[PATCH] NFS: Make searching and waiting on busy writeback requests more efficient. Basically copies the VFS's method for tracking writebacks and applies it to the struct nfs_page. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:39 -04:00
Trond Myklebust	fe51beecc5	[PATCH] NFS: Ensure that fstat() always returns the correct mtime Even if the file is open for writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:37 -04:00
Trond Myklebust	7d52e86274	[PATCH] NFS: Cleanup of caching code, and slight optimization of writes. Unless we're doing O_APPEND writes, we really don't care about revalidating the file length. Just make sure that we catch any page cache invalidations. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:37 -04:00
Trond Myklebust	951a143b3f	[PATCH] NFS: Fix the file size revalidation Instead of looking at whether or not the file is open for writes before we accept to update the length using the server value, we should rather be looking at whether or not we are currently caching any writes. Failure to do so means in particular that we're not updating the file length correctly after obtaining a POSIX or BSD lock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:36 -04:00
Trond Myklebust	f0dd2136da	[PATCH] NFS: Clean up readdir changes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:34 -04:00
Olivier Galibert	00a9264227	[PATCH] NFS: Hide NFS server-generated readdir cookies from userland NFSv3 currently returns the unsigned 64-bit cookie directly to userspace. The following patch causes the kernel to generate loff_t offsets for the benefit of userland. The current server-generated READDIR cookie is cached in the nfs_open_context instead of in filp->f_pos, so we still end up work correctly under directory insertions/deletion. Signed-off-by: Olivier Galibert <galibert@pobox.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:33 -04:00
Andreas Gruenbacher	5c6a9f7d92	[PATCH] NFS: Cache the NFSv3 acls. Attach acls to inodes in the icache to avoid unnecessary GETACL RPC round-trips. As long as the client doesn't retrieve any acls itself, only the default acls of exiting directories and the default and access acls of new directories will end up in the cache, which preserves some memory compared to always caching the access and default acl of all files. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:25 -04:00
Andreas Gruenbacher	055ffbea05	[PATCH] NFS: Fix handling of the umask when an NFSv3 default acl is present. NFSv3 has no concept of a umask on the server side: The client applies the umask locally, and sends the effective permissions to the server. This behavior is wrong when files are created in a directory that has a default ACL. In this case, the umask is supposed to be ignored, and only the default ACL determines the file's effective permissions. Usually its the server's task to conditionally apply the umask. But since the server knows nothing about the umask, we have to do it on the client side. This patch tries to fetch the parent directory's default ACL before creating a new file, computes the appropriate create mode to send to the server, and finally sets the new file's access and default acl appropriately. Many thanks to Buck Huppmann <buchk@pobox.com> for sending the initial version of this patch, as well as for arguing why we need this change. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:24 -04:00
Andreas Gruenbacher	b7fa0554cf	[PATCH] NFS: Add support for NFSv3 ACLs This adds acl support fo nfs clients via the NFSACL protocol extension, by implementing the getxattr, listxattr, setxattr, and removexattr iops for the system.posix_acl_access and system.posix_acl_default attributes. This patch implements a dumb version that uses no caching (and thus adds some overhead). (Another patch in this patchset adds caching as well.) Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:24 -04:00
Andreas Gruenbacher	a257cdd0e2	[PATCH] NFSD: Add server support for NFSv3 ACLs. This adds functions for encoding and decoding POSIX ACLs for the NFSACL protocol extension, and the GETACL and SETACL RPCs. The implementation is compatible with NFSACL in Solaris. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:23 -04:00
Andreas Gruenbacher	9ba02638e4	[PATCH] RPC: Allow the sunrpc server to multiplex serveral programs on a single port The NFS and NFSACL programs run on the same RPC transport. This patch adds support for this by converting svc_program into a chained list of programs (server-side). Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:22 -04:00
Andreas Gruenbacher	bd8100e7ed	[PATCH] RPC: Encode and decode arbitrary XDR arrays Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:20 -04:00
Trond Myklebust	7e06b53d79	[PATCH] RPC: fix accounting bug in the case of a truncated RPC message Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:19 -04:00
Olaf Kirch	e053d1ab62	[PATCH] RPC: Lazy RPC receive buffer allocation Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:19 -04:00
Andreas Gruenbacher	007e251f2b	[PATCH] RPC: Allow multiple RPC client programs to share the same transport Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:18 -04:00
J. Bruce Fields	e50a1c2e1f	[PATCH] NFSv4: client-side caching NFSv4 ACLs Add nfs4_acl field to the nfs_inode, and use it to cache acls. Only cache acls of size up to a page. Also prepare for up to a page of acl data even when the user doesn't pass in a buffer, as when they want to get the acl length to decide what size buffer to allocate. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:15 -04:00
J. Bruce Fields	23ec6965c2	[PATCH] NFSv4: Client-side xdr for writing NFSv4 acls Client-side support for NFSv4 acls: xdr encoding and decoding routines for writing acls Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:13 -04:00
J. Bruce Fields	029d105e66	[PATCH] NFSv4: Client-side xdr for reading NFSv4 acls Client-side support for NFSv4 acls: xdr encoding and decoding routines for reading acls Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:12 -04:00
Trond Myklebust	ada70d9425	[PATCH] NFS: Add hooks to allow common NFS attribute code to clear cached acls Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:09 -04:00
J. Bruce Fields	92cfc62cb8	[PATCH] NFS: Allow NFS versions to support different sets of inode operations. ACL support will require supporting additional inode operations in v4 (getxattr, setxattr, listxattr). This patch allows different protocol versions to support different inode operations by adding a file_inode_ops to the nfs_rpc_ops (to match the existing dir_inode_ops). Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:09 -04:00
Trond Myklebust	464a98bd70	[PATCH] NFS: cleanup: shrink struct nfs_open_context Remove the wait queue, and replace the functions that depended on it with wait_on_bit(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:08 -04:00
Trond Myklebust	96651ab341	[PATCH] RPC: Shrink struct rpc_task by switching to wait_on_bit() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:07 -04:00
Trond Myklebust	a656db9987	[PATCH] NFS: Remove unused NFS inode field readdir_timestamp. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:07 -04:00
Trond Myklebust	4ce79717ce	[PATCH] NFS: Header file cleanup... - Move NFSv4 state definitions into a private header file. - Clean up gunk in nfs_fs.h Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:06 -04:00
Trond Myklebust	5ee0ed7d3a	[PATCH] RPC: Make rpc_create_client() probe server for RPC program+version support Ensure that we don't create an RPC client without checking that the server does indeed support the RPC program + version that we are trying to set up. This enables us to immediately return an error to "mount" if it turns out that the server is only supporting NFSv2, when we requested NFSv3 or NFSv4. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:04 -04:00
Russell King	bdf042486a	[PATCH] ARM: Factor out common pmd_populate functionality Both pmd_populate variants set two pmd entries before ensuring that they are flushed from the cache. Separate this functionality into __pmd_populate(). Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2005-06-22 20:58:29 +01:00
Harald Welte	dd7f0b8092	[NETFILTER]: Fix "iptables -D" rule deletion with ipt_CLUSTERIP target. The patch just changes the order of structure members. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 12:38:33 -07:00
Linus Torvalds	fb7a0e3653	Merge kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6.git Do arch/ia64/defconfig by hand.	2005-06-22 12:22:12 -07:00
Linus Torvalds	4e93d3e885	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/i2c-2.6	2005-06-22 10:42:54 -07:00
Takashi Iwai	b636a71d9b	[ALSA] Add const prefix Control Midlevel Add const prefix to snd_kcontrol_new_t pointer for better protection. Signed-off-by: Takashi Iwai <tiwai@suse.de>	2005-06-22 12:28:54 +02:00
Takashi Iwai	763f356cd8	[ALSA] Add HDSP MADI driver HDSPM driver,PCI drivers,RME9652 driver Added RME Hammerfall DSP MADI driver by Winfried Ritsch. (Moved from alsa-driver tree to mainline.) Signed-off-by: Takashi Iwai <tiwai@suse.de>	2005-06-22 12:28:11 +02:00
Jaroslav Kysela	69ad07cf98	[ALSA] AC97 - renamed vendor/device to subvendor/subdevice where appropriate AC97 Codec,ATIIXP driver,VIA82xx driver To avoid confusion, the structure members vendor/device were renamed to subvendor/subdevice, because we compare them with PCI subsystem vendor and subsystem device. Signed-off-by: Jaroslav Kysela <perex@suse.cz>	2005-06-22 12:27:34 +02:00
Jaroslav Kysela	da04b128cf	Merge with rsync://rsync.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git	2005-06-22 12:19:24 +02:00
Randy Vinson	c124a78d8c	[PATCH] I2C: Add support for Maxim/Dallas DS1374 Real-Time Clock Chip (1/2) Add support for Maxim/Dallas DS1374 Real-Time Clock Chip This change adds support for the Maxim/Dallas DS1374 RTC chip. This chip is an I2C-based RTC that maintains a simple 32-bit binary seconds count with battery backup support. Signed-off-by: Randy Vinson <rvinson@mvista.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2005-06-21 21:52:06 -07:00
Jean Delvare	10c08f8100	[PATCH] I2C: rename i2c-sysfs.h to hwmon-sysfs.h This patch renames the new linux/i2c-sysfs.h header file to linux/hwmon-sysfs.h. This names seems to be more appropriate since this file defines macros and structures not related to i2c but to hardware monitoring drivers. The patch also updates the five hardware monitoring driver which include that header file already. Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2005-06-21 21:52:05 -07:00
David Brownell	72cd799544	[PATCH] I2C: add i2c driver for TPS6501x This adds an I2C driver for the TPS6501x series of power management chips. It's used on many OMAP based boards, and this driver has been widely used in the Linux-OMAP trees over the last year or so. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2005-06-21 21:52:01 -07:00
Sebastian Witt	3886246a25	[PATCH] I2C: i2c-vid.h: Support for VID to reg conversion Adds conversion from VID (mV) to register value. Used by the atxp1 I2C module. Removed uneeded switch case. Signed-off-by: Sebastian Witt <se.witt@gmx.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2005-06-21 21:51:49 -07:00
Jean Delvare	b3d5496ea5	[PATCH] I2C: Kill address ranges in non-sensors i2c chip drivers Some months ago, you killed the address ranges mechanism from all sensors i2c chip drivers (both the module parameters and the in-code address lists). I think it was a very good move, as the ranges can easily be replaced by individual addresses, and this allowed for significant cleanups in the i2c core (let alone the impressive size shrink for all these drivers). Unfortunately you did not do the same for non-sensors i2c chip drivers. These need the address ranges even less, so we could get rid of the ranges here as well for another significant i2c core cleanup. Here comes a patch which does just that. Since the process is exactly the same as what you did for the other drivers set already, I did not split this one in parts. A documentation update is included. The change saves 308 bytes in the i2c core, and an average 1382 bytes for chip drivers which use I2C_CLIENT_INSMOD, 126 bytes for those which do not. This change is required if we want to merge the sensors and non-sensors i2c code (and we want to do this). Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Index: gregkh-2.6/Documentation/i2c/writing-clients ===================================================================	2005-06-21 21:51:48 -07:00
NeilBrown	39730960d9	[PATCH] Two small fixes for md verion-1 superblocks. 1/ Must typecast int to (sector_t) before inverting or we might not invert enough bits. 2/ When "bitmap_offset" was added to mdp_superblock_1, we didn't increase the count of words-used (96 to 100). Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:47 -07:00
NeilBrown	7bfa19f274	[PATCH] md: allow md to update multiple superblocks in parallel. currently, md updates all superblocks (one on each device) in series. It waits for one write to complete before starting the next. This isn't a big problem as superblock updates don't happen that often. However it is neater to do it in parallel, and if the drives in the array have gone to "sleep" after a period of idleness, then waking them is parallel is faster (and someone else should be worrying about power drain). Futher, we will need parallel superblock updates for a future patch which keeps the intent-logging bitmap near the superblock. Also remove the silly code that retired superblock updates 100 times. This simply never made sense. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:47 -07:00
NeilBrown	a654b9d8f8	[PATCH] md: allow md intent bitmap to be stored near the superblock. This provides an alternate to storing the bitmap in a separate file. The bitmap can be stored at a given offset from the superblock. Obviously the creator of the array must make sure this doesn't intersect with data.... After is good for version-0.90 superblocks. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:47 -07:00
NeilBrown	3d310eb7b3	[PATCH] md: fix deadlock due to md thread processing delayed requests. Before completing a 'write' the md superblock might need to be updated. This is best done by the md_thread. The current code schedules this up and queues the write request for later handling by the md_thread. However some personalities (Raid5/raid6) will deadlock if the md_thread tries to submit requests to its own array. So this patch changes things so the processes submitting the request waits for the superblock to be written and then submits the request itself. This fixes a recently-created deadlock in raid5/raid6 Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:46 -07:00
NeilBrown	41158c7eb2	[PATCH] md: optimise reconstruction when re-adding a recently failed drive. When an array is degraded, bit in the intent-bitmap are never cleared. So if a recently failed drive is re-added, we only need to reconstruct the block that are still reflected in the bitmap. This patch adds support for this re-adding. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:46 -07:00
NeilBrown	191ea9b2c7	[PATCH] md: raid1 support for bitmap intent logging Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:46 -07:00
NeilBrown	77ad4bc706	[PATCH] md: enable the bitmap write-back daemon and wait for it. Currently we don't wait for updates to the bitmap to be flushed to disk properly. The infrastructure all there, but it isn't being used.... A separate kernel thread (bitmap_writeback_daemon) is needed to wait for each page as we cannot get callbacks when a page write completes. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:45 -07:00
NeilBrown	32a7627cf3	[PATCH] md: optimised resync using Bitmap based intent logging With this patch, the intent to write to some block in the array can be logged to a bitmap file. Each bit represents some number of sectors and is set before any update happens, and only cleared when all writes relating to all sectors are complete. After an unclean shutdown, information in this bitmap can be used to optimise resync - only sectors which could be out-of-sync need to be updated. Also if a drive is removed and then added back into an array, the recovery can make use of the bitmap to optimise reconstruction. This is not implemented in this patch. Currently the bitmap is stored in a file which must (obviously) be stored on a separate device. The patch only provided infrastructure. It does not update any personalities to bitmap intent logging. Md arrays can still be used with no bitmap file. This patch has minimal impact on such arrays. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:43 -07:00
NeilBrown	57afd89f98	[PATCH] md: improve the interface to sync_request 1/ change the return value (which is number-of-sectors synced) from 'int' to 'sector_t'. The number of sectors is usually easily small enough to fit in an int, but if resync needs to abort, it may want to return the total number of remaining sectors, which could be large. Also errors cannot be returned as negative numbers now, so use 0 instead 2/ Add a 'skipped' return parameter to allow the array to report that it skipped the sectors. This allows md to take this into account in the speed calculations. Currently there is no important skipping, but the bitmap-based-resync that is coming will use this. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:43 -07:00
NeilBrown	06d91a5fe0	[PATCH] md: improve locking on 'safemode' and move superblock writes When md marks the superblock dirty before a write, it calls generic_make_request (to write the superblock) from within generic_make_request (to write the first dirty block), which could cause problems later. With this patch, the superblock write is always done by the helper thread, and write request are delayed until that write completes. Also, the locking around marking the array dirty and writing the superblock is improved to avoid possible races. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:43 -07:00
James Simmons	f1ab5dac25	[PATCH] fbdev: stack reduction Shrink the stack when calling the drawing alignment functions. Signed-off-by: James Simmons <jsimmons@www.infradead.org> Cc: "Antonino A. Daplas" <adaplas@hotpop.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:41 -07:00
Jurriaan	303b86d991	[PATCH] New framebuffer fonts + updated 12x22 font available Improve the fonts for use with the framebuffer. I've added all the characters marked 'FIXME' in the sun12x22 font and created a 10x18 font (based on the sun12x22 font) and a 7x14 font (based on the vga8x16 font). This patch is non-intrusive, no options are enabled by default so most users won't notice a thing. I am placing my changes under the GPL, however, I've not seen any copyright notices on the sun12x22 font and the vga8x16 font which I derived my new fonts from so I don't know what the copyright status is. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:41 -07:00
James Simmons	d5881eb488	[PATCH] fbdev: new pci id for chipsfb Patch adds pci ID for CT 69000 chipset. Signed-off-by: James Simmons <jsimmons@www.infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:41 -07:00
Jaya Kumar	1154ea7dcd	[PATCH] Framebuffer driver for Arc LCD board Add support for the Arc monochrome LCD board. The board uses KS108 controllers to drive individual 64x64 LCD matrices. The board can be paneled in a variety of setups such as 2x1=128x64, 4x4=256x256 and so on. The board/host interface is through GPIO. Signed-off-by: Jaya Kumar <jayalk@intworks.biz> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: <linux-fbdev-devel@lists.sourceforge.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:41 -07:00
James Simmons	f5a9951c94	[PATCH] fbdev: iomove removal Since no one is using the inbuf, outbuf of struct fb_pixmap I removed their use in the framebuffer console. The idea is instead move the pixmap functionality below the accelerated functions intead of on top as the way it is now. If there is no objection please apply. This is against Linus latestr GIT tree. Thank you. Signed-off-by: James Simmons <jsimmons@www.infradead.org> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:39 -07:00
Ian Kent	8a96619145	[PATCH] autofs4: subversion bump to identify these changes Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:36 -07:00
Paolo 'Blaisorblade' Giarrusso	b77d6adc92	[PATCH] uml: make hw_controller_type->release exist only for archs needing it With Chris Wedgwood <cw@f00f.org> As suggested by Chris, we can make the "just added" method ->release conditional to UML only (better: to archs requesting it, i.e. only UML currently), so that other archs don't get this unneeded crud, and if UML won't need it any more we can kill this. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> CC: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:32 -07:00
Paolo 'Blaisorblade' Giarrusso	dbce706e25	[PATCH] uml: add and use generic hw_controller_type->release With Chris Wedgwood <cw@f00f.org> Currently UML must explicitly call the UML-specific free_irq_by_irq_and_dev() for each free_irq call it's done. This is needed because ->shutdown and/or ->disable are only called when the last "action" for that irq is removed. Instead, for UML shared IRQs (UML IRQs are very often, if not always, shared), for each dev_id some setup is done, which must be cleared on the release of that fd. For instance, for each open console a new instance (i.e. new dev_id) of the same IRQ is requested(). Exactly, a fd is stored in an array (pollfds), which is after read by a host thread and passed to poll(). Each event registered by poll() triggers an interrupt. So, for each free_irq() we must remove the corresponding host fd from the table, which we do via this -release() method. In this patch we add an appropriate hook for this, and remove all uses of it by pointing the hook to the said procedure; this is safe to do since the said procedure. Also some cosmetic improvements are included. This is heavily based on some work by Chris Wedgwood, which however didn't get the patch merged for something I'd call a "misunderstanding" (the need for this patch wasn't cleanly explained, thus adding the generic hook was felt as undesirable). Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> CC: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:32 -07:00
Hirokazu Takata	5757b284a3	[PATCH] m32r: Use asm-generic/div64.h The current include/asm-m32r/div64.h of 2.6.12-rc5 looks buggy. Here is a patch for updating it to use asm-generic/div64.h for m32r like other architectures. Signed-off-by: Hitoshi Yamamoto <hitoshiy@isl.melco.co.jp> Signed-off-by: Hirokazu Takata <takata@linux-m32r.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:31 -07:00
Hirokazu Takata	0adbb44a14	[PATCH] m32r: Remove include/asm-m32r/m32102peri.h This patch removes an obsolete header file include/asm-m32r/m32102peri.h. In this header, there are some undesirable single character types, like V. And the header is almost no longer used. Signed-off-by: Hayato Fujiwara <fujiwara@linux-m32r.org> Signed-off-by: Hirokazu Takata <takata@linux-m32r.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:31 -07:00
Hirokazu Takata	2368086344	[PATCH] m32r: Support M3A-2170(Mappi-III) platform This patchset is for supporting a new m32r platform, M3A-2170(Mappi-III) evaluation board. An M32R chip multiprocessor is equipped on the board. http://http://www.linux-m32r.org/eng/platform/platform.html * arch/m32r/Kconfig: Support Mappi-III platform. * arch/m32r/kernel/Makefile: ditto. * arch/m32r/kernel/io_mappi3.c: ditto. * arch/m32r/kernel/setup.c: ditto. * arch/m32r/kernel/setup_mappi3.c: ditto. * include/asm-m32r/m32102.h: ditto. * include/asm-m32r/m32r.h: ditto. * include/asm-m32r/mappi3/mappi3_pld.h: ditto. * include/asm-m32r/ide.h: CF support for Mappi-III. * arch/m32r/kernel/setup_mappi3.c: ditto. * arch/m32r/mappi3/defconfig.smp: A default config file for Mappi-III. * arch/m32r/mappi3/dot.gdbinit: A default .gdbinit file for Mappi-III. * arch/m32r/boot/compressed/m32r_sio.c: Modified for Mappi-III - At boot time, m32r-g00ff bootloader makes MMU off for Mappi-III, on the contrary it makes MMU on for Mappi-II. * arch/m32r/kernel/io_mappi2.c: Update comments. * arch/m32r/kernel/setup_mappi2.c: ditto. Signed-off-by: Mamoru Sakugawa <sakugawa@linux-m32r.org> Signed-off-by: Hirokazu Takata <takata@linux-m32r.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 19:07:30 -07:00
Brent Casavant	d4c477ca54	[PATCH] ioc4: PCI bus speed detection Several hardware features of SGI's IOC4 I/O controller chip require timing-related driver calculations dependent upon the PCI bus speed. This patch enables the core IOC4 driver code to detect the actual bus speed and store a value that can later be used by the IOC4 subdrivers as needed. Signed-off-by: Brent Casavant <bcasavan@sgi.com> Acked-by: Pat Gefre <pfg@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:32 -07:00
Brent Casavant	22329b511a	[PATCH] ioc4: Core driver rewrite This series of patches reworks the configuration and internal structure of the SGI IOC4 I/O controller device drivers. These changes are motivated by several factors: - The IOC4 chip PCI resources are of mixed use between functions (i.e. multiple functions are handled in the same address range, sometimes within the same register), muddling resource ownership and initialization issues. Centralizing this ownership in a core driver is desirable. - The IOC4 chip implements multiple functions (serial, IDE, others not yet implemented in the mainline kernel) but is not a multifunction PCI device. In order to properly handle device addition and removal as well as module insertion and deletion, an intermediary IOC4-specific driver layer is needed to handle these operations cleanly. - All IOC4 drivers are currently enabled by a single CONFIG value. As not all systems need all IOC4 functions, it is desireable to enable these drivers independently. - The current IOC4 core driver will trigger loading of all function-level drivers, as it makes direct calls to them. This situation should be reversed (i.e. function-level drivers cause loading of core driver) in order to maintain a clear and least-surprise driver loading model. - IOC4 hardware design necessitates some driver-level dependency on the PCI bus clock speed. Current code assumes a 66MHz bus, but the speed should be autodetected and appropriate compensation taken. This patch series effects the above changes by a newly and better designed IOC4 core driver with which the function-level drivers can register and deregister themselves upon module insertion/removal. By tracking these modules, device addition/removal is also handled properly. PCI resource management and ownership issues are centralized in this core driver, and IOC4-wide configuration actions such as bus speed detection are also handled in this core driver. This patch: The SGI IOC4 I/O controller chip implements multiple functions, though it is not a multi-function PCI device. Additionally, various PCI resources of the IOC4 are shared by multiple hardware functions, and thus resource ownership by driver is not clearly delineated. Due to the current driver design, all core and subordinate drivers must be loaded, or none, which is undesirable if not all IOC4 hardware features are being used. This patch reorganizes the IOC4 drivers so that the core driver provides a subdriver registration service. Through appropriate callbacks the subdrivers can now handle device addition and removal, as well as module insertion and deletion (though the IOC4 IDE driver requires further work before module deletion will work). The core driver now takes care of allocating PCI resources and data which must be shared between subdrivers, to clearly delineate module ownership of these items. Signed-off-by: Brent Casavant <bcasavan@sgi.com> Acked-by: Pat Gefre <pfg@sgi.com Acked-by: Jeremy Higdon <jeremy@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:32 -07:00
Yoichi Yuasa	e400bae984	[PATCH] mips: add vr41xx gpio support Add vr41xx gpio support. Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:32 -07:00
Stephen Rothwell	145d01e428	[PATCH] ppc64 iSeries: allow build with no PCI This patch allows iSeries to build with CONFIG_PCI=n. This is useful for partitions that have only virtual I/O. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:31 -07:00
Stephen Rothwell	7f74e79fe7	[PATCH] ppc64 iSeries: tidy up irq code after merge This patch just removes some dead code, fixes messages that referred to the file this code used to be in and inserts XmPciLpEvent_init into its caller. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:30 -07:00
Stephen Rothwell	0c3b4f1a8e	[PATCH] ppc64 iSeries: irq simple cleanups This patch is just simple cleanups to the iSeries irq code. - whitespace and comments - rearrange some functions to avoid forward declarations - remove XmPciLpEvent.h as its functions were declared elsewhere - remove decaration of function that no longer exists No semantic changes. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:30 -07:00
Stephen Rothwell	061c063efc	[PATCH] ppc64 iSeries: remove some more members of iSeries_Device_Node The AgentId, PhbId, FrameId, CardLocation and Location members of iSeries_Device_Node are stored early in the boot process just so that a message about the device can be printed later in the boot process. Remove them and construct the message by doing the VPD parsing at the time the message is printed. Also remove a few unused defines in iSeries_VpdInfo.c. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:30 -07:00
Stephen Rothwell	a2ebaf250f	[PATCH] ppc64 iSeries: remove IoRetry from iSeries_Device_Node The IoRetry member of iSeries_Devide_Node is really only used locally, so remove it and replace it with a local variable. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:29 -07:00
Stephen Rothwell	aab41dea80	[PATCH] ppc64 iSeries: iSeries_pci.h cleanups Remove no longer used things from iSeries_pci.h. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:29 -07:00
Stephen Rothwell	e7eb22d201	[PATCH] ppc64 iSeries: iommu.h cleanups The iommu_table_cb structure is iSeries specific, so move it to the header file that declares the function we pass it to. vio_tce_table and iommu_setup_iSeries no longer exist, so remove their declarations. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:29 -07:00
Stephen Rothwell	ea7190d0af	[PATCH] ppc64 iSeries: remove iSeries_pci_reset.c The file arch/ppc64/kernel/iSeries_pci_reset contains only one function that is not use anywhere (any more). Remove it. This function is the only user of the ReturnCode member of iSeries_Device_Node, so remove that as well. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:29 -07:00
Stephen Rothwell	c670b1acd0	[PATCH] ppc64 iSeries: misc header cleanups Last of this round of the iSeries header cleanups - don't have two defines for the same thing (HvMaxArchitectedLps and HvMaxArchitectedVirtualLans) - HvCallSc.h only needs linux/types.h - remove unused struct definition - add "extern" to some more function declarations Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:28 -07:00
Stephen Rothwell	4a5304f5ba	[PATCH] ppc64 iSeries: tidy up some includes and HvCall.h This patch removes some unused bits from HvCall.h and some unneeded #includes from other files. Also includes ItLpQueue.h in paca.h in preference to a stub declaration of struct ItLpQueue. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:28 -07:00
Stephen Rothwell	c92877e0a0	[PATCH] ppc64 iSeries: cleanup ItLpQueue.h Just white space cleaups and move process_iSeries_events into its only caller. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:28 -07:00
Stephen Rothwell	2310c977a9	[PATCH] ppc64 iSeries: remove HvCallCfg.h Now that the only users of things in HvCallCfg.h are in HvLpConfig.h, merge in the bit we need and remove HvCallCfg.h. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:28 -07:00
Stephen Rothwell	dd61ce9227	[PATCH] ppc64 iSeries: eliminate some unused inline functions This patch removes from the iSeries header files a large number of inline functions that are not used. It also changes the only caller of a HvCallCfg function that is outside HvLpConfig.h to its equivalent HvLpConfig function and no longer includes HvCallCfg.h where it is not needed. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:28 -07:00
Stephen Rothwell	0bc0ffd5f0	[PATCH] ppc64 iSeries: remove LparData.h include/asm-ppc64/iSeries/LparData.h just included a whole lot of other files to declare variables that would be better declared in those other files. So, remove it. This will reduce that number of things needed to be included in most cases to access the relevant variables. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:27 -07:00
Stephen Rothwell	6b7feecb2f	[PATCH] ppc64 iSeries: obvious code simplifications This patch does some obvious code cleanups in the iSeries headers files. - simplifies the bodies of lots of inline functions - parenthesises a macros result - removes C++ wrapping - adds "extern" to some function declarations There are no semantic changes. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:27 -07:00
Stephen Rothwell	fcee389526	[PATCH] ppc64 iSeries: more header file white space cleanups This patch just contains white space and comment cleanups in the iSeries headers files. There are no semantic changes. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:27 -07:00
Stephen Rothwell	45dc76aaf6	[PATCH] ppc64 iSeries: header file white space cleanups This patch just contains white space and comment cleanups in the iSeries headers files. There are no semantic changes. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:27 -07:00
Stephen Rothwell	0e3e4a1c4d	[PATCH] ppc64 iSeries: remove iSeries_proc.h include/asm-ppc64/iSeries/iSeries_proc.h just contains a declaration of a function that no longer exists. Remove it. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:26 -07:00
David Gibson	20cee16ced	[PATCH] ppc64: Abolish ioremap_mm Currently ppc64 has two mm_structs for the kernel, init_mm and also ioremap_mm. The latter really isn't necessary: this patch abolishes it, instead restricting vmallocs to the lower 1TB of the init_mm's range and placing io mappings in the upper 1TB. This simplifies the code in a number of places and eliminates an unecessary set of pagetables. It also tweaks the unmap/free path a little, allowing us to remove the unmap_im_area() set of page table walkers, replacing them with unmap_vm_area(). Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:26 -07:00
Kumar Gala	5be061eee9	[PATCH] ppc32: Clean up NUM_TLBCAMS usage for Freescale Book-E PPC's Made the number of TLB CAM entries private and converted the board consumers to use num_tlbcam_entries which is setup at boot time from configuration registers. This way the only consumers of the #define NUM_TLBCAMS are the arrays used to manage the TLB. Signed-off-by: Kumar Gala <kumar.gala@freescale.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:24 -07:00
Kumar Gala	65145e060b	[PATCH] ppc32: Added support for all MPC8548 internal interrupts The MPC8548 has 48 internal interrupts and 12 external interrupts. The previous generation PowerQUICC III devices only had 32 internal and 12 external interrupts on the primary interrupt controller. Expanded the number of internal interrupts to 48 for all PowerQUICC III processors and moved the interrupt numbers for the external after the 48 internal interrupt lines, rather than putting the 12 new internal interrupts at the end and ifdef'ng the whole mess. As parted of this created a macro which represents the internal interrupt senses since they are the same on all PQ3 processors. Signed-off-by: Kumar Gala <kumar.gala@freescale.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:24 -07:00
Kumar Gala	b264c35279	[PATCH] ppc32: Converted MPC10X bridge to use platform devices instead of OCP Converted the MPC10x bridge support (used by MPC10x and 8240/1/5) to used the standard platform device model. Signed-off-by: Matt McClintock <msm@freescale.com> Signed-off-by: Kumar Gala <kumar.gala@freescale.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:23 -07:00
Kumar Gala	c91999bba3	[PATCH] ppc32: Added preliminary support for the MPC8548 CDS board Adds support for using the MPC8548 processor on the CDS reference board. Currently all the major busses (PCI, PCI-X, PCI-Express, sRIO) and eTSEC3 and eTSEC4 are not supported. Signed-off-by: Kumar Gala <kumar.gala@freescale.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:23 -07:00
Kumar Gala	5b37b700f7	[PATCH] ppc32: Added support for new MPC8548 family of PowerQUICC III processors Added descriptions of the new MPC8548 family processors, e500 core and peripherals. Signed-off-by: Kumar Gala <kumar.gala@freescale.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:23 -07:00
Abhijit Karmarkar	b4955ce3dd	[PATCH] msync: check pte dirty earlier It's common practice to msync a large address range regularly, in which often only a few ptes have actually been dirtied since the previous pass. sync_pte_range then goes much faster if it tests whether pte is dirty before locating and accessing each struct page cacheline; and it is hardly slowed by ptep_clear_flush_dirty repeating that test in the opposite case, when every pte actually is dirty. But beware, s390's pte_dirty always says false, since its dirty bit is kept in the storage key, located via the struct page address. So skip this optimization in its case: use a pte_maybe_dirty macro which just says true if page_test_and_clear_dirty is implemented. Signed-off-by: Abhijit Karmarkar <abhijitk@veritas.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:21 -07:00
Bob Picco	400e65146c	[PATCH] ia64: pfn_to_nid() implementation pfn_to_nid is undefined. We haven't had this interface on ia64. The sys_mbind patches need it. Oh, the paddr_to_nid call could fail when DISCONTIG+NUMA is configured because there isn't any ACPI SRAT NUMA information. Signed-off-by: Bob Picco <bob.picco@hp.com> Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:19 -07:00
Jes Sorensen	65ed0b337b	[PATCH] SN2 XPC build patches This patch contains the bits to make the XPC code use the uncached allocator rather than calling into the mspec driver. It also includes the mspec.h header which is required to build the XPC modules. Signed-off-by: Jes Sorensen <jes@wildopensource.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:18 -07:00
Jes Sorensen	f14f75b811	[PATCH] ia64 uncached alloc This patch contains the ia64 uncached page allocator and the generic allocator (genalloc). The uncached allocator was formerly part of the SN2 mspec driver but there are several other users of it so it has been split off from the driver. The generic allocator can be used by device driver to manage special memory etc. The generic allocator is based on the allocator from the sym53c8xx_2 driver. Various users on ia64 needs uncached memory. The SGI SN architecture requires it for inter-partition communication between partitions within a large NUMA cluster. The specific user for this is the XPC code. Another application is large MPI style applications which use it for synchronization, on SN this can be done using special 'fetchop' operations but it also benefits non SN hardware which may use regular uncached memory for this purpose. Performance of doing this through uncached vs cached memory is pretty substantial. This is handled by the mspec driver which I will push out in a seperate patch. Rather than creating a specific allocator for just uncached memory I came up with genalloc which is a generic purpose allocator that can be used by device drivers and other subsystems as they please. For instance to handle onboard device memory. It was derived from the sym53c7xx_2 driver's allocator which is also an example of a potential user (I am refraining from modifying sym2 right now as it seems to have been under fairly heavy development recently). On ia64 memory has various properties within a granule, ie. it isn't safe to access memory as uncached within the same granule as currently has memory accessed in cached mode. The regular system therefore doesn't utilize memory in the lower granules which is mixed in with device PAL code etc. The uncached driver walks the EFI memmap and pulls out the spill uncached pages and sticks them into the uncached pool. Only after these chunks have been utilized, will it start converting regular cached memory into uncached memory. Hence the reason for the EFI related code additions. Signed-off-by: Jes Sorensen <jes@wildopensource.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:18 -07:00
Christoph Lameter	4ae7c03943	[PATCH] Periodically drain non local pagesets The pageset array can potentially acquire a huge amount of memory on large NUMA systems. F.e. on a system with 512 processors and 256 nodes there will be 256*512 pagesets. If each pageset only holds 5 pages then we are talking about 655360 pages.With a 16K page size on IA64 this results in potentially 10 Gigabytes of memory being trapped in pagesets. The typical cases are much less for smaller systems but there is still the potential of memory being trapped in off node pagesets. Off node memory may be rarely used if local memory is available and so we may potentially have memory in seldom used pagesets without this patch. The slab allocator flushes its per cpu caches every 2 seconds. The following patch flushes the off node pageset caches in the same way by tying into the slab flush. The patch also changes /proc/zoneinfo to include the number of pages currently in each pageset. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:18 -07:00
Benjamin LaHaise	c2f29ea111	[PATCH] __read_page_state(): pass unsigned long instead of unsigned By making the offset argument of __read_page_state an unsigned long instead of unsigned, we can avoid forcing the compiler to sign extend a usually constant argument. This saves 1 instruction on x86-64. Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:17 -07:00
Benjamin LaHaise	83e5d8f725	[PATCH] __mod_page_state(): pass unsigned long instead of unsigned By making the offset argument of __mod_page_state an unsigned long instead of unsigned, we can avoid forcing the compiler to sign extend a usually constant argument. This saves 1 instruction on x86-64. Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:17 -07:00
Darren Hart	1ad539b2bd	[PATCH] vm: try_to_free_pages unused argument try_to_free_pages accepts a third argument, order, but hasn't used it since before 2.6.0. The following patch removes the argument and updates all the calls to try_to_free_pages. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:17 -07:00
Badari Pulavarty	cbe37d0937	[PATCH] mm: remove PG_highmem Remove PG_highmem, to save a page flag. Use is_highmem() instead. It'll generate a little more code, but we don't use PageHigheMem() in many places. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:17 -07:00
Wolfgang Wander	1363c3cd86	[PATCH] Avoiding mmap fragmentation Ingo recently introduced a great speedup for allocating new mmaps using the free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and causes huge performance increases in thread creation. The downside of this patch is that it does lead to fragmentation in the mmap-ed areas (visible via /proc/self/maps), such that some applications that work fine under 2.4 kernels quickly run out of memory on any 2.6 kernel. The problem is twofold: 1) the free_area_cache is used to continue a search for memory where the last search ended. Before the change new areas were always searched from the base address on. So now new small areas are cluttering holes of all sizes throughout the whole mmap-able region whereas before small holes tended to close holes near the base leaving holes far from the base large and available for larger requests. 2) the free_area_cache also is set to the location of the last munmap-ed area so in scenarios where we allocate e.g. five regions of 1K each, then free regions 4 2 3 in this order the next request for 1K will be placed in the position of the old region 3, whereas before we appended it to the still active region 1, placing it at the location of the old region 2. Before we had 1 free region of 2K, now we only get two free regions of 1K -> fragmentation. The patch addresses thes issues by introducing yet another cache descriptor cached_hole_size that contains the largest known hole size below the current free_area_cache. If a new request comes in the size is compared against the cached_hole_size and if the request can be filled with a hole below free_area_cache the search is started from the base instead. The results look promising: Whereas 2.6.12-rc4 fragments quickly and my (earlier posted) leakme.c test program terminates after 50000+ iterations with 96 distinct and fragmented maps in /proc/self/maps it performs nicely (as expected) with thread creation, Ingo's test_str02 with 20000 threads requires 0.7s system time. Taking out Ingo's patch (un-patch available per request) by basically deleting all mentions of free_area_cache from the kernel and starting the search for new memory always at the respective bases we observe: leakme terminates successfully with 11 distinctive hardly fragmented areas in /proc/self/maps but thread creating is gringdingly slow: 30+s(!) system time for Ingo's test_str02 with 20000 threads. Now - drumroll ;-) the appended patch works fine with leakme: it ends with only 7 distinct areas in /proc/self/maps and also thread creation seems sufficiently fast with 0.71s for 20000 threads. Signed-off-by: Wolfgang Wander <wwc@rentec.com> Credit-to: "Richard Purdie" <rpurdie@rpsys.net> Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> (partly) Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:16 -07:00
Christoph Lameter	e7c8d5c995	[PATCH] node local per-cpu-pages This patch modifies the way pagesets in struct zone are managed. Each zone has a per-cpu array of pagesets. So any particular CPU has some memory in each zone structure which belongs to itself. Even if that CPU is not local to that zone. So the patch relocates the pagesets for each cpu to the node that is nearest to the cpu instead of allocating the pagesets in the (possibly remote) target zone. This means that the operations to manage pages on remote zone can be done with information available locally. We play a macro trick so that non-NUMA pmachines avoid the additional pointer chase on the page allocator fastpath. AIM7 benchmark on a 32 CPU SGI Altix w/o patches: Tasks jobs/min jti jobs/min/task real cpu 1 484.68 100 484.6769 12.01 1.97 Fri Mar 25 11:01:42 2005 100 27140.46 89 271.4046 21.44 148.71 Fri Mar 25 11:02:04 2005 200 30792.02 82 153.9601 37.80 296.72 Fri Mar 25 11:02:42 2005 300 32209.27 81 107.3642 54.21 451.34 Fri Mar 25 11:03:37 2005 400 34962.83 78 87.4071 66.59 588.97 Fri Mar 25 11:04:44 2005 500 31676.92 75 63.3538 91.87 742.71 Fri Mar 25 11:06:16 2005 600 36032.69 73 60.0545 96.91 885.44 Fri Mar 25 11:07:54 2005 700 35540.43 77 50.7720 114.63 1024.28 Fri Mar 25 11:09:49 2005 800 33906.70 74 42.3834 137.32 1181.65 Fri Mar 25 11:12:06 2005 900 34120.67 73 37.9119 153.51 1325.26 Fri Mar 25 11:14:41 2005 1000 34802.37 74 34.8024 167.23 1465.26 Fri Mar 25 11:17:28 2005 with slab API changes and pageset patch: Tasks jobs/min jti jobs/min/task real cpu 1 485.00 100 485.0000 12.00 1.96 Fri Mar 25 11:46:18 2005 100 28000.96 89 280.0096 20.79 150.45 Fri Mar 25 11:46:39 2005 200 32285.80 79 161.4290 36.05 293.37 Fri Mar 25 11:47:16 2005 300 40424.15 84 134.7472 43.19 438.42 Fri Mar 25 11:47:59 2005 400 39155.01 79 97.8875 59.46 590.05 Fri Mar 25 11:48:59 2005 500 37881.25 82 75.7625 76.82 730.19 Fri Mar 25 11:50:16 2005 600 39083.14 78 65.1386 89.35 872.79 Fri Mar 25 11:51:46 2005 700 38627.83 77 55.1826 105.47 1022.46 Fri Mar 25 11:53:32 2005 800 39631.94 78 49.5399 117.48 1169.94 Fri Mar 25 11:55:30 2005 900 36903.70 79 41.0041 141.94 1310.78 Fri Mar 25 11:57:53 2005 1000 36201.23 77 36.2012 160.77 1458.31 Fri Mar 25 12:00:34 2005 Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com> Signed-off-by: Shai Fultheim <Shai@Scalex86.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:16 -07:00
David Gibson	63551ae0fe	[PATCH] Hugepage consolidation A lot of the code in arch/*/mm/hugetlbpage.c is quite similar. This patch attempts to consolidate a lot of the code across the arch's, putting the combined version in mm/hugetlb.c. There are a couple of uglyish hacks in order to covert all the hugepage archs, but the result is a very large reduction in the total amount of code. It also means things like hugepage lazy allocation could be implemented in one place, instead of six. Tested, at least a little, on ppc64, i386 and x86_64. Notes: - this patch changes the meaning of set_huge_pte() to be more analagous to set_pte() - does SH4 need s special huge_ptep_get_and_clear()?? Acked-by: William Lee Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:15 -07:00
Martin Hicks	1e7e5a9048	[PATCH] VM: rate limit early reclaim When early zone reclaim is turned on the LRU is scanned more frequently when a zone is low on memory. This limits when the zone reclaim can be called by skipping the scan if another thread (either via kswapd or sync reclaim) is already reclaiming from the zone. Signed-off-by: Martin Hicks <mort@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:14 -07:00
Martin Hicks	0c35bbadc5	[PATCH] VM: add __GFP_NORECLAIM When using the early zone reclaim, it was noticed that allocating new pages that should be spread across the whole system caused eviction of local pages. This adds a new GFP flag to prevent early reclaim from happening during certain allocation attempts. The example that is implemented here is for page cache pages. We want page cache pages to be spread across the whole system, and we don't want page cache pages to evict other pages to get local memory. Signed-off-by: Martin Hicks <mort@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:14 -07:00
Martin Hicks	753ee72896	[PATCH] VM: early zone reclaim This is the core of the (much simplified) early reclaim. The goal of this patch is to reclaim some easily-freed pages from a zone before falling back onto another zone. One of the major uses of this is NUMA machines. With the default allocator behavior the allocator would look for memory in another zone, which might be off-node, before trying to reclaim from the current zone. This adds a zone tuneable to enable early zone reclaim. It is selected on a per-zone basis and is turned on/off via syscall. Adding some extra throttling on the reclaim was also required (patch 4/4). Without the machine would grind to a crawl when doing a "make -j" kernel build. Even with this patch the System Time is higher on average, but it seems tolerable. Here are some numbers for kernbench runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run: wall user sys %cpu ctx sw. sleeps ---- ---- --- ---- ------ ------ No patch 1009 1384 847 258 298170 504402 w/patch, no reclaim 880 1376 667 288 254064 396745 w/patch & reclaim 1079 1385 926 252 291625 548873 These numbers are the average of 2 runs of 3 "make -j" runs done right after system boot. Run-to-run variability for "make -j" is huge, so these numbers aren't terribly useful except to seee that with reclaim the benchmark still finishes in a reasonable amount of time. I also looked at the NUMA hit/miss stats for the "make -j" runs and the reclaim doesn't make any difference when the machine is thrashing away. Doing a "make -j8" on a single node that is filled with page cache pages takes 700 seconds with reclaim turned on and 735 seconds without reclaim (due to remote memory accesses). The simple zone_reclaim syscall program is at http://www.bork.org/~mort/sgi/zone_reclaim.c Signed-off-by: Martin Hicks <mort@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-21 18:46:14 -07:00

... 2 3 4 5 6 ...

866 commits