linux-2.6

dect

Archived

Author	SHA1	Message	Date
Peter Zijlstra	fbd90375d7	hrtimer: Remove cb_entry from struct hrtimer It's unused, remove it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <new-submission>	2009-07-22 17:12:32 +02:00
Thomas Gleixner	6ff7041dbf	hrtimer: Fix migration expiry check The timer migration expiry check should prevent the migration of a timer to another CPU when the timer expires before the next event is scheduled on the other CPU. Migrating the timer might delay it because we can not reprogram the clock event device on the other CPU. But the code implementing that check has two flaws: - for !HIGHRES the check compares the expiry value with the clock events device expiry value which is wrong for CLOCK_REALTIME based timers. - the check is racy. It holds the hrtimer base lock of the target CPU, but the clock event device expiry value can be modified nevertheless, e.g. by an timer interrupt firing. The !HIGHRES case is easy to fix as we can enqueue the timer on the cpu which was selected by the load balancer. It runs the idle balancing code once per jiffy anyway. So the maximum delay for the timer is the same as when we keep the tick on the current cpu going. In the HIGHRES case we can get the next expiry value from the hrtimer cpu_base of the target CPU and serialize the update with the cpu_base lock. This moves the lock section in hrtimer_interrupt() so we can set next_event to KTIME_MAX while we are handling the expired timers and set it to the next expiry value after we handled the timers under the base lock. While the expired timers are processed timer migration is blocked because the expiry time of the timer is always <= KTIME_MAX. Also remove the now useless clockevents_get_next_event() function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-07-10 17:32:55 +02:00
Thomas Gleixner	7e0c5086c1	hrtimer: migration: do not check expiry time on current CPU The timer migration code needs to check whether the expiry time of the timer is before the programmed clock event expiry time when the timer is enqueued on another CPU because we can not reprogram the timer device on the other CPU. The current logic checks the expiry time even if we enqueue on the current CPU when nohz_get_load_balancer() returns current CPU. This might lead to an endless loop in the expiry check code when the expiry time of the timer is before the current programmed next event. Check whether nohz_get_load_balancer() returns current CPU and skip the expiry check if this is the case. The bug was triggered from the networking code. The patch fixes the regression http://bugzilla.kernel.org/show_bug.cgi?id=13738 (Soft-Lockup/Race in networking in 2.6.31-rc1+195) Cc: Arun Bharadwaj <arun@linux.vnet.ibm.com Tested-by: Joao Correia <joaomiguelcorreia@gmail.com> Tested-by: Andres Freund <andres@anarazel.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-07-10 17:22:20 +02:00
Thomas Gleixner	a40f262cc2	timekeeping: Move ktime_get() functions to timekeeping.c The ktime_get() functions for GENERIC_TIME=n are still located in hrtimer.c. Move them to time/timekeeping.c where they belong. LKML-Reference: <new-submission> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-07-07 13:00:31 +02:00
Martin Schwidefsky	951ed4d36b	timekeeping: optimized ktime_get[_ts] for GENERIC_TIME=y The generic ktime_get function defined in kernel/hrtimer.c is suboptimial for GENERIC_TIME=y: 0) \| ktime_get() { 0) \| ktime_get_ts() { 0) \| getnstimeofday() { 0) \| read_tod_clock() { 0) 0.601 us \| } 0) 1.938 us \| } 0) \| set_normalized_timespec() { 0) 0.602 us \| } 0) 4.375 us \| } 0) 5.523 us \| } Overall there are two read_seqbegin/read_seqretry loops and a lot of unnecessary struct timespec calculations. ktime_get returns a nano second value which is the sum of xtime, wall_to_monotonic and the nano second delta from the clock source. ktime_get can be optimized for GENERIC_TIME=y. The new version only calls clocksource_read: 0) \| ktime_get() { 0) \| read_tod_clock() { 0) 0.610 us \| } 0) 1.977 us \| } It uses a single read_seqbegin/readseqretry loop and just adds everthing to a nano second value. ktime_get_ts is optimized in a similar fashion. [ tglx: added WARN_ON(timekeeping_suspended) as in getnstimeofday() ] Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: john stultz <johnstul@us.ibm.com> LKML-Reference: <20090707112728.3005244d@skybase> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-07-07 12:47:33 +02:00
Linus Torvalds	b7c142dbf1	Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6 * 'linux-next' of git://git.infradead.org/ubifs-2.6: UBIFS: start using hrtimers hrtimer: export ktime_add_safe UBIFS: do not forget to register BDI device UBIFS: allow sync option in rootflags UBIFS: remove dead code UBIFS: use anonymous device UBIFS: return proper error code if the compr is not present UBIFS: return error if link and unlink race UBIFS: reset no_space flag after inode deletion	2009-06-17 09:46:33 -07:00
Artem Bityutskiy	8daa21e61b	hrtimer: export ktime_add_safe We want to use hrtimers in UBIFS (for write-buffer write-back timer). We need the 'hrtimer_set_expires_range_ns()', which is an in-line function which uses 'ktime_add_safe()'. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Acked-by: Ingo Molnar <mingo@elte.hu>	2009-06-08 11:14:58 +03:00
Arun R Bharadwaj	eea08f32ad	timers: Logic to move non pinned timers * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-16 12:11:36]: This patch migrates all non pinned timers and hrtimers to the current idle load balancer, from all the idle CPUs. Timers firing on busy CPUs are not migrated. While migrating hrtimers, care should be taken to check if migrating a hrtimer would result in a latency or not. So we compare the expiry of the hrtimer with the next timer interrupt on the target cpu and migrate the hrtimer only if it expires after the next interrupt on the target cpu. So, added a clockevents_get_next_event() helper function to return the next_event on the target cpu's clock_event_device. [ tglx: cleanups and simplifications ] Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-05-13 16:52:42 +02:00
Arun R Bharadwaj	597d027573	timers: Framework for identifying pinned timers * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-16 12:11:36]: This patch creates a new framework for identifying cpu-pinned timers and hrtimers. This framework is needed because pinned timers are expected to fire on the same CPU on which they are queued. So it is essential to identify these and not migrate them, in case there are any. For regular timers, the currently existing add_timer_on() can be used queue pinned timers and subsequently mod_timer_pinned() can be used to modify the 'expires' field. For hrtimers, new modes HRTIMER_ABS_PINNED and HRTIMER_REL_PINNED are added to queue cpu-pinned hrtimer. [ tglx: use .._PINNED mode argument instead of creating tons of new functions ] Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-05-13 16:52:42 +02:00
Peter Zijlstra	7f1e2ca9f0	hrtimer: fix rq->lock inversion (again) It appears I inadvertly introduced rq->lock recursion to the hrtimer_start() path when I delegated running already expired timers to softirq context. This patch fixes it by introducing a __hrtimer_start_range_ns() method that will not use raise_softirq_irqoff() but __raise_softirq_irqoff() which avoids the wakeup. It then also changes schedule() to check for pending softirqs and do the wakeup then, I'm not quite sure I like this last bit, nor am I convinced its really needed. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus@samba.org LKML-Reference: <20090313112301.096138802@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-31 14:52:52 +02:00
Thomas Gleixner	b0a9b5111a	hrtimer: prevent negative expiry value after clock_was_set() Impact: prevent false positive WARN_ON() in clockevents_program_event() clock_was_set() changes the base->offset of CLOCK_REALTIME and enforces the reprogramming of the clockevent device to expire timers which are based on CLOCK_REALTIME. If the clock change is large enough then the subtraction of the timer expiry value and base->offset can become negative which triggers the warning in clockevents_program_event(). Check the subtraction result and set a negative value to 0. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-01-30 22:35:34 +01:00
Sebastien Dugue	94df7de028	hrtimers: allow the hot-unplugging of all cpus Impact: fix CPU hotplug hang on Power6 testbox On architectures that support offlining all cpus (at least powerpc/pseries), hot-unpluging the tick_do_timer_cpu can result in a system hang. This comes from the fact that if the cpu going down happens to be the cpu doing the tick, then as the tick_do_timer_cpu handover happens after the cpu is dead (via the CPU_DEAD notification), we're left without ticks, jiffies are frozen and any task relying on timers (msleep, ...) is stuck. That's particularly the case for the cpu looping in __cpu_die() waiting for the dying cpu to be dead. This patch addresses this by having the tick_do_timer_cpu handover happen earlier during the CPU_DYING notification. For this, a new clockevent notification type is introduced (CLOCK_EVT_NOTIFY_CPU_DYING) which is triggered in hrtimer_cpu_notify(). Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-30 22:35:29 +01:00
Frederic Weisbecker	7f22391cbe	hrtimers: increase clock min delta threshold while interrupt hanging Impact: avoid timer IRQ hanging slow systems While using the function graph tracer on a virtualized system, the hrtimer_interrupt can hang the system on an infinite loop. This can be caused in several situations: - the hardware is very slow and HZ is set too high - something intrusive is slowing the system down (tracing under emulation) ... and the next clock events to program are always before the current time. This patch implements a reasonable compromise: if such a situation is detected, we share the CPUs time in 1/4 to process the hrtimer interrupts. This is enough to let the system running without serious starvation. It has been successfully tested under VirtualBox with 1000 HZ and 100 HZ with function graph tracer launched. On both cases, the clock events were increased until about 25 ms periodic ticks, which means 40 HZ. So we change a hard to debug hang into a warning message and a system that still manages to limp along. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-30 22:35:10 +01:00
Linus Torvalds	1e70c7f7a9	Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hrtimers: fix inconsistent lock state on resume in hres_timers_resume time-sched.c: tick_nohz_update_jiffies should be static locking, hpet: annotate false positive warning kernel/fork.c: unused variable 'ret' itimers: remove the per-cpu-ish-ness	2009-01-26 09:47:43 -08:00
Peter Zijlstra	1d4a7f1c4f	hrtimers: fix inconsistent lock state on resume in hres_timers_resume Andrey Borzenkov reported this lockdep assert: > [17854.688347] ================================= > [17854.688347] [ INFO: inconsistent lock state ] > [17854.688347] 2.6.29-rc2-1avb #1 > [17854.688347] --------------------------------- > [17854.688347] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage. > [17854.688347] pm-suspend/18240 [HC0[0]:SC0[0]:HE1:SE1] takes: > [17854.688347] (&cpu_base->lock){++..}, at: [<c0136fcc>] retrigger_next_event+0x5c/0xa0 > [17854.688347] {in-hardirq-W} state was registered at: > [17854.688347] [<c01443cd>] __lock_acquire+0x79d/0x1930 > [17854.688347] [<c01455bc>] lock_acquire+0x5c/0x80 > [17854.688347] [<c03092e5>] _spin_lock+0x35/0x70 > [17854.688347] [<c0136e61>] hrtimer_run_queues+0x31/0x140 > [17854.688347] [<c0128d98>] run_local_timers+0x8/0x20 > [17854.688347] [<c0128dd3>] update_process_times+0x23/0x60 > [17854.688347] [<c013e274>] tick_periodic+0x24/0x80 > [17854.688347] [<c013e2e2>] tick_handle_periodic+0x12/0x70 > [17854.688347] [<c0104e24>] timer_interrupt+0x14/0x20 > [17854.688347] [<c01607b9>] handle_IRQ_event+0x29/0x60 > [17854.688347] [<c0161c59>] handle_level_irq+0x69/0xe0 > [17854.688347] [<ffffffff>] 0xffffffff > [17854.688347] irq event stamp: 55771 > [17854.688347] hardirqs last enabled at (55771): [<c0309125>] _spin_unlock_irqrestore+0x35/0x60 > [17854.688347] hardirqs last disabled at (55770): [<c0309419>] _spin_lock_irqsave+0x19/0x80 > [17854.688347] softirqs last enabled at (54836): [<c0124f54>] __do_softirq+0xc4/0x110 > [17854.688347] softirqs last disabled at (54831): [<c01049ae>] do_softirq+0x8e/0xe0 > [17854.688347] > [17854.688347] other info that might help us debug this: > [17854.688347] 3 locks held by pm-suspend/18240: > [17854.688347] #0: (&buffer->mutex){--..}, at: [<c01dd4c5>] sysfs_write_file+0x25/0x100 > [17854.688347] #1: (pm_mutex){--..}, at: [<c015056f>] enter_state+0x4f/0x140 > [17854.688347] #2: (dpm_list_mtx){--..}, at: [<c027880f>] device_pm_lock+0xf/0x20 > [17854.688347] > [17854.688347] stack backtrace: > [17854.688347] Pid: 18240, comm: pm-suspend Not tainted 2.6.29-rc2-1avb #1 > [17854.688347] Call Trace: > [17854.688347] [<c0306248>] ? printk+0x18/0x20 > [17854.688347] [<c0141fac>] print_usage_bug+0x16c/0x1d0 > [17854.688347] [<c0142bcf>] mark_lock+0x8bf/0xc90 > [17854.688347] [<c0106b8f>] ? pit_next_event+0x2f/0x40 > [17854.688347] [<c01441b0>] __lock_acquire+0x580/0x1930 > [17854.688347] [<c030916d>] ? _spin_unlock+0x1d/0x20 > [17854.688347] [<c0106b8f>] ? pit_next_event+0x2f/0x40 > [17854.688347] [<c013dd38>] ? clockevents_program_event+0x98/0x160 > [17854.688347] [<c0142fe8>] ? mark_held_locks+0x48/0x90 > [17854.688347] [<c0309125>] ? _spin_unlock_irqrestore+0x35/0x60 > [17854.688347] [<c0143229>] ? trace_hardirqs_on_caller+0x139/0x190 > [17854.688347] [<c014328b>] ? trace_hardirqs_on+0xb/0x10 > [17854.688347] [<c01455bc>] lock_acquire+0x5c/0x80 > [17854.688347] [<c0136fcc>] ? retrigger_next_event+0x5c/0xa0 > [17854.688347] [<c03092e5>] _spin_lock+0x35/0x70 > [17854.688347] [<c0136fcc>] ? retrigger_next_event+0x5c/0xa0 > [17854.688347] [<c0136fcc>] retrigger_next_event+0x5c/0xa0 > [17854.688347] [<c013711a>] hres_timers_resume+0xa/0x10 > [17854.688347] [<c013aa8e>] timekeeping_resume+0xee/0x150 > [17854.688347] [<c0273384>] __sysdev_resume+0x14/0x50 > [17854.688347] [<c0273407>] sysdev_resume+0x47/0x80 > [17854.688347] [<c02791ab>] device_power_up+0xb/0x20 > [17854.688347] [<c015043f>] suspend_devices_and_enter+0xcf/0x150 > [17854.688347] [<c0150c2f>] ? freeze_processes+0x3f/0x90 > [17854.688347] [<c0150614>] enter_state+0xf4/0x140 > [17854.688347] [<c01506dd>] state_store+0x7d/0xc0 > [17854.688347] [<c0150660>] ? state_store+0x0/0xc0 > [17854.688347] [<c0202da4>] kobj_attr_store+0x24/0x30 > [17854.688347] [<c01dd53c>] sysfs_write_file+0x9c/0x100 > [17854.688347] [<c019916c>] vfs_write+0x9c/0x160 > [17854.688347] [<c0103494>] ? restore_nocheck_notrace+0x0/0xe > [17854.688347] [<c01dd4a0>] ? sysfs_write_file+0x0/0x100 > [17854.688347] [<c01992ed>] sys_write+0x3d/0x70 > [17854.688347] [<c0103371>] sysenter_do_call+0x12/0x31 Andrey's analysis: > timekeeping_resume() is called via class ->resume > method; and according to comments in sysdev_resume() and > device_power_up(), they are called with interrupts disabled. > > Looking at suspend_enter, irqs are disabled at this point. > > So it actually looks like something (may be some driver) > unconditionally enabled irqs in resume path. Add a debug check to test this theory. If it triggers then it triggers because the resume code calls it with irqs enabled, which is a no-no not just for timekeeping_resume(), but also bad for a number of other resume handlers. Reported-by: Andrey Borzenkov <arvidjaar@mail.ru> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-18 21:31:37 +01:00
Heiko Carstens	58fd3aa288	[CVE-2009-0029] System call wrappers part 01 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2009-01-14 14:15:18 +01:00
Ingo Molnar	82c5b7b527	hrtimer: splitout peek ahead functionality, fix Impact: build fix on !CONFIG_HIGH_RES_TIMERS Fix: kernel/hrtimer.c:1586: error: implicit declaration of function '__hrtimer_peek_ahead_timers' Signen-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-05 14:11:10 +01:00
Thomas Gleixner	e3f1d88374	hrtimer: fixup comments Clean up the comments Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-05 13:14:34 +01:00
Peter Zijlstra	a6037b61c2	hrtimer: fix recursion deadlock by re-introducing the softirq Impact: fix rare runtime deadlock There are a few sites that do: spin_lock_irq(&foo) hrtimer_start(&bar) __run_hrtimer(&bar) func() spin_lock(&foo) which obviously deadlocks. In order to avoid this, never call __run_hrtimer() from hrtimer_start*() context, but instead defer this to softirq context. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-05 13:14:33 +01:00
Thomas Gleixner	731a55ba0f	hrtimer: simplify hotplug migration Impact: cleanup No need for a smp function call, which is likely to run on the same CPU anyway. We can just call hrtimers_peek_ahead() in the interrupts disabled section of migrate_hrtimers(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-05 13:14:33 +01:00
Thomas Gleixner	d5fd43c4ae	hrtimer: fix HOTPLUG_CPU=n compile warning Impact: cleanup kernel/hrtimer.c: In function 'hrtimer_cpu_notify': kernel/hrtimer.c:1574: warning: unused variable 'dcpu' Introduced by commit `37810659ea` ("hrtimer: removing all ur callback modes, fix hotplug") from the timers. dcpu is only used if CONFIG_HOTPLUG_CPU is set. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-05 13:14:32 +01:00
Thomas Gleixner	8bdec955b0	hrtimer: splitout peek ahead functionality Impact: cleanup Provide a peek ahead function that assumes irqs disabled, allows for micro optimizations. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-05 13:14:32 +01:00
Linus Torvalds	db200df0b3	Merge branch 'irq-fixes-for-linus-4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-fixes-for-linus-4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sparseirq: move __weak symbols into separate compilation unit sparseirq: work around __weak alias bug sparseirq: fix hang with !SPARSE_IRQ sparseirq: set lock_class for legacy irq when sparse_irq is selected sparseirq: work around compiler optimizing away __weak functions sparseirq: fix desc->lock init sparseirq: do not printk when migrating IRQ descriptors sparseirq: remove duplicated arch_early_irq_init() irq: simplify for_each_irq_desc() usage proc: remove ifdef CONFIG_SPARSE_IRQ from stat.c irq: for_each_irq_desc() move to irqnr.h hrtimer: remove #include <linux/irq.h>	2008-12-31 09:00:59 -08:00
KOSAKI Motohiro	51bc39f4ba	hrtimer: remove #include <linux/irq.h> Impact: cleanup <linux/irq.h> can be removed and should be, because: - hrtimer doesn't use any irq feature. - <linux/irq.h> shouldn't be include from generic code. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-26 09:48:16 +01:00
Ingo Molnar	b2e3c0adec	hrtimers: fix warning in kernel/hrtimer.c this warning: kernel/hrtimer.c: In function ‘hrtimer_cpu_notify’: kernel/hrtimer.c:1574: warning: unused variable ‘dcpu’ is caused because 'dcpu' is only used in the CONFIG_HOTPLUG_CPU case. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-19 00:45:32 +01:00
Peter Zijlstra	a0a99b227d	hrtimer: removing all ur callback modes, fix > Ingo, this addition fixes the hotplug issue on my machine And because we're all human... Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-08 17:20:38 +01:00
Peter Zijlstra	37810659ea	hrtimer: removing all ur callback modes, fix hotplug Impact: fix hrtimer locking (reported by lockdep) in the CPU hotplug case This addition fixes the hotplug locking issue on my machine Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-04 11:31:25 +01:00
Peter Zijlstra	ca109491f6	hrtimer: removing all ur callback modes Impact: cleanup, move all hrtimer processing into hardirq context This is an attempt at removing some of the hrtimer complexity by reducing the number of callback modes to 1. This means that all hrtimer callback functions will be ran from HARD-irq context. I went through all the 30 odd hrtimer callback functions in the kernel and saw only one that I'm not quite sure of, which is the one in net/can/bcm.c - hence I'm CC-ing the folks responsible for that code. Furthermore, the hrtimer core now calls callbacks directly with IRQs disabled in case you try to enqueue an expired timer. If this timer is a periodic timer (which should use hrtimer_forward() to advance its time) then it might be possible to end up in an inf. recursive loop due to the fact that hrtimer_forward() doesn't round up to the next timer granularity, and therefore keeps on calling the callback - obviously this needs a fix. Aside from that, this seems to compile and actually boot on my dual core test box - although I'm sure there are some bugs in, me not hitting any makes me certain :-) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-25 15:45:46 +01:00
Peter Zijlstra	621a0d5207	hrtimer: clean up unused callback modes Impact: cleanup git grep HRTIMER_CB_IRQSAFE revealed half the callback modes are actually unused. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-12 09:54:40 +01:00
Gautham R Shenoy	5d5254f0d3	timers: handle HRTIMER_CB_IRQSAFE_UNLOCKED correctly from softirq context Impact: fix incorrect locking triggered during hotplug-intense stress-tests While migrating the the CB_IRQSAFE_UNLOCKED timers during a cpu-offline, we queue them on the cb_pending list, so that they won't go stale. Thus, when the callbacks of the timers run from the softirq context, they could run into potential deadlocks, since these callbacks assume that they're running with irq's disabled, thereby annoying lockdep! Fix this by emulating hardirq context while running these callbacks from the hrtimer softirq. ================================= [ INFO: inconsistent lock state ] 2.6.27 #2 -------------------------------- inconsistent {in-hardirq-W} -> {hardirq-on-W} usage. ksoftirqd/0/4 [HC0[0]:SC1[1]:HE1:SE0] takes: (&rq->lock){++..}, at: [<c011db84>] sched_rt_period_timer+0x9e/0x1fc {in-hardirq-W} state was registered at: [<c014103c>] __lock_acquire+0x549/0x121e [<c0107890>] native_sched_clock+0x88/0x99 [<c013aa12>] clocksource_get_next+0x39/0x3f [<c0139abc>] update_wall_time+0x616/0x7df [<c0141d6b>] lock_acquire+0x5a/0x74 [<c0121724>] scheduler_tick+0x3a/0x18d [<c047ed45>] _spin_lock+0x1c/0x45 [<c0121724>] scheduler_tick+0x3a/0x18d [<c0121724>] scheduler_tick+0x3a/0x18d [<c012c436>] update_process_times+0x3a/0x44 [<c013c044>] tick_periodic+0x63/0x6d [<c013c062>] tick_handle_periodic+0x14/0x5e [<c010568c>] timer_interrupt+0x44/0x4a [<c0150c9f>] handle_IRQ_event+0x13/0x3d [<c0151c14>] handle_level_irq+0x79/0xbd [<c0105634>] do_IRQ+0x69/0x7d [<c01041e4>] common_interrupt+0x28/0x30 [<c047007b>] aac_probe_one+0x1a3/0x3f3 [<c047ec2d>] _spin_unlock_irqrestore+0x36/0x39 [<c01512b4>] setup_irq+0x1be/0x1f9 [<c065d70b>] start_kernel+0x259/0x2c5 [<ffffffff>] 0xffffffff irq event stamp: 50102 hardirqs last enabled at (50102): [<c047ebf4>] _spin_unlock_irq+0x20/0x23 hardirqs last disabled at (50101): [<c047edc2>] _spin_lock_irq+0xa/0x4b softirqs last enabled at (50088): [<c0128ba6>] do_softirq+0x37/0x4d softirqs last disabled at (50099): [<c0128ba6>] do_softirq+0x37/0x4d other info that might help us debug this: no locks held by ksoftirqd/0/4. stack backtrace: Pid: 4, comm: ksoftirqd/0 Not tainted 2.6.27 #2 [<c013f6cb>] print_usage_bug+0x13e/0x147 [<c013fef5>] mark_lock+0x493/0x797 [<c01410b1>] __lock_acquire+0x5be/0x121e [<c0141d6b>] lock_acquire+0x5a/0x74 [<c011db84>] sched_rt_period_timer+0x9e/0x1fc [<c047ed45>] _spin_lock+0x1c/0x45 [<c011db84>] sched_rt_period_timer+0x9e/0x1fc [<c011db84>] sched_rt_period_timer+0x9e/0x1fc [<c01210fd>] finish_task_switch+0x41/0xbd [<c0107890>] native_sched_clock+0x88/0x99 [<c011dae6>] sched_rt_period_timer+0x0/0x1fc [<c0136dda>] run_hrtimer_pending+0x54/0xe5 [<c011dae6>] sched_rt_period_timer+0x0/0x1fc [<c0128afb>] __do_softirq+0x7b/0xef [<c0128ba6>] do_softirq+0x37/0x4d [<c0128c12>] ksoftirqd+0x56/0xc5 [<c0128bbc>] ksoftirqd+0x0/0xc5 [<c0134649>] kthread+0x38/0x5d [<c0134611>] kthread+0x0/0x5d [<c0104477>] kernel_thread_helper+0x7/0x10 ======================= Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-11 10:46:42 +01:00
Thomas Gleixner	268a3dcfea	Merge branch 'timers/range-hrtimers' into v28-range-hrtimers-for-linus-v2 Conflicts: kernel/time/tick-sched.c Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-10-22 09:48:06 +02:00
Thomas Gleixner	643bdf68f9	hrtimers: simplify hrtimer_peek_ahead_timers() Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-10-20 13:38:11 +02:00
Thomas Gleixner	e1dd7bc585	hrtimers: fix docbook comments hrtimer_start() and hrtimer_start_range_ns() handle relative and absolute timers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-10-20 13:33:36 +02:00
Thomas Gleixner	c465a76af6	Merge branches 'timers/clocksource', 'timers/hrtimers', 'timers/nohz', 'timers/ntp', 'timers/posixtimers' and 'timers/debug' into v28-timers-for-linus	2008-10-20 13:14:06 +02:00
Arjan van de Ven	651dab4264	Merge commit 'linus/master' into merge-linus Conflicts: arch/x86/kvm/i8254.c	2008-10-17 09:20:26 -07:00
Arjan van de Ven	dc4304f7de	rangetimers: fix the bug reported by Ingo for real and please hand me a brown paper bag (thanks to Thomas for pointing out this very obvious bug) Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2008-10-13 11:08:34 -04:00
Arjan van de Ven	030aebd2e4	rangetimer: fix BUG_ON reported by Ingo There's a small race/chance that, while hrtimers are enabled globally, they're later not enabled when we're calling the hrtimer_interrupt() function, which then BUG_ON()'s for that. This patch closes that race/gap. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2008-10-11 12:25:45 -07:00
Thomas Gleixner	ccc7dadf73	hrtimer: prevent migration of per CPU hrtimers Impact: per CPU hrtimers can be migrated from a dead CPU The hrtimer code has no knowledge about per CPU timers, but we need to prevent the migration of such timers and warn when such a timer is active at migration time. Explicitely mark the timers as per CPU and use a more understandable mode descriptor for the interrupts safe unlocked callback mode, which is used by hrtimer_sleeper and the scheduler code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-29 17:09:14 +02:00
Thomas Gleixner	b00c1a99e7	hrtimer: mark migration state Impact: during migration active hrtimers can be seen as inactive The migration code removes the hrtimers from the queues of the dead CPU and sets the state temporary to INACTIVE. The enqueue code sets it to ACTIVE/PENDING again. Prevent that the wrong state can be seen by using a separate migration state bit. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-29 17:09:14 +02:00
Thomas Gleixner	41e1022eae	hrtimer: fix migration of CB_IRQSAFE_NO_SOFTIRQ hrtimers Impact: Stale timers after a CPU went offline. commit `37bb6cb409` hrtimer: unlock hrtimer_wakeup changed the hrtimer sleeper callback mode to CB_IRQSAFE_NO_SOFTIRQ due to locking problems. A result of this change is that when enqueue is called for an already expired hrtimer the callback function is not longer called directly from the enqueue code. The normal callers have been fixed in the code, but the migration code which moves hrtimers from a dead CPU to a live CPU was not made aware of this. This can be fixed by checking the timer state after the call to enqueue in the migration code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-29 17:09:14 +02:00
Thomas Gleixner	7659e34967	hrtimer: migrate pending list on cpu offline Impact: hrtimers which are on the pending list are not migrated at cpu offline and can be stale forever Add the pending list migration when CONFIG_HIGH_RES_TIMERS is enabled Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-29 17:09:13 +02:00
Mark McLoughlin	d7cfb60c5c	hrtimer: remove hrtimer_clock_base::get_softirq_time() Peter Zijlstra noticed this 8 months ago and I just noticed it again. hrtimer_clock_base::get_softirq_time() is currently unused in the entire tree. In fact, looking at the logs, it appears as if it was never used. Remove it. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-22 13:52:42 +02:00
Arjan van de Ven	2e94d1f71f	hrtimer: peek at the timer queue just before going idle As part of going idle, we already look at the time of the next timer event to determine which C-state to select etc. This patch adds functionality that causes the timers that are past their soft expire time, to fire at this time, before we calculate the next wakeup time. This functionality will thus avoid wakeups by running timers before going idle rather than specially waking up for it. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2008-09-11 07:17:49 -07:00
Arjan van de Ven	3bd012060f	hrtimer: make the nanosleep() syscall use the per process slack This patch makes the nanosleep() system call use the per process slack value; with this users are able to externally control existing applications to reduce the wakeup rate. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2008-09-11 07:16:55 -07:00
Arjan van de Ven	da8f2e170e	hrtimer: add a hrtimer_start_range() function this patch adds a _range version of hrtimer_start() so that range timers can be created; the hrtimer_start() function is just a wrapper around this. In addition, hrtimer_start_expires() will now preserve existing ranges. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2008-09-07 10:58:01 -07:00
Arjan van de Ven	654c8e0b1c	hrtimer: turn hrtimers into range timers this patch turns hrtimers into range timers; they have 2 expire points 1) the soft expire point 2) the hard expire point the kernel will do it's regular best effort attempt to get the timer run at the hard expire point. However, if some other time fires after the soft expire point, the kernel now has the freedom to fire this timer at this point, and thus grouping the events and preventing a power-expensive wakeup in the future. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2008-09-05 21:35:27 -07:00
Arjan van de Ven	cc584b213f	hrtimer: convert kernel/* to the new hrtimer apis In order to be able to do range hrtimers we need to use accessor functions to the "expire" member of the hrtimer struct. This patch converts kernel/* to these accessors. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2008-09-05 21:35:13 -07:00
Arjan van de Ven	7bb67439bf	select: Introduce a hrtimeout function This patch adds a schedule_hrtimeout() function, to be used by select() and poll() in a later patch. This function works similar to schedule_timeout() in most ways, but takes a timespec rather than jiffies. With a lot of contributions/fixes from Thomas Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-05 21:34:53 -07:00
Oleg Nesterov	d82f0b0f6f	migrate_timers: add comment, use spinlock_irq() Add the comment to explain why the double lock in migrate_timers() can't deadlock. Change the code to use spinlock_irq() instead of local_irq_disable() + spin_lock(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-21 13:34:54 +02:00
Ingo Molnar	1a781a777b	Merge branch 'generic-ipi' into generic-ipi-for-linus Conflicts: arch/powerpc/Kconfig arch/s390/kernel/time.c arch/x86/kernel/apic_32.c arch/x86/kernel/cpu/perfctr-watchdog.c arch/x86/kernel/i8259_64.c arch/x86/kernel/ldt.c arch/x86/kernel/nmi_64.c arch/x86/kernel/smpboot.c arch/x86/xen/smp.c include/asm-x86/hw_irq_32.h include/asm-x86/hw_irq_64.h include/asm-x86/mach-default/irq_vectors.h include/asm-x86/mach-voyager/irq_vectors.h include/asm-x86/smp.h kernel/Makefile Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-15 21:55:59 +02:00
Linus Torvalds	da6e88f496	Merge branch 'timers/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: add PCI ID for 6300ESB force hpet x86: add another PCI ID for ICH6 force-hpet kernel-paramaters: document pmtmr= command line option acpi_pm clccksource: fix printk format warning nohz: don't stop idle tick if softirqs are pending. pmtmr: allow command line override of ioport nohz: reduce jiffies polling overhead hrtimer: Remove unused variables in ktime_divns() hrtimer: remove warning in hres_timers_resume posix-timers: print RT watchdog message	2008-07-15 10:39:57 -07:00
Linus Torvalds	85082fd7cb	Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm * 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (241 commits) [ARM] 5171/1: ep93xx: fix compilation of modules using clocks [ARM] 5133/2: at91sam9g20 defconfig file [ARM] 5130/4: Support for the at91sam9g20 [ARM] 5160/1: IOP3XX: gpio/gpiolib support [ARM] at91: Fix NAND FLASH timings for at91sam9x evaluation kits. [ARM] 5084/1: zylonite: Register AC97 device [ARM] 5085/2: PXA: Move AC97 over to the new central device declaration model [ARM] 5120/1: pxa: correct platform driver names for PXA25x and PXA27x UDC drivers [ARM] 5147/1: pxaficp_ir: drop pxa_gpio_mode calls, as pin setting [ARM] 5145/1: PXA2xx: provide api to control IrDA pins state [ARM] 5144/1: pxaficp_ir: cleanup includes [ARM] pxa: remove pxa_set_cken() [ARM] pxa: allow clk aliases [ARM] Feroceon: don't disable BPU on boot [ARM] Orion: LED support for HP mv2120 [ARM] Orion: add RD88F5181L-FXO support [ARM] Orion: add RD88F5181L-GE support [ARM] Orion: add Netgear WNR854T support [ARM] s3c2410_defconfig: update for current build [ARM] Acer n30: Minor style and indentation fixes. ...	2008-07-14 16:06:58 -07:00
Linus Torvalds	666484f025	Merge branch 'core/softirq' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core/softirq' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: softirq: remove irqs_disabled warning from local_bh_enable softirq: remove initialization of static per-cpu variable Remove argument from open_softirq which is always NULL	2008-07-14 15:28:42 -07:00
Thomas Gleixner	7dc9719682	Merge commit '900cfa46191a7d87cf1891924cb90499287fd235'; branches 'timers/nohz', 'timers/clocksource' and 'timers/posixtimers' into timers/for-linus	2008-07-14 18:09:05 +02:00
Russell King	f0006314d3	Merge branch 'imx' into devel Conflicts: arch/arm/mm/Kconfig	2008-07-10 16:41:50 +01:00
Steven Rostedt	ee3ece830f	hrtimer: prevent migration for raising softirq Due to a possible deadlock, the waking of the softirq was pushed outside of the hrtimer base locks. See commit `0c96c5979a` Unfortunately this allows the task to migrate after setting up the softirq and raising it. Since softirqs run a queue that is per-cpu we may raise the softirq on the wrong CPU and this will keep the queued softirq task from running. To solve this issue, this patch disables preemption around the releasing of the hrtimer lock and raising of the softirq. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-03 11:36:48 -07:00
Jens Axboe	15c8b6c1aa	on_each_cpu(): kill unused 'retry' parameter It's not even passed on to smp_call_function() anymore, since that was removed. So kill it. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-06-26 11:24:38 +02:00
Carlos R. Mafra	900cfa4619	hrtimer: Remove unused variables in ktime_divns() The variables dns and inc are not used, remove them. Signed-off-by: Carlos R. Mafra <crmafra@gmail.com> Cc: tglx@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-26 23:26:55 +02:00
Jeremy Fitzhardinge	d031476408	hrtimer: remove warning in hres_timers_resume hres_timers_resume() warns if there appears to be more than one cpu online. This warning makes sense when the suspend/resume mechanism offlines all cpus but one during the suspend/resume process. However, Xen suspend does not need to offline the other cpus; it merely keeps them tied up in stop_machine() while the virtual machine is suspended. The warning hres_timers_resume issues is therefore spurious. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-26 23:26:32 +02:00
Carlos R. Mafra	962cf36c5b	Remove argument from open_softirq which is always NULL As git-grep shows, open_softirq() is always called with the last argument being NULL block/blk-core.c: open_softirq(BLOCK_SOFTIRQ, blk_done_softirq, NULL); kernel/hrtimer.c: open_softirq(HRTIMER_SOFTIRQ, run_hrtimer_softirq, NULL); kernel/rcuclassic.c: open_softirq(RCU_SOFTIRQ, rcu_process_callbacks, NULL); kernel/rcupreempt.c: open_softirq(RCU_SOFTIRQ, rcu_process_callbacks, NULL); kernel/sched.c: open_softirq(SCHED_SOFTIRQ, run_rebalance_domains, NULL); kernel/softirq.c: open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL); kernel/softirq.c: open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL); kernel/timer.c: open_softirq(TIMER_SOFTIRQ, run_timer_softirq, NULL); net/core/dev.c: open_softirq(NET_TX_SOFTIRQ, net_tx_action, NULL); net/core/dev.c: open_softirq(NET_RX_SOFTIRQ, net_rx_action, NULL); This observation has already been made by Matthew Wilcox in June 2002 (http://www.cs.helsinki.fi/linux/linux-kernel/2002-25/0687.html) "I notice that none of the current softirq routines use the data element passed to them." and the situation hasn't changed since them. So it appears we can safely remove that extra argument to save 128 (54) bytes of kernel data (text). Signed-off-by: Carlos R. Mafra <crmafra@ift.unesp.br> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-25 07:43:15 +02:00
Russell King	ee9c578527	dyntick: Remove last reminants of dyntick support Remove the last reminants of dyntick support from the generic kernel. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2008-05-12 17:39:14 +01:00
Oliver Hartkopp	4346f65426	hrtimer: remove duplicate helper function The helper function hrtimer_callback_running() is used in kernel/hrtimer.c as well as in the updated net/can/bcm.c which now supports hrtimers. Moving the helper function to hrtimer.h removes the duplicate definition in the C-files. Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net> Cc: David Miller <davem@davemloft.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-03 18:11:48 +02:00
Thomas Gleixner	237fc6e7a3	add hrtimer specific debugobjects code hrtimers have now dynamic users in the network code. Put them under debugobjects surveillance as well. Add calls to the generic object debugging infrastructure and provide fixup functions which allow to keep the system alive when recoverable problems have been detected by the object debugging core code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-30 08:29:53 -07:00
Thomas Gleixner	0c96c5979a	hrtimer: raise softirq unlocked to avoid circular lock dependency The scheduler hrtimer bits in 2.6.25 introduced a circular lock dependency in a rare code path: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.25-sched-devel.git-x86-latest.git #19 ------------------------------------------------------- X/2980 is trying to acquire lock: (&rq->rq_lock_key#2){++..}, at: [<ffffffff80230146>] task_rq_lock+0x56/0xa0 but task is already holding lock: (&cpu_base->lock){++..}, at: [<ffffffff80257ae1>] lock_hrtimer_base+0x31/0x60 which lock already depends on the new lock. The scenario which leads to this is: posix-timer signal is delivered -> posix-timer is rearmed timer is already expired in hrtimer_enqueue() -> softirq is raised To prevent this we need to move the raise of the softirq out of the base->lock protected code path. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2008-04-28 22:22:21 +02:00
Bodo Stroesser	d7b41a24bf	hrtimer: timeout too long when using HRTIMER_CB_SOFTIRQ When using hrtimer with timer->cb_mode == HRTIMER_CB_SOFTIRQ in some cases the clockevent is not programmed. This happens, if: - a timer is rearmed while it's state is HRTIMER_STATE_CALLBACK - hrtimer_reprogram() returns -ETIME, when it is called after CALLBACK is finished. This occurs if the new timer->expires is in the past when CALLBACK is done. In this case, the timer needs to be removed from the tree and put onto the pending list again. The patch is against 2.6.22.5, but AFAICS, it is relevant for 2.6.25 also (in run_hrtimer_pending()). Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com> Cc: stable@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-27 18:26:43 +02:00
Thomas Gleixner	259aae864c	hrtimer: optimize the softirq time optimization The previous optimization did not take the case into account where a clock provides its own softirq_get_time() function. Check for the availablitiy of the clock get time function first and then check if we need to retrieve the time for both clocks via hrtimer_softirq_gettime() to avoid a double evaluation of time in that case as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-21 07:59:51 +02:00
Dimitri Sivanich	833883d9ac	hrtimer: reduce calls to hrtimer_get_softirq_time() It seems that hrtimer_run_queues() is calling hrtimer_get_softirq_time() more often than it needs to. This can cause frequent contention on systems with large numbers of processors/cores. With this patch, hrtimer_run_queues only calls hrtimer_get_softirq_time() if there is a pending timer in one of the hrtimer bases, and only once. This also combines hrtimer_run_queues() and the inline run_hrtimer_queue() into one function. [ tglx@linutronix.de: coding style ] Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-21 07:59:51 +02:00
Oleg Nesterov	8e60e05fdc	hrtimers: simplify lockdep handling In order to avoid the false positive from lockdep, each per-cpu base->lock has the separate lock class and migrate_hrtimers() uses double_spin_lock(). This is overcomplicated: except for migrate_hrtimers() we never take 2 locks at once, and migrate_hrtimers() can use spin_lock_nested(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-17 12:22:31 +02:00
Thomas Gleixner	029a07e031	hrtimer: use nanosleep specific restart_block fields Convert all the nanosleep related users of restart_block to the new nanosleep specific restart_block fields. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-17 12:22:30 +02:00
Thomas Gleixner	63070a79ba	hrtimer: catch expired CLOCK_REALTIME timers early A CLOCK_REALTIME timer, which has an absolute expiry time less than the clock realtime offset calls with a negative delta into the clock events code and triggers the WARN_ON() there. This is a false positive and needs to be prevented. Check the result of timer->expires - timer->base->offset right away and return -ETIME right away. Thanks to Frans Pop, who reported the problem and tested the fixes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Frans Pop <elendil@planet.nl>	2008-02-14 22:08:30 +01:00
Thomas Gleixner	5a7780e725	hrtimer: check relative timeouts for overflow Various user space callers ask for relative timeouts. While we fixed that overflow issue in hrtimer_start(), the sites which convert relative user space values to absolute timeouts themself were uncovered. Instead of putting overflow checks into each place add a function which does the sanity checking and convert all affected callers to use it. Thanks to Frans Pop, who reported the problem and tested the fixes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Tested-by: Frans Pop <elendil@planet.nl>	2008-02-14 22:08:30 +01:00
Oleg Nesterov	c289b074b6	hrtimer: don't modify restart_block->fn in restart functions hrtimer_nanosleep_restart() clears/restores restart_block->fn. This is pointless and complicates its usage. Note that if sys_restart_syscall() doesn't actually happen, we have a bogus "pending" restart->fn anyway, this is harmless. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Pavel Emelyanov <xemul@sw.ru> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Toyo Abe <toyoa@mvista.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-02-10 10:48:03 +01:00
Oleg Nesterov	080344b988	hrtimer: fix rmtp handling in hrtimer_nanosleep() Spotted by Pavel Emelyanov and Alexey Dobriyan. hrtimer_nanosleep() sets restart_block->arg1 = rmtp, but this rmtp points to the local variable which lives in the caller's stack frame. This means that if sys_restart_syscall() actually happens and it is interrupted as well, we don't update the user-space variable, but write into the already dead stack frame. Introduced by commit `04c227140f` hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier Change the callers to pass "__user rmtp" to hrtimer_nanosleep(), and change hrtimer_nanosleep() to use copy_to_user() to actually update rmtp. Small problem remains. man 2 nanosleep states that rtmp should be written if nanosleep() was interrupted (it says nothing whether it is OK to update rmtp if nanosleep returns 0), but (with or without this patch) we can dirty rem even if nanosleep() returns 0. NOTE: this patch doesn't change compat_sys_nanosleep(), because it has other bugs. Fixed by the next patch. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Pavel Emelyanov <xemul@sw.ru> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Toyo Abe <toyoa@mvista.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> include/linux/hrtimer.h \| 2 - kernel/hrtimer.c \| 51 +++++++++++++++++++++++++----------------------- kernel/posix-timers.c \| 14 +------------ 3 files changed, 30 insertions(+), 37 deletions(-)	2008-02-10 10:48:03 +01:00
Davide Libenzi	4d672e7ac7	timerfd: new timerfd API This is the new timerfd API as it is implemented by the following patch: int timerfd_create(int clockid, int flags); int timerfd_settime(int ufd, int flags, const struct itimerspec utmr, struct itimerspec otmr); int timerfd_gettime(int ufd, struct itimerspec *otmr); The timerfd_create() API creates an un-programmed timerfd fd. The "clockid" parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME. The timerfd_settime() API give new settings by the timerfd fd, by optionally retrieving the previous expiration time (in case the "otmr" parameter is not NULL). The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit is set in the "flags" parameter. Otherwise it's a relative time. The timerfd_gettime() API returns the next expiration time of the timer, or {0, 0} if the timerfd has not been set yet. Like the previous timerfd API implementation, read(2) and poll(2) are supported (with the same interface). Here's a simple test program I used to exercise the new timerfd APIs: http://www.xmailserver.org/timerfd-test2.c [akpm@linux-foundation.org: coding-style cleanups] [akpm@linux-foundation.org: fix ia64 build] [akpm@linux-foundation.org: fix m68k build] [akpm@linux-foundation.org: fix mips build] [akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds] [heiko.carstens@de.ibm.com: fix s390] [akpm@linux-foundation.org: fix powerpc build] [akpm@linux-foundation.org: fix sparc64 more] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:07 -08:00
Peter Zijlstra	3588a085cd	hrtimer: fix hrtimer_init_sleeper() users this patch: commit `37bb6cb409` Author: Peter Zijlstra <a.p.zijlstra@chello.nl> Date: Fri Jan 25 21:08:32 2008 +0100 hrtimer: unlock hrtimer_wakeup Broke hrtimer_init_sleeper() users. It forgot to fix up the futex caller of this function to detect the failed queueing and messed up the do_nanosleep() caller in that it could leak a TASK_INTERRUPTIBLE state. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-01 17:45:13 +01:00
Peter Zijlstra	37bb6cb409	hrtimer: unlock hrtimer_wakeup hrtimer_wakeup creates a base->lock rq->lock lock dependancy. Avoid this by switching to HRTIMER_CB_IRQSAFE_NO_SOFTIRQ which doesn't hold base->lock. This fully untangles hrtimer locks from the scheduler locks, and allows hrtimer usage in the scheduler proper. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-25 21:08:32 +01:00
Peter Zijlstra	d3d74453c3	hrtimer: fixup the HRTIMER_CB_IRQSAFE_NO_SOFTIRQ fallback Currently all highres=off timers are run from softirq context, but HRTIMER_CB_IRQSAFE_NO_SOFTIRQ timers expect to run from irq context. Fix this up by splitting it similar to the highres=on case. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-25 21:08:31 +01:00
Peter Zijlstra	2d44ae4d71	hrtimer: clean up cpu->base locking tricks In order to more easily allow for the scheduler to use timers, clean up the locking a bit. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-25 21:08:31 +01:00
Randy Dunlap	0ec160dd48	hrtimer: fix section mismatch Fix section mismatch in hrtimer.c: WARNING: vmlinux.o(.text+0x50c61): Section mismatch: reference to .init.text: (between 'hrtimer_cpu_notify' and 'down_read_trylock') Noticed by Johannes Berg and confirmed by Sam Ravnborg. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@akpm@linux-foundation.org>	2008-01-21 19:39:41 -08:00
Thomas Gleixner	62f0f61e66	hrtimers: avoid overflow for large relative timeouts Relative hrtimers with a large timeout value might end up as negative timer values, when the current time is added in hrtimer_start(). This in turn is causing the clockevents_set_next() function to set an huge timeout and sleep for quite a long time when we have a clock source which is capable of long sleeps like HPET. With PIT this almost goes unnoticed as the maximum delta is ~27ms. The non-hrt/nohz code sorts this out in the next timer interrupt, so we never noticed that problem which has been there since the first day of hrtimers. This bug became more apparent in 2.6.24 which activates HPET on more hardware. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-12-07 19:16:17 +01:00
Michael Ellerman	edfed66e17	Quieten hrtimer printk: "Switched to high resolution mode .." Change the hrtimer printk "Switched to high resolution mode .." to be KERN_DEBUG, rather than KERN_INFO. If users need to see this they can pass "loglevel" or "debug" on the command line, or check dmesg. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> kernel/hrtimer.c \| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)	2007-10-29 09:39:38 +01:00
Uwe Kleine-König	6506f2aa66	fix comment: unlock_hrtimer_base is the counterpart of lock_hrtimer_base Signed-off-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2007-10-20 01:56:53 +02:00
Robert P. J. Day	3a4fa0a25d	Fix misspellings of "system", "controller", "interrupt" and "necessary". Fix the various misspellings of "system", controller", "interrupt" and "[un]necessary". Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2007-10-19 23:10:43 +02:00
Anton Blanchard	04c227140f	hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier Pull the copy_to_user out of hrtimer_nanosleep and into the callers (common_nsleep, sys_nanosleep) in preparation for converting compat_sys_nanosleep to use hrtimers. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-18 22:54:18 +02:00
Arnaldo Carvalho de Melo	a272378d11	[KTIME]: Introduce ktime_sub_ns and ktime_sub_us First user will be the DCCP transport networking protocol. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:12 -07:00
john stultz	17c38b7490	Cache xtime every call to update_wall_time This avoids xtime lag seen with dynticks, because while 'xtime' itself is still not updated often, we keep a 'xtime_cache' variable around that contains the approximate real-time that _is_ updated each time we do a 'update_wall_time()', and is thus never off by more than one tick. IOW, this restores the original semantics for 'xtime' users, as long as you use the proper abstraction functions (ie 'current_kernel_time()' or 'get_seconds()' depending on whether you want a timespec or just the seconds field). [ Updated Patch. As penance for my sins I've also yanked another #ifdef that was added to avoid the xtime lag w/ hrtimers. ] Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-25 10:17:44 -07:00
john stultz	2c6b47de17	Cleanup non-arch xtime uses, use get_seconds() or current_kernel_time(). This avoids use of the kernel-internal "xtime" variable directly outside of the actual time-related functions. Instead, use the helper functions that we already have available to us. This doesn't actually change any behaviour, but this will allow us to fix the fact that "xtime" isn't updated very often with CONFIG_NO_HZ (because much of the realtime information is maintained as separate offsets to 'xtime'), which has caused interfaces that use xtime directly to get a time that is out of sync with the real-time clock by up to a third of a second or so. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-25 10:09:20 -07:00
Ingo Molnar	99bc2fcb28	hrtimer: speedup hrtimer_enqueue Speedup hrtimer_enqueue by evaluating the rbtree insertion result. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 17:49:15 -07:00
Ingo Molnar	820de5c39e	highres: improve debug output Add some more debug information to the hrtimer and clock events code. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 17:49:15 -07:00
David Miller	7713a7d195	[HRTIMER] Fix cpu pointer arg to clockevents_notify() All of the clockevent notifiers expect a pointer to an "unsigned int" cpu argument, but hrtimer_cpu_notify() passes in a pointer to a long. [ Discussed with and ok by Thomas Gleixner ] Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 17:29:56 -07:00
Rafael J. Wysocki	8bb7844286	Add suspend-related notifications for CPU hotplug Since nonboot CPUs are now disabled after tasks and devices have been frozen and the CPU hotplug infrastructure is used for this purpose, we need special CPU hotplug notifications that will help the CPU-hotplug-aware subsystems distinguish normal CPU hotplug events from CPU hotplug events related to a system-wide suspend or resume operation in progress. This patch introduces such notifications and causes them to be used during suspend and resume transitions. It also changes all of the CPU-hotplug-aware subsystems to take these notifications into consideration (for now they are handled in the same way as the corresponding "normal" ones). [oleg@tv-sign.ru: cleanups] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:56 -07:00
Stas Sergeev	6bdb6b620e	export hrtimer_forward Other symbols of the hrtimers API are already exported. Signed-off-by: Stas Sergeev <stsp@aknet.ru> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:15 -07:00
David Howells	b8b8fd2dc2	[NET]: Fix networking compilation errors Fix miscellaneous networking compilation errors. () Export ktime_add_ns() for modules. () wext_proc_init() should have an ANSI declaration. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-27 15:31:24 -07:00
Patrick McHardy	641b9e0e8b	[NET_SCHED]: Use ktime as clocksource Get rid of the manual clock source selection mess and use ktime. Also use a scalar representation, which allows to clean up pkt_sched.h a bit more and results in less ktime_to_ns() calls in most cases. The PSCHED_US2JIFFIE/PSCHED_JIFFIE2US macros are implemented quite inefficient by this patch, following patches will convert all qdiscs to hrtimers and get rid of them entirely. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:04 -07:00
Ingo Molnar	995f054f2a	[PATCH] high-res timers: resume fix Soeren Sonnenburg reported that upon resume he is getting this backtrace: [<c0119637>] smp_apic_timer_interrupt+0x57/0x90 [<c0142d30>] retrigger_next_event+0x0/0xb0 [<c0104d30>] apic_timer_interrupt+0x28/0x30 [<c0142d30>] retrigger_next_event+0x0/0xb0 [<c0140068>] __kfifo_put+0x8/0x90 [<c0130fe5>] on_each_cpu+0x35/0x60 [<c0143538>] clock_was_set+0x18/0x20 [<c0135cdc>] timekeeping_resume+0x7c/0xa0 [<c02aabe1>] __sysdev_resume+0x11/0x80 [<c02ab0c7>] sysdev_resume+0x47/0x80 [<c02b0b05>] device_power_up+0x5/0x10 it turns out that on resume we mistakenly re-enable interrupts too early. Do the timer retrigger only on the current CPU. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Soeren Sonnenburg <kernel@nn7.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-07 10:03:43 -07:00
Ingo Molnar	935c631db8	[PATCH] hrtimers: fix reprogramming SMP race hrtimer_start() incorrectly set the 'reprogram' flag to enqueue_hrtimer(), which should only be 1 if the hrtimer is queued to the current CPU. Doing otherwise could result in a reprogramming of the current CPU's clockevents device, with a timer that is not queued to it - resulting in a bogus next expiry value. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-28 13:44:31 -07:00
Thomas Gleixner	ad28d94abb	[PATCH] hrtimer: fix up unlocked access to wall_to_monotonic commit `f4304ab215` (HZ free NTP) moved the access to wall_to_monotonic in hrtimer_get_softirq_time() out of the xtime_lock protection. Move it back into the seq_lock section. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-16 19:25:05 -07:00
Thomas Gleixner	13788ccc41	[PATCH] hrtimer: prevent overrun DoS in hrtimer_forward() hrtimer_forward() does not check for the possible overflow of timer->expires. This can happen on 64 bit machines with large interval values and results currently in an endless loop in the softirq because the expiry value becomes negative and therefor the timer is expired all the time. Check for this condition and set the expiry value to the max. expiry time in the future. The fix should be applied to stable kernel series as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-16 19:25:05 -07:00
Thomas Gleixner	f8953856eb	[PATCH] highres: do not run the TIMER_SOFTIRQ after switching to highres mode The TIMER_SOFTIRQ runs the hrtimers during bootup until a usable clocksource and clock event sources are registered. The switch to high resolution mode happens inside of the TIMER_SOFTIRQ, but runs the softirq afterwards. That way the tick emulation timer, which was set up in the switch to highres might be executed in the softirq context, which is a BUG. The rbtree has not to be touched by the softirq after the highres switch. This BUG was observed by Andres Salomon, who provided the information to debug it. Return early from the softirq, when the switch was sucessful. [dilinger@debian.org: add debug warning] [akpm@linux-foundation.org: make debug warning compile] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andres Salomon <dilinger@debian.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andres Salomon <dilinger@debian.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-06 09:30:25 -08:00
Heiko Carstens	e81ce1f7ec	[PATCH] timer/hrtimer: take per cpu locks in sane order Doing something like this on a two cpu system # echo 0 > /sys/devices/system/cpu/cpu0/online # echo 1 > /sys/devices/system/cpu/cpu0/online # echo 0 > /sys/devices/system/cpu/cpu1/online will give me this: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.21-rc2-g562aa1d4-dirty #7 ------------------------------------------------------- bash/1282 is trying to acquire lock: (&cpu_base->lock_key){.+..}, at: [<000000000005f17e>] hrtimer_cpu_notify+0xc6/0x240 but task is already holding lock: (&cpu_base->lock_key#2){.+..}, at: [<000000000005f174>] hrtimer_cpu_notify+0xbc/0x240 which lock already depends on the new lock. This happens because we have the following code in kernel/hrtimer.c: migrate_hrtimers(int cpu) [...] old_base = &per_cpu(hrtimer_bases, cpu); new_base = &get_cpu_var(hrtimer_bases); [...] spin_lock(&new_base->lock); spin_lock(&old_base->lock); Which means the spinlocks are taken in an order which depends on which cpu gets shut down from which other cpu. Therefore lockdep complains that there might be an ABBA deadlock. Since migrate_hrtimers() gets only called on cpu hotplug it's safe to assume that it isn't executed concurrently on a The same problem exists in kernel/timer.c: migrate_timers(). As pointed out by Christian Borntraeger one possible solution to avoid the locking order complaints would be to make sure that the locks are always taken in the same order. E.g. by taking the lock of the cpu with the lower number first. To achieve this we introduce two new spinlock functions double_spin_lock and double_spin_unlock which lock or unlock two locks in a given order. Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: John Stultz <johnstul@us.ibm.com> Cc: Christian Borntraeger <cborntra@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-05 07:57:53 -08:00

1 2 3 4

200 Commits