dect
/
linux-2.6
Archived
13
0
Fork 0
Commit Graph

16398 Commits

Author SHA1 Message Date
Boris Ostrovsky 36c46ca4f3 x86, microcode, AMD: Add support for family 16h processors
Add valid patch size for family 16h processors.

[ hpa: promoting to urgent/stable since it is hw enabling and trivial ]

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Acked-by: Andreas Herrmann <herrmann.der.user@googlemail.com>
Link: http://lkml.kernel.org/r/1353004910-2204-1-git-send-email-boris.ostrovsky@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
2012-11-20 22:23:28 -08:00
H. Peter Anvin cb57a2b4cf x86-32: Export kernel_stack_pointer() for modules
Modules, in particular oprofile (and possibly other similar tools)
need kernel_stack_pointer(), so export it using EXPORT_SYMBOL_GPL().

Cc: Yang Wei <wei.yang@windriver.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Jun Zhang <jun.zhang@intel.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20120912135059.GZ8285@erda.amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-20 22:23:23 -08:00
Robert Richter 1022623842 x86-32: Fix invalid stack address while in softirq
In 32 bit the stack address provided by kernel_stack_pointer() may
point to an invalid range causing NULL pointer access or page faults
while in NMI (see trace below). This happens if called in softirq
context and if the stack is empty. The address at &regs->sp is then
out of range.

Fixing this by checking if regs and &regs->sp are in the same stack
context. Otherwise return the previous stack pointer stored in struct
thread_info. If that address is invalid too, return address of regs.

 BUG: unable to handle kernel NULL pointer dereference at 0000000a
 IP: [<c1004237>] print_context_stack+0x6e/0x8d
 *pde = 00000000
 Oops: 0000 [#1] SMP
 Modules linked in:
 Pid: 4434, comm: perl Not tainted 3.6.0-rc3-oprofile-i386-standard-g4411a05 #4 Hewlett-Packard HP xw9400 Workstation/0A1Ch
 EIP: 0060:[<c1004237>] EFLAGS: 00010093 CPU: 0
 EIP is at print_context_stack+0x6e/0x8d
 EAX: ffffe000 EBX: 0000000a ECX: f4435f94 EDX: 0000000a
 ESI: f4435f94 EDI: f4435f94 EBP: f5409ec0 ESP: f5409ea0
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
 CR0: 8005003b CR2: 0000000a CR3: 34ac9000 CR4: 000007d0
 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
 DR6: ffff0ff0 DR7: 00000400
 Process perl (pid: 4434, ti=f5408000 task=f5637850 task.ti=f4434000)
 Stack:
  000003e8 ffffe000 00001ffc f4e39b00 00000000 0000000a f4435f94 c155198c
  f5409ef0 c1003723 c155198c f5409f04 00000000 f5409edc 00000000 00000000
  f5409ee8 f4435f94 f5409fc4 00000001 f5409f1c c12dce1c 00000000 c155198c
 Call Trace:
  [<c1003723>] dump_trace+0x7b/0xa1
  [<c12dce1c>] x86_backtrace+0x40/0x88
  [<c12db712>] ? oprofile_add_sample+0x56/0x84
  [<c12db731>] oprofile_add_sample+0x75/0x84
  [<c12ddb5b>] op_amd_check_ctrs+0x46/0x260
  [<c12dd40d>] profile_exceptions_notify+0x23/0x4c
  [<c1395034>] nmi_handle+0x31/0x4a
  [<c1029dc5>] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
  [<c13950ed>] do_nmi+0xa0/0x2ff
  [<c1029dc5>] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
  [<c13949e5>] nmi_stack_correct+0x28/0x2d
  [<c1029dc5>] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
  [<c1003603>] ? do_softirq+0x4b/0x7f
  <IRQ>
  [<c102a06f>] irq_exit+0x35/0x5b
  [<c1018f56>] smp_apic_timer_interrupt+0x6c/0x7a
  [<c1394746>] apic_timer_interrupt+0x2a/0x30
 Code: 89 fe eb 08 31 c9 8b 45 0c ff 55 ec 83 c3 04 83 7d 10 00 74 0c 3b 5d 10 73 26 3b 5d e4 73 0c eb 1f 3b 5d f0 76 1a 3b 5d e8 73 15 <8b> 13 89 d0 89 55 e0 e8 ad 42 03 00 85 c0 8b 55 e0 75 a6 eb cc
 EIP: [<c1004237>] print_context_stack+0x6e/0x8d SS:ESP 0068:f5409ea0
 CR2: 000000000000000a
 ---[ end trace 62afee3481b00012 ]---
 Kernel panic - not syncing: Fatal exception in interrupt

V2:
* add comments to kernel_stack_pointer()
* always return a valid stack address by falling back to the address
  of regs

Reported-by: Yang Wei <wei.yang@windriver.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Link: http://lkml.kernel.org/r/20120912135059.GZ8285@erda.amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jun Zhang <jun.zhang@intel.com>
2012-11-20 22:23:20 -08:00
H. Peter Anvin c1ddb48204 Merge commit 'efi-for-3.7-v2' into x86/urgent 2012-11-20 16:49:15 -08:00
Matt Fleming 0f905a43ce x86, efi: Fix processor-specific memcpy() build error
Building for Athlon/Duron/K7 results in the following build error,

arch/x86/boot/compressed/eboot.o: In function `__constant_memcpy3d':
eboot.c:(.text+0x385): undefined reference to `_mmx_memcpy'
arch/x86/boot/compressed/eboot.o: In function `efi_main':
eboot.c:(.text+0x1a22): undefined reference to `_mmx_memcpy'

because the boot stub code doesn't link with the kernel proper, and
therefore doesn't have access to the 3DNow version of memcpy. So,
follow the example of misc.c and #undef memcpy so that we use the
version provided by misc.c.

See https://bugzilla.kernel.org/show_bug.cgi?id=50391

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: Ryan Underwood <nemesis@icequake.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: stable@vger.kernel.org
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2012-11-20 20:52:07 +00:00
Cesar Eduardo Barros caaa8c6339 x86: remove dummy long from EFI stub
Commit 2e064b1 (x86, efi: Fix issue of overlapping .reloc section for
EFI_STUB) removed a dummy reloc added by commit 291f363 (x86, efi: EFI
boot stub support), but forgot to remove the dummy long used by that
reloc.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Lee G Rosenbaum <lee.g.rosenbaum@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2012-11-20 20:17:48 +00:00
David Howells 60606d4248 Merge branch 'x86-pre-uapi' into perf-uapi
David Howells (1):
      x86: Export asm/{svm.h,vmx.h,perf_regs.h}
2012-11-19 21:50:58 +00:00
Steven Rostedt 6c8d8b3c69 x86_32: Return actual stack when requesting sp from regs
As x86_32 traps do not save sp when taken in kernel mode, we need to
accommodate the sp when requesting to get the register.

This affects kprobes.

Before:

 # echo 'p:ftrace sys_read+4 s=%sp' > /debug/tracing/kprobe_events
 # echo 1 > /debug/tracing/events/kprobes/enable
 # cat trace
            sshd-1345  [000] d...   489.117168: ftrace: (sys_read+0x4/0x70) s=b7e96768
            sshd-1345  [000] d...   489.117191: ftrace: (sys_read+0x4/0x70) s=b7e96768
             cat-1447  [000] d...   489.117392: ftrace: (sys_read+0x4/0x70) s=5a7
             cat-1447  [001] d...   489.118023: ftrace: (sys_read+0x4/0x70) s=b77ad05f
            less-1448  [000] d...   489.118079: ftrace: (sys_read+0x4/0x70) s=b7762e06
            less-1448  [000] d...   489.118117: ftrace: (sys_read+0x4/0x70) s=b7764970

After:
            sshd-1352  [000] d...   362.348016: ftrace: (sys_read+0x4/0x70) s=f3febfa8
            sshd-1352  [000] d...   362.348048: ftrace: (sys_read+0x4/0x70) s=f3febfa8
            bash-1355  [001] d...   362.348081: ftrace: (sys_read+0x4/0x70) s=f5075fa8
            sshd-1352  [000] d...   362.348082: ftrace: (sys_read+0x4/0x70) s=f3febfa8
            sshd-1352  [000] d...   362.690950: ftrace: (sys_read+0x4/0x70) s=f3febfa8
            bash-1355  [001] d...   362.691033: ftrace: (sys_read+0x4/0x70) s=f5075fa8

Link: http://lkml.kernel.org/r/1342208654.30075.22.camel@gandalf.stny.rr.com

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-19 05:31:37 -05:00
Greg Kroah-Hartman 1e619a1bf9 Merge 3.7-rc6 into tty-next 2012-11-16 18:26:00 -08:00
Takashi Iwai 29282fde80 KVM: x86: Fix invalid secondary exec controls in vmx_cpuid_update()
The commit [ad756a16: KVM: VMX: Implement PCID/INVPCID for guests with
EPT] introduced the unconditional access to SECONDARY_VM_EXEC_CONTROL,
and this triggers kernel warnings like below on old CPUs:

    vmwrite error: reg 401e value a0568000 (err 12)
    Pid: 13649, comm: qemu-kvm Not tainted 3.7.0-rc4-test2+ #154
    Call Trace:
     [<ffffffffa0558d86>] vmwrite_error+0x27/0x29 [kvm_intel]
     [<ffffffffa054e8cb>] vmcs_writel+0x1b/0x20 [kvm_intel]
     [<ffffffffa054f114>] vmx_cpuid_update+0x74/0x170 [kvm_intel]
     [<ffffffffa03629b6>] kvm_vcpu_ioctl_set_cpuid2+0x76/0x90 [kvm]
     [<ffffffffa0341c67>] kvm_arch_vcpu_ioctl+0xc37/0xed0 [kvm]
     [<ffffffff81143f7c>] ? __vunmap+0x9c/0x110
     [<ffffffffa0551489>] ? vmx_vcpu_load+0x39/0x1a0 [kvm_intel]
     [<ffffffffa0340ee2>] ? kvm_arch_vcpu_load+0x52/0x1a0 [kvm]
     [<ffffffffa032dcd4>] ? vcpu_load+0x74/0xd0 [kvm]
     [<ffffffffa032deb0>] kvm_vcpu_ioctl+0x110/0x5e0 [kvm]
     [<ffffffffa032e93d>] ? kvm_dev_ioctl+0x4d/0x4a0 [kvm]
     [<ffffffff8117dc6f>] do_vfs_ioctl+0x8f/0x530
     [<ffffffff81139d76>] ? remove_vma+0x56/0x60
     [<ffffffff8113b708>] ? do_munmap+0x328/0x400
     [<ffffffff81187c8c>] ? fget_light+0x4c/0x100
     [<ffffffff8117e1a1>] sys_ioctl+0x91/0xb0
     [<ffffffff815a942d>] system_call_fastpath+0x1a/0x1f

This patch adds a check for the availability of secondary exec
control to avoid these warnings.

Cc: <stable@vger.kernel.org> [v3.6+]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-16 20:25:18 -02:00
Rafael J. Wysocki bacaf7cd09 Revert "ACPI / x86: Add quirk for "CheckPoint P-20-00" to not use bridge _CRS_ info"
This reverts commit 0a290ac425 on
the basis of the following comment from Bjorn Helgaas:

  Here's my reasoning: this is a CheckPoint product, and it looks
  like an appliance, not really a general-purpose machine.  The issue
  has apparently been there from day one, and the kernel shipped on
  the machine complains noisily about the issue, but apparently
  nobody bothered to investigate it.

  This corruption will clearly break other ACPI-related things.  We
  can sort of work around this one (though the workaround does
  prevent us from doing any PCI resource reassignment), but we have
  no idea what the other lurking ACPI issues are (and we have no
  assurance that *only* ACPI things are broken -- maybe the
  memory corruption affects other unknown things).  It may take
  significant debugging effort to identify the next problem.

  The only report I've seen (this one) is apparently from a
  CheckPoint employee, so it's not clear that anybody else is trying
  to run upstream Linux on it.  Being a CheckPoint employee, [...]
  is probably in a position to get the BIOS fixed.

  You might still be able to convince me, but it seems like the
  benefit to a quirk for this platform is small, and it does cost
  everybody else something in code size and complexity.

References: https://bugzilla.kernel.org/show_bug.cgi?id=47981#c36
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-16 11:08:31 +01:00
Fenghua Yu a71c8bc5df x86, topology: Debug CPU0 hotplug
CONFIG_DEBUG_HOTPLUG_CPU0 is for debugging the CPU0 hotplug feature. The switch
offlines CPU0 as soon as possible and boots userspace up with CPU0 offlined.
User can online CPU0 back after boot time. The default value of the switch is
off.

To debug CPU0 hotplug, you need to enable CPU0 offline/online feature by either
turning on CONFIG_BOOTPARAM_HOTPLUG_CPU0 during compilation or giving
cpu0_hotplug kernel parameter at boot.

It's safe and early place to take down CPU0 after all hotplug notifiers
are installed and SMP is booted.

Please note that some applications or drivers, e.g. some versions of udevd,
during boot time may put CPU0 online again in this CPU0 hotplug debug mode.

In this debug mode, setup_local_APIC() may report a warning on max_loops<=0
when CPU0 is onlined back after boot time. This is because pending interrupt in
IRR can not move to ISR. The warning is not CPU0 specfic and it can happen on
other CPUs as well. It is harmless except the first CPU0 online takes a bit
longer time. And so this debug mode is useful to expose this issue. I'll send
a seperate patch to fix this generic warning issue.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1352835171-3958-15-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-14 15:28:11 -08:00
Fenghua Yu 6f5298c213 x86/i387.c: Initialize thread xstate only on CPU0 only once
init_thread_xstate() is only called once to avoid overriding xstate_size during
boot time or during CPU hotplug.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1352835171-3958-14-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-14 15:28:11 -08:00
Fenghua Yu 8d966a0410 x86, hotplug: Handle retrigger irq by the first available CPU
The first cpu in irq cfg->domain is likely to be CPU 0 and may not be available
when CPU 0 is offline. Instead of using CPU 0 to handle retriggered irq, we use
first available CPU which is online and in this irq's domain.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1352835171-3958-13-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-14 15:28:11 -08:00
Fenghua Yu 30242aa602 x86, hotplug: The first online processor saves the MTRR state
Ask the first online CPU to save mtrr instead of asking BSP. BSP could be
offline when mtrr_save_state() is called.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1352835171-3958-12-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-14 15:28:10 -08:00
Fenghua Yu 27fd185f3d x86, hotplug: During CPU0 online, enable x2apic, set_numa_node.
Previously these functions were not run on the BSP (CPU 0, the boot processor)
since the boot processor init would only be executed before this functionality
was initialized.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1352835171-3958-11-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-14 15:28:10 -08:00
Fenghua Yu e1c467e690 x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI
Instead of waiting for STARTUP after INITs, BSP will execute the BIOS boot-strap
code which is not a desired behavior for waking up BSP. To avoid the boot-strap
code, wake up CPU0 by NMI instead.

This works to wake up soft offlined CPU0 only. If CPU0 is hard offlined (i.e.
physically hot removed and then hot added), NMI won't wake it up. We'll change
this code in the future to wake up hard offlined CPU0 if real platform and
request are available.

AP is still waken up as before by INIT, SIPI, SIPI sequence.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1352896613-25957-1-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-14 15:28:03 -08:00
Andy Shevchenko 35e92b78c1 ACPI / x86: Export acpi_[un]register_gsi()
These functions might be called from modules as well so make sure
they are exported.

In addition, implement empty version of acpi_unregister_gsi() and
remove the one from pci_irq.c.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:28:00 +01:00
Mika Westerberg 06f64c8f23 driver core / ACPI: Move ACPI support to core device and driver types
With ACPI 5 we are starting to see devices that don't natively support
discovery but can be enumerated with the help of the ACPI namespace.
Typically, these devices can be represented in the Linux device driver
model as platform devices or some serial bus devices, like SPI or I2C
devices.

Since we want to re-use existing drivers for those devices, we need a
way for drivers to specify the ACPI IDs of supported devices, so that
they can be matched against device nodes in the ACPI namespace.  To
this end, it is sufficient to add a pointer to an array of supported
ACPI device IDs, that can be provided by the driver, to struct device.

Moreover, things like ACPI power management need to have access to
the ACPI handle of each supported device, because that handle is used
to invoke AML methods associated with the corresponding ACPI device
node.  The ACPI handles of devices are now stored in the archdata
member structure of struct device whose definition depends on the
architecture and includes the ACPI handle only on x86 and ia64. Since
the pointer to an array of supported ACPI IDs is added to struct
device_driver in an architecture-independent way, it is logical to
move the ACPI handle from archdata to struct device itself at the same
time.  This also makes code more straightforward in some places and
follows the example of Device Trees that have a poiter to struct
device_node in there too.

This changeset is based on Mika Westerberg's work.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:28:00 +01:00
Kristen Carlson Accardi 1bad2f19f7 ACPI / Sleep: add acpi_sleep=nonvs_s3 parameter
The ACPI specificiation would like us to save NVS at hibernation time,
but makes no mention of saving NVS over S3.  Not all versions of
Windows do this either, and it is clear that not all machines need NVS
saved/restored over S3.  Allow the user to improve their suspend/resume
time by disabling the NVS save/restore at S3 time, but continue to do
the NVS save/restore for S4 as specified.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:16:02 +01:00
Feng Tang 0a290ac425 ACPI / x86: Add quirk for "CheckPoint P-20-00" to not use bridge _CRS_ info
This is to fix a regression https://bugzilla.kernel.org/show_bug.cgi?id=47981

The CheckPoint P-20-00 works ok before new machines (2008 and later) are
forced to use the bridge _CRS info by default in 2.6.34. Add this quirk
to restore its old way of working: not using bridge _CRS info.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:16:01 +01:00
Joonsoo Kim ddd32b4289 x86, mm: Correct vmflag test for checking VM_HUGETLB
commit 611ae8e3f5204f7480b3b405993b3352cfa16662('enable tlb flush range
support for x86') change flush_tlb_mm_range() considerably. After this,
we test whether vmflag equal to VM_HUGETLB and it may be always failed,
because vmflag usually has other flags simultaneously.
Our intention is to check whether this vma is for hughtlb, so correct it
according to this purpose.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Acked-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1352740656-19417-1-git-send-email-js1304@gmail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-14 15:03:20 -08:00
Greg Kroah-Hartman 54d5f88f25 Merge v3.7-rc5 into tty-next
This pulls in the 3.7-rc5 fixes into tty-next to make it easier to test.
2012-11-14 12:30:12 -08:00
Fenghua Yu 3e2a0cc3cd x86-32, hotplug: Add start_cpu0() entry point to head_32.S
start_cpu0() is defined in head_32.S for 32-bit. The function sets up stack and
jumps to start_secondary() for CPU0 wake up.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1352835171-3958-9-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-14 09:39:52 -08:00
Fenghua Yu 42e78e9719 x86-64, hotplug: Add start_cpu0() entry point to head_64.S
start_cpu0() is defined in head_64.S for 64-bit. The function sets up stack and
jumps to start_secondary() for CPU0 wake up.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1352835171-3958-8-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-14 09:39:51 -08:00
Fenghua Yu 209efae129 x86, hotplug, suspend: Online CPU0 for suspend or hibernate
Because x86 BIOS requires CPU0 to resume from sleep, suspend or hibernate can't
be executed if CPU0 is detected offline. To make suspend or hibernate and
further resume succeed, CPU0 must be online.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1352835171-3958-6-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-14 09:39:49 -08:00
Fenghua Yu 30106c1743 x86, hotplug: Support functions for CPU0 online/offline
Add smp_store_boot_cpu_info() to store cpu info for BSP during boot time.

Now smp_store_cpu_info() stores cpu info for bringing up BSP or AP after
it's offline.

Continue to online CPU0 in native_cpu_up().

Continue to offline CPU0 in native_cpu_disable().

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1352835171-3958-5-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-14 09:39:48 -08:00
Fenghua Yu 4d25031a81 x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it
If CONFIG_BOOTPARAM_HOTPLUG_CPU is turned on, CPU0 hotplug feature is enabled
by default.

If CONFIG_BOOTPARAM_HOTPLUG_CPU is not turned on, CPU0 hotplug feature is not
enabled by default. The kernel parameter cpu0_hotplug can enable CPU0 hotplug
feature at boot.

Currently the feature is supported on Intel platforms only.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1352835171-3958-4-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-14 09:39:47 -08:00
Fenghua Yu 80aa1dff65 x86, Kconfig: Add config switch for CPU0 hotplug
New config switch CONFIG_BOOTPARAM_HOTPLUG_CPU0 sets default state of whether
the CPU0 hotplug is on or off.

If the switch is off, CPU0 is not hotpluggable by default. But the CPU0 hotplug
feature can still be turned on by kernel parameter cpu0_hotplug at boot.

If the switch is on, CPU0 is always hotpluggable.

The default value of the switch is off.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1352835171-3958-3-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-14 09:39:46 -08:00
David Sharp 8be0709f10 tracing: Format non-nanosec times from tsc clock without a decimal point.
With the addition of the "tsc" clock, formatting timestamps to look like
fractional seconds is misleading. Mark clocks as either in nanoseconds or
not, and format non-nanosecond timestamps as decimal integers.

Tested:
$ cd /sys/kernel/debug/tracing/
$ cat trace_clock
[local] global tsc
$ echo sched_switch > set_event
$ echo 1 > tracing_on ; sleep 0.0005 ; echo 0 > tracing_on
$ cat trace
          <idle>-0     [000]  6330.555552: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
           sleep-29964 [000]  6330.555628: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
$ echo 1 > options/latency-format
$ cat trace
  <idle>-0       0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
   sleep-29964   0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
$ echo tsc > trace_clock
$ cat trace
$ echo 1 > tracing_on ; sleep 0.0005 ; echo 0 > tracing_on
$ echo 0 > options/latency-format
$ cat trace
          <idle>-0     [000] 16490053398357: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
           sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
echo 1 > options/latency-format
$ cat trace
  <idle>-0       0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
   sleep-31128   0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...

v2:
Move arch-specific bits out of generic code.
v4:
Fix x86_32 build due to 64-bit division.

Google-Bug-Id: 6980623
Link: http://lkml.kernel.org/r/1352837903-32191-2-git-send-email-dhsharp@google.com

Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: David Sharp <dhsharp@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-13 15:48:40 -05:00
David Sharp 8cbd9cc625 tracing,x86: Add a TSC trace_clock
In order to promote interoperability between userspace tracers and ftrace,
add a trace_clock that reports raw TSC values which will then be recorded
in the ring buffer. Userspace tracers that also record TSCs are then on
exactly the same time base as the kernel and events can be unambiguously
interlaced.

Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large
timestamp values.

v2:
Move arch-specific bits out of generic code.
v3:
Rename "x86-tsc", cleanups
v7:
Generic arch bits in Kbuild.

Google-Bug-Id: 6980623
Link: http://lkml.kernel.org/r/1352837903-32191-1-git-send-email-dhsharp@google.com

Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Signed-off-by: David Sharp <dhsharp@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-13 15:48:27 -05:00
Andreas Herrmann 27d3a8a26a x86, cacheinfo: Base cache sharing info on CPUID 0x8000001d on AMD
The patch is based on a patch submitted by Hans Rosenfeld.
See http://marc.info/?l=linux-kernel&m=133908777200931

Note that  CPUID Fn8000_001D_EAX slightly differs to Intel's CPUID function 4.

Bits 14-25 contain NumSharingCache. Actual number of cores sharing
           this cache. SW to add value of one to get result.

The corresponding bits on Intel are defined as "maximum number of threads
sharing this cache" (with a "plus 1" encoding).

Thus a different method to determine which cores are sharing a cache
level has to be used.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Link: http://lkml.kernel.org/r/20121019090209.GG26718@alberich
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-13 11:22:31 -08:00
Andreas Herrmann 2e8458dfe4 x86, cacheinfo: Make use of CPUID 0x8000001d for cache information on AMD
Rely on CPUID 0x8000001d for cache information when AMD CPUID topology
extensions are available.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Link: http://lkml.kernel.org/r/20121019090049.GF26718@alberich
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-13 11:22:30 -08:00
Andreas Herrmann 04a1541828 x86, cacheinfo: Determine number of cache leafs using CPUID 0x8000001d on AMD
CPUID 0x8000001d works quite similar to Intels' CPUID function 4.
Use it to determine number of cache leafs.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Link: http://lkml.kernel.org/r/20121019085933.GE26718@alberich
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-13 11:22:29 -08:00
Andreas Herrmann 193f3fcb3a x86: Add cpu_has_topoext
Introduce cpu_has_topoext to check for AMD's CPUID topology extensions
support. It indicates support for
CPUID Fn8000_001D_EAX_x[N:0]-CPUID Fn8000_001E_EDX

See AMD's CPUID Specification, Publication # 25481
(as of Rev. 2.34 September 2010)

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Link: http://lkml.kernel.org/r/20121019085813.GD26718@alberich
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-13 11:22:28 -08:00
Ingo Molnar 226f69a4b7 Fix problem in CMCI rediscovery code that was illegally
migrating worker threads to other cpus.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQkEqpAAoJEKurIx+X31iBZk0P/2h4IkLYz7DspI9gxVMXfMEm
 0lIWWIEaqAbOkFsi8VuGjlNrgU+7PabKs/2/++tfbq+hJdQYCCxyAKCGeWbdBw/R
 fUSTiyQYH84DEFySg6G1AJQwVB8nnRLNWm5wrUtMgX9/2E6D5dpFB0F301XLF+kg
 OMY7RaFPWJRiWwlOnWWnbY3czNMragaTAyHIudj7ZvsgwBNWw3bgGY/sjIjJ3yy5
 kyz0gYEsanOizSjT6Udr2MPFY2ol11co1MT6Ro4r7ORCvX2wSUTChUks2kZBzJ7l
 Jf9g22ymVlvAo2qsCs/DBzRwXw/Ck0MlUMH8QehvMPLD39yoBiUYDeEqRpadmsQE
 FLDyKBoxaH6nRzGCDJlTzD2FogHnChQaUtQ9nnyoSBNOjYt2lI8Dc3jEnXwWprim
 3P2giL10Gf4LRdHSjHZp/6+kXzbTKqNIs1qfSMPz0GDcujAmTYJ8edyHI7fme5So
 BgoSTBtjorxShNQjtg7fBVl3dp3oOnAFyOxDwToLUHWAVZKcXewQh5HkbgIawul4
 YoiAsveP2FBCKbJA2xBEbI2S3hMKgRauAvh33JNucgZOM7RqPwkCpiAARzbD6mpR
 tDNqhgXJZ+0F/3prIm4MzapaIivrlQ+LLxvVDTOYQtZyJi1Ba914zw+yUY2VMMHM
 IvWy1qsmB77XxhmvgWj5
 =tv13
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-tangchen' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/urgent

Pull MCE fix from Tony Luck:

   "Fix problem in CMCI rediscovery code that was illegally
    migrating worker threads to other cpus."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-11-13 19:01:01 +01:00
Ingo Molnar 745040347d Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent
Pull syscall tracing fix from Paul E. McKenney.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-11-13 18:58:39 +01:00
Linus Torvalds 9924a1992a Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fix from Marcelo Tosatti:
 "A correction for user triggerable oops"

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: invalid opcode oops on SET_SREGS with OSXSAVE bit set (CVE-2012-4461)
2012-11-12 17:37:53 -08:00
Petr Matousek 6d1068b3a9 KVM: x86: invalid opcode oops on SET_SREGS with OSXSAVE bit set (CVE-2012-4461)
On hosts without the XSAVE support unprivileged local user can trigger
oops similar to the one below by setting X86_CR4_OSXSAVE bit in guest
cr4 register using KVM_SET_SREGS ioctl and later issuing KVM_RUN
ioctl.

invalid opcode: 0000 [#2] SMP
Modules linked in: tun ip6table_filter ip6_tables ebtable_nat ebtables
...
Pid: 24935, comm: zoog_kvm_monito Tainted: G      D      3.2.0-3-686-pae
EIP: 0060:[<f8b9550c>] EFLAGS: 00210246 CPU: 0
EIP is at kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm]
EAX: 00000001 EBX: 000f387e ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: ef5a0060 ESP: d7c63e70
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process zoog_kvm_monito (pid: 24935, ti=d7c62000 task=ed84a0c0
task.ti=d7c62000)
Stack:
 00000001 f70a1200 f8b940a9 ef5a0060 00000000 00200202 f8769009 00000000
 ef5a0060 000f387e eda5c020 8722f9c8 00015bae 00000000 ed84a0c0 ed84a0c0
 c12bf02d 0000ae80 ef7f8740 fffffffb f359b740 ef5a0060 f8b85dc1 0000ae80
Call Trace:
 [<f8b940a9>] ? kvm_arch_vcpu_ioctl_set_sregs+0x2fe/0x308 [kvm]
...
 [<c12bfb44>] ? syscall_call+0x7/0xb
Code: 89 e8 e8 14 ee ff ff ba 00 00 04 00 89 e8 e8 98 48 ff ff 85 c0 74
1e 83 7d 48 00 75 18 8b 85 08 07 00 00 31 c9 8b 95 0c 07 00 00 <0f> 01
d1 c7 45 48 01 00 00 00 c7 45 1c 01 00 00 00 0f ae f0 89
EIP: [<f8b9550c>] kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm] SS:ESP
0068:d7c63e70

QEMU first retrieves the supported features via KVM_GET_SUPPORTED_CPUID
and then sets them later. So guest's X86_FEATURE_XSAVE should be masked
out on hosts without X86_FEATURE_XSAVE, making kvm_set_cr4 with
X86_CR4_OSXSAVE fail. Userspaces that allow specifying guest cpuid with
X86_FEATURE_XSAVE even on hosts that do not support it, might be
susceptible to this attack from inside the guest as well.

Allow setting X86_CR4_OSXSAVE bit only if host has XSAVE support.

Signed-off-by: Petr Matousek <pmatouse@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-12 21:16:45 -02:00
Linus Torvalds 0020dd0b8c Bug-fixes:
* Fix compile issues on ARM.
  * Fix hypercall fallback code for old hypervisors.
  * Print out which HVM parameter failed if it fails.
  * Fix idle notifier call after irq_enter.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJQnQdGAAoJEFjIrFwIi8fJPBAIAMX1HRx3udqhv7fziynZvFTb
 hj47XYIJHOK7P4fK7vZoSNgMHjL6LW5cUqC8VN67G3zUSkX9JYFsPBj6v4bWn+rG
 b9CS+MW7hS80LGbbqkh1F+YSEfZ863RlF9PPX2acaHTw49MlIgIqwhxIo6hy+Nm6
 thu6SlbEIJkSUdhbYMOAmy5aH/3+UuuQg+oq3P7mzV8fZjEihnrrF0NlT4wOZK1o
 gsfrKYKJLVT526W9PF/L23/A/MCHMpvjNStpaDLOGNjV9sBMpJI8JRax6+657+q1
 0kXvN5mAwTKWOaXBl4LEC9R8n1IKB91TgOY6HJAcXkb1eoP5KAeNSmU8RbsZ2T0=
 =XZ+0
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen fixes from Konrad Rzeszutek Wilk:
 "There are three ARM compile fixes (we forgot to export certain
  functions and if the drivers are built as an module - we go belly-up).

  There is also an mismatch of irq_enter() / exit_idle() calls sequence
  which were fixed some time ago in other piece of codes, but failed to
  appear in the Xen code.

  Lastly a fix for to help in the field with troubleshooting in case we
  cannot get the appropriate parameter and also fallback code when
  working with very old hypervisors."

Bug-fixes:
 - Fix compile issues on ARM.
 - Fix hypercall fallback code for old hypervisors.
 - Print out which HVM parameter failed if it fails.
 - Fix idle notifier call after irq_enter.

* tag 'stable/for-linus-3.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/arm: Fix compile errors when drivers are compiled as modules (export more).
  xen/arm: Fix compile errors when drivers are compiled as modules.
  xen/generic: Disable fallback build on ARM.
  xen/events: fix RCU warning, or Call idle notifier after irq_enter()
  xen/hvm: If we fail to fetch an HVM parameter print out which flag it is.
  xen/hypercall: fix hypercall fallback code for very old hypervisors
2012-11-10 06:56:21 +01:00
David Howells 6d369a09cc x86: Export asm/{svm.h,vmx.h,perf_regs.h}
Export asm/{svm.h,vmx.h,perf_regs.h} so that they can be disintegrated.

It looks from previous commits that the first two should have been exported,
but the header-y lines weren't added to the Kbuild.

I'm guessing that asm/perf_regs.h should be exported too.

Signed-off-by: David Howells <dhowells@redhat.com>
2012-11-08 11:38:44 +00:00
Jan Beulich cf47a83fb0 xen/hypercall: fix hypercall fallback code for very old hypervisors
While copying the argument structures in HYPERVISOR_event_channel_op()
and HYPERVISOR_physdev_op() into the local variable is sufficiently
safe even if the actual structure is smaller than the container one,
copying back eventual output values the same way isn't: This may
collide with on-stack variables (particularly "rc") which may change
between the first and second memcpy() (i.e. the second memcpy() could
discard that change).

Move the fallback code into out-of-line functions, and handle all of
the operations known by this old a hypervisor individually: Some don't
require copying back anything at all, and for the rest use the
individual argument structures' sizes rather than the container's.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
[v2: Reduce #define/#undef usage in HYPERVISOR_physdev_op_compat().]
[v3: Fix compile errors when modules use said hypercalls]
[v4: Add xen_ prefix to the HYPERCALL_..]
[v5: Alter the name and only EXPORT_SYMBOL_GPL one of them]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-11-04 10:40:42 -05:00
Oleg Nesterov 4dc316c645 uprobes/x86: Cleanup the single-stepping code
No functional changes.

Now that default arch_uprobe_enable/disable_step() helpers do nothing,
x86 has no reason to reimplement them. Change arch_uprobe_*_xol() hooks
to do the necessary work and remove the x86-specific hooks.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-11-03 17:15:12 +01:00
Jan Beulich 5074b85bdd x86: hpet: Fix inverted return value check in arch_setup_hpet_msi()
setup_hpet_msi_remapped() returns a negative error indicator on error
- check for this rather than for a boolean false indication, and pass
on that error code rather than a meaningless "-1".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Link: http://lkml.kernel.org/r/5093E00D02000078000A60E2@nat28.tlf.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-11-02 22:53:27 +01:00
Jan Beulich 6acf5a8c93 x86: hpet: Fix masking of MSI interrupts
HPET_TN_FSB is not a proper mask bit; it merely toggles between MSI and
legacy interrupt delivery. The proper mask bit is HPET_TN_ENABLE, so
use both bits when (un)masking the interrupt.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/5093E09002000078000A60E6@nat28.tlf.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-11-02 22:53:27 +01:00
Linus Torvalds 66b6a0c979 Bug-fixes:
* Use appropriate macros instead of hand-rolling our own (ARM).
  * Fixes if FB/KBD closed unexpectedly.
  * Fix memory leak in /dev/gntdev ioctl calls.
  * Fix overflow check in xenbus_file_write.
  * Document cleanup.
  * Performance optimization when migrating guests.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJQk9ngAAoJEFjIrFwIi8fJXOcH/jEmTaV2rbUCCnnivlQGj5B2
 AAXt03MM2F7Ohifo8IEHDhvJlUqQnglQq4wcku/8X/bqSkxtqJMfa/UAStmS2e6r
 605msiMws/GKiDPgKywWHjMPk7JJow/T7du9mpT2Swla12+DXc7e0P6Sqm6qGtB5
 tCBFYe3CS+j8Xi/siPhveAoLoDVmC8RpNzV8EWBdUKhNeD6U4s5M3+ChVexOrB/6
 43YkzurkY/FOsP+8YhNnKFSFrpYleRB1GdFcr8PN5mv85sNKts7vHCb4qJFzZdbk
 BMImdLrTUnKArE4y4FS0iqabOTGXaUplEXfyxDw5hweESGa1qzrd29ocyMQ5p/U=
 =LQxc
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.7-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen bugfixes from Konrad Rzeszutek Wilk:
 - Use appropriate macros instead of hand-rolling our own (ARM).
 - Fixes if FB/KBD closed unexpectedly.
 - Fix memory leak in /dev/gntdev ioctl calls.
 - Fix overflow check in xenbus_file_write.
 - Document cleanup.
 - Performance optimization when migrating guests.

* tag 'stable/for-linus-3.7-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/mmu: Use Xen specific TLB flush instead of the generic one.
  xen/arm: use the __HVC macro
  xen/xenbus: fix overflow check in xenbus_file_write()
  xen-kbdfront: handle backend CLOSED without CLOSING
  xen-fbfront: handle backend CLOSED without CLOSING
  xen/gntdev: don't leak memory from IOCTL_GNTDEV_MAP_GRANT_REF
  x86: remove obsolete comment from asm/xen/hypervisor.h
2012-11-02 13:26:11 -07:00
Salman Qazi 28696f434f x86: Don't clobber top of pt_regs in nested NMI
The nested NMI modifies the place (instruction, flags and stack)
that the first NMI will iret to.  However, the copy of registers
modified is exactly the one that is the part of pt_regs in
the first NMI.  This can change the behaviour of the first NMI.

In particular, Google's arch_trigger_all_cpu_backtrace handler
also prints regions of memory surrounding addresses appearing in
registers.  This results in handled exceptions, after which nested NMIs
start coming in.  These nested NMIs change the value of registers
in pt_regs.  This can cause the original NMI handler to produce
incorrect output.

We solve this problem by interchanging the position of the preserved
copy of the iret registers ("saved") and the copy subject to being
trampled by nested NMI ("copied").

Link: http://lkml.kernel.org/r/20121002002919.27236.14388.stgit@dungbeetle.mtv.corp.google.com

Signed-off-by: Salman Qazi <sqazi@google.com>
[ Added a needed CFI_ADJUST_CFA_OFFSET ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-02 11:29:36 -04:00
Suresh Siddha 279f146143 x86: apic: Use tsc deadline for oneshot when available
If the TSC deadline mode is supported, LAPIC timer one-shot mode can be
implemented using IA32_TSC_DEADLINE MSR. An interrupt will be generated
when the TSC value equals or exceeds the value in the IA32_TSC_DEADLINE
MSR.

This enables us to skip the APIC calibration during boot. Also, in
xapic mode, this enables us to skip the uncached apic access to re-arm
the APIC timer.

As this timer ticks at the high frequency TSC rate, we use the
TSC_DIVISOR (32) to work with the 32-bit restrictions in the
clockevent API's to avoid 64-bit divides etc (frequency is u32 and
"unsigned long" in the set_next_event(), max_delta limits the next
event to 32-bit for 32-bit kernel).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: venki@google.com
Cc: len.brown@intel.com
Link: http://lkml.kernel.org/r/1350941878.6017.31.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-11-02 11:23:37 +01:00
Xiao Guangrong 87da7e66a4 KVM: x86: fix vcpu->mmio_fragments overflow
After commit b3356bf0db (KVM: emulator: optimize "rep ins" handling),
the pieces of io data can be collected and write them to the guest memory
or MMIO together

Unfortunately, kvm splits the mmio access into 8 bytes and store them to
vcpu->mmio_fragments. If the guest uses "rep ins" to move large data, it
will cause vcpu->mmio_fragments overflow

The bug can be exposed by isapc (-M isapc):

[23154.818733] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[ ......]
[23154.858083] Call Trace:
[23154.859874]  [<ffffffffa04f0e17>] kvm_get_cr8+0x1d/0x28 [kvm]
[23154.861677]  [<ffffffffa04fa6d4>] kvm_arch_vcpu_ioctl_run+0xcda/0xe45 [kvm]
[23154.863604]  [<ffffffffa04f5a1a>] ? kvm_arch_vcpu_load+0x17b/0x180 [kvm]

Actually, we can use one mmio_fragment to store a large mmio access then
split it when we pass the mmio-exit-info to userspace. After that, we only
need two entries to store mmio info for the cross-mmio pages access

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-10-31 20:36:30 -02:00
Andre Przywara 2bbf0a1427 x86, amd: Disable way access filter on Piledriver CPUs
The Way Access Filter in recent AMD CPUs may hurt the performance of
some workloads, caused by aliasing issues in the L1 cache.
This patch disables it on the affected CPUs.

The issue is similar to that one of last year:
http://lkml.indiana.edu/hypermail/linux/kernel/1107.3/00041.html
This new patch does not replace the old one, we just need another
quirk for newer CPUs.

The performance penalty without the patch depends on the
circumstances, but is a bit less than the last year's 3%.

The workloads affected would be those that access code from the same
physical page under different virtual addresses, so different
processes using the same libraries with ASLR or multiple instances of
PIE-binaries. The code needs to be accessed simultaneously from both
cores of the same compute unit.

More details can be found here:
http://developer.amd.com/Assets/SharedL1InstructionCacheonAMD15hCPU.pdf

CPUs affected are anything with the core known as Piledriver.
That includes the new parts of the AMD A-Series (aka Trinity) and the
just released new CPUs of the FX-Series (aka Vishera).
The model numbering is a bit odd here: FX CPUs have model 2,
A-Series has model 10h, with possible extensions to 1Fh. Hence the
range of model ids.

Signed-off-by: Andre Przywara <osp@andrep.de>
Link: http://lkml.kernel.org/r/1351700450-9277-1-git-send-email-osp@andrep.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-10-31 13:06:55 -07:00
Konrad Rzeszutek Wilk 95a7d76897 xen/mmu: Use Xen specific TLB flush instead of the generic one.
As Mukesh explained it, the MMUEXT_TLB_FLUSH_ALL allows the
hypervisor to do a TLB flush on all active vCPUs. If instead
we were using the generic one (which ends up being xen_flush_tlb)
we end up making the MMUEXT_TLB_FLUSH_LOCAL hypercall. But
before we make that hypercall the kernel will IPI all of the
vCPUs (even those that were asleep from the hypervisor
perspective). The end result is that we needlessly wake them
up and do a TLB flush when we can just let the hypervisor
do it correctly.

This patch gives around 50% speed improvement when migrating
idle guest's from one host to another.

Oracle-bug: 14630170

CC: stable@vger.kernel.org
Tested-by:  Jingjie Jiang <jingjie.jiang@oracle.com>
Suggested-by:  Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-10-31 12:38:31 -04:00
Tang Chen 85b97637bb x86/mce: Do not change worker's running cpu in cmci_rediscover().
cmci_rediscover() used set_cpus_allowed_ptr() to change the current process's
running cpu, and migrate itself to the dest cpu. But worker processes are not
allowed to be migrated. If current is a worker, the worker will be migrated to
another cpu, but the corresponding  worker_pool is still on the original cpu.

In this case, the following BUG_ON in try_to_wake_up_local() will be triggered:
BUG_ON(rq != this_rq());

This will cause the kernel panic. The call trace is like the following:

[ 6155.451107] ------------[ cut here ]------------
[ 6155.452019] kernel BUG at kernel/sched/core.c:1654!
......
[ 6155.452019] RIP: 0010:[<ffffffff810add15>]  [<ffffffff810add15>] try_to_wake_up_local+0x115/0x130
......
[ 6155.452019] Call Trace:
[ 6155.452019]  [<ffffffff8166fc14>] __schedule+0x764/0x880
[ 6155.452019]  [<ffffffff81670059>] schedule+0x29/0x70
[ 6155.452019]  [<ffffffff8166de65>] schedule_timeout+0x235/0x2d0
[ 6155.452019]  [<ffffffff810db57d>] ? mark_held_locks+0x8d/0x140
[ 6155.452019]  [<ffffffff810dd463>] ? __lock_release+0x133/0x1a0
[ 6155.452019]  [<ffffffff81671c50>] ? _raw_spin_unlock_irq+0x30/0x50
[ 6155.452019]  [<ffffffff810db8f5>] ? trace_hardirqs_on_caller+0x105/0x190
[ 6155.452019]  [<ffffffff8166fefb>] wait_for_common+0x12b/0x180
[ 6155.452019]  [<ffffffff810b0b30>] ? try_to_wake_up+0x2f0/0x2f0
[ 6155.452019]  [<ffffffff8167002d>] wait_for_completion+0x1d/0x20
[ 6155.452019]  [<ffffffff8110008a>] stop_one_cpu+0x8a/0xc0
[ 6155.452019]  [<ffffffff810abd40>] ? __migrate_task+0x1a0/0x1a0
[ 6155.452019]  [<ffffffff810a6ab8>] ? complete+0x28/0x60
[ 6155.452019]  [<ffffffff810b0fd8>] set_cpus_allowed_ptr+0x128/0x130
[ 6155.452019]  [<ffffffff81036785>] cmci_rediscover+0xf5/0x140
[ 6155.452019]  [<ffffffff816643c0>] mce_cpu_callback+0x18d/0x19d
[ 6155.452019]  [<ffffffff81676187>] notifier_call_chain+0x67/0x150
[ 6155.452019]  [<ffffffff810a03de>] __raw_notifier_call_chain+0xe/0x10
[ 6155.452019]  [<ffffffff81070470>] __cpu_notify+0x20/0x40
[ 6155.452019]  [<ffffffff810704a5>] cpu_notify_nofail+0x15/0x30
[ 6155.452019]  [<ffffffff81655182>] _cpu_down+0x262/0x2e0
[ 6155.452019]  [<ffffffff81655236>] cpu_down+0x36/0x50
[ 6155.452019]  [<ffffffff813d3eaa>] acpi_processor_remove+0x50/0x11e
[ 6155.452019]  [<ffffffff813a6978>] acpi_device_remove+0x90/0xb2
[ 6155.452019]  [<ffffffff8143cbec>] __device_release_driver+0x7c/0xf0
[ 6155.452019]  [<ffffffff8143cd6f>] device_release_driver+0x2f/0x50
[ 6155.452019]  [<ffffffff813a7870>] acpi_bus_remove+0x32/0x6d
[ 6155.452019]  [<ffffffff813a7932>] acpi_bus_trim+0x87/0xee
[ 6155.452019]  [<ffffffff813a7a21>] acpi_bus_hot_remove_device+0x88/0x16b
[ 6155.452019]  [<ffffffff813a33ee>] acpi_os_execute_deferred+0x27/0x34
[ 6155.452019]  [<ffffffff81090589>] process_one_work+0x219/0x680
[ 6155.452019]  [<ffffffff81090528>] ? process_one_work+0x1b8/0x680
[ 6155.452019]  [<ffffffff813a33c7>] ? acpi_os_wait_events_complete+0x23/0x23
[ 6155.452019]  [<ffffffff810923be>] worker_thread+0x12e/0x320
[ 6155.452019]  [<ffffffff81092290>] ? manage_workers+0x110/0x110
[ 6155.452019]  [<ffffffff81098396>] kthread+0xc6/0xd0
[ 6155.452019]  [<ffffffff8167c4c4>] kernel_thread_helper+0x4/0x10
[ 6155.452019]  [<ffffffff81671f30>] ? retint_restore_args+0x13/0x13
[ 6155.452019]  [<ffffffff810982d0>] ? __init_kthread_worker+0x70/0x70
[ 6155.452019]  [<ffffffff8167c4c0>] ? gs_change+0x13/0x13

This patch removes the set_cpus_allowed_ptr() call, and put the cmci rediscover
jobs onto all the other cpus using system_wq. This could bring some delay for
the jobs.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-10-30 14:38:12 -07:00
Olaf Hering b6514633bd x86: remove obsolete comment from asm/xen/hypervisor.h
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-10-30 09:27:32 -04:00
Maxime Bizon 37aeec3622 x86/ce4100: Fix PCI configuration register access for devices without interrupts
Some CE4100 devices such as the:

 - DFX module (01:0b.7)
 - entertainment encryption device (01:10.0)
 - multimedia controller (01:12.0)

do not have a device interrupt at all.

This patch fixes the PCI controller code to declare the missing
PCI configuration register space, as well as a fixup method for
forcing the interrupt pin to be 0 for these devices. This is
required to ensure that pci drivers matching on these devices
will be able to honor the various PCI subsystem calls touching
the configuration space.

Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Florian Fainelli <ffainelli@freebox.fr>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: rui.zhang@intel.com
Cc: alan@linux.intel.com
Link: http://lkml.kernel.org/r/1351518020-25556-4-git-send-email-ffainelli@freebox.fr
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-30 10:16:47 +01:00
Maxime Bizon d795991602 x86/ce4100: Fix reboot by forcing the reboot method to be KBD
The default reboot is via ACPI for this platform, and the CEFDK
bootloader actually supports this, but will issue a system power
off instead of a real reboot. Setting the reboot method to be
KBD instead of ACPI ensures proper system reboot.

Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Florian Fainelli <ffainelli@freebox.fr>
Cc: rui.zhang@intel.com
Cc: alan@linux.intel.com
Link: http://lkml.kernel.org/r/1351518020-25556-3-git-send-email-ffainelli@freebox.fr
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-30 10:16:46 +01:00
Florian Fainelli f49f4ab95c x86/ce4100: Fix pm_poweroff
The CE4100 platform is currently missing a proper pm_poweroff
implementation leading to poweroff making the CPU spin forever
and the CE4100 platform does not enter a low-power mode where
the external Power Management Unit can properly power off the
system. Power off on this platform is implemented pretty much
like reboot, by writing to the SoC built-in 8051 microcontroller
mapped at I/O port 0xcf9, the value 0x4.

Signed-off-by: Florian Fainelli <ffainelli@freebox.fr>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: rui.zhang@intel.com
Cc: alan@linux.intel.com
Link: http://lkml.kernel.org/r/1351518020-25556-2-git-send-email-ffainelli@freebox.fr
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-30 10:16:46 +01:00
Peter Huewe 95d18aa2b6 perf/x86: Fix sparse warnings
FYI, there are new sparse warnings:

 arch/x86/kernel/cpu/perf_event.c:1356:18: sparse: symbol 'events_attr' was not declared. Should it be static?

This patch makes it static and also adds the static keyword to
fix arch/x86/kernel/cpu/perf_event.c:1344:9: warning: symbol
'events_sysfs_show' was not declared.

Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: fengguang.wu@intel.com
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/n/tip-lerdpXlnruh0yvWs2owwuizl@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-30 10:10:52 +01:00
Andreas Herrmann 943482d07e x86, microcode_amd: Change email addresses, MAINTAINERS entry
Signed-off-by: Andreas Herrmann <herrmann.der.user@googlemail.com>
Cc: lm-sensors@lm-sensors.org
Cc: oprofile-list@lists.sf.net
Cc: Stephane Eranian <eranian@google.com>
Cc: Robert Richter <rric@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jorg Roedel <joro@8bytes.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/20121029175138.GC5024@tweety
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-30 10:05:52 +01:00
Borislav Petkov e6d41e8c69 x86, AMD: Change Boris' email address
Move to private email and put in maintained status.

Signed-off-by: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1351532410-4887-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-30 10:05:50 +01:00
Greg Kroah-Hartman ca364d8388 Merge 3.7-rc3 into tty-next
This merges the tty changes in 3.7-rc3 into tty-next

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-29 09:00:57 -07:00
Frederic Weisbecker 2c5594df34 rcu: Fix unrecovered RCU user mode in syscall_trace_leave()
On x86-64 syscall exit, 3 non exclusive events may happen
looping in the following order:

1) Check if we need resched for user preemption, if so call
schedule_user()

2) Check if we have pending signals, if so call do_notify_resume()

3) Check if we do syscall tracing, if so call syscall_trace_leave()

However syscall_trace_leave() has been written assuming it directly
follows the syscall and forget about the above possible 1st and 2nd
steps.

Now schedule_user() and do_notify_resume() exit in RCU user mode
because they have most chances to resume userspace immediately and
this avoids an rcu_user_enter() call in the syscall fast path.

So by the time we call syscall_trace_leave(), we may well be in RCU
user mode. To fix this up, simply call rcu_user_exit() in the beginning
of this function.

This fixes some reported RCU uses in extended quiescent state.

Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-10-27 15:42:00 -07:00
Linus Torvalds 622f202a4c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "This fixes a couple of nasty page table initialization bugs which were
  causing kdump regressions.  A clean rearchitecturing of the code is in
  the works - meanwhile these are reverts that restore the
  best-known-working state of the kernel.

  There's also EFI fixes and other small fixes."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, mm: Undo incorrect revert in arch/x86/mm/init.c
  x86: efi: Turn off efi_enabled after setup on mixed fw/kernel
  x86, mm: Find_early_table_space based on ranges that are actually being mapped
  x86, mm: Use memblock memory loop instead of e820_RAM
  x86, mm: Trim memory in memblock to be page aligned
  x86/irq/ioapic: Check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt
  x86/efi: Fix oops caused by incorrect set_memory_uc() usage
  x86-64: Fix page table accounting
  Revert "x86/mm: Fix the size calculation of mapping tables"
  MAINTAINERS: Add EFI git repository location
2012-10-26 09:35:46 -07:00
Daniel Lezcano 4d0e42cc66 x86: Remove dead hlt_use_halt code
The hlt_use_halt function returns always true and there is only
one definition of it.

The default_idle function can then get ride of the if ...
statement and we can remove the else branch.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: linaro-dev@lists.linaro.org
Cc: patches@linaro.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1351181591-8710-1-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-26 12:25:03 +02:00
Bin Gao 6e87f9b7e4 arch/x86/Kconfig: Allow turning off CONFIG_X86_MPPARSE when either ACPI or SFI is present
MPS tables are not needed for systems that have proper ACPI
support. This is also true for systems that have SFI in place.

So this patch allows the configuration (turning off) of
CONFIG_X86_MPPARSE when either ACPI or SFI is present.

Signed-off-by: Bin Gao <bin.gao@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Bin Gao <bin.gao@linux.intel.com>
Link: http://lkml.kernel.org/r/20121025163544.GB38477@bingao-desk1.fm.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-26 12:20:38 +02:00
Ingo Molnar 8b724e2a12 EFI updates for 3.7
Fix oops with EFI variables on mixed 32/64-bit firmware/kernels and
 document EFI git repository location on kernel.org.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQiatJAAoJEC84WcCNIz1VaCwP/RkNLYRGruxPFD0gf9ainwVQ
 qSXfpYLmpZeU+TcIBx6aG7PjzQsitvMU9YBtqvIN5uoYHuDjy2vDT52WBqg2Fn5k
 HYW13m/Pex+kCEpV4n6uYi9NM5mJeR/+QlpfQOcofxuqsvG6WUY1l55SF5+V/Dd/
 Dv94yO21JNiBumPM7KadFl5EIZ53j8OdQhEJB0jnomC0cDAWnbIHk97XPrSp6+rf
 03AQrYLnDNHq0HJo44LdoJleiRuxHBC6FrhCsrctvpVLd6iVNIGbJupNTBPvvAxl
 zY4aBoYym87uYo6y3LMevD+L2fkTC3qE6iQilYVbShkoYLnDTOnTCcuwUkRGQ/yX
 vBAHH/FNw2uUKSBeTdbA2/5OEctZ+GVEgkCkplAUfwJAxidyygBn9jD/YXHL+Fu+
 fDMvVnZTKbTQmOOP9cpYbebqAGykyST97HuDxOZ8mha5UP0QhCz5CbRfENdbP8w9
 00+hjEIkS0fjfjaSeCzp5tpkAVovzhZyoVZRCwoe42bZ7SAreDzNTYEnbK6G4owo
 x2mFXGlcTeZCmTNgQEmzby71tuAK+/+UEEXuoYV42wNda52iyvv7xkHJ/Q4li3um
 k0jjFFcqwd3mJC+OrJHr4LTCB1tvNgbpgsDUuUYckwPIIkWa7ZOF9xCWpiu2nC08
 4TI5A5DXf1n5i9sX4aw5
 =oM9H
 -----END PGP SIGNATURE-----

Merge tag 'efi-for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent

Pull EFI fixes from Matt Fleming:

 "Fix oops with EFI variables on mixed 32/64-bit firmware/kernels and
  document EFI git repository location on kernel.org."

Conflicts:
	arch/x86/include/asm/efi.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-26 10:17:38 +02:00
Yinghai Lu f82f64dd9f x86, mm: Undo incorrect revert in arch/x86/mm/init.c
Commit

    844ab6f9 x86, mm: Find_early_table_space based on ranges that are actually being mapped

added back some lines back wrongly that has been removed in commit

    7b16bbf97 Revert "x86/mm: Fix the size calculation of mapping tables"

remove them again.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/CAE9FiQW_vuaYQbmagVnxT2DGsYc=9tNeAbdBq53sYkitPOwxSQ@mail.gmail.com
Acked-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-10-25 15:45:45 -07:00
Olof Johansson 5189c2a7c7 x86: efi: Turn off efi_enabled after setup on mixed fw/kernel
When 32-bit EFI is used with 64-bit kernel (or vice versa), turn off
efi_enabled once setup is done. Beyond setup, it is normally used to
determine if runtime services are available and we will have none.

This will resolve issues stemming from efivars modprobe panicking on a
32/64-bit setup, as well as some reboot issues on similar setups.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=45991

Reported-by: Marko Kohtala <marko.kohtala@gmail.com>
Reported-by: Maxim Kammerer <mk@dee.su>
Signed-off-by: Olof Johansson <olof@lixom.net>
Acked-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: stable@kernel.org # 3.4 - 3.6
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2012-10-25 19:09:40 +01:00
Jacob Shin 844ab6f993 x86, mm: Find_early_table_space based on ranges that are actually being mapped
Current logic finds enough space for direct mapping page tables from 0
to end. Instead, we only need to find enough space to cover mr[0].start
to mr[nr_range].end -- the range that is actually being mapped by
init_memory_mapping()

This is needed after 1bbbbe779a, to address
the panic reported here:

  https://lkml.org/lkml/2012/10/20/160
  https://lkml.org/lkml/2012/10/21/157

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/20121024195311.GB11779@jshin-Toonie
Tested-by: Tom Rini <trini@ti.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-10-24 13:37:04 -07:00
Yinghai Lu 1f2ff682ac x86, mm: Use memblock memory loop instead of e820_RAM
We need to handle E820_RAM and E820_RESERVED_KERNEL at the same time.

Also memblock has page aligned range for ram, so we could avoid mapping
partial pages.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/CAE9FiQVZirvaBMFYRfXMmWEcHbKSicQEHz4VAwUv0xFCk51ZNw@mail.gmail.com
Acked-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
2012-10-24 11:52:36 -07:00
Yinghai Lu 6ede1fd3cb x86, mm: Trim memory in memblock to be page aligned
We will not map partial pages, so need to make sure memblock
allocation will not allocate those bytes out.

Also we will use for_each_mem_pfn_range() to loop to map memory
range to keep them consistent.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/CAE9FiQVZirvaBMFYRfXMmWEcHbKSicQEHz4VAwUv0xFCk51ZNw@mail.gmail.com
Acked-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
2012-10-24 11:52:21 -07:00
Maxime Bizon 08ec212c0f x86: ce4100: allow second UART usage
The current CE4100 and 8250_pci code have both a limitation preventing the
registration and usage of CE4100's second UART. This patch changes the
platform code fixing up the UART port to work on a relative UART port
base address, as well as the 8250_pci code to make it register 2 UART ports
for CE4100 and pass the port index down to all consumers.

Signed-off-by: Florian Fainelli <ffainelli@freebox.fr>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-24 11:34:51 -07:00
David Vrabel ce37f40033 x86: Allow tracing of functions in arch/x86/kernel/rtc.c
Move native_read_tsc() to tsc.c to allow profiling to be
re-enabled for rtc.c.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1349698050-6560-1-git-send-email-david.vrabel@citrix.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 13:14:22 +02:00
Dimitri Sivanich 94777fc51b x86/irq/ioapic: Check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt
Posting this patch to fix an issue concerning sparse irq's that
I raised a while back.  There was discussion about adding
refcounting to sparse irqs (to fix other potential race
conditions), but that does not appear to have been addressed
yet.  This covers the only issue of this type that I've
encountered in this area.

A NULL pointer dereference can occur in
smp_irq_move_cleanup_interrupt() if we haven't yet setup the
irq_cfg pointer in the irq_desc.irq_data.chip_data.

In create_irq_nr() there is a window where we have set
vector_irq in __assign_irq_vector(), but not yet called
irq_set_chip_data() to set the irq_cfg pointer.

Should an IRQ_MOVE_CLEANUP_VECTOR hit the cpu in question during
this time, smp_irq_move_cleanup_interrupt() will attempt to
process the aforementioned irq, but panic when accessing
irq_cfg.

Only continue processing the irq if irq_cfg is non-NULL.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/20121016125021.GA22935@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 12:53:51 +02:00
Wei Yongjun 64dfab8e83 perf/x86: Remove unused variable in nhmex_rbox_alter_er()
The variable port is initialized but never used
otherwise, so remove the unused variable.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Cc: a.p.zijlstra@chello.nl
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Link: http://lkml.kernel.org/r/CAPgLHd8NZkYSkZm22FpZxiEh6HcA0q-V%3D29vdnheiDhgrJZ%2Byw@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 12:51:40 +02:00
Matt Fleming 3e8fa263a9 x86/efi: Fix oops caused by incorrect set_memory_uc() usage
Calling __pa() with an ioremap'd address is invalid. If we
encounter an efi_memory_desc_t without EFI_MEMORY_WB set in
->attribute we currently call set_memory_uc(), which in turn
calls __pa() on a potentially ioremap'd address.

On CONFIG_X86_32 this results in the following oops:

  BUG: unable to handle kernel paging request at f7f22280
  IP: [<c10257b9>] reserve_ram_pages_type+0x89/0x210
  *pdpt = 0000000001978001 *pde = 0000000001ffb067 *pte = 0000000000000000
  Oops: 0000 [#1] PREEMPT SMP
  Modules linked in:

  Pid: 0, comm: swapper Not tainted 3.0.0-acpi-efi-0805 #3
   EIP: 0060:[<c10257b9>] EFLAGS: 00010202 CPU: 0
   EIP is at reserve_ram_pages_type+0x89/0x210
   EAX: 0070e280 EBX: 38714000 ECX: f7814000 EDX: 00000000
   ESI: 00000000 EDI: 38715000 EBP: c189fef0 ESP: c189fea8
   DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
  Process swapper (pid: 0, ti=c189e000 task=c18bbe60 task.ti=c189e000)
  Stack:
   80000200 ff108000 00000000 c189ff00 00038714 00000000 00000000 c189fed0
   c104f8ca 00038714 00000000 00038715 00000000 00000000 00038715 00000000
   00000010 38715000 c189ff48 c1025aff 38715000 00000000 00000010 00000000
  Call Trace:
   [<c104f8ca>] ? page_is_ram+0x1a/0x40
   [<c1025aff>] reserve_memtype+0xdf/0x2f0
   [<c1024dc9>] set_memory_uc+0x49/0xa0
   [<c19334d0>] efi_enter_virtual_mode+0x1c2/0x3aa
   [<c19216d4>] start_kernel+0x291/0x2f2
   [<c19211c7>] ? loglevel+0x1b/0x1b
   [<c19210bf>] i386_start_kernel+0xbf/0xc8

The only time we can call set_memory_uc() for a memory region is
when it is part of the direct kernel mapping. For the case where
we ioremap a memory region we must leave it alone.

This patch reimplements the fix from e8c7106280 ("x86, efi:
Calling __pa() with an ioremap()ed address is invalid") which
was reverted in e1ad783b12 because it caused a regression on
some MacBooks (they hung at boot). The regression was caused
because the commit only marked EFI_RUNTIME_SERVICES_DATA as
E820_RESERVED_EFI, when it should have marked all regions that
have the EFI_MEMORY_RUNTIME attribute.

Despite first impressions, it's not possible to use
ioremap_cache() to map all cached memory regions on
CONFIG_X86_64 because of the way that the memory map might be
configured as detailed in the following bug report,

	https://bugzilla.redhat.com/show_bug.cgi?id=748516

e.g. some of the EFI memory regions *need* to be mapped as part
of the direct kernel mapping.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Huang Ying <huang.ying.caritas@gmail.com>
Cc: Keith Packard <keithp@keithp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1350649546-23541-1-git-send-email-matt@console-pimps.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 12:48:47 +02:00
Ma Ling 269833bd5a x86/asm: Clean up copy_page_*() comments and code
Modern CPUs use fast-string instruction to accelerate copy
performance, by combining data into 128 bit chunks.

Modify comments and coding style to match it.

Signed-off-by: Ma Ling <ling.ma@intel.com>
Cc: iant@google.com
Link: http://lkml.kernel.org/r/1350503565-19167-1-git-send-email-ling.ma@intel.com
[ Cleaned up the clean up. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 12:42:47 +02:00
Vince Weaver e4074b3049 perf/x86: Enable overflow on Intel KNC with a custom knc_pmu_handle_irq()
Although based on the Intel P6 design, the interrupt mechnanism
for KNC more closely resembles the Intel architectural
perfmon one.

We can't just re-use that code though, because KNC has different
MSR numbers for the status and ack registers.

In this case we just cut-and paste from perf_event_intel.c
with some minor changes, as it looks like it would not be
worth the trouble to change that code to be MSR-configurable.

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: eranian@gmail.com
Cc: Meadows Lawrence F <lawrence.f.meadows@intel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210171304410.23243@vincent-weaver-1.um.maine.edu
[ Small stylistic edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 12:00:49 +02:00
Vince Weaver 7d011962af perf/x86: Remove cpuc->enable check on Intl KNC event enable/disable
x86_pmu.enable() is called from x86_pmu_enable() with
cpuc->enabled set to 0.  This means we weren't re-enabling the
counters after a context switch.

This patch just removes the check, as it should't be necessary
(and the equivelent x86_ generic code does not have the checks).

The origin of this problem is the KNC driver being based on the
P6 one.   The P6 driver also has this issue, but works anyway
due to various lucky accidents.

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: eranian@gmail.com
Cc: Meadows
Cc: Lawrence F <lawrence.f.meadows@intel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210171303290.23243@vincent-weaver-1.um.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 12:00:49 +02:00
Vince Weaver ae5ba47a99 perf/x86: Make Intel KNC use full 40-bit width of counters
Early versions of Intel KNC chips have a bug where bits above 32
were not properly set.  We worked around this by only using the
bottom 32 bits (out of 40 that should be available).

It turns out this workaround breaks overflow handling.

The buggy silicon will in theory never be used in production
systems, so remove this workaround so we get proper overflow
support.

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: eranian@gmail.com
Cc: Meadows Lawrence F <lawrence.f.meadows@intel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210171302140.23243@vincent-weaver-1.um.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 12:00:48 +02:00
Yan, Zheng 032c3851f5 perf/x86/uncore: Handle pci_read_config_dword() errors
This, beyond handling corner cases, also fixes some build warnings:

 arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function ‘snbep_uncore_pci_disable_box’:
 arch/x86/kernel/cpu/perf_event_intel_uncore.c:124:9: warning: ‘config’ is used uninitialized in this function [-Wuninitialized]
 arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function ‘snbep_uncore_pci_enable_box’:
 arch/x86/kernel/cpu/perf_event_intel_uncore.c:135:9: warning: ‘config’ is used uninitialized in this function [-Wuninitialized]
 arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function ‘snbep_uncore_pci_read_counter’:
 arch/x86/kernel/cpu/perf_event_intel_uncore.c:164:2: warning: ‘count’ is used uninitialized in this function [-Wuninitialized]

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Cc: a.p.zijlstra@chello.nl
Link: http://lkml.kernel.org/r/1351068140-13456-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 10:57:03 +02:00
Jan Beulich 876ee61aad x86-64: Fix page table accounting
Commit 20167d3421 ("x86-64: Fix
accounting in kernel_physical_mapping_init()") went a little too
far by entirely removing the counting of pre-populated page
tables: this should be done at boot time (to cover the page
tables set up in early boot code), but shouldn't be done during
memory hot add.

Hence, re-add the removed increments of "pages", but make them
and the one in phys_pte_init() conditional upon !after_bootmem.

Reported-Acked-and-Tested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/506DAFBA020000780009FA8C@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 10:50:25 +02:00
Jiri Olsa 20550a4345 perf/x86: Add hardware events translations for Intel P6 cpus
Add support for Intel P6 processors to display 'events' sysfs
directory (/sys/devices/cpu/events/) with hw event translations:

  # ls /sys/devices/cpu/events/
  branch-instructions
  branch-misses
  bus-cycles
  cache-misses
  cache-references
  cpu-cycles
  instructions
  ref-cycles
  stalled-cycles-backend
  stalled-cycles-frontend

Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1349873598-12583-6-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 10:41:25 +02:00
Jiri Olsa 0bf79d4413 perf/x86: Add hardware events translations for AMD cpus
Add support for AMD processors to display 'events' sysfs
directory (/sys/devices/cpu/events/) with hw event translations:

  # ls  /sys/devices/cpu/events/
  branch-instructions
  branch-misses
  bus-cycles
  cache-misses
  cache-references
  cpu-cycles
  instructions
  ref-cycles
  stalled-cycles-backend
  stalled-cycles-frontend

Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1349873598-12583-5-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 10:41:25 +02:00
Jiri Olsa 43c032febd perf/x86: Add hardware events translations for Intel cpus
Add support for Intel processors to display 'events' sysfs
directory (/sys/devices/cpu/events/) with hw event translations:

  # ls  /sys/devices/cpu/events/
  branch-instructions
  branch-misses
  bus-cycles
  cache-misses
  cache-references
  cpu-cycles
  instructions
  ref-cycles
  stalled-cycles-backend
  stalled-cycles-frontend

Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1349873598-12583-4-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 10:41:24 +02:00
Jiri Olsa 8300daa267 perf/x86: Filter out undefined events from sysfs events attribute
The sysfs events group attribute currently shows all hw events,
including also undefined ones.

This patch filters out all undefined events out of the sysfs events
group attribute, so they don't even show up.

Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1349873598-12583-3-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 10:41:24 +02:00
Jiri Olsa a47473939d perf/x86: Make hardware event translations available in sysfs
Add support to display hardware events translations available
through the sysfs. Add 'events' group attribute under the sysfs
x86 PMU record with attribute/file for each hardware event.

This patch adds only backbone for PMUs to display config under
'events' directory. The specific PMU support itself will come
in next patches, however this is how the sysfs group will look
like:

  # ls  /sys/devices/cpu/events/
  branch-instructions
  branch-misses
  bus-cycles
  cache-misses
  cache-references
  cpu-cycles
  instructions
  ref-cycles
  stalled-cycles-backend
  stalled-cycles-frontend

The file - hw event ID mapping is:

  file                      hw event ID
  ---------------------------------------------------------------
  cpu-cycles                PERF_COUNT_HW_CPU_CYCLES
  instructions              PERF_COUNT_HW_INSTRUCTIONS
  cache-references          PERF_COUNT_HW_CACHE_REFERENCES
  cache-misses              PERF_COUNT_HW_CACHE_MISSES
  branch-instructions       PERF_COUNT_HW_BRANCH_INSTRUCTIONS
  branch-misses             PERF_COUNT_HW_BRANCH_MISSES
  bus-cycles                PERF_COUNT_HW_BUS_CYCLES
  stalled-cycles-frontend   PERF_COUNT_HW_STALLED_CYCLES_FRONTEND
  stalled-cycles-backend    PERF_COUNT_HW_STALLED_CYCLES_BACKEND
  ref-cycles                PERF_COUNT_HW_REF_CPU_CYCLES

Each file in the 'events' directory contains the term translation
for the symbolic hw event for the currently running cpu model.

  # cat /sys/devices/cpu/events/stalled-cycles-backend
  event=0xb1,umask=0x01,inv,cmask=0x01

Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1349873598-12583-2-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 10:41:23 +02:00
Vince Weaver 58e9eaf06f perf/x86: Remove P6 cpuc->enabled check
Between 2.6.33 and 2.6.34 the PMU code was made modular.

The x86_pmu_enable() call was extended to disable cpuc->enabled
and iterate the counters, enabling one at a time, before calling
enable_all() at the end, followed by re-enabling cpuc->enabled.

Since cpuc->enabled was set to 0, that change effectively caused
the "val |= ARCH_PERFMON_EVENTSEL_ENABLE;" code in p6_pmu_enable_event()
and p6_pmu_disable_event() to be dead code that was never called.

This change removes this code (which was confusing) and adds some
extra commentary to make it more clear what is going on.

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210191732000.14552@vincent-weaver-1.um.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 10:32:00 +02:00
Vince Weaver e09df47885 perf/x86: Update/fix generic events on P6 PMU
This patch updates the generic events on p6, including some new
extended cache events.

Values for these events were taken from the equivelant PAPI
predefined events.

Tested on a Pentium II.

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210191730080.14552@vincent-weaver-1.um.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 10:31:58 +02:00
Vince Weaver 7991c9ca40 perf/x86: Fix P6 FP_ASSIST event constraint
According to Intel SDM Volume 3B, FP_ASSIST is limited to Counter 1 only,
not Counter 0.

Tested on a Pentium II.

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210191728570.14552@vincent-weaver-1.um.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 10:31:57 +02:00
Dave Young 7b16bbf973 Revert "x86/mm: Fix the size calculation of mapping tables"
Commit:

   722bc6b167 x86/mm: Fix the size calculation of mapping tables

Tried to address the issue that the first 2/4M should use 4k pages
if PSE enabled, but extra counts should only be valid for x86_32.

This commit caused a kdump regression: the kdump kernel hangs.

Work is in progress to fundamentally fix the various page table
initialization issues that we have, via the design suggested
by H. Peter Anvin, but it's not ready yet to be merged.

So, to get a working kdump revert to the last known working version,
which is the revert of this commit and of a followup fix (which was
incomplete):

   bd2753b2dd x86/mm: Only add extra pages count for the first memory range during pre-allocation

Tested kdump on physical and virtual machines.

Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Tested-by: Flavio Leitner <fbl@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Flavio Leitner <fbl@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: ianfang.cn@gmail.com
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 09:38:25 +02:00
Andre Przywara bffd5fc260 x86/perf: Fix virtualization sanity check
In check_hw_exists() we try to detect non-emulated MSR accesses
by writing an arbitrary value into one of the PMU registers
and check if it's value after a readout is still the same.
This algorithm silently assumes that the register does not contain
the magic value already, which is wrong in at least one situation.

Fix the algorithm to really do a read-modify-write cycle. This fixes
a warning under Xen under some circumstances on AMD family 10h CPUs.

The reasons in more details actually sound like a story from
Believe It or Not!:

First you need an AMD family 10h/12h CPU. These do not reset the
PERF_CTR registers on a reboot.
Now you boot bare metal Linux, which goes successfully through this
check, but leaves the magic value of 0xabcd in the register. You
don't use the performance counters, but do a reboot (warm reset).
Then you choose to boot Xen. The check will be triggered with a
recent Linux kernel as Dom0 again, trying to write 0xabcd into the
MSR. Xen silently drops the write (expected), but the subsequent read
will return the value in the register, which just happens to be the
expected magic value. Thus the test misleadingly succeeds, leaving
the kernel in the belief that the PMU is available. This will trigger
the following message:

[    0.020294] ------------[ cut here ]------------
[    0.020311] WARNING: at arch/x86/xen/enlighten.c:730 xen_apic_write+0x15/0x17()
[    0.020318] Hardware name: empty
[    0.020323] Modules linked in:
[    0.020334] Pid: 1, comm: swapper/0 Not tainted 3.3.8 #7
[    0.020340] Call Trace:
[    0.020354]  [<ffffffff81050379>] warn_slowpath_common+0x80/0x98
[    0.020369]  [<ffffffff810503a6>] warn_slowpath_null+0x15/0x17
[    0.020378]  [<ffffffff810034df>] xen_apic_write+0x15/0x17
[    0.020392]  [<ffffffff8101cb2b>] perf_events_lapic_init+0x2e/0x30
[    0.020410]  [<ffffffff81ee4dd0>] init_hw_perf_events+0x250/0x407
[    0.020419]  [<ffffffff81ee4b80>] ? check_bugs+0x2d/0x2d
[    0.020430]  [<ffffffff81002181>] do_one_initcall+0x7a/0x131
[    0.020444]  [<ffffffff81edbbf9>] kernel_init+0x91/0x15d
[    0.020456]  [<ffffffff817caaa4>] kernel_thread_helper+0x4/0x10
[    0.020471]  [<ffffffff817c347c>] ? retint_restore_args+0x5/0x6
[    0.020481]  [<ffffffff817caaa0>] ? gs_change+0x13/0x13
[    0.020500] ---[ end trace a7919e7f17c0a725 ]---

The new code will change every of the 16 low bits read from the
register and tries to write and read-back that modified number
from the MSR.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Link: http://lkml.kernel.org/r/1349797115-28346-2-git-send-email-andre.przywara@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 08:53:13 +02:00
Linus Torvalds 0e9e3e306c Bug-fixes:
* Fix mysterious SIGSEGV or SIGKILL in applications due to corrupting
    of the %eip when returning from a signal handler.
  * Fix various ARM compile issues after the merge fallout.
  * Continue on making more of the Xen generic code usable by ARM platform.
  * Fix SR-IOV passthrough to mirror multifunction PCI devices.
  * Fix various compile warnings.
  * Remove hypercalls that don't exist anymore.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJQhsIHAAoJEFjIrFwIi8fJ9LsH+gLGiF7dvFIUw1IA/Ev+tZ9Y
 YFFMJwmP71ZoqrJnEH+0vXlDU7YQAF/qQysVfACfHU5en2OEO24IuINddrm3wcYU
 2YAwEiLQstWhK1bhYqRqWeczjR3BV0NWtUoHpQar/5h4Ykppl5OxmXdBEfv+ThzA
 ju2d9fvQoJR7flW/CsWqoNcyPubzzXWYRCBWLdChw3NXVQTr/5ZDwvkIwgk6Gv5g
 vR0Qlirjdf2IyyE77zYhZw61H82IXoVCKnmif3HC1lYnSvVdVxamI0UhtXIjPJQU
 KB2e9Qkfix8weXDtpNBqa/VUIW7R83qCTZszs4mD/ktPAhgvxzCF3h/XLLXuwS4=
 =FR/L
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.7-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull xen bug-fixes from Konrad Rzeszutek Wilk:
 - Fix mysterious SIGSEGV or SIGKILL in applications due to corrupting
   of the %eip when returning from a signal handler.
 - Fix various ARM compile issues after the merge fallout.
 - Continue on making more of the Xen generic code usable by ARM
   platform.
 - Fix SR-IOV passthrough to mirror multifunction PCI devices.
 - Fix various compile warnings.
 - Remove hypercalls that don't exist anymore.

* tag 'stable/for-linus-3.7-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: dbgp: Fix warning when CONFIG_PCI is not enabled.
  xen: arm: comment on why 64-bit xen_pfn_t is safe even on 32 bit
  xen: balloon: use correct type for frame_list
  xen/x86: don't corrupt %eip when returning from a signal handler
  xen: arm: make p2m operations NOPs
  xen: balloon: don't include e820.h
  xen: grant: use xen_pfn_t type for frame_list.
  xen: events: pirq_check_eoi_map is X86 specific
  xen: XENMEM_translate_gpfn_list was remove ages ago and is unused.
  xen: sysfs: fix build warning.
  xen: sysfs: include err.h for PTR_ERR etc
  xen: xenbus: quirk uses x86 specific cpuid
  xen PV passthru: assign SR-IOV virtual functions to separate virtual slots
  xen/xenbus: Fix compile warning.
  xen/x86: remove duplicated include from enlighten.c
2012-10-24 05:17:27 +03:00
Linus Torvalds 3d0ceac129 KVM updates for 3.7-rc2
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQhnb/AAoJEI7yEDeUysxlCLsQAI4EFZWJiWwY6TtYfGuhWvzi
 XvaCwdH8NYE1YWEqWmu7B864gKJb4AEjJ9Du3zj52IRkurBEstIM9trnr/WjLkEP
 mSC5AIqFzy0Wjyqy8aUDzkMGEoA2QOMk/FCHKYIF57genRLP6p8+p57MmMKkgSSZ
 6FUwYWLcJEUIGg4VVnYkEf6rWQYDgBUCBOwLx/+h03B2ff/U4648dVIlJaA2SCt8
 B8mXV8mgb1soRkuleE8/p0b/pj+tHBO0f2oZkvg60/JXMpiTopec+5LZncEz45C9
 fqel3bk2RZW8IIHh+Ek/I2VxrZmalJ8aHhZfkivHp3DCAgggdJ9oviR8xyRhj29l
 5eFeLibbOvvDscWxA9pSJsIGwwRjtHbj38YEAAZwm23E0WVPwICC+ePVMDW33R0T
 3L8kXDFVLHEjupjJz4CYFeUHrC9dkf74FxqJ9v9jW3iY+F+1xX5c5KJL3NNKAI6M
 kTgSzFKUmgcNVCAOFFKRugjcRmS5dEKX6FXxa3NHnYrMEcaI2pQE6ZJtuKs54BPN
 euVhtK1tLXfnWrrpkYyZMfIZPVv3dIFddORrlh5GE1oTtwfV5MUUM2U6QPreqVuM
 2EU1MfW92su82CcsRuGzfjSLD/NpJGfF1tSle8xVEIn1xuS6aAAsnl/uP+zMuVal
 rMVJGBwBD0O2OyPwVXT9
 =qz8+
 -----END PGP SIGNATURE-----

Merge tag 'kvm-3.7-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Avi Kivity:
 "KVM updates for 3.7-rc2"

* tag 'kvm-3.7-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM guest: exit idleness when handling KVM_PV_REASON_PAGE_NOT_PRESENT
  KVM: apic: fix LDR calculation in x2apic mode
  KVM: MMU: fix release noslot pfn
2012-10-24 04:08:42 +03:00
Sasha Levin c5e015d494 KVM guest: exit idleness when handling KVM_PV_REASON_PAGE_NOT_PRESENT
KVM_PV_REASON_PAGE_NOT_PRESENT kicks cpu out of idleness, but we haven't
marked that spot as an exit from idleness.

Not doing so can cause RCU warnings such as:

[  732.788386] ===============================
[  732.789803] [ INFO: suspicious RCU usage. ]
[  732.790032] 3.7.0-rc1-next-20121019-sasha-00002-g6d8d02d-dirty #63 Tainted: G        W
[  732.790032] -------------------------------
[  732.790032] include/linux/rcupdate.h:738 rcu_read_lock() used illegally while idle!
[  732.790032]
[  732.790032] other info that might help us debug this:
[  732.790032]
[  732.790032]
[  732.790032] RCU used illegally from idle CPU!
[  732.790032] rcu_scheduler_active = 1, debug_locks = 1
[  732.790032] RCU used illegally from extended quiescent state!
[  732.790032] 2 locks held by trinity-child31/8252:
[  732.790032]  #0:  (&rq->lock){-.-.-.}, at: [<ffffffff83a67528>] __schedule+0x178/0x8f0
[  732.790032]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81152bde>] cpuacct_charge+0xe/0x200
[  732.790032]
[  732.790032] stack backtrace:
[  732.790032] Pid: 8252, comm: trinity-child31 Tainted: G        W    3.7.0-rc1-next-20121019-sasha-00002-g6d8d02d-dirty #63
[  732.790032] Call Trace:
[  732.790032]  [<ffffffff8118266b>] lockdep_rcu_suspicious+0x10b/0x120
[  732.790032]  [<ffffffff81152c60>] cpuacct_charge+0x90/0x200
[  732.790032]  [<ffffffff81152bde>] ? cpuacct_charge+0xe/0x200
[  732.790032]  [<ffffffff81158093>] update_curr+0x1a3/0x270
[  732.790032]  [<ffffffff81158a6a>] dequeue_entity+0x2a/0x210
[  732.790032]  [<ffffffff81158ea5>] dequeue_task_fair+0x45/0x130
[  732.790032]  [<ffffffff8114ae29>] dequeue_task+0x89/0xa0
[  732.790032]  [<ffffffff8114bb9e>] deactivate_task+0x1e/0x20
[  732.790032]  [<ffffffff83a67c29>] __schedule+0x879/0x8f0
[  732.790032]  [<ffffffff8117e20d>] ? trace_hardirqs_off+0xd/0x10
[  732.790032]  [<ffffffff810a37a5>] ? kvm_async_pf_task_wait+0x1d5/0x2b0
[  732.790032]  [<ffffffff83a67cf5>] schedule+0x55/0x60
[  732.790032]  [<ffffffff810a37c4>] kvm_async_pf_task_wait+0x1f4/0x2b0
[  732.790032]  [<ffffffff81139e50>] ? abort_exclusive_wait+0xb0/0xb0
[  732.790032]  [<ffffffff81139c25>] ? prepare_to_wait+0x25/0x90
[  732.790032]  [<ffffffff810a3a66>] do_async_page_fault+0x56/0xa0
[  732.790032]  [<ffffffff83a6a6e8>] async_page_fault+0x28/0x30

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-22 18:03:28 +02:00
Gleb Natapov 7f46ddbd48 KVM: apic: fix LDR calculation in x2apic mode
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Chegu Vinod  <chegu_vinod@hp.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-22 18:03:27 +02:00
Xiao Guangrong f3ac1a4b66 KVM: MMU: fix release noslot pfn
We can not directly call kvm_release_pfn_clean to release the pfn
since we can meet noslot pfn which is used to cache mmio info into
spte

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-22 18:03:25 +02:00
Ingo Molnar f38787f4f9 Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/urgent
Pull various uprobes bugfixes from Oleg Nesterov - mostly race and
failure path fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-21 18:18:17 +02:00
Ingo Molnar 957b9095ed Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/urgent
Pull event-wrapping Oprofile fix from Robert Richter.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-21 18:11:20 +02:00
Yan, Zheng a05123bdd1 perf/x86: Disable uncore on virtualized CPUs
Initializing uncore PMU on virtualized CPU may hang the kernel.
This is because kvm does not emulate the entire hardware. Thers
are lots of uncore related MSRs, making kvm enumerate them all
is a non-trival task. So just disable uncore on virtualized CPU.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Tested-by: Pekka Enberg <penberg@kernel.org>
Cc: a.p.zijlstra@chello.nl
Cc: eranian@google.com
Cc: andi@firstfloor.org
Cc: avi@redhat.com
Link: http://lkml.kernel.org/r/1345540117-14164-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-20 10:07:02 +02:00
Linus Torvalds 8c1bee685e Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Assorted small fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf python: Properly link with libtraceevent
  perf hists browser: Add back callchain folding symbol
  perf tools: Fix build on sparc.
  perf python: Link with libtraceevent
  perf python: Initialize 'page_size' variable
  tools lib traceevent: Fix missed freeing of subargs in free_arg() in filter
  lib tools traceevent: Add back pevent assignment in __pevent_parse_format()
  perf hists browser: Fix off-by-two bug on the first column
  perf tools: Remove warnings on JIT samples for srcline sort key
  perf tools: Fix segfault when using srcline sort key
  perf: Require exclude_guest to use PEBS - kernel side enforcement
  perf tool: Precise mode requires exclude_guest
2012-10-19 18:39:36 -07:00