dect
/
linux-2.6
Archived
13
0
Fork 0
Commit Graph

13783 Commits

Author SHA1 Message Date
Linus Torvalds 8e204874db Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-64, vdso: Do not allocate memory for the vDSO
  clocksource: Change __ARCH_HAS_CLOCKSOURCE_DATA to a CONFIG option
  x86, vdso: Drop now wrong comment
  Document the vDSO and add a reference parser
  ia64: Replace clocksource.fsys_mmio with generic arch data
  x86-64: Move vread_tsc and vread_hpet into the vDSO
  clocksource: Replace vread with generic arch data
  x86-64: Add --no-undefined to vDSO build
  x86-64: Allow alternative patching in the vDSO
  x86: Make alternative instruction pointers relative
  x86-64: Improve vsyscall emulation CS and RIP handling
  x86-64: Emulate legacy vsyscalls
  x86-64: Fill unused parts of the vsyscall page with 0xcc
  x86-64: Remove vsyscall number 3 (venosys)
  x86-64: Map the HPET NX
  x86-64: Remove kernel.vsyscall64 sysctl
  x86-64: Give vvars their own page
  x86-64: Document some of entry_64.S
  x86-64: Fix alignment of jiffies variable
2011-07-22 17:05:15 -07:00
Linus Torvalds 3e0b8df79d Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, UV: Correct UV2 BAU destination timeout
  x86, UV: Correct failed topology memory leak
  x86, UV: Remove cpumask_t from the stack
  x86, UV: Rename hubmask to pnmask
  x86, UV: Correct reset_with_ipi()
  x86, UV: Allow for non-consecutive sockets
  x86, UV: Inline header file functions
  x86, UV: Fix smp_processor_id() use in a preemptable region
  x66, UV: Enable 64-bit ACPI MFCG support for SGI UV2 platform
  x86, UV: Clean up uv_mmrs.h
2011-07-22 17:04:55 -07:00
Linus Torvalds 8051207959 Merge branch 'x86-signal-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-signal-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Kill handle_signal()->set_fs()
  x86, do_signal: Simplify the TS_RESTORE_SIGMASK logic
  x86, signals: Convert the X86_32 code to use set_current_blocked()
  x86, signals: Convert the IA32_EMULATION code to use set_current_blocked()
2011-07-22 17:04:32 -07:00
Linus Torvalds 9e39264ed4 Merge branch 'x86-numa-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-numa-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, numa: Implement pfn -> nid mapping granularity check
  x86, mm: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/
2011-07-22 17:04:18 -07:00
Linus Torvalds dc43d9fa73 Merge branch 'x86-mtrr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mtrr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mtrr: Use pci_dev->revision
  x86, mtrr: use stop_machine APIs for doing MTRR rendezvous
  stop_machine: implement stop_machine_from_inactive_cpu()
  stop_machine: reorganize stop_cpus() implementation
  x86, mtrr: lock stop machine during MTRR rendezvous sequence
2011-07-22 17:04:04 -07:00
Linus Torvalds 80775068db Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, microcode, AMD: Fix section header size check
  x86, microcode, AMD: Correct buf references
2011-07-22 17:03:52 -07:00
Linus Torvalds 7c6582b28a Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mce: Use mce_sysdev_ prefix to group functions
  x86, mce: Use mce_chrdev_ prefix to group functions
  x86, mce: Cleanup mce_read()
  x86, mce: Cleanup mce_create()/remove_device()
  x86, mce: Check the result of ancient_init()
  x86, mce: Introduce mce_gather_info()
  x86, mce: Replace MCM_ with MCI_MISC_
  x86, mce: Replace MCE_SELF_VECTOR by irq_work
  x86, mce, severity: Clean up trivial coding style problems
  x86, mce, severity: Cleanup severity table
  x86, mce, severity: Make formatting a bit more readable
  x86, mce, severity: Fix two severities table signatures
2011-07-22 17:03:40 -07:00
Linus Torvalds 227ad9bc07 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, efi: Properly pre-initialize table pointers
  x86, efi: Add infrastructure for UEFI 2.0 runtime services
  x86, efi: Fix argument types for SetVariable()
2011-07-22 17:03:14 -07:00
Linus Torvalds 35b004cce1 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, intel, power: Correct the MSR_IA32_ENERGY_PERF_BIAS message
  x86, msr: Fix typo in ENERGY_PERF_BIAS_POWERSAVE
  x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS
2011-07-22 17:02:54 -07:00
Linus Torvalds 0a613b647b Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, smpboot: Mark the names[] array in __inquire_remote_apic() as const
  x86: Convert vmalloc()+memset() to vzalloc()
2011-07-22 17:02:38 -07:00
Linus Torvalds eb47418dc5 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix write lock scalability 64-bit issue
  x86: Unify rwsem assembly implementation
  x86: Unify rwlock assembly implementation
  x86, asm: Fix binutils 2.16 issue with __USER32_CS
  x86, asm: Cleanup thunk_64.S
  x86, asm: Flip RESTORE_ARGS arguments logic
  x86, asm: Flip SAVE_ARGS arguments logic
  x86, asm: Thin down SAVE/RESTORE_* asm macros
2011-07-22 17:02:24 -07:00
Linus Torvalds 2c9e88a108 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, ioapic: Print IR_IO_APIC_route_entry when IR is enabled
  x86, ioapic: Print IRTE when IR is enabled
  x86, x2apic: Preserve high 32-bits of IA32_APIC_BASE MSR
  x86, ioapic: Also print Dest field
  x86, ioapic: Format clean up for IOAPIC output
  x86: print APIC data a little later during boot
2011-07-22 17:02:07 -07:00
Linus Torvalds 52de84f3f3 Merge branch 'timers-rtc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-rtc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Serialize EFI time accesses on rtc_lock
  x86: Serialize SMP bootup CMOS accesses on rtc_lock
  rtc: stmp3xxx: Remove UIE handlers
  rtc: stmp3xxx: Get rid of mach-specific accessors
  rtc: stmp3xxx: Initialize drvdata before registering device
  rtc: stmp3xxx: Port stmp-functions to mxs-equivalents
  rtc: stmp3xxx: Restore register definitions
  rtc: vt8500: Use define instead of hardcoded value for status bit
2011-07-22 16:52:39 -07:00
Linus Torvalds a99a7d1436 Merge branch 'timers-cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  mips: Fix i8253 clockevent fallout
  i8253: Cleanup outb/inb magic
  arm: Footbridge: Use common i8253 clockevent
  mips: Use common i8253 clockevent
  x86: Use common i8253 clockevent
  i8253: Create common clockevent implementation
  i8253: Export i8253_lock unconditionally
  pcpskr: MIPS: Make config dependencies finer grained
  pcspkr: Cleanup Kconfig dependencies
  i8253: Move remaining content and delete asm/i8253.h
  i8253: Consolidate definitions of PIT_LATCH
  x86: i8253: Consolidate definitions of global_clock_event
  i8253: Alpha, PowerPC: Remove unused asm/8253pit.h
  alpha: i8253: Cleanup remaining users of i8253pit.h
  i8253: Remove I8253_LOCK config
  i8253: Make pcsp sound driver use the shared i8253_lock
  i8253: Make pcspkr input driver use the shared i8253_lock
  i8253: Consolidate all kernel definitions of i8253_lock
  i8253: Unify all kernel declarations of i8253_lock
  i8253: Create linux/i8253.h and use it in all 8253 related files
2011-07-22 16:51:56 -07:00
Linus Torvalds 4d4abdcb1d Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (123 commits)
  perf: Remove the nmi parameter from the oprofile_perf backend
  x86, perf: Make copy_from_user_nmi() a library function
  perf: Remove perf_event_attr::type check
  x86, perf: P4 PMU - Fix typos in comments and style cleanup
  perf tools: Make test use the preset debugfs path
  perf tools: Add automated tests for events parsing
  perf tools: De-opt the parse_events function
  perf script: Fix display of IP address for non-callchain path
  perf tools: Fix endian conversion reading event attr from file header
  perf tools: Add missing 'node' alias to the hw_cache[] array
  perf probe: Support adding probes on offline kernel modules
  perf probe: Add probed module in front of function
  perf probe: Introduce debuginfo to encapsulate dwarf information
  perf-probe: Move dwarf library routines to dwarf-aux.{c, h}
  perf probe: Remove redundant dwarf functions
  perf probe: Move strtailcmp to string.c
  perf probe: Rename DIE_FIND_CB_FOUND to DIE_FIND_CB_END
  tracing/kprobe: Update symbol reference when loading module
  tracing/kprobes: Support module init function probing
  kprobes: Return -ENOENT if probe point doesn't exist
  ...
2011-07-22 16:44:39 -07:00
Linus Torvalds 6d16d6d9bb Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  iommu/core: Fix build with INTR_REMAP=y && CONFIG_DMAR=n
  iommu/amd: Don't use MSI address range for DMA addresses
  iommu/amd: Move missing parts to drivers/iommu
  iommu: Move iommu Kconfig entries to submenu
  x86/ia64: intel-iommu: move to drivers/iommu/
  x86: amd_iommu: move to drivers/iommu/
  msm: iommu: move to drivers/iommu/
  drivers: iommu: move to a dedicated folder
  x86/amd-iommu: Store device alias as dev_data pointer
  x86/amd-iommu: Search for existind dev_data before allocting a new one
  x86/amd-iommu: Allow dev_data->alias to be NULL
  x86/amd-iommu: Use only dev_data in low-level domain attach/detach functions
  x86/amd-iommu: Use only dev_data for dte and iotlb flushing routines
  x86/amd-iommu: Store ATS state in dev_data
  x86/amd-iommu: Store devid in dev_data
  x86/amd-iommu: Introduce global dev_data_list
  x86/amd-iommu: Remove redundant device_flush_dte() calls
  iommu-api: Add missing header file

Fix up trivial conflicts (independent additions close to each other) in
drivers/Makefile and include/linux/pci.h
2011-07-22 16:39:42 -07:00
Linus Torvalds 17413f5acd Merge branch 'for-3.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-3.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: Fixup __this_cpu_xchg* operations
2011-07-22 15:07:35 -07:00
Linus Torvalds acb41c0f92 Merge branch 'of-pci' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'of-pci' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  pci/of: Consolidate pci_bus_to_OF_node()
  pci/of: Consolidate pci_device_to_OF_node()
  x86/devicetree: Use generic PCI <-> OF matching
  microblaze/pci: Move the remains of pci_32.c to pci-common.c
  microblaze/pci: Remove powermac originated cruft
  pci/of: Match PCI devices to OF nodes dynamically
2011-07-22 14:54:02 -07:00
Linus Torvalds 951cc93a74 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1287 commits)
  icmp: Fix regression in nexthop resolution during replies.
  net: Fix ppc64 BPF JIT dependencies.
  acenic: include NET_SKB_PAD headroom to incoming skbs
  ixgbe: convert to ndo_fix_features
  ixgbe: only enable WoL for magic packet by default
  ixgbe: remove ifdef check for non-existent define
  ixgbe: Pass staterr instead of re-reading status and error bits from descriptor
  ixgbe: Move interrupt related values out of ring and into q_vector
  ixgbe: add structure for containing RX/TX rings to q_vector
  ixgbe: inline the ixgbe_maybe_stop_tx function
  ixgbe: Update ATR to use recorded TX queues instead of CPU for routing
  igb: Fix for DH89xxCC near end loopback test
  e1000: always call e1000_check_for_link() on e1000_ce4100 MACs.
  netxen: add fw version compatibility check
  be2net: request native mode each time the card is reset
  ipv4: Constrain UFO fragment sizes to multiples of 8 bytes
  virtio_net: Fix panic in virtnet_remove
  ipv6: make fragment identifications less predictable
  ipv6: unshare inetpeers
  can: make function can_get_bittiming static
  ...
2011-07-22 14:43:13 -07:00
Linus Torvalds a7e1aabb28 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  lguest: Fix in/out emulation
  lguest: Fix translation count about wikipedia's cpuid page
  lguest: Fix three simple typos in comments
  lguest: update comments
  lguest: Simplify device initialization.
  lguest: don't rewrite vmcall instructions
  lguest: remove remaining vmcall
  lguest: use a special 1:1 linear pagetable mode until first switch.
  lguest: Do not exit on non-fatal errors
2011-07-22 13:45:50 -07:00
Linus Torvalds 111ad119d1 Merge branch 'stable/drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/pciback: Have 'passthrough' option instead of XEN_PCIDEV_BACKEND_PASS and XEN_PCIDEV_BACKEND_VPCI
  xen/pciback: Remove the DEBUG option.
  xen/pciback: Drop two backends, squash and cleanup some code.
  xen/pciback: Print out the MSI/MSI-X (PIRQ) values
  xen/pciback: Don't setup an fake IRQ handler for SR-IOV devices.
  xen: rename pciback module to xen-pciback.
  xen/pciback: Fine-grain the spinlocks and fix BUG: scheduling while atomic cases.
  xen/pciback: Allocate IRQ handler for device that is shared with guest.
  xen/pciback: Disable MSI/MSI-X when reseting a device
  xen/pciback: guest SR-IOV support for PV guest
  xen/pciback: Register the owner (domain) of the PCI device.
  xen/pciback: Cleanup the driver based on checkpatch warnings and errors.
  xen/pciback: xen pci backend driver.
  xen: tmem: self-ballooning and frontswap-selfshrinking
  xen: Add module alias to autoload backend drivers
  xen: Populate xenbus device attributes
  xen: Add __attribute__((format(printf... where appropriate
  xen: prepare tmem shim to handle frontswap
  xen: allow enable use of VGA console on dom0
2011-07-22 13:45:15 -07:00
Linus Torvalds 997271cf5e Merge branch 'stable/pci.cleanups.v1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/pci.cleanups.v1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/pci: Use 'acpi_gsi_to_irq' value unconditionally.
  xen/pci: Remove 'xen_allocate_pirq_gsi'.
  xen/pci: Retire unnecessary #ifdef CONFIG_ACPI
  xen/pci: Move the allocation of IRQs when there are no IOAPIC's to the end
  xen/pci: Squash pci_xen_initial_domain and xen_setup_pirqs together.
  xen/pci: Use the xen_register_pirq for HVM and initial domain users
  xen/pci: In xen_register_pirq bind the GSI to the IRQ after the hypercall.
  xen/pci: Provide #ifdef CONFIG_ACPI to easy code squashing.
  xen/pci: Update comments and fix empty spaces.
  xen/pci: Shuffle code around.
2011-07-22 13:44:53 -07:00
Linus Torvalds 896d01796d Merge branch 'stable/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen:pvhvm: Modpost section mismatch fix
2011-07-22 13:44:45 -07:00
Jonas Bonn a4e05276a1 asm-generic: move archictures to common delay.h
This patch moves the in-tree architectures that were using the 'generic'
delay.h over to using the header file in asm-generic.

This is not done using the generic-y mechanism as none of these arch's
have started using that mechanism yet.  This is a trivial change to make
later when the arch begins using generic-y.

Note the subtle change to the avr32 and SH architectures where the argument
to __const_udelay was previously using the rounded down constant value
instead of the rounded up value.

Signed-off-by: Jonas Bonn <jonas@southpole.se>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
2011-07-22 18:46:24 +02:00
Narendra_K@Dell.com 9b373ed18f x86/PCI: Preserve existing pci=bfsort whitelist for Dell systems
Commit 6e8af08dfa enables pci=bfsort on
future Dell systems. But the identification string 'Dell System' matches
on already existing whitelist, which do not have SMBIOS type 0xB1,
causing pci=bfsort not being set on existing whitelist.

This patch fixes the regression by moving the type 0xB1 check beyond the
existing whitelist so that existing whitelist is walked before.

Signed-off-by: Shyam Iyer <shyam_iyer@dell.com>
Signed-off-by: Narendra K <narendra_k@dell.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2011-07-22 08:44:28 -07:00
tip-bot for Sergei Shtylyov a8c7ef3187 x86/PCI: quirks: Use pci_dev->revision
This code uses PCI_CLASS_REVISION instead of PCI_REVISION_ID, so
it wasn't converted by commit 44c10138fd ("PCI: Change all
drivers to use pci_device->revision") before being moved to arch/x86/...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/r/201107111901.39281.sshtylyov@ru.mvista.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2011-07-22 08:41:39 -07:00
Ralf Baechle d5341942d7 PCI: Make the struct pci_dev * argument of pci_fixup_irqs const.
Aside of the usual motivation for constification,  this function has a
history of being abused a hook for interrupt and other fixups so I turned
this function const ages ago in the MIPS code but it should be done
treewide.

Due to function pointer passing in varous places a few other functions
had to be constified as well.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
To: Anton Vorontsov <avorontsov@mvista.com>
To: Chris Metcalf <cmetcalf@tilera.com>
To: Colin Cross <ccross@android.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
To: Eric Miao <eric.y.miao@gmail.com>
To: Erik Gilling <konkers@android.com>
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
To: "H. Peter Anvin" <hpa@zytor.com>
To: Imre Kaloz <kaloz@openwrt.org>
To: Ingo Molnar <mingo@redhat.com>
To: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
To: Jesse Barnes <jbarnes@virtuousgeek.org>
To: Krzysztof Halasa <khc@pm.waw.pl>
To: Lennert Buytenhek <kernel@wantstofly.org>
To: Matt Turner <mattst88@gmail.com>
To: Nicolas Pitre <nico@fluxnic.net>
To: Olof Johansson <olof@lixom.net>
Acked-by: Paul Mundt <lethal@linux-sh.org>
To: Richard Henderson <rth@twiddle.net>
To: Russell King <linux@arm.linux.org.uk>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-pci@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-tegra@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: x86@kernel.org
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2011-07-22 08:26:06 -07:00
Jan Beulich db34a363b9 x86/PCI: config space accessor functions should not ignore the segment argument
Without this change, the majority of the raw PCI config space access
functions silently ignore a non-zero segment argument, which is
certainly wrong.

Apart from pci_direct_conf1, all other non-MMCFG access methods get
used only for non-extended accesses (i.e. assigned to raw_pci_ops
only). Consequently, with the way raw_pci_{read,write}() work, it would
be a coding error to call these functions with a non-zero segment (with
the current call flow this cannot happen afaict).

The access method 1 accessor, as it can be used for extended accesses
(on AMD systems) instead gets checks added for the passed in segment to
be zero. This would be the case when on such a system having multiple
PCI segments (don't know whether any exist in practice) MMCFG for some
reason is not usable, and method 1 gets selected for doing extended
accesses. Rather than accessing the wrong device's config space, the
function will now error out.

v2: Convert BUG_ON() to WARN_ON(), and extend description as per Ingo's
request.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2011-07-22 08:25:41 -07:00
Bjorn Helgaas 43d786ed4d x86/PCI: reduce severity of host bridge window conflict warnings
Host bridge windows are top-level resources, so if we find a host bridge
window conflict, it's probably with a hard-coded legacy reservation.
Moving host bridge windows is theoretically possible, but we don't support
it; we just ignore windows with conflicts, and it's not worth making this
a user-visible error.

Reported-and-tested-by: Jools Wills <jools@oxfordinspire.co.uk>
References: https://bugzilla.kernel.org/show_bug.cgi?id=38522
Reported-by: Das <dasfox@gmail.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=16497
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2011-07-22 08:25:39 -07:00
Shaohua Li 0aba496fc8 x86/PCI: select direct access mode for mmconfig option
Direct access is needed in mmconf mode too. There are two reasons:
1. we need it to access first 256 bytes. We have bug before that
   using mmconf to access pci config space hangs system (when
   resizing BARs), so we use type1 config for legacy config space.
2. when doing mmconfg bar checking, we need access ACPI _CRS,
   which might access PCI config space.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2011-07-22 08:25:36 -07:00
Adrian Knoth 8d431f4160 lguest: Fix translation count about wikipedia's cpuid page
The comment is outdated, wikipedia now has six translations of the cpuid
page.

Signed-off-by: Adrian Knoth <adi@drcomp.erfurt.thur.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-07-22 14:39:50 +09:30
Adrian Knoth 64be11583b lguest: Fix three simple typos in comments
This patch fixes three typos I've accidentally spotted.

Signed-off-by: Adrian Knoth <adi@drcomp.erfurt.thur.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (one was already fixed)
2011-07-22 14:39:50 +09:30
Rusty Russell 9f54288def lguest: update comments
Also removes a long-unused #define and an extraneous semicolon.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-07-22 14:39:50 +09:30
Rusty Russell 7e1941444f lguest: remove remaining vmcall
We switch back from using vmcall in 091ebf07a2
because it was unreliable under kvm, but I missed one (rarely-used) place.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-07-22 14:39:49 +09:30
Rusty Russell 5dea1c88ed lguest: use a special 1:1 linear pagetable mode until first switch.
The Host used to create some page tables for the Guest to use at the
top of Guest memory; it would then tell the Guest where this was.  In
particular, it created linear mappings for 0 and 0xC0000000 addresses
because lguest used to switch to its real page tables quite late in
boot.

However, since d50d8fe19 Linux initialized boot page tables in
head_32.S even before the "are we lguest?" boot jump.  So, now we can
simplify things: the Host pagetable code assumes 1:1 linear mapping
until it first calls the LHCALL_NEW_PGTABLE hypercall, which we now do
before we reach C code.

This also means that the Host doesn't need to know anything about the
Guest's PAGE_OFFSET.  (Non-Linux guests might not even have such a
thing).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-07-22 14:39:48 +09:30
Andy Lutomirski aafade242f x86-64, vdso: Do not allocate memory for the vDSO
We can map the vDSO straight from kernel data, saving a few page
allocations.  As an added bonus, the deleted code contained a memory
leak.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/2c4ed5c2c2e93603790229e0c3403ae506ccc0cb.1311277573.git.luto@mit.edu
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2011-07-21 13:41:53 -07:00
H. Peter Anvin ae7bd11b47 clocksource: Change __ARCH_HAS_CLOCKSOURCE_DATA to a CONFIG option
The machinery for __ARCH_HAS_CLOCKSOURCE_DATA assumed a file in
asm-generic would be the default for architectures without their own
file in asm/, but that is not how it works.

Replace it with a Kconfig option instead.

Link: http://lkml.kernel.org/r/4E288AA6.7090804@zytor.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Tony Luck <tony.luck@intel.com>
2011-07-21 13:34:05 -07:00
H. Peter Anvin a536877e77 x86: Make Dell Latitude E6420 use reboot=pci
Yet another variant of the Dell Latitude series which requires
reboot=pci.

From the E5420 bug report by Daniel J Blueman:

> The E6420 is affected also (same platform, different casing and
> features), which provides an external confirmation of the issue; I can
> submit a patch for that later or include it if you prefer:
> http://linux.koolsolutions.com/2009/08/04/howto-fix-linux-hangfreeze-during-reboots-and-restarts/

Reported-by: Daniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org>
2011-07-21 11:47:17 -07:00
Daniel J Blueman b7798d28ec x86: Make Dell Latitude E5420 use reboot=pci
Rebooting on the Dell E5420 often hangs with the keyboard or ACPI
methods, but is reliable via the PCI method.

[ hpa: this was deferred because we believed for a long time that the
  recent reshuffling of the boot priorities in commit
  660e34cebf fixed this platform.
  Unfortunately that turned out to be incorrect. ]

Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Link: http://lkml.kernel.org/r/1305248699-2347-1-git-send-email-daniel.blueman@gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org>
2011-07-21 11:45:49 -07:00
Robert Richter 1ac2e6ca44 x86, perf: Make copy_from_user_nmi() a library function
copy_from_user_nmi() is used in oprofile and perf. Moving it to other
library functions like copy_from_user(). As this is x86 code for 32
and 64 bits, create a new file usercopy.c for unified code.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110607172413.GJ20052@erda.amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-21 20:41:57 +02:00
Cyrill Gorcunov f53173e47d x86, perf: P4 PMU - Fix typos in comments and style cleanup
This patch:

 - fixes typos in comments and clarifies the text
 - renames obscure p4_event_alias::original and ::alter members to
   ::original and ::alternative as appropriate
 - drops parenthesis from the return of p4_get_alias_event()

No functional changes.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Link: http://lkml.kernel.org/r/20110721160625.GX7492@sun
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-21 20:41:54 +02:00
Phil Carmody 497888cf69 treewide: fix potentially dangerous trailing ';' in #defined values/expressions
All these are instances of
  #define NAME value;
or
  #define NAME(params_opt) value;

These of course fail to build when used in contexts like
  if(foo $OP NAME)
  while(bar $OP NAME)
and may silently generate the wrong code in contexts such as
  foo = NAME + 1;    /* foo = value; + 1; */
  bar = NAME - 1;    /* bar = value; - 1; */
  baz = NAME & quux; /* baz = value; & quux; */

Reported on comp.lang.c,
Message-ID: <ab0d55fe-25e5-482b-811e-c475aa6065c3@c29g2000yqd.googlegroups.com>
Initial analysis of the dangers provided by Keith Thompson in that thread.

There are many more instances of more complicated macros having unnecessary
trailing semicolons, but this pile seems to be all of the cases of simple
values suffering from the problem. (Thus things that are likely to be found
in one of the contexts above, more complicated ones aren't.)

Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-07-21 14:10:00 +02:00
Huang Ying 050438ed5a kexec, x86: Fix incorrect jump back address if not preserving context
In kexec jump support, jump back address passed to the kexeced
kernel via function calling ABI, that is, the function call
return address is the jump back entry.

Furthermore, jump back entry == 0 should be used to signal that
the jump back or preserve context is not enabled in the original
kernel.

But in the current implementation the stack position used for
function call return address is not cleared context
preservation is disabled. The patch fixes this bug.

Reported-and-tested-by: Yin Kangkai <kangkai.yin@intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1310607277-25029-1-git-send-email-ying.huang@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-21 11:19:28 +02:00
Alan Cox 43605ef188 x86, config: Introduce an INTEL_MID configuration
We need to carve up the configuration between:

 - MID general
 - Moorestown specific
 - Medfield specific
 - Future devices

As a base point create an INTEL_MID configuration property. We
make the existing MRST configuration a sub-option. This means
that the rest of the kernel config can still use X86_MRST checks
without anything going backwards.

After this is merged future patches will tidy up which devices
are MID and which are X86_MRST, as well as add options for
Medfield.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/r/20110712164859.7642.84136.stgit@bob.linux.org.uk
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-21 10:35:14 +02:00
Sergei Shtylyov 38175051f8 x86, quirks: Use pci_dev->revision
This code uses PCI_CLASS_REVISION instead of PCI_REVISION_ID, so
it wasn't converted by commit 44c10138fd ("PCI: Change all
drivers to use pci_device->revision") before being moved to arch/x86/...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/r/201107111901.39281.sshtylyov@ru.mvista.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-21 10:26:00 +02:00
Greg Dietsche a6c23905ff x86, smpboot: Mark the names[] array in __inquire_remote_apic() as const
This array is read-only. Make it explicit by marking as const.

Signed-off-by: Greg Dietsche <Gregory.Dietsche@cuw.edu>
Link: http://lkml.kernel.org/r/1309482653-23648-1-git-send-email-Gregory.Dietsche@cuw.edu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-21 10:04:51 +02:00
Jan Beulich ef68c8f87e x86: Serialize EFI time accesses on rtc_lock
The EFI specification requires that callers of the time related
runtime functions serialize with other CMOS accesses in the
kernel, as the EFI time functions may choose to also use the
legacy CMOS RTC.

Besides fixing a latent bug, this is a prerequisite to safely
enable the rtc-efi driver for x86, which ought to be preferred
over rtc-cmos on all EFI platforms.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: <mjg@redhat.com>
Link: http://lkml.kernel.org/r/4E257E33020000780004E319@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Matthew Garrett <mjg@redhat.com>
2011-07-21 09:21:00 +02:00
Jan Beulich ac619f4eba x86: Serialize SMP bootup CMOS accesses on rtc_lock
With CPU hotplug, there is a theoretical race between other CMOS
(namely RTC) accesses and those done in the SMP secondary
processor bringup path.

I am unware of the problem having been noticed by anyone in practice,
but it would very likely be rather spurious and very hard to reproduce.
So to be on the safe side, acquire rtc_lock around those accesses.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/4E257AE7020000780004E2FF@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-21 09:20:59 +02:00
Jan Beulich a750036f35 x86: Fix write lock scalability 64-bit issue
With the write lock path simply subtracting RW_LOCK_BIAS there
is, on large systems, the theoretical possibility of overflowing
the 32-bit value that was used so far (namely if 128 or more
CPUs manage to do the subtraction, but don't get to do the
inverse addition in the failure path quickly enough).

A first measure is to modify RW_LOCK_BIAS itself - with the new
value chosen, it is good for up to 2048 CPUs each allowed to
nest over 2048 times on the read path without causing an issue.
Quite possibly it would even be sufficient to adjust the bias a
little further, assuming that allowing for significantly less
nesting would suffice.

However, as the original value chosen allowed for even more
nesting levels, to support more than 2048 CPUs (possible
currently only for 64-bit kernels) the lock itself gets widened
to 64 bits.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4E258E0D020000780004E3F0@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-21 09:03:36 +02:00
Jan Beulich a738669464 x86: Unify rwsem assembly implementation
Rather than having two functionally identical implementations
for 32- and 64-bit configurations, use the previously extended
assembly abstractions to fold the rwsem two implementations into
a shared one.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4E258DF3020000780004E3ED@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-21 09:03:32 +02:00
Jan Beulich 4625cd6379 x86: Unify rwlock assembly implementation
Rather than having two functionally identical implementations
for 32- and 64-bit configurations, extend the existing assembly
abstractions enough to fold the two rwlock implementations into
a shared one.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4E258DD7020000780004E3EA@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-21 09:03:31 +02:00
Linus Torvalds 919d25a710 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86. reboot: Make Dell Latitude E6320 use reboot=pci
  x86, doc only: Correct real-mode kernel header offset for init_size
  x86: Disable AMD_NUMA for 32bit for now
2011-07-20 15:33:59 -07:00
Jeremy Fitzhardinge 2a6f6d0955 xen/multicall: move *idx fields to start of mc_buffer
The CPU would prefer small offsets.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2011-07-18 15:43:46 -07:00
Jeremy Fitzhardinge eac303bf2e xen/multicall: special-case singleton hypercalls
Singleton calls seem to end up being pretty common, so just
directly call the hypercall rather than going via multicall.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2011-07-18 15:43:45 -07:00
Jeremy Fitzhardinge 4a7b005dbf xen/multicalls: add unlikely around slowpath in __xen_mc_entry()
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2011-07-18 15:43:45 -07:00
Jeremy Fitzhardinge ffc78767f2 xen/multicalls: disable MC_DEBUG
It's useful - and probably should be a config - but its very heavyweight,
especially with the tracing stuff to help sort out problems.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2011-07-18 15:43:28 -07:00
Jeremy Fitzhardinge bc7fe1d977 xen/mmu: tune pgtable alloc/release
Make sure the fastpath code is inlined.  Batch the page permission change
and the pin/unpin, and make sure that it can be batched with any
adjacent set_pte/pmd/etc operations.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2011-07-18 15:43:28 -07:00
Jeremy Fitzhardinge dcf7435cfe xen/mmu: use extend_args for more mmuext updates
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2011-07-18 15:43:27 -07:00
Jeremy Fitzhardinge c8eed1719a xen/trace: add tlb flush tracepoints
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2011-07-18 15:43:27 -07:00
Jeremy Fitzhardinge ab78f7ad2c xen/trace: add segment desc tracing
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2011-07-18 15:43:27 -07:00
Jeremy Fitzhardinge 5f94fb5b8e xen/trace: add xen_pgd_(un)pin tracepoints
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2011-07-18 15:43:27 -07:00
Jeremy Fitzhardinge c2ba050d2e xen/trace: add ptpage alloc/release tracepoints
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2011-07-18 15:43:27 -07:00
Jeremy Fitzhardinge 8470880791 xen/trace: add mmu tracepoints
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2011-07-18 15:43:27 -07:00
Jeremy Fitzhardinge c796f213a6 xen/trace: add multicall tracing
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2011-07-18 15:43:26 -07:00
Jeremy Fitzhardinge f04e2ee41d xen/trace: set up tracepoint skeleton
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2011-07-18 15:43:04 -07:00
Jeremy Fitzhardinge 84cdee76b1 xen/multicalls: remove debugfs stats
Remove debugfs stats to make way for tracing.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2011-07-18 15:43:04 -07:00
Borislav Petkov 8c400f6ce0 x86, vdso: Drop now wrong comment
Now that 1b3f2a72bb is in, it is very
important that the below lying comment be removed! :-)

Signed-off-by: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20110718191054.GA18359@liondog.tnic
Acked-by: Andy Lutomirski <luto@mit.edu>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-18 12:29:50 -07:00
Len Brown 17edf2d79f x86, intel, power: Correct the MSR_IA32_ENERGY_PERF_BIAS message
Fix the printk_once() so that it actually prints (didn't print before
due to a stray comma.)

[ hpa: changed to an incremental patch and adjusted the description
  accordingly. ]

Signed-off-by: Len Brown <len.brown@intel.com>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1107151732480.18606@x980
Cc: <table@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-15 15:13:55 -07:00
Oleg Nesterov 73d382decc x86: Kill handle_signal()->set_fs()
handle_signal()->set_fs() has a nice comment which explains what
set_fs() is, but it doesn't explain why it is needed and why it
depends on CONFIG_X86_64.

Afaics, the history of this confusion is:

	1. I guess today nobody can explain why it was needed
	   in arch/i386/kernel/signal.c, perhaps it was always
	   wrong. This predates 2.4.0 kernel.

	2. then it was copy-and-past'ed to the new x86_64 arch.

	3. then it was removed from i386 (but not from x86_64)
	   by b93b6ca3 "i386: remove unnecessary code".

	4. then it was reintroduced under CONFIG_X86_64 when x86
	   unified i386 and x86_64, because the patch above didn't
	   touch x86_64.

Remove it. ->addr_limit should be correct. Even if it was possible
that it is wrong, it is too late to fix it after setup_rt_frame().

Linus commented in:
http://lkml.kernel.org/r/alpine.LFD.0.999.0707170902570.19166@woody.linux-foundation.org

... about the equivalent bit from i386:

Heh. I think it's entirely historical.

Please realize that the whole reason that function is called "set_fs()" is 
that it literally used to set the %fs segment register, not 
"->addr_limit".

So I think the "set_fs(USER_DS)" is there _only_ to match the other

        regs->xds = __USER_DS;
        regs->xes = __USER_DS;
        regs->xss = __USER_DS;
        regs->xcs = __USER_CS;

things, and never mattered. And now it matters even less, and has been 
copied to all other architectures where it is just totally insane.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20110710164424.GA20261@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 21:46:20 -07:00
Oleg Nesterov 9b42962074 x86, do_signal: Simplify the TS_RESTORE_SIGMASK logic
1. do_signal() looks at TS_RESTORE_SIGMASK and calculates the
   mask which should be stored in the signal frame, then it
   passes "oldset" to the callees, down to setup_rt_frame().

   This is ugly, setup_rt_frame() can do this itself and nobody
   else needs this sigset_t. Move this code into setup_rt_frame.

2. do_signal() also clears TS_RESTORE_SIGMASK if handle_signal()
   succeeds.

   We can move this to setup_rt_frame() as well, this avoids the
   unnecessary checks and makes the logic more clear.

3. use set_current_blocked() instead of sigprocmask(SIG_SETMASK),
   sigprocmask() should be avoided.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20110710182203.GA27979@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 21:22:11 -07:00
Oleg Nesterov 3982294b03 x86, signals: Convert the X86_32 code to use set_current_blocked()
sys_sigsuspend() and sys_sigreturn() change ->blocked directly.
This is not correct, see the changelog in e6fa16ab
"signal: sigprocmask() should do retarget_shared_pending()"

Change them to use set_current_blocked().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20110710192727.GA31759@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 21:21:57 -07:00
Oleg Nesterov 905f29e2aa x86, signals: Convert the IA32_EMULATION code to use set_current_blocked()
sys32_sigsuspend() and sys32_*sigreturn() change ->blocked directly.
This is not correct, see the changelog in e6fa16ab
"signal: sigprocmask() should do retarget_shared_pending()"

Change them to use set_current_blocked().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20110710192724.GA31755@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 21:21:31 -07:00
Andy Lutomirski 98d0ac38ca x86-64: Move vread_tsc and vread_hpet into the vDSO
The vsyscall page now consists entirely of trap instructions.

Cc: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/637648f303f2ef93af93bae25186e9a1bea093f5.1310639973.git.luto@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 17:57:05 -07:00
H. Peter Anvin 4bb82178f5 x86, msr: Fix typo in ENERGY_PERF_BIAS_POWERSAVE
Fix a trivial typo in the name of the constant
ENERGY_PERF_BIAS_POWERSAVE.  This didn't cause trouble because this
constant is not currently used for anything.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
Link: http://lkml.kernel.org/r/tip-abe48b108247e9b90b4c6739662a2e5c765ed114@git.kernel.org
2011-07-14 14:58:44 -07:00
Cyrill Gorcunov f912987097 perf, x86: P4 PMU - Introduce event alias feature
Instead of hw_nmi_watchdog_set_attr() weak function
and appropriate x86_pmu::hw_watchdog_set_attr() call
we introduce even alias mechanism which allow us
to drop this routines completely and isolate quirks
of Netburst architecture inside P4 PMU code only.

The main idea remains the same though -- to allow
nmi-watchdog and perf top run simultaneously.

Note the aliasing mechanism applies to generic
PERF_COUNT_HW_CPU_CYCLES event only because arbitrary
event (say passed as RAW initially) might have some
additional bits set inside ESCR register changing
the behaviour of event and we can't guarantee anymore
that alias event will give the same result.

P.S. Thanks a huge to Don and Steven for for testing
     and early review.

Acked-by: Don Zickus <dzickus@redhat.com>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Ingo Molnar <mingo@elte.hu>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Stephane Eranian <eranian@google.com>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20110708201712.GS23657@sun
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-07-14 17:25:04 -04:00
Len Brown abe48b1082 x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS
Since 2.6.36 (23016bf0d2), Linux prints the existence of "epb" in /proc/cpuinfo,
Since 2.6.38 (d5532ee7b4), the x86_energy_perf_policy(8) utility has
been available in-tree to update MSR_IA32_ENERGY_PERF_BIAS.

However, the typical BIOS fails to initialize the MSR, presumably
because this is handled by high-volume shrink-wrap operating systems...

Linux distros, on the other hand, do not yet invoke x86_energy_perf_policy(8).
As a result, WSM-EP, SNB, and later hardware from Intel will run in its
default hardware power-on state (performance), which assumes that users
care for performance at all costs and not for energy efficiency.
While that is fine for performance benchmarks, the hardware's intended default
operating point is "normal" mode...

Initialize the MSR to the "normal" by default during kernel boot.

x86_energy_perf_policy(8) is available to change the default after boot,
should the user have a different preference.

Signed-off-by: Len Brown <len.brown@intel.com>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1107140051020.18606@x980
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@kernel.org>
2011-07-14 12:13:42 -07:00
David S. Miller 6a7ebdf2fd Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/bluetooth/l2cap_core.c
2011-07-14 07:56:40 -07:00
Glauber Costa 095c0aa83e sched: adjust scheduler cpu power for stolen time
This patch makes update_rq_clock() aware of steal time.
The mechanism of operation is not different from irq_time,
and follows the same principles. This lives in a CONFIG
option itself, and can be compiled out independently of
the rest of steal time reporting. The effect of disabling it
is that the scheduler will still report steal time (that cannot be
disabled), but won't use this information for cpu power adjustments.

Everytime update_rq_clock_task() is invoked, we query information
about how much time was stolen since last call, and feed it into
sched_rt_avg_update().

Although steal time reporting in account_process_tick() keeps
track of the last time we read the steal clock, in prev_steal_time,
this patch do it independently using another field,
prev_steal_time_rq. This is because otherwise, information about time
accounted in update_process_tick() would never reach us in update_rq_clock().

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-14 12:59:47 +03:00
Glauber Costa 3c404b578f KVM guest: Add a pv_ops stub for steal time
This patch adds a function pointer in one of the many paravirt_ops
structs, to allow guests to register a steal time function. Besides
a steal time function, we also declare two jump_labels. They will be
used to allow the steal time code to be easily bypassed when not
in use.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-14 12:59:44 +03:00
Glauber Costa c9aaa8957f KVM: Steal time implementation
To implement steal time, we need the hypervisor to pass the guest
information about how much time was spent running other processes
outside the VM, while the vcpu had meaningful work to do - halt
time does not count.

This information is acquired through the run_delay field of
delayacct/schedstats infrastructure, that counts time spent in a
runqueue but not running.

Steal time is a per-cpu information, so the traditional MSR-based
infrastructure is used. A new msr, KVM_MSR_STEAL_TIME, holds the
memory area address containing information about steal time

This patch contains the hypervisor part of the steal time infrasructure,
and can be backported independently of the guest portion.

[avi, yongjie: export delayacct_on, to avoid build failures in some configs]

Signed-off-by: Glauber Costa <glommer@redhat.com>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Rik van Riel <riel@redhat.com>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Yongjie Ren <yongjie.ren@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-14 12:59:14 +03:00
Andy Lutomirski 433bd805e5 clocksource: Replace vread with generic arch data
The vread field was bloating struct clocksource everywhere except
x86_64, and I want to change the way this works on x86_64, so let's
split it out into per-arch data.

Cc: x86@kernel.org
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: linux-ia64@vger.kernel.org
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/3ae5ec76a168eaaae63f08a2a1060b91aa0b7759.1310563276.git.luto@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-13 11:23:12 -07:00
Andy Lutomirski 7f79ad15f3 x86-64: Add --no-undefined to vDSO build
This gives much nicer diagnostics when something goes wrong.  It's
supported at least as far back as binutils 2.15.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/de0b50920469ff6359c529526e7639fdd36fa83c.1310563276.git.luto@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-13 11:23:09 -07:00
Andy Lutomirski 1b3f2a72bb x86-64: Allow alternative patching in the vDSO
This code is short enough and different enough from the module
loader that it's not worth trying to share anything.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/e73112e4381fff29e31b882c2d0856822edaea53.1310563276.git.luto@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-13 11:23:07 -07:00
Andy Lutomirski 59e97e4d6f x86: Make alternative instruction pointers relative
This save a few bytes on x86-64 and means that future patches can
apply alternatives to unrelocated code.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/ff64a6b9a1a3860ca4a7b8b6dc7b4754f9491cd7.1310563276.git.luto@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-13 11:22:56 -07:00
Andy Lutomirski c9712944b2 x86-64: Improve vsyscall emulation CS and RIP handling
Three fixes here:
 - Send SIGSEGV if called from compat code or with a funny CS.
 - Don't BUG on impossible addresses.
 - Add a missing local_irq_disable.

This patch also removes an unused variable.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/6fb2b13ab39b743d1e4f466eef13425854912f7f.1310563276.git.luto@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-13 11:22:55 -07:00
Tejun Heo 1e01979c8f x86, numa: Implement pfn -> nid mapping granularity check
SPARSEMEM w/o VMEMMAP and DISCONTIGMEM, both used only on 32bit, use
sections array to map pfn to nid which is limited in granularity.  If
NUMA nodes are laid out such that the mapping cannot be accurate, boot
will fail triggering BUG_ON() in mminit_verify_page_links().

On 32bit, it's 512MiB w/ PAE and SPARSEMEM.  This seems to have been
granular enough until commit 2706a0bf7b (x86, NUMA: Enable
CONFIG_AMD_NUMA on 32bit too).  Apparently, there is a machine which
aligns NUMA nodes to 128MiB and has only AMD NUMA but not SRAT.  This
led to the following BUG_ON().

 On node 0 totalpages: 2096615
   DMA zone: 32 pages used for memmap
   DMA zone: 0 pages reserved
   DMA zone: 3927 pages, LIFO batch:0
   Normal zone: 1740 pages used for memmap
   Normal zone: 220978 pages, LIFO batch:31
   HighMem zone: 16405 pages used for memmap
   HighMem zone: 1853533 pages, LIFO batch:31
 BUG: Int 6: CR2   (null)
      EDI   (null)  ESI 00000002  EBP 00000002  ESP c1543ecc
      EBX f2400000  EDX 00000006  ECX   (null)  EAX 00000001
      err   (null)  EIP c16209aa   CS 00000060  flg 00010002
 Stack: f2400000 00220000 f7200800 c1620613 00220000 01000000 04400000 00238000
          (null) f7200000 00000002 f7200b58 f7200800 c1620929 000375fe   (null)
        f7200b80 c16395f0 00200a02 f7200a80   (null) 000375fe 00000002   (null)
 Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00181-g2706a0b #17
 Call Trace:
  [<c136b1e5>] ? early_fault+0x2e/0x2e
  [<c16209aa>] ? mminit_verify_page_links+0x12/0x42
  [<c1620613>] ? memmap_init_zone+0xaf/0x10c
  [<c1620929>] ? free_area_init_node+0x2b9/0x2e3
  [<c1607e99>] ? free_area_init_nodes+0x3f2/0x451
  [<c1601d80>] ? paging_init+0x112/0x118
  [<c15f578d>] ? setup_arch+0x791/0x82f
  [<c15f43d9>] ? start_kernel+0x6a/0x257

This patch implements node_map_pfn_alignment() which determines
maximum internode alignment and update numa_register_memblks() to
reject NUMA configuration if alignment exceeds the pfn -> nid mapping
granularity of the memory model as determined by PAGES_PER_SECTION.

This makes the problematic machine boot w/ flatmem by rejecting the
NUMA config and provides protection against crazy NUMA configurations.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110712074534.GB2872@htj.dyndns.org
LKML-Reference: <20110628174613.GP478@escobedo.osrc.amd.com>
Reported-and-Tested-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Conny Seidel <conny.seidel@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-12 21:58:29 -07:00
Tejun Heo d0ead15738 x86, mm: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/
DISCONTIGMEM on x86-32 implements pfn -> nid mapping similarly to
SPARSEMEM; however, it calls each mapping unit ELEMENT instead of
SECTION.  This patch renames it to SECTION so that PAGES_PER_SECTION
is valid for both DISCONTIGMEM and SPARSEMEM.  This will be used by
the next patch to implement mapping granularity check.

This patch is trivial constant rename.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110712074422.GA2872@htj.dyndns.org
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-12 21:58:11 -07:00
Maxime Ripard 3628c3f5c8 x86. reboot: Make Dell Latitude E6320 use reboot=pci
The Dell Latitude E6320 doesn't reboot unless reboot=pci is set.
Force it thanks to DMI.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Link: http://lkml.kernel.org/r/1309269451-4966-1-git-send-email-maxime.ripard@free-electrons.com
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2011-07-12 21:42:48 -07:00
Naga Chumbalkar 42f0efc5aa x86, ioapic: Print IR_IO_APIC_route_entry when IR is enabled
When IR (interrupt remapping) is enabled print_IO_APIC() displays output according
to legacy RTE (redirection table entry) definitons:

 NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
 00 00  1    0    0   0   0    0    0    00
 01 00  0    0    0   0   0    0    0    01
 02 00  0    0    0   0   0    0    0    02
 03 00  1    0    0   0   0    0    0    03
 04 00  1    0    0   0   0    0    0    04
 05 00  1    0    0   0   0    0    0    05
 06 00  1    0    0   0   0    0    0    06
...

The above output is as per Sec 3.2.4 of the IOAPIC datasheet:
82093AA I/O Advanced Programmable Interrupt Controller (IOAPIC):
http://download.intel.com/design/chipsets/datashts/29056601.pdf

Instead the output should display the fields as discussed in Sec 5.5.1
of the VT-d specification:

(Intel Virtualization Technology for Directed I/O:
http://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf)

After the fix:
 NR Indx Fmt Mask Trig IRR Pol Stat Indx2 Zero Vect:
 00 0000 0   1    0    0   0   0    0     0    00
 01 000F 1   0    0    0   0   0    0     0    01
 02 0001 1   0    0    0   0   0    0     0    02
 03 0002 1   1    0    0   0   0    0     0    03
 04 0011 1   1    0    0   0   0    0     0    04
 05 0004 1   1    0    0   0   0    0     0    05
 06 0005 1   1    0    0   0   0    0     0    06
...

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Link: http://lkml.kernel.org/r/20110712211658.2939.93123.sendpatchset@nchumbalkar.americas.cpqcorp.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-12 20:17:58 -07:00
Naga Chumbalkar 3040db92ee x86, ioapic: Print IRTE when IR is enabled
When "apic=debug" is used as a boot parameter, Linux prints the IOAPIC routing
entries in "dmesg". Below is output from IOAPIC whose apic_id is 8:

# dmesg | grep "routing entry"
IOAPIC[8]: Set routing entry (8-1 -> 0x31 -> IRQ 1 Mode:0 Active:0 Dest:0)
IOAPIC[8]: Set routing entry (8-2 -> 0x30 -> IRQ 0 Mode:0 Active:0 Dest:0)
IOAPIC[8]: Set routing entry (8-3 -> 0x33 -> IRQ 3 Mode:0 Active:0 Dest:0)
...

Similarly, when IR (interrupt remapping) is enabled, and the IRTE
(interrupt remapping table entry) is set up we should display it.

After the fix:

# dmesg | grep IRTE
IOAPIC[8]: Set IRTE entry (P:1 FPD:0 Dst_Mode:0 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:31 Dest:00000000 SID:00F1 SQ:0 SVT:1)
IOAPIC[8]: Set IRTE entry (P:1 FPD:0 Dst_Mode:0 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:30 Dest:00000000 SID:00F1 SQ:0 SVT:1)
IOAPIC[8]: Set IRTE entry (P:1 FPD:0 Dst_Mode:0 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:33 Dest:00000000 SID:00F1 SQ:0 SVT:1)
...

The IRTE is defined in Sec 9.5 of the Intel VT-d Specification.

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Link: http://lkml.kernel.org/r/20110712211704.2939.71291.sendpatchset@nchumbalkar.americas.cpqcorp.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-12 14:34:00 -07:00
Naga Chumbalkar 2597085228 x86, x2apic: Preserve high 32-bits of IA32_APIC_BASE MSR
If there's no special reason to zero-out the "high" 32-bits of the IA32_APIC_BASE
MSR, let's preserve it.

The x2APIC Specification doesn't explicitly state any such requirement. (Sec 2.2
in: http://www.intel.com/Assets/PDF/manual/318148.pdf).

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Link: http://lkml.kernel.org/r/20110712055831.2498.78521.sendpatchset@nchumbalkar.americas.cpqcorp.net
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-12 14:33:49 -07:00
Christoph Lameter 688d3be815 percpu: Fixup __this_cpu_xchg* operations
Somehow we got into a situation where the __this_cpu_xchg() operations were
not defined in the same way as this_cpu_xchg() and friends. I had some build
failures under 32 bit that were addressed by these fixes.

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2011-07-12 13:47:16 +02:00
Glauber Costa 9ddabbe72e KVM: KVM Steal time guest/host interface
To implement steal time, we need the hypervisor to pass the guest information
about how much time was spent running other processes outside the VM.
This is per-vcpu, and using the kvmclock structure for that is an abuse
we decided not to make.

In this patchset, I am introducing a new msr, KVM_MSR_STEAL_TIME, that
holds the memory area address containing information about steal time

This patch contains the headers for it. I am keeping it separate to facilitate
backports to people who wants to backport the kernel part but not the
hypervisor, or the other way around.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 13:17:03 +03:00
Glauber Costa 4b6b35f55c KVM: Add constant to represent KVM MSRs enabled bit in guest/host interface
This patch is simple, put in a different commit so it can be more easily
shared between guest and hypervisor. It just defines a named constant
to indicate the enable bit for KVM-specific MSRs.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 13:17:02 +03:00
Takuya Yoshikawa 3c8c652ae4 KVM: MMU: Introduce is_last_gpte() to clean up walk_addr_generic()
Suggested by Ingo and Avi.

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:16:44 +03:00
Takuya Yoshikawa 92c1c1e85b KVM: MMU: Rename the walk label in walk_addr_generic()
The current name does not explain the meaning well.  So give it a better
name "retry_walk" to show that we are trying the walk again.

This was suggested by Ingo Molnar.

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:16:43 +03:00
Takuya Yoshikawa 134291bf3c KVM: MMU: Clean up the error handling of walk_addr_generic()
Avoid two step jump to the error handling part.  This eliminates the use
of the variables present and rsvd_fault.

We also use the const type qualifier to show that write/user/fetch_fault
do not change in the function.

Both of these were suggested by Ingo Molnar.

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:16:42 +03:00
Marcelo Tosatti f8f7e5ee10 Revert "KVM: MMU: make kvm_mmu_reset_context() flush the guest TLB"
This reverts commit bee931d31e588b8eb86b7edee32fac2d16930cd7.

TLB flush should be done lazily during guest entry, in
kvm_mmu_load().

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:16:41 +03:00
Avi Kivity 45bd07b9d5 KVM: MMU: make kvm_mmu_reset_context() flush the guest TLB
kvm_set_cr0() and kvm_set_cr4(), and possible other functions,
assume that kvm_mmu_reset_context() flushes the guest TLB.  However,
it does not.

Fix by flushing the tlb (and syncing the new root as well).

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 13:16:27 +03:00
Avi Kivity 411c588dfb KVM: MMU: Adjust shadow paging to work when SMEP=1 and CR0.WP=0
When CR0.WP=0, we sometimes map user pages as kernel pages (to allow
the kernel to write to them).  Unfortunately this also allows the kernel
to fetch from these pages, even if CR4.SMEP is set.

Adjust for this by also setting NX on the spte in these circumstances.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 13:16:26 +03:00