Now we have the following behavior:
1) object_new() returns an object with ref = 1
2) object_initialize() does not increase the reference count (ref may be 0).
3) object_deref() will finalize the object when ref = 0. it does not free the
memory associated with the object.
4) both link and child properties correctly set the reference count.
The expected usage is the following:
1) child devices should generally be created via object_initialize() using
memory from the parent device. Adding the object as a child property will
take ownership of the object and tie the child's life cycle to the parent.
2) If a child device is created via qdev_create() or some other form of
object_new(), there must be an object_delete() call in the parent device's
finalize function.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Links had limited utility before as they only allowed a concrete type to be
specified. Now we can support abstract types and interfaces which means it's
now possible to have a link<PCIDevice>.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This is mostly code movement although not entirely. This makes properties part
of the Object base class which means that we can now start using Object in a
meaningful way outside of qdev.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
I'm sure the intentions were good here, but there's no reason this should be in
qdev. Move it to qemu-char where it belongs.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Note that the FIXME gets fixed in series 4/4. We need to convert BusState to
QOM before we can make parent_bus a link.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This gets us closer to being able to object_new() a qdev type and have a
functioning object verses having to call qdev_create().
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This adds a command that allows searching for types that implement a property.
This allows you to do things like search for all available PCIDevices. In the
future, we'll also have a standard interface for things with a BlockDriverState
property that a PCIDevice could implement.
This will enable search queries like, "any type that implements the BlockDevice
interface" which would allow management tools to present available block devices
without having to hard code device names. Since an object can implement
multiple interfaces, one device could act both as a BlockDevice and a
NetworkDevice.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Limit them to the device_add functionality. Device aliases were a hack based
on the fact that virtio was modeled the wrong way. The mechanism for aliasing
is very limited in that only one alias can exist for any device.
We have to support it for the purposes of compatibility but we only need to
support it in device_add so restrict it to that piece of code.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
---
v1 -> v2
- Use a table for aliases (Paolo)
This was done in a mostly automated fashion. I did it in three steps and then
rebased it into a single step which avoids repeatedly touching every file in
the tree.
The first step was a sed-based addition of the parent type to the subclass
registration functions.
The second step was another sed-based removal of subclass registration functions
while also adding virtual functions from the base class into a class_init
function as appropriate.
Finally, a python script was used to convert the DeviceInfo structures and
qdev_register_subclass functions to TypeInfo structures, class_init functions,
and type_register_static calls.
We are almost fully converted to QOM after this commit.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This allows us to drop per-Device registration functions by allowing the
class_init functions to overload qdev methods.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Now DeviceInfo is no longer used after object construction. All of the
relevant members have been moved to DeviceClass.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We can probably model USBHidDevice as a base class to get even better code
sharing but for now, just use a common function to initialize the common class
members.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
When running Linux on e500 with powersave-nap enabled, Linux tries to
read out the L1CFG0 register and calculates some things from it. Passing
0 there ends up in a division by 0, resulting in -1, resulting in badness.
So let's populate the L1CFG0 register with reasonable defaults. That way
guests aren't completely confused.
Reported-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
The e500mc implements Embedded.Processor Control, so enable it and
thus enable guests to IPI each other. This makes -smp work with -cpu
e500mc.
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch implements the msgsnd instruction. It is part of the
Embedded.Processor Control specification and allows one CPU to
IPI another CPU without going through an interrupt controller.
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch implements the msgclr instruction. It is part of the
Embedded.Processor Control specification and clears pending doorbell
interrupts on the current CPU.
Signed-off-by: Alexander Graf <agraf@suse.de>
We already had all the code available to have doorbell exceptions
be handled properly. It was just disabled.
Enable it, so we can rely on it.
Signed-off-by: Alexander Graf <agraf@suse.de>
We're going to introduce doorbell instructions (called processor
control in the spec) soon. Add some defines for easier patch
readability later.
Signed-off-by: Alexander Graf <agraf@suse.de>
Our EXCP list is getting outdated. By now, 3 new exception vectors have
been introduced. Update the list so we have everything at one place.
Signed-off-by: Alexander Graf <agraf@suse.de>
Commit 84b058d broke compilation for KVM on non-x86 targets, which
don't have KVM_CAP_IRQ_ROUTING defined.
Fix by not using the unavailable constant when it's not around.
Signed-off-by: Alexander Graf <agraf@suse.de>
We can have TLBs that only support a single page size. This is defined
by the absence of the AVAIL flag in TLBnCFG. If this is the case, we
currently write invalid size info into the TLB, but override it on
internal fault.
Let's move the check over to tlbwe, so we don't have the AVAIL check in
the hotter fault path.
Signed-off-by: Alexander Graf <agraf@suse.de>
Our internal helpers to fetch TLB entries were not able to tell us
that an entry doesn't even exist. Pass an error out if we hit such
a case to not accidently pass beyond the TLB array.
Signed-off-by: Alexander Graf <agraf@suse.de>
The PowerPC 2.06 BookE ISA defines an opcode called "tlbilx" which is used
to flush TLB entries. It's the recommended way of flushing in virtualized
environments.
So far we got away without implementing it, but Linux for e500mc uses this
instruction, so we better add it :).
Signed-off-by: Alexander Graf <agraf@suse.de>
When setting a TLB entry, we need to check if the TLB we're putting it in
actually supports the given size. According to the 2.06 PowerPC ISA, a
value that's out of range can either be redefined to something implementation
dependent or we can raise an illegal opcode exception. We do the latter.
Signed-off-by: Alexander Graf <agraf@suse.de>
When using MAV 2.0 TLB registers, we have another range of TLB registers
available to read the supported page sizes from.
Add SPR definitions for those and add a helper function that we can use
to receive such a bitmap even when using MAV 1.0.
Signed-off-by: Alexander Graf <agraf@suse.de>
We might want to call the tlb check function without actually caring about
the real address resolution. Check if we really should write the value
back.
Signed-off-by: Alexander Graf <agraf@suse.de>
The msync instruction as defined today is only valid on 4xx cores, not
on e500 which also supports msync, but treats it the same way as sync.
Rename it to reflect that it's 4xx only.
Signed-off-by: Alexander Graf <agraf@suse.de>
The e500 CPUs don't use 440's msync which falls on the same opcode IDs,
but instead use the real powerpc sync instruction. This is important,
since the invalid mask differs between the two.
Signed-off-by: Alexander Graf <agraf@suse.de>
Our code only knows IVORs up to 37. Add the new ones defined in ISA 2.06
from 38 - 42.
Signed-off-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Unfortunately the HIOR setting code slipped into upstream QEMU
before it was pulled into upstream KVM. And since Murphy is always
right, comments on the patches only emerged on the pull request
leading to changes in the interface.
So here's an update to the HIOR setting. While at it, I also relaxed
it a bit since for HV KVM we can already run fine without and 3.2
works just fine with HV KVM but when not setting HIOR. We will only
need this when running PAPR in PR KVM.
Since we accidently changed the ABI and API along the way, we have
to update the underlying kernel headers together with the code that
uses it to not break bisectability.
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch is basically what ./scripts/update-linux-headers.sh against
upstream KVM's next branch outputs except that all the HIOR bits are
removed. These we have to update with the code that uses them.
Signed-off-by: Alexander Graf <agraf@suse.de>
This file only contains code from Red Hat, so it can use GPLv2+.
Tested with `git blame -M -C net/checksum.c`.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The most common use of -net tap is to connect a tap device to a bridge. This
requires the use of a script and running qemu as root in order to allocate a
tap device to pass to the script.
This model is great for portability and flexibility but it's incredibly
difficult to eliminate the need to run qemu as root. The only really viable
mechanism is to use tunctl to create a tap device, attach it to a bridge as
root, and then hand that tap device to qemu. The problem with this mechanism
is that it requires administrator intervention whenever a user wants to create
a guest.
By essentially writing a helper that implements the most common qemu-ifup
script that can be safely given cap_net_admin, we can dramatically simplify
things for non-privileged users. We still support existing -net tap options
as a mechanism for advanced users and backwards compatibility.
Currently, this is very Linux centric but there's really no reason why it
couldn't be extended for other Unixes.
A typical invocation would be similar to one of the following:
qemu linux.img -net bridge -net nic,model=virtio
qemu linux.img -net tap,helper="/usr/local/libexec/qemu-bridge-helper"
-net nic,model=virtio
qemu linux.img -netdev bridge,id=hn0
-device virtio-net-pci,netdev=hn0,id=nic1
qemu linux.img -netdev tap,helper="/usr/local/libexec/qemu-bridge-helper",id=hn0
-device virtio-net-pci,netdev=hn0,id=nic1
The default bridge that we attach to is br0. The thinking is that a distro
could preconfigure such an interface to allow out-of-the-box bridged networking.
Alternatively, if a user wants to use a different bridge, a typical invocation
would be simliar to one of the following:
qemu linux.img -net bridge,br=qemubr0 -net nic,model=virtio
qemu linux.img -net tap,helper="/usr/local/libexec/qemu-bridge-helper --br=qemubr0"
-net nic,model=virtio
qemu linux.img -netdev bridge,br=qemubr0,id=hn0
-device virtio-net-pci,netdev=hn0,id=nic1
qemu linux.img -netdev tap,helper="/usr/local/libexec/qemu-bridge-helper --br=qemubr0",id=hn0
-device virtio-net-pci,netdev=hn0,id=nic1
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Richa Marwaha <rmarwah@linux.vnet.ibm.com>
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The ideal way to use qemu-bridge-helper is to give it an fscap of using:
setcap cap_net_admin=ep qemu-bridge-helper
Unfortunately, most distros still do not have a mechanism to package files
with fscaps applied. This means they'll have to SUID the qemu-bridge-helper
binary.
To improve security, use libcap to reduce our capability set to just
cap_net_admin, then reduce privileges down to the calling user. This is
hopefully close to equivalent to fscap support from a security perspective.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Richa Marwaha <rmarwah@linux.vnet.ibm.com>
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We go to great lengths to restrict ourselves to just cap_net_admin as an OS
enforced security mechanism. However, we further restrict what we allow users
to do to simply adding a tap device to a bridge interface by virtue of the fact
that this is the only functionality we expose.
This is not good enough though. An administrator is likely to want to restrict
the bridges that an unprivileged user can access, in particular, to restrict
an unprivileged user from putting a guest on what should be isolated networks.
This patch implements an ACL mechanism that is enforced by qemu-bridge-helper.
The ACLs are fairly simple whitelist/blacklist mechanisms with a wildcard of
'all'. All users are blacklisted by default, and deny takes precedence over
allow.
An interesting feature of this ACL mechanism is that you can include external
ACL files. The main reason to support this is so that you can set different
file system permissions on those external ACL files. This allows an
administrator to implement rather sophisticated ACL policies based on
user/group policies via the file system.
As an example:
/etc/qemu/bridge.conf root:qemu 0640
allow br0
include /etc/qemu/alice.conf
include /etc/qemu/bob.conf
include /etc/qemu/charlie.conf
/etc/qemu/alice.conf root:alice 0640
allow br1
/etc/qemu/bob.conf root:bob 0640
allow br2
/etc/qemu/charlie.conf root:charlie 0640
deny all
This ACL pattern allows any user in the qemu group to get a tap device
connected to br0 (which is bridged to the physical network).
Users in the alice group can additionally get a tap device connected to br1.
This allows br1 to act as a private bridge for the alice group.
Users in the bob group can additionally get a tap device connected to br2.
This allows br2 to act as a private bridge for the bob group.
Users in the charlie group cannot get a tap device connected to any bridge.
Under no circumstance can the bob group get access to br1 or can the alice
group get access to br2. And under no cicumstance can the charlie group
get access to any bridge.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Richa Marwaha <rmarwah@linux.vnet.ibm.com>
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>