dect
/
linux-2.6
Archived
13
0
Fork 0
Commit Graph

335482 Commits

Author SHA1 Message Date
Tejun Heo 033fa1c5f5 cgroup, cpuset: remove cgroup_subsys->post_clone()
Currently CGRP_CPUSET_CLONE_CHILDREN triggers ->post_clone().  Now
that clone_children is cpuset specific, there's no reason to have this
rather odd option activation mechanism in cgroup core.  cpuset can
check the flag from its ->css_allocate() and take the necessary
action.

Move cpuset_post_clone() logic to the end of cpuset_css_alloc() and
remove cgroup_subsys->post_clone().

Loosely based on Glauber's "generalize post_clone into post_create"
patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Original-patch-by: Glauber Costa <glommer@parallels.com>
Original-patch: <1351686554-22592-2-git-send-email-glommer@parallels.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Glauber Costa <glommer@parallels.com>
2012-11-19 08:13:39 -08:00
Tejun Heo 2260e7fc1f cgroup: s/CGRP_CLONE_CHILDREN/CGRP_CPUSET_CLONE_CHILDREN/
clone_children is only meaningful for cpuset and will stay that way.
Rename the flag to reflect that and update documentation.  Also, drop
clone_children() wrapper in cgroup.c.  The thin wrapper is used only a
few times and one of them will go away soon.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Glauber Costa <glommer@parallels.com>
2012-11-19 08:13:38 -08:00
Tejun Heo 92fb97487a cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free()
Rename cgroup_subsys css lifetime related callbacks to better describe
what their roles are.  Also, update documentation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-19 08:13:38 -08:00
Tejun Heo b1929db42f cgroup: allow ->post_create() to fail
There could be cases where controllers want to do initialization
operations which may fail from ->post_create().  This patch makes
->post_create() return -errno to indicate failure and online_css()
relay such failures.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Glauber Costa <glommer@parallels.com>
2012-11-19 08:13:38 -08:00
Tejun Heo 4b8b47eb00 cgroup: update cgroup_create() failure path
cgroup_create() was ignoring failure of cgroupfs files.  Update it
such that, if file creation fails, it rolls back by calling
cgroup_destroy_locked() and returns failure.

Note that error out goto labels are renamed.  The labels are a bit
confusing but will become better w/ later cgroup operation renames.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-19 08:13:38 -08:00
Tejun Heo b8a2df6a5b cgroup: use mutex_trylock() when grabbing i_mutex of a new cgroup directory
All cgroup directory i_mutexes nest outside cgroup_mutex; however, new
directory creation is a special case.  A new cgroup directory is
created while holding cgroup_mutex.  Populating the new directory
requires both the new directory's i_mutex and cgroup_mutex.  Because
all directory i_mutexes nest outside cgroup_mutex, grabbing both
requires releasing cgroup_mutex first, which isn't a good idea as the
new cgroup isn't yet ready to be manipulated by other cgroup
opreations.

This is worked around by grabbing the new directory's i_mutex while
holding cgroup_mutex before making it visible.  As there's no other
user at that point, grabbing the i_mutex under cgroup_mutex can't lead
to deadlock.

cgroup_create_file() was using I_MUTEX_CHILD to tell lockdep not to
worry about the reverse locking order; however, this creates pseudo
locking dependency cgroup_mutex -> I_MUTEX_CHILD, which isn't true -
all directory i_mutexes are still nested outside cgroup_mutex.  This
pseudo locking dependency can lead to spurious lockdep warnings.

Use mutex_trylock() instead.  This will always succeed and lockdep
doesn't create any locking dependency for it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-19 08:13:37 -08:00
Tejun Heo d19e19de48 cgroup: simplify cgroup_load_subsys() failure path
Now that cgroup_unload_subsys() can tell whether the root css is
online or not, we can safely call cgroup_unload_subsys() after idr
init failure in cgroup_load_subsys().

Replace the manual unrolling and invoke cgroup_unload_subsys() on
failure.  This drops cgroup_mutex inbetween but should be safe as the
subsystem will fail try_module_get() and thus can't be mounted
inbetween.  As this means that cgroup_unload_subsys() can be called
before css_sets are rehashed, remove BUG_ON() on %NULL
css_set->subsys[] from cgroup_unload_subsys().

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-19 08:13:37 -08:00
Tejun Heo a31f2d3ff7 cgroup: introduce CSS_ONLINE flag and on/offline_css() helpers
New helpers on/offline_css() respectively wrap ->post_create() and
->pre_destroy() invocations.  online_css() sets CSS_ONLINE after
->post_create() is complete and offline_css() invokes ->pre_destroy()
iff CSS_ONLINE is set and clears it while also handling the temporary
dropping of cgroup_mutex.

This patch doesn't introduce any behavior change at the moment but
will be used to improve cgroup_create() failure path and allow
->post_create() to fail.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-19 08:13:37 -08:00
Tejun Heo 42809dd422 cgroup: separate out cgroup_destroy_locked()
Separate out cgroup_destroy_locked() from cgroup_destroy().  This will
be later used in cgroup_create() failure path.

While at it, add lockdep asserts on i_mutex and cgroup_mutex, and move
@d and @parent assignments to their declarations.

This patch doesn't introduce any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-19 08:13:37 -08:00
Tejun Heo 02ae7486d0 cgroup: fix harmless bugs in cgroup_load_subsys() fail path and cgroup_unload_subsys()
* If idr init fails, cgroup_load_subsys() cleared dummytop->subsys[]
  before calilng ->destroy() making CSS inaccessible to the callback,
  and didn't unlink ss->sibling.  As no modular controller uses
  ->use_id, this doesn't cause any actual problems.

* cgroup_unload_subsys() was forgetting to free idr, call
  ->pre_destroy() and clear ->active.  As there currently is no
  modular controller which uses ->use_id, ->pre_destroy() or ->active,
  this doesn't cause any actual problems.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-19 08:13:37 -08:00
Tejun Heo 648bb56d07 cgroup: lock cgroup_mutex in cgroup_init_subsys()
Make cgroup_init_subsys() grab cgroup_mutex while initializing a
subsystem so that all helpers and callbacks are called under the
context they expect.  This isn't strictly necessary as
cgroup_init_subsys() doesn't race with anybody but will allow adding
lockdep assertions.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-19 08:13:36 -08:00
Tejun Heo b48c6a80a0 cgroup: trivial cleanup for cgroup_init/load_subsys()
Consistently use @css and @dummytop in these two functions instead of
referring to them indirectly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-19 08:13:36 -08:00
Tejun Heo 38b53abaa3 cgroup: make CSS_* flags bit masks instead of bit positions
Currently, CSS_* flags are defined as bit positions and manipulated
using atomic bitops.  There's no reason to use atomic bitops for them
and bit positions are clunkier to deal with than bit masks.  Make
CSS_* bit masks instead and use the usual C bitwise operators to
access them.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-19 08:13:36 -08:00
Tejun Heo febfcef60d cgroup: cgroup->dentry isn't a RCU pointer
cgroup->dentry is marked and used as a RCU pointer; however, it isn't
one - the final dentry put doesn't go through call_rcu().  cgroup and
dentry share the same RCU freeing rule via synchronize_rcu() in
cgroup_diput() (kfree_rcu() used on cgrp is unnecessary).  If cgrp is
accessible under RCU read lock, so is its dentry and dereferencing
cgrp->dentry doesn't need any further RCU protection or annotation.

While not being accurate, before the previous patch, the RCU accessors
served a purpose as memory barriers - cgroup->dentry used to be
assigned after the cgroup was made visible to cgroup_path(), so the
assignment and dereferencing in cgroup_path() needed the memory
barrier pair.  Now that list_add_tail_rcu() happens after
cgroup->dentry is assigned, this no longer is necessary.

Remove the now unnecessary and misleading RCU annotations from
cgroup->dentry.  To make up for the removal of rcu_dereference_check()
in cgroup_path(), add an explicit rcu_lockdep_assert(), which asserts
the dereference rule of @cgrp, not cgrp->dentry.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-19 08:13:36 -08:00
Tejun Heo 4e139afc22 cgroup: create directory before linking while creating a new cgroup
While creating a new cgroup, cgroup_create() links the newly allocated
cgroup into various places before trying to create its directory.
Because cgroup life-cycle is tied to the vfs objects, this makes it
impossible to use cgroup_rmdir() for rolling back creation - the
removal logic depends on having full vfs objects.

This patch moves directory creation above linking and collect linking
operations to one place.  This allows directory creation failure to
share error exit path with css allocation failures and any failure
sites afterwards (to be added later) can use cgroup_rmdir() logic to
undo creation.

Note that this also makes the memory barriers around cgroup->dentry,
which currently is misleadingly using RCU operations, unnecessary.
This will be handled in the next patch.

While at it, locking BUG_ON() on i_mutex is converted to
lockdep_assert_held().

v2: Patch originally removed %NULL dentry check in cgroup_path();
    however, Li pointed out that this patch doesn't make it
    unnecessary as ->create() may call cgroup_path().  Drop the
    change for now.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-19 08:13:36 -08:00
Tejun Heo 28fd6f30ac cgroup: open-code cgroup_create_dir()
The operation order of cgroup creation is about to change and
cgroup_create_dir() is more of a hindrance than a proper abstraction.
Open-code it by moving the parent nlink adjustment next to self nlink
adjustment in cgroup_create_file() and the rest to cgroup_create().

This patch doesn't introduce any behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-19 08:13:36 -08:00
Tejun Heo 2243076ad1 cgroup: initialize cgrp->allcg_node in init_cgroup_housekeeping()
Not strictly necessary but it's annoying to have uninitialized
list_head around.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-19 08:13:35 -08:00
Tejun Heo 175431635e cgroup: remove incorrect dget/dput() pair in cgroup_create_dir()
cgroup_create_dir() does weird dancing with dentry refcnt.  On
success, it gets and then puts it achieving nothing.  On failure, it
puts but there isn't no matching get anywhere leading to the following
oops if cgroup_create_file() fails for whatever reason.

  ------------[ cut here ]------------
  kernel BUG at /work/os/work/fs/dcache.c:552!
  invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in:
  CPU 2
  Pid: 697, comm: mkdir Not tainted 3.7.0-rc4-work+ #3 Bochs Bochs
  RIP: 0010:[<ffffffff811d9c0c>]  [<ffffffff811d9c0c>] dput+0x1dc/0x1e0
  RSP: 0018:ffff88001a3ebef8  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff88000e5b1ef8 RCX: 0000000000000403
  RDX: 0000000000000303 RSI: 2000000000000000 RDI: ffff88000e5b1f58
  RBP: ffff88001a3ebf18 R08: ffffffff82c76960 R09: 0000000000000001
  R10: ffff880015022080 R11: ffd9bed70f48a041 R12: 00000000ffffffea
  R13: 0000000000000001 R14: ffff88000e5b1f58 R15: 00007fff57656d60
  FS:  00007ff05fcb3800(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000004046f0 CR3: 000000001315f000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process mkdir (pid: 697, threadinfo ffff88001a3ea000, task ffff880015022080)
  Stack:
   ffff88001a3ebf48 00000000ffffffea 0000000000000001 0000000000000000
   ffff88001a3ebf38 ffffffff811cc889 0000000000000001 ffff88000e5b1ef8
   ffff88001a3ebf68 ffffffff811d1fc9 ffff8800198d7f18 ffff880019106ef8
  Call Trace:
   [<ffffffff811cc889>] done_path_create+0x19/0x50
   [<ffffffff811d1fc9>] sys_mkdirat+0x59/0x80
   [<ffffffff811d2009>] sys_mkdir+0x19/0x20
   [<ffffffff81be1e02>] system_call_fastpath+0x16/0x1b
  Code: 00 48 8d 90 18 01 00 00 48 89 93 c0 00 00 00 4c 89 a0 18 01 00 00 48 8b 83 a0 00 00 00 83 80 28 01 00 00 01 e8 e6 6f a0 00 eb 92 <0f> 0b 66 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41
  RIP  [<ffffffff811d9c0c>] dput+0x1dc/0x1e0
   RSP <ffff88001a3ebef8>
  ---[ end trace 1277bcfd9561ddb0 ]---

Fix it by dropping the unnecessary dget/dput() pair.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: stable@vger.kernel.org
2012-11-19 08:13:35 -08:00
Tejun Heo ef9fe980c6 cgroup_freezer: implement proper hierarchy support
Up until now, cgroup_freezer didn't implement hierarchy properly.
cgroups could be arranged in hierarchy but it didn't make any
difference in how each cgroup_freezer behaved.  They all operated
separately.

This patch implements proper hierarchy support.  If a cgroup is
frozen, all its descendants are frozen.  A cgroup is thawed iff it and
all its ancestors are THAWED.  freezer.self_freezing shows the current
freezing state for the cgroup itself.  freezer.parent_freezing shows
whether the cgroup is freezing because any of its ancestors is
freezing.

freezer_post_create() locks the parent and new cgroup and inherits the
parent's state and freezer_change_state() applies new state top-down
using cgroup_for_each_descendant_pre() which guarantees that no child
can escape its parent's state.  update_if_frozen() uses
cgroup_for_each_descendant_post() to propagate frozen states
bottom-up.

Synchronization could be coarser and easier by using a single mutex to
protect all hierarchy operations.  Finer grained approach was used
because it wasn't too difficult for cgroup_freezer and I think it's
beneficial to have an example implementation and cgroup_freezer is
rather simple and can serve a good one.

As this makes cgroup_freezer properly hierarchical,
freezer_subsys.broken_hierarchy marking is removed.

Note that this patch changes userland visible behavior - freezing a
cgroup now freezes all its descendants too.  This behavior change is
intended and has been warned via .broken_hierarchy.

v2: Michal spotted a bug in freezer_change_state() - descendants were
    inheriting from the wrong ancestor.  Fixed.

v3: Documentation/cgroups/freezer-subsystem.txt updated.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
2012-11-09 10:52:30 -08:00
Tejun Heo 5300a9b348 cgroup_freezer: add ->post_create() and ->pre_destroy() and track online state
A cgroup is online and visible to iteration between ->post_create()
and ->pre_destroy().  This patch introduces CGROUP_FREEZER_ONLINE and
toggles it from the newly added freezer_post_create() and
freezer_pre_destroy() while holding freezer->lock such that a
cgroup_freezer can be reilably distinguished to be online.  This will
be used by full hierarchy support.

ONLINE test is added to freezer_apply_state() but it currently doesn't
make any difference as freezer_write() can only be called for an
online cgroup.

Adjusting system_freezing_cnt on destruction is moved from
freezer_destroy() to the new freezer_pre_destroy() for consistency.

This patch doesn't introduce any noticeable behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
2012-11-09 09:12:30 -08:00
Tejun Heo a225218060 cgroup_freezer: introduce CGROUP_FREEZING_[SELF|PARENT]
Introduce FREEZING_SELF and FREEZING_PARENT and make FREEZING OR of
the two flags.  This is to prepare for full hierarchy support.

freezer_apply_date() is updated such that it can handle setting and
clearing of both flags.  The two flags are also exposed to userland
via read-only files self_freezing and parent_freezing.

Other than the added cgroupfs files, this patch doesn't introduce any
behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
2012-11-09 09:12:30 -08:00
Tejun Heo d6a2fe1342 cgroup_freezer: make freezer->state mask of flags
freezer->state was an enum value - one of THAWED, FREEZING and FROZEN.
As the scheduled full hierarchy support requires more than one
freezing condition, switch it to mask of flags.  If FREEZING is not
set, it's thawed.  FREEZING is set if freezing or frozen.  If frozen,
both FREEZING and FROZEN are set.  Now that tasks can be attached to
an already frozen cgroup, this also makes freezing condition checks
more natural.

This patch doesn't introduce any behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
2012-11-09 09:12:30 -08:00
Tejun Heo 04a4ec3257 cgroup_freezer: prepare freezer_change_state() for full hierarchy support
* Make freezer_change_state() take bool @freeze instead of enum
  freezer_state.

* Separate out freezer_apply_state() out of freezer_change_state().
  This makes freezer_change_state() a rather silly thin wrapper.  It
  will be filled with hierarchy handling later on.

This patch doesn't introduce any behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
2012-11-09 09:12:30 -08:00
Tejun Heo bcd66c894a cgroup_freezer: trivial cleanups
* Clean-up indentation and line-breaks.  Drop the invalid comment
  about freezer->lock.

* Make all internal functions take @freezer instead of both @cgroup
  and @freezer.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
2012-11-09 09:12:29 -08:00
Tejun Heo 574bd9f7c7 cgroup: implement generic child / descendant walk macros
Currently, cgroup doesn't provide any generic helper for walking a
given cgroup's children or descendants.  This patch adds the following
three macros.

* cgroup_for_each_child() - walk immediate children of a cgroup.

* cgroup_for_each_descendant_pre() - visit all descendants of a cgroup
  in pre-order tree traversal.

* cgroup_for_each_descendant_post() - visit all descendants of a
  cgroup in post-order tree traversal.

All three only require the user to hold RCU read lock during
traversal.  Verifying that each iterated cgroup is online is the
responsibility of the user.  When used with proper synchronization,
cgroup_for_each_descendant_pre() can be used to propagate state
updates to descendants in reliable way.  See comments for details.

v2: s/config/state/ in commit message and comments per Michal.  More
    documentation on synchronization rules.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujisu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-09 09:12:29 -08:00
Tejun Heo eb6fd5040e cgroup: use rculist ops for cgroup->children
Use RCU safe list operations for cgroup->children.  This will be used
to implement cgroup children / descendant walking which can be used by
controllers.

Note that cgroup_create() now puts a new cgroup at the end of the
->children list instead of head.  This isn't strictly necessary but is
done so that the iteration order is more conventional.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-09 09:12:29 -08:00
Tejun Heo a8638030f6 cgroup: add cgroup_subsys->post_create()
Currently, there's no way for a controller to find out whether a new
cgroup finished all ->create() allocatinos successfully and is
considered "live" by cgroup.

This becomes a problem later when we add generic descendants walking
to cgroup which can be used by controllers as controllers don't have a
synchronization point where it can synchronize against new cgroups
appearing in such walks.

This patch adds ->post_create().  It's called after all ->create()
succeeded and the cgroup is linked into the generic cgroup hierarchy.
This plays the counterpart of ->pre_destroy().

When used in combination with the to-be-added generic descendant
iterators, ->post_create() can be used to implement reliable state
inheritance.  It will be explained with the descendant iterators.

v2: Added a paragraph about its future use w/ descendant iterators per
    Michal.

v3: Forgot to add ->post_create() invocation to cgroup_load_subsys().
    Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Glauber Costa <glommer@parallels.com>
2012-11-09 09:12:29 -08:00
Tao Ma 316eb661f1 cgroup: set 'start' with the right value in cgroup_path.
'start' is set to buf + buflen and do the '--' immediately.
Just set it to 'buf + buflen - 1' directly.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
2012-11-08 06:23:02 -08:00
Tejun Heo 4b1c7840b7 device_cgroup: add lockdep asserts
device_cgroup uses RCU safe ->exceptions list which is write-protected
by devcgroup_mutex and has had some issues using locking correctly.
Add lockdep asserts to utility functions so that future errors can be
easily detected.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Li Zefan <lizefan@huawei.com>
2012-11-06 12:28:04 -08:00
Tejun Heo 5b805f2a76 Merge branch 'cgroup/for-3.7-fixes' into cgroup/for-3.8
This is to receive device_cgroup fixes so that further device_cgroup
changes can be made in cgroup/for-3.8.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-11-06 12:26:23 -08:00
Tejun Heo 201e72acb2 device_cgroup: fix RCU usage
dev_cgroup->exceptions is protected with devcgroup_mutex for writes
and RCU for reads; however, RCU usage isn't correct.

* dev_exception_clean() doesn't use RCU variant of list_del() and
  kfree().  The function can race with may_access() and may_access()
  may end up dereferencing already freed memory.  Use list_del_rcu()
  and kfree_rcu() instead.

* may_access() may be called only with RCU read locked but doesn't use
  RCU safe traversal over ->exceptions.  Use list_for_each_entry_rcu().

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Cc: stable@vger.kernel.org
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Li Zefan <lizefan@huawei.com>
2012-11-06 12:25:51 -08:00
Aristeu Rozanski 64e1047713 device_cgroup: fix unchecked cgroup parent usage
In 4cef7299b4 ("device_cgroup: add proper checking when changing
default behavior") the cgroup parent usage is unchecked.  root will not
have a parent and trying to use device.{allow,deny} will cause problems.
For some reason my stressing scripts didn't test the root directory so I
didn't catch it on my regular tests.

Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: James Morris <jmorris@namei.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-11-06 07:25:20 -08:00
Tejun Heo 1db1e31b1e Merge branch 'cgroup-rmdir-updates' into cgroup/for-3.8
Pull rmdir updates into for-3.8 so that further callback updates can
be put on top.  This pull created a trivial conflict between the
following two commits.

  8c7f6edbda ("cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them")
  ed95779340 ("cgroup: kill cgroup_subsys->__DEPRECATED_clear_css_refs")

The former added a field to cgroup_subsys and the latter removed one
from it.  They happen to be colocated causing the conflict.  Keeping
what's added and removing what's removed resolves the conflict.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-11-05 09:21:51 -08:00
Tejun Heo bcf6de1b91 cgroup: make ->pre_destroy() return void
All ->pre_destory() implementations return 0 now, which is the only
allowed return value.  Make it return void.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
2012-11-05 09:16:59 -08:00
Michal Hocko 9d093cb10e hugetlb: do not fail in hugetlb_cgroup_pre_destroy
Now that pre_destroy callbacks are called from the context where neither
any task can attach the group nor any children group can be added there
is no other way to fail from hugetlb_pre_destroy.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Glauber Costa <glommer@parallels.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-11-05 09:16:59 -08:00
Michal Hocko ab5196c202 memcg: make mem_cgroup_reparent_charges non failing
Now that pre_destroy callbacks are called from the context where neither
any task can attach the group nor any children group can be added there
is no other way to fail from mem_cgroup_pre_destroy.
mem_cgroup_pre_destroy doesn't have to take a reference to memcg's css
because all css' are marked dead already.

tj: Remove now unused local variable @cgrp from
    mem_cgroup_reparent_charges().

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Glauber Costa <glommer@parallels.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-11-05 09:16:59 -08:00
Tejun Heo b25ed609d0 cgroup: remove CGRP_WAIT_ON_RMDIR, cgroup_exclude_rmdir() and cgroup_release_and_wakeup_rmdir()
CGRP_WAIT_ON_RMDIR is another kludge which was added to make cgroup
destruction rollback somewhat working.  cgroup_rmdir() used to drain
CSS references and CGRP_WAIT_ON_RMDIR and the associated waitqueue and
helpers were used to allow the task performing rmdir to wait for the
next relevant event.

Unfortunately, the wait is visible to controllers too and the
mechanism got exposed to memcg by 887032670d ("cgroup avoid permanent
sleep at rmdir").

Now that the draining and retries are gone, CGRP_WAIT_ON_RMDIR is
unnecessary.  Remove it and all the mechanisms supporting it.  Note
that memcontrol.c changes are essentially revert of 887032670d
("cgroup avoid permanent sleep at rmdir").

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Balbir Singh <bsingharora@gmail.com>
2012-11-05 09:16:59 -08:00
Tejun Heo 1a90dd508b cgroup: deactivate CSS's and mark cgroup dead before invoking ->pre_destroy()
Because ->pre_destroy() could fail and can't be called under
cgroup_mutex, cgroup destruction did something very ugly.

  1. Grab cgroup_mutex and verify it can be destroyed; fail otherwise.

  2. Release cgroup_mutex and call ->pre_destroy().

  3. Re-grab cgroup_mutex and verify it can still be destroyed; fail
     otherwise.

  4. Continue destroying.

In addition to being ugly, it has been always broken in various ways.
For example, memcg ->pre_destroy() expects the cgroup to be inactive
after it's done but tasks can be attached and detached between #2 and
#3 and the conditions that memcg verified in ->pre_destroy() might no
longer hold by the time control reaches #3.

Now that ->pre_destroy() is no longer allowed to fail.  We can switch
to the following.

  1. Grab cgroup_mutex and verify it can be destroyed; fail otherwise.

  2. Deactivate CSS's and mark the cgroup removed thus preventing any
     further operations which can invalidate the verification from #1.

  3. Release cgroup_mutex and call ->pre_destroy().

  4. Re-grab cgroup_mutex and continue destroying.

After this change, controllers can safely assume that ->pre_destroy()
will only be called only once for a given cgroup and, once
->pre_destroy() is called, the cgroup will stay dormant till it's
destroyed.

This removes the only reason ->pre_destroy() can fail - new task being
attached or child cgroup being created inbetween.  Error out path is
removed and ->pre_destroy() invocation is open coded in
cgroup_rmdir().

v2: cgroup_call_pre_destroy() removal moved to this patch per Michal.
    Commit message updated per Glauber.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Glauber Costa <glommer@parallels.com>
2012-11-05 09:16:59 -08:00
Tejun Heo 976c06bccc cgroup: use cgroup_lock_live_group(parent) in cgroup_create()
This patch makes cgroup_create() fail if @parent is marked removed.
This is to prepare for further updates to cgroup_rmdir() path.

Note that this change isn't strictly necessary.  cgroup can only be
created via mkdir and the removed marking and dentry removal happen
without releasing cgroup_mutex, so cgroup_create() can never race with
cgroup_rmdir().  Even after the scheduled updates to cgroup_rmdir(),
cgroup_mkdir() and cgroup_rmdir() are synchronized by i_mutex
rendering the added liveliness check unnecessary.

Do it anyway such that locking is contained inside cgroup proper and
we don't get nasty surprises if we ever grow another caller of
cgroup_create().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-05 09:16:59 -08:00
Tejun Heo e93160803f cgroup: kill CSS_REMOVED
CSS_REMOVED is one of the several contortions which were necessary to
support css reference draining on cgroup removal.  All css->refcnts
which need draining should be deactivated and verified to equal zero
atomically w.r.t. css_tryget().  If any one isn't zero, all refcnts
needed to be re-activated and css_tryget() shouldn't fail in the
process.

This was achieved by letting css_tryget() busy-loop until either the
refcnt is reactivated (failed removal attempt) or CSS_REMOVED is set
(committing to removal).

Now that css refcnt draining is no longer used, there's no need for
atomic rollback mechanism.  css_tryget() simply can look at the
reference count and fail if it's deactivated - it's never getting
re-activated.

This patch removes CSS_REMOVED and updates __css_tryget() to fail if
the refcnt is deactivated.  As deactivation and removal are a single
step now, they no longer need to be protected against css_tryget()
happening from irq context.  Remove local_irq_disable/enable() from
cgroup_rmdir().

Note that this removes css_is_removed() whose only user is VM_BUG_ON()
in memcontrol.c.  We can replace it with a check on the refcnt but
given that the only use case is a debug assert, I think it's better to
simply unexport it.

v2: Comment updated and explanation on local_irq_disable/enable()
    added per Michal Hocko.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
2012-11-05 09:16:58 -08:00
Tejun Heo ed95779340 cgroup: kill cgroup_subsys->__DEPRECATED_clear_css_refs
2ef37d3fe4 ("memcg: Simplify mem_cgroup_force_empty_list error
handling") removed the last user of __DEPRECATED_clear_css_refs.  This
patch removes __DEPRECATED_clear_css_refs and mechanisms to support
it.

* Conditionals dependent on __DEPRECATED_clear_css_refs removed.

* cgroup_clear_css_refs() can no longer fail.  All that needs to be
  done are deactivating refcnts, setting CSS_REMOVED and putting the
  base reference on each css.  Remove cgroup_clear_css_refs() and the
  failure path, and open-code the loops into cgroup_rmdir().

This patch keeps the two for_each_subsys() loops separate while open
coding them.  They can be merged now but there are scheduled changes
which need them to be separate, so keep them separate to reduce the
amount of churn.

local_irq_save/restore() from cgroup_clear_css_refs() are replaced
with local_irq_disable/enable() for simplicity.  This is safe as
cgroup_rmdir() is always called with IRQ enabled.  Note that this IRQ
switching is necessary to ensure that css_tryget() isn't called from
IRQ context on the same CPU while lower context is between CSS
deactivation and setting CSS_REMOVED as css_tryget() would hang
forever in such cases waiting for CSS to be re-activated or
CSS_REMOVED set.  This will go away soon.

v2: cgroup_call_pre_destroy() removal dropped per Michal.  Commit
    message updated to explain local_irq_disable/enable() conversion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-11-05 09:16:58 -08:00
Linus Torvalds 3d70f8c617 Linux 3.7-rc4 2012-11-04 11:07:39 -08:00
Linus Torvalds d4164973a0 NFS bugfixes for Linux 3.7
- Fix a bunch of deadlock situations:
   * State recovery can deadlock if we fail to release sequence ids before
     scheduling the recovery thread.
   * Calling deactivate_super() from an RPC workqueue thread can deadlock
     because of the call to rpc_shutdown_client.
 - Display the device name correctly in /proc/*/mounts
 - Fix a number of incorrect error return values:
   * When NFSv3 mounts fail due to a timeout.
   * On NFSv4.1 backchannel setup failure
   * On NFSv4 open access checks
 - pnfs_find_alloc_layout() must check the layout pointer for NULL
 - Fix a regression in the legacy DNS resolved
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQlXRyAAoJEGcL54qWCgDy1G4QALUjERF6VBaDUVRVDe3cyrNr
 1y5niKU19ysaXuNKtsGquVr3y72c0xyzCFLan8ZqJl1kXpYTDIyoDc0NBT3reffo
 T4rbw8SXVFVk+iVcJQgnFo5tblHVsVk+fvZPezBknaBb/uI/kxIbicZHHxvASyzM
 G0Ge/MC709x07tu3wJSyOtvD55j6bfpwbon6yjOKdN8/S2nOEsNOBjAunOWrKtsa
 2v3uWC9APQqIF5z6dfhAVCEMRrQMame6W6b5LuY/MSQpLAwUUYeCGzjc1TMeWvnS
 n66iCqQ0TCVgJH+7fYnrzPm5uNk4Am2XhM3QY1B0j8zl73ZWIs4PfjdoBytLnqF+
 SykNVUl72WqdJnWPCtk6LvoctyOsSMK5pJj4jadvNqDKgFkuRZkWb28bqux2OlHg
 ZCmnEZiew5tV+2anF6sgEaEhSyKNR/1h1ilXOy2TaV93qX+ec8tg+TE6C6YQR/R+
 IoOoYD8Hbe/R/D76WK0xOOQbrXIQvKb1zvlIRx7W6d5gWpg9IceNLcBfCiF1GjPf
 0PKxTbBSbSg44IRolPIprsZx6YzCI6wj2UzdPC7riozfZdSZ+T6Hh7z8rhhTjsHS
 D5emunX93kMp5QSSrw6EtdTkRJ++DB+cpjK5SP7UydHsWSGnKbd8eIv7viZu5J5O
 FLcHHWKIan9wQCslGQkO
 =NjEu
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:

 - Fix a bunch of deadlock situations:
   * State recovery can deadlock if we fail to release sequence ids
     before scheduling the recovery thread.
   * Calling deactivate_super() from an RPC workqueue thread can
     deadlock because of the call to rpc_shutdown_client.

 - Display the device name correctly in /proc/*/mounts

 - Fix a number of incorrect error return values:
   * When NFSv3 mounts fail due to a timeout.
   * On NFSv4.1 backchannel setup failure
   * On NFSv4 open access checks

 - pnfs_find_alloc_layout() must check the layout pointer for NULL

 - Fix a regression in the legacy DNS resolved

* tag 'nfs-for-3.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS4: nfs4_opendata_access should return errno
  NFSv4: Initialise the NFSv4.1 slot table highest_used_slotid correctly
  SUNRPC: return proper errno from backchannel_rqst
  NFS: add nfs_sb_deactive_async to avoid deadlock
  nfs: Show original device name verbatim in /proc/*/mount{s,info}
  nfsv3: Make v3 mounts fail with ETIMEDOUTs instead EIO on mountd timeouts
  nfs: Check whether a layout pointer is NULL before free it
  NFS: fix bug in legacy DNS resolver.
  NFSv4: nfs4_locku_done must release the sequence id
  NFSv4.1: We must release the sequence id when we fail to get a session slot
  NFS: Wait for session recovery to finish before returning
2012-11-03 15:27:21 -07:00
Linus Torvalds 225ff868e1 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management & ACPI update from Zhang Rui,

Ho humm.  Normally these things go through Len.  But it's just three
small fixes, I guess I can pull directly too.

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
  exynos4_tmu_driver_ids should be exynos_tmu_driver_ids.
  ACPI video: Ignore errors after _DOD evaluation.
  thermal: solve compilation errors in rcar_thermal
2012-11-03 15:25:14 -07:00
Linus Torvalds 209c510e36 Merge branch 'i2c-embedded/for-current' of git://git.pengutronix.de/git/wsa/linux
Pull i2c embedded fixes from Wolfram Sang:
 "Two patches are usual stuff.

  The bigger patch is needed to correct a wrong decision made in this
  merge window.  We hoped to get the PIOQUEUE mode in the mxs driver
  working with DMA, but it turned out to be too broken (leading to data
  loss), so we now think it is best to remove it entirely and work only
  with DMA now.  The patch should be in 3.7.  IMO, so users never get
  the chance to use both modes in parallel."

* 'i2c-embedded/for-current' of git://git.pengutronix.de/git/wsa/linux:
  i2c: tegra: set irq name as device name
  i2c-nomadik: Fixup clock handling
  i2c: mxs: remove broken PIOQUEUE support
2012-11-03 15:14:54 -07:00
Linus Torvalds 53f9313f5c Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "Scattered selection of fixes:

   - radeon: load detect fixes from SuSE/AMD
   - intel: misc i830, sdvo regression, vesafb kickoff ums fix
   - exynos: maintainers entry update + fixes
   - udl: fix stride scanout issue

  it's slightly bigger than I'd probably like, but nothing looked
  dangerous enough to hold off on."

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/udl: fix stride issues scanning out stride != width*bpp
  drm/radeon: add load detection support for ext DAC on R200 (v2)
  DRM/radeon: For single CRTC GPUs move handling of CRTC_CRT_ON to crtc_dpms().
  DRM/Radeon: Fix TV DAC Load Detection for single CRTC chips.
  DRM/Radeon: Clean up code in TV DAC load detection.
  drm/radeon: fix ATPX function documentation
  drivers/gpu/drm/radeon/evergreen_cs.c: Remove unnecessary semicolon
  DRM/Radeon: On DVI-I use Load Detection when EDID is bogus.
  DRM/Radeon: Fix primary DAC Load Detection for RV100 chips.
  DRM/Radeon: Fix Load Detection on legacy primary DAC.
  drm: exynos: removed warning due to missing typecast for mixer driver data
  drm/exynos: add support for ARCH_MULTIPLATFORM
  MAINTAINERS: Add git repository for Exynos DRM
  drm/exynos: fix display on issue
  drm/i915: Only kick out vesafb if we takeover the fbcon with KMS
  drm/i915: be less verbose about inability to provide vendor backlight
  drm/i915: clear the entire sdvo infoframe buffer
  drm/i915: VGA needs to be on pipe A on i830M
  drm/i915: fix overlay on i830M
2012-11-03 15:13:49 -07:00
Linus Torvalds 0f89a5733a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "First post-Sandy pull request"

 1) Fix antenna gain handling and initialization of chan->max_reg_power
    in wireless, from Felix Fietkau.

 2) Fix nexthop handling in H.232 conntrack helper, from Julian
    Anastasov.

 3) Only process 80211 mesh config header in certain kinds of frames,
    from Javier Cardona.

 4) 80211 management frame header length needs to be validated, from
    Johannes Berg.

 5) Don't access free'd SKBs in ath9k driver, from Felix Fietkay.

 6) Test for permanent state correctly in VXLAN driver, from Stephen
    Hemminger.

 7) BNX2X bug fixes from Yaniv Rosner and Dmitry Kravkov.

 8) Fix off by one errors in bonding, from Nikolay ALeksandrov.

 9) Fix divide by zero in TCP-Illinois congestion control.  From Jesper
    Dangaard Brouer.

10) TCP metrics code says "Yo dawg, I heard you like sizeof, so I did a
    sizeof of a sizeof, so you can size your size" Fix from Julian
    Anastasov.

11) Several drivers do mdiobus_free without first doing an
    mdiobus_unregister leading to stray pointer references.  Fix from
    Peter Senna Tschudin.

12) Fix OOPS in l2tp_eth_create() error path, it's another danling
    pointer kinda situation.  Fix from Tom Parkin.

13) Hardware driven by the vmxnet driver can't handle larger than 16K
    fragments, so split them up when necessary.  From Eric Dumazet.

14) Handle zero length data length in tcp_send_rcvq() properly.  Fix
    from Pavel Emelyanov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits)
  tcp-repair: Handle zero-length data put in rcv queue
  vmxnet3: must split too big fragments
  l2tp: fix oops in l2tp_eth_create() error path
  cxgb4: Fix unable to get UP event from the LLD
  drivers/net/phy/mdio-bitbang.c: Call mdiobus_unregister before mdiobus_free
  drivers/net/ethernet/nxp/lpc_eth.c: Call mdiobus_unregister before mdiobus_free
  bnx2x: fix HW initialization using fw 7.8.x
  tcp: Fix double sizeof in new tcp_metrics code
  net: fix divide by zero in tcp algorithm illinois
  net: sctp: Fix typo in net/sctp
  bonding: fix second off-by-one error
  bonding: fix off-by-one error
  bnx2x: Disable FCoE for 57840 since not yet supported by FW
  bnx2x: Fix no link on 577xx 10G-baseT
  bnx2x: Fix unrecognized SFP+ module after driver is loaded
  bnx2x: Fix potential incorrect link speed provision
  bnx2x: Restore global registers back to default.
  bnx2x: Fix link down in 57712 following LFA
  bnx2x: Fix 57810 1G-KR link against certain switches.
  ixgbe: PTP get_ts_info missing software support
  ...
2012-11-02 20:48:41 -07:00
Pavel Emelyanov c454e6111d tcp-repair: Handle zero-length data put in rcv queue
When sending data into a tcp socket in repair state we should check
for the amount of data being 0 explicitly. Otherwise we'll have an skb
with seq == end_seq in rcv queue, but tcp doesn't expect this to happen
(in particular a warn_on in tcp_recvmsg shoots).

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Reported-by: Giorgos Mavrikas <gmavrikas@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-02 22:01:45 -04:00
Eric Dumazet a4d7e485bc vmxnet3: must split too big fragments
vmxnet3 has a 16Kbytes limit per tx descriptor, that happened to work
as long as we provided PAGE_SIZE fragments.

Our stack can now build larger fragments, so we need to split them to
the 16kbytes boundary.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: jongman heo <jongman.heo@samsung.com>
Tested-by: jongman heo <jongman.heo@samsung.com>
Cc: Shreyas Bhatewara <sbhatewara@vmware.com>
Reviewed-by: Bhavesh Davda <bhavesh@vmware.com>
Signed-off-by: Shreyas Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-02 21:58:09 -04:00
Tom Parkin 789336360e l2tp: fix oops in l2tp_eth_create() error path
When creating an L2TPv3 Ethernet session, if register_netdev() should fail for
any reason (for example, automatic naming for "l2tpeth%d" interfaces hits the
32k-interface limit), the netdev is freed in the error path.  However, the
l2tp_eth_sess structure's dev pointer is left uncleared, and this results in
l2tp_eth_delete() then attempting to unregister the same netdev later in the
session teardown.  This results in an oops.

To avoid this, clear the session dev pointer in the error path.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-02 21:56:35 -04:00