linux-2.6

dect

Archived

Author	SHA1	Message	Date
Neil Brown	f5a73672d1	NFS: allow close-to-open cache semantics to apply to root of NFS filesystem To obey NFS cache semantics, the client must verify the cached attributes when a file is opened. In most cases this is done by a call to d_validate as one of the last steps in path_walk. However for the root of a filesystem, d_validate is only ever called on the mounted-on filesystem (except when the path ends '.' or '..'). So NFS has no chance to validate the attributes. So, in nfs_opendir, we revalidate the attributes if the opened directory is the mountpoint. This may cause double-validation for "." and ".." lookups, but that is better than missing regular /path/name lookups completely. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-08-10 10:20:05 -04:00
Kevin Winchester	85c9fe8fca	vfs: fix warning: 'dirent' is used uninitialized in this function Using: gcc (GCC) 4.5.0 20100610 (prerelease) The following warnings appear: fs/readdir.c: In function `filldir64': fs/readdir.c:240:15: warning: `dirent' is used uninitialized in this function fs/readdir.c: In function `filldir': fs/readdir.c:155:15: warning: `dirent' is used uninitialized in this function fs/compat.c: In function `compat_filldir64': fs/compat.c:1071:11: warning: `dirent' is used uninitialized in this function fs/compat.c: In function `compat_filldir': fs/compat.c:984:15: warning: `dirent' is used uninitialized in this function The warnings are related to the use of the NAME_OFFSET() macro. Luckily, it appears as though the standard offsetof() macro is what is being implemented by NAME_OFFSET(), thus we can fix the warning and use a more standard code construct at the same time. Signed-off-by: Kevin Winchester <kjwinchester@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-09 20:45:05 -07:00
Jan Kara	7624ee72aa	mm: avoid resetting wb_start after each writeback round WB_SYNC_NONE writeback is done in rounds of 1024 pages so that we don't write out some huge inode for too long while starving writeout of other inodes. To avoid livelocks, we record time we started writeback in wbc->wb_start and do not write out inodes which were dirtied after this time. But currently, writeback_inodes_wb() resets wb_start each time it is called thus effectively invalidating this logic and making any WB_SYNC_NONE writeback prone to livelocks. This patch makes sure wb_start is set only once when we start writeback. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Christoph Hellwig <hch@lst.de> Acked-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-09 20:45:03 -07:00
David Rientjes	51b1bd2ace	oom: deprecate oom_adj tunable /proc/pid/oom_adj is now deprecated so that that it may eventually be removed. The target date for removal is August 2012. A warning will be printed to the kernel log if a task attempts to use this interface. Future warning will be suppressed until the kernel is rebooted to prevent spamming the kernel log. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Nick Piggin <npiggin@suse.de> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-09 20:45:02 -07:00
David Rientjes	a63d83f427	oom: badness heuristic rewrite This a complete rewrite of the oom killer's badness() heuristic which is used to determine which task to kill in oom conditions. The goal is to make it as simple and predictable as possible so the results are better understood and we end up killing the task which will lead to the most memory freeing while still respecting the fine-tuning from userspace. Instead of basing the heuristic on mm->total_vm for each task, the task's rss and swap space is used instead. This is a better indication of the amount of memory that will be freeable if the oom killed task is chosen and subsequently exits. This helps specifically in cases where KDE or GNOME is chosen for oom kill on desktop systems instead of a memory hogging task. The baseline for the heuristic is a proportion of memory that each task is currently using in memory plus swap compared to the amount of "allowable" memory. "Allowable," in this sense, means the system-wide resources for unconstrained oom conditions, the set of mempolicy nodes, the mems attached to current's cpuset, or a memory controller's limit. The proportion is given on a scale of 0 (never kill) to 1000 (always kill), roughly meaning that if a task has a badness() score of 500 that the task consumes approximately 50% of allowable memory resident in RAM or in swap space. The proportion is always relative to the amount of "allowable" memory and not the total amount of RAM systemwide so that mempolicies and cpusets may operate in isolation; they shall not need to know the true size of the machine on which they are running if they are bound to a specific set of nodes or mems, respectively. Root tasks are given 3% extra memory just like __vm_enough_memory() provides in LSMs. In the event of two tasks consuming similar amounts of memory, it is generally better to save root's task. Because of the change in the badness() heuristic's baseline, it is also necessary to introduce a new user interface to tune it. It's not possible to redefine the meaning of /proc/pid/oom_adj with a new scale since the ABI cannot be changed for backward compatability. Instead, a new tunable, /proc/pid/oom_score_adj, is added that ranges from -1000 to +1000. It may be used to polarize the heuristic such that certain tasks are never considered for oom kill while others may always be considered. The value is added directly into the badness() score so a value of -500, for example, means to discount 50% of its memory consumption in comparison to other tasks either on the system, bound to the mempolicy, in the cpuset, or sharing the same memory controller. /proc/pid/oom_adj is changed so that its meaning is rescaled into the units used by /proc/pid/oom_score_adj, and vice versa. Changing one of these per-task tunables will rescale the value of the other to an equivalent meaning. Although /proc/pid/oom_adj was originally defined as a bitshift on the badness score, it now shares the same linear growth as /proc/pid/oom_score_adj but with different granularity. This is required so the ABI is not broken with userspace applications and allows oom_adj to be deprecated for future removal. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Nick Piggin <npiggin@suse.de> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-09 20:45:02 -07:00
Andrew Morton	74bcbf4054	oom: move badness() declaration into oom.h Cc: Minchan Kim <minchan.kim@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-09 20:45:02 -07:00
KOSAKI Motohiro	26ebc98491	oom: /proc/<pid>/oom_score treat kernel thread honestly If a kernel thread is using use_mm(), badness() returns a positive value. This is not a big issue because caller take care of it correctly. But there is one exception, /proc/<pid>/oom_score calls badness() directly and doesn't care that the task is a regular process. Another example, /proc/1/oom_score return !0 value. But it's unkillable. This incorrectness makes administration a little confusing. This patch fixes it. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-09 20:45:01 -07:00
Theodore Ts'o	6d0bf00512	ext4: clean up compiler warning in start_this_handle() Fix the compiler warning: fs/jbd2/transaction.c: In function ‘start_this_handle’: fs/jbd2/transaction.c:98: warning: unused variable ‘ts’ Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-08-09 17:28:38 -04:00
Al Viro	dca332528b	no need for list_for_each_entry_safe()/resetting with superblock list just delay __put_super() a bit Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:49:02 -04:00
Al Viro	7a4dec5389	Fix sget() race with failing mount If sget() finds a matching superblock being set up, it'll grab an active reference to it and grab s_umount. That's fine - we'll wait for completion of foofs_get_sb() that way. However, if said foofs_get_sb() fails we'll end up holding the halfway-created superblock. deactivate_locked_super() called by foofs_get_sb() will just unlock the sucker since we are holding another active reference to it. What we need is a way to tell if superblock has been successfully set up. Unfortunately, neither ->s_root nor the check for MS_ACTIVE quite fit. Cheap and easy way, suitable for backport: new flag set by the (only) caller of ->get_sb(). If that flag isn't present by the time sget() grabbed s_umount on preexisting superblock it has found, it's seeing a stillborn and should just bury it with deactivate_locked_super() (and repeat the search). Longer term we want to set that flag in ->get_sb() instances (and check for it to distinguish between "sget() found us a live sb" and "sget() has allocated an sb, we need to set it up" in there, instead of checking ->s_root as we do now). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org	2010-08-09 16:49:01 -04:00
Tejun Heo	4f331f01b9	vfs: don't hold s_umount over close_bdev_exclusive() call Fix an obscure AB-BA deadlock in get_sb_bdev(). When a superblock is mounted more than once get_sb_bdev() calls close_bdev_exclusive() to drop the extra bdev reference while holding s_umount. However, sb->s_umount nests inside bd_mutex during __invalidate_device() and close_bdev_exclusive() acquires bd_mutex during blkdev_put(); thus creating an AB-BA deadlock. This condition doesn't trigger frequently. For this condition to be visible to lockdep, the filesystem must occupy the whole device (as __invalidate_device() only grabs bd_mutex for the whole device), the FS must be mounted more than once and partition rescan should be issued while the FS is still mounted. Fix it by dropping s_umount over close_bdev_exclusive(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Ciprian Docan <docan@eden.rutgers.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:59 -04:00
Artem Bityutskiy	719f2c879f	sysv: do not mark superblock dirty on remount No need to mark the superblock as dirty in sysv_remount, synchronize it instead (only if mounting R/O). I did not find any docs about this file-system, and I have no possibility to test my changes. Thus, this is untested. I see other issues in sysv, e.g., why sysv_sync_fs writes only in the FSTYPE_SYSV4 case? However, it marks its SB bh's dirty for all types, and does not wait for them ever. With zero docs I'm unable to fix this. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:58 -04:00
Artem Bityutskiy	315671f3b8	sysv: do not mark superblock dirty on mount I did not find any docs about this file-system, and I have no possibility to test my changes. Thus, this is untested. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:56 -04:00
Artem Bityutskiy	696ac96c27	btrfs: remove junk sb_dirt change BTRFS does not define a '->write_super()' method, so it should not mark its superblock as dirty. This looks like some left-over. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Acked-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:55 -04:00
Artem Bityutskiy	4e29d50a28	BFS: clean up the superblock usage BFS is a very simple FS and its superblocks contains only static information and is never changed. However, the BFS code for some misterious reasons marked its buffer head as dirty from time to time, but nothing in that buffer was ever changed. This patch removes all the BFS superblock manipulation, simply because it is not needed. It removes: 1. The si_sbh filed from 'struct bfs_sb_info' because it is not needed. We only need to read the SB once on mount to get the start of data blocks and the FS size. After this, we can forget about the SB. 2. All instances of 'mark_buffer_dirty(sbh)' for BFS SB because it is never changed. 3. The '->sync_fs()' method because there is nothing to sync (inodes are synched by VFS). 4. The '->write_super()' method, again, because the SB is never changed. Tested-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:53 -04:00
Artem Bityutskiy	7435d50611	AFFS: wait for sb synchronization when needed AFFS does not ever wait for superblock synchronization in ->put_super(), ->write_super, and ->sync_fs(). However, it should wait for synchronization in ->put_super() because it is about to be unmounted, in ->write_super() because this is periodic SB synchronization performed from a separate kernel thread, and in ->sync_fs() it should respect the 'wait' flag. This patch fixes the situation. Also, in ->put_super(), do not write the SB if it is not dirty. Tested-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:51 -04:00
Artem Bityutskiy	669d5f1f60	AFFS: clean up dirty flag usage In 'affs_write_super()': remove ancient and wrong commented code, remove unneeded 'clean' variable, so the function becomes a bit cleaner and simpler. In 'affs_remount(): remove unnecessary SB dirty flag changes. Tested-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:50 -04:00
Christoph Hellwig	1b9474635e	cifs: truncate fallout Remove the calls to inode_newsize_ok given that we already did it as part of inode_change_ok in the beginning of cifs_setattr_(no)unix. No need to call ->truncate if cifs doesn't have one, so remove the explicit call in cifs_vmtruncate, and replace the calls to vmtruncate with truncate_setsize which is vmtruncate minus inode_newsize_ok and the call to ->truncate. Rename cifs_vmtruncate to cifs_setsize to match the new calling conventions. Question 1: why does cifs do the pagecache munging and i_size update twice for each setattr call, once opencoded in cifs_vmtruncate, and once using the VFS helpers? Question 2: what is supposed to be protected by i_lock in cifs_vmtruncate? Do we need it around the call to inode_change_ok? [AV: fixed build breakage] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:48 -04:00
Andreas Gruenbacher	e566d48c9b	mbcache: fix shrinker function return value The shrinker function is supposed to return the number of cache entries after shrinking, not before shrinking. Fix that. Based on a patch from Wang Sheng-Hui <crosslonelyover@gmail.com>. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:47 -04:00
Andreas Gruenbacher	2aec7c5232	mbcache: Remove unused features The mbcache code was written to support a variable number of indexes, but all the existing users use exactly one index. Simplify to code to support only that case. There are also no users of the cache entry free operation, and none of the users keep extra data in cache entries. Remove those features as well. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:45 -04:00
Christoph Hellwig	365b181897	add f_flags to struct statfs(64) Add a flags field to help glibc implementing statvfs(3) efficiently. We copy the flag values from glibc, and add a new ST_VALID flag to denote that f_flags is implemented. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:44 -04:00
Christoph Hellwig	ebabe9a900	pass a struct path to vfs_statfs We'll need the path to implement the flags field for statvfs support. We do have it available in all callers except: - ecryptfs_statfs. This one doesn't actually need vfs_statfs but just needs to do a caller to the lower filesystem statfs method. - sys_ustat. Add a non-exported statfs_by_dentry helper for it which doesn't won't be able to fill out the flags field later on. In addition rename the helpers for statfs vs fstatfs to do_*statfs instead of the misleading vfs prefix. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:42 -04:00
Al Viro	b70a3e0702	All filesystems that need invalidate_inode_buffers() are doing that explicitly Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:39 -04:00
Al Viro	b57922d97f	convert remaining ->clear_inode() to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:37 -04:00
Al Viro	45321ac543	Make ->drop_inode() just return whether inode needs to be dropped ... and let iput_final() do the actual eviction or retention Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:35 -04:00
Al Viro	30140837f2	fs/inode.c:clear_inode() is gone Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:34 -04:00
Al Viro	644da5960d	fs/inode.c:evict() doesn't care about delete vs. non-delete paths now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:33 -04:00
Al Viro	07958f9f5b	->delete_inode() is gone Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:31 -04:00
Al Viro	0930fcc1ee	convert ext4 to ->evict_inode() pretty much brute-force... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:30 -04:00
Al Viro	7da08fd17a	convert logfs to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:28 -04:00
Al Viro	8e22c1a4e4	logfs: get rid of magical inodes ordering problems at ->kill_sb() time are solved by doing iput() of these suckers in ->put_super() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:26 -04:00
Al Viro	6fd1e5c994	convert nilfs2 to ->evict_inode() [folded build fix from sfr] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:25 -04:00
Al Viro	4ec70c9b46	convert exofs to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:24 -04:00
Al Viro	845a2cc050	convert reiserfs to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:23 -04:00
Al Viro	bd55597520	convert btrfs to ->evict_inode() NB: do we want btrfs_wait_ordered_range() on eviction of inodes with positive i_nlink on subvolume with zero root_refs? If not, btrfs_evict_inode() can be simplified by unconditionally bailing out in case of i_nlink > 0 in the very beginning... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:22 -04:00
Al Viro	d5c1515cf3	switch gfs2 to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:21 -04:00
Al Viro	066d92dcbf	convert ocfs2 to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:21 -04:00
Al Viro	94ee8494ac	switch ncpfs to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:20 -04:00
Al Viro	3aac2b62e0	switch udf to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:19 -04:00
Al Viro	d640e1b508	switch ubifs to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:18 -04:00
Al Viro	62aff86fdf	switch jfs to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:17 -04:00
Al Viro	ea54400920	switch hpfs to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:17 -04:00
Al Viro	33b0daaa55	switch hppfs to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:16 -04:00
Al Viro	f8ad850f11	try to get rid of races in hostfs open() In case of mode mismatch, do not blindly close the descriptor another openers might be using right now. Open the underlying file with currently sufficient mode, then * if current mode has grown so that it's sufficient for us now, just close our new fd * if current mode has grown and our fd is not enough to cover it, close and repeat. * otherwise, install our fd if the file hadn't been opened at all or dup2() our fd over the current one (and close our fd). Critical section is protected by mutex; yes, system-wide. All we do under it is a bunch of comparison and maybe an overwriting dup2() on host. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:15 -04:00
Al Viro	f8d7e1877e	leak in hostfs_unlink() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:14 -04:00
Al Viro	e9193059b1	hostfs: fix races in dentry_name() and inode_name() calculating size, then doing allocation, then filling the path is a Bad Idea(tm), since the ancestors can be renamed, leading to buffer overrun. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:14 -04:00
Al Viro	c103135c14	new helper: __dentry_path() builds path relative to fs root, called under dcache_lock, doesn't append any nonsense to unlinked ones. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:13 -04:00
Al Viro	d0352d3ed7	hostfs: sanitize symlinks Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:12 -04:00
Al Viro	c5322220eb	hostfs: get rid of inode_dentry_name() it's equivalent to dentry_name() anyway Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:11 -04:00
Al Viro	4754b82557	hostfs: get rid of file_type(), fold init_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:10 -04:00
Al Viro	39b743c619	switch stat_file() to passing a single struct rather than fsckloads of pointers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:10 -04:00
Al Viro	5e2df28cc6	hostfs: pass pathname to init_inode() We will calculate it in all callers anyway, so there's no need to duplicate that inside. Moreover, that way we lose all failure exits in init_inode(), so it doesn't need to return anything. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:09 -04:00
Al Viro	52b209f7b8	get rid of hostfs_read_inode() There are only two call sites; in one (hostfs_iget()) it's actually a no-op and in another (fill_super()) it's easier to expand the damn thing and use what we know about its arguments to simplify it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:08 -04:00
Al Viro	601d2c38b9	hostfs: don't keep a field in each inode when we are using it only in root Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:07 -04:00
Al Viro	e971a6d7b9	stop icache pollution in hostfs, switch to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:06 -04:00
Al Viro	f053ddde75	switch affs to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:06 -04:00
Al Viro	69c9e75017	switch omfs to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:05 -04:00
Al Viro	9df2f85128	switch bfs to ->evict_inode(), clean up Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:04 -04:00
Al Viro	ac14a95b52	convert ext3 to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:03 -04:00
Al Viro	58e8268c7b	switch ufs to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:02 -04:00
Al Viro	deee3ce466	covert fatfs to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:01 -04:00
Al Viro	d3b4f9ae18	switch smbfs to evict_inode() NB: treatment of inode hash is completely braindead there Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:00 -04:00
Al Viro	d299eadc09	switch sysv to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:59 -04:00
Al Viro	72edc4d087	merge ext2 delete_inode and clear_inode, switch to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:57 -04:00
Al Viro	3937871d91	Don't dirty the victim in ext2_xattr_delete_inode() ... it's beyond fs-writeback reach already - writeback won't be started at that point. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:56 -04:00
Al Viro	addacc7d6f	Take dirtying the inode to callers of ext2_free_blocks() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:55 -04:00
Al Viro	3889717d28	ext2: switch to dquot_free_block_nodirty() brute-force conversion Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:54 -04:00
Al Viro	5ccb4a78d8	switch minix to ->evict_inode(), fix write_inode/delete_inode race We need to wait for completion of possible writeback in progress before we clear on-disk inode during deletion. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:53 -04:00
Al Viro	01cd9fef6e	switch sysfs to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:53 -04:00
Al Viro	8267952b36	switch procfs to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:52 -04:00
Al Viro	77b8a75f5b	simplify get_cramfs_inode() simply don't hash the inodes that don't have real inumber instead of skipping them during iget5_locked(); as the result, simple iget_locked() would do and we can get rid of cramfs ->drop_inode() as well. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:51 -04:00
Al Viro	b0683aa638	new helper: end_writeback() Essentially, the minimal variant of ->evict_inode(). It's a trimmed-down clear_inode(), sans any fs callbacks. Once it returns we know that no async writeback will be happening; every ->evict_inode() instance should do that once and do that before doing anything ->write_inode() could interfere with (e.g. freeing the on-disk inode). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:49 -04:00
Al Viro	661074e91b	Take ->i_bdev/->i_cdev handling out of clear_inode() All call chains to clear_inode() pass through evict_inode() and clear_inode() should be called by evict_inode() exactly once. So we can pull i_bdev/i_cdev detaching up to evict_inode() itself. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:48 -04:00
Al Viro	c6287315cb	generic_detach_inode() can be static now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:48 -04:00
Al Viro	2bbbda308f	switch hugetlbfs to ->evict_inode() The first spoils - hugetlb can use default ->drop_inode() now. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:47 -04:00
Al Viro	be7ce4161f	New method - evict_inode() Hybrid of ->clear_inode() and ->delete_inode(); if present, does all fs work to be done when in-core inode is about to be gone, for whatever reason. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:46 -04:00
Al Viro	b4272d4c81	unify fs/inode.c callers of clear_inode() For now, just a straightforward merge Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:45 -04:00
Al Viro	a4ffdde6e5	simplify checks for I_CLEAR/I_FREEING add I_CLEAR instead of replacing I_FREEING with it. I_CLEAR is equivalent to I_FREEING for almost all code looking at either; it's there to keep track of having called clear_inode() exactly once per inode lifetime, at some point after having set I_FREEING. I_CLEAR and I_FREEING never get set at the same time with the current code, so we can switch to setting i_flags to I_FREEING \| I_CLEAR instead of I_CLEAR without loss of information. As the result of such change, checks become simpler and the amount of code that needs to know about I_CLEAR shrinks a lot. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:44 -04:00
Al Viro	b5fc510c48	get rid of file_fsync() Copy and simplify in the only two users remaining. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:43 -04:00
Christoph Hellwig	fa9b227e90	xfs: new truncate sequence Convert XFS to the new truncate sequence. We still can have errors after updating the file size in xfs_setattr, but these are real I/O errors and lead to a transaction abort and filesystem shutdown, so they are not an issue. Errors from ->write_begin and write_end can now be handled correctly because we can actually get rid of the delalloc extents while previous the buffer state was stipped in block_invalidatepage. There is still no error handling for ->direct_IO, because doing so will need some major restructuring given that we only have the iolock shared and do not hold i_mutex at all. Fortunately leaving the normally allocated blocks behind there is not a major issue and this will get cleaned up by xfs_free_eofblock later. Note: the patch is against Al's vfs.git tree as that contains the nessecary preparations. I'd prefer to get it applied there so that we can get some testing in linux-next. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:42 -04:00
Boaz Harrosh	2f246fd0f1	exofs: New truncate sequence These changes are crafted based on the similar conversion done to ext2 by Nick Piggin. * Remove the deprecated ->truncate vector. Let exofs_setattr take care of on-disk size updates. * Call truncate_pagecache on the unused pages if write_begin/end fails. * Cleanup exofs_delete_inode that did stupid inode writes and updates on an inode that will be removed. * And finally get rid of exofs_get_block. We never had any blocks it was all for calling nobh_truncate_page. nobh_truncate_page is not actually needed in exofs since the last page is complete and gone, just like all the other pages. There is no partial blocks in exofs. I've tested with this patch, and there are no apparent failures, so far. CC: Nick Piggin <npiggin@suse.de> CC: Christoph Hellwig <hch@lst.de> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:41 -04:00
Al Viro	41cce647f8	jffs2: don't open-code iget_failed() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:41 -04:00
Christoph Hellwig	2c27c65ed0	check ATTR_SIZE contraints in inode_change_ok Make sure we check the truncate constraints early on in ->setattr by adding those checks to inode_change_ok. Also clean up and document inode_change_ok to make this obvious. As a fallout we don't have to call inode_newsize_ok from simple_setsize and simplify it down to a truncate_setsize which doesn't return an error. This simplifies a lot of setattr implementations and means we use truncate_setsize almost everywhere. Get rid of fat_setsize now that it's trivial and mark ext2_setsize static to make the calling convention obvious. Keep the inode_newsize_ok in vmtruncate for now as all callers need an audit for its removal anyway. Note: setattr code in ecryptfs doesn't call inode_change_ok at all and needs a deeper audit, but that is left for later. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:39 -04:00
Christoph Hellwig	db78b877f7	always call inode_change_ok early in ->setattr Make sure we call inode_change_ok before doing any changes in ->setattr, and make sure to call it even if our fs wants to ignore normal UNIX permissions, but use the ATTR_FORCE to skip those. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:38 -04:00
Christoph Hellwig	1025774ce4	remove inode_setattr Replace inode_setattr with opencoded variants of it in all callers. This moves the remaining call to vmtruncate into the filesystem methods where it can be replaced with the proper truncate sequence. In a few cases it was obvious that we would never end up calling vmtruncate so it was left out in the opencoded variant: spufs: explicitly checks for ATTR_SIZE earlier btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above In addition to that ncpfs called inode_setattr with handcrafted iattrs, which allowed to trim down the opencoded variant. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:37 -04:00
Christoph Hellwig	eef2380c18	default to simple_setattr With the new truncate sequence every filesystem that wants to support file size changes on disk needs to implement its own ->setattr. So instead of calling inode_setattr which supports size changes call into a simple method that doesn't support this. simple_setattr is almost what we want except that it does not mark the inode dirty after changes. Given that marking the inode dirty is a no-op for the simple in-memory filesystems that use simple_setattr currently just add the mark_inode_dirty call. Also add a WARN_ON for the presence of a truncate method to simple_setattr to catch new instances of it during the transition period. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:36 -04:00
Christoph Hellwig	6a1a90ad1b	rename generic_setattr Despite its name it's now a generic implementation of ->setattr, but rather a helper to copy attributes from a struct iattr to the inode. Rename it to setattr_copy to reflect this fact. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:35 -04:00
Christoph Hellwig	d39aae9ec4	add missing setattr methods For the new truncate sequence every filesystem that wants to truncate on-disk state needs a seattr method. Convert the remaining filesystems that implement the truncate inode operation to have its own setattr method. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:34 -04:00
Christoph Hellwig	155130a4f7	get rid of block_write_begin_newtrunc Move the call to vmtruncate to get rid of accessive blocks to the callers in preparation of the new truncate sequence and rename the non-truncating version to block_write_begin. While we're at it also remove several unused arguments to block_write_begin. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:33 -04:00
Christoph Hellwig	6e1db88d53	introduce __block_write_begin Split up the block_write_begin implementation - __block_write_begin is a new trivial wrapper for block_prepare_write that always takes an already allocated page and can be either called from block_write_begin or filesystem code that already has a page allocated. Remove the handling of already allocated pages from block_write_begin after switching all callers that do it to __block_write_begin. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:32 -04:00
Christoph Hellwig	f4e420dc42	clean up write_begin usage for directories in pagecache For filesystem that implement directories in pagecache we call block_write_begin with an already allocated page for this code, while the normal regular file write path uses the default block_write_begin behaviour. Get rid of the __foofs_write_begin helper and opencode the normal write_begin call in foofs_write_begin, while adding a new foofs_prepare_chunk helper for the directory code. The added benefit is that foofs_prepare_chunk has a much saner calling convention. Note that the interruptible flag passed into block_write_begin is always ignored if we already pass in a page (see next patch for details), and we never were doing truncations of exessive blocks for this case either so we can switch directly to block_write_begin_newtrunc. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:31 -04:00
Christoph Hellwig	282dc17884	get rid of cont_write_begin_newtrunc Move the call to vmtruncate to get rid of accessive blocks to the callers in preparation of the new truncate sequence and rename the non-truncating version to cont_write_begin. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:31 -04:00
Christoph Hellwig	ea0f04e595	get rid of nobh_write_begin_newtrunc Move the call to vmtruncate to get rid of accessive blocks to the only remaining caller and rename the non-truncating version to nobh_write_begin. Get rid of the superflous file argument to it while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:30 -04:00
Christoph Hellwig	eafdc7d190	sort out blockdev_direct_IO variants Move the call to vmtruncate to get rid of accessive blocks to the callers in prepearation of the new truncate calling sequence. This was only done for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant was not needed anyway. Get rid of blockdev_direct_IO_no_locking and its _newtrunc variant while at it as just opencoding the two additional paramters is shorted than the name suffix. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:29 -04:00
Al Viro	256249584b	fix leak in __logfs_create() if kmalloc fails, we still need to drop the inode, as we do on other failure exits. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:28 -04:00
Al Viro	0e4f6a791b	Fix reiserfs_file_release() a) count file openers correctly; i_count use was completely wrong b) use new mutex for exclusion between final close/open/truncate, to protect tailpacking logics. i_mutex use was wrong and resulted in deadlocks. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:27 -04:00
Al Viro	918377b696	missing include in hppfs Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:26 -04:00
Al Viro	005a59ec74	Deal with missing exports for hostfs Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:25 -04:00
Lino Sanfilippo	21edad3220	ecryptfs: dont call lookup_one_len to avoid NULL nameidata I have encountered the same problem that Eric Sandeen described in this post http://lkml.org/lkml/fancy/2010/4/23/467 while experimenting with stackable filesystems. The reason seems to be that ecryptfs calls lookup_one_len() to get the lower dentry, which in turn calls the lower parent dirs d_revalidate() with a NULL nameidata object. If ecryptfs is the underlaying filesystem, the NULL pointer dereference occurs, since ecryptfs is not prepared to handle a NULL nameidata. I know that this cant happen any more, since it is no longer allowed to mount ecryptfs upon itself. But maybe this patch it useful nevertheless, since the problem would still apply for an underlaying filesystem that implements d_revalidate() and is not prepared to handle a NULL nameidata (I dont know if there actually is such a fs). With this patch (against 2.6.35-rc5) ecryptfs uses the vfs_lookup_path() function instead of lookup_one_len() which ensures that the nameidata passed to the lower filesystems d_revalidate(). Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2010-08-09 13:42:12 -05:00
Julia Lawall	ceeab92971	fs/ecryptfs/file.c: introduce missing free The comments in the code indicate that file_info should be released if the function fails. This releasing is done at the label out_free, not out. The semantic match that finds this problem is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @r exists@ local idexpression x; statement S; expression E; identifier f,f1,l; position p1,p2; expression ptr != NULL; @@ x@p1 = kmem_cache_zalloc(...); ... if (x == NULL) S <... when != x when != if (...) { <+...x...+> } ( x->f1 = E \| (x->f1 == NULL \|\| ...) \| f(...,x->f1,...) ) ...> ( return <+...x...+>; \| return@p2 ...; ) @script:python@ p1 << r.p1; p2 << r.p2; @@ print " file: %s kmem_cache_zalloc %s" % (p1[0].file,p1[0].line) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: stable@kernel.org Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2010-08-09 13:25:24 -05:00
Lino Sanfilippo	31f73bee3e	ecryptfs: release reference to lower mount if interpose fails In ecryptfs_lookup_and_interpose_lower() the lower mount is not decremented if allocation of a dentry info struct failed. As a result the lower filesystem cant be unmounted any more (since it is considered busy). This patch corrects the reference counting. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: stable@kernel.org Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2010-08-09 10:33:05 -05:00
Tyler Hicks	c43f7b8fb0	eCryptfs: Handle ioctl calls with unlocked and compat functions Lower filesystems that only implemented unlocked_ioctl weren't being passed ioctl calls because eCryptfs only checked for lower_file->f_op->ioctl and returned -ENOTTY if it was NULL. eCryptfs shouldn't implement ioctl(), since it doesn't require the BKL. This patch introduces ecryptfs_unlocked_ioctl() and ecryptfs_compat_ioctl(), which passes the calls on to the lower file system. https://bugs.launchpad.net/ecryptfs/+bug/469664 Reported-by: James Dupin <james.dupin@gmail.com> Cc: stable@kernel.org Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2010-08-09 10:33:04 -05:00
Prarit Bhargava	a1275c3b21	ecryptfs: Fix warning in ecryptfs_process_response() Fix warning seen with "make -j24 CONFIG_DEBUG_SECTION_MISMATCH=y V=1": fs/ecryptfs/messaging.c: In function 'ecryptfs_process_response': fs/ecryptfs/messaging.c:276: warning: 'daemon' may be used uninitialized in this function Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2010-08-09 10:33:03 -05:00
Frederic Weisbecker	d9a145fb6e	Merge commit 'linus/master' into bkl/core Merge reason: The staging tree has introduced the easycap driver lately. We need the latest updates to pushdown the bkl in its ioctl helper.	2010-08-09 02:14:15 +02:00
Phillip Lougher	66048c381b	Squashfs: fix checkpatch.pl warnings Checkpatch.pl in 2.6.34 added a check for spaces between tabs. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2010-08-08 22:29:33 +00:00
Arnd Bergmann	c9243f5bdd	autofs/autofs4: Move compat_ioctl handling into fs Handling of autofs ioctl numbers does not need to be generic and can easily be done directly in autofs itself. This also pushes the BKL into autofs and autofs4 ioctl methods. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ian Kent <raven@themaw.net> Cc: Autofs <autofs@linux.kernel.org> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2010-08-09 00:13:34 +02:00
David Woodhouse	6ae0185fe2	mtd: Remove obsolete <mtd/compatmac.h> include Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2010-08-08 21:19:42 +01:00
Bill Pemberton	ffc1887990	omfs: fix uninitialized variable warning quiet the warning: fs/omfs/file.c: In function 'omfs_get_block': fs/omfs/file.c:225: warning: 'new_block' may be used uninitialized in this function new_block is used properly by the call to omfs_grow_extent() Signed-off-by: Bill Pemberton <wfp5p@virginia.edu> Signed-off-by: Bob Copeland <me@bobcopeland.com>	2010-08-08 12:02:05 -04:00
David Woodhouse	6088c05877	jffs2: Update copyright notices Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2010-08-08 14:15:22 +01:00
Linus Torvalds	78417334b5	Merge branch 'bkl/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing * 'bkl/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing: do_coredump: Do not take BKL init: Remove the BKL from startup code	2010-08-07 17:06:54 -07:00
Linus Torvalds	0d9f9e122c	Merge branch 'for-2.6.36' of git://linux-nfs.org/~bfields/linux * 'for-2.6.36' of git://linux-nfs.org/~bfields/linux: (34 commits) nfsd4: fix file open accounting for RDWR opens nfsd: don't allow setting maxblksize after svc created nfsd: initialize nfsd versions before creating svc net: sunrpc: removed duplicated #include nfsd41: Fix a crash when a callback is retried nfsd: fix startup/shutdown order bug nfsd: minor nfsd read api cleanup gcc-4.6: nfsd: fix initialized but not read warnings nfsd4: share file descriptors between stateid's nfsd4: fix openmode checking on IO using lock stateid nfsd4: miscellaneous process_open2 cleanup nfsd4: don't pretend to support write delegations nfsd: bypass readahead cache when have struct file nfsd: minor nfsd_svc() cleanup nfsd: move more into nfsd_startup() nfsd: just keep single lockd reference for nfsd nfsd: clean up nfsd_create_serv error handling nfsd: fix error handling in __write_ports_addxprt nfsd: fix error handling when starting nfsd with rpcbind down nfsd4: fix v4 state shutdown error paths ...	2010-08-07 14:24:41 -07:00
David Howells	df44f9f4f9	AFS: Fix the module init error handling Fix the module init error handling. There are a bunch of goto labels for aborting the init procedure at different points and just undoing what needs undoing - they aren't all in the right places, however. This can lead to an oops like the following: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: [<ffffffff81042a31>] destroy_workqueue+0x17/0xc0 ... Modules linked in: kafs(+) dns_resolver rxkad af_rxrpc fscache Pid: 2171, comm: insmod Not tainted 2.6.35-cachefs+ #319 DG965RY/ ... Process insmod (pid: 2171, threadinfo ffff88003ca6a000, task ffff88003dcc3050) ... Call Trace: [<ffffffffa0055994>] afs_callback_update_kill+0x10/0x12 [kafs] [<ffffffffa007d1c5>] afs_init+0x190/0x1ce [kafs] [<ffffffffa007d035>] ? afs_init+0x0/0x1ce [kafs] [<ffffffff810001ef>] do_one_initcall+0x59/0x14e [<ffffffff8105f7ee>] sys_init_module+0x9c/0x1de [<ffffffff81001eab>] system_call_fastpath+0x16/0x1b Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-07 14:23:37 -07:00
Linus Torvalds	5df6b8e65a	Merge branch 'nfs-for-2.6.36' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.36' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (42 commits) NFS: NFSv4.1 is no longer a "developer only" feature NFS: NFS_V4 is no longer an EXPERIMENTAL feature NFS: Fix /proc/mount for legacy binary interface NFS: Fix the locking in nfs4_callback_getattr SUNRPC: Defer deleting the security context until gss_do_free_ctx() SUNRPC: prevent task_cleanup running on freed xprt SUNRPC: Reduce asynchronous RPC task stack usage SUNRPC: Move the bound cred to struct rpc_rqst SUNRPC: Clean up of rpc_bindcred() SUNRPC: Move remaining RPC client related task initialisation into clnt.c SUNRPC: Ensure that rpc_exit() always wakes up a sleeping task SUNRPC: Make the credential cache hashtable size configurable SUNRPC: Store the hashtable size in struct rpc_cred_cache NFS: Ensure the AUTH_UNIX credcache is allocated dynamically NFS: Fix the NFS users of rpc_restart_call() SUNRPC: The function rpc_restart_call() should return success/failure NFSv4: Get rid of the bogus RPC_ASSASSINATED(task) checks NFSv4: Clean up the process of renewing the NFSv4 lease NFSv4.1: Handle NFS4ERR_DELAY on SEQUENCE correctly NFS: nfs_rename() should not have to flush out writebacks ...	2010-08-07 13:19:36 -07:00
Linus Torvalds	fe21ea18c7	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: add retrieve request fuse: add store request fuse: don't use atomic kmap	2010-08-07 13:18:36 -07:00
Linus Torvalds	a57f9a3e81	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (45 commits) nilfs2: reject filesystem with unsupported block size nilfs2: avoid rec_len overflow with 64KB block size nilfs2: simplify nilfs_get_page function nilfs2: reject incompatible filesystem nilfs2: add feature set fields to super block nilfs2: clarify byte offset in super block format nilfs2: apply read-ahead for nilfs_btree_lookup_contig nilfs2: introduce check flag to btree node buffer nilfs2: add btree get block function with readahead option nilfs2: add read ahead mode to nilfs_btnode_submit_block nilfs2: fix buffer head leak in nilfs_btnode_submit_block nilfs2: eliminate inline keywords in btree implementation nilfs2: get maximum number of child nodes from bmap object nilfs2: reduce repetitive calculation of max number of child nodes nilfs2: optimize calculation of min/max number of btree node children nilfs2: remove redundant pointer checks in bmap lookup functions nilfs2: get rid of nilfs_bmap_union nilfs2: unify bmap set_target_v operations nilfs2: get rid of nilfs_btree uses nilfs2: get rid of nilfs_direct uses ...	2010-08-07 13:10:55 -07:00
Linus Torvalds	09dc942c2a	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits) ext4: Adding error check after calling ext4_mb_regular_allocator() ext4: Fix dirtying of journalled buffers in data=journal mode ext4: re-inline ext4_rec_len_(to\|from)_disk functions jbd2: Remove t_handle_lock from start_this_handle() jbd2: Change j_state_lock to be a rwlock_t jbd2: Use atomic variables to avoid taking t_handle_lock in jbd2_journal_stop ext4: Add mount options in superblock ext4: force block allocation on quota_off ext4: fix freeze deadlock under IO ext4: drop inode from orphan list if ext4_delete_inode() fails ext4: check to make make sure bd_dev is set before dereferencing it jbd2: Make barrier messages less scary ext4: don't print scary messages for allocation failures post-abort ext4: fix EFBIG edge case when writing to large non-extent file ext4: fix ext4_get_blocks references ext4: Always journal quota file modifications ext4: Fix potential memory leak in ext4_fill_super ext4: Don't error out the fs if the user tries to make a file too big ext4: allocate stripe-multiple IOs on stripe boundaries ext4: move aio completion after unwritten extent conversion ... Fix up conflicts in fs/ext4/inode.c as per Ted. Fix up xfs conflicts as per earlier xfs merge.	2010-08-07 13:03:53 -07:00
Linus Torvalds	90e0c22596	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: ext3: Fix dirtying of journalled buffers in data=journal mode ext3: default to ordered mode quota: Use mark_inode_dirty_sync instead of mark_inode_dirty quota: Change quota error message to print out disk and function name MAINTAINERS: Update entries of ext2 and ext3 MAINTAINERS: Update address of Andreas Dilger ext3: Avoid filesystem corruption after a crash under heavy delete load ext3: remove vestiges of nobh support ext3: Fix set but unused variables quota: clean up quota active checks quota: Clean up the namespace in dqblk_xfs.h quota: check quota reservation on remove_dquot_ref	2010-08-07 12:57:07 -07:00
Linus Torvalds	938a73b959	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: fs/dlm: Drop unnecessary null test dlm: use genl_register_family_with_ops()	2010-08-07 12:56:48 -07:00
Linus Torvalds	45480aa75b	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6: udf: super.c Fix warning: variable 'sbi' set but not used udf: remove duplicated #include	2010-08-07 12:56:27 -07:00
Linus Torvalds	1fc7995d19	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [DNS RESOLVER] Minor typo correction DNS: Fixes for the DNS query module cifs: Include linux/err.h for IS_ERR and PTR_ERR DNS: Make AFS go to the DNS for AFSDB records for unknown cells DNS: Separate out CIFS DNS Resolver code cifs: account for new creduid=0x%x parameter in spnego upcall string cifs: reduce false positives with inode aliasing serverino autodisable CIFS: Make cifs_convert_address() take a const src pointer and a length cifs: show features compiled in as part of DebugData cifs: update README Fix up trivial conflicts in fs/cifs/cifsfs.c due to workqueue changes	2010-08-07 12:54:46 -07:00
Linus Torvalds	3b7433b8a8	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits) workqueue: mark init_workqueues() as early_initcall() workqueue: explain for_each_cwq_cpu() iterators fscache: fix build on !CONFIG_SYSCTL slow-work: kill it gfs2: use workqueue instead of slow-work drm: use workqueue instead of slow-work cifs: use workqueue instead of slow-work fscache: drop references to slow-work fscache: convert operation to use workqueue instead of slow-work fscache: convert object to use workqueue instead of slow-work workqueue: fix how cpu number is stored in work->data workqueue: fix mayday_mask handling on UP workqueue: fix build problem on !CONFIG_SMP workqueue: fix locking in retry path of maybe_create_worker() async: use workqueue for worker pool workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead workqueue: implement unbound workqueue workqueue: prepare for WQ_UNBOUND implementation libata: take advantage of cmwq and remove concurrency limitations workqueue: fix worker management invocation without pending works ... Fixed up conflicts in fs/cifs/ as per Tejun. Other trivial conflicts in include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c	2010-08-07 12:42:58 -07:00
Tristan Ye	415cf32c9c	O2net: Disallow o2net accept connection request from itself. Currently, o2net_accept_one() is allowed to accept a connection from listening node itself, such a fake connection will not be successfully established due to no handshake detected afterwards, and later end up with triggering connecting worker in a loop. We're going to fix this by treating such connection request as 'invalid', since we've got no chance of requesting connection from a node to itself in a OCFS2 cluster. The fix doesn't hurt user's scan for o2net-listener, it always gets a successful connection from userpace. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Acked-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-08-07 10:50:33 -07:00
Wengang Wang	b11f1f1ab7	ocfs2/dlm: remove potential deadlock -V3 When we need to take both dlm_domain_lock and dlm->spinlock, we should take them in order of: dlm_domain_lock then dlm->spinlock. There is pathes disobey this order. That is calling dlm_lockres_put() with dlm->spinlock held in dlm_run_purge_list. dlm_lockres_put() calls dlm_put() at the ref and dlm_put() locks on dlm_domain_lock. Fix: Don't grab/put the dlm when the initialising/releasing lockres. That grab is not required because we don't call dlm_unregister_domain() based on refcount. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Cc: stable@kernel.org Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-08-07 10:50:30 -07:00
Wengang Wang	a524812b7e	ocfs2/dlm: avoid incorrect bit set in refmap on recovery master In the following situation, there remains an incorrect bit in refmap on the recovery master. Finally the recovery master will fail at purging the lockres due to the incorrect bit in refmap. 1) node A has no interest on lockres A any longer, so it is purging it. 2) the owner of lockres A is node B, so node A is sending de-ref message to node B. 3) at this time, node B crashed. node C becomes the recovery master. it recovers lockres A(because the master is the dead node B). 4) node A migrated lockres A to node C with a refbit there. 5) node A failed to send de-ref message to node B because it crashed. The failure is ignored. no other action is done for lockres A any more. For mormal, re-send the deref message to it to recovery master can fix it. Well, ignoring the failure of deref to the original master and not recovering the lockres to recovery master has the same effect. And the later is simpler. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: stable@kernel.org Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-08-07 10:49:41 -07:00
Jiaju Zhang	845b6cf341	Fix the nested PR lock calling issue in ACL Hi, Thanks a lot for all the review and comments so far;) I'd like to send the improved (V4) version of this patch. This patch fixes a deadlock in OCFS2 ACL. We found this bug in OCFS2 and Samba integration using scenario, the symptom is several smbd processes will be hung under heavy workload. Finally we found out it is the nested PR lock calling that leads to this deadlock: node1 node2 gr PR \| V PR(EX)---> BAST:OCFS2_LOCK_BLOCKED \| V rq PR \| V wait=1 After requesting the 2nd PR lock, the process "smbd" went into D state. It can only be woken up when the 1st PR lock's RO holder equals zero. There should be an ocfs2_inode_unlock in the calling path later on, which can decrement the RO holder. But since it has been in uninterruptible sleep, the unlock function has no chance to be called. The related stack trace is: smbd D ffff8800013d0600 0 9522 5608 0x00000000 ffff88002ca7fb18 0000000000000282 ffff88002f964500 ffff88002ca7fa98 ffff8800013d0600 ffff88002ca7fae0 ffff88002f964340 ffff88002f964340 ffff88002ca7ffd8 ffff88002ca7ffd8 ffff88002f964340 ffff88002f964340 Call Trace: [<ffffffff80350425>] schedule_timeout+0x175/0x210 [<ffffffff8034f580>] wait_for_common+0xf0/0x210 [<ffffffffa03e12b9>] __ocfs2_cluster_lock+0x3b9/0xa90 [ocfs2] [<ffffffffa03e7665>] ocfs2_inode_lock_full_nested+0x255/0xdb0 [ocfs2] [<ffffffffa0446019>] ocfs2_get_acl+0x69/0x120 [ocfs2] [<ffffffffa0446368>] ocfs2_check_acl+0x28/0x80 [ocfs2] [<ffffffff800e3507>] acl_permission_check+0x57/0xb0 [<ffffffff800e357d>] generic_permission+0x1d/0xc0 [<ffffffffa03eecea>] ocfs2_permission+0x10a/0x1d0 [ocfs2] [<ffffffff800e3f65>] inode_permission+0x45/0x100 [<ffffffff800d86b3>] sys_chdir+0x53/0x90 [<ffffffff80007458>] system_call_fastpath+0x16/0x1b [<00007f34a4ef6927>] 0x7f34a4ef6927 For details, please see: https://bugzilla.novell.com/show_bug.cgi?id=614332 and http://oss.oracle.com/bugzilla/show_bug.cgi?id=1278 Signed-off-by: Jiaju Zhang <jjzhang@suse.de> Acked-by: Mark Fasheh <mfasheh@suse.com> Cc: stable@kernel.org Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-08-07 10:46:46 -07:00
Tao Ma	8a2e70c40f	ocfs2: Count more refcount records in file system fragmentation. The refcount record calculation in ocfs2_calc_refcount_meta_credits is too optimistic that we can always allocate contiguous clusters and handle an already existed refcount rec as a whole. Actually because of file system fragmentation, we may have the chance to split a refcount record into 3 parts during the transaction. So consider the worst case in record calculation. Cc: stable@kernel.org Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-08-07 10:44:49 -07:00
Srinivas Eeda	7beaf24378	ocfs2 fix o2dlm dlm run purgelist (rev 3) This patch fixes two problems in dlm_run_purgelist 1. If a lockres is found to be in use, dlm_run_purgelist keeps trying to purge the same lockres instead of trying the next lockres. 2. When a lockres is found unused, dlm_run_purgelist releases lockres spinlock before setting DLM_LOCK_RES_DROPPING_REF and calls dlm_purge_lockres. spinlock is reacquired but in this window lockres can get reused. This leads to BUG. This patch modifies dlm_run_purgelist to skip lockres if it's in use and purge next lockres. It also sets DLM_LOCK_RES_DROPPING_REF before releasing the lockres spinlock protecting it from getting reused. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Acked-by: Sunil Mushran <sunil.mushran@oracle.com> Cc: stable@kernel.org Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-08-07 10:44:40 -07:00
Wengang Wang	6d98c3ccb5	ocfs2/dlm: fix a dead lock When we have to take both dlm->master_lock and lockres->spinlock, take them in order lockres->spinlock and then dlm->master_lock. The patch fixes a violation of the rule. We can simply move taking dlm->master_lock to where we have dropped res->spinlock since when we access res->state and free mle memory we don't need master_lock's protection. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Cc: stable@kernel.org Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-08-07 10:44:20 -07:00
Tiger Yang	6eda3dd33f	ocfs2: do not overwrite error codes in ocfs2_init_acl Setting the acl while creating a new inode depends on the error codes of posix_acl_create_masq. This patch fix a issue of overwriting the error codes of it. Reported-by: Pawel Zawora <pzawora@gmail.com> Cc: <stable@kernel.org> [ .33, .34 ] Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-08-07 10:43:25 -07:00
Artem Bityutskiy	6467716a37	writeback: optimize periodic bdi thread wakeups Whe the first inode for a bdi is marked dirty, we wake up the bdi thread which should take care of the periodic background write-out. However, the write-out will actually start only 'dirty_writeback_interval' centisecs later, so we can delay the wake-up. This change was requested by Nick Piggin who pointed out that if we delay the wake-up, we weed out 2 unnecessary contex switches, which matters because '__mark_inode_dirty()' is a hot-path function. This patch introduces a new function - 'bdi_wakeup_thread_delayed()', which sets up a timer to wake-up the bdi thread and returns. So the wake-up is delayed. We also delete the timer in bdi threads just before writing-back. And synchronously delete it when unregistering bdi. At the unregister point the bdi does not have any users, so no one can arm it again. Since now we take 'bdi->wb_lock' in the timer, which can execute in softirq context, we have to use 'spin_lock_bh()' for 'bdi->wb_lock'. This patch makes this change as well. This patch also moves the 'bdi_wb_init()' function down in the file to avoid forward-declaration of 'bdi_wakeup_thread_delayed()'. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:53:56 +02:00
Artem Bityutskiy	253c34e9b1	writeback: prevent unnecessary bdi threads wakeups Finally, we can get rid of unnecessary wake-ups in bdi threads, which are very bad for battery-driven devices. There are two types of activities bdi threads do: 1. process bdi works from the 'bdi->work_list' 2. periodic write-back So there are 2 sources of wake-up events for bdi threads: 1. 'bdi_queue_work()' - submits bdi works 2. '__mark_inode_dirty()' - adds dirty I/O to bdi's The former already has bdi wake-up code. The latter does not, and this patch adds it. '__mark_inode_dirty()' is hot-path function, but this patch adds another 'spin_lock(&bdi->wb_lock)' there. However, it is taken only in rare cases when the bdi has no dirty inodes. So adding this spinlock should be fine and should not affect performance. This patch makes sure bdi threads and the forker thread do not wake-up if there is nothing to do. The forker thread will nevertheless wake up at least every 5 min. to check whether it has to kill a bdi thread. This can also be optimized, but is not worth it. This patch also tidies up the warning about unregistered bid, and turns it from an ugly crocodile to a simple 'WARN()' statement. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:53:56 +02:00
Artem Bityutskiy	fff5b85aa4	writeback: move bdi threads exiting logic to the forker thread Currently, bdi threads can decide to exit if there were no useful activities for 5 minutes. However, this causes nasty races: we can easily oops in the 'bdi_queue_work()' if the bdi thread decides to exit while we are waking it up. And even if we do not oops, but the bdi tread exits immediately after we wake it up, we'd lose the wake-up event and have an unnecessary delay (up to 5 secs) in the bdi work processing. This patch makes the forker thread to be the central place which not only creates bdi threads, but also kills them if they were inactive long enough. This better design-wise. Another reason why this change was done is to prepare for the further changes which will prevent the bdi threads from waking up every 5 sec and wasting power. Indeed, when the task does not wake up periodically anymore, it won't be able to exit either. This patch also moves the the 'wake_up_bit()' call from the bdi thread to the forker thread as well. So now the forker thread sets the BDI_pending bit, then forks the task or kills it, then clears the bit and wakes up the waiting process. The only process which may wain on the bit is 'bdi_wb_shutdown()'. This function was changed as well - now it first removes the bdi from the 'bdi_list', then waits on the 'BDI_pending' bit. Once it wakes up, it is guaranteed that the forker thread won't race with it, because the bdi is not visible. Note, the forker thread sets the 'BDI_pending' bit under the 'bdi->wb_lock' which is essential for proper serialization. And additionally, when we change 'bdi->wb.task', we now take the 'bdi->work_lock', to make sure that we do not lose wake-ups which we otherwise would when raced with, say, 'bdi_queue_work()'. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:53:56 +02:00
Artem Bityutskiy	ecd584030d	writeback: move last_active to bdi Currently bdi threads use local variable 'last_active' which stores last time when the bdi thread did some useful work. Move this local variable to 'struct bdi_writeback'. This is just a preparation for the further patches which will make the forker thread decide when bdi threads should be killed. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:53:56 +02:00
Artem Bityutskiy	78c40cb658	writeback: do not remove bdi from bdi_list The forker thread removes bdis from 'bdi_list' before forking the bdi thread. But this is wrong for at least 2 reasons. Reason #1: if we temporary remove a bdi from the list, we may miss works which would otherwise be given to us. Reason #2: this is racy; indeed, 'bdi_wb_shutdown()' expects that bdis are always in the 'bdi_list' (see 'bdi_remove_from_list()'), and when it races with the forker thread, it can shut down the bdi thread at the same time as the forker creates it. This patch makes sure the forker thread never removes bdis from 'bdi_list' (which was suggested by Christoph Hellwig). In order to make sure that we do not race with 'bdi_wb_shutdown()', we have to hold the 'bdi_lock' while walking the 'bdi_list' and setting the 'BDI_pending' flag. NOTE! The error path is interesting. Currently, when we fail to create a bdi thread, we move the bdi to the tail of 'bdi_list'. But if we never remove the bdi from the list, we cannot move it to the tail either, because then we can mess up the RCU readers which walk the list. And also, we'll have the race described above in "Reason #2". But I not think that adding to the tail is any important so I just do not do that. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:53:56 +02:00
Artem Bityutskiy	297252c81d	writeback: do not lose wake-ups in bdi threads Currently, bdi threads ('bdi_writeback_thread()') can lose wake-ups. For example, if 'bdi_queue_work()' is executed after the bdi thread have had finished 'wb_do_writeback()' but before it called 'schedule_timeout_interruptible()'. To fix this issue, we have to check whether we have works to process after we have changed the task state to 'TASK_INTERRUPTIBLE'. This patch also clean-ups handling of the cases when 'dirty_writeback_interval' is zero or non-zero. Additionally, this patch also removes unneeded 'list_empty_careful()' call. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:53:55 +02:00
Artem Bityutskiy	6f904ff0e3	writeback: harmonize writeback threads naming The write-back code mixes words "thread" and "task" for the same things. This is not a big deal, but still an inconsistency. hch: a convention I tend to use and I've seen in various places is to always use _task for the storage of the task_struct pointer, and thread everywhere else. This especially helps with having foo_thread for the actual thread and foo_task for a global variable keeping the task_struct pointer This patch renames: * 'bdi_add_default_flusher_task()' -> 'bdi_add_default_flusher_thread()' * 'bdi_forker_task()' -> 'bdi_forker_thread()' because bdi threads are 'bdi_writeback_thread()', so these names are more consistent. This patch also amends commentaries and makes them refer the forker and bdi threads as "thread", not "task". Also, while on it, make 'bdi_add_default_flusher_thread()' declaration use 'static void' instead of 'void static' and make checkpatch.pl happy. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:53:16 +02:00
Jens Axboe	4aeefdc69f	coda: fixup clash with block layer REQ_* defines CODA should not be using defines in the global name space of that nature, prefix them with CODA_. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:53:13 +02:00
Minchan Kim	08852b6d6c	writeback: remove wb in get_next_work_item `83ba7b07` cleans up the writeback. So we don't use wb any more in get_next_work_item. Let's remove unnecessary argument. CC: Christoph Hellwig <hch@lst.de> Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:53:01 +02:00
Miklos Szeredi	6965031d33	splice: fix misuse of SPLICE_F_NONBLOCK SPLICE_F_NONBLOCK is clearly documented to only affect blocking on the pipe. In __generic_file_splice_read(), however, it causes an EAGAIN if the page is currently being read. This makes it impossible to write an application that only wants failure if the pipe is full. For example if the same process is handling both ends of a pipe and isn't otherwise able to determine whether a splice to the pipe will fill it or not. We could make the read non-blocking on O_NONBLOCK or some other splice flag, but for now this is the simplest fix. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:52:56 +02:00
Arnd Bergmann	6e9624b8ca	block: push down BKL into .open and .release The open and release block_device_operations are currently called with the BKL held. In order to change that, we must first make sure that all drivers that currently rely on this have no regressions. This blindly pushes the BKL into all .open and .release operations for all block drivers to prepare for the next step. The drivers can subsequently replace the BKL with their own locks or remove it completely when it can be shown that it is not needed. The functions blkdev_get and blkdev_put are the only remaining users of the big kernel lock in the block layer, besides a few uses in the ioctl code, none of which need to serialize with blkdev_{get,put}. Most of these two functions is also under the protection of bdev->bd_mutex, including the actual calls to ->open and ->release, and the common code does not access any global data structures that need the BKL. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:25:34 +02:00
Dave Chinner	028c2dd184	writeback: Add tracing to balance_dirty_pages Tracing high level background writeback events is good, but it doesn't give the entire picture. Add visibility into write throttling to catch IO dispatched by foreground throttling of processing dirtying lots of pages. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:24:25 +02:00
Dave Chinner	455b286468	writeback: Initial tracing support Trace queue/sched/exec parts of the writeback loop. This provides insight into when and why flusher threads are scheduled to run. e.g a sync invocation leaves traces like: sync-[...]: writeback_queue: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0 flush-8:0-[...]: writeback_exec: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0 This also lays the foundation for adding more writeback tracing to provide deeper insight into the whole writeback path. The original tracing code is from Jens Axboe, though this version is a rewrite as a result of the code being traced changing significantly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:24:23 +02:00
Andi Kleen	1676effca4	gcc-4.6: fs: fix unused but set warnings No real bugs I believe, just some dead code, and some shut up code. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:23:12 +02:00
Christoph Hellwig	082439004b	writeback: merge bdi_writeback_task and bdi_start_fn Move all code for the writeback thread into fs/fs-writeback.c instead of splitting it over two functions in two files. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:23:06 +02:00
Christoph Hellwig	c1955ce32f	writeback: remove wb_list The wb_list member of struct backing_device_info always has exactly one element. Just use the direct bdi->wb pointer instead and simplify some code. Also remove bdi_task_init which is now trivial to prepare for the next patch. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:23:03 +02:00
Christoph Hellwig	7b6d91daee	block: unify flags for struct bio and struct request Remove the current bio flags and reuse the request flags for the bio, too. This allows to more easily trace the type of I/O from the filesystem down to the block driver. There were two flags in the bio that were missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've renamed two request flags that had a superflous RW in them. Note that the flags are in bio.h despite having the REQ_ name - as blkdev.h includes bio.h that is the only way to go for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:20:39 +02:00
Christoph Hellwig	41f2df6289	block: BARRIER request should imply SYNC A barrier request should by defintion have priority in get_request and let the queue be unplugged immediately as it's blocking all forward progress due to the queue draining. Most filesystems already get this implicitly by the way how submit_bh treats the buffer_ordered flag, and gfs2 sets it explicitly. But btrfs and XFS are still forgetting to set the flag, as is blkdev_issue_flush and some places in DM/MD. For XFS on metadata heavy workloads this gives a consistent speedup in the 2-3% range. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:15:44 +02:00
J. Bruce Fields	998db52c03	nfsd4: fix file open accounting for RDWR opens Commit `f9d7562fdb` "nfsd4: share file descriptors between stateid's" didn't correctly account for O_RDWR opens. Symptoms include leaked files, resulting in failures to unmount and/or warnings about orphaned inodes on reboot. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-08-07 09:50:05 -04:00
J. Bruce Fields	7fa53cc872	nfsd: don't allow setting maxblksize after svc created It's harmless to set this after the server is created, but also ineffective, since the value is only used at the time of svc_create_pooled(). So fail the attempt, in keeping with the pattern set by write_versions, write_{lease,grace}time and write_recoverydir. (This could break userspace that tried to write to nfsd/max_block_size between setting up sockets and starting the server. However, such code wouldn't have worked anyway, and I don't know of any examples--rpc.nfsd in nfs-utils, probably the only user of the interface, doesn't do that.) Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-08-06 18:00:33 -04:00
J. Bruce Fields	e844a7b980	nfsd: initialize nfsd versions before creating svc Commit `59db4a0c10` "nfsd: move more into nfsd_startup()" inadvertently moved nfsd_versions after nfsd_create_svc(). On older distributions using an rpc.nfsd that does not explicitly set the list of nfsd versions, this results in svc-create_pooled() being called with an empty versions array. The resulting incomplete initialization leads to a NULL dereference in svc_process_common() the first time a client accesses the server. Move nfsd_reset_versions() back before the svc_create_pooled(); this time, put it closer to the svc_create_pooled() call, to make this mistake more difficult in the future. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-08-06 17:05:40 -04:00
Boaz Harrosh	c18c821fd4	nfsd41: Fix a crash when a callback is retried If a callback is retried at nfsd4_cb_recall_done() due to some error, the returned rpc reply crashes here: @@ -514,6 +514,7 @@ decode_cb_sequence(struct xdr_stream xdr, struct nfsd4_cb_sequence res, u32 dummy; __be32 *p; + BUG_ON(!res); if (res->cbs_minorversion == 0) return 0; [BUG_ON added for demonstration] This is because the nfsd4_cb_done_sequence() has NULLed out the task->tk_msg.rpc_resp pointer. Also eventually the rpc would use the new slot without making sure it is free by calling nfsd41_cb_setup_sequence(). This problem was introduced by a 4.1 protocol addition patch: [`0421b5c5`] nfsd41: Backchannel: Implement cb_recall over NFSv4.1 Which was overlooking the possibility of an RPC callback retries. For not-4.1 case redoing the _prepare is harmless. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-08-06 17:05:39 -04:00
J. Bruce Fields	774f8bbd9e	nfsd: fix startup/shutdown order bug We must create the server before we can call init_socks or check the number of threads. Symptoms were a NULL pointer dereference in nfsd_svc(). Problem identified by Jeff Layton. Also fix a minor cleanup-on-error case in nfsd_startup(). Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-08-06 17:05:30 -04:00
Linus Torvalds	ab69bcd66f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (28 commits) driver core: device_rename's new_name can be const sysfs: Remove owner field from sysfs struct attribute powerpc/pci: Remove owner field from attribute initialization in PCI bridge init regulator: Remove owner field from attribute initialization in regulator core driver leds: Remove owner field from attribute initialization in bd2802 driver scsi: Remove owner field from attribute initialization in ARCMSR driver scsi: Remove owner field from attribute initialization in LPFC driver cgroupfs: create /sys/fs/cgroup to mount cgroupfs on Driver core: Add BUS_NOTIFY_BIND_DRIVER driver core: fix memory leak on one error path in bus_register() debugfs: no longer needs to depend on SYSFS sysfs: Fix one more signature discrepancy between sysfs implementation and docs. sysfs: fix discrepancies between implementation and documentation dcdbas: remove a redundant smi_data_buf_free in dcdbas_exit dmi-id: fix a memory leak in dmi_id_init error path sysfs: sysfs_chmod_file's attr can be const firmware: Update hotplug script Driver core: move platform device creation helpers to .init.text (if MODULE=n) Driver core: reduce duplicated code for platform_device creation Driver core: use kmemdup in platform_device_add_resources ...	2010-08-06 11:36:30 -07:00
Trond Myklebust	3dce9a5c3a	NFS: NFSv4.1 is no longer a "developer only" feature Mark it as 'experimental' instead, since in practice, NFSv4.1 should now be relatively stable. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-08-06 13:41:41 -04:00
Trond Myklebust	b3edc2bc19	NFS: NFS_V4 is no longer an EXPERIMENTAL feature Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-08-06 13:41:40 -04:00
Bryan Schumaker	d5eff1a341	NFS: Fix /proc/mount for legacy binary interface Add a flag so we know if we mounted the NFS server using the legacy binary interface. If we used the legacy interface, then we should not show the mountd options. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-08-06 13:41:39 -04:00
Trond Myklebust	761fe93cdf	NFS: Fix the locking in nfs4_callback_getattr The delegation is protected by RCU now, so we need to replace the nfsi->rwsem protection with an rcu protected section. Reported-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-08-06 13:41:39 -04:00
Linus Torvalds	4aed2fd8e3	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (162 commits) tracing/kprobes: unregister_trace_probe needs to be called under mutex perf: expose event__process function perf events: Fix mmap offset determination perf, powerpc: fsl_emb: Restore setting perf_sample_data.period perf, powerpc: Convert the FSL driver to use local64_t perf tools: Don't keep unreferenced maps when unmaps are detected perf session: Invalidate last_match when removing threads from rb_tree perf session: Free the ref_reloc_sym memory at the right place x86,mmiotrace: Add support for tracing STOS instruction perf, sched migration: Librarize task states and event headers helpers perf, sched migration: Librarize the GUI class perf, sched migration: Make the GUI class client agnostic perf, sched migration: Make it vertically scrollable perf, sched migration: Parameterize cpu height and spacing perf, sched migration: Fix key bindings perf, sched migration: Ignore unhandled task states perf, sched migration: Handle ignored migrate out events perf: New migration tool overview tracing: Drop cpparg() macro perf: Use tracepoint_synchronize_unregister() to flush any pending tracepoint call ... Fix up trivial conflicts in Makefile and drivers/cpufreq/cpufreq.c	2010-08-06 09:30:52 -07:00
Linus Torvalds	3a3527b646	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: Revert "net: Make accesses to ->br_port safe for sparse RCU" mce: convert to rcu_dereference_index_check() net: Make accesses to ->br_port safe for sparse RCU vfs: add fs.h to define struct file lockdep: Add an in_workqueue_context() lockdep-based test function rcu: add __rcu API for later sparse checking rcu: add an rcu_dereference_index_check() tree/tiny rcu: Add debug RCU head objects mm: remove all rcu head initializations fs: remove all rcu head initializations, except on_stack initializations powerpc: remove all rcu head initializations	2010-08-06 09:23:07 -07:00
David Howells	31d1d48e19	Fix init ordering of /dev/console vs callers of modprobe Make /dev/console get initialised before any initialisation routine that invokes modprobe because if modprobe fails, it's going to want to open /dev/console, presumably to write an error message to. The problem with that is that if the /dev/console driver is not yet initialised, the chardev handler will call request_module() to invoke modprobe, which will fail, because we never compile /dev/console as a module. This will lead to a modprobe loop, showing the following in the kernel log: request_module: runaway loop modprobe char-major-5-1 request_module: runaway loop modprobe char-major-5-1 request_module: runaway loop modprobe char-major-5-1 request_module: runaway loop modprobe char-major-5-1 request_module: runaway loop modprobe char-major-5-1 This can happen, for example, when the built in md5 module can't find the built in cryptomgr module (because the latter fails to initialise). The md5 module comes before the call to tty_init(), presumably because 'crypto' comes before 'drivers' alphabetically. Fix this by calling tty_init() from chrdev_init(). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-06 09:17:02 -07:00
Phillip Lougher	4f86b8fd48	Squashfs: fix filename typo Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2010-08-05 23:52:53 +01:00
Phillip Lougher	4b676d2dbe	Squashfs: update Kconfig and documentation for LZO Update compression types supported and add some help text for the LZO Kconfig option. Also add missing "default n" line and make some trivial whitespace cleanups too. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2010-08-05 23:42:54 +01:00
Sage Weil	0eb6cd49f6	ceph: only queue async writeback on cap revocation if there is dirty data Normally, if the Fb cap bit is being revoked, we queue an async writeback. If there is no dirty data but we still hold the cap, this leaves the client sitting around doing nothing until the cap timeouts expire and the cap is released on its own (as it would have been without the revocation). Instead, only queue writeback if the bit is actually used (i.e., we have dirty data). If not, we can reply to the revocation immediately. Signed-off-by: Sage Weil <sage@newdream.net>	2010-08-05 13:53:40 -07:00
Jean Delvare	49c19400f6	sysfs: sysfs_chmod_file's attr can be const sysfs_chmod_file doesn't change the attribute it operates on, so this attribute can be marked const. Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-08-05 13:53:34 -07:00
Aditya Kali	6c7a120ac6	ext4: Adding error check after calling ext4_mb_regular_allocator() If the bitmap block on disk is bad, ext4_mb_load_buddy() returns an error. This error is returned to the caller, ext4_mb_regular_allocator() and then to ext4_mb_new_blocks(). But ext4_mb_new_blocks() did not check for the return value of ext4_mb_regular_allocator() and would repeatedly try to load the bitmap block. The fix simply catches the return value and exits out of the 'repeat' loop after cleanup. We also take the opportunity to clean up the error handling in ext4_mb_new_blocks(). Google-Bug-Id: 2853530 Signed-off-by: Aditya Kali <adityakali@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-08-05 16:22:24 -04:00
Satoru Takeuchi	642b5123ac	aio: fix wrong subsystem comments - sys_io_destroy(): acutually return -EINVAL if the context pointed to is invalidIndex: linux-2.6.33-rc4/fs/aio.c - sys_io_getevents(): An argument specifying timeout is not `when', but `timeout'. - sys_io_getevents(): Should describe what is returned if this syscall succeeds. Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-05 13:21:23 -07:00
Jan Kara	5f11e6a440	ext3: Fix dirtying of journalled buffers in data=journal mode In data=journal mode, we still use block_write_begin() to prepare page for writing. This function can occasionally mark buffer dirty which violates journalling assumptions - when a buffer is part of a transaction, it should be dirty and a buffer can be already part of a forget list of some transaction when block_write_begin() gets called. This violation of journalling assumptions then results in "JBD: Spotted dirty metadata buffer..." warnings. In fact, temporary dirtying the buffer while the page is still locked does not really cause problems to the journalling because we won't write the buffer until the page gets unlocked. So we just have to make sure to clear dirty bits before unlocking the page. Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz>	2010-08-05 21:28:28 +02:00
Julia Lawall	f70cb33b9c	fs/dlm: Drop unnecessary null test hlist_for_each_entry binds its first argument to a non-null value, and thus any null test on the value of that argument is superfluous. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ iterator I; expression x,E,E1,E2; statement S,S1,S2; @@ I(x,...) { <... - (x != NULL) && E ...> } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David Teigland <teigland@redhat.com>	2010-08-05 14:23:45 -05:00
Changli Gao	a4d935bd97	dlm: use genl_register_family_with_ops() Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David Teigland <teigland@redhat.com>	2010-08-05 14:22:01 -05:00
Jan Kara	56d35a4cd1	ext4: Fix dirtying of journalled buffers in data=journal mode In data=journal mode, we still use block_write_begin() to prepare page for writing. This function can occasionally mark buffer dirty which violates journalling assumptions - when a buffer is part of a transaction, it should be dirty and a buffer can be already part of a forget list of some transaction when block_write_begin() gets called. This violation of journalling assumptions then results in "JBD: Spotted dirty metadata buffer..." warnings. In fact, temporary dirtying the buffer while the page is still locked does not really cause problems to the journalling because we won't write the buffer until the page gets unlocked. So we just have to make sure to clear dirty bits before unlocking the page. Signed-off-by: Jan Kara <jack@suse.cz>	2010-08-05 14:41:42 -04:00
Wang Lei	07567a5509	DNS: Make AFS go to the DNS for AFSDB records for unknown cells Add DNS query support for AFS so that it can get the IP addresses of Volume Location servers from the DNS using an AFSDB record. This requires userspace support. /etc/request-key.conf must be configured to invoke a helper for dns_resolver type keys with a subtype of "afsdb:" in the description. Signed-off-by: Wang Lei <wang840925@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-08-05 17:17:51 +00:00
Wang Lei	1a4240f476	DNS: Separate out CIFS DNS Resolver code Separate out the DNS resolver key type from the CIFS filesystem into its own module so that it can be made available for general use, including the AFS filesystem module. This facility makes it possible for the kernel to upcall to userspace to have it issue DNS requests, package up the replies and present them to the kernel in a useful form. The kernel is then able to cache the DNS replies as keys can be retained in keyrings. Resolver keys are of type "dns_resolver" and have a case-insensitive description that is of the form "[<type>:]<domain_name>". The optional <type> indicates the particular DNS lookup and packaging that's required. The <domain_name> is the query to be made. If <type> isn't given, a basic hostname to IP address lookup is made, and the result is stored in the key in the form of a printable string consisting of a comma-separated list of IPv4 and IPv6 addresses. This key type is supported by userspace helpers driven from /sbin/request-key and configured through /etc/request-key.conf. The cifs.upcall utility is invoked for UNC path server name to IP address resolution. The CIFS functionality is encapsulated by the dns_resolve_unc_to_ip() function, which is used to resolve a UNC path to an IP address for CIFS filesystem. This part remains in the CIFS module for now. See the added Documentation/networking/dns_resolver.txt for more information. Signed-off-by: Wang Lei <wang840925@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-08-05 17:17:51 +00:00
Jeff Layton	ba5dadbf4e	cifs: account for new creduid=0x%x parameter in spnego upcall string The commit that added the creduid=0x%x parameter failed to increase the buffer allocation to account for it. Reported-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-08-05 17:17:50 +00:00
Jeff Layton	5acfec2502	cifs: reduce false positives with inode aliasing serverino autodisable It turns out that not all directory inodes with dentries on the i_dentry list are unusable here. We only consider them unusable if they are still hashed or if they have a root dentry attached. Full disclosure -- this check is inherently racy. There's nothing that stops someone from slapping a new dentry onto this inode just after this check, or hashing an existing one that's already attached. So, this is really a "best effort" thing to work around misbehaving servers. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-08-05 17:17:50 +00:00
David Howells	67b7626a05	CIFS: Make cifs_convert_address() take a const src pointer and a length Make cifs_convert_address() take a const src pointer and a length so that all the strlen() calls in their can be cut out and to make it unnecessary to modify the src string. Also return the data length from dns_resolve_server_name_to_ip() so that a strlen() can be cut out of cifs_compose_mount_options() too. Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-08-05 17:17:50 +00:00
Suresh Jayaraman	f579903ef3	cifs: show features compiled in as part of DebugData Fixed the nit pointed out by Jeff. From: Suresh Jayaraman <sjayaraman@suse.de> Subject: [PATCH 1/2] cifs: show features compiled in as part of DebugData This patch adds the features that are compiled in to the CIFS debugging data as shown below: $cat /proc/fs/cifs/DebugData Display Internal CIFS Data Structures for Debugging --------------------------------------------------- CIFS Version 1.64 Features: dfs fscache posix spnego xattr Active VFS Requests: 0 ... This patch provides a definitive way to tell what features are currently enabled in the running kernel. This could also help debugging. Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Cc: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-08-05 17:17:50 +00:00
Suresh Jayaraman	95c99904f6	cifs: update README Update the README file to reflect that now DebugData shows all the features enabled. Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Cc: Jeff Layton <jlayton@redhat.com> -- fs/cifs/README \| 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-08-05 17:17:50 +00:00
Eric Sandeen	0cfc9255a1	ext4: re-inline ext4_rec_len_(to\|from)_disk functions commit `3d0518f4`, "ext4: New rec_len encoding for very large blocksizes" made several changes to this path, but from a perf perspective, un-inlining ext4_rec_len_from_disk() seems most significant. This function is called from ext4_check_dir_entry(), which on a file-creation workload is called extremely often. I tested this with bonnie: # bonnie++ -u root -s 0 -f -x 200 -d /mnt/test -n 32 (this does 200 iterations) and got this for the file creations: ext4 stock: Average = 21206.8 files/s ext4 inlined: Average = 22346.7 files/s (+5%) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-08-05 01:46:37 -04:00
Phillip Lougher	f3065f60dd	Squashfs: fix block size use in LZO decompressor Sizing the buffer using block size alone is incorrect leading to a potential buffer over-run on 4K block size file systems (because the metadata block size is always 8K). Srclength is set to the maximum expected size of the decompressed block and it is block_size or 8K depending on whether a data or metadata block is being decompressed. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2010-08-05 04:51:50 +01:00
Chan Jeong	79cb8ced7e	Squashfs: Add LZO compression support Signed-off-by: Chan Jeong <chan.jeong@lge.com> Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2010-08-05 02:29:59 +01:00
Linus Torvalds	3cfc2c42c1	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (48 commits) Documentation: update broken web addresses. fix comment typo "choosed" -> "chosen" hostap:hostap_hw.c Fix typo in comment Fix spelling contorller -> controller in comments Kconfig.debug: FAIL_IO_TIMEOUT: typo Faul -> Fault fs/Kconfig: Fix typo Userpace -> Userspace Removing dead MACH_U300_BS26 drivers/infiniband: Remove unnecessary casts of private_data fs/ocfs2: Remove unnecessary casts of private_data libfc: use ARRAY_SIZE scsi: bfa: use ARRAY_SIZE drm: i915: use ARRAY_SIZE drm: drm_edid: use ARRAY_SIZE synclink: use ARRAY_SIZE block: cciss: use ARRAY_SIZE comment typo fixes: charater => character fix comment typos concerning "challenge" arm: plat-spear: fix typo in kerneldoc reiserfs: typo comment fix update email address ...	2010-08-04 15:31:02 -07:00
Linus Torvalds	6ba74014c1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1443 commits) phy/marvell: add 88ec048 support igb: Program MDICNFG register prior to PHY init e1000e: correct MAC-PHY interconnect register offset for 82579 hso: Add new product ID can: Add driver for esd CAN-USB/2 device l2tp: fix export of header file for userspace can-raw: Fix skb_orphan_try handling Revert "net: remove zap_completion_queue" net: cleanup inclusion phy/marvell: add 88e1121 interface mode support u32: negative offset fix net: Fix a typo from "dev" to "ndev" igb: Use irq_synchronize per vector when using MSI-X ixgbevf: fix null pointer dereference due to filter being set for VLAN 0 e1000e: Fix irq_synchronize in MSI-X case e1000e: register pm_qos request on hardware activation ip_fragment: fix subtracting PPPOE_SES_HLEN from mtu twice net: Add getsockopt support for TCP thin-streams cxgb4: update driver version cxgb4: add new PCI IDs ... Manually fix up conflicts in: - drivers/net/e1000e/netdev.c: due to pm_qos registration infrastructure changes - drivers/net/phy/marvell.c: conflict between adding 88ec048 support and cleaning up the IDs - drivers/net/wireless/ipw2x00/ipw2100.c: trivial ipw2100_pm_qos_req conflict (registration change vs marking it static)	2010-08-04 11:47:58 -07:00
Tejun Heo	e75aa85892	block_dev: always serialize exclusive open attempts bd_prepare_to_claim() incorrectly allowed multiple attempts for exclusive open to progress in parallel if the attempting holders are identical. This triggered BUG_ON() as reported in the following bug. https://bugzilla.kernel.org/show_bug.cgi?id=16393 __bd_abort_claiming() is used to finish claiming blocks and doesn't work if multiple openers are inside a claiming block. Allowing multiple parallel open attempts to continue doesn't gain anything as those are serialized down in the call chain anyway. Fix it by always allowing only single open attempt in a claiming block. This problem can easily be reproduced by adding a delay after bd_prepare_to_claim() and attempting to mount two partitions of a disk. stable: only applicable to v2.6.35 Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-04 11:17:10 -07:00
Linus Torvalds	7e6880951d	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (90 commits) AppArmor: fix build warnings for non-const use of get_task_cred selinux: convert the policy type_attr_map to flex_array AppArmor: Enable configuring and building of the AppArmor security module TOMOYO: Use pathname specified by policy rather than execve() AppArmor: update path_truncate method to latest version AppArmor: core policy routines AppArmor: policy routines for loading and unpacking policy AppArmor: mediation of non file objects AppArmor: LSM interface, and security module initialization AppArmor: Enable configuring and building of the AppArmor security module AppArmor: update Maintainer and Documentation AppArmor: functions for domain transitions AppArmor: file enforcement routines AppArmor: userspace interfaces AppArmor: dfa match engine AppArmor: contexts used in attaching policy to system objects AppArmor: basic auditing infrastructure. AppArmor: misc. base functions and defines TOMOYO: Update version to 2.3.0 TOMOYO: Fix quota check. ...	2010-08-04 10:28:39 -07:00
Jiri Kosina	d790d4d583	Merge branch 'master' into for-next	2010-08-04 15:14:38 +02:00
Uwe Kleine-König	73b2c7165b	fix comment typo "choosed" -> "chosen" Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-08-04 15:08:39 +02:00
Trond Myklebust	a17c2153d2	SUNRPC: Move the bound cred to struct rpc_rqst This will allow us to save the original generic cred in rpc_message, so that if we migrate from one server to another, we can generate a new bound cred without having to punt back to the NFS layer. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-08-04 08:54:09 -04:00
Boaz Harrosh	5002dd18c5	exofs: Fix groups code when num_devices is not divisible by group_width There is a bug when num_devices is not divisible by group_width * mirrors. We would not return to the proper device and offset when looping on to the next group. The fix makes code simpler actually. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2010-08-04 13:17:58 +03:00
Boaz Harrosh	6e31609b1d	exofs: Remove useless optimization We used to compact all used devices in an IO to the beginning of the device array in an io_state. And keep a last device used so in later loops we don't iterate on all device slots. This does not prevent us from checking if slots are empty since in reads we only read from a single mirror and jump to the next mirror-set. This optimization is marginal, and needlessly complicates the code. Specially when we will later want to support raid/456 with same abstract code. So remove the distinction between "dev" and "comp". Only "dev" is used both as the device used and as the index (component) in the device array. [Note that now the io_state->dev member is redundant but I keep it because I might want to optimize by only IOing a single group, though keeping a group_width*mirrors devices in io_state, we now keep num-devices in each io_state] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2010-08-04 13:17:57 +03:00
Boaz Harrosh	b284834929	exofs: exofs_file_fsync and exofs_file_flush correctness As per Christoph advise: no need to call filemap_write_and_wait(). In exofs all metadata is at the inode so just writing the inode is all is needed. ->fsync implies this must be done synchronously. But now exofs_file_fsync can not be used by exofs_file_flush. vfs_fsync() should do that job correctly. FIXME: remove the sb_sync and fix that sb_update better. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2010-08-04 13:17:56 +03:00
Boaz Harrosh	85dc7878c6	exofs: Remove superfluous dependency on buffer_head and writeback exofs_releasepage && exofs_invalidatepage are never called. Leave the WARN_ONs but remove any code. Remove the cleanup other stale #includes. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2010-08-04 13:17:55 +03:00
Trond Myklebust	d05dd4e98f	NFS: Fix the NFS users of rpc_restart_call() Fix up those functions that depend on knowing whether or not rpc_restart_call is successful or not. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-08-03 22:06:44 -04:00
Trond Myklebust	a6f03393ec	NFSv4: Get rid of the bogus RPC_ASSASSINATED(task) checks There is no real reason to have RPC_ASSASSINATED() checks in the NFS code. As far as it is concerned, this is just an RPC error... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-08-03 22:06:43 -04:00
Trond Myklebust	452e93523d	NFSv4: Clean up the process of renewing the NFSv4 lease Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-08-03 22:06:42 -04:00
Trond Myklebust	14516c3a30	NFSv4.1: Handle NFS4ERR_DELAY on SEQUENCE correctly In RFC5661, an NFS4ERR_DELAY error on a SEQUENCE operation has the special meaning that the server is not finished processing the request. In this case we want to just retry the request without touching the slot. Also fix a bug whereby we would fail to update the sequence id if the server returned any error other than NFS_OK/NFS4ERR_DELAY. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-08-03 22:06:42 -04:00
Trond Myklebust	0a8ebba943	NFS: nfs_rename() should not have to flush out writebacks We don't really support nfs servers that invalidate the file handle after a rename, so precautions such as flushing out dirty data before renaming the file are superfluous. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-08-03 22:06:41 -04:00
Trond Myklebust	1b924e5f87	NFS: Clean up the callers of nfs_wb_all() There is no need to flush out writes before calling nfs_wb_all(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-08-03 22:06:40 -04:00
Trond Myklebust	af7fa16506	NFS: Fix up the fsync code Christoph points out that the VFS will always flush out data before calling nfs_fsync(), so we can dispense with a full call to nfs_wb_all(), and replace that with a simpler call to nfs_commit_inode(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-08-03 22:06:07 -04:00
Theodore Ts'o	8dd420466c	jbd2: Remove t_handle_lock from start_this_handle() This should remove the last exclusive lock from start_this_handle(), so that we should now be able to start multiple transactions at the same time on large SMP systems. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-08-03 21:38:29 -04:00
Theodore Ts'o	a931da6ac9	jbd2: Change j_state_lock to be a rwlock_t Lockstat reports have shown that j_state_lock is a major source of lock contention, especially on systems with more than 4 CPU cores. So change it to be a read/write spinlock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-08-03 21:35:12 -04:00

... 2 3 4 5 6 ...

19326 Commits