linux-2.6

dect

Archived

Author	SHA1	Message	Date
Darrick J. Wong	5a9ae68a34	ext4: ext4_fill_super shouldn't return 0 on corruption At the start of ext4_fill_super, ret is set to -EINVAL, and any failure path out of that function returns ret. However, the generic_check_addressable clause sets ret = 0 (if it passes), which means that a subsequent failure (e.g. a group checksum error) returns 0 even though the mount should fail. This causes vfs_kern_mount in turn to think that the mount succeeded, leading to an oops. A simple fix is to avoid using ret for the generic_check_addressable check, which was last changed in commit `30ca22c70e`. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-11-19 09:56:44 -05:00
Dan Carpenter	f4c8cc652d	ext4: missing unlock in ext4_clear_request_list() If the the li_request_list was empty then it returned with the lock held. Instead of adding a "goto unlock" I just removed that special case and let it go past the empty list_for_each_safe(). Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-11-17 21:46:25 -05:00
Tejun Heo	d4d7762995	block: clean up blkdev_get() wrappers and their users After recent blkdev_get() modifications, open_by_devnum() and open_bdev_exclusive() are simple wrappers around blkdev_get(). Replace them with blkdev_get_by_dev() and blkdev_get_by_path(). blkdev_get_by_dev() is identical to open_by_devnum(). blkdev_get_by_path() is slightly different in that it doesn't automatically add %FMODE_EXCL to @mode. All users are converted. Most conversions are mechanical and don't introduce any behavior difference. There are several exceptions. * btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no reason to OR it explicitly on blkdev_put(). * gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in sb->s_mode. * With the above changes, sb->s_mode now always should contain FMODE_EXCL. WARN_ON_ONCE() added to kill_block_super() to detect errors. The new blkdev_get_*() functions are with proper docbook comments. While at it, add function description to blkdev_get() too. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Neil Brown <neilb@suse.de> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Joern Engel <joern@lazybastard.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: reiserfs-devel@vger.kernel.org Cc: xfs-masters@oss.sgi.com Cc: Alexander Viro <viro@zeniv.linux.org.uk>	2010-11-13 11:55:18 +01:00
Tejun Heo	e525fd89d3	block: make blkdev_get/put() handle exclusive access Over time, block layer has accumulated a set of APIs dealing with bdev open, close, claim and release. * blkdev_get/put() are the primary open and close functions. * bd_claim/release() deal with exclusive open. * open/close_bdev_exclusive() are combination of open and claim and the other way around, respectively. * bd_link/unlink_disk_holder() to create and remove holder/slave symlinks. * open_by_devnum() wraps bdget() + blkdev_get(). The interface is a bit confusing and the decoupling of open and claim makes it impossible to properly guarantee exclusive access as in-kernel open + claim sequence can disturb the existing exclusive open even before the block layer knows the current open if for another exclusive access. Reorganize the interface such that, * blkdev_get() is extended to include exclusive access management. @holder argument is added and, if is @FMODE_EXCL specified, it will gain exclusive access atomically w.r.t. other exclusive accesses. * blkdev_put() is similarly extended. It now takes @mode argument and if @FMODE_EXCL is set, it releases an exclusive access. Also, when the last exclusive claim is released, the holder/slave symlinks are removed automatically. * bd_claim/release() and close_bdev_exclusive() are no longer necessary and either made static or removed. * bd_link_disk_holder() remains the same but bd_unlink_disk_holder() is no longer necessary and removed. * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev() and blkdev_get(). It also has an unexpected extra bdev_read_only() test which probably should be moved into blkdev_get(). * open_by_devnum() is modified to take @holder argument and pass it to blkdev_get(). Most of bdev open/close operations are unified into blkdev_get/put() and most exclusive accesses are tested atomically at the open time (as it should). This cleans up code and removes some, both valid and invalid, but unnecessary all the same, corner cases. open_bdev_exclusive() and open_by_devnum() can use further cleanup - rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop special features. Well, let's leave them for another day. Most conversions are straight-forward. drbd conversion is a bit more involved as there was some reordering, but the logic should stay the same. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Brown <neilb@suse.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Philipp Reisner <philipp.reisner@linbit.com> Cc: Peter Osterlund <petero2@telia.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Alex Elder <aelder@sgi.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: dm-devel@redhat.com Cc: drbd-dev@lists.linbit.com Cc: Leo Chen <leochen@broadcom.com> Cc: Scott Branden <sbranden@broadcom.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Joern Engel <joern@logfs.org> Cc: reiserfs-devel@vger.kernel.org Cc: Alexander Viro <viro@zeniv.linux.org.uk>	2010-11-13 11:55:17 +01:00
Theodore Ts'o	7ff9c073dd	ext4: Add new ext4 inode tracepoints Add ext4_evict_inode, ext4_drop_inode, ext4_mark_inode_dirty, and ext4_begin_ordered_truncate() Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-11-08 13:51:33 -05:00
Dmitry Monakhov	87009d86dc	ext4: do not try to grab the s_umount semaphore in ext4_quota_off It's not needed to sync the filesystem, and it fixes a lock_dep complaint. Signed-off-by: Dmitry Monakhov <dmonakhov@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2010-11-08 13:47:33 -05:00
Theodore Ts'o	f7ad6d2e92	ext4: handle writeback of inodes which are being freed The following BUG can occur when an inode which is getting freed when it still has dirty pages outstanding, and it gets deleted (in this because it was the target of a rename). In ordered mode, we need to make sure the data pages are written just in case we crash before the rename (or unlink) is committed. If the inode is being freed then when we try to igrab the inode, we end up tripping the BUG_ON at fs/ext4/page-io.c:146. To solve this problem, we need to keep track of the number of io callbacks which are pending, and avoid destroying the inode until they have all been completed. That way we don't have to bump the inode count to keep the inode from being destroyed; an approach which doesn't work because the count could have already been dropped down to zero before the inode writeback has started (at which point we're not allowed to bump the count back up to 1, since it's already started getting freed). Thanks to Dave Chinner for suggesting this approach, which is also used by XFS. kernel BUG at /scratch_space/linux-2.6/fs/ext4/page-io.c:146! Call Trace: [<ffffffff811075b1>] ext4_bio_write_page+0x172/0x307 [<ffffffff811033a7>] mpage_da_submit_io+0x2f9/0x37b [<ffffffff811068d7>] mpage_da_map_and_submit+0x2cc/0x2e2 [<ffffffff811069b3>] mpage_add_bh_to_extent+0xc6/0xd5 [<ffffffff81106c66>] write_cache_pages_da+0x2a4/0x3ac [<ffffffff81107044>] ext4_da_writepages+0x2d6/0x44d [<ffffffff81087910>] do_writepages+0x1c/0x25 [<ffffffff810810a4>] __filemap_fdatawrite_range+0x4b/0x4d [<ffffffff810815f5>] filemap_fdatawrite_range+0xe/0x10 [<ffffffff81122a2e>] jbd2_journal_begin_ordered_truncate+0x7b/0xa2 [<ffffffff8110615d>] ext4_evict_inode+0x57/0x24c [<ffffffff810c14a3>] evict+0x22/0x92 [<ffffffff810c1a3d>] iput+0x212/0x249 [<ffffffff810bdf16>] dentry_iput+0xa1/0xb9 [<ffffffff810bdf6b>] d_kill+0x3d/0x5d [<ffffffff810be613>] dput+0x13a/0x147 [<ffffffff810b990d>] sys_renameat+0x1b5/0x258 [<ffffffff81145f71>] ? _atomic_dec_and_lock+0x2d/0x4c [<ffffffff810b2950>] ? cp_new_stat+0xde/0xea [<ffffffff810b29c1>] ? sys_newlstat+0x2d/0x38 [<ffffffff810b99c6>] sys_rename+0x16/0x18 [<ffffffff81002a2b>] system_call_fastpath+0x16/0x1b Reported-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Tested-by: Nick Bowler <nbowler@elliptictech.com>	2010-11-08 13:43:33 -05:00
Theodore Ts'o	ce7e010aef	ext4: initialize the percpu counters before replaying the journal We now initialize the percpu counters before replaying the journal, but after the journal, we recalculate the global counters, to deal with the possibility of the per-blockgroup counts getting updated by the journal replay. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-11-03 12:03:21 -04:00
Theodore Ts'o	b2c78cd09b	ext4: "ret" may be used uninitialized in ext4_lazyinit_thread() Newer GCC's reported the following build warning: fs/ext4/super.c: In function 'ext4_lazyinit_thread': fs/ext4/super.c:2702: warning: 'ret' may be used uninitialized in this function Fix it by removing the need for the ret variable in the first place. Signed-off-by: "Lukas Czerner" <lczerner@redhat.com> Reported-by: "Stefan Richter" <stefanr@s5r6.in-berlin.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-11-02 14:19:30 -04:00
Lukas Czerner	f4245bd4eb	ext4: fix lazyinit hang after removing request When the request has been removed from the list and no other request has been issued, we will end up with next wakeup scheduled to MAX_JIFFY_OFFSET which is bad. So check for that. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-11-02 14:07:17 -04:00
Al Viro	152a083666	new helper: mount_bdev() ... and switch of the obvious get_sb_bdev() users to ->mount() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-29 04:16:13 -04:00
Theodore Ts'o	a107e5a3a4	Merge branch 'next' into upstream-merge Conflicts: fs/ext4/inode.c fs/ext4/mballoc.c include/trace/events/ext4.h	2010-10-27 23:44:47 -04:00
Nicolas Kaiser	beed5ecbaa	ext4: fix unbalanced mutex unlock in error path of ext4_li_request_new Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 22:08:42 -04:00
Theodore Ts'o	1f109d5a17	ext4: make various ext4 functions be static These functions have no need to be exported beyond file context. No functions needed to be moved for this commit; just some function declarations changed to be static and removed from header files. (A similar patch was submitted by Eric Sandeen, but I wanted to handle code movement in separate patches to make sure code changes didn't accidentally get dropped.) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:30:14 -04:00
Theodore Ts'o	5dabfc78dc	ext4: rename {exit,init}_ext4_() to ext4_{exit,init}_() This is a cleanup to avoid namespace leaks out of fs/ext4 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:30:14 -04:00
Theodore Ts'o	7f93cff90f	ext4: fix kernel oops if the journal superblock has a non-zero j_errno Commit `84061e0` fixed an accounting bug only to introduce the possibility of a kernel OOPS if the journal has a non-zero j_errno field indicating that the file system had detected a fs inconsistency. After the journal replay, if the journal superblock indicates that the file system has an error, this indication is transfered to the file system and then ext4_commit_super() is called to write this to the disk. But since the percpu counters are now initialized after the journal replay, the call to ext4_commit_super() will cause a kernel oops since it needs to use the percpu counters the ext4 superblock structure. The fix is to skip setting the ext4 free block and free inode fields if the percpu counter has not been set. Thanks to Ken Sumrall for reporting and analyzing the root causes of this bug. Addresses-Google-Bug: #3054080 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:30:13 -04:00
Lukas Czerner	27ee40df2b	ext4: add batched_discard into ext4 feature list Should be applied on the top of "lazy inode table initialization" and "batched discard support" patch-sets. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:30:12 -04:00
Lukas Czerner	7360d1731e	ext4: Add batched discard support for ext4 Walk through allocation groups and trim all free extents. It can be invoked through FITRIM ioctl on the file system. The main idea is to provide a way to trim the whole file system if needed, since some SSD's may suffer from performance loss after the whole device was filled (it does not mean that fs is full!). It search for free extents in allocation groups specified by Byte range start -> start+len. When the free extent is within this range, blocks are marked as used and then trimmed. Afterwards these blocks are marked as free in per-group bitmap. Since fstrim is a long operation it is good to have an ability to interrupt it by a signal. This was added by Dmitry Monakhov. Thanks Dimitry. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:30:12 -04:00
Theodore Ts'o	bd2d0210cf	ext4: use bio layer instead of buffer layer in mpage_da_submit_io Call the block I/O layer directly instad of going through the buffer layer. This should give us much better performance and scalability, as well as lowering our CPU utilization when doing buffered writeback. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:30:10 -04:00
Maciej Żenczykowski	c41303ced6	ext4: don't update sb journal_devnum when RO dev An ext4 filesystem on a read-only device, with an external journal which is at a different device number then recorded in the superblock will fail to honor the read-only setting of the device and trigger a superblock update (write). For example: - ext4 on a software raid which is in read-only mode - external journal on a read-write device which has changed device num - attempt to mount with -o journal_dev=<new_number> - hits BUG_ON(mddev->ro = 1) in md.c Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Maciej Żenczykowski <zenczykowski@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:30:06 -04:00
Lukas Czerner	857ac889cc	ext4: add interface to advertise ext4 features in sysfs User-space should have the opportunity to check what features doest ext4 support in each particular copy. This adds easy interface by creating new "features" directory in sys/fs/ext4/. In that directory files advertising feature names can be created. Add lazy_itable_init to the feature list. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:30:05 -04:00
Lukas Czerner	bfff68738f	ext4: add support for lazy inode table initialization When the lazy_itable_init extended option is passed to mke2fs, it considerably speeds up filesystem creation because inode tables are not zeroed out. The fact that parts of the inode table are uninitialized is not a problem so long as the block group descriptors, which contain information regarding how much of the inode table has been initialized, has not been corrupted However, if the block group checksums are not valid, e2fsck must scan the entire inode table, and the the old, uninitialized data could potentially cause e2fsck to report false problems. Hence, it is important for the inode tables to be initialized as soon as possble. This commit adds this feature so that mke2fs can safely use the lazy inode table initialization feature to speed up formatting file systems. This is done via a new new kernel thread called ext4lazyinit, which is created on demand and destroyed, when it is no longer needed. There is only one thread for all ext4 filesystems in the system. When the first filesystem with inititable mount option is mounted, ext4lazyinit thread is created, then the filesystem can register its request in the request list. This thread then walks through the list of requests picking up scheduled requests and invoking ext4_init_inode_table(). Next schedule time for the request is computed by multiplying the time it took to zero out last inode table with wait multiplier, which can be set with the (init_itable=n) mount option (default is 10). We are doing this so we do not take the whole I/O bandwidth. When the thread is no longer necessary (request list is empty) it frees the appropriate structures and exits (and can be created later later by another filesystem). We do not disturb regular inode allocations in any way, it just do not care whether the inode table is, or is not zeroed. But when zeroing, we have to skip used inodes, obviously. Also we should prevent new inode allocations from the group, while zeroing is on the way. For that we take write alloc_sem lock in ext4_init_inode_table() and read alloc_sem in the ext4_claim_inode, so when we are unlucky and allocator hits the group which is currently being zeroed, it just has to wait. This can be suppresed using the mount option no_init_itable. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:30:05 -04:00
Sergey Senozhatsky	a1c6c5698d	ext4: fix NULL pointer dereference in print_daily_error_info() Fix NULL pointer dereference in print_daily_error_info, when called on unmounted fs (EXT4_SB(sb) returns NULL), by removing error reporting timer in ext4_put_super. Google-Bug-Id: 3017663 Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:30:04 -04:00
Linus Torvalds	79f14b7c56	Merge branch 'vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl * 'vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: (30 commits) BKL: remove BKL from freevxfs BKL: remove BKL from qnx4 autofs4: Only declare function when CONFIG_COMPAT is defined autofs: Only declare function when CONFIG_COMPAT is defined ncpfs: Lock socket in ncpfs while setting its callbacks fs/locks.c: prepare for BKL removal BKL: Remove BKL from ncpfs BKL: Remove BKL from OCFS2 BKL: Remove BKL from squashfs BKL: Remove BKL from jffs2 BKL: Remove BKL from ecryptfs BKL: Remove BKL from afs BKL: Remove BKL from USB gadgetfs BKL: Remove BKL from autofs4 BKL: Remove BKL from isofs BKL: Remove BKL from fat BKL: Remove BKL from ext2 filesystem BKL: Remove BKL from do_new_mount() BKL: Remove BKL from cgroup BKL: Remove BKL from NTFS ...	2010-10-22 10:52:01 -07:00
Jan Blunck	f2143c4e2e	BKL: Remove BKL from ext4 filesystem The BKL is still used in ext4_put_super(), ext4_fill_super() and ext4_remount(). All three calles are protected against concurrent calls by the s_umount rw semaphore of struct super_block. Therefore the BKL is protecting nothing in this case. Signed-off-by: Jan Blunck <jblunck@infradead.org> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-04 21:10:38 +02:00
Jan Blunck	db71922217	BKL: Explicitly add BKL around get_sb/fill_super This patch is a preparation necessary to remove the BKL from do_new_mount(). It explicitly adds calls to lock_kernel()/unlock_kernel() around get_sb/fill_super operations for filesystems that still uses the BKL. I've read through all the code formerly covered by the BKL inside do_kern_mount() and have satisfied myself that it doesn't need the BKL any more. do_kern_mount() is already called without the BKL when mounting the rootfs and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called from various places without BKL: simple_pin_fs(), nfs_do_clone_mount() through nfs_follow_mountpoint(), afs_mntpt_do_automount() through afs_mntpt_follow_link(). Both later functions are actually the filesystems follow_link inode operation. vfs_kern_mount() is calling the specified get_sb function and lets the filesystem do its job by calling the given fill_super function. Therefore I think it is safe to push down the BKL from the VFS to the low-level filesystems get_sb/fill_super operation. [arnd: do not add the BKL to those file systems that already don't use it elsewhere] Signed-off-by: Jan Blunck <jblunck@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Christoph Hellwig <hch@infradead.org>	2010-10-04 21:10:10 +02:00
Patrick J. LoPresti	30ca22c70e	ext3/ext4: Factor out disk addressability check As part of adding support for OCFS2 to mount huge volumes, we need to check that the sector_t and page cache of the system are capable of addressing the entire volume. An identical check already appears in ext3 and ext4. This patch moves the addressability check into its own function in fs/libfs.c and modifies ext3 and ext4 to invoke it. [Edited to -EINVAL instead of BUG_ON() for bad blocksize_bits -- Joel] Signed-off-by: Patrick LoPresti <lopresti@gmail.com> Cc: linux-ext4@vger.kernel.org Acked-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-10 08:41:42 -07:00
Linus Torvalds	5f248c9c25	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits) no need for list_for_each_entry_safe()/resetting with superblock list Fix sget() race with failing mount vfs: don't hold s_umount over close_bdev_exclusive() call sysv: do not mark superblock dirty on remount sysv: do not mark superblock dirty on mount btrfs: remove junk sb_dirt change BFS: clean up the superblock usage AFFS: wait for sb synchronization when needed AFFS: clean up dirty flag usage cifs: truncate fallout mbcache: fix shrinker function return value mbcache: Remove unused features add f_flags to struct statfs(64) pass a struct path to vfs_statfs update VFS documentation for method changes. All filesystems that need invalidate_inode_buffers() are doing that explicitly convert remaining ->clear_inode() to ->evict_inode() Make ->drop_inode() just return whether inode needs to be dropped fs/inode.c:clear_inode() is gone fs/inode.c:evict() doesn't care about delete vs. non-delete paths now ... Fix up trivial conflicts in fs/nilfs2/super.c	2010-08-10 11:26:52 -07:00
Al Viro	0930fcc1ee	convert ext4 to ->evict_inode() pretty much brute-force... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:30 -04:00
Linus Torvalds	09dc942c2a	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits) ext4: Adding error check after calling ext4_mb_regular_allocator() ext4: Fix dirtying of journalled buffers in data=journal mode ext4: re-inline ext4_rec_len_(to\|from)_disk functions jbd2: Remove t_handle_lock from start_this_handle() jbd2: Change j_state_lock to be a rwlock_t jbd2: Use atomic variables to avoid taking t_handle_lock in jbd2_journal_stop ext4: Add mount options in superblock ext4: force block allocation on quota_off ext4: fix freeze deadlock under IO ext4: drop inode from orphan list if ext4_delete_inode() fails ext4: check to make make sure bd_dev is set before dereferencing it jbd2: Make barrier messages less scary ext4: don't print scary messages for allocation failures post-abort ext4: fix EFBIG edge case when writing to large non-extent file ext4: fix ext4_get_blocks references ext4: Always journal quota file modifications ext4: Fix potential memory leak in ext4_fill_super ext4: Don't error out the fs if the user tries to make a file too big ext4: allocate stripe-multiple IOs on stripe boundaries ext4: move aio completion after unwritten extent conversion ... Fix up conflicts in fs/ext4/inode.c as per Ted. Fix up xfs conflicts as per earlier xfs merge.	2010-08-07 13:03:53 -07:00
Theodore Ts'o	a931da6ac9	jbd2: Change j_state_lock to be a rwlock_t Lockstat reports have shown that j_state_lock is a major source of lock contention, especially on systems with more than 4 CPU cores. So change it to be a read/write spinlock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-08-03 21:35:12 -04:00
Theodore Ts'o	8b67f04ab9	ext4: Add mount options in superblock Allow mount options to be stored in the superblock. Also add default mount option bits for nobarrier, block_validity, discard, and nodelalloc. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-08-01 23:14:20 -04:00
Dmitry Monakhov	ca0e05e4b1	ext4: force block allocation on quota_off Perform full sync procedure so that any delayed allocation blocks are allocated so quota will be consistent. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-08-01 17:48:36 -04:00
Eric Sandeen	437f88cc03	ext4: fix freeze deadlock under IO Commit `6b0310fbf0` caused a regression resulting in deadlocks when freezing a filesystem which had active IO; the vfs_check_frozen level (SB_FREEZE_WRITE) did not let the freeze-related IO syncing through. Duh. Changing the test to FREEZE_TRANS should let the normal freeze syncing get through the fs, but still block any transactions from starting once the fs is completely frozen. I tested this by running fsstress in the background while periodically snapshotting the fs and running fsck on the result. I ran into occasional deadlocks, but different ones. I think this is a fine fix for the problem at hand, and the other deadlocky things will need more investigation. Reported-by: Phillip Susi <psusi@cfl.rr.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-08-01 17:33:29 -04:00
Theodore Ts'o	f613dfcb33	ext4: check to make make sure bd_dev is set before dereferencing it There are some drivers which may not set bdev->bd_dev. So make sure it is non-NULL before dereferencing it. Google-Bug-Id: 1773557 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-07-27 11:56:08 -04:00
Jan Kara	62d2b5f2dc	ext4: Always journal quota file modifications When journaled quota options are not specified, we do writes to quota files just in data=ordered mode. This actually causes warnings from JBD2 about dirty journaled buffer because ext4_getblk unconditionally treats a block allocated by it as metadata. Since quota actually is filesystem metadata, the easiest way to get rid of the warning is to always treat quota writes as metadata... Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-07-27 11:56:07 -04:00
Cyrill Gorcunov	dcc7dae3cb	ext4: Fix potential memory leak in ext4_fill_super Under heavy memory pressure we may hit out of memory situation and as result kstrdup'ed options will not be freed. Fix it. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-07-27 11:56:07 -04:00
Theodore Ts'o	66e61a9e95	ext4: Once a day, printk file system error information to dmesg This allows us to grab any file system error messages by scraping /var/log/messages. This will make it easy for us to do error analysis across the very large number of machines as we deploy ext4 across the fleet. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-07-27 11:56:04 -04:00
Theodore Ts'o	1c13d5c087	ext4: Save error information to the superblock for analysis Save number of file system errors, and the time function name, line number, block number, and inode number of the first and most recent errors reported on the file system in the superblock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-07-27 11:56:03 -04:00
Theodore Ts'o	c398eda0e4	ext4: Pass line numbers to ext4_error() and friends Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-07-27 11:56:40 -04:00
Theodore Ts'o	90c7201b97	ext4: Pass line number to ext4_journal_abort_handle() This allows the error messages to include the line number Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-06-29 14:53:24 -04:00
Theodore Ts'o	e29136f80e	ext4: Enhance ext4_grp_locked_error() to take block and function numbers Also use a macro definition so that __func__ and __LINE__ is implicit. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-06-29 12:54:28 -04:00
Theodore Ts'o	c67d859e39	ext4: clean up ext4_abort() so __func__ is now implicit Use a macro definition for ext4_abort() to clean up the .c files a wee bit. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-06-29 11:07:07 -04:00
Jiri Kosina	f1bbbb6912	Merge branch 'master' into for-next	2010-06-16 18:08:13 +02:00
Uwe Kleine-König	421f91d21a	fix typos concerning "initiali[zs]e" Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-06-16 18:05:05 +02:00
Christoph Hellwig	206f7ab4f4	ext4: remove vestiges of nobh support The nobh option was only supported for writeback mode, but given that all write paths actually create buffer heads it effectively was a no-op already. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-06-14 14:42:49 -04:00
Linus Torvalds	d28619f156	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: quota: Convert quota statistics to generic percpu_counter ext3 uses rb_node = NULL; to zero rb_root. quota: Fixup dquot_transfer reiserfs: Fix resuming of quotas on remount read-write pohmelfs: Remove dead quota code ufs: Remove dead quota code udf: Remove dead quota code quota: rename default quotactl methods to dquot_ quota: explicitly set ->dq_op and ->s_qcop quota: drop remount argument to ->quota_on and ->quota_off quota: move unmount handling into the filesystem quota: kill the vfs_dq_off and vfs_dq_quota_on_remount wrappers quota: move remount handling into the filesystem ocfs2: Fix use after free on remount read-only Fix up conflicts in fs/ext4/super.c and fs/ufs/file.c	2010-05-30 09:11:11 -07:00
Christoph Hellwig	287a80958c	quota: rename default quotactl methods to dquot_ Follow the dquot_* style used elsewhere in dquot.c. [Jan Kara: Fixed up missing conversion of ext2] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-05-24 14:10:17 +02:00
Christoph Hellwig	307ae18a56	quota: drop remount argument to ->quota_on and ->quota_off Remount handling has fully moved into the filesystem, so all this is superflous now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-05-24 14:09:12 +02:00
Christoph Hellwig	e0ccfd959c	quota: move unmount handling into the filesystem Currently the VFS calls into the quotactl interface for unmounting filesystems. This means filesystems with their own quota handling can't easily distinguish between user-space originating quotaoff and an unount. Instead move the responsibily of the unmount handling into the filesystem to be consistent with all other dquot handling. Note that we do call dquot_disable a lot later now, e.g. after a sync_filesystem. But this is fine as the quota code does all its writes via blockdev's mapping and that is synced even later. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-05-24 14:09:12 +02:00
Christoph Hellwig	0f0dd62fdd	quota: kill the vfs_dq_off and vfs_dq_quota_on_remount wrappers Instead of having wrappers in the VFS namespace export the dquot_suspend and dquot_resume helpers directly. Also rename vfs_quota_disable to dquot_disable while we're at it. [Jan Kara: Moved dquot_suspend to quotaops.h and made it inline] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-05-24 14:06:40 +02:00
Christoph Hellwig	c79d967de3	quota: move remount handling into the filesystem Currently do_remount_sb calls into the dquot code to tell it about going from rw to ro and ro to rw. Move this code into the filesystem to not depend on the dquot code in the VFS - note ocfs2 already ignores these calls and handles remount by itself. This gets rid of overloading the quotactl calls and allows to unify the VFS and XFS codepaths in that area later. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-05-24 14:06:39 +02:00
Theodore Ts'o	60e6679e28	ext4: Drop whitespace at end of lines This patch was generated using: #!/usr/bin/perl -i while (<>) { s/[ ]+$//; print; } Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-05-17 07:00:00 -04:00
Dmitry Monakhov	12e9b89200	ext4: Use bitops to read/modify i_flags in struct ext4_inode_info At several places we modify EXT4_I(inode)->i_flags without holding i_mutex (ext4_do_update_inode, ...). These modifications are racy and we can lose updates to i_flags. So convert handling of i_flags to use bitops which are atomic. https://bugzilla.kernel.org/show_bug.cgi?id=15792 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-05-16 22:00:00 -04:00
Jan Kara	39a4bade8c	ext4: Show journal_checksum option We failed to show journal_checksum option in /proc/mounts. Fix it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-05-16 17:00:00 -04:00
Curt Wohlgemuth	fbe845ddf3	ext4: Remove extraneous newlines in ext4_msg() calls Addresses-Google-Bug: #2562325 Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-05-16 13:00:00 -04:00
Curt Wohlgemuth	d4c402d9fd	ext4: Print mount options in when mounting and add a remount message This adds a "re-mounted" message to ext4_remount(), and both it and the mount message in ext4_fill_super() now have the original mount options data string. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-05-16 12:00:00 -04:00
Dmitry Monakhov	84061e07c5	ext4: init statistics after journal recovery Currently block/inode/dir counters initialized before journal was recovered. In fact after journal recovery this info will probably change. And freeblocks it critical for correct delalloc mode accounting. https://bugzilla.kernel.org/show_bug.cgi?id=15768 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-05-16 08:00:00 -04:00
Eric Sandeen	6b0310fbf0	ext4: don't return to userspace after freezing the fs with a mutex held ext4_freeze() used jbd2_journal_lock_updates() which takes the j_barrier mutex, and then returns to userspace. The kernel does not like this: ================================================ [ BUG: lock held when returning to user space! ] ------------------------------------------------ lvcreate/1075 is leaving the kernel with locks still held! 1 lock held by lvcreate/1075: #0: (&journal->j_barrier){+.+...}, at: [<ffffffff811c6214>] jbd2_journal_lock_updates+0xe1/0xf0 Use vfs_check_frozen() added to ext4_journal_start_sb() and ext4_force_commit() instead. Addresses-Red-Hat-Bugzilla: #568503 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-05-16 02:00:00 -04:00
Linus Torvalds	39f1cd635c	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Fixed inode allocator to correctly track a flex_bg's used_dirs ext4: Don't use delayed allocation by default when used instead of ext3 ext4: Fix spelling of CONTIG_FS_EXT3 to CONFIG_FS_EXT3 ext4: Fix estimate of # of blocks needed to write indirect-mapped files	2010-03-25 14:10:53 -07:00
Jan Kara	ba69f9ab7d	ext4: Don't use delayed allocation by default when used instead of ext3 When ext4 driver is used to mount a filesystem instead of the ext3 file system driver (through CONFIG_EXT4_USE_FOR_EXT23), do not enable delayed allocation by default since some ext3 users and application writers have developed unfortunate expectations about the safety of writing files on systems subject to sudden and violent death without using fsync(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-03-24 20:18:37 -04:00
Theodore Ts'o	37f328eb60	ext4: Fix spelling of CONTIG_FS_EXT3 to CONFIG_FS_EXT3 Oops. (Blush.) Thanks to Sedat Dilek for pointing this out. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-03-24 20:06:41 -04:00
Linus Torvalds	c32da02342	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (56 commits) doc: fix typo in comment explaining rb_tree usage Remove fs/ntfs/ChangeLog doc: fix console doc typo doc: cpuset: Update the cpuset flag file Fix of spelling in arch/sparc/kernel/leon_kernel.c no longer needed Remove drivers/parport/ChangeLog Remove drivers/char/ChangeLog doc: typo - Table 1-2 should refer to "status", not "statm" tree-wide: fix typos "ass?o[sc]iac?te" -> "associate" in comments No need to patch AMD-provided drivers/gpu/drm/radeon/atombios.h devres/irq: Fix devm_irq_match comment Remove reference to kthread_create_on_cpu tree-wide: Assorted spelling fixes tree-wide: fix 'lenght' typo in comments and code drm/kms: fix spelling in error message doc: capitalization and other minor fixes in pnp doc devres: typo fix s/dev/devm/ Remove redundant trailing semicolons from macros fix typo "definetly" -> "definitely" in comment tree-wide: s/widht/width/g typo in comments ... Fix trivial conflict in Documentation/laptops/00-INDEX	2010-03-12 16:04:50 -08:00
Jiri Kosina	318ae2edc3	Merge branch 'for-next' into for-linus Conflicts: Documentation/filesystems/proc.txt arch/arm/mach-u300/include/mach/debug-macro.S drivers/net/qlge/qlge_ethtool.c drivers/net/qlge/qlge_main.c drivers/net/typhoon.c	2010-03-08 16:55:37 +01:00
Emese Revfy	52cf25d0ab	Driver core: Constify struct sysfs_ops in struct kobj_type Constify struct sysfs_ops. This is part of the ops structure constification effort started by Arjan van de Ven et al. Benefits of this constification: * prevents modification of data that is shared (referenced) by many other structure instances at runtime * detects/prevents accidental (but not intentional) modification attempts on archs that enforce read-only kernel data at runtime * potentially better optimized code as the compiler can assume that the const data cannot be changed * the compiler/linker move const data into .rodata and therefore exclude them from false sharing Signed-off-by: Emese Revfy <re.emese@gmail.com> Acked-by: David Teigland <teigland@redhat.com> Acked-by: Matt Domsch <Matt_Domsch@dell.com> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Acked-by: Hans J. Koch <hjk@linutronix.de> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-03-07 17:04:49 -08:00
Linus Torvalds	e213e26ab3	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits) quota: stop using QUOTA_OK / NO_QUOTA dquot: cleanup dquot initialize routine dquot: move dquot initialization responsibility into the filesystem dquot: cleanup dquot drop routine dquot: move dquot drop responsibility into the filesystem dquot: cleanup dquot transfer routine dquot: move dquot transfer responsibility into the filesystem dquot: cleanup inode allocation / freeing routines dquot: cleanup space allocation / freeing routines ext3: add writepage sanity checks ext3: Truncate allocated blocks if direct IO write fails to update i_size quota: Properly invalidate caches even for filesystems with blocksize < pagesize quota: generalize quota transfer interface quota: sb_quota state flags cleanup jbd: Delay discarding buffers in journal_unmap_buffer ext3: quota_write cross block boundary behaviour quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota quota: split out compat_sys_quotactl support from quota.c quota: split out netlink notification support from quota.c quota: remove invalid optimization from quota_sync_all ... Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c	2010-03-05 13:20:53 -08:00
Christoph Hellwig	871a293155	dquot: cleanup dquot initialize routine Get rid of the initialize dquot operation - it is now always called from the filesystem and if a filesystem really needs it's own (which none currently does) it can just call into it's own routine directly. Rename the now static low-level dquot_initialize helper to __dquot_initialize and vfs_dq_init to dquot_initialize to have a consistent namespace. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:30 +01:00
Christoph Hellwig	9f75475802	dquot: cleanup dquot drop routine Get rid of the drop dquot operation - it is now always called from the filesystem and if a filesystem really needs it's own (which none currently does) it can just call into it's own routine directly. Rename the now static low-level dquot_drop helper to __dquot_drop and vfs_dq_drop to dquot_drop to have a consistent namespace. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:30 +01:00
Christoph Hellwig	257ba15ced	dquot: move dquot drop responsibility into the filesystem Currently clear_inode calls vfs_dq_drop directly. This means we tie the quota code into the VFS. Get rid of that and make the filesystem responsible for the drop inside the ->clear_inode superblock operation. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:29 +01:00
Christoph Hellwig	b43fa8284d	dquot: cleanup dquot transfer routine Get rid of the transfer dquot operation - it is now always called from the filesystem and if a filesystem really needs it's own (which none currently does) it can just call into it's own routine directly. Rename the now static low-level dquot_transfer helper to __dquot_transfer and vfs_dq_transfer to dquot_transfer to have a consistent namespace, and make the new dquot_transfer return a normal negative errno value which all callers expect. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:29 +01:00
Christoph Hellwig	63936ddaa1	dquot: cleanup inode allocation / freeing routines Get rid of the alloc_inode and free_inode dquot operations - they are always called from the filesystem and if a filesystem really needs their own (which none currently does) it can just call into it's own routine directly. Also get rid of the vfs_dq_alloc/vfs_dq_free wrappers and always call the lowlevel dquot_alloc_inode / dqout_free_inode routines directly, which now lose the number argument which is always 1. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:28 +01:00
Christoph Hellwig	5dd4056db8	dquot: cleanup space allocation / freeing routines Get rid of the alloc_space, free_space, reserve_space, claim_space and release_rsv dquot operations - they are always called from the filesystem and if a filesystem really needs their own (which none currently does) it can just call into it's own routine directly. Move shared logic into the common __dquot_alloc_space, dquot_claim_space_nodirty and __dquot_free_space low-level methods, and rationalize the wrappers around it to move as much as possible code into the common block for CONFIG_QUOTA vs not. Also rename all these helpers to be named dquot_* instead of vfs_dq_*. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:28 +01:00
Dmitry Monakhov	67eeb5685d	ext4: Fix ext4_quota_write cross block boundary behaviour We always assume what dquot update result in changes in one data block But ext4_quota_write() function may handle cross block boundary writes In fact if this ever happen it will result in incorrect journal credits reservation, and later a BUG_ON. As soon this never happen the boundary cross loop is NOOP. In order to make things straight let's remove this loop and assert cross boundary condition. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-03-02 08:08:51 -05:00
Frank Mayhar	273df556b6	ext4: Convert BUG_ON checks to use ext4_error() instead Convert a bunch of BUG_ONs to emit a ext4_error() message and return EIO. This is a first pass and most notably does _not_ cover mballoc.c, which is a morass of void functions. Signed-off-by: Frank Mayhar <fmayhar@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-03-02 11:46:09 -05:00
Jiaying Zhang	744692dc05	ext4: use ext4_get_block_write in buffer write Allocate uninitialized extent before ext4 buffer write and convert the extent to initialized after io completes. The purpose is to make sure an extent can only be marked initialized after it has been written with new data so we can safely drop the i_mutex lock in ext4 DIO read without exposing stale data. This helps to improve multi-thread DIO read performance on high-speed disks. Skip the nobh and data=journal mount cases to make things simple for now. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-03-04 16:14:02 -05:00
Jiaying Zhang	c7064ef13b	ext4: mechanical rename some of the direct I/O get_block's identifiers This commit renames some of the direct I/O's block allocation flags, variables, and functions introduced in Mingming's "Direct IO for holes and fallocate" patches so that they can be used by ext4's buffered write path as well. Also changed the related function comments accordingly to cover both direct write and buffered write cases. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-03-02 13:28:44 -05:00
Dmitry Monakhov	437ca0fda3	ext4: deprecate obsoleted mount options Declare following list of mount options as deprecated: - bsddf, miniddf - grpid, bsdgroups, nogrpid, sysvgroups Declare following list of default mount options as deprecated: - bsdgroups Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-03-01 22:29:21 -05:00
Dmitry Monakhov	56c50f11f4	ext4: trivial quota cleanup The patch is aimed to reorganize and simplify quota code a bit. Quota code is itself complex enough, but we can make it more readable in some places: - Move quota option parsing to separate functions. - Simplify old-quota and journaled-quota mix check. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-03-01 23:28:41 -05:00
Dmitry Monakhov	482a74258f	ext4: mount flags manipulation cleanup Replace intermediate EXT4_MOUNT_XXX flags manipulation to corresponding macro. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-02-24 11:35:32 -05:00
Eric Sandeen	12062dddda	ext4: move __func__ into a macro for ext4_warning, ext4_error Just a pet peeve of mine; we had a mishash of calls with either __func__ or "function_name" and the latter tends to get out of sync. I think it's easier to just hide the __func__ in a macro, and it'll be consistent from then on. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-02-15 14:19:27 -05:00
Thadeu Lima de Souza Cascardo	d6b198bc8a	fix ext3/ext4 comment typo compain -> complain Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-02-05 12:22:36 +01:00
Theodore Ts'o	9d0be50230	ext4: Calculate metadata requirements more accurately In the past, ext4_calc_metadata_amount(), and its sub-functions ext4_ext_calc_metadata_amount() and ext4_indirect_calc_metadata_amount() badly over-estimated the number of metadata blocks that might be required for delayed allocation blocks. This didn't matter as much when functions which managed the reserved metadata blocks were more aggressive about dropping reserved metadata blocks as delayed allocation blocks were written, but unfortunately they were too aggressive. This was fixed in commit `0637c6f`, but as a result the over-estimation by ext4_calc_metadata_amount() would lead to reserving 2-3 times the number of pending delayed allocation blocks as potentially required metadata blocks. So if there are 1 megabytes of blocks which have been not yet been allocation, up to 3 megabytes of space would get reserved out of the user's quota and from the file system free space pool until all of the inode's data blocks have been allocated. This commit addresses this problem by much more accurately estimating the number of metadata blocks that will be required. It will still somewhat over-estimate the number of blocks needed, since it must make a worst case estimate not knowing which physical blocks will be needed, but it is much more accurate than before. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-01-01 02:41:30 -05:00
Andrew Morton	a6b43e3825	ext4: fix unsigned long long printk warning in super.c sparc64 allmodconfig: fs/ext4/super.c: In function `lifetime_write_kbytes_show': fs/ext4/super.c:2174: warning: long long unsigned int format, long unsigned int arg (arg 4) fs/ext4/super.c:2174: warning: long long unsigned int format, long unsigned int arg (arg 4) Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-23 07:48:08 -05:00
Theodore Ts'o	51b7e3c9fb	ext4: add module aliases for ext2 and ext3 Add module aliases for ext2 and ext3 when CONFIG_EXT4_USE_FOR_EXT23 is set. This makes the existing user-space stuff like mkinitrd working as is. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-21 10:56:09 -05:00
Dmitry Monakhov	a9e7f44720	ext4: Convert to generic reserved quota's space management. This patch also fixes write vs chown race condition. Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-23 13:33:55 +01:00
André Goddard Rosa	e7d2860b69	tree-wide: convert open calls to remove spaces to skip_spaces() lib function Makes use of skip_spaces() defined in lib/string.c for removing leading spaces from strings all over the tree. It decreases lib.a code size by 47 bytes and reuses the function tree-wide: text data bss dec hex filename 64688 584 592 65864 10148 (TOTALS-BEFORE) 64641 584 592 65817 10119 (TOTALS-AFTER) Also, while at it, if we see (str && isspace(str)), we can be sure to remove the first condition (str) as the second one (isspace(str)) also evaluates to 0 whenever str == 0, making it redundant. In other words, "a char equals zero is never a space". Julia Lawall tried the semantic patch (http://coccinelle.lip6.fr) below, and found occurrences of this pattern on 3 more files: drivers/leds/led-class.c drivers/leds/ledtrig-timer.c drivers/video/output.c @@ expression str; @@ ( // ignore skip_spaces cases while (str && isspace(str)) { $str++;\\|++str;$ } \| - str && isspace(*str) ) Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Cc: Julia Lawall <julia@diku.dk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Neil Brown <neilb@suse.de> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: David Howells <dhowells@redhat.com> Cc: <linux-ext4@vger.kernel.org> Cc: Samuel Ortiz <samuel@sortiz.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:32 -08:00
Linus Torvalds	3126c136bc	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (21 commits) ext3: PTR_ERR return of wrong pointer in setup_new_group_blocks() ext3: Fix data / filesystem corruption when write fails to copy data ext4: Support for 64-bit quota format ext3: Support for vfsv1 quota format quota: Implement quota format with 64-bit space and inode limits quota: Move definition of QFMT_OCFS2 to linux/quota.h ext2: fix comment in ext2_find_entry about return values ext3: Unify log messages in ext3 ext2: clear uptodate flag on super block I/O error ext2: Unify log messages in ext2 ext3: make "norecovery" an alias for "noload" ext3: Don't update the superblock in ext3_statfs() ext3: journal all modifications in ext3_xattr_set_handle ext2: Explicitly assign values to on-disk enum of filetypes quota: Fix WARN_ON in lookup_one_len const: struct quota_format_ops ubifs: remove manual O_SYNC handling afs: remove manual O_SYNC handling kill wait_on_page_writeback_range vfs: Implement proper O_SYNC semantics ...	2009-12-11 15:31:13 -08:00
Jan Kara	5a20bdfcdc	ext4: Support for 64-bit quota format Add support for new 64-bit quota format. It is enough to add proper mount options handling. The rest is done by the generic code. Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:54 +01:00
Theodore Ts'o	a214238d3b	ext4: Do not override ext2 or ext3 if built they are built as modules The CONFIG_EXT4_USE_FOR_EXT23 option must not try to take over the ext2 or ext3 file systems if the those file system drivers are configured to be built as mdoules. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-09 21:09:58 -05:00
Eric Sandeen	15121c18a2	ext4: Fix optional-arg mount options We have 2 mount options, "barrier" and "auto_da_alloc" which may or may not take a 1/0 argument. This causes the ext4 superblock mount code to subtract uninitialized pointers and pass the result to kmalloc, which results in very noisy failures. Per Ted's suggestion, initialize the args struct so that we know whether match_token() found an argument for the option, and skip match_int() if not. Also, return error (0) from parse_options if we thought we found an argument, but match_int() Fails. Reported-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-02-15 20:17:55 -05:00
Jan Kara	b436b9bef8	ext4: Wait for proper transaction commit on fsync We cannot rely on buffer dirty bits during fsync because pdflush can come before fsync is called and clear dirty bits without forcing a transaction commit. What we do is that we track which transaction has last changed the inode and which transaction last changed allocation and force it to disk on fsync. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-08 23:51:10 -05:00
Josef Bacik	d4edac314e	ext4: wait for log to commit when umounting There is a potential race when a transaction is committing right when the file system is being umounting. This could reduce in a race because EXT4_SB(sb)->s_group_info could be freed in ext4_put_super before the commit code calls a callback so the mballoc code can release freed blocks in the transaction, resulting in a panic trying to access the freed s_group_info. The fix is to wait for the transaction to finish committing before we shutdown the multiblock allocator. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-08 21:48:58 -05:00
Theodore Ts'o	24b584240a	ext4: Use ext4 file system driver for ext2/ext3 file system mounts Add a new config option, CONFIG_EXT4_USE_FOR_EXT23 which if enabled, will cause ext4 to be used for either ext2 or ext3 file system mounts when ext2 or ext3 is not enabled in the configuration. This allows minimalist kernel fanatics to drop to file system drivers from their compiled kernel with out losing functionality. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-07 14:08:51 -05:00
Eric Sandeen	e3bb52ae2b	ext4: make "norecovery" an alias for "noload" Users on the linux-ext4 list recently complained about differences across filesystems w.r.t. how to mount without a journal replay. In the discussion it was noted that xfs's "norecovery" option is perhaps more descriptively accurate than "noload," so let's make that an alias for ext4. Also show this status in /proc/mounts Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-11-19 14:28:50 -05:00
Eric Sandeen	5328e63531	ext4: make trim/discard optional (and off by default) It is anticipated that when sb_issue_discard starts doing real work on trim-capable devices, we may see issues. Make this mount-time optional, and default it to off until we know that things are working out OK. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-11-19 14:25:42 -05:00
Theodore Ts'o	3f8fb9490e	ext4: don't update the superblock in ext4_statfs() commit `a71ce8c6c9` updated ext4_statfs() to update the on-disk superblock counters, but modified this buffer directly without any journaling of the change. This is one of the accesses that was causing the crc errors in journal replay as seen in kernel.org bugzilla #14354. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2009-11-23 07:24:52 -05:00
Theodore Ts'o	cf40db137c	ext4: remove failed journal checksum check Now that we are checking for failed journal checksums in the jbd2 layer, we don't need to check in the ext4 mount path --- since a checksum fail will result in ext4_load_journal() returning an error, causing the file system to refuse to be mounted until e2fsck can deal with the problem. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-11-22 21:00:01 -05:00
Theodore Ts'o	503358ae01	ext4: avoid divide by zero when trying to mount a corrupted file system If s_log_groups_per_flex is greater than 31, then groups_per_flex will will overflow and cause a divide by zero error. This can cause kernel BUG if such a file system is mounted. Thanks to Nageswara R Sastry for analyzing the failure and providing an initial patch. http://bugzilla.kernel.org/show_bug.cgi?id=14287 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2009-11-23 07:24:46 -05:00
Linus Torvalds	d4da6c9ccf	Revert "ext4: Remove journal_checksum mount option and enable it by default" This reverts commit `d0646f7b63`, as requested by Eric Sandeen. It can basically cause an ext4 filesystem to miss recovery (and thus get mounted with errors) if the journal checksum does not match. Quoth Eric: "My hand-wavy hunch about what is happening is that we're finding a bad checksum on the last partially-written transaction, which is not surprising, but if we have a wrapped log and we're doing the initial scan for head/tail, and we abort scanning on that bad checksum, then we are essentially running an unrecovered filesystem. But that's hand-wavy and I need to go look at the code. We lived without journal checksums on by default until now, and at this point they're doing more harm than good, so we should revert the default-changing commit until we can fix it and do some good power-fail testing with the fixes in place." See http://bugzilla.kernel.org/show_bug.cgi?id=14354 for all the gory details. Requested-by: Eric Sandeen <sandeen@redhat.com> Cc: Theodore Tso <tytso@mit.edu> Cc: Alexey Fisher <bug-track@fisher-privat.net> Cc: Maxim Levitsky <maximlevitsky@gmail.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Mathias Burén <mathias.buren@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-11-02 10:15:27 -08:00
Eric Sandeen	f0e2dfa7f3	ext4: drop ext4dev compat Kconfig & super.c promised it'd be gone by 2.6.31, so it's about time to drop it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-10-01 02:21:07 -04:00
Theodore Ts'o	296c355cd6	ext4: Use tracepoints for mb_history trace file The /proc/fs/ext4/<dev>/mb_history was maintained manually, and had a number of problems: it required a largish amount of memory to be allocated for each ext4 filesystem, and the s_mb_history_lock introduced a CPU contention problem. By ripping out the mb_history code and replacing it with ftrace tracepoints, and we get more functionality: timestamps, event filtering, the ability to correlate mballoc history with other ext4 tracepoints, etc. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-30 00:32:42 -04:00
Theodore Ts'o	90576c0b9a	ext4, jbd2: Drop unneeded printks at mount and unmount time There are a number of kernel printk's which are printed when an ext4 filesystem is mounted and unmounted. Disable them to economize space in the system logs. In addition, disabling the mballoc stats by default saves a number of unneeded atomic operations for every block allocation or deallocation. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-29 15:51:30 -04:00
Curt Wohlgemuth	d3d1faf6a7	ext4: Handle nested ext4_journal_start/stop calls without a journal This patch fixes a problem with handling nested calls to ext4_journal_start/ext4_journal_stop, when there is no journal present. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-29 11:01:03 -04:00
Mingming Cao	8d5d02e6b1	ext4: async direct IO for holes and fallocate support For async direct IO that covers holes or fallocate, the end_io callback function now queued the convertion work on workqueue but don't flush the work rightaway as it might take too long to afford. But when fsync is called after all the data is completed, user expects the metadata also being updated before fsync returns. Thus we need to flush the conversion work when fsync() is called. This patch keep track of a listed of completed async direct io that has a work queued on workqueue. When fsync() is called, it will go through the list and do the conversion. Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2009-09-28 15:48:29 -04:00
Mingming Cao	4c0425ff68	ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O Currently the DIO VFS code passes create = 0 when writing to the middle of file. It does this to avoid block allocation for holes, so as not to expose stale data out when there is a parallel buffered read (which does not hold the i_mutex lock). Direct I/O writes into holes falls back to buffered IO for this reason. Since preallocated extents are treated as holes when doing a get_block() look up (buffer is not mapped), direct IO over fallocate also falls back to buffered IO. Thus ext4 actually silently falls back to buffered IO in above two cases, which is undesirable. To fix this, this patch creates unitialized extents when a direct I/O write into holes in sparse files, and registering an end_io callback which converts the uninitialized extent to an initialized extent after the I/O is completed. Singed-Off-By: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-28 15:48:41 -04:00
Theodore Ts'o	55138e0bc2	ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks Work around problems in the writeback code to force out writebacks in larger chunks than just 4mb, which is just too small. This also works around limitations in the ext4 block allocator, which can't allocate more than 2048 blocks at a time. So we need to defeat the round-robin characteristics of the writeback code and try to write out as many blocks in one inode before allowing the writeback code to move on to another inode. We add a a new per-filesystem tunable, max_writeback_mb_bump, which caps this to a default of 128mb per inode. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-29 13:31:31 -04:00
Alexey Dobriyan	0d54b217a2	const: make struct super_block::s_qcop const Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:24 -07:00
Alexey Dobriyan	61e225dc34	const: make struct super_block::dq_op const Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:24 -07:00
Eric Sandeen	fb0a387dcd	ext4: limit block allocations for indirect-block files to < 2^32 Today, the ext4 allocator will happily allocate blocks past 2^32 for indirect-block files, which results in the block numbers getting truncated, and corruption ensues. This patch limits such allocations to < 2^32, and adds BUG_ONs if we do get blocks larger than that. This should address RH Bug 519471, ext4 bitmap allocator must limit blocks to < 2^32 * ext4_find_goal() is modified to choose a goal < UINT_MAX, so that our starting point is in an acceptable range. * ext4_xattr_block_set() is modified such that the goal block is < UINT_MAX, as above. * ext4_mb_regular_allocator() is modified so that the group search does not continue into groups which are too high * ext4_mb_use_preallocated() has a check that we don't use preallocated space which is too far out * ext4_alloc_blocks() and ext4_xattr_block_set() add some BUG_ONs No attempt has been made to limit inode locations to < 2^32, so we may wind up with blocks far from their inodes. Doing this much already will lead to some odd ENOSPC issues when the "lower 32" gets full, and further restricting inodes could make that even weirder. For high inodes, choosing a goal of the original, % UINT_MAX, may be a bit odd, but then we're in an odd situation anyway, and I don't know of a better heuristic. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-16 14:45:10 -04:00
Theodore Ts'o	3661d28615	ext4: Fix include/trace/events/ext4.h to work with Systemtap Using relative pathnames in #include statements interacts badly with SystemTap, since the fs/ext4/*.h header files are not packaged up as part of a distribution kernel's header files. Since systemtap doesn't use TP_fast_assign(), we can use a blind structure definition and then make sure the needed header files are defined before the ext4 source files #include the trace/events/ext4.h header file. https://bugzilla.redhat.com/show_bug.cgi?id=512478 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-14 22:59:50 -04:00
Theodore Ts'o	7ad9bb651f	ext4: Fix initalization of s_flex_groups The s_flex_groups array should have been initialized using atomic_add to sum up the free counts from the block groups that make up a flex_bg. By using atomic_set, the value of the s_flex_groups array was set to the values of the last block group in the flex_bg. The impact of this bug is that the block and inode allocation algorithms might not pick the best flex_bg for new allocation. Thanks to Damien Guibouret for pointing out this problem! Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-11 16:51:28 -04:00
Theodore Ts'o	71290b368a	ext4: Don't update superblock write time when filesystem is read-only This avoids updating the superblock write time when we are mounting the root file system read/only but we need to replay the journal; at that point, for people who are east of GMT and who make their clock tick in localtime for Windows bug-for-bug compatibility, and this will cause e2fsck to complain and force a full file system check. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-10 17:31:04 -04:00
Theodore Ts'o	d0646f7b63	ext4: Remove journal_checksum mount option and enable it by default There's no real cost for the journal checksum feature, and we should make sure it is enabled all the time. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-05 12:50:43 -04:00
Eric Sandeen	a13fb1a453	ext4: Add feature set check helper for mount & remount paths A user reported that although his root ext4 filesystem was mounting fine, other filesystems would not mount, with the: "Filesystem with huge files cannot be mounted RDWR without CONFIG_LBDAF" error on his 32-bit box built without CONFIG_LBDAF. This is because the test at mount time for this situation was not being re-checked on remount, and the normal boot process makes an ro->rw transition, so this was being missed. Refactor to make a common helper function to test the filesystem features against the type of mount request (RO vs. RW) so that we stay consistent. Addresses Red-Hat-Bugzilla: #517650 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-08-18 00:20:23 -04:00
Eric Sandeen	bf43d84b18	ext4: reject too-large filesystems on 32-bit kernels ext4 will happily mount a > 16T filesystem on a 32-bit box, but this is not safe; writes to the block device will wrap past 16T and the page cache can't index past 16T (232 index * 4k pages). Adding another test to the existing "too many sectors" test should do the trick. Add a comment, a relevant return value, and fix the reference to the CONFIG_LBD(AF) option as well. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-08-17 23:48:51 -04:00
Theodore Ts'o	78f1ddbb49	ext4: Avoid null pointer dereference when decoding EROFS w/o a journal We need to check to make sure a journal is present before checking the journal flags in ext4_decode_error(). Signed-off-by: Eric Sesterhenn <eric.sesterhenn@lsexperts.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-27 23:09:47 -04:00
Al Viro	d4bfe2f76d	switch ext4 to inode->i_acl Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:17:04 -04:00
Linus Torvalds	31583d6acf	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: Fix kernel-doc parameter name typo in blk-settings.c: block: rename CONFIG_LBD to CONFIG_LBDAF block: Fix bounce_pfn setting hd: stop defining MAJOR_NR	2009-06-19 17:43:04 -07:00
Bartlomiej Zolnierkiewicz	90c699a9ee	block: rename CONFIG_LBD to CONFIG_LBDAF Follow-up to "block: enable by default support for large devices and files on 32-bit archs". Rename CONFIG_LBD to CONFIG_LBDAF to: - allow update of existing [def]configs for "default y" change - reflect that it is used also for large files support nowadays Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-19 08:08:50 +02:00
Andreas Dilger	11013911da	ext4: teach the inode allocator to use a goal inode number Enhance the inode allocator to take a goal inode number as a paremeter; if it is specified, it takes precedence over Orlov or parent directory inode allocation algorithms. The extents migration function uses the goal inode number so that the extent trees allocated the migration function use the correct flex_bg. In the future, the goal inode functionality will also be used to allocate an adjacent inode for the extended attributes. Also, for testing purposes the goal inode number can be specified via /sys/fs/{dev}/inode_goal. This can be useful for testing inode allocation beyond 2^32 blocks on very large filesystems. Signed-off-by: Andreas Dilger <adilger@sun.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-13 11:45:35 -04:00
Theodore Ts'o	4ab2f15b7f	ext4: move the abort flag from s_mount_opts to s_mount_flags We're running out of space in the mount options word, and EXT4_MOUNT_ABORT isn't really a mount option, but a run-time flag. So move it to become EXT4_MF_FS_ABORTED in s_mount_flags. Also remove bogus ext2_fs.h / ext4.h simultaneous #include protection, which can never happen. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-13 10:09:36 -04:00
Theodore Ts'o	7f4520cc62	ext4: change s_mount_opt to be an unsigned int We can only fit 32 options in s_mount_opt because an unsigned long is 32-bits on a x86 machine. So use an unsigned int to save space on 64-bit platforms. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-13 10:09:41 -04:00
Alessio Igor Bogani	337eb00a2c	Push BKL down into ->remount_fs() [xfs, btrfs, capifs, shmem don't need BKL, exempt] Signed-off-by: Alessio Igor Bogani <abogani@texware.it> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:11 -04:00
Christoph Hellwig	ebc1ac1645	->write_super lock_super pushdown Push down lock_super into ->write_super instances and remove it from the caller. Following filesystem don't need ->s_lock in ->write_super and are skipped: * bfs, nilfs2 - no other uses of s_lock and have internal locks in ->write_super * ext2 - uses BKL in ext2_write_super and has internal calls without s_lock * reiserfs - no other uses of s_lock as has reiserfs_write_lock (BKL) in ->write_super * xfs - no other uses of s_lock and uses internal lock (buffer lock on superblock buffer) to serialize ->write_super. Also xfs_fs_write_super is superflous and will go away in the next merge window Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:09 -04:00
Al Viro	bbd6851a32	Push lock_super() into the ->remount_fs() of filesystems that care about it Note that since we can't run into contention between remount_fs and write_super (due to exclusion on s_umount), we have to care only about filesystems that touch lock_super() on their own. Out of those ext3, ext4, hpfs, sysv and ufs do need it; fat doesn't since its ->remount_fs() only accesses assign-once data (basically, it's "we have no atime on directories and only have atime on files for vfat; force nodiratime and possibly noatime into *flags"). [folded a build fix from hch] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:08 -04:00
Christoph Hellwig	6cfd014842	push BKL down into ->put_super Move BKL into ->put_super from the only caller. A couple of filesystems had trivial enough ->put_super (only kfree and NULLing of s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs, hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most of them probably don't need it, but I'd rather sort that out individually. Preferably after all the other BKL pushdowns in that area. [AV: original used to move lock_super() down as well; these changes are removed since we don't do lock_super() at all in generic_shutdown_super() now] [AV: fuse, btrfs and xfs are known to need no damn BKL, exempt] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:07 -04:00
Al Viro	a9e220f832	No need to do lock_super() for exclusion in generic_shutdown_super() We can't run into contention on it. All other callers of lock_super() either hold s_umount (and we have it exclusive) or hold an active reference to superblock in question, which prevents the call of generic_shutdown_super() while the reference is held. So we can replace lock_super(s) with get_fs_excl() in generic_shutdown_super() (and corresponding change for unlock_super(), of course). Since ext4 expects s_lock held for its put_super, take lock_super() into it. The rest of filesystems do not care at all. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:07 -04:00
Christoph Hellwig	8c85e12512	remove ->write_super call in generic_shutdown_super We just did a full fs writeout using sync_filesystem before, and if that's not enough for the filesystem it can perform it's own writeout in ->put_super, which many filesystems already do. Move a call to foofs_write_super into every foofs_put_super for now to guarantee identical behaviour until it's cleaned up by the individual filesystem maintainers. Exceptions: - affs already has identical copy & pasted code at the beginning of affs_put_super so no need to do it twice. - xfs does the right thing without it and I have changes pending for the xfs tree touching this are so I don't really need conflicts here.. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:06 -04:00
Linus Torvalds	c9059598ea	Merge branch 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits) block: add request clone interface (v2) floppy: fix hibernation ramdisk: remove long-deprecated "ramdisk=" boot-time parameter fs/bio.c: add missing __user annotation block: prevent possible io_context->refcount overflow Add serial number support for virtio_blk, V4a block: Add missing bounce_pfn stacking and fix comments Revert "block: Fix bounce limit setting in DM" cciss: decode unit attention in SCSI error handling code cciss: Remove no longer needed sendcmd reject processing code cciss: change SCSI error handling routines to work with interrupts enabled. cciss: separate error processing and command retrying code in sendcmd_withirq_core() cciss: factor out fix target status processing code from sendcmd functions cciss: simplify interface of sendcmd() and sendcmd_withirq() cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code cciss: Use schedule_timeout_uninterruptible in SCSI error handling code block: needs to set the residual length of a bidi request Revert "block: implement blkdev_readpages" block: Fix bounce limit setting in DM Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt ... Manually fix conflicts with tracing updates in: block/blk-sysfs.c drivers/ide/ide-atapi.c drivers/ide/ide-cd.c drivers/ide/ide-floppy.c drivers/ide/ide-tape.c include/trace/events/block.h kernel/trace/blktrace.c	2009-06-11 11:10:35 -07:00
Eric Sandeen	b31e15527a	ext4: Change all super.c messages to print the device This patch changes ext4 super.c to include the device name with all warning/error messages, by using a new utility function ext4_msg. It's a rather large patch, but very mechanic. I left debug printks alone. This is a straightforward port of a patch which Andi Kleen did for ext3. Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-04 17:36:36 -04:00
Andreas Dilger	0b8e58a140	ext4: super.c whitespace cleanup Cleanup of whitespace and formatting. Initially driven by confusing indents for the ext4_{block,inode}_bitmap() et. al. helper routines, but figured I'd cleanup some other 80-column wrapping and other indenting problems at the same time. Signed-off-by: Andreas Dilger <adilger@sun.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-03 17:59:28 -04:00
Theodore Ts'o	88b6edd17c	ext4: Clean up calls to ext4_get_group_desc() If the caller isn't planning on modifying the block group descriptors, there's no need to pass in a pointer to a struct buffer_head. Nuking this saves a tiny amount of CPU time and stack space usage. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-25 11:50:39 -04:00
Martin K. Petersen	e1defc4ff0	block: Do away with the notion of hardsect_size Until now we have had a 1:1 mapping between storage device physical block size and the logical block sized used when addressing the device. With SATA 4KB drives coming out that will no longer be the case. The sector size will be 4KB but the logical block size will remain 512-bytes. Hence we need to distinguish between the physical block size and the logical ditto. This patch renames hardsect_size to logical_block_size. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-22 23:22:54 +02:00
Manish Katiyar	f68301656b	ext4: Fix memory leak in ext4_fill_super() in case of a failed mount Signed-off-by: Manish Katiyar <mkatiyar@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-17 23:52:44 -04:00
Theodore Ts'o	6fd058f779	ext4: Add a comprehensive block validity check to ext4_get_blocks() To catch filesystem bugs or corruption which could lead to the filesystem getting severly damaged, this patch adds a facility for tracking all of the filesystem metadata blocks by contiguous regions in a red-black tree. This allows quick searching of the tree to locate extents which might overlap with filesystem metadata blocks. This facility is also used by the multi-block allocator to assure that it is not allocating blocks out of the system zone, as well as by the routines used when reading indirect blocks and extents information from disk to make sure their contents are valid. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-17 15:38:01 -04:00
Aneesh Kumar K.V	955ce5f5be	ext4: Convert ext4_lock_group to use sb_bgl_lock We have sb_bgl_lock() and ext4_group_info.bb_state bit spinlock to protech group information. The later is only used within mballoc code. Consolidate them to use sb_bgl_lock(). This makes the mballoc.c code much simpler and also avoid confusion with two locks protecting same info. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-02 20:35:09 -04:00
Curt Wohlgemuth	f40339031b	ext4: Make the length of the mb_history file tunable In memory-constrained systems with many partitions, the ~68K for each partition for the mb_history buffer can be excessive. This patch adds a new mount option, mb_history_length, as well as a way of setting the default via a module parameter (or via a sysfs parameter in /sys/module/ext4/parameter/default_mb_history_length). If the mb_history_length is set to zero, the mb_history facility is disabled entirely. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 20:27:20 -04:00
Theodore Ts'o	bb23c20a85	ext4: Move fs/ext4/group.h into ext4.h Move the function prototypes in group.h into ext4.h so they are all defined in one place. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 19:44:44 -04:00
Theodore Ts'o	596397b77c	ext4: Move fs/ext4/namei.h into ext4.h The fs/ext4/namei.h header file had only a single function declaration, and should have never been a standalone file. Move it into ext4.h, where should have been from the beginning. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 13:49:15 -04:00
Theodore Ts'o	9bffad1ed2	ext4: convert instrumentation from markers to tracepoints Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-17 11:48:11 -04:00
Theodore Ts'o	32ed5058ce	ext4: Replace lock/unlock_super() with an explicit lock for resizing Use a separate lock to protect s_groups_count and the other block group descriptors which get changed via an on-line resize operation, so we can stop overloading the use of lock_super(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-25 22:53:39 -04:00
Theodore Ts'o	3b9d4ed266	ext4: Replace lock/unlock_super() with an explicit lock for the orphan list Use a separate lock to protect the orphan list, so we can stop overloading the use of lock_super(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-25 22:54:04 -04:00
Theodore Ts'o	a63c9eb2ce	ext4: ext4_mark_recovery_complete() doesn't need to use lock_super The function ext4_mark_recovery_complete() is called from two call paths: either (a) while mounting the filesystem, in which case there's no danger of any other CPU calling write_super() until the mount is completed, and (b) while remounting the filesystem read-write, in which case the fs core has already locked the superblock. This also allows us to take out a very vile unlock_super()/lock_super() pair in ext4_remount(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 01:59:42 -04:00
Theodore Ts'o	114e9fc907	ext4: Remove outdated comment about lock_super() ext4_fill_super() is no longer called by read_super(), and it is no longer called with the superblock locked. The unlock_super()/lock_super() is no longer present, so this comment is entirely superfluous. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-25 15:48:07 -04:00
Theodore Ts'o	8df9675f8b	ext4: Avoid races caused by on-line resizing and SMP memory reordering Ext4's on-line resizing adds a new block group and then, only at the last step adjusts s_groups_count. However, it's possible on SMP systems that another CPU could see the updated the s_group_count and not see the newly initialized data structures for the just-added block group. For this reason, it's important to insert a SMP read barrier after reading s_groups_count and before reading any (for example) the new block group descriptors allowed by the increased value of s_groups_count. Unfortunately, we rather blatently violate this locking protocol documented in fs/ext4/resize.c. Fortunately, (1) on-line resizes happen relatively rarely, and (2) it seems rare that the filesystem code will immediately try to use just-added block group before any memory ordering issues resolve themselves. So apparently problems here are relatively hard to hit, since ext3 has been vulnerable to the same issue for years with no one apparently complaining. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 08:50:38 -04:00
Theodore Ts'o	9ca92389c5	ext4: Use separate super_operations structure for no_journal filesystems By using a separate super_operations structure for filesystems that have and don't have journals, we can simply ext4_write_super() --- which is only needed when no journal is present --- and ext4_freeze(), ext4_unfreeze(), and ext4_sync_fs(), which are only needed when the journal is present. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 12:52:25 -04:00
Theodore Ts'o	7234ab2a55	ext4: Fix and simplify s_dirt handling The s_dirt flag wasn't completely handled correctly, but it didn't really matter when journalling was enabled. It turns out that when ext4 runs without a journal, we don't clear s_dirt in places where we should have, with the result that the high-level write_super() function was writing the superblock when it wasn't necessary. So we fix this by making ext4_commit_super() clear the s_dirt flag, and removing many of the other places where s_dirt is manipulated. When journalling is enabled, the s_dirt flag might be left set more often, but s_dirt really doesn't matter when journalling is enabled. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-30 21:24:04 -04:00
Theodore Ts'o	e2d670523c	ext4: Simplify ext4_commit_super()'s function signature The ext4_commit_super() function took both a struct super_block * and a struct ext4_super_block *, but the struct ext4_super_block can be derived from the struct super_block. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 00:33:44 -04:00
Theodore Ts'o	f7c439504c	ext4: Use is_power_of_2() for clarity Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-24 23:31:59 -04:00
Theodore Ts'o	c5ca7c7636	ext4: Fallback to vmalloc if kmalloc can't allocate s_flex_groups array For very large filesystems, the s_flex_groups array can get quite big. For example, a filesystem that can be resized up to 16TB will have 8192 flex groups (assuming the default flex_bg size of 16), so the array is 96k, which is very marginal for kmalloc(). On the other hand, a 160GB filesystem without the resize_inode feature will only require 960 bytes. So we try to allocate the array first using kmalloc(), and if that fails, we'll try to use vmalloc() instead. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-27 22:48:48 -04:00
From: Thiemo Nagel	0f2ddca66d	ext4: check block device size on mount Signed-off-by: Thiemo Nagel <thiemo.nagel@ph.tum.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-07 14:07:47 -04:00
Theodore Ts'o	06705bff91	ext4: Regularize mount options Add support for using the mount options "barrier" and "nobarrier", and "auto_da_alloc" and "noauto_da_alloc", which is more consistent than "barrier=<0\|1>" or "auto_da_alloc=<0\|1>". Most other ext3/ext4 mount options use the foo/nofoo naming convention. We allow the old forms of these mount options for backwards compatibility. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-28 10:59:57 -04:00
Theodore Ts'o	afd4672dc7	ext4: Add auto_da_alloc mount option Add a mount option which allows the user to disable automatic allocation of blocks whose allocation by delayed allocation when the file was originally truncated or when the file is renamed over an existing file. This feature is intended to save users from the effects of naive application writers, but it reduces the effectiveness of the delayed allocation code. This mount option disables this safety feature, which may be desirable for prodcutions systems where the risk of unclean shutdowns or unexpected system crashes is low. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-16 23:12:23 -04:00
Theodore Ts'o	7d39db14a4	ext4: Use struct flex_groups to calculate get_orlov_stats() Instead of looping over all of the block groups in a flex group summing their summary statistics, start tracking used_dirs in struct flex_groups, and use struct flex_groups instead. This should save a bit of CPU for mkdir-heavy workloads. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-04 19:31:53 -05:00
Theodore Ts'o	9f24e4208f	ext4: Use atomic_t's in struct flex_groups Reduce pressure on the sb_bgl_lock family of locks by using atomic_t's to track the number of free blocks and inodes in each flex_group. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-04 19:09:10 -05:00
Theodore Ts'o	b713a5ec55	ext4: remove /proc tuning knobs Remove tuning knobs in /proc/fs/ext4/<dev/* since they have been replaced by knobs in sysfs at /sys/fs/ext4/<dev>/*. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-31 09:11:14 -04:00
Theodore Ts'o	3197ebdb13	ext4: Add sysfs support Add basic sysfs support so that information about the mounted filesystem and various tuning parameters can be accessed via /sys/fs/ext4/<dev>/*. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-31 09:10:09 -04:00
Theodore Ts'o	afc32f7ee9	ext4: Track lifetime disk writes Add a new superblock value which tracks the lifetime amount of writes to the filesystem. This is useful in estimating the amount of wear on solid state drives (SSD's) caused by writes to the filesystem. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-02-28 19:39:58 -05:00
Pekka Enberg	705895b611	ext4: allocate ->s_blockgroup_lock separately As spotted by kmemtrace, struct ext4_sb_info is 17664 bytes on 64-bit which makes it a very bad fit for SLAB allocators. The culprit of the wasted memory is ->s_blockgroup_lock which can be as big as 16 KB when NR_CPUS >= 32. To fix that, allocate ->s_blockgroup_lock, which fits nicely in a order 2 page in the worst case, separately. This shinks down struct ext4_sb_info enough to fit a 2 KB slab cache so now we allocate 16 KB + 2 KB instead of 32 KB saving 14 KB of memory. Acked-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-02-15 18:07:52 -05:00
Jan Kara	a269eb1829	ext4: Use lowercase names of quota functions Use lowercase names of quota functions instead of old uppercase ones. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Mingming Cao <cmm@us.ibm.com> CC: linux-ext4@vger.kernel.org	2009-03-26 02:18:36 +01:00
Mingming Cao	60e58e0f30	ext4: quota reservation for delayed allocation Uses quota reservation/claim/release to handle quota properly for delayed allocation in the three steps: 1) quotas are reserved when data being copied to cache when block allocation is defered 2) when new blocks are allocated. reserved quotas are converted to the real allocated quota, 2) over-booked quotas for metadata blocks are released back. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz>	2009-03-26 02:18:34 +01:00
Jan Kara	edf7245362	ext4: Remove unnecessary quota functions ext4_dquot_initialize() and ext4_dquot_drop() is no longer needed because of modified quota locking. Signed-off-by: Jan Kara <jack@suse.cz>	2009-03-26 02:18:34 +01:00
Theodore Ts'o	8b1a8ff8b3	ext4: Remove duplicate call to ext4_commit_super() in ext4_freeze() Commit `c4be0c1d` added error checking to ext4_freeze() when calling ext4_commit_super(). Unfortunately the patch failed to remove the original call to ext4_commit_super(), with the net result that when freezing the filesystem, the superblock gets written twice, the first time without error checking. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-02-28 00:08:53 -05:00
Jan Kara	9eddacf9e9	Revert "ext4: wait on all pending commits in ext4_sync_fs()" This undoes commit `14ce0cb411`. Since jbd2_journal_start_commit() is now fixed to return 1 when we started a transaction commit, there's some transaction waiting to be committed or there's a transaction already committing, we don't need to call ext4_force_commit() in ext4_sync_fs(). Furthermore ext4_force_commit() can unnecessarily create sync transaction which is expensive so it's worthwhile to remove it when we can. http://bugzilla.kernel.org/show_bug.cgi?id=12224 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Eric Sandeen <sandeen@redhat.com> Cc: linux-ext4@vger.kernel.org	2009-02-10 06:46:05 -05:00
Takashi Sato	c4be0c1dc4	filesystem freeze: add error handling of write_super_lockfs/unlockfs Currently, ext3 in mainline Linux doesn't have the freeze feature which suspends write requests. So, we cannot take a backup which keeps the filesystem's consistency with the storage device's features (snapshot and replication) while it is mounted. In many case, a commercial filesystem (e.g. VxFS) has the freeze feature and it would be used to get the consistent backup. If Linux's standard filesystem ext3 has the freeze feature, we can do it without a commercial filesystem. So I have implemented the ioctls of the freeze feature. I think we can take the consistent backup with the following steps. 1. Freeze the filesystem with the freeze ioctl. 2. Separate the replication volume or create the snapshot with the storage device's feature. 3. Unfreeze the filesystem with the unfreeze ioctl. 4. Take the backup from the separated replication volume or the snapshot. This patch: VFS: Changed the type of write_super_lockfs and unlockfs from "void" to "int" so that they can return an error. Rename write_super_lockfs and unlockfs of the super block operation freeze_fs and unfreeze_fs to avoid a confusion. ext3, ext4, xfs, gfs2, jfs: Changed the type of write_super_lockfs and unlockfs from "void" to "int" so that write_super_lockfs returns an error if needed, and unlockfs always returns 0. reiserfs: Changed the type of write_super_lockfs and unlockfs from "void" to "int" so that they always return 0 (success) to keep a current behavior. Signed-off-by: Takashi Sato <t-sato@yk.jp.nec.com> Signed-off-by: Masayuki Hamaguchi <m-hamaguchi@ys.jp.nec.com> Cc: <xfs-masters@oss.sgi.com> Cc: <linux-ext4@vger.kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-09 16:54:42 -08:00
Linus Torvalds	2150edc6c5	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (57 commits) jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs ext4: Remove "extents" mount option block: Add Kconfig help which notes that ext4 needs CONFIG_LBD ext4: Make printk's consistently prefixed with "EXT4-fs: " ext4: Add sanity checks for the superblock before mounting the filesystem ext4: Add mount option to set kjournald's I/O priority jbd2: Submit writes to the journal using WRITE_SYNC jbd2: Add pid and journal device name to the "kjournald2 starting" message ext4: Add markers for better debuggability ext4: Remove code to create the journal inode ext4: provide function to release metadata pages under memory pressure ext3: provide function to release metadata pages under memory pressure add releasepage hooks to block devices which can be used by file systems ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc ext4: Init the complete page while building buddy cache ext4: Don't allow new groups to be added during block allocation ext4: mark the blocks/inode bitmap beyond end of group as used ext4: Use new buffer_head flag to check uninit group bitmaps initialization ext4: Fix the race between read_inode_bitmap() and ext4_new_inode() ext4: code cleanup ...	2009-01-08 17:14:59 -08:00
Theodore Ts'o	83982b6f47	ext4: Remove "extents" mount option This mount option is largely superfluous, and in fact the way it was implemented was buggy; if a filesystem which did not have the extents feature flag was mounted -o extents, the filesystem would attempt to create and use extents-based file even though the extents feature flag was not eabled. The simplest thing to do is to nuke the mount option entirely. It's not all that useful to force the non-creation of new extent-based files if the filesystem can support it. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-06 14:53:16 -05:00
Theodore Ts'o	abda141892	ext4: Make printk's consistently prefixed with "EXT4-fs: " Previously, some were "ext4: ", and some were "EXT4: "; change them to be consistent with most ext4 printk's, which is to use "EXT4-fs: ". Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-06 00:20:32 -05:00
Theodore Ts'o	4ec1102813	ext4: Add sanity checks for the superblock before mounting the filesystem This avoids insane superblock configurations that could lead to kernel oops due to null pointer derefences. http://bugzilla.kernel.org/show_bug.cgi?id=12371 Thanks to David Maciejak at Fortinet's FortiGuard Global Security Research Team who discovered this bug independently (but at approximately the same time) as Thiemo Nagel, who submitted the patch. Signed-off-by: Thiemo Nagel <thiemo.nagel@ph.tum.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2009-01-06 14:53:26 -05:00
Theodore Ts'o	b3881f74b3	ext4: Add mount option to set kjournald's I/O priority Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jens Axboe <jens.axboe@oracle.com>	2009-01-05 22:46:26 -05:00
Jan Kara	a5b5ee3201	ext4: Add default allocation routines for quota structures Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:26 -08:00
Jan Kara	17bd13b31c	ext4: Use sb_any_quota_loaded() instead of sb_any_quota_enabled() Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:56 -08:00
Theodore Ts'o	c319106723	ext4: Remove code to create the journal inode This code has been obsolete in quite some time, since the supported method for adding a journal inode is to use tune2fs (or to creating new filesystem with a journal via mke2fs or mkfs.ext4). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-06 11:14:25 -05:00
Toshiyuki Okajima	c39a7f84d7	ext4: provide function to release metadata pages under memory pressure Pages in the page cache belonging to ext4 data files are released via the ext4_releasepage() function specified in the ext4 inode's address_space_ops. However, metadata blocks (such as indirect blocks, directory blocks, etc) are managed via the block device address_space_ops, and they can not be released by try_to_free_buffers() if they have a journal head attached to them. To address this, we supply a release_metadata function which calls jbd2_journal_try_to_free_buffers() function to free the metadata, and which is called by the block device's blkdev_releasepage() function. Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-fsdevel@vger.kernel.org	2009-01-05 22:38:48 -05:00
Aneesh Kumar K.V	560671a0d3	ext4: Use high 16 bits of the block group descriptor's free counts fields Rename the lower bits with suffix _lo and add helper to access the values. Also rename bg_itable_unused_hi to bg_pad as in e2fsprogs. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-05 22:20:24 -05:00
Aneesh Kumar K.V	5d1b1b3f49	ext4: fix BUG when calling ext4_error with locked block group The mballoc code likes to call ext4_error while it is holding locked block groups. This can causes a scheduling in atomic context BUG. We can't just unlock the block group and relock it after/if ext4_error returns since that might result in race conditions in the case where the filesystem is set to continue after finding errors. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-05 22:19:52 -05:00
Jens Axboe	b3a6ffe16b	Get rid of CONFIG_LSF We have two seperate config entries for large devices/files. One is CONFIG_LBD that guards just the devices, the other is CONFIG_LSF that handles large files. This doesn't make a lot of sense, you typically want both or none. So get rid of CONFIG_LSF and change CONFIG_LBD wording to indicate that it covers both. Acked-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:29:51 +01:00
Aneesh Kumar K.V	3a06d778df	ext4: sparse fixes * Change EXT4_HAS__FEATURE to return a boolean Add a function prototype for ext4_fiemap() in ext4.h * Make ext4_ext_fiemap_cb() and ext4_xattr_fiemap() be static functions * Add lock annotations to mb_free_blocks() Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-11-22 15:04:59 -05:00
Theodore Ts'o	a9df9a4910	ext4: Make ext4_group_t be an unsigned int Nearly all places in the ext3/4 code which uses "unsigned long" is probably a bug, since on 32-bit systems a ulong a 32-bits, which means we are wasting stack space on 64-bit systems. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-05 22:18:16 -05:00
Theodore Ts'o	30773840c1	ext4: add fsync batch tuning knobs Add new mount options, min_batch_time and max_batch_time, which controls how long the jbd2 layer should wait for additional filesystem operations to get batched with a synchronous write transaction. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-03 20:27:38 -05:00
Theodore Ts'o	fde4d95ad8	ext4: remove extraneous newlines from calls to ext4_error() and ext4_warning() This removes annoying blank syslog entries emitted by ext4_error() or ext4_warning(), since these functions add their own newline. Signed-off-by: Nick Warne <nick@ukfsn.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-05 22:17:35 -05:00
Frank Mayhar	0390131ba8	ext4: Allow ext4 to run without a journal A few weeks ago I posted a patch for discussion that allowed ext4 to run without a journal. Since that time I've integrated the excellent comments from Andreas and fixed several serious bugs. We're currently running with this patch and generating some performance numbers against both ext2 (with backported reservations code) and ext4 with and without a journal. It just so happens that running without a journal is slightly faster for most everything. We did iozone -T -t 4 s 2g -r 256k -T -I -i0 -i1 -i2 which creates 4 threads, each of which create and do reads and writes on a 2G file, with a buffer size of 256K, using O_DIRECT for all file opens to bypass the page cache. Results: ext2 ext4, default ext4, no journal initial writes 13.0 MB/s 15.4 MB/s 15.7 MB/s rewrites 13.1 MB/s 15.6 MB/s 15.9 MB/s reads 15.2 MB/s 16.9 MB/s 17.2 MB/s re-reads 15.3 MB/s 16.9 MB/s 17.2 MB/s random readers 5.6 MB/s 5.6 MB/s 5.7 MB/s random writers 5.1 MB/s 5.3 MB/s 5.4 MB/s So it seems that, so far, this was a useful exercise. Signed-off-by: Frank Mayhar <fmayhar@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-07 00:06:22 -05:00
Roel Kluin	23475e264c	ext4: Use simple_strtol() instead of simple_strtoul() in ext4_ui_proc_open Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-11-26 02:23:19 -05:00
Aneesh Kumar K.V	565a9617b2	ext4: avoid ext4_error when mounting a fs with a single bg Remove some completely unneeded code which which caused an ext4_error to be generated when mounting a file system with only a single block group. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2009-01-05 21:51:07 -05:00
Theodore Ts'o	14ce0cb411	ext4: wait on all pending commits in ext4_sync_fs() In ext4_sync_fs, we only wait for a commit to finish if we started it, but there may be one already in progress which will not be synced. In the case of a data=ordered umount with pending long symlinks which are delayed due to a long list of other I/O on the backing block device, this causes the buffer associated with the long symlinks to not be moved to the inode dirty list in the second phase of fsync_super. Then, before they can be dirtied again, kjournald exits, seeing the UMOUNT flag and the dirty pages are never written to the backing block device, causing long symlink corruption and exposing new or previously freed block data to userspace. To ensure all commits are synced, we flush all journal commits now when sync_fs'ing ext4. Signed-off-by: Arthur Jones <ajones@riverbed.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Eric Sandeen <sandeen@redhat.com> Cc: <linux-ext4@vger.kernel.org>	2008-11-03 18:10:55 -05:00
Aneesh Kumar K.V	d94e99a64c	ext4: Convert to host order before using the values. Use le16_to_cpu to read the s_reserved_gdt_blocks values from super block. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-11-04 09:11:26 -05:00
Theodore Ts'o	f99b25897a	ext4: Add support for non-native signed/unsigned htree hash algorithms The original ext3 hash algorithms assumed that variables of type char were signed, as God and K&R intended. Unfortunately, this assumption is not true on some architectures. Userspace support for marking filesystems with non-native signed/unsigned chars was added two years ago, but the kernel-side support was never added (until now). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-10-28 13:21:44 -04:00
Hidehiro Kawai	ef2cabf7c6	ext4: fix a bug accessing freed memory in ext4_abort Vegard Nossum reported a bug which accesses freed memory (found via kmemcheck). When journal has been aborted, ext4_put_super() calls ext4_abort() after freeing the journal_t object, and then ext4_abort() accesses it. This patch fix it. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-10-27 22:53:05 -04:00
Linus Torvalds	2248485640	Merge git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev * git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: (66 commits) [PATCH] kill the rest of struct file propagation in block ioctls [PATCH] get rid of struct file use in blkdev_ioctl() BLKBSZSET [PATCH] get rid of blkdev_locked_ioctl() [PATCH] get rid of blkdev_driver_ioctl() [PATCH] sanitize blkdev_get() and friends [PATCH] remember mode of reiserfs journal [PATCH] propagate mode through swsusp_close() [PATCH] propagate mode through open_bdev_excl/close_bdev_excl [PATCH] pass fmode_t to blkdev_put() [PATCH] kill the unused bsize on the send side of /dev/loop [PATCH] trim file propagation in block/compat_ioctl.c [PATCH] end of methods switch: remove the old ones [PATCH] switch sr [PATCH] switch sd [PATCH] switch ide-scsi [PATCH] switch tape_block [PATCH] switch dcssblk [PATCH] switch dasd [PATCH] switch mtd_blkdevs [PATCH] switch mmc ...	2008-10-23 10:23:07 -07:00
Al Viro	8264613def	[PATCH] switch quota_on-related stuff to kern_path() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-23 05:12:44 -04:00
Al Viro	9a1c354276	[PATCH] pass fmode_t to blkdev_put() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:48:58 -04:00
Theodore Ts'o	f287a1a561	ext4: Remove automatic enabling of the HUGE_FILE feature flag If the HUGE_FILE feature flag is not set, don't allow the creation of large files, instead of automatically enabling the feature flag. Recent versions of mke2fs will set the HUGE_FILE flag automatically anyway for ext4 filesystems. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-10-16 22:50:48 -04:00
Theodore Ts'o	01436ef2e4	ext4: Remove unused mount options: nomballoc, mballoc, nocheck These mount options don't actually do anything any more, so remove them. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-10-17 07:22:35 -04:00
Eric Sesterhenn	5128273a32	ext4: Add missing newlines to printk messages There are some newlines missing in ext4_check_descriptors, which cause the printk level to be printed out when the next printk call is made: [ 778.847265] EXT4-fs: ext4_check_descriptors: Block bitmap for group 0 not in group (block 1509949442)!<3>EXT4-fs: group descriptors corrupted! [ 802.646630] EXT4-fs: ext4_check_descriptors: Inode bitmap for group 0 not in group (block 9043971)!<3>EXT4-fs: group descriptors corrupted! Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-10-17 09:16:19 -04:00
Aneesh Kumar K.V	c2774d84fd	ext4: Do mballoc init before doing filesystem recovery During filesystem recovery we may be doing a truncate which expects some of the mballoc data structures to be initialized. So do ext4_mb_init before recovery. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2008-10-10 20:07:20 -04:00
Steven Whitehouse	a447c09324	vfs: Use const for kernel parser table This is a much better version of a previous patch to make the parser tables constant. Rather than changing the typedef, we put the "const" in all the various places where its required, allowing the __initconst exception for nfsroot which was the cause of the previous trouble. This was posted for review some time ago and I believe its been in -mm since then. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Alexander Viro <aviro@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-13 10:10:37 -07:00
Alexander Beregalov	3244fcb1ae	ext4: fix build failure without procfs fs/ext4/super.c: In function 'ext4_fill_super': fs/ext4/super.c:2226: error: 'ext4_ui_proc_fops' undeclared (first use in this function) fs/ext4/super.c:2226: error: (Each undeclared identifier is reported only once fs/ext4/super.c:2226: error: for each function it appears in.) Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2008-10-12 17:27:49 -04:00
Hidehiro Kawai	5bf5683a33	ext4: add an option to control error handling on file data If the journal doesn't abort when it gets an IO error in file data blocks, the file data corruption will spread silently. Because most of applications and commands do buffered writes without fsync(), they don't notice the IO error. It's scary for mission critical systems. On the other hand, if the journal aborts whenever it gets an IO error in file data blocks, the system will easily become inoperable. So this patch introduces a filesystem option to determine whether it aborts the journal or just call printk() when it gets an IO error in file data. If you mount an ext4 fs with data_err=abort option, it aborts on file data write error. If you mount it with data_err=ignore, it doesn't abort, just call printk(). data_err=ignore is the default. Here is the corresponding patch of the ext3 version: http://kerneltrap.org/mailarchive/linux-kernel/2008/9/9/3239374 Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2008-10-10 22:12:43 -04:00
Hidehiro Kawai	7ffe1ea894	ext4: add checks for errors from jbd2 If the journal has aborted due to a checkpointing failure, we have to keep the contents of the journal space. Otherwise, the filesystem will lose uncheckpointed metadata completely and become inconsistent. To avoid this, we need to keep needs_recovery flag if checkpoint has failed. With this patch, ext4_put_super() detects a checkpointing failure from the return value of journal_destroy(), then it invokes ext4_abort() to make the filesystem read only and keep needs_recovery flag. Errors from jbd2_journal_flush() are also handled by this patch in some places. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2008-10-10 20:29:21 -04:00
Theodore Ts'o	03010a3350	ext4: Rename ext4dev to ext4 The ext4 filesystem is getting stable enough that it's time to drop the "dev" prefix. Also remove the requirement for the TEST_FILESYS flag. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-10-10 20:02:48 -04:00

... 2 3 4 5 6 ...

478 Commits