Crash Consistency: Ensure file system is recoverable after crash
- One fs operation is multiple non-atomic actions
- Power failure can occur at any time
- Disk driver can reorder requests
Consistent State: Failure atomicity. After a crash, each operation either appears not to have happened at all or appears to have completed in full.
Approaches
- UPS battery for a clean shutdown
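- fsck: Do nothing up front, try to repair after the crash
  - Detect crash using a clean-unmount flag
  - Scan entire fs and check it against consistency rules
  - Problem: Very slow (scan time grows with disk size); only restores metadata consistency, cannot recover lost or corrupted file data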
- Journaling: Treat fs operations as transactions (allow rollback / redo)
  - Record writes in the journal before applying them in place (write data → write metadata to journal → write commit record → checkpoint metadata to its final location)
  - After a crash, replay committed transactions and discard uncommitted ones
  - ext3: Store the journal as a regular large file (for backward compatibility with ext2)
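
A minimal sketch of redo-style journal recovery, assuming a toy disk modeled as a dict. The names `Journal`, `log_write`, `commit`, and `recover` are hypothetical and not taken from any real file system; the point is only that writes without a commit record are ignored after a crash.

```python
# Toy model: the "disk" is a dict mapping block name -> contents.
# All names here are illustrative assumptions, not a real journaling API.

class Journal:
    def __init__(self):
        # Append-only log of ("write", txid, block, data) and ("commit", txid).
        self.records = []

    def log_write(self, txid, block, data):
        self.records.append(("write", txid, block, data))

    def commit(self, txid):
        self.records.append(("commit", txid))


def recover(journal, disk):
    """Redo every committed transaction in log order; skip uncommitted ones."""
    committed = {rec[1] for rec in journal.records if rec[0] == "commit"}
    for rec in journal.records:
        if rec[0] == "write" and rec[1] in committed:
            _, _, block, data = rec
            disk[block] = data
    return disk


disk = {}
journal = Journal()

# Transaction 1: commit record reached the journal before the crash.
journal.log_write(1, "inode_5", "size=4096")
journal.log_write(1, "bitmap", "block 9 used")
journal.commit(1)

# Transaction 2: the crash hits before its commit record is written.
journal.log_write(2, "inode_6", "size=8192")

recovered = recover(journal, disk)
```

After recovery, `recovered` holds only the two blocks from transaction 1, so the fs looks as if transaction 2 never started.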
Journaling Issues
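- Stale Metadata: Replaying journaled metadata can overwrite file data if a freed metadata block has been reused for that data.
  (mkdir → rmdir → create a file whose data reuses the directory's block → crash → replay writes the old directory contents over the file data)
  - Solution: Revoke record in journal; revoked blocks are skipped during replay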
- Journal Corruption: Bit flip in journal data. Ext4 solution: checksums.
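
Both fixes can be sketched in one toy replay routine. This is an illustration under simplifying assumptions, not the actual ext3/ext4 on-disk format: `make_txn`, `is_intact`, and `replay` are hypothetical helpers, each transaction carries a CRC32 over its contents, and a revoke record cancels replay of that block from the same or any earlier transaction.

```python
import zlib

def _payload(txid, writes, revokes):
    # Deterministic byte encoding of a transaction, used only for checksumming.
    return repr((txid, writes, tuple(revokes))).encode()

def make_txn(txid, writes, revokes=()):
    """writes: list of (block_no, bytes); revokes: block numbers to revoke."""
    return {"txid": txid, "writes": writes, "revokes": tuple(revokes),
            "crc": zlib.crc32(_payload(txid, writes, revokes))}

def is_intact(txn):
    expected = zlib.crc32(_payload(txn["txid"], txn["writes"], txn["revokes"]))
    return expected == txn["crc"]

def replay(journal, disk):
    # Pass 1: keep transactions up to the first checksum failure.
    valid = []
    for txn in journal:
        if not is_intact(txn):
            break
        valid.append(txn)
    # Pass 2: record, per block, the latest transaction that revoked it.
    revoked = {}
    for txn in valid:
        for block in txn["revokes"]:
            revoked[block] = txn["txid"]
    # Pass 3: redo writes, skipping any block revoked by this or a later txn.
    for txn in valid:
        for block, data in txn["writes"]:
            if revoked.get(block, -1) >= txn["txid"]:
                continue
            disk[block] = data
    return disk

# Block 100 held a directory, was freed by rmdir, then reused for file data.
disk = {100: b"user file data"}
journal = [
    make_txn(1, [(100, b"dir entries"), (7, b"inode v1")]),  # mkdir
    make_txn(2, [(7, b"inode v2")], revokes=[100]),          # rmdir
    make_txn(3, [(8, b"inode v3")]),
]
journal[2]["writes"] = [(8, b"inode vX")]  # simulate a bit flip in the journal
replay(journal, disk)
```

Here block 100 keeps the user's file data because of the revoke record, and transaction 3 is not replayed because its checksum no longer matches.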
FSCK
- Superblocks: Restore from a backup copy if corrupted
- Free blocks: Scan inodes, build in-memory bitmap, compare with fs bitmap
- Inode state: Check inode fields for corruption (if corrupted, remove inode)
- Inode links: Verify link count by traversing directory tree (if orphaned, move to lost+found)
- Duplicates: Check if two inodes refer to the same block (give each inode its own copy of the block)
- Bad blocks: Block pointers outside of the valid range (remove the pointer)
- Directory checks: Make sure . and .. exist & directories are linked only once (prevent cycle)
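
A rough sketch of three of these passes (free blocks, duplicates, bad pointers) plus the link-count check, assuming a simplified in-memory model. The data layout and the names `check_blocks` and `check_links` are hypothetical; a real fsck works on on-disk structures and also repairs what it finds.

```python
from collections import Counter

def check_blocks(inodes, on_disk_bitmap, num_blocks):
    """inodes: dict inode_no -> list of block pointers.
    on_disk_bitmap: list of bools, True if the fs bitmap marks the block used."""
    owners = {}          # block -> first inode seen pointing at it
    bad_pointers = []    # (inode, pointer) pairs outside the valid range
    duplicates = []      # (block, first_owner, second_owner)
    for ino, blocks in sorted(inodes.items()):
        for b in blocks:
            if not 0 <= b < num_blocks:
                bad_pointers.append((ino, b))
            elif b in owners:
                duplicates.append((b, owners[b], ino))
            else:
                owners[b] = ino
    # Rebuild the bitmap from the inodes and compare it with the fs bitmap.
    mismatched = [b for b in range(num_blocks)
                  if (b in owners) != on_disk_bitmap[b]]
    return bad_pointers, duplicates, mismatched

def check_links(directories, link_counts):
    """directories: dict dir_inode -> {name: child_inode}, without . and ..
    link_counts: dict file_inode -> link count stored in the inode."""
    seen = Counter(child for entries in directories.values()
                   for child in entries.values())
    wrong = {ino: (stored, seen[ino])
             for ino, stored in link_counts.items() if stored != seen[ino]}
    orphans = [ino for ino in link_counts if seen[ino] == 0]  # -> lost+found
    return wrong, orphans

inodes = {1: [2, 3], 2: [3, 4], 3: [99]}
bitmap = [False, False, True, True, False, True, False, False]
bad, dups, mismatched = check_blocks(inodes, bitmap, num_blocks=8)

dirs = {1: {"a": 10, "b": 10, "c": 11}}
wrong, orphans = check_links(dirs, {10: 1, 11: 1, 12: 1})
```

In this example inode 3 has a bad pointer (99), block 3 is claimed by inodes 1 and 2, the bitmap disagrees on blocks 4 and 5, inode 10 has a stale link count, and inode 12 is an orphan.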