Abstract Requirements
Persistent Storage: Store large data, survive process termination, concurrent access
- User view: Logical organization of info
- OS view: Manage disk, enforce permissions, sharing between processes, people, machines
Abstract File Operations
- Creation: Find space in fs, add entry to directory (map path to addr and meta)
- Writing: Overwrite or append to a file
- Reading: File as a stream (sequential / random direct access)
- Repositioning: Change the next read/write position
- Deleting: Remove a file
- Truncating: Erase parts of a file (while keeping meta)
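The abstract operations above map fairly directly onto POSIX-style calls. A minimal sketch using Python's standard os module; the file name notes.txt and the function demo_file_ops are made up for illustration, and everything runs inside a throwaway temp directory:

```python
import os
import tempfile

def demo_file_ops():
    """Run each abstract operation once and return what was observed."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "notes.txt")  # hypothetical file name

        # Creation + writing: O_CREAT adds a directory entry for the new file.
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
        os.write(fd, b"hello world")

        # Repositioning: move the next read/write position to byte 6.
        os.lseek(fd, 6, os.SEEK_SET)

        # Reading: continues from the current position.
        tail = os.read(fd, 5)

        # Truncating: keep only the first 5 bytes; the metadata stays.
        os.ftruncate(fd, 5)
        size_after_truncate = os.fstat(fd).st_size
        os.close(fd)

        # Deleting: remove the directory entry.
        os.unlink(path)
        exists_after_delete = os.path.exists(path)

    return tail, size_after_truncate, exists_after_delete
```

Calling demo_file_ops() should report the bytes read after the seek, the size after truncation, and whether the path still exists after deletion.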
Directories: Logical tree structure
- Naming interface that separates logical organization from physical placement
- Store file meta (e.g. size, permissions, dates, location)
- Directory data is typically stored as files
Links
- Hard Link: In the target dir, create a new dir entry pointing to the same physical addr as src
(Need to store link count for deletion)
- Deleting the src file does not affect the target link
- Symbolic Link: In the target dir, create a link-type entry that points to the src path
(Path can be invalid, only checked on read)
- Deleting the src file will make the target link unavailable
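The difference between the two link types can be observed with os.link and os.symlink. A minimal sketch, assuming a POSIX-like system where both calls are permitted; the file names and the function demo_links are made up for illustration:

```python
import os
import tempfile

def demo_links():
    """Create a hard link and a symbolic link, delete the source, and observe both."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.txt")    # hypothetical names
        hard = os.path.join(tmp, "hard.txt")
        soft = os.path.join(tmp, "soft.txt")

        with open(src, "w") as f:
            f.write("data")

        os.link(src, hard)      # new dir entry pointing at the same inode
        os.symlink(src, soft)   # link-type entry that stores the src path

        same_inode = os.stat(src).st_ino == os.stat(hard).st_ino
        link_count = os.stat(src).st_nlink

        os.unlink(src)          # delete the source name

        # The hard link still reaches the data.
        with open(hard) as f:
            hard_content = f.read()

        # The symlink entry is still there, but its path no longer resolves.
        soft_entry_exists = os.path.lexists(soft)
        soft_resolves = os.path.exists(soft)

    return same_inode, link_count, hard_content, soft_entry_exists, soft_resolves
```

The link count is what lets the file system decide when the underlying blocks can actually be freed: only when the last name is removed.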
Implementations
Directory File Structure
- Lists: List of file meta - O(n) find
- Hash Table: dict[file path, meta] - O(1) find, extra space required for hash table
Allocation Strategies
Hardware Definitions
- Logical blocks (managed by fs) are typically 4 KiB
- Pointer is typically 4 B
- Minimum transfer size is 1 block
- Sector size: Minimum physical hardware block size (typically 512 B)
Contiguous Allocation: For each file, point to start block and length (like malloc)
- Problem: External fragmentation
Linked Allocation: (FAT32) Point to a starting block, and each block points to the next block
- Problem: O(n) for indexed / random read
- Optimized (a little bit) using a tail pointer (appends no longer walk the whole chain)
- Optimized again by storing the linked list of pointers separate from data blocks
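Storing the next-pointers in a separate table means a chain can be followed without touching any data block. A minimal sketch of that idea, with the table as a plain Python list; the END marker value and the function names are made up for illustration and do not reflect the real FAT32 on-disk encoding:

```python
END = -1  # hypothetical end-of-chain marker

def fat_chain(fat, start):
    """Return every block number of a file by following next-pointers in the table."""
    blocks = []
    block = start
    while block != END:
        blocks.append(block)
        block = fat[block]
    return blocks

def nth_block(fat, start, n):
    """Find the n-th block (0-based) of a file: still O(n), but only the table is read."""
    block = start
    for _ in range(n):
        block = fat[block]
        if block == END:
            raise IndexError("offset is past the end of the file")
    return block
```

For a table where block 2 points to 5, 5 points to 3, and 3 is the end, fat_chain(fat, 2) walks 2, 5, 3 in that order.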
Indexed Structure: (Unix ext2 inode)
- All metadata is stored in an inode (typically 128 B), uniquely identified by inode number
- Each inode contains 15 block pointers:
- 12 direct block pointers (first 12 blocks) - 48 KiB
- 13th: Indirect block pointer (point to a block table that’s stored in a block) - 4 MiB
- 14th: Double indirect block pointer (points to an indirect BT) - 4 GiB
- 15th: Triple indirect block pointer (points to a double indirect BT) - 4 TiB
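The sizes above follow from 4 KiB blocks and 4 B pointers, which give 1024 pointers per block. A minimal sketch of that arithmetic, plus a lookup of which pointer covers a given logical block; the constant and function names are chosen here for illustration:

```python
BLOCK_SIZE = 4096                              # 4 KiB logical block
POINTER_SIZE = 4                               # 4 B block pointer
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE    # 1024 pointers fit in one block
NUM_DIRECT = 12

def pointer_level(logical_block):
    """Return (pointer kind, number of indirect blocks to read) for a logical block."""
    n = logical_block
    if n < NUM_DIRECT:
        return ("direct", 0)
    n -= NUM_DIRECT
    if n < PTRS_PER_BLOCK:
        return ("indirect", 1)
    n -= PTRS_PER_BLOCK
    if n < PTRS_PER_BLOCK ** 2:
        return ("double indirect", 2)
    n -= PTRS_PER_BLOCK ** 2
    if n < PTRS_PER_BLOCK ** 3:
        return ("triple indirect", 3)
    raise ValueError("block number exceeds what 15 pointers can address")

def max_file_size():
    """Total bytes addressable through all 15 pointers."""
    blocks = (NUM_DIRECT + PTRS_PER_BLOCK
              + PTRS_PER_BLOCK ** 2 + PTRS_PER_BLOCK ** 3)
    return blocks * BLOCK_SIZE
```

Under these assumptions the maximum file size is 48 KiB + 4 MiB + 4 GiB + 4 TiB, and small files never pay for an indirect block read.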
Extent-based Allocation: TODO 11-22 1:00
File System Structure / Formatting
Superblock: Well-known location that stores metadata about the overall file system.
- Identify fs type, size, location of other meta
- Typically duplicated for reliability
Free Space Bitmap: Track which blocks are free
- Array DAT, indexed by block number, one bit to store if it’s used
- Use multiple blocks for larger disks
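A minimal sketch of such a bitmap, using one bit per block packed into a bytearray. The class name BlockBitmap and the lowest-free-block allocation policy are assumptions made for illustration:

```python
class BlockBitmap:
    """One bit per block: 1 = used, 0 = free."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)  # round up to whole bytes

    def is_used(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def mark_used(self, block):
        self.bits[block // 8] |= 1 << (block % 8)

    def mark_free(self, block):
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def allocate(self):
        """Claim and return the lowest-numbered free block."""
        for block in range(self.num_blocks):
            if not self.is_used(block):
                self.mark_used(block)
                return block
        raise OSError("no free blocks")
```

At one bit per block, a single 4 KiB bitmap block covers 32768 data blocks, which is why larger disks need several bitmap blocks.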
Inode: FS metadata object
- mode (f/d, rwx)
- uid (owner), gid (group)
- size
- atime, ctime, mtime, dtime
- link count, sector count
- flags (compressed, immutable, append-only)
- block pointers
Inode Table: Array DAT for inodes, allocated on mkfs
- Size cannot be easily modified
Free Inode Bitmap: Track which inodes are free
Root Directory: Stored in the first data block, inode 0

Directory Data Block: Map file names under a directory to their inode numbers
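Resolving a path is then a walk from the root inode, one directory lookup per path component. A minimal sketch with directories modeled as in-memory dicts; the function name resolve and the dict layout are made up for illustration, and the root inode number follows the note above (real file systems may use a different number):

```python
ROOT_INODE = 0  # as stated in these notes; an assumption, not universal

def resolve(path, directories, root=ROOT_INODE):
    """Map an absolute path to an inode number.

    directories maps a directory's inode number to its
    {file name: inode number} entries.
    """
    inode = root
    for name in path.strip("/").split("/"):
        if name == "":
            continue                       # handles "/" and doubled slashes
        entries = directories.get(inode)
        if entries is None:
            raise NotADirectoryError(path)
        if name not in entries:
            raise FileNotFoundError(path)
        inode = entries[name]
    return inode
```

Each loop iteration corresponds to reading one directory's data block, so deep paths cost one lookup per level.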
File System Operations
Open: Cache the file inode in an “open” file table, return a fd (file descriptor)
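A minimal sketch of an open file table, assuming each entry caches the inode number together with a current offset and that the lowest unused descriptor is handed out. The class and field names are made up for illustration; a real kernel keeps more state and usually has descriptors 0 to 2 already taken by the standard streams:

```python
class OpenFileTable:
    """Maps small integer file descriptors to cached per-open state."""

    def __init__(self):
        self.entries = {}  # fd -> {"inode": inode number, "offset": position}

    def open(self, inode_number):
        fd = 0
        while fd in self.entries:   # pick the lowest unused descriptor
            fd += 1
        self.entries[fd] = {"inode": inode_number, "offset": 0}
        return fd

    def close(self, fd):
        del self.entries[fd]        # the descriptor becomes reusable
```

Later reads and writes take the fd, so the path does not have to be resolved again on every call.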
Optimization
ext2 Long Seek Problems
- Fragmentation: After a long time, free space and new files will be scattered by deletes
- Inodes and actual content are placed far from each other
Berkeley Fast FS: A device-aware fs
- Cylinder: Tracks that are the same distance from the center; a cylinder group is a set of adjacent cylinders
- Allocate file location based on cylinder groups
- Each cylinder group has its own inode table, superblock, bitmap, etc.
- Spread free space across cylinder groups (keep 10% of space reserved)
- Fragments: Sub-block allocation to reduce internal fragmentation (tail packing)
- Problem: Modern devices are not very open about underlying structure.
NTFS
Master File Table (MFT): Like inode table, sequence of 1 KiB records
- If the file is small, its data is stored inside the MFT record itself


Extent-based allocation: Allocate files in runs of consecutive blocks (extents)