Read Caching Implementation

**Figure:** Read caching flowchart. Heavy boxes are decision points.

Note that we cannot avoid all server traffic even when reading unchanged files. The stateless nature of the NFS protocol requires us to confirm with the server that the file has not changed since we last read it. However, when reading large files, most of the network traffic consists of read RPCs, all of which are eliminated.⁹
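To make this trade-off concrete, here is a toy C model (not the kernel code) that counts the RPCs needed to read a whole file. It assumes one attribute revalidation per read of the file and one read RPC per page on a cache miss; the `rpc_counts` struct and `read_whole_file` function are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tally of RPCs sent to the server. */
struct rpc_counts {
    size_t getattr;   /* attribute revalidations */
    size_t read;      /* READ RPCs for page data */
};

/* Model reading all npages pages of a remote file.  Assumption: one
 * revalidation per read of the file, one READ RPC per uncached page. */
static struct rpc_counts read_whole_file(size_t npages, bool cached_and_valid)
{
    struct rpc_counts c = { 0, 0 };

    /* Statelessness: we must always confirm mtime/size with the server. */
    c.getattr = 1;

    for (size_t page = 0; page < npages; page++) {
        if (!cached_and_valid)
            c.read++;   /* miss: fetch this page with a READ RPC */
        /* hit: page comes from the local cache file, no RPC needed */
    }
    return c;
}
```

For a 1024-page file, a cold read costs one revalidation plus 1024 READ RPCs, while a read from a complete, valid cache costs only the revalidation.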

All cached file data exists only in disk-based structures. Each remote file from which a page is cached has a corresponding local cache file. The name of the local file is the concatenation of the server name, the server's superblock number, and the inode number of the file being cached. For example, a local cache file called holden,2-23 stores page data from the remote file with inode number 23 on the server named "holden" in the partition whose superblock is on device 2. These files are all stored in a single-level directory structure (which precludes disconnected operation, discussed separately). We exploit ext2fs's ability to store sparse files efficiently (i.e., files in which only a small portion of the blocks have had any data written to them), so we store only those data blocks that the client has actually read, not whole files. Also note that after a read is satisfied from the local disk cache, we send a setattr to the server to update its last-accessed time (atime) if the prior access was more than 30 seconds ago. Since the disk cache can persist for months or longer, it is important that access times remain accurate.¹⁰
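The naming scheme and the 30-second atime throttle can be sketched in C as follows. This is a hedged illustration: `cache_file_name` and `should_update_atime` are hypothetical helpers, and comparing `time_t` values with `difftime` stands in for however the kernel actually records the prior access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: build "<server>,<superblock device>-<inode>".
 * Returns 0 on success, -1 if the buffer is too small. */
static int cache_file_name(char *buf, size_t len, const char *server,
                           unsigned long sb_dev, unsigned long ino)
{
    int n = snprintf(buf, len, "%s,%lu-%lu", server, sb_dev, ino);
    /* Treat truncation as failure rather than using a wrong file name. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

#define ATIME_SLACK_SECONDS 30.0

/* Hypothetical helper: after a read served from the local cache, send a
 * setattr to refresh the server's atime only if the prior access was
 * more than 30 seconds ago. */
static bool should_update_atime(time_t prior_access, time_t now)
{
    return difftime(now, prior_access) > ATIME_SLACK_SECONDS;
}
```

For example, `cache_file_name(buf, sizeof buf, "holden", 2, 23)` writes `holden,2-23`, matching the example above.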

**Figure:** Kernel data structures. Bold text denotes new data members; heavy boxes denote added data structures.

Obviously, files are not always read in their entirety (e.g., executables, which are paged in on demand). Thus, we must maintain in-kernel data structures to track which pages of each inode have been cached to local disk. We use a simple packed bitmap with one bit per page, together with the total number of pages, the number of valid (cached) pages, and some information that nfsfillind needs to finish reading a file after the NFS inode may have left memory. See the kernel data structures figure for details. When we recognize that a file has had its last page written to the local cache (detected in constant time by comparing the count of valid pages with the total, not by scanning the bitmap), we mark the cache file as complete by setting its u+x mode bit, and can then deallocate the bitmap of valid pages.
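A minimal userspace sketch of this bookkeeping, using hypothetical names (`struct cache_pages`, `cache_pages_init`, `page_is_cached`, `mark_page_cached`) rather than the kernel's own: one bit per page, a running count of valid pages, and an O(1) completeness test that frees the bitmap once every page is cached. Setting the u+x bit on the cache file is left as a comment.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-inode record of which pages are on local disk. */
struct cache_pages {
    unsigned long npages;   /* total pages in the remote file */
    unsigned long nvalid;   /* pages already written to the cache file */
    unsigned char *bitmap;  /* one bit per page; NULL once complete */
};

static int cache_pages_init(struct cache_pages *cp, unsigned long npages)
{
    cp->npages = npages;
    cp->nvalid = 0;
    /* Round up so every page gets a bit; calloc clears all bits. */
    cp->bitmap = calloc((npages + CHAR_BIT - 1) / CHAR_BIT, 1);
    return cp->bitmap != NULL ? 0 : -1;
}

static bool page_is_cached(const struct cache_pages *cp, unsigned long pg)
{
    assert(pg < cp->npages);
    if (cp->nvalid == cp->npages)   /* complete: bitmap may be freed */
        return true;
    return (cp->bitmap[pg / CHAR_BIT] >> (pg % CHAR_BIT)) & 1u;
}

/* Record that page pg is now in the cache file.  Returns true exactly
 * when this write completes the file; completeness is an O(1) check of
 * the valid-page count, never a scan of the bitmap. */
static bool mark_page_cached(struct cache_pages *cp, unsigned long pg)
{
    if (page_is_cached(cp, pg))
        return false;               /* already counted */
    cp->bitmap[pg / CHAR_BIT] |= (unsigned char)(1u << (pg % CHAR_BIT));
    if (++cp->nvalid < cp->npages)
        return false;
    /* Last page written: the real code would set u+x on the cache file
     * here.  The bitmap is no longer needed. */
    free(cp->bitmap);
    cp->bitmap = NULL;
    return true;
}
```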

To support the relationship between an inode for a file on the server and the inode of the local file that caches it, we made two significant changes to the kernel's data structures: 1) all inodes now have a pointer to a structure describing the inode they are caching; and 2) all inodes can specify a clear_inode_hook to be called when that inode is chosen for reuse ("putting" an inode to a zero count does not remove it from memory). See the kernel data structures figure for details.
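The two inode changes can be modeled in plain C. Everything here is hypothetical (`struct model_inode`, `struct cache_info`, `model_iput`, `model_clear_inode`, `nfs_release_cache_hook`); it is not the Linux `struct inode` layout, only a sketch of how a per-inode cache pointer and a reuse-time hook fit together.

```c
#include <assert.h>
#include <stddef.h>

struct model_inode;

/* Hypothetical "structure about the inode being cached". */
struct cache_info {
    struct model_inode *cache_inode;  /* inode of the local cache file */
    unsigned long nvalid;             /* pages cached so far */
    unsigned long npages;             /* pages in the remote file */
};

struct model_inode {
    unsigned long ino;
    int count;                                      /* reference count */
    struct cache_info *cache;                       /* change 1 */
    void (*clear_inode_hook)(struct model_inode *); /* change 2 */
};

/* Drop a reference; reaching zero does NOT free or clear the inode. */
static void model_iput(struct model_inode *inode)
{
    assert(inode->count > 0);
    inode->count--;
}

/* Called only when an unreferenced inode is chosen for reuse. */
static void model_clear_inode(struct model_inode *inode)
{
    assert(inode->count == 0);
    if (inode->clear_inode_hook != NULL)
        inode->clear_inode_hook(inode);
    inode->cache = NULL;
    inode->clear_inode_hook = NULL;
}

/* Example hook for an NFS inode: release the cache file's inode. */
static void nfs_release_cache_hook(struct model_inode *inode)
{
    if (inode->cache != NULL && inode->cache->cache_inode != NULL)
        model_iput(inode->cache->cache_inode);
}
```

Dropping the NFS inode's count to zero with `model_iput` leaves its cache pointer intact; the hook runs, and releases the cache inode, only when `model_clear_inode` is called because the inode is being reused.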

**Figure:** The life-cycle of an inode and its corresponding cache file's *inode*. From A to B, a remote file is first accessed, so an inode structure is assigned to that file. From B to C, the first page is read from that remote file, so another *inode* is assigned to the corresponding local cache file; the NFS *inode* keeps a pointer to the cache *inode* so that pages read can be written to the local disk. We stay in state C while the file is open, and then move to either D or F when closing the remote file results in *put*ting the NFS *inode*. When the remote file is closed, we discard the cached pages if we have read less than 10% of the file; that case corresponds to F, where we *put* the cache *inode* back on the free list and can reuse both *inode*s (back to state A). If we have read more than 10% of the file, we instead move from C to D. While in D, `nfsfillind` reads the remaining pages of the file in the background until the local cache contains all of the remote file's data pages. When the cache file is complete, we progress to E, where we have *put* the cache *inode* back on the free list, and can then return to A after calling the appropriate inode-clearing hooks. States G and H represent a slight complication of the filling-in procedure when the *inode* that was used for the remote file needs to be reused.
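The close-time decision in the life-cycle (C to D versus C to F) reduces to a small function. This sketch assumes the fraction read is measured as cached pages over total pages and uses integer arithmetic to avoid floating point. The enum and function names are hypothetical, and treating exactly 10% as "fill in" is our assumption, since the text only distinguishes less than and more than 10%.

```c
#include <assert.h>

/* Hypothetical outcomes when the NFS inode is put after the last close. */
enum close_action {
    DISCARD_CACHE,       /* state F: drop the cached pages */
    FILL_IN_BACKGROUND,  /* state D: let nfsfillind fetch the rest */
    ALREADY_COMPLETE     /* every page cached; nothing to fill in (as in E) */
};

static enum close_action on_last_close(unsigned long nvalid,
                                       unsigned long npages)
{
    if (nvalid == npages)
        return ALREADY_COMPLETE;
    /* nvalid / npages < 10%, rewritten as nvalid * 10 < npages. */
    if (nvalid * 10 < npages)
        return DISCARD_CACHE;
    return FILL_IN_BACKGROUND;
}
```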

Because files may change on the server (through either our machine or another client), we must also invalidate cache entries when they become stale. When we notice that the NFS inode we are caching has a new modification time or file size, we mark our local cache as invalid by turning off its u+x bit (if it was complete) and updating the in-kernel data structures (e.g., removing the bitmap and resetting the count of valid pages to zero). If the cache file's inode is later cleared from memory before enough pages have been read to justify filling in the rest, the cache file is unlinked and its space is reused.

Ideally, we would update the disk cache for local writes. However, the NFS protocol gives a host no way of knowing that it is the only writer to a file.¹¹ After our client changes a single byte of a 4MB file that is in our local disk cache, all the NFS client can tell when it later reads that file is that the modification time (mtime) has changed. Since we cannot confirm that our client alone effected the change, we must invalidate all of the pages we had cached locally.¹²