Introduction to FAT


Concepts

The FAT file system was first implemented in 1977 by Bill Gates. It then grew into the most widely used file system on IBM PC computers. I will explain the FAT idea in this paper; FAT implementations will be described in later documents.

In the FAT file system, the storage device is of a fixed size. It is logically divided into chunks of data of equal size, called clusters. Any file takes up a whole number of clusters. A disk map, called the File Allocation Table (FAT), is located at a known position on the disk. The FAT is an array of entries of equal size. The number of entries equals the number of clusters on the disk, so there is a one-to-one correspondence between FAT entries and clusters. A FAT entry holds one of the following:
 
Value        Meaning
-----------  ---------------------
0            Free Cluster
-8..-1       End Of File
-9           Bad Cluster
-10h..-0Ah   Reserved Cluster
Other        Next Cluster In Chain
 

Free clusters are those available for allocation. A cluster marked as EOF (End Of File) is the last cluster in a file's chain. Although several values may serve as EOF, DOS uses only -1. Bad clusters are physically damaged clusters; they cannot contain any data. Reserved clusters are reserved for DOS file system use and should not be used by anyone but their creator. In every FAT, the very first entry is reserved.
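
To make the signed ranges concrete, here is one way to write them down as unsigned 16-bit constants (a sketch; the names are mine, and exact encodings vary between FAT versions and belong to the later implementation documents):

    /* The special values above as unsigned 16-bit FAT entries.
       In two's complement, -8 is 0xFFF8, -1 is 0xFFFF, and so on. */
    #define FAT_FREE      0x0000u   /* free cluster                */
    #define FAT_RSVD_MIN  0xFFF0u   /* -10h: first reserved value  */
    #define FAT_RSVD_MAX  0xFFF6u   /* -0Ah: last reserved value   */
    #define FAT_BAD       0xFFF7u   /* -9: bad cluster             */
    #define FAT_EOF_MIN   0xFFF8u   /* -8: first End Of File value */
    #define FAT_EOF       0xFFFFu   /* -1: the EOF value DOS uses  */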

If a FAT entry holds none of the above values, it contains the number of the next cluster of this file. As you can see, the FAT is just a variation of a linked list.
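
Here is a minimal sketch of following such a chain in C. It assumes the whole FAT has been loaded into a hypothetical in-memory array fat[] of 16-bit entries and reuses the constants above:

    #include <stdint.h>

    /* True if a FAT entry lies in the End Of File range (-8..-1). */
    static int is_eof(uint16_t entry) { return entry >= FAT_EOF_MIN; }

    /* Visit every cluster of a file by following the linked list.
       Note what is missing: no check for free or bad entries and no
       protection against loops -- more on that in the limitations. */
    void walk_chain(const uint16_t *fat, uint16_t first_cluster)
    {
        for (uint16_t c = first_cluster; !is_eof(c); c = fat[c]) {
            /* ... process cluster number c here ... */
        }
    }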

Another FAT file system concept is the directory, or folder. A directory is a file that contains file descriptions. A file description consists of a file name, date, flags, size, and the first cluster of the file's cluster chain. A directory itself is stored just like a normal file and also has a FAT chain. A directory entry may point to a nested directory, which gives rise to a directory hierarchy. But where does this all start? Most FAT variations expect the first-level, or root, directory at a certain fixed location on the disk.
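
A file description can be sketched as a structure. The fields below are just the conceptual ones named above; real FAT directory entries are 32 bytes with a fixed layout, which the implementation documents will cover:

    /* Conceptual directory entry: one file description. */
    struct dir_entry {
        char     name[11];       /* 8.3 file name, blank-padded  */
        uint8_t  flags;          /* attributes: directory, etc.  */
        uint16_t date;           /* last modification date       */
        uint16_t first_cluster;  /* head of the file's FAT chain */
        uint32_t size;           /* file size in bytes           */
    };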
 

Example

To make things clear, let us read 512 bytes of the file "C:\DOC\INTEL\INTEL386.TXT", starting from offset 2048 relative to the beginning of this file. The filename should first be split into meaningful tokens: "C:", "DOC", "INTEL", and "INTEL386.TXT", in this order. The first token tells us to select the first primary partition on the first hard drive for reading. "DOC" is the name of a directory that resides in the root directory of "C:". Since we know where the root directory is on the disk (and its size, too), let us look through it and find the entry named "DOC". As I said earlier, one of the fields of this entry is FirstCluster. Now we know where the FAT chain of the directory "DOC" starts.
Reading the clusters of "DOC" in chain order, we look for an entry named "INTEL" and, hopefully, find the subdirectory "INTEL". We know the first cluster of this subdirectory, so let us look for the entry "INTEL386.TXT" in the same manner as we have just looked for the subdirectory.
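
In code, the lookup performed at each level might be sketched as follows, reusing fat[], is_eof(), and struct dir_entry from the earlier sketches. read_cluster() is a hypothetical helper that fetches one cluster from the disk; 8.3 name packing and deleted-entry markers are ignored here:

    #include <string.h>

    #define CLUSTER_SIZE 1024u   /* cluster size assumed in our example */

    extern void read_cluster(uint16_t cluster, uint8_t *buf);

    /* Scan a directory's clusters for a name; return its
       FirstCluster field, or 0 if the entry is not found. */
    uint16_t find_entry(const uint16_t *fat, uint16_t dir_first,
                        const char name[11])
    {
        uint8_t buf[CLUSTER_SIZE];
        for (uint16_t c = dir_first; !is_eof(c); c = fat[c]) {
            read_cluster(c, buf);
            for (uint32_t off = 0;
                 off + sizeof(struct dir_entry) <= CLUSTER_SIZE;
                 off += sizeof(struct dir_entry)) {
                const struct dir_entry *e =
                    (const struct dir_entry *)(buf + off);
                if (memcmp(e->name, name, 11) == 0)
                    return e->first_cluster;   /* found it */
            }
        }
        return 0;                              /* not found */
    }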

Now we know the FirstCluster of our file. Let us imagine that the cluster size for our disk is 1024 bytes. Thus, to reach offset 2048 we must skip two 1024-byte blocks, or two clusters. You know how to do it, don't you? Hint: look through the FAT chain. Finally, let us read our 512 bytes.
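
The final step might look like this sketch, again reusing fat[], read_cluster(), and CLUSTER_SIZE from above, and assuming for simplicity that the request does not cross a cluster boundary:

    /* Read `count` bytes at byte `offset` of the file whose chain
       starts at first_cluster.  For our example, 2048 / 1024 = 2,
       so the loop follows exactly two links before reading. */
    void read_at(const uint16_t *fat, uint16_t first_cluster,
                 uint32_t offset, uint8_t *dest, uint32_t count)
    {
        uint8_t buf[CLUSTER_SIZE];
        uint16_t cluster = first_cluster;
        for (uint32_t skip = offset / CLUSTER_SIZE; skip > 0; skip--)
            cluster = fat[cluster];    /* hint taken: walk the chain */
        read_cluster(cluster, buf);
        memcpy(dest, buf + offset % CLUSTER_SIZE, count);
    }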

The example was intentionally oversimplified, but I hope it showed the idea. As I said, implementation will be discussed in later documents.

Limitations of FAT

If you followed the example carefully, you noticed that I cheated. I did not check all possible error conditions, and some errors in the FAT might have caused trouble, up to infinite loops. Let us take a closer look at FAT's limitations.

First of all, the FAT system is terribly inefficient for large files. You can imagine how many FAT entries have to be scanned to access a file at offset 96 MB: with 1 KB clusters, that is 98,304 links to follow. Even if the whole FAT is cached, the time to look through the chain in memory is not acceptable. This situation can be somewhat corrected by keeping the current position in the FAT, but that only helps sequential access patterns.
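
That correction might be sketched like so: cache the last (links followed, cluster) pair per open file and resume the walk from it, so N sequential reads cost about N link follows in total instead of roughly N*N/2. The structure and names here are hypothetical:

    /* Position cache for one open file.  Forward seeks resume the
       walk; backward seeks must restart at the first cluster. */
    struct chain_pos {
        uint16_t first;     /* first cluster of the file   */
        uint16_t cluster;   /* cluster after `index` links */
        uint32_t index;     /* links followed so far       */
    };

    uint16_t seek_cluster(const uint16_t *fat, struct chain_pos *p,
                          uint32_t target)
    {
        if (target < p->index) {       /* backward: restart */
            p->index = 0;
            p->cluster = p->first;
        }
        while (p->index < target) {    /* forward: resume   */
            p->cluster = fat[p->cluster];
            p->index++;
        }
        return p->cluster;
    }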

Secondly, FAT may lead to high file fragmentation. Because today's storage devices are slow to move heads across the disk surface, space for files should be allocated carefully. This is especially noticeable under multitasking systems, where many files are written to at the same time. So, when a file is being expanded, it is wise to allocate a number of clusters for it in advance.

Thirdly, FAT clustering eats up space. If the cluster size is 8 KB, it is reasonable to expect that each file will take up, on average, 4 KB more disk space than it actually needs. If there are 10,000 files on the disk, 40 MB of space is wasted. However, FAT32 increased the number of available clusters, which reduced cluster sizes to reasonable levels.

Fourthly, FAT is very sensitive to errors. It is almost impossible to restore a file whose chain has been corrupted. Consider, for example, a chain that loops back on itself, two chains that cross-link into the same cluster, or a chain that runs into a cluster marked free or bad.

Correcting any of these problems will likely corrupt the file, but leaving them uncorrected will sooner or later hang the system. FAT provides no means of ensuring data integrity, compression, encryption, or any other nice perversions. With FAT you cannot even detect that a file was corrupted, let alone correct it.

Besides the problems related to the FAT concept itself, there is a large group of problems related to specific FAT implementations. For example, earlier FAT systems expected much vital data at fixed disk addresses, which made disks unusable after corruption of just a few sectors. FAT32 finally made the root directory floating and gave a choice of which copy of the FAT to work with, but the File Allocation Tables themselves still sit at fixed physical addresses. Besides, the directory format of FAT, especially of VFAT, is a mess. This mess is hard enough to handle in the normal case; any error handling becomes a nightmare.

Conclusion

The next documents will describe FAT implementations in detail. I had a choice of either describing them separately from each other or merging them together. I will use the latter method, for several reasons. First of all, there are many similarities between the different types of FAT: most structures are carried over to the next version without modification or with minor extensions. Secondly, implementing all types of FAT at once is probably more productive than implementing them separately. And thirdly, I will try to build a base for integrating different file systems under common protocols. This base will, hopefully, save you much coding time in the future.



Author:  Alex Verstak  3/11/1998