
    Copyright 2001 by
    Constantin Loizides,
    loizides AT informatik DOT uni-frankfurt DOT de

    Last changes:
    Thu Feb  5 09:40:01 CET 2004
    

    Journaling-Filesystem Fragmentation Project

    Tool: Agesystem




    Description

    In order to measure how a filesystem behaves over time and under workload, I decided first to measure how the write performance depends on the free space left in the partition. For reasons that will become clear later, I call these measurements ''gauge measurements''.

    To do the measurements I have written a small C program called agesystem. Starting from a fresh filesystem, it constantly creates new files until the filesystem is completely full, while measuring throughput and CPU usage. The filling of the filesystem proceeds as follows:
    First it generates a directory structure which is subsequently used for the creation of new files. The structure is chosen to be quite flat, e.g. for most of the tests 100 directories, each containing 50 subdirectories. That should be fair enough for all the different filesystems, as no directory holds more than 100 files at the end of the simulation. The directories are then filled homogeneously by constantly creating new files. For each file to be created, the program generates a file size according to a given file size distribution. For all I/O operations agesystem uses the standard system calls, as sketched below.
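
    To make this concrete, here is a minimal sketch of such a fill loop using only standard system calls. All names, the layout constants and the placeholder size distribution are illustrative; the real agesystem source differs (e.g. it draws sizes from the configured distribution and records timing):

        /* sketch of the agesystem fill loop described above (illustrative) */
        #include <fcntl.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <sys/stat.h>
        #include <sys/types.h>
        #include <unistd.h>

        #define NDIRS   100   /* top-level directories (cf. 100x50)  */
        #define NSUB     50   /* subdirectories within each          */
        #define BLKSIZE 4096  /* block size for buffered i/o (cf. -b) */

        /* placeholder for the configured file size distribution */
        static size_t draw_file_size(void) {
            return 1 + (size_t)(rand() % (1 << 14));
        }

        int main(void) {
            char path[256], buf[BLKSIZE];
            memset(buf, 'x', sizeof buf);

            /* 1. create the flat directory hierarchy */
            for (int d = 0; d < NDIRS; d++) {
                snprintf(path, sizeof path, "d%03d", d);
                mkdir(path, 0755);
                for (int s = 0; s < NSUB; s++) {
                    snprintf(path, sizeof path, "d%03d/s%02d", d, s);
                    mkdir(path, 0755);
                }
            }

            /* 2. fill the directories homogeneously with new files
             *    until the filesystem is full                       */
            for (long n = 0; ; n++) {
                size_t left = draw_file_size();
                snprintf(path, sizeof path, "d%03ld/s%02ld/f%08ld",
                         n % NDIRS, (n / NDIRS) % NSUB, n);
                int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
                if (fd < 0)
                    break;                      /* e.g. out of inodes */
                while (left > 0) {
                    size_t chunk = left < sizeof buf ? left : sizeof buf;
                    if (write(fd, buf, chunk) < 0) { /* ENOSPC: full */
                        close(fd);
                        return 0;
                    }
                    left -= chunk;
                }
                close(fd);
                /* the real tool measures elapsed and CPU time here */
            }
            return 0;
        }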

    Download

    You may download agesystem and try it yourself. After downloading the package, create a directory and unpack it there. You will probably first want to change some paths in the Makefile, and you may also have to download the GNU Scientific Library. Then either use ''agesystem.sh'' or type ''make agesystem'' followed by ''agesystem -h'' to get additional help.
    The ''timing result'' output tells you, for every 5% of used blocks in the partition, the time needed, the CPU time used and the number of bytes written. If you send the output to me I can make a histogram out of it.
    There is a compile-time define, ''#define GAUGE'', which prevents agesystem from really aging the system.
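
    Such a switch typically works via conditional compilation, roughly as in the following illustration (mechanism only; the code surrounding agesystem's actual GAUGE define differs):

        /* build with: gcc -DGAUGE demo.c   (file name illustrative) */
        #include <stdio.h>

        int main(void) {
        #ifdef GAUGE
            puts("gauge mode: fill and measure only, do not age the system");
        #else
            puts("aging mode");
        #endif
            return 0;
        }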

    Usage of agesystem v0.3
             -b blksize                     blocksize to use for buffered i/o
             -c dir_counter                 number of subdirectories to create and use for aging
             -d dirname                     change to directory "dirname"
             -f                             use fixed length of bytes, given in blocksize
         -l minimum_file_size_in_bytes  set minimum file size to minimum_file_size_in_bytes
             -n number_of_files             maximum number of files to create
         -m mean_file_size_in_bytes     set mean file size to mean_file_size_in_bytes
             -o access_mode                 set access mode to  "access_mode" 
                (eg. cw=65, cws=4161, cwa=1089,cwas=5185, cwt=577,cwts=4673)
             -h,-?                          this text
             -s file_no                     start with filenumber "file_no"
             -t                             test input and exit
             -u max_fs_usage                max filesystem usage to reach
             -v                             be verbose
             -y seed                        set random generator seed
             -z                             sync with unmount
    
    You may play around with it, but note that I update agesystem silently from time to time.
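
    For example, assuming a test partition mounted under /mnt/test (path and values illustrative), a gauge run could look like:

        agesystem -d /mnt/test -c 100 -l 1024 -m 16384 -u 95 -y 42 -v

    i.e. change into /mnt/test, use 100 subdirectories, draw file sizes with a minimum of 2^10 and a mean of 2^14 bytes, stop at 95 percent filesystem usage, fix the random seed and be verbose.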



    Gauge Measurements

    Test System

    The gauge measurements for writing were taken on the following test system

    • Hardware: AMD Duron 700 MHz, 128 MB RAM, 40 GB EIDE hard disk for the system, with the tests taken on:
      • Adaptec AHA-2940U2/W host adapter
      • 9 GB IBM DNES-309170W SCSI-harddisk
    • Software: Red Hat 7.1 (gcc 2.96 and 2.95-3), kernel 2.4.5 with the following official patches for the filesystems that were not shipped with it:
      • XFS version 1.0 (06112001) for 2.4.5
      • JFS version 1.0 for 2.4.5
      • Ext3 version 0.8 for 2.4.5
    using test partition sizes of 500 MB, 1 GB and 4 GB. Unfortunately, the system was not completely idle during the measurements (e.g. X and cron jobs were running, too).



    Results for the 500 MB partition

    At first I used a partition of 550 MB, which means creating between 120,000 and 250,000 files, depending on the filesystem type of the partition, the block size and the mount options used (e.g. -notail).

    The file size distribution looks as shown for the 550 MB case. It is created using three types of files (a sampling sketch follows the list):

    1. with 89 % probability, draw from a tailed normal distribution with a minimum file size of 0 and a mean of 2^10 bytes,
    2. with 10 % probability, draw from a tailed normal distribution with a minimum file size of 2^10 and a mean of 2^14 bytes, and
    3. with 1 % probability, draw from a tailed normal distribution with a minimum file size of 2^14 and a mean of 2^17 bytes.
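
    One way to read this rule in code, using the GNU Scientific Library mentioned above (interpreting a ''tailed'' normal as min + |N(0, sigma)| is my assumption; the exact rule in agesystem may differ):

        /* sketch of the three-component file size distribution above */
        #include <math.h>
        #include <stdio.h>
        #include <gsl/gsl_randist.h>
        #include <gsl/gsl_rng.h>

        static double draw_size(gsl_rng *r) {
            double u = gsl_rng_uniform(r);      /* uniform in [0,1) */
            double min, sigma;
            if (u < 0.89)      { min = 0;     sigma = 1024;   } /* 89 %: ~2^10 */
            else if (u < 0.99) { min = 1024;  sigma = 16384;  } /* 10 %: ~2^14 */
            else               { min = 16384; sigma = 131072; } /*  1 %: ~2^17 */
            return min + fabs(gsl_ran_gaussian(r, sigma));
        }

        int main(void) {
            gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
            gsl_rng_set(r, 42);                 /* cf. the -y seed option */
            for (int i = 0; i < 10; i++)
                printf("%.0f\n", draw_size(r));
            gsl_rng_free(r);
            return 0;
        }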

    • Ext2's write performance is fairly constant over the used block space. For a block size of 1024 I repeated the measurement, as it is peculiar that the performance improves when the partition is almost full! Does anyone have an idea why? Please let me know! (corresponding Ext2 CPU usage)
      Here and in the following:
      "100x50" always means a directory hierarchy of 100 directories, each again containing 50;
      "without-deletion" always means "gauging" as I introduced it above.
    • ReiserFS's write performance drops quickly, probably because of the overhead of adjusting the tree. Omitting the tail fragments by mounting with "-notail" improves the performance (both absolutely and in shape); omitting writing the log with "-nolog" does not always improve the performance, which might be due to low statistics and other processes on the test machine, but, more importantly in my case, it does not change the shape of the curve. Another good thing: the layout of the directories (completely flat 1000 directories versus the 100x50 version mentioned above) has no influence on the shape. For people who judge performance rather than shape (!) between the filesystems I want to add that ReiserFS without "-notail" has the lowest internal fragmentation, 10 percent, compared to all other filesystems (except ext2 with 1024 block size), which have an internal fragmentation of about 50 percent (see the sketch after this list)! (corresponding ReiserFS CPU usage)
    • JFS performs quite well for the first half of the plot and then breaks down at 50 percent usage. I did a second measurement restricting the log to 40 MB and to 10 MB, because the performance loss could be due to the default log size of 40 percent that JFS uses when mkfs.jfs is not given a different option. The difference is barely visible. I am not aware of the details of JFS, but at first glance the shape of the curve looks really bad. (corresponding JFS CPU usage)
    • XFS's performance looks pretty much the same as JFS's; it also breaks down at 50 percent usage. To be sure, I measured again with a 10 MB log size (notice that ReiserFS's log size is 32 MB by default). It seems to make a small difference. In a third measurement I also used a completely different partition at the beginning of the (SCSI) disk (as opposed to at the end, as usual), with the same number of cylinders of course, which also did not produce any significant change. It might be possible that both JFS and XFS are designed for larger partitions, but why is XFS then called ''scalable''? Look below at the tests on bigger partition sizes and find the same problem. (corresponding XFS CPU usage)
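
    To put those internal fragmentation numbers into context: internal fragmentation is the share of allocated space that is wasted because the last block of each file is only partially filled (ReiserFS avoids it by packing tails). A minimal per-file calculation (my own illustration, not agesystem code):

        #include <stdio.h>

        /* bytes wasted in the last, partially filled block of one file */
        static unsigned long wasted_bytes(unsigned long size, unsigned long blksize) {
            unsigned long blocks = (size + blksize - 1) / blksize;   /* ceil */
            return blocks * blksize - size;
        }

        int main(void) {
            /* e.g. a 100-byte file on 4096-byte blocks occupies one
               block and wastes 3996 bytes, i.e. almost the whole block */
            printf("%lu\n", wasted_bytes(100, 4096));
            return 0;
        }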

    Results for the 1 GB partition

    For the gauge measurements on partitions of 1 GB and 4 GB I used bigger files on average, according to a creation rule as