
    Journaling-Filesystem Fragmentation Project

    Gauge Performance




    Description

    To measure the gauge performance of the file systems (this time for reading and writing) on my test system with my test program (e.g. agesystem), I arranged the following setup as a shell script.
    In this way I find out about the performance of a fresh file system filled up to a certain percentage.

    Download

    You may download the necessary files and try it yourself. After downloading the package, create a directory in which to untar it. You will probably need to adjust some paths; then type ''make agesystem; make fibmap; make read'' and look at the script ''rwtest.sh'', which shows how to set up the options described above. If you would like to get some graphs out of the resulting output, send the output files to me (tarred and gzipped, please!).



    Measurements on 2.4.5

    Test System

    The hardware is the same as in the agesystem section.

    • AMD Duron 650 MHz, 128 MB RAM, 40 GB EIDE hard disk holding the system; the tests were run on a separate SCSI disk:
      • Adaptec AHA-2940U2/W host adapter
      • 9 GB IBM DNES-309170W SCSI-harddisk
    The file systems tested are ReiserFS with and without tails, XFS and JFS on 2.4.5.
    • Software: SuSE 7.2 (gcc 2.95-3), kernel 2.4.5 with the following official patches for the file systems that were not shipped with it:
      • XFS version 1.0 (06112001) for 2.4.5
      • JFS version 1.0 for 2.4.5



    Graphical Output Explanation

    The graphical output shown is divided into six parts and reads as follows:
    • The first part consists of the legend and some remarks describing the graph as a whole.
    • The second part shows the absolute write performance in KB per second from the agesystem output file for 25%, 50%, 75% and 100% usage.
    • The third part shows the read results for 25%, 50%, 75% and 100% usage. For each file system and usage you see two small bars. The first is the absolute read performance in KB per second, measured with the timer starting just before the read system call and stopping right after it while recursively walking through all files in all directories. The second small bar should be lower than the first: it includes the time needed to traverse all files in all directories plus reading them, i.e. the timer is started before all file reads and stopped after the last file is read (see the sketch after this list). For a good read test the bars should not differ too much, with the second bar smaller than the first.
    • The fourth part shows some metadata performance. The first small bar is the number of files stated (and read) during the recursive walk of read, divided by the above time difference, i.e. the total run time of read minus its accumulated read time, which is mainly the time needed to open and read the directories plus stating the files in them. The second bar is the number of files found while fibmap recursively walks through all directories, stating (and fibmapping) all files but not reading them, divided by its running time. Again, the two bars should be of the same order of magnitude, with the second lower than the first. At least that is what I thought before the measurements...
    • The fifth part shows the internal fragmentation (first small bar) and the external fragmentation (second small bar) reported by fibmap.
    • The sixth part shows the fragmentation path result of fibmap.
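
    To make the two read timers concrete, here is a minimal sketch of the timing
    scheme described above. It is not the source of the actual read tool (that is in
    the download package); the directory walk, error handling and output are
    simplified.

    #include <dirent.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <sys/time.h>
    #include <unistd.h>

    static double read_seconds = 0;   /* time accumulated inside read() only */
    static long long bytes_read = 0;

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    static void walk(const char *path)
    {
        static char buf[65536];
        char name[4096];
        struct dirent *de;
        DIR *dir = opendir(path);

        if (!dir)
            return;
        while ((de = readdir(dir)) != NULL) {
            struct stat st;
            if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
                continue;
            snprintf(name, sizeof(name), "%s/%s", path, de->d_name);
            if (stat(name, &st) < 0)
                continue;
            if (S_ISDIR(st.st_mode)) {
                walk(name);                      /* recurse into subdirectory */
            } else if (S_ISREG(st.st_mode)) {
                int fd = open(name, O_RDONLY);
                ssize_t n;
                if (fd < 0)
                    continue;
                do {
                    double t0 = now();           /* first timer: around read() only */
                    n = read(fd, buf, sizeof(buf));
                    read_seconds += now() - t0;
                    if (n > 0)
                        bytes_read += n;
                } while (n > 0);
                close(fd);
            }
        }
        closedir(dir);
    }

    int main(int argc, char **argv)
    {
        double t0, total;

        t0 = now();                              /* second timer: the whole walk */
        walk(argc > 1 ? argv[1] : ".");
        total = now() - t0;
        printf("pure read   : %.0f KB/s\n", bytes_read / 1024.0 / read_seconds);
        printf("walk + read : %.0f KB/s\n", bytes_read / 1024.0 / total);
        printf("meta time   : %.2f s (total run time minus accumulated read time)\n",
               total - read_seconds);
        return 0;
    }

    The difference of the two timers, printed as ''meta time'' here, corresponds to
    the time difference used for the first bar of the metadata part.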


    Variable File Size Results for different directory structures

    In the agesystem section, a dependence between the write performance and the chosen directory structure was shown. To test this again, I used the following setup:
    • Partition Size: 1 GB
    • File systems: Reiser, Reiser with -o notail, XFS and JFS
    • File size distribution: as in the agesystem section
    • Directory structures: 100x1, 100x10, 100x50 and 100x100

    The resulting graphs per file system comparing the different directory structures:
    • ReiserFS
      The performance ordering is clear: 100x1 is best, followed by 100x10 and 100x50, and 100x100 is worst. Only the write performance behaves strangely up to 50% usage. Note that there is no internal fragmentation.
    • ReiserFS with notail
      The overall picture is the same as for Reiser, but the write performance behaves even more strangely: 100x100 is better than 100x50, which in turn is better than 100x10, with a big jump to 100x1, which outperforms the other structures.
    • XFS
      The picture is again the same as for Reiser. Note that XFS shows no external fragmentation.
    • JFS
      Again, the picture is already known. Note that the value at 25% usage for fibs is very high in all cases but breaks down for the fuller cases.
    Without comments, the resulting graphs as a comparison between the file systems: 100x1 100x10 100x50 100x100


    Fixed Size Results for different sizes

    In this case I create files of a fixed size until the partition is filled; this gives a feeling for the performance at different file size magnitudes (see the sketch after the following list):
    • Partition Size: 1 GB
    • File systems: Reiser, Reiser with -o notail, XFS and JFS
    • Directory structure: 100x10
    • File sizes used: 4 KB, 64 KB and 1024 KB
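
    A minimal sketch of this fill loop: files of the chosen fixed size are written
    one after another until a write fails, most likely with ENOSPC. The flat file
    numbering is a simplification; the real test spreads the files over the 100x10
    directory structure.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        const char *dir = argc > 1 ? argv[1] : ".";
        long size_kb = argc > 2 ? atol(argv[2]) : 64;    /* 4, 64 or 1024 KB */
        static char block[1024];
        char name[4096];
        long i, k;

        for (i = 0; ; i++) {
            int fd;

            snprintf(name, sizeof(name), "%s/file%06ld", dir, i);
            fd = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
            if (fd < 0)
                break;
            for (k = 0; k < size_kb; k++) {
                if (write(fd, block, sizeof(block)) != (ssize_t)sizeof(block)) {
                    close(fd);                   /* most likely ENOSPC: disk is full */
                    printf("partition full after %ld files\n", i);
                    return 0;
                }
            }
            close(fd);
        }
        printf("stopped after %ld files\n", i);
        return 1;
    }
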
    The resulting graphs per file system show the expected result that the performance is best for fewer, larger files.
    Without comments, the resulting graphs as a comparison between the file systems: 4 KB size 64 KB size 1 MB size


    Append Variable Size Results for different number of files

    In this case I create and then append to a maximum number of files. If a file does not exist yet, it is created; if it already exists, it is appended to. In both cases the number of bytes is drawn from the same distribution.
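
    A minimal sketch of this create-or-append step: a file index is drawn at random
    from the maximum number of files, and O_CREAT|O_APPEND creates the file on first
    use and appends to it on every later use. draw_size() is only a placeholder for
    the real file size distribution of agesystem, and the directory layout is
    simplified to a single, already existing directory.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    static long draw_size(void)                  /* placeholder distribution */
    {
        return 1024 * (1 + rand() % 64);         /* 1..64 KB, uniform, sketch only */
    }

    static int create_or_append(const char *dir, int maxfiles,
                                const char *buf, size_t buflen)
    {
        char name[4096];
        long left = draw_size();
        int fd;

        snprintf(name, sizeof(name), "%s/file%06d", dir, rand() % maxfiles);
        fd = open(name, O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0)
            return -1;
        while (left > 0) {
            size_t chunk = left < (long)buflen ? (size_t)left : buflen;
            if (write(fd, buf, chunk) < 0)
                break;                           /* e.g. the partition ran full */
            left -= chunk;
        }
        close(fd);
        return 0;
    }

    int main(void)
    {
        static char buf[65536];
        int i;

        for (i = 0; i < 100000; i++)             /* number of steps, arbitrary here */
            if (create_or_append("testdir", 10000, buf, sizeof(buf)) < 0)
                return 1;
        return 0;
    }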

    As expected, in all cases the external fragmentation decreases with a growing number of files. But the read, and sometimes also the write, performance does not seem to be correlated with that (abstract) number. It rather seems that the total number of files, and therefore the length of each file, together with the various caches (on disk and in the buffer/page cache) determines the overall performance. The resulting graphs per file system compare the different numbers of files. Without comments, the resulting graphs as a comparison between the file systems: 10000 files 20000 files 30000 files



    Measurements on 2.4.8-ac1

    Improvements, or at least changes, compared to the above measurements:
    • Newer kernel with a working VM (2.4.8-ac1)
    • Restrict memory to 32 MB to reduce cache effects (append="mem=32" to lilo.conf)
    • Switch off swap to prevent swapping during the measurements
    • Sync for sure by using umount instead of sync (look for clsync in the code; a sketch follows after this list)
    • Cache flushes due to using umount between the different parts of the measurement
    • Addition of a random read test in which randomly chosen files are read sequentially
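
    The ''sync for sure'' point can be pictured with the following sketch of a
    clsync-style helper; the real clsync is part of the downloadable code, and the
    device, mount point and file system type used here are only example values.

    #include <stdio.h>
    #include <sys/mount.h>

    /* Unmounting writes all dirty data and metadata of the test partition to disk
     * and drops its cached pages; remounting starts the next measurement phase
     * with cold caches.  Must be run as root. */
    static int clsync(const char *dev, const char *mnt, const char *fstype)
    {
        if (umount(mnt) < 0) {
            perror("umount");
            return -1;
        }
        if (mount(dev, mnt, fstype, 0, NULL) < 0) {
            perror("mount");
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        /* example arguments only, not the ones used in the real scripts */
        return clsync("/dev/sda1", "/mnt/test", "reiserfs");
    }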

    Test System

    The hardware is the same as above.

    • AMD Duron 650 MHz, 128 MB RAM, 40 GB EIDE hard disk holding the system; the tests were run on a separate SCSI disk:
      • Adaptec AHA-2940U2/W host adapter
      • 9 GB IBM DNES-309170W SCSI-harddisk
    Due to a lack of simulation time, problems with the Linux 2.4 VM and no time to patch another kernel every day, the file systems tested are ReiserFS with and without tails and Ext2 on Alan's 2.4.8-ac1 kernel.
    • Software: SuSE 7.2 (gcc 2.95-3), kernel 2.4.8-ac1 restricted to 32 MB of memory, without swap.



    Graphical Output Explanation

    The graphical output shown differs slightly from the one above. It is again divided into six parts and reads as follows:
    • The first part consists of the legend and some remarks describing the graph as a whole.
    • The second part shows the absolute write performance in KB per second from the agesystem output file for 25%, 50%, 75% and 100% usage.
    • The third part shows the read results for the different usage sizes. For each file system and usage you see two small bars. The first is the absolute read performance in KB per second, measured with the timer starting just before the read system call and stopping right after it while recursively walking through all files in all directories. The second small bar should be lower than the first: it includes the time needed to traverse all files in all directories plus reading them, i.e. the timer is started before all file reads and stopped after the last file is read. For a good read test the bars should not differ too much, with the second bar smaller than the first.
    • The fourth part measures the random read performance when files are randomly chosen and then read sequentially. The first run of read stores the file names and paths in memory, so this time no stat or readdir overhead is needed. Again there are two columns for the random read performance, which should now be ''more equal'', because the difference between the total accumulated read time and the total running time of the random read test is smaller than when recursively walking through the directories stating each file.
    • The fifth part shows some metadata performance. The first small bar is the number of files stated (and read) during the recursive walk of read, divided by the time difference mentioned above, i.e. the total run time of read minus its accumulated read time, which is mainly the time needed to open and read the directories plus stating the files in them. The second bar is the number of files found while fibmap recursively walks through all directories, stating (and fibmapping) all files but not reading them, divided by its running time. Again, the two bars should be of the same order of magnitude, with the second lower than the first. At least that is what I thought before the measurements...
    • The sixth part shows the internal fragmentation (first small bar), the external fragmentation (second small bar) and the fragmentation path results (third small bar) reported by fibmap (a sketch of the underlying FIBMAP ioctl follows after this list). If the maximum frag path shown is greater than one (its minimum value), its logarithm is plotted, which is marked on the left (it says ''PatL'' instead of ''Path'').
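
    The fibmap tool is built on the FIBMAP ioctl. The sketch below only shows the
    raw mechanism: each logical block of a file is mapped to its physical block
    number so that contiguity breaks (extents) can be counted; the actual
    internal/external fragmentation and fragmentation path numbers follow the
    formulas from the definitions section. Note that FIBMAP requires root
    privileges.

    #include <fcntl.h>
    #include <linux/fs.h>                        /* FIBMAP, FIGETBSZ */
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        struct stat st;
        int fd, bsz, nblocks, i, prev = -2, extents = 0;

        if (argc < 2)
            return 1;
        fd = open(argv[1], O_RDONLY);
        if (fd < 0 || fstat(fd, &st) < 0)
            return 1;
        if (ioctl(fd, FIGETBSZ, &bsz) < 0)       /* file system block size */
            return 1;
        nblocks = (st.st_size + bsz - 1) / bsz;

        for (i = 0; i < nblocks; i++) {
            int blk = i;                         /* in: logical block number */
            if (ioctl(fd, FIBMAP, &blk) < 0)     /* out: physical block number */
                return 1;
            if (blk == 0) {                      /* hole, no block allocated */
                prev = -2;
                continue;
            }
            if (blk != prev + 1)                 /* jump: a new extent starts */
                extents++;
            prev = blk;
        }
        printf("%s: %d block(s) of %d bytes in %d extent(s)\n",
               argv[1], nblocks, bsz, extents);
        printf("slack in the last block: %ld bytes\n",
               (long)nblocks * bsz - (long)st.st_size);
        close(fd);
        return 0;
    }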


    Variable File Size Results for different directory structures

    To test the directory structure dependence, I used a setup like the one above, with the directory structures listed further below. The picture is now quite clear:
    • The write performance is influenced by two factors:
      The first is intrinsic to the file system: the cost of handling a certain number of files per directory.
      The second is determined by the way agesystem chooses the directories to work in: it proceeds from directory to directory in a defined order, writing only one file in each directory. That is, the dcache determines a certain amount of the write performance. A test to demonstrate this could be to write a certain number of files to one directory at once before proceeding to the next directory (see the sketch after this list).
      Both effects are enhanced by the file size distribution, which in the tested cases strongly peaks at small files.
    • The read performance shows the same dependence. In some cases, mainly for Ext2, one can also see the effect of having to handle a large number of files per directory (the two bars differ!).
    • As a major difference, the random read performance saturates for all setups, which is due to the expensive disk seek times that mainly determine the performance for the chosen small file size distribution.
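
    The proposed test can be sketched as follows: the same files are distributed
    over the same directories, once in the round-robin order that agesystem uses
    (one file per directory before moving on) and once in batches; write_one_file()
    is only a placeholder for the actual file creation.

    #include <stdio.h>

    static void write_one_file(int dir, int file)
    {
        /* placeholder: create and write one file in directory ``dir'' */
        printf("dir%03d/file%05d\n", dir, file);
    }

    /* agesystem-style: the directory changes after every single file */
    static void fill_round_robin(int ndirs, int nfiles)
    {
        int i;
        for (i = 0; i < nfiles; i++)
            write_one_file(i % ndirs, i);
    }

    /* proposed variant: stay in one directory for a whole batch of files,
     * which should be friendlier to the dcache */
    static void fill_batched(int ndirs, int nfiles, int batch)
    {
        int i;
        for (i = 0; i < nfiles; i++)
            write_one_file((i / batch) % ndirs, i);
    }

    int main(void)
    {
        fill_round_robin(100, 1000);             /* e.g. a 100x10 structure */
        fill_batched(100, 1000, 10);
        return 0;
    }
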
    The resulting graphs per file system comparing the different directory structures in total:

    Reiser / Reiser -o notail / Ext2 / Comments
    • 10:
      • ReiserFS: breakdown in read performance between 1x1 and 10x1 (resp. 1x10)! Why that big difference?
      • ReiserFS notail: breakdown in read performance between 1x1 and 10x1 (resp. 1x10)! Why that big difference?
      • Ext2: the overhead of maintaining many files per directory is clearly seen. Bad stating performance.
      Overall best case for Reiser: many small files per directory.
    • 100:
      • ReiserFS, ReiserFS notail, Ext2: this is a case where the directory structure shows no relevance.
    • 1000:
      • ReiserFS, ReiserFS notail: the write performance becomes bad after a very good start and is even significantly worse than for 10000 directories.
      • Ext2: it looks like between 100 and 1000 entries can be cached: 100x10 and 10x100 are slightly better for writing and random read, and worse for recursive walks because the directory operations become expensive.
      The directory structure shows up for Ext2.
    • 10000:
      • ReiserFS: 1x10000 and 10000x1 are faster for read, but slower for random read.
      • ReiserFS notail: it seems that the extreme structures 1x10000 and 10000x1 are better than the moderate ones.
      • Ext2: again, the moderate structures are favoured for writing and random read; 1x10000 and also 10000x1 are better for recursive operations.
      The directory structure shows up for Ext2 and slightly less for Reiser.

    Without comments, the resulting graphs as a comparison between the file systems:
    1x1 10x1 1x10
    100x1 10x10 1x100
    1000x1 100x10 10x100 1x1000
    10000x1 1000x10 100x100 10x1000 1x10000