	The enclosed files are an attempt to measure and profile the
kernel buffer cache and scsi disk access.  To do this, I have taken a
number of source code modules from the kernel and, usually with very
slight modifications, run them as a user-mode program.  The advantage
of doing this is that you can run gdb and gprof on the code to
determine what is going on.

	Currently we start at the top level with bmark.c.  This
supplies a number of kernel functions that are not normally available
in user mode, and it performs the initial setup.  We begin by
simulating a 15Mb buffer cache - each buffer header actually shares
the same buffer, so you do not need 15Mb of memory to run this.

	Next we set up the scsi disks.  The scsi_debug.c module was
designed for these sorts of applications because it simulates the
presence of two scsi disks.  For our purposes we only need one, but it
was easier to leave it alone.  I modified the scsi_debug driver so that
all "accesses" to the scsi disks complete immediately without delay.
We are trying to measure delays and latencies that do not include
waiting for a scsi request to complete, so we are not interested in
simulating the actual time that a real scsi disk would take to
complete the request.  If someone wanted to get really fancy, you
could wire the srawread program into the scsi_debug program and
simulate the entire thing from stem to stern.

	At some point I would like to start including some modules
from ext2 and simulate file access.  This would require us to somehow
keep track of some dirty buffers for bitmaps, but it should be doable.

	Anyway, as currently configured, the program is set up to
simulate reading a scsi disk partition in 1024 byte increments.  The
clustering diffs are not included in this distribution, but could
easily be used because most of the relevant source files are in fact
direct copies of the actual kernel sources with very minor
modifications.  I have rigged the test in such a way that clustering
will not make much of a difference anyway, because it would appear
that most of the overhead is above the scsi code.  The srawread
results with real scsi disks seem to bear this out.

	To run the program type:

./bmark 10

to simulate reading a 10Mb file.

	Note that with this version a number of configuration options
are specified in the file config.h.  There are comments that explain each
one and what it does.

	One thing that I find interesting is that this program reports
data rates of approximately 7.5Mb/sec (my machine is a 486/33).  This
would be a theoretical maximum for a 486/33 that you could probably not
surpass even with the best VLB or EISA board.

	The current Makefile is set up to compile the program to do
profiling.  The performance that the program reports will not be as
good with profiling installed, because of the additional overhead.
Note that profiling is badly broken in most versions of libc out there
- you should get the extra-*.tar.gz from a libc 4.5.4 or later to get
meaningful profile reports.

	I am enclosing a portion of the report for a 30Mb file.

	Also note that when profiling is in use, the program is
noticeably slower - instead of 7.5Mb/sec, I get something like
5.5Mb/sec.

-Eric

Note: I have noticed that the memset function is not well suited
to zeroing large chunks of memory (i.e. buffers).  It zeroes the
buffer on a byte-by-byte basis - it is almost 4 times faster to zero
on a longword-by-longword basis.  This probably does not come up very
often, but it is helpful to keep in mind.
