Multiuser runs are performed by running multiple copies of webclient under the control of the perl program run.workload. This perl program provides various convenience functions, the most important of which are its ability to perform a clean ramp-up and shutdown of the workload, and to exclude the ramp-up and ramp-down periods from statistics gathering. The control program uses a special shared-memory area to coordinate the clients; the control features are not reproducible with a simple "roll-your-own" shell script.
The file run.workload is primarily a configuration file, with the actual work being done by the program workload.pl. You should review the contents of run.workload and make sure that reasonable defaults and values have been chosen for your run. Typically, one starts the program as follows:

run.workload url.session.input 30 5.0 3600 prefix_

This will launch 30 copies of webclient, run for 3600 seconds, using url.session.input as its input file, and set the default think time to 5.0 seconds (overriding any think times specified in the input file). The runtime of 3600 seconds is exclusive of the startup and shutdown times (which can be significant, especially when a large number of clients are specified). The output reports are placed in a directory named prefix_3600_30. This directory name is built up from the prefix, the length of the run, and the number of clients. The directory is created automatically.
Be sure to pick a run duration that is long enough to collect a reasonable amount of data. For complex sites or complex workloads, runs of less than an hour can lead to choppy and uneven statistics. Note that sessions with non-zero think times can take minutes or tens of minutes to run, depending on the complexity of the session. Thus, you want to pick a run duration that allows at the very least 4 or 5 sessions to play through. Remember that no statistics are collected until a new session is started after ramp-up is complete, and statistics from partially-completed sessions at the end of the run are also discarded. Thus, typically 2 or 3 sessions are lost to ramp-up and ramp-down times, which is why a minimum of 4 or 5 sessions is recommended.
Here are some of the things that can be tuned or controlled by modifying run.workload:

$options -- set to specify options to webclient. Options such as -A or -S SSLV3:4 are placed here.

$webserver -- specifies the webserver host being used.

$sleep_interval -- specifies how long run.workload waits between printings of statistics.

$nservers -- the number of webservers on the host to use. Each webserver on the host is assumed to have a distinct port number. Set this to one to use only one webserver on the host.

$port[] array -- the array of port numbers. If $nservers is set to 1, then $port[0] is the only entry that is used.

$nstart -- run.workload starts the child webclients in groups. This number specifies the size of each group.

$dest -- the name of the directory where the results will be placed.

$custid[] array -- provides the customer ids for the child runs.

$pin[] array -- provides the pin numbers for the child runs.

$passwd[] array -- provides the passwords for the child runs.

$seed[] array -- provides the random seeds used by the child runs. Distinct seeds are required to get distinct think times and randomization values in each child.

$shmkey -- used to identify the shared memory area to the children. The value does not matter and should not need to be changed unless another program (perhaps another instance of run.workload) that uses the same shared memory key exists on the local system.
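Since run.workload is itself Perl, these tunables are plain Perl variable assignments near the top of the script. A minimal sketch of what such a configuration block might look like (the variable names are those listed above; the values are illustrative assumptions, not defaults taken from the script):

$options        = "-A";          # extra flags passed to webclient
$webserver      = "testhost";    # host running the webserver under test
$sleep_interval = 30;            # seconds between statistics printouts
$nservers       = 1;             # number of webservers on the host
@port           = (80);          # only $port[0] is used when $nservers == 1
$nstart         = 10;            # size of each group started during ramp-up
$dest           = "prefix_3600_30";              # results directory
$shmkey         = 0x1234;        # shared memory key; change only on a collision
@seed           = map { 1000 + $_ } (0 .. 29);   # distinct per-child random seeds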
In certain places in run.workload, the program updshm is used to update the shared memory area instead of the Perl routine shmwrite(). This is because shmwrite() was causing segmentation faults when it should not, so it was replaced with the updshm program instead.
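The workaround amounts to shelling out to a small helper instead of calling shmwrite() in-process. A minimal sketch, assuming updshm takes the shared memory key, a byte offset, and a value on its command line (the real argument convention of updshm is not documented here):

# Assumed calling convention for the updshm helper; see the updshm
# source for the actual arguments it expects.
sub update_shared_counter {
    my ($shmkey, $offset, $value) = @_;
    system("updshm", $shmkey, $offset, $value) == 0
        or die "updshm failed: $?";
}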
The run.workload script goes through a sequence of steps to ramp up the workload on the server and start statistics gathering. The sequence is enumerated below.

1. All of the specified copies of webclient are started. As each one starts, it goes through some basic initialization, after which it forks itself off into the background. If any of these fail during initialization, run.workload stops and all clients are killed. Once each client has finished initializing, it increments a flag in shared memory, indicating that it is ready to start running requests. It will not actually run any requests until run.workload sets the 'start' flag.

2. run.workload waits until all children have indicated that they have initialized.

3. run.workload then increments a second global variable by $nstart. This causes the first $nstart children to begin submitting requests. run.workload waits at least 12 seconds (by default), or longer, until each of the children has actually completed at least one request, before starting another group of $nstart children. (A sketch of this loop follows below.)

4. run.workload sets a flag in global memory indicating that all children are now actively submitting requests. At this point, ramp-up is complete.

5. At the end of the run, each webclient is sent a SIGUSR1 signal, telling it to shut down. Data collected for the partially-completed session that was running when the signal was caught is discarded. Then each client prints out the statistics that it has collected. By discarding the partially completed run at the tail end, the statistics that are presented are always for a whole number of session repetitions.
Every $sleep_interval seconds during the run, run.workload takes a snapshot of the current values of the statistics in the global memory area and prints these to stdout, in order to simplify live monitoring of a run. Note that the printed statistics exclude any data gathered during ramp-up, and thus should present an accurate measure of throughput. However, they will not match exactly the summary statistics generated by sumstats, since sumstats uses only whole sessions, with partially completed sessions trimmed from the beginning and end of the run. Thus, sumstats will typically report a smaller number of completed URLs, completed in a smaller amount of time. The average throughput should be the same, though, within statistical fluctuations.
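A minimal sketch of the ramp-up coordination described in steps 2 through 4 above, assuming hypothetical helpers shm_read_counter() and shm_write_counter() that stand in for the script's actual shmread()/updshm calls:

# Illustrative only; names and details differ in the real run.workload.
# Wait for every child to report that its initialization is done.
sleep 1 while shm_read_counter("ready") < $number_of_users;

# Release the children in groups of $nstart.
my $released = 0;
while ($released < $number_of_users) {
    $released += $nstart;
    shm_write_counter("start", $released);  # first $released children may now run
    sleep 12;                               # default minimum wait between groups
    # Do not start the next group until each released child has
    # completed at least one request.
    sleep 1 while shm_read_counter("completed_one") < $released;
}
shm_write_counter("rampup_done", 1);        # ramp-up complete; statistics now count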
Below is an example of the output from run.workload:
T ELAP STP STL SESS REQS CONNS GIFS KBYTES GIF-KB REQ FIRST END to
I for this interval: KB-RATE RATE DATA END
--------------------------------------------------------------------------
T 30 0 0 74 74 370 370 1736 702 2.47 0.135 1.135
I 57.87 2.47 0.135 1.135
T 60 0 0 145 145 725 725 3402 1376 2.42 0.145 1.152
I 55.52 2.37 0.155 1.169
T 90 0 0 219 219 1095 1095 5138 2078 2.43 0.187 1.151
I 57.87 2.47 0.270 1.149
Notice that there are two kinds of lines, labelled T and I. The T line shows totals since the beginning of the run; the I line shows statistics for the most recent interval. In the above, the interval is 30 seconds long.
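The numbers can be checked by hand from the sample above: at elapsed time 30, 1736 KB were fetched in 30 seconds, and 1736/30 = 57.87 KB/sec, which matches the KB-RATE on the first I line. At elapsed time 60, the interval fetched 3402 - 1736 = 1666 KB, and 1666/30 is approximately 55.5 KB/sec, matching (to within rounding) the 55.52 on the second I line.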
This output is primarily useful in keeping an eye on the run; more detailed and complete statistics can be obtained by post-processing the report files after the end of the run. Later sections discuss the post-processing stage.
The columns in the output are labelled as follows:
Elapsed time (the ELAP column). The amount of time, in seconds, since the beginning of the run. In the above example, we see three entries: 30, 60, and 90 seconds into the run.
Stopped runs (STP). The number of clients that have completely halted. Clients halt when they encounter any of a variety of different errors, such as unresponsive web servers, missing web pages, or bad checksums, among other reasons.
Stalled runs (STL). The number of clients that have made no progress in the last two intervals. A client is "stalled" if the number of KB of HTML fetched and the number of gif files fetched have not changed for two $nsleep intervals. In this example, $nsleep is 30 seconds. There are no stalled runs in the sample output.
Sessions (SESS). The total number of sessions that have been completed by all of the clients since the beginning of the run. Here, we see that 219 sessions were completed in 90 seconds.
Requests (REQS). The total number of requests that have been completed by all of the clients since the beginning of the run. Here, we note that the number of requests equals the number of sessions: that is because the session consisted of a single web page. If, for example, a session consisted of two pages, then the number of completed requests would be about double the number of completed sessions.
Connections (CONNS). The total number of connections made to the web server. These include sockets opened to fetch gifs. If a server is running with Keep-Alive/persistent connections enabled, then the connection count will typically stay low.
Gifs (GIFS). The total number of image (or audio) files fetched by all clients. Since one web page typically contains many images, this number will typically be much larger than the number of completed requests. However, since webclient emulates gif caching, this number will stay low if the same set of gifs appears on each page. In the above example, the web page has five gifs on it, and thus the number of gif fetches is five times the number of requests.
Kilobytes Fetched (KBYTES). The total amount of data downloaded, including header bytes, body bytes, and gif bytes. Header bytes are the part of the HTTP response that is not the displayed HTML: even for a simple 'hello world' web page, the HTTP header can be hundreds of bytes long, and can contribute significantly to the total network traffic.
Kilobyte Rate (KB-RATE). Shown in the same column as KBYTES, but down one row, this is the rate of data transfer, in KBytes per second. This figure is only for the interval; it is not an average since the beginning of the run.
Image KBytes Fetched (GIF-KB). The total number of KBytes that were image, audio, or other embedded graphics.
Request Rate (the REQ RATE column). The number of pages per second being fetched. There are two numbers in the column, on alternating rows. On the T row, we have the average request rate since the beginning of the run. On the I row, we have the request rate in the last (30 second) interval.
First Data Response Time (the FIRST DATA column). The elapsed time, in seconds, between when a URL request is made and when the first non-header response byte is received. (Some web servers send a partial header immediately; this statistic measures how long it took until data actually started flowing.) There are two numbers in the column, on alternating rows. On the T row, we have the average response time since the beginning of the run. On the I row, we have the response time in the last (30 second) interval.
End to End Response Time (the END to END column). The elapsed time, in seconds, between when a URL request is made and when the very last byte of the response is received. This number is always greater than the First Data Response Time, because it includes the additional overhead of delivering the rest of the page. Some cgi-bin/interactive web sites start streaming a web page back to the browser even before the entire page has been dynamically generated. Thus, back-end delays can cause long delays between the start of a web page transmission and its end. There are two numbers in the column, on alternating rows. On the T row, we have the average response time since the beginning of the run. On the I row, we have the response time in the last (30 second) interval.
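For example, on the first T line of the sample output, the first data byte arrives after 0.135 seconds on average, while the full page takes 1.135 seconds end to end; roughly a second is spent delivering the body of the page and its gifs.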
Note that these numbers are useful primarily in keeping an eye on the running test. More detailed and complete statistics are obtained by post-processing the reports generated by the clients. Later sections discuss the post-processing tools.
It is useful to gather system performance statistics on the tested web server. In particular, the vmstat command can be used to capture CPU usage. Other commands provide network traffic statistics, and the cwsmon tool will provide GOLD, MQ Series, and IPC statistics. Some tools are provided to simplify the handling of some of these other statistics. In particular, note the following:
cputotals -- takes the output of the vmstat command and computes the average CPU usage, number of page faults, I/O, etc. Handles multiple files at once, printing a one-line summary per file. This script is critical for understanding what the CPU usage was on the server during a run.
choplog -- chops up one large vmstat.out file into pieces, suitable for input to the cputotals command. To determine how to chop up the vmstat input file into pieces, it parses the output of run.workload to determine at what times a run started and ended. It then uses these start and end times to chop a corresponding chunk out of the run.vmstat output. It can handle multiple run.workload.out files simultaneously to minimize typist fatigue.
timechop -- a Perl script that chops off the beginning and end of an input file, printing only the middle to stdout, as specified by a start and an end time. Handles run.vmstat files by default, but can handle cwsmon or other file formats with some fiddling. Can be used to create the input files needed by cputotals, by stripping off garbage data at the beginning and the end of a vmstat measurement.
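A typical post-run sequence might look like the following (the output file names produced by choplog are illustrative assumptions here; check the script for its actual naming convention):

choplog vmstat.out run1.workload.out run2.workload.out
cputotals vmstat.run1.out vmstat.run2.out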
Each client of a multi-user run produces its own report file when the run terminates. The report file contains statistics for each individual web page, as well as various sorts of averages and roll-ups. There is a lot of data in there, probably more than you want. To get averages for the whole run (averages of all of the client statistics), some post-processing of the data needs to be done. This section describes the scripts and the process for this data reduction.
The output of the process is a report file that very much resembles the report file of each individual client, except that it contains the averages over all clients. The report breaks out the response times and delays for each web page, for collections of web pages, and for the session as a whole. The last few lines of the report show the grand total page rate and response time.
The instructions below describe how to create a summary report. First, gather the webclient report files. The way that run.workload is configured at present, these end up in the reports directory under run_dd_nn. To reduce the data, run

sumstats run_dd_nn/reports/*

This extracts the response time per URL and the overall response time for the run from each of the report files.
Note that if randomization is used (i.e. the "fraction" field of the URL input line is less than 1.0, which causes the associated URL to be run on only that fraction of the sessions attempted), then the per-URL response times printed by sumstats will not add up to the overall response time at the bottom of the report. This is because the response times listed are averaged over all runs where the particular URL was submitted, whereas the total at the bottom is averaged over all sessions attempted. To get the totals to add up, you must weight the intermediate numbers by the fraction of sessions where those URLs were actually submitted.
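As an illustration (the numbers are made up): if a session consists of URL A with fraction 1.0 and a per-URL average of 1.0 second, plus URL B with fraction 0.5 and a per-URL average of 2.0 seconds, the session total averaged over all sessions is 1.0 x 1.0 + 0.5 x 2.0 = 2.0 seconds, not the 3.0 seconds obtained by naively summing the two per-URL averages.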
Next, fix up the vmstat output files so they contain data only from the time period associated with the measurement run. This can be done manually, or with the aid of several tools. These fixed-up files are then used by the cputotals command to report the average CPU usage, memory size, page faults, etc. for the run.
The choplog tool automates the following manual procedure:

1. Examine the run.workload.out file that contains the output text from the run.workload command. Find the line "rampup complete at time:" and record the time (in hours and minutes) that is given there.

2. Find the line "run complete at time:" and record the time given there. These two times represent the start and end time of the run, exclusive of startup and shutdown times.

3. Edit the copy of vmstat.out that was produced by the run.vmstat command. Delete the lines of the file that correspond to observations taken outside of the experimental interval.
The choplog tool allows you to specify a vmstat.out file and multiple workload.out files on the command line. It will produce one chopped vmstat.out file for each workload.out file that was specified. This is very handy when digesting output from multiple overnight runs.
Alternately, the timechop utility is a more primitive tool: given an input, a start time, and an end time, it will print out only the portion of the input that lies between the start and end times.
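For example (the argument order here is an assumption based on the description above; the script itself documents the exact usage):

timechop run.vmstat 14:05 15:05 > vmstat.out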
After the properly cleaned-up vmstat.out files have been created, the cputotals command can be used to report the average CPU busy and idle percentages, as well as the average number of context switches and system calls per second. The cputotals command will accept multiple input files at once, and will print a separate summary line for each.
Of course, this toolset can be used to perform stress-testing of web servers. Below we describe some additional features that may be useful for such testing.
In the real world, network errors occur, and clients occasionally drop or disconnect sockets. These tools can simulate some of these behaviours in a simplistic fashion. webclient has been written so that if it catches a SIGHUP during a network operation (i.e. a system call), it will repeat that operation. This can induce some network abnormalities while still allowing webclient to continue functioning. The perl script smack can be used to send SIGHUP signals to all running copies of webclient.
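The essence of such a script is small. A minimal sketch of the idea (this is not the actual smack source; it assumes the webclient processes are visible to ps under that name):

#!/usr/bin/perl
# Send SIGHUP to every running webclient, as smack does.
my @pids;
for my $line (`ps -e -o pid,comm`) {
    my ($pid, $comm) = $line =~ /^\s*(\d+)\s+(\S+)/ or next;
    push @pids, $pid if $comm eq 'webclient';
}
kill 'HUP', @pids if @pids;
print "sent SIGHUP to ", scalar(@pids), " webclient process(es)\n";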
Several changes have been made in run.workload to accommodate larger numbers of webclient children:
The children are now started in groups of 10: run.workload waits until they run at least 1 request each before it starts the next 10. That way we avoid convoy problems (e.g. where all of the children run the same request at the same time). This part of the run is called ramp-up. You can revert to the old behavior by setting the $nstart variable to $number_of_users. This will cause all of the runs to start at once.
No statistics are collected by run.workload or webclient until after the end of ramp-up.
run.workload now checks for "stalled" (lack of progress) runs. A run is stalled if the number of HTML KBytes and gif files it has fetched does not change for two print intervals (currently set to 30 seconds). Messages are printed when a run becomes stalled and when it resumes progress (if ever). A sketch of this check appears after this list.
The Perl shmwrite() function appears to have a bug in it that causes it to fail with a segmentation violation if it is used more than once per run. To get around this, there is a little program called updshm that updates shared memory and is used instead of shmwrite().
The number_of_users value supported by run.workload is limited only by the size of the password file. Runs with up to 350 users have been successfully completed.
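A minimal sketch of the stall check mentioned above, assuming a hypothetical read_child_counters() helper that fetches each child's cumulative KBytes and gif counts from shared memory once per print interval:

# Illustrative only; %last and %stalled_for persist across intervals.
for my $child (0 .. $number_of_users - 1) {
    my ($kb, $gifs) = read_child_counters($child);  # hypothetical helper
    if (defined $last{$child}
        && $kb == $last{$child}[0] && $gifs == $last{$child}[1]) {
        $stalled_for{$child}++;
        print "child $child stalled\n" if $stalled_for{$child} == 2;
    } else {
        print "child $child resumed progress\n" if ($stalled_for{$child} || 0) >= 2;
        $stalled_for{$child} = 0;
    }
    $last{$child} = [$kb, $gifs];
}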