
9. How To Do Multi-User Runs

Multi-user runs are performed by running multiple copies of webclient under the control of the Perl program run.workload. This program provides various convenience functions, the most important of which are its ability to perform a clean ramp-up and shutdown of the workload, and to exclude the ramp-up/ramp-down periods from statistics gathering. The control program uses a special shared-memory area to coordinate the clients; the control features are not reproducible with a simple "roll-your-own" shell script.

The file run.workload is primarily a configuration file, with the actual work being done by the program workload.pl. You should review the contents of run.workload and make sure that reasonable defaults and values have been chosen for your run. Typically, one starts the program as follows:


run.workload  url.session.input  30   5.0  3600 prefix_

This will launch 30 copies of webclient, run for 3600 seconds, use url.session.input as its input file, and set the default think time to 5.0 seconds (overriding any think times specified in the input file). The runtime of 3600 seconds is exclusive of the startup and shutdown times (which can be significant, especially when a large number of clients is specified). The output reports are placed in a directory named prefix_3600_30. This directory name is built up from the prefix, the length of the run, and the number of clients. The directory is created automatically.
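
As a sketch, the directory name could be assembled from the command-line arguments roughly like this (the variable names are illustrative, not necessarily those used inside run.workload):

    # Build the report directory name: prefix . duration . "_" . nclients,
    # e.g. "prefix_" . 3600 . "_" . 30  yields  "prefix_3600_30".
    my ($input, $nclients, $think, $duration, $prefix) = @ARGV;
    my $dest = $prefix . $duration . "_" . $nclients;
    unless (-d $dest) {
        mkdir($dest, 0755) or die "cannot create $dest: $!";
    }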

Be sure to pick a run duration that is long enough to collect a reasonable amount of data. For complex sites or complex workloads, runs of less than an hour can lead to choppy and uneven statistics. Note that sessions with non-zero think times can take minutes or tens of minutes to run, depending on the complexity of the session. Thus, you want to pick a run duration that allows at the very least 4 or 5 sessions to play through. Remember that no statistics are collected until a new session is started after ramp-up is complete, and statistics from partially-completed sessions at the end of the run are also discarded. Thus, typically 2 or 3 sessions are lost to the ramp-up and ramp-down periods, which is why a minimum of 4-5 sessions is recommended.
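
As a rough sizing sketch (all numbers illustrative):

    # Budget enough time for the 4-5 whole sessions we want to measure,
    # plus the 2-3 sessions typically lost to ramp-up and ramp-down.
    my $session_secs = 600;   # estimated wall-clock time per session
    my $measured     = 5;     # whole sessions we want statistics for
    my $lost         = 3;     # sessions consumed by ramp-up/ramp-down
    my $duration     = $session_secs * ($measured + $lost);   # 4800 seconds
    print "suggested run duration: at least $duration seconds\n";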

9.1 Workload Parameters

Here are some of the things that can be tuned or controlled by modifying run.workload (a configuration sketch follows the list):

  1. $options -- set to specify options to webclient. Options such as -A or -S SSLV3:4 are placed here.
  2. $webserver -- specifies the webserver host being used.
  3. $sleep_interval -- specifies how long run.workload waits between printing statistics.
  4. $nservers -- number of webservers on the host to use. Each webserver on the host is assumed to have a distinct port number. Set this to one to use only one webserver on the host.
  5. $port[] array -- this is the array of port numbers. If $nservers is set to 1 then $port[0] is the only entry that is used.
  6. $nstart -- run.workload starts the child webclients in groups. This number specifies the size of each group.
  7. $dest -- the name of the directory where the results will be placed.
  8. $custid[] array -- provides the customer IDs for the child runs.
  9. $pin[] array -- provides the PINs for the child runs.
  10. $passwd[] array -- provides the passwords for the child runs.
  11. $seed[] array -- provides the random seeds used by the child runs. Distinct seeds are required to get distinct think times and randomization values in each child.
  12. $shmkey -- used to identify the shared memory area to the children. Value does not matter and should not need to be changed unless another program (perhaps another instance of run.workload) that uses the same shared memory key exists on the local system.
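
For orientation, the configuration section of run.workload might look roughly like the sketch below. All values are illustrative, not the shipped defaults; consult the actual file for the authoritative settings.

    # --- illustrative settings only, not the shipped defaults ---
    $options        = "-A";              # extra webclient options
    $webserver      = "www.example.com"; # webserver host under test
    $sleep_interval = 30;                # seconds between statistics printouts
    $nservers       = 2;                 # number of webservers (ports) on host
    @port           = (80, 8080);        # one port per webserver
    $nstart         = 10;                # children released per ramp-up group
    $dest           = "prefix_3600_30";  # directory for the results
    @custid         = ("cust01", "cust02");   # per-child customer IDs
    @pin            = ("1234", "5678");       # per-child PINs
    @passwd         = ("pw01", "pw02");       # per-child passwords
    @seed           = (11, 37);               # distinct random seed per child
    $shmkey         = 0xBEEF;            # shared-memory key for coordination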

In certain places in run.workload, the program updshm is used to update the shared-memory area instead of the Perl routine shmwrite(). This is because shmwrite() was causing segmentation faults when it should not have, so it was replaced with the updshm program.
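
For illustration, the direct Perl call being avoided, and its replacement, look roughly like the following. The shmwrite() signature is standard Perl; the updshm argument order is a guess for illustration only, since its real interface is defined by the program itself.

    # Direct update of one 32-bit flag word in the shared segment:
    # shmwrite(ID, STRING, POSITION, SIZE)
    shmwrite($shmid, pack("l", $value), $offset, 4)
        or die "shmwrite: $!";

    # What run.workload does instead: shell out to updshm
    # (hypothetical argument order):
    system("updshm", $shmkey, $offset, $value) == 0
        or die "updshm failed: $?";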

9.2 Notes on the "Ramp-Up" Process

The run.workload script goes through a sequence of steps to ramp up the workload on the server and start statistics gathering. It proceeds as follows (a sketch of the coordination loop follows the list):

  1. As many copies of webclient as specified are started. As each one starts, it goes through some basic initialization, after which it forks itself off into the background. If any of these fail during initialization, run.workload stops and all clients are killed. Once each client has finished initializing, it increments a flag in shared memory, indicating that it is ready to start running requests. It will not actually run any requests until run.workload sets the 'start' flag.
  2. run.workload waits until all children have indicated that they have initialized.
  3. run.workload then increments a second global variable by $nstart. This causes the first $nstart children to begin submitting requests. run.workload waits at least 12 seconds (by default), and longer if necessary, until each of these children has actually completed at least one request, before starting another group of $nstart children.
  4. When all children have completed at least one request, run.workload sets a flag in global memory indicating that all children are now actively submitting requests. At this point, ramp up is complete.
  5. The children examine this "ramp-up complete" flag to determine whether to start collecting statistics. They do not actually start collecting statistics until the start of a new session after the flag is set. Thus, statistics are always for whole sessions. Data for sessions that started before ramp-up was complete are discarded.
  6. When the run duration expires, each copy of webclient is sent a SIGUSR1 signal, telling it to shut down. Data collected for the partially-completed session that was running when the signal was caught is discarded. Each client then prints out the statistics that it has collected. By discarding the partially-completed session at the tail end, the statistics that are presented are always for a whole number of session repetitions.
  7. run.workload takes a snapshot of the current values of the statistics in the global memory area and prints these to stdout, in order to simplify live monitoring of a run. Note that the printed statistics exclude any data gathered during ramp-up, and thus should present an accurate measure of throughput. However, they will not match exactly the summary statistics generated by sumstats, since sumstats uses only whole sessions, with partially-completed sessions trimmed from the beginning and end of the run. Thus, sumstats will typically report a smaller number of completed URLs, completed in a smaller amount of time. The average throughput should be the same, though, within statistical fluctuations.

9.3 Interpreting the Output

Below is an example of the output from run.workload:


T  ELAP STP STL SESS  REQS  CONNS  GIFS  KBYTES GIF-KB  REQ   FIRST  END to
I   for this interval:                   KB-RATE        RATE   DATA   END
--------------------------------------------------------------------------
T    30  0  0    74     74   370    370   1736    702   2.47  0.135  1.135
I                                         57.87         2.47  0.135  1.135
T    60  0  0   145    145   725    725   3402   1376   2.42  0.145  1.152
I                                         55.52         2.37  0.155  1.169
T    90  0  0   219    219  1095   1095   5138   2078   2.43  0.187  1.151
I                                         57.87         2.47  0.270  1.149

Notice that there are two lines, labelled T and I. The T line shows totals; the I line shows statistics for the interval. In the above, the interval is 30 seconds long.

This output is primarily useful for keeping an eye on the run; more detailed and complete statistics can be obtained by post-processing the report files after the end of the run. Later sections discuss the post-processing stage.

The columns in the output are labelled as follows:

ELAP

Elapsed time. The amount of time, in seconds, since the beginning of the run. In the above example, we see three entries: 30, 60 and 90 seconds into the run.

STP

Stopped runs. The number of clients that have completely halted. Clients will halt when they encounter any of a variety of different errors, such as unresponsive web servers, missing web pages, bad checksums, and so on.

STL

Stalled runs. The number of clients that have made no progress in the last two intervals. A client is "stalled" if the number of KB of HTML fetched and the number of gif files fetched have not changed for two $nsleep intervals. In this example, $nsleep is 30 seconds. There are no stalled runs.

SESS

Sessions. The total number of sessions that have been completed by all of the clients since the beginning of the run. Here, we see that 219 sessions were completed in 90 seconds.

REQS

Requests. The total number of requests that have been completed by all of the clients since the beginning of the run. Here, we note that the number of requests equals the number of sessions: that is because the 'session' consisted of a single web page. If, for example, a session consisted of two pages, then the number of completed requests would be about double the number of completed sessions.

CONNS

Connections. The total number of connections made to the web server. These include sockets opened to fetch gifs. If a server is running with Keep-Alive/persistent connections enabled, then the connection count will typically stay low.

GIFS

Gifs. The total number of image (or audio) files fetched by all clients. Since one web page typically contains a lot of images, this number will typically be much larger than the number of completed requests. However, since webclient emulates gif caching, this number will stay low if the same set of gifs appear on each page. In the above example, we see that the web page has five gifs on it, and thus the number of fetches is five times the number of requests.

KBYTES

Kilobytes Fetched. The total amount of data downloaded, including header bytes, body bytes, and gif bytes. Header bytes are the part of the HTTP header that isn't the displayed HTML: even for a simple 'hello world' web page, the HTTP header can be hundreds of bytes long, and can contribute significantly to the total network traffic.

KB-RATE

Kilobyte Rate. Shown in the same column as KBYTES, but down one row, this is the rate of data transfer, in KBytes per second. This figure is only for the interval; it is not an average since the beginning of the run.

GIF-KB

Image KBytes Fetched. The total number of KBytes that were image/audio/embedded graphics.

REQ RATE

Request Rate. The number of pages per second being fetched. There are two numbers in the column, on alternating rows. On the T row, we have the average request rate since the beginning of the run. On the I row, we have the request rate in the last (30-second) interval.

FIRST DATA

First Data Response Time. The elapsed time, in seconds, between when a URL request is made and when the first non-header response byte is received. (Some web servers send a partial header immediately; this statistic measures how long it took until data actually started flowing.)

There are two numbers in the column, on alternating rows. On the T row, we have the average response time since the beginning of the run. On the I row, we have the response time in the last (30-second) interval.

END to END

End to End Response Time. The elapsed time, in seconds, between when a URL request is made and when the very last byte of the response is received. This number is always greater than the First Data Response Time, because it includes the additional overhead of delivering the rest of the page. Some cgi-bins/interactive web sites start streaming a web page back to the browser even before the entire page has been dynamically generated. Thus, back-end delays can cause long delays between the start of a web page transmission and its end.

There are two numbers in the column, on alternating rows. On the T row, we have the average response time since the beginning of the run. On the I row, we have the response time in the last (30-second) interval.

Note that these numbers are useful primarily for keeping an eye on the running test. More detailed and complete statistics are obtained by post-processing the reports generated by the clients. Later sections discuss the post-processing tools.

9.4 Gathering Other Statistics

It is useful to gather system performance statistics on the tested web server. In particular, the vmstat command can be used to capture CPU usage. Other commands provide network traffic statistics, and the cwsmon tool will provide GOLD, MQ Series and IPC statistics. Some tools are provided to simplify the handling of some of these other statistics. In particular, note the following:

cputotals

Takes the output of the vmstat command and computes the average CPU usage, number of page faults, I/O, etc. Handles multiple files at once, printing a one-line summary per file. This script is critical for understanding what the CPU usage was on the server during a run.

choplog

Chops up one large vmstat.out file into pieces suitable for input to the cputotals command. To determine how to chop up the vmstat input file, it parses the output of run.workload to determine when a run started and ended. It then uses these start and end times to chop a corresponding chunk out of the run.vmstat output. It can handle multiple run.workload.out files simultaneously, to minimize typist fatigue.

timechop

Perl script that chops off the beginning and end of an input file, printing only the middle to stdout, as specified by a start time and an end time. Handles run.vmstat files by default, but can handle cwsmon or other file formats with some fiddling. Can be used to create the input files needed by cputotals, by stripping off garbage data at the beginning and the end of a vmstat measurement.
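
As a sketch of the kind of filtering timechop performs (the timestamp format and argument handling here are assumptions, not the real script's interface):

    # Print only lines whose leading HH:MM timestamp falls between the
    # given start and end times (inclusive).
    my ($start, $end, $file) = @ARGV;    # e.g. 10:05 11:05 vmstat.out
    open(my $in, '<', $file) or die "$file: $!";
    while (<$in>) {
        my ($hhmm) = /^(\d\d:\d\d)/ or next;   # skip lines with no timestamp
        print if $hhmm ge $start && $hhmm le $end;
    }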

9.5 Post-Processing and Data Reduction

Each client of a multi-user run produces its own report file when the run terminates. The report file contains statistics for each individual web page, as well as various sorts of averages and roll-ups. There is a lot of data in there, probably more than you want. To get averages for the whole run (averages of all of the client statistics), some post-processing of the data needs to be done. This section describes the scripts and the process for this data reduction.

The output of the process is a report file that very much resembles the report file of each individual client, except that it contains the averages over all clients. The report breaks out the response times and delays for each web page, for collections of web pages, and for the session as a whole. The last few lines of the report show the grand total page rate and response time.

The instructions below describe how to create a summary report.

  1. You need to summarize the response time outputs from all of the webclient report files. The way that run.workload is configured at present, these end up in the reports directory under run_dd_nn. To reduce the data, run sumstats run_dd_nn/reports/*

    This extracts the response time per URL and the overall response time for the run from each of the report files.

    Note that if randomization is used (i.e. the "fraction" field of the URL input line is less than 1.0, which causes the associated URL to be run on that fraction of the sessions attempted), then the per-URL response times printed by sumstats will not add up to the overall response time at the bottom of the report. This is because the response times listed are averaged over all runs where the particular URL was submitted, whereas the total at the bottom is averaged over all sessions attempted. To get the totals to add up, you must weight the per-URL numbers by the fraction of sessions where those URLs were actually submitted, as in the sketch below.
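
    As a small worked example of that weighting (numbers invented for illustration):

        # URL A runs on every session (fraction 1.0); URL B runs on half
        # of them (fraction 0.5).  Per-URL averages are taken over the
        # sessions where each URL was actually submitted:
        my %avg  = (A => 1.0, B => 3.0);   # seconds
        my %frac = (A => 1.0, B => 0.5);
        # The naive sum 1.0 + 3.0 = 4.0 s does NOT match the per-session
        # total; the fraction-weighted sum does:
        my $total = 0;
        $total += $avg{$_} * $frac{$_} for keys %avg;   # 1.0 + 1.5 = 2.5 s
        print "per-session total: $total seconds\n";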

  2. You need to fix up the vmstat output files so they contain data only from the time period associated with the measurement run. This can be done manually, or with the aid of several tools. These fixed-up files are then used by the cputotals command to report the average CPU usage, memory size, page faults, etc. for the run.

    The choplog tool automates the following manual procedure:

    1. View the run.workload.out file that contains the output text from the run.workload command. Find the line "rampup complete at time:" and record the time (in hours and minutes) that is given there.
    2. Find the line "run complete at time:" and record the time given there. These times represent the start and end time of the run, exclusive of startup and shutdown times.
    3. Edit the file vmstat.out that was produced by the run.vmstat command. Delete the lines of the file that correspond to observations taken outside of the experimental interval.

    The choplog tool allows you to specify a vmstat.out and multiple workload.out files on the command line. It will produce one chopped vmstat.out file for each workload.out file that was specified. This is very handy when digesting output from multiple overnight runs.

    Alternatively, the timechop utility is a more primitive tool: given an input file, a start time, and an end time, it will print out only the portion of the input that lies between the start and end times.

    After the properly cleaned up vmstat.out files have been created, the cputotals command can be used to report the average CPU busy & idle percentages, as well as the average number of context switches and system call rates per second. The cputotals command will accept multiple input files at once, and will print a separate summary line for each.
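
    As an example, a typical post-run sequence might look like the following. The command names are real, but the file names, and especially choplog's output-file naming, are illustrative assumptions.

        choplog vmstat.out run1.workload.out run2.workload.out
        cputotals vmstat.out.run1 vmstat.out.run2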

9.6 Stress Testing

Of course, this toolset can be used to perform stress-testing of web servers. Below we describe some additional features that may be useful for such testing.

In the real world, network errors occur, and clients occasionally drop or disconnect sockets. These tools can simulate some of these behaviours in a simplistic fashion. webclient has been written so that if it catches a SIGHUP during a network operation (i.e. a system call), it will repeat that operation. This can induce some network abnormalities while still allowing webclient to continue functioning. The Perl script smack can be used to send SIGHUP signals to all running copies of webclient, as in the sketch below.
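
What smack effectively does might be sketched as follows; the process-lookup mechanism shown here is an assumption, not necessarily how smack itself finds the clients.

    # Send SIGHUP to every running webclient process.
    my $ps   = `ps -eo pid,comm`;
    my @pids = $ps =~ /^\s*(\d+)\s+webclient\s*$/mg;
    kill('HUP', @pids);
    print "sent SIGHUP to ", scalar(@pids), " webclient processes\n";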

9.7 Large Numbers of Clients

Several changes have been made in run.workload to accommodate larger numbers of webclient children:

  1. The runs are now started in groups of 10 rather than all at once. Each time a group of 10 is started, run.workload waits until each has run at least one request before it starts the next 10. That way we avoid convoy problems (e.g. where all of the children run the same request at the same time). This part of the run is called ramp-up. You can revert to the old behavior by setting the $nstart variable to $number_of_users (see the sketch after this list). This will cause all of the runs to start at once.
  2. Statistics are not collected by either run.workload or webclient until after the end of rampup.
  3. The output no longer prints the status of each run. A message about a particular run is printed only when it stops, or when it becomes or ceases to be "stalled" (see the next item).
  4. run.workload now checks for "stalled" or "lack of progress" runs. A run is stalled if the number of HTML KBytes and the number of gif files it has fetched do not change for two print intervals (currently set to 30 seconds). Messages are printed when a run becomes stalled and when it resumes progress (if ever).
  5. Since throughput statistics are not recorded until after the end of ramp-up, the throughput rates should not require adjustment due to startup.
  6. The perl5 shmwrite() function appears to have a bug that causes it to fail with a segmentation violation if it is used more than once per run. To get around this, there is a little program called updshm that updates shared memory; it is used instead of shmwrite().
  7. The number of users supported by run.workload is limited only by the size of the password file. Runs with up to 350 users have been successfully completed.
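
To revert to the all-at-once start mentioned in item 1, the change is a one-line edit in run.workload (sketch):

    $nstart = $number_of_users;   # release every child in a single group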

