TimestampWeightedTally

class pydsol.core.statistics.TimestampWeightedTally(name: str)

Bases: WeightedTally

The TimestampWeightedTally is a statistics object that calculates descriptive statistics for piecewise constant observations, such as weighted mean, weighted variance, minimum observation, maximum observation, etc. Unlike the WeightedTally, the weights are not supplied explicitly; they are derived from the timestamps that are provided with each observation.

The initialize() method resets the statistics object. The initialize method can, for instance, be called when the warmup period of the simulation experiment has completed.

In order to properly ‘close’ the series of observations, a virtual observation has to be provided at the end of the observation period, to count the value and duration of the last interval into the statistics. The end_observations method takes care of ending the observation period. After calling end_observations(timestamp), further calls to the register method will be silently ignored.

In a sense, the TimestampWeightedTally can be seen as a normal Tally in which each observation is multiplied by its duration, that is, the interval between two successive timestamps during which the observation had that particular value. Instead of dividing by the number of observations, as the ordinary Tally does to calculate the mean, the sum of durations times observation values is divided by the total duration of the observation period up to the last registered timestamp.

Example

In discrete-event simulation, the TimestampWeightedTally is often used to calculate statistics for (average) queue length, or (average) utilization of a server. Every time the actual queue length or utilization changes, the new value is registered with the timestamp, and the previous observation value is counted towards the statistic with the time interval between the previous timestamp and the new timestamp as the weight.
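The queue-length case can be illustrated with a small, library-independent sketch. It does not use pydsol; the function name `time_weighted_mean` and the event list are hypothetical and only mirror the idea that each value is weighted by the time it remained valid.

```python
def time_weighted_mean(events, end_time):
    """Time-weighted average of a piecewise-constant signal.

    events:   list of (timestamp, value) pairs in non-decreasing time order
    end_time: time at which the observation period is closed
    """
    weighted_sum = 0.0
    total_duration = 0.0
    # Each value is valid until the next event, or until end_time for the last one.
    next_times = [t for t, _ in events[1:]] + [end_time]
    for (t, value), t_next in zip(events, next_times):
        duration = t_next - t
        weighted_sum += duration * value
        total_duration += duration
    return weighted_sum / total_duration if total_duration > 0 else float("nan")


# Hypothetical queue: empty from t=0, one customer from t=2, two from t=5 until t=10.
queue_events = [(0.0, 0), (2.0, 1), (5.0, 2)]
avg_queue_length = time_weighted_mean(queue_events, end_time=10.0)
print(avg_queue_length)
```

The durations here are 2, 3 and 5 time units, so the average is (0·2 + 1·3 + 2·5) / 10.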

Attributes:
  • _name (str) – the name by which the statistics object can be identified

  • _n (int) – the number of observations

  • _n_nonzero (int) – the number of non-zero weights

  • _sum_of_weights (float) – the sum of the weights

  • _weighted_sum (float) – the sum of the observation values times their weights

  • _weight_times_variance (float) – the weighted variant of the second moment of the statistic

  • _min (float) – the lowest value in the current observations

  • _max (float) – the highest value in the current observations

  • _start_time (float) – timestamp of the first registered observation

  • _last_timestamp (float) – timestamp when the currently valid observation value was set

  • _last_value (float) – the currently valid observation value

  • _active (bool) – True after initialization until end_observations has been called

__init__(name: str)

Construct a new TimestampWeightedTally statistics object. The TimestampWeightedTally is a statistics object that calculates descriptive statistics for weighted observations, such as weighted mean, weighted variance, minimum, and maximum, where the weights are implicitly calculated based on successive timestamps. The intervals between the timestamps are used as the weights.

Parameters:

name (str) – The name by which the statistics object can be identified.

Raises:

TypeError – when name is not a string

initialize()

Initialize the statistics object, resetting all values to the state where no observations have been made. This method can, for instance, be called when the warmup period of the simulation experiment has completed.

isactive() bool

Indicate whether the statistic is active and can register observations. After calling end_observations(timestamp) _active will be set to False and further calls to the register method will be silently ignored.

Returns:

Whether the statistic is active and can register observations.

Return type:

bool

end_observations(timestamp: float)

In order to properly ‘close’ the series of observations, a virtual observation has to be provided at the end of the observation period, to count the value and duration of the last interval into the statistics. The end_observations method takes care of ending the observation period. After calling end_observations(timestamp), further calls to the register method will be silently ignored.

Parameters:

timestamp (float) – The timestamp at which the final interval ends and the observations stop. The last registered value will be counted into the statistics for the duration of (timestamp - last_timestamp).

Raises:
  • ValueError – when the provided timestamp is nan

  • ValueError – when the provided timestamp is before the last registered timestamp

last_value() float

Return the last registered value (this value has not yet been counted into the statistics, unless end_observations() has been called).

Returns:

The last registered value.

Return type:

float

register(timestamp: float, value: float)

Process one observation value together with the timestamp from which that value is valid. The previously registered value is counted into the statistics, weighted by the duration between the previous timestamp and the timestamp provided in this method; the new value itself receives its weight only once a later timestamp is registered or end_observations is called. Successive timestamps can be the same, but a later timestamp cannot be before an earlier one.

Note

When two successive timestamps are the same, the observation value is counted towards the number of observations, and for the minimum and maximum observation value, but it does not contribute to the other statistics.

Parameters:
  • timestamp (float) – The timestamp from which the observation value is valid.

  • value (float) – The observation value.

Raises:
  • TypeError – when timestamp or value is not a number

  • ValueError – when timestamp or value is NaN

  • ValueError – when the provided timestamp is before the last registered timestamp
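The interplay between register and end_observations described above can be sketched with a minimal stand-in class. This is an illustration only, not the pydsol implementation: the class name `MiniTimestampTally`, its attribute names, and the way it counts registrations are assumptions made for the example.

```python
import math


class MiniTimestampTally:
    """Illustrative stand-in; NOT the pydsol implementation."""

    def __init__(self):
        self.n_registered = 0        # assumed: counts every accepted register() call
        self.sum_of_weights = 0.0    # total duration counted so far
        self.weighted_sum = 0.0      # sum of duration * value
        self.min = math.nan
        self.max = math.nan
        self.last_timestamp = math.nan
        self.last_value = math.nan
        self.active = True

    def _close_interval(self, timestamp):
        # Count the previously registered value for the time it was valid.
        if math.isnan(timestamp):
            raise ValueError("timestamp is NaN")
        if self.n_registered == 0:
            return
        if timestamp < self.last_timestamp:
            raise ValueError("timestamp before last registered timestamp")
        duration = timestamp - self.last_timestamp
        self.sum_of_weights += duration
        self.weighted_sum += duration * self.last_value

    def register(self, timestamp, value):
        if not self.active:
            return  # silently ignored after end_observations
        if math.isnan(value):
            raise ValueError("value is NaN")
        self._close_interval(timestamp)
        self.n_registered += 1
        if self.n_registered == 1:
            self.min = self.max = value
        else:
            self.min = min(self.min, value)
            self.max = max(self.max, value)
        self.last_timestamp = timestamp
        self.last_value = value

    def end_observations(self, timestamp):
        if not self.active:
            return
        # Virtual final observation: gives the last value its duration.
        self._close_interval(timestamp)
        self.last_timestamp = timestamp
        self.active = False

    def weighted_mean(self):
        if self.sum_of_weights == 0:
            return math.nan
        return self.weighted_sum / self.sum_of_weights


tally = MiniTimestampTally()
tally.register(0.0, 0.0)
tally.register(2.0, 1.0)
tally.register(5.0, 7.0)    # replaced at the same timestamp: zero duration
tally.register(5.0, 2.0)
tally.end_observations(10.0)
tally.register(12.0, 100.0)  # ignored, the tally is no longer active
print(tally.weighted_mean(), tally.min, tally.max, tally.n_registered)
```

In this sketch the value 7.0 has zero duration, so it affects the maximum and the registration count but not the weighted mean.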

max() float

Return the (unweighted) observation with the highest value. When no observations were registered, NaN is returned.

Returns:

The observation with the highest value, or NaN when no observations were registered.

Return type:

float

min() float

Return the (unweighted) observation with the lowest value. When no observations were registered, NaN is returned.

Returns:

The observation with the lowest value, or NaN when no observations were registered.

Return type:

float

n() int

Return the number of observations.

Returns:

The number of observations.

Return type:

int

property name: str

Return the name of this statistics object.

Returns:

The name of this statistics object.

Return type:

str

classmethod report_footer() str

Return a string representing a footer for a textual table with a monospaced font that can contain multiple tallies.

classmethod report_header() str

Return a string representing a header for a textual table with a monospaced font that can contain multiple weighted tallies.

report_line() str

Return a string representing a line with important statistics values for this tally, for a textual table with a monospaced font that can contain multiple tallies.

weighted_mean() float

Return the weighted mean. When no observations were registered, NaN is returned.

The weighted mean of the WeightedTally is calculated with the formula:

\[\mu_{W} = \frac{\sum_{i=1}^{n} w_{i}.x_{i}}{\sum_{i=1}^{n} w_{i}}\]

where n is the number of observations, \(w_{i}\) are the weights, and \(x_{i}\) are the observations.
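As a worked illustration with made-up numbers, take three observations \(x = (0, 1, 2)\) that were valid for durations \(w = (2, 3, 5)\). The weighted mean is then:

\[\mu_{W} = \frac{2 \cdot 0 + 3 \cdot 1 + 5 \cdot 2}{2 + 3 + 5} = \frac{13}{10} = 1.3\]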

Returns:

The weighted mean, or NaN when no observations were registered.

Return type:

float

weighted_stdev(biased: bool = True) float

Return the weighted standard deviation of all observations since the statistic initialization. By default, the biased (population) version is returned. The biased version needs at least one observation. For the unbiased version, two observations with non-zero weights are needed. When too few observations were registered, NaN is returned.

The formula for the biased (population) weighted standard deviation is:

\[\sigma_{W} = \sqrt{ \frac{\sum_{i=1}^{n}{w_i (x_i - \mu_{W})^2}} {\sum_{i=1}^{n}{w_i}} }\]

where \(w_i\) are the weights, \(x_i\) are the observations, \(n\) is the number of observations, and \(\mu_W\) is the weighted mean of the observations.

For the unbiased (sample) weighted variance (and, hence, for the standard deviation), different algorithms are suggested by the literature. As an example, R and MATLAB calculate weighted sample variance differently. SPSS rounds the sum of weights to the nearest integer and counts that as the ‘sample size’ in the unbiased formula. When weights are used as so-called reliability weights (non-integer) rather than as frequency weights (integer), rounding to the nearest integer and using that to calculate a ‘sample size’ is incorrect. See the discussion at https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Weighted_sample_variance and at https://stats.stackexchange.com/questions/51442/weighted-variance-one-more-time. Here we have chosen to implement the version that uses reliability weights, because the weights in a simulation study are usually time intervals that can be on any (non-integer) scale.

The formula used for the unbiased (sample) weighted standard deviation is:

\[S_{W} = \sqrt{ \frac{M}{M - 1} . \sigma^2_{W} }\]

or as a complete formula:

\[S_{W} = \sqrt{ \frac{M}{M - 1} . \frac{\sum_{i=1}^{n}{w_i (x_i - \mu_{W})^2}} {\sum_{i=1}^{n}{w_i}} }\]

where \(w_i\) are the weights, \(x_i\) are the observations, \(n\) is the number of observations, \(M\) is the number of observations with a non-zero weight, and \(\mu_W\) is the weighted mean of the observations.
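The two formulas can be checked numerically with a short sketch. It is a direct transcription of the formulas above into plain Python, not the pydsol code; the function name `weighted_stdev_sketch` and the sample data are hypothetical, and the handling of too few observations is an assumption based on the description.

```python
import math


def weighted_stdev_sketch(values, weights, biased=True):
    """Weighted standard deviation following the formulas in the text."""
    sum_w = sum(weights)
    m = sum(1 for w in weights if w != 0)   # M: number of non-zero weights
    needed = 1 if biased else 2
    if m < needed or sum_w == 0:
        return math.nan                      # too few usable observations
    mean = sum(w * x for w, x in zip(weights, values)) / sum_w
    variance = sum(w * (x - mean) ** 2 for w, x in zip(weights, values)) / sum_w
    if not biased:
        variance *= m / (m - 1)              # correction factor M / (M - 1)
    return math.sqrt(variance)


values = [0.0, 1.0, 2.0]
weights = [2.0, 3.0, 5.0]    # e.g. durations between successive timestamps
sd_population = weighted_stdev_sketch(values, weights, biased=True)
sd_sample = weighted_stdev_sketch(values, weights, biased=False)
print(sd_population, sd_sample)
```

For this data the population variance is 0.61 and the sample variance is 0.915, so the two results are the square roots of those values.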

Parameters:

biased (bool) – Whether to return the biased (population) standard deviation or the unbiased (sample) standard deviation. By default, biased is True and the population standard deviation is returned.

Returns:

The weighted standard deviation of all observations since the initialization, or NaN when too few (non-zero) observations were registered.

Return type:

float

weighted_sum() float

Return the sum of all observations times their weights since the statistic initialization.

Returns:

The sum of the observations times their weights.

Return type:

float

weighted_variance(biased: bool = True) float

Return the weighted variance of all observations since the statistic initialization. By default, the biased (population) version is returned. The biased version needs at least one observation. For the unbiased version, two observations with non-zero weights are needed. When too few observations were registered, NaN is returned.

The formula for the biased (population) weighted variance is:

\[\sigma^{2}_{W} = \frac{\sum_{i=1}^{n}{w_i (x_i - \mu_{W})^2}} {\sum_{i=1}^{n}{w_i}}\]

where \(w_i\) are the weights, \(x_i\) are the observations, \(n\) is the number of observations, and \(\mu_W\) is the weighted mean of the observations.

For the unbiased (sample) weighted variance, different algorithms are suggested by the literature. As an example, R and MATLAB calculate weighted sample variance differently. SPSS rounds the sum of weights to the nearest integer and counts that as the ‘sample size’ in the unbiased formula. When weights are used as so-called reliability weights (non-integer) rather than as frequency weights (integer), rounding to the nearest integer and using that to calculate a ‘sample size’ is incorrect. See the discussion at https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Weighted_sample_variance and at https://stats.stackexchange.com/questions/51442/weighted-variance-one-more-time. Here we have chosen to implement the version that uses reliability weights, because the weights in a simulation study are usually time intervals that can be on any (non-integer) scale.

The formula used for the unbiased (sample) weighted variance is:

\[S^{2}_{W} = \frac{M}{M - 1} . \sigma^2_{W}\]

or as a complete formula:

\[S^{2}_{W} = \frac{M}{M - 1} . \frac{\sum_{i=1}^{n}{w_i (x_i-\mu_{W})^2}} {\sum_{i=1}^{n}{w_i}}\]

where \(w_i\) are the weights, \(x_i\) are the observations, \(n\) is the number of observations, \(M\) is the number of observations with a non-zero weight, and \(\mu_W\) is the weighted mean of the observations.
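As a worked illustration with made-up numbers, take observations \(x = (0, 1, 2)\) with weights \(w = (2, 3, 5)\), so that \(\mu_W = 1.3\) and all three weights are non-zero (\(M = 3\)). The biased variance is:

\[\sigma^{2}_{W} = \frac{2 (0 - 1.3)^2 + 3 (1 - 1.3)^2 + 5 (2 - 1.3)^2}{2 + 3 + 5} = \frac{3.38 + 0.27 + 2.45}{10} = 0.61\]

and the unbiased variance follows from the correction factor:

\[S^{2}_{W} = \frac{3}{3 - 1} \cdot 0.61 = 0.915\]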

Parameters:

biased (bool) – Whether to return the biased (population) variance or the unbiased (sample) variance. By default, biased is True and the population variance is returned.

Returns:

The weighted variance of all observations since the initialization, or NaN when too few (non-zero) observations were registered.

Return type:

float