Reconstructing Model Parameters in Partially-Observable Discrete Stochastic Systems (Part 2)

Symbolic Polynomial Analysis

The iterative analysis is a simple algorithm that usually finds accurate estimates for the unknown model parameter values, but it gives no guarantees about the time to convergence or the optimality of the values found. In cases where all unknown probabilities can be expressed as polynomials in a single common variable, the problem can be solved more reliably by direct computation.

The direct computation approach follows the paradigm of maximum likelihood estimation (MLE): determine a mathematical expression for the likelihood of the model explaining the observations made (i.e. the symbol emissions recorded in the trace) as a function of the unknown parameter, and then find the global maximum of that expression to determine the most likely parameter value.

In PODS systems, the observations recorded in the trace are usually not statistically independent (e.g. the time at which a machine produces an item depends on the time at which it finished producing the previous item). The usual MLE approach of computing a likelihood expression by multiplying the probabilities of the individual observations, or by adding their log-likelihoods, is therefore not applicable. However, the normal event-driven analysis algorithm readily provides an alternative: the probability of a single Proxel in the last time step of an analysis measures the likelihood of the system being in one particular state after having emitted the trace. The sum of the probabilities of all Proxels of that time step therefore measures the likelihood of the system being in any state at all after having emitted the trace, and is thus a measure of the likelihood that the trace was emitted by that model.


Thus, for a given value a of an unknown model parameter, the event-driven analysis computes a measure of the likelihood that the model emitted the trace, assuming the value of the unknown parameter was a. If no value is given for the unknown parameter and the computations are instead performed symbolically, keeping a as a symbolic variable, then the event-driven analysis computes a mathematical expression – dependent on the unknown parameter – for the likelihood of the model having emitted the trace, which is the first step of an MLE algorithm.
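
In symbols (this notation is ours, introduced only for illustration): if $\Pi_T$ denotes the set of Proxels of the final time step $T$ and $p_\pi(a)$ the probability of Proxel $\pi$ as a polynomial in the unknown parameter $a$, the trace-generation likelihood and the subsequent MLE step are

    L(a) = \sum_{\pi \in \Pi_T} p_\pi(a),
    \qquad
    \hat{a} = \arg\max_{a \in A} L(a),

where $A$ is the valid range of $a$ described below.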

To achieve this, we propose the following approach: First, express all unknown parameters by polynomials in a single variable. This is necessary in order to make the following computations practically feasible. Since this is not always possible, the approach is not applicable to all models.

Then, perform an event-driven Proxel analysis of the trace with the incomplete model, but carry out all computations of Proxel probabilities not numerically, but with symbolic polynomials. This allows the computations to be performed even while some parameter values are missing. Consequently, the Proxel probabilities will not be numbers, but symbolic polynomials as well. Since the polynomials are limited to a single variable, the length of the Proxel probability polynomials increases only by a constant amount with each time step (a probability polynomial may be multiplied by the constant-size polynomial of an unknown model parameter, while merging Proxels does not increase the degree of the polynomial at all), making the algorithm computationally feasible. After the algorithm has terminated, sum up the polynomials of all Proxel probabilities of the final time step to obtain the likelihood polynomial (cf. Figure 4 for the likelihood polynomial of the incomplete Tester model).
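
The following sketch (our own illustration, not the authors' implementation; the branching probabilities a and 1 - a are hypothetical) shows how Proxel probabilities can be propagated as coefficient lists of polynomials in a single unknown parameter:

    # Minimal sketch: Proxel probabilities kept as polynomials in a single unknown
    # parameter a, represented by coefficient lists [c0, c1, c2, ...] meaning
    # c0 + c1*a + c2*a^2 + ...

    def poly_mul(p, q):
        """Multiply two coefficient lists; the degree grows by len(q) - 1."""
        r = [0.0] * (len(p) + len(q) - 1)
        for i, pi in enumerate(p):
            for j, qj in enumerate(q):
                r[i + j] += pi * qj
        return r

    def poly_add(p, q):
        """Add two coefficient lists; merging Proxels never raises the degree."""
        n = max(len(p), len(q))
        p = p + [0.0] * (n - len(p))
        q = q + [0.0] * (n - len(q))
        return [x + y for x, y in zip(p, q)]

    # One hypothetical time step: a Proxel with probability polynomial `prob`
    # branches into "defective" (probability a) and "working" (probability 1 - a).
    prob = [1.0]                             # initial Proxel probability: the constant 1
    defective = poly_mul(prob, [0.0, 1.0])   # multiply by a
    working   = poly_mul(prob, [1.0, -1.0])  # multiply by 1 - a

    # If two branches later reach the same state, their polynomials are merged:
    merged = poly_add(defective, working)    # here simply the constant 1, as expected

Multiplying by a fixed-degree parameter polynomial raises the degree by at most a constant per time step, and merging leaves the degree unchanged; this is exactly the property that keeps the symbolic analysis tractable.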

The final step of the MLE is then to determine the position of the maximum of this polynomial in the valid range (the range of values of the single free variable for which all unknown probabilities expressed as polynomials in it take values in the interval [0; 1]). This can be done either through numerical root-finding on the symbolic first derivative of the polynomial, or by evaluating the polynomial at random values of the free variable and accepting the position of the sample with the highest function value as a close-enough estimate of the maximum. In both cases, evaluating the probability polynomials for the unknown parameters at the position determined by this step then returns their most likely values and thereby reconstructs the missing model parameters.
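
A minimal sketch of this final step (our own illustration; the helper name most_likely_value and the use of NumPy are assumptions, not part of the original implementation):

    # Locate the maximum of the likelihood polynomial on the valid parameter range
    # by root-finding on its first derivative and comparing candidate points.
    import numpy as np
    from numpy.polynomial import Polynomial

    def most_likely_value(coeffs, lo, hi):
        """coeffs: likelihood polynomial coefficients (ascending powers);
        lo, hi: valid range of the free variable."""
        L = Polynomial(coeffs)
        dL = L.deriv()
        # Stationary points inside the valid range (keep only real roots).
        roots = dL.roots()
        candidates = [r.real for r in roots if abs(r.imag) < 1e-12 and lo <= r.real <= hi]
        candidates += [lo, hi]                 # the maximum may lie on the boundary
        return max(candidates, key=L)          # position of the highest likelihood

    # Example with a toy polynomial a*(1-a) = a - a^2, maximal at a = 0.5:
    print(most_likely_value([0.0, 1.0, -1.0], 0.0, 1.0))

For the high-degree polynomials produced by long traces, both the root-finding and the evaluation should be carried out in higher precision, as discussed next.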

For implementers of this algorithm, it is important to consider numerical accuracy: since the algorithm deals with polynomials containing powers of the variable of 1000 and more, the computations are numerically unstable. We found that a floating-point accuracy of at least 256 bits for all polynomial coefficients and all polynomial evaluations is required for reliable results.
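
The paper's implementation relies on the GMP library (see below); purely as an illustration of the required precision, the same effect can be obtained in Python with mpmath:

    # Evaluate a likelihood polynomial with 256-bit floating-point arithmetic
    # via Horner's scheme to avoid catastrophic cancellation at high degrees.
    from mpmath import mp, mpf

    mp.prec = 256                      # at least 256 bits, as recommended above

    def horner(coeffs, x):
        """Evaluate c0 + c1*x + ... + cn*x^n at x using Horner's scheme."""
        acc = mpf(0)
        for c in reversed(coeffs):
            acc = acc * x + c
        return acc

    coeffs = [mpf(c) for c in (0.0, 1.0, -1.0)]   # toy polynomial a - a^2
    print(horner(coeffs, mpf('0.5')))              # 0.25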

Fig. 4. Graph of the trace generation likelihood polynomial for the Tester model. The value on the horizontal axis for which the polynomial reaches its maximum is the most likely defect rate for Machine 0.

An advantageous side effect of the symbolic computation approach is that it depends only on the likelihood polynomial obtained as the sum over all Proxels. In contrast to the iterative analysis presented before, this approach therefore does not need to encode history into the Proxels as supplementary variables. This reduction in complexity partially offsets the additional cost introduced by the symbolic computations and by the use of the BigNum library GMP [8] for increased floating-point accuracy. For an evaluation of the computation times of both approaches, see the experiments in the next section.

Experiments

In this section, both algorithms are tested in order to determine their accuracy and computation times under varying trace lengths. Additionally, it is tested whether the algorithms can cope with ambiguous models.

The tests are conducted on the Tester model with traces of about 750 symbols (the equivalent of 50,000s of system behavior), generated using the discrete event simulator AnyLogic [4]. Using synthetic traces has the advantage that the actual defect probabilities (which are to be reconstructed using the two algorithms) can be recorded as well. Thus, the analysis result can be compared to the otherwise unobservable behavior of the "real" system.

In order to be analyzed by the polynomial approach, the four unknown parameters of the Tester model (the probabilities of each machine producing defective or working pieces, respectively) need to be expressed as polynomials in a single variable. This is easily possible: the "working" probability of each machine follows directly from its "defective" probability, and the defect probability of one machine can be related to that of the other, since together they must account exactly for the fraction of defective items recorded in the trace.
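
As a concrete illustration (our notation; the assumption that both machines contribute equal shares of the tested items is ours, made only to keep the example simple): with $a$ the defect probability of Machine 0 and $d$ the fraction of defective items recorded in the trace, all four unknowns become polynomials in $a$,

    p_0^{\mathrm{def}} = a, \quad
    p_0^{\mathrm{ok}}  = 1 - a, \quad
    p_1^{\mathrm{def}} = 2d - a, \quad
    p_1^{\mathrm{ok}}  = 1 - 2d + a,

with the valid range $a \in [\max(0, 2d - 1), \min(1, 2d)]$ ensuring that all four probabilities stay in [0; 1].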

Computational Accuracy

To determine the accuracy of the approach, we created multiple signal traces from the Tester system, but from different system instances with different, randomly chosen defect probabilities for the two machines. The traces were then analyzed using the iterative and the polynomial analysis algorithms.

The results for the individual traces are shown in Figure 5. For each of the ten model instances, the actual value of the system parameter "defect probability of machine 0" is given alongside the values reconstructed by the iterative and polynomial approaches. The results of the polynomial and iterative analyses are almost identical, giving confidence that they indeed answer the same question, and both differ only slightly from the actual system behavior. This remaining difference is likely unavoidable: both algorithms reconstruct only the parameter values for which the system most likely caused the trace, and the actual system behavior is not necessarily identical to the most likely explanation.

Fig. 5. Simulation results for the analysis of ten traces with different defect rates in the real system

Fig. 6. Increase of computation time under increasing trace length for the polynomial and iterative analysis algorithms

Computation Time

The computation time of the algorithms is relevant for assessing their practical applicability, especially for longer traces. For this experiment, we generated a trace covering 500,000s (about 5 days) of simulated time and containing about 7,500 symbols, ten times as many as the traces used in the other experiments. The computation time was then recorded after every 10,000s of analyzed trace time for both algorithms.

Figure 6 shows the raw computation times, which need to be interpreted with care: for the polynomial algorithm, a single analysis run is always sufficient to reconstruct the system behavior. The iterative approach, on the other hand, requires repeated analyses until its results converge. To give an indication of how the computation times compare, we plotted the results for a single iteration as well as for 20 iterations – a reasonable number of iterations until convergence.

The computation time of the iterative approach increases linearly, that of the polynomial approach on the other hand increases polynomially with a low initial slope. This is due to the symbolic polynomials increasing in length over time and thus making their computations more costly. Overall, the magnitude of the computation time is similar for both algorithms. Which algorithm is faster for the analysis of a specific trace and model depends on the trace length and the number of iteration steps required for convergence.
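
A rough cost estimate supporting this observation (our own back-of-the-envelope reasoning, not a result from the paper): if the polynomial degree grows by at most a constant $c$ per time step, the multiplication cost in step $t$ is on the order of $c\,t$, so the total cost over a trace of $T$ steps is

    \sum_{t=1}^{T} O(c\,t) = O(c\,T^2),

i.e. roughly quadratic in the trace length, which is consistent with the low initial slope visible in Figure 6.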

With the polynomial approach having to multiply and store polynomials using high precision floating point numbers that the CPU does not natively support, one would expect the polynomial approach to be slower than the iterative one by orders of magnitude even for short traces. But the polynomial algorithm does not have to encode history information into the system states (cf. Section 3.2), which leads to a far lower number of Proxels required for the analysis and partially compensates for the increased computational effort required for each Proxel.

Analysis of Ambiguous Systems

Some systems have multiple equally likely explanations for their internal behavior. For example, if the two machines in the Tester system had identical production behavior (i.e. identical probability distribution and distribution parameters), they could not be distinguished. An algorithm suitable for practical applications should fail gracefully in these cases.

To test algorithm behavior on this class of models, we created a trace for a modified Tester model where both production durations were N(120, 20) distributed, but had different defect probabilities. The trace was then analyzed by both new algorithms.

Figure 7 shows the progression of the iterative analysis. Since the initial guess for the defect probabilities is almost identical (up to floating-point inaccuracies) to the local minimum of the likelihood polynomial (cf. Figure 8), the result barely changes over the first few iterations. The iterative algorithm, which uses the difference between the results of two iterations as its termination criterion, would ordinarily terminate right away and return this locally most unlikely parameter value as the most likely defect probability. If forced to perform further iterations, the results only diverge after 60 iterations towards one of the two local maxima, depending on which side of the likelihood minimum the initial estimate was located. In effect, the algorithm randomly proclaims one of the two different but equally likely outcomes to be the most likely system behavior and ignores the other.

The polynomial approach reliably finds both maxima (cf. Figure 8). So, it correctly determines that one machine has caused about twice as many defects as the other one, and correctly states that it is not possible to distinguish the two machines.

Thus, the iterative analysis is unreliable in the general case since it may find the global maximum, or just a random local maximum, or may even be stuck in a local minimum of the likelihood. It is only reliable when the likelihood polynomial is guaranteed to only contain a single maximum and no minima. The polynomial approach, on the other hand, always finds the most likely value(s) for the unknown parameter irrespective of the shape of the likelihood polynomial.

Fig. 7. Progression of the iterative analysis of a trace from an ambiguous model

Fig. 8. Likelihood polynomial for a trace from an ambiguous Tester model

Conclusion and Outlook

In this paper, we have presented two approaches that enable the reconstruction of parts of the system specification of partially-observable discrete stochastic systems from an incomplete model and a trace of external observations of its behavior.

It should be noted that the task of both algorithms is only to reconstruct parameters of the system that most likely has created the trace. In the same way that a star athlete may be most likely to win a sports competition, but is not guaranteed to win it, the reconstructed most likely explanation is not necessarily the actual cause.

The iterative algorithm is a simple modification to the existing algorithms and is usually able to correctly determine the most likely parameter values in about 20 iterations. However, being a local optimization algorithm, it does not necessarily find the actual most likely system behavior, but only a value that is more likely to represent the actual system behavior than neighboring values on both sides. Additionally, its behavior is undefined if the initial estimate for the unknown parameter is less likely to represent the actual system behavior than neighboring values on both sides.

The polynomial algorithm is a more complex modification to the existing algorithms, using symbolic computations on polynomials to cope with incomplete specifications. It is thus computationally more expensive than the iterative approach, but in turn is guaranteed to reconstruct the most likely real system behavior.

In practice, the iterative algorithm may be preferred in real-time application scenarios due to its simplicity and speed, especially for systems where local optimization is guaranteed to also find the global optimum (i.e. those with no local minima and only a single local maximum), and where more accurate initial estimates are available (e.g. by using the results from the previous day's trace as an estimate for the next day), so that fewer iterations are required. The polynomial algorithm is better suited to off-line analysis of the likelihood of different system configurations and as an accuracy benchmark for other approaches.
