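$$
\begin{bmatrix}
a_{11} & \cdots & a_{1N} \\
\vdots & \ddots & \vdots \\
a_{M1} & \cdots & a_{MN}
\end{bmatrix}
\begin{bmatrix}
x_1 \\ \vdots \\ x_N
\end{bmatrix}
=
\begin{bmatrix}
y_1 \\ \vdots \\ y_M
\end{bmatrix},
$$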
where A is a sparse binary matrix.
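As a concrete illustration, the sketch below builds A for a toy corpus. It assumes, consistent with the equation above, that row i corresponds to a molecular step with observed duration y_i and column j to an atomic step with unknown duration x_j, so that a_ij = 1 when atom j occurs in molecule i. The step names are hypothetical, and a dense NumPy array is used for readability; a realistic corpus would call for a sparse format such as those in scipy.sparse.

```python
import numpy as np

def incidence_matrix(molecules, atoms):
    """Build the binary M x N matrix A with a_ij = 1 iff atom j occurs in molecule i.

    molecules: list of lists of atom names, one list per molecular step (row).
    atoms: ordered list of distinct atom names (defines the columns of A).
    """
    col = {atom: j for j, atom in enumerate(atoms)}
    A = np.zeros((len(molecules), len(atoms)))
    for i, steps in enumerate(molecules):
        for atom in steps:
            A[i, col[atom]] = 1.0
    return A

# Hypothetical example: three molecular steps built from three atomic steps.
atoms = ["dice onion", "saute onion", "boil water"]
molecules = [
    ["dice onion", "saute onion"],
    ["dice onion", "boil water"],
    ["saute onion", "boil water"],
]
A = incidence_matrix(molecules, atoms)
print(A)
```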
As part of data preprocessing, it is important to normalize durations according
to weight, measure, tool/instrument, etc. by rescaling. One way to implement this
normalization in the linear setting is to treat A as a weighted matrix, with entries
given by these rescaling weights, rather than as a binary one.
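A minimal sketch of the weighted variant, assuming each occurrence of an atom in a molecule carries a rescaling weight (here a hypothetical quantity relative to a reference amount). Under that assumption, x_j becomes the duration of atom j per reference unit, and the binary matrix is the special case in which every weight is 1. The function name and data are illustrative only.

```python
import numpy as np

def weighted_matrix(molecules, atoms):
    """Build the M x N weighted matrix A.

    molecules: list of dicts, one per molecular step, mapping an atom name to
               its rescaling weight (hypothetical: quantity / reference amount).
    atoms: ordered list of distinct atom names (defines the columns of A).
    """
    col = {atom: j for j, atom in enumerate(atoms)}
    A = np.zeros((len(molecules), len(atoms)))
    for i, step in enumerate(molecules):
        for atom, weight in step.items():
            A[i, col[atom]] = weight
    return A

# Hypothetical data: durations are normalized to "per onion" and "per potato".
atoms = ["dice onion", "peel potato"]
molecules = [
    {"dice onion": 2.0},                      # dice two onions
    {"dice onion": 1.0, "peel potato": 3.0},  # dice one onion, peel three potatoes
]
A = weighted_matrix(molecules, atoms)
print(A)
```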
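We wish to solve the inference problem $A\mathbf{x} = \mathbf{y}$ for $\mathbf{x}$, resulting in an estimate $\hat{\mathbf{x}}$.
Since the data may be inaccurate, insufficient and inconsistent, one possible algorithm
to use is the Lanczos inverse [ 31 ]:

$$
\hat{\mathbf{x}} = (A^{T} A)^{-1} A^{T} \mathbf{y},
$$

which requires $A^{T} A$ to be invertible, i.e. $A$ to have full column rank.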
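The estimator can be checked numerically. The sketch below (the helper name lanczos_estimate is hypothetical, and the matrix, durations and noise are made-up toy values) solves the normal equations directly on slightly inconsistent observations and compares the result with NumPy's SVD-based Moore-Penrose pseudoinverse and least-squares solver, to which (A^T A)^{-1} A^T y reduces when A has full column rank.

```python
import numpy as np

def lanczos_estimate(A, y):
    """x_hat = (A^T A)^{-1} A^T y, computed by solving the normal equations
    (A^T A) x = A^T y rather than forming the inverse explicitly.
    Requires A to have full column rank."""
    return np.linalg.solve(A.T @ A, A.T @ y)

# Toy system: 4 molecular steps (rows) over 3 atomic steps (columns).
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 1.0]])
x_true = np.array([4.0, 6.0, 10.0])                # made-up atom durations (minutes)
y = A @ x_true + np.array([0.5, -0.3, 0.2, 0.0])   # made-up, slightly inconsistent data

x_hat = lanczos_estimate(A, y)
x_pinv = np.linalg.pinv(A) @ y                     # SVD-based pseudoinverse
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)    # least-squares solver
print(x_hat)
```

Forming A^T A squares the condition number of A, so in practice np.linalg.lstsq or np.linalg.pinv is the numerically safer route; the explicit normal equations are shown only to mirror the formula.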
If there is some prior knowledge about the statistical nature of x , another possible
algorithm to use is message-passing Bayesian inference [ 32 , 33 ].
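The cited message-passing methods are not reproduced here. As a stand-in, the sketch below assumes a Gaussian prior x ~ N(μ, σ_x² I) on the atom durations and Gaussian noise of variance σ² on y, and computes the exact posterior mean in closed form; for such linear-Gaussian models, Gaussian belief propagation, when it converges, yields these same means. The function name, prior values and data are hypothetical. Note that the prior keeps the problem well posed even when A^T A is singular, where the Lanczos formula above does not apply.

```python
import numpy as np

def gaussian_posterior_mean(A, y, prior_mean, prior_var, noise_var):
    """Posterior mean of x given y, assuming
       x ~ N(prior_mean, prior_var * I)          (prior on atom durations)
       y = A x + n,  n ~ N(0, noise_var * I)     (observation noise).
    Solves (A^T A / noise_var + I / prior_var) x = A^T y / noise_var + prior_mean / prior_var."""
    n_atoms = A.shape[1]
    precision = A.T @ A / noise_var + np.eye(n_atoms) / prior_var
    rhs = A.T @ y / noise_var + prior_mean / prior_var
    return np.linalg.solve(precision, rhs)

# Underdetermined toy system: 2 molecular steps, 3 atomic steps, so A^T A is singular.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
y = np.array([10.0, 16.0])       # made-up molecule durations (minutes)
prior_mean = np.full(3, 5.0)     # hypothetical prior belief: about 5 minutes per atom
x_post = gaussian_posterior_mean(A, y, prior_mean, prior_var=4.0, noise_var=1.0)
print(x_post)
```

A Gaussian prior does not enforce nonnegative durations; a prior that respects that constraint would need a different inference method.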
If every atom were unique, appearing in only one molecule, it would be impossible to
perform the inference with any degree of validity, since the durations of the atoms
within a multi-atom molecule could not be separated from one another. To prevent this,
we should group normalized atoms from disparate molecules into equivalence classes.
This can be done with an unsupervised clustering algorithm, e.g. one based on
features from the ontology. Note that grouping is not restricted to steps that operate
on single ingredients.
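Grouping can be expressed as a column merge: if C is an N x K membership matrix with C[j, k] = 1 when atom j falls in class k, the class-level system uses A C in place of A. In the sketch below the class labels are written by hand as a stand-in for the output of a clustering algorithm, and the atoms are hypothetical. Five atoms observed through only three molecules cannot all be identified, while the three merged classes can.

```python
import numpy as np

def group_columns(A, labels, n_classes):
    """Merge atom columns that share a class label: returns A @ C, where
    C[j, k] = 1 if atom j is assigned to equivalence class k."""
    C = np.zeros((A.shape[1], n_classes))
    C[np.arange(A.shape[1]), labels] = 1.0
    return A @ C

# Hypothetical atoms (columns): dice red onion, saute red onion,
# dice white onion, boil water, saute yellow onion.
A = np.array([[1.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0, 1.0]])
labels = np.array([0, 1, 0, 2, 1])   # hand-assigned classes: 0 = dice, 1 = saute, 2 = boil
A_class = group_columns(A, labels, 3)

print(np.linalg.matrix_rank(A), A.shape[1])              # 3 5: underdetermined
print(np.linalg.matrix_rank(A_class), A_class.shape[1])  # 3 3: full column rank
```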
In this clustering, there is a tradeoff between the estimated poorness of the
inverse-problem solution obtainable (in the linear setting, this can be measured by the
condition number of the measurement matrix A) and the internal coherence of the
equivalence classes; the two must be balanced to obtain the best overall performance.
One way to trade off condition number against coherence is to define a hierarchy for
equivalence-class formation, e.g. using tree-structured k-means clustering. The
hierarchy can be defined jointly by ingredient (red apple < apple < fruit < produce)
and action (brunoise < dice < cut). We may then proceed up the hierarchy, decreasing
the estimated poorness, until the coherence measure is sufficiently small.
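The conditioning side of this tradeoff can be tabulated level by level. The sketch below assumes a hand-written three-level hierarchy (atoms, then actions such as cut/saute/boil, then a single catch-all class) in place of tree-structured k-means, and reports the 2-norm condition number of the merged matrix at each level, treating a matrix without full column rank as infinitely ill conditioned. Coherence is not computed; it would have to come from the clustering features. All names and data are hypothetical.

```python
import numpy as np

def column_condition(A):
    """2-norm condition number of A, treated as infinite when A lacks full
    column rank (e.g. more unknown durations than observed molecules)."""
    M, N = A.shape
    if N > M:
        return np.inf
    s = np.linalg.svd(A, compute_uv=False)  # singular values, descending
    return np.inf if s[-1] <= 1e-12 * s[0] else s[0] / s[-1]

def condition_by_level(A_atoms, levels):
    """levels: list of label lists, finest first; labels[j] is the class of atom j."""
    results = []
    for labels in levels:
        labels = np.asarray(labels)
        n_classes = int(labels.max()) + 1
        C = np.zeros((A_atoms.shape[1], n_classes))
        C[np.arange(A_atoms.shape[1]), labels] = 1.0
        results.append((n_classes, column_condition(A_atoms @ C)))
    return results

# Hypothetical atoms: brunoise red onion, dice white onion, saute red onion,
# saute white onion, boil water.
A_atoms = np.array([[1.0, 0.0, 1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0, 1.0],
                    [0.0, 0.0, 0.0, 1.0, 1.0]])
levels = [
    [0, 1, 2, 3, 4],  # individual atoms
    [0, 0, 1, 1, 2],  # cut / saute / boil
    [0, 0, 0, 0, 0],  # one catch-all class
]
for n_classes, kappa in condition_by_level(A_atoms, levels):
    print(n_classes, kappa)
```

On this toy data the condition number falls from infinite, to about 2, to exactly 1 as we move up, but the top level lumps every step into one class, which is the loss of coherence the tradeoff guards against.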
Thus we have a basic way to estimate the durations of recipes, using data from
other recipes.
There are several extensions that could be made to this basic procedure for estimating
recipe-step durations. Performing certain atomic steps together as a molecular step
may take less time than performing them separately, which introduces a nonlinearity
in the additivity of work times. Also, a molecular step may exploit parallelization
to finish faster than the same work split into atomic steps, which introduces another
such nonlinearity [ 34 ]. These nonlinearities should be accounted for when modeling
how atoms combine into molecules. Furthermore, although here we assume that each atomic step has a precise