A Multi-driven Approach to Improve Data Analytics for Multi-value Dimensions - New Contributions in Information Systems and Technologies

Information Technology Reference

In-Depth Information

Table 3 presents the values in millisecond achieved for each performance test rea-

lised for all options related to multi-value dimensions and double counting, using

Pentaho as our OLAP testing tool. Once again avoiding the need for a bridge table,

overcoming at the same time any double-counting derived from non-strictness.

Table 3. Main outcomes with different techniques to calculate the metric vale for PrecoLot

Me tr i c

(PrecoLote)

Agregation

Operator

Execution

Time (ms )

Comparis on

(Scenario1)

Double-

Counts

Grain of

Data

Multi-Value

Grand Total

Sce nari o1

SUM

1571

100%

Ye s

Granul ar

Not A ccurate

Sce nari o2

SUM

1548

99%

Granul ar

Not A ccurate

Scenario3

SUM

35293

2247%

Aggregated

Yes

Not Accurate

Sce nari o4

SUM

2268

144%

A ggre gate d

Ye s

A ccurate

Scenario 1 , in this scenario the value of PrecoLot was calculated using the de-

fault features of Pentaho-Mondrian (i.e., possibility to configure the standard

behaviour of the SUM operator). Main drawback: only provides correct values at the

lowest granularity. The tested results originate double-counting and faulty data for

Grand Total and PrecoLote, if not all dimensions are considered (i.e., foreign keys

DEA_ID, DPR_ID or DLOT_ID). This situation can fall into a summarizability

problem since there is a granularity mismatch in the instances of the fact causing

inconsistent totals;

Scenario 2 , in this scenario we used Pentaho MDX expressions to calculate Preco-

Lot for multi-value dimensions. The expression “ Sum ([DEA].[All],

[Measures].[PrecoLote]) ” calculates the metric value evaluated over the entire

set of Contracting Entities . In this case we had to assign a value of 0 (zero) to

each fact row with duplicates. This basic solution requires the end-user to have a deep

knowledge of the DW schema when performing OLAP queries. Main drawback: multi-

valued dimension are not supported because facts are defined independently of the di-

mension (e.g., the value of PrecoLote is assigned always with the same value for

contracting entities). The ETL also becomes very complex to ensure that the price of

each Lot is specified only once in the fact table.

Scenario 3 , in this scenario the approach used in scenario 2 was reviewed to the

following formula “ IIf([DLOT.H].CurrentMemberis[DLOT.H].[All],

Sum([DLOT.H].CurrentMeber.Children,[Measures].[PrecoLote3

Max]), [Measures].[PrecoLoteMax]) ”. This approach returns SUM ( Pre-

coLoteMax ) for each aggregated Lot , otherwise it returns the value of PrecoLo-

teMax . PrecoLoteMax is a hidden metric that was configured in the same way as

PrecoLote in scenario 1 but using the MAX operator instead of the SUM operator.

When applied to a single Lot, the output of PrecoLoteMax is correct because

MAX(X,X)= X , even if the price of the lot is repeated across several rows; it also

avoids double counting. Main drawback: extremely underperformance and still

provides inaccurate results for Grand Total.

New Contributions in Information Systems and Technologies

Search WWH ::

Custom Search

Home