should be used to avoid confusion. For example, raw data can be given the extension raw, while verified data can be given the extension ver.
9.2 DATA VALIDATION
In these days of powerful personal computers, most data validation is done with
automated tools; however, a manual review is still highly recommended. Validation
software can be obtained from some data logger vendors, and commercial software is
also available. Firms that do a lot of data validation often create their own automated
methods using spreadsheets or custom software written in languages such as Fortran,
Visual Basic, C++, or R.
Whatever method is used, data validation usually proceeds in two phases: automated
screening and in-depth review. The automated screening uses a series of algorithms to
flag suspect data records. Suspect records contain values that fall outside the normal
range based on either prior knowledge or information from other sensors on the same
tower. The algorithms commonly include relational tests, range tests, and trend tests.
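A minimal sketch of how these three tests might look in code follows. The record fields (ws_50m, ws_30m) and every threshold here are assumptions chosen for illustration, not published limits; in practice they would be tuned to the site and sensors.

def range_test(speed, lo=0.0, hi=35.0):
    # Flag a value outside the plausible range for the site.
    return not (lo <= speed <= hi)

def relational_test(upper, lower, max_diff=5.0):
    # Flag records where two sensors on the same tower disagree too much.
    return abs(upper - lower) > max_diff

def trend_test(prev, curr, max_step=8.0):
    # Flag an implausibly large change between consecutive 10-min averages.
    return abs(curr - prev) > max_step

def screen(records):
    # Return (index, reasons) pairs for every suspect record.
    flags = []
    for i, rec in enumerate(records):
        reasons = []
        if range_test(rec["ws_50m"]):
            reasons.append("range")
        if relational_test(rec["ws_50m"], rec["ws_30m"]):
            reasons.append("relational")
        if i > 0 and trend_test(records[i - 1]["ws_50m"], rec["ws_50m"]):
            reasons.append("trend")
        if reasons:
            flags.append((i, reasons))
    return flags

For example, a record of {"ws_50m": 41.0, "ws_30m": 6.1} following a normal one would be flagged by all three tests at once, a strong hint that something is wrong with that interval.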
The second phase, sometimes called verification, involves a case-by-case decision
about what to do with the suspect values—retain them as valid or reject them as
invalid. This is where judgment by an experienced person familiar with the monitoring
equipment and local meteorology is most helpful. Information that is not part of the
automated screening, such as regional weather data, may also be brought into play.
As an example of how this process can unfold, the automated screening might flag
a brief series of 10-min wind speeds as questionable because they are much higher
than the speeds immediately before and after. Was this spike real, or was it caused
by a glitch in the logger electronics, such as a loose connection?
During the review phase, the reviewer might check other sensors on the same mast
and observe the same spike; this would suggest that it is not a problem with a single
sensor or logger channel. Then he or she might look at regional weather records and
find that there was thunderstorm activity in the area at the time. The conclusion is
that the spike was most likely caused by a passing thunderstorm and should not be
excluded from the data analysis.
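Part of this reasoning can itself be automated. A hedged sketch, assuming 10-min averages stored as plain lists and an arbitrary 3:1 spike ratio:

def isolated_spike(series, i, ratio=3.0):
    # True if record i is far above both of its immediate neighbors.
    if i <= 0 or i >= len(series) - 1:
        return False
    return series[i] > ratio * series[i - 1] and series[i] > ratio * series[i + 1]

def spike_corroborated(primary, secondary, i, ratio=3.0):
    # A spike seen simultaneously by two independent sensors on the same
    # mast is unlikely to be an electronics glitch in one channel.
    return isolated_spike(primary, i, ratio) and isolated_spike(secondary, i, ratio)

Corroboration across sensors narrows the question, but the final call, such as checking regional weather records for thunderstorm activity, still rests with the reviewer.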
Another example is presented in Figure 9-1. After a period of apparently normal
operation, the 10-min average speed readings from an anemometer dropped to the offset value (indicating no detectable wind), while the standard deviation dropped to zero. Later, both appeared to return to their normal behavior. The reviewer checks the temperature and finds it hovered near freezing before the event and rose above freezing at
the end. Furthermore, the direction standard deviation (not shown) fell to zero shortly
before the speed standard deviation did and resumed normal behavior at about the same
time. The conclusion is that this was a likely icing event and should be excluded.
In such a two-phase validation approach, it is reasonable for the automated screening to be somewhat overly sensitive, meaning it produces a greater number of false
positives (data flagged as bad, although they are actually good) than false negatives
(data that are cleared as good but are actually bad). One reason for this bias toward
overdetection is that there will be an opportunity to reexamine flagged data records in the verification phase, whereas records that pass the screening may never be examined again.
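A toy illustration of the trade-off, using made-up 10-min averages and two arbitrary trend thresholds:

ws = [5.1, 5.3, 5.0, 14.8, 5.2, 5.4]  # one suspicious spike at index 3

strict = [i for i in range(1, len(ws)) if abs(ws[i] - ws[i - 1]) > 4.0]
loose = [i for i in range(1, len(ws)) if abs(ws[i] - ws[i - 1]) > 12.0]
print(strict)  # [3, 4]: both the jump up and the drop back are flagged
print(loose)   # []: the spike slips through and is never reviewed

The stricter threshold costs only reviewer time spent clearing false positives, while the looser one lets a possibly bad record pass as valid.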