Biomedical Engineering Reference
In-Depth Information
However, many important points have been neglected in recent image-based
biological analyses. Although machine learning approaches offer strong predictive
performance, analysts should be aware of the problems that arise from the
combination of the "nature of image data" and the "nature of cells."
First, cellular image data are far more biased than mRNA and protein
data. This is because mRNA and protein measurements are totals pooled over all the
cells in one vessel, whereas the bias in image data depends on where and by whom the
cells are observed. Typically, cells grow locally in culture dishes and migrate toward
empty spaces. Thus, if cells are not seeded uniformly, the cell number and migration
rates differ greatly between images. Furthermore, analysts should be aware that most
cellular images from cell biologists are taken from the researcher's favorite "view
field" with their favorite "focus and lighting." Therefore, unless images are acquired
randomly or on a schedule by automated machinery, even a random selection of images
from the "researcher's image library" is already heavily biased.
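As a minimal sketch of random acquisition, the view fields to image can be drawn from a grid of stage positions with a seeded random number generator instead of being picked by eye. The grid size, the number of fields, and the function name `pick_view_fields` are hypothetical and chosen only for illustration.

```python
import random


def pick_view_fields(n_rows, n_cols, n_fields, seed):
    """Choose n_fields distinct (row, col) grid positions uniformly at random."""
    # Hypothetical layout: the well is divided into an n_rows x n_cols grid.
    positions = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    return sorted(rng.sample(positions, n_fields))


# Assumed example values: a 6 x 6 grid from which 8 fields are imaged.
fields = pick_view_fields(n_rows=6, n_cols=6, n_fields=8, seed=2024)
print(fields)
```

Recording the seed together with the images allows the same positions to be revisited at later time points.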
Second, due to the nature of cells, cellular images always contain a certain
percentage of "common features." This is because every cell type exhibits the
same small, round morphology when the cells are "dead," "proliferating," or
"rapidly migrating." In addition, cell-derived debris accumulates on the surface of the
culture plate during the culture process. Such a "common sub-population"
introduces severe noise into machine learning algorithms and drastically lowers model
accuracy. Therefore, the objects in cellular images have to be effectively filtered or
classified by detailed observation and statistical analysis of the cell populations
before model construction. We also have to consider the fact that primary cells
sometimes contain "different cell types," although this is usually overlooked.
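One simple way to express such a filter is a rule on object area and circularity, sketched below. The cut-off values, the field names `area` and `perimeter`, and the function `classify_object` are assumptions for illustration only; in practice the cut-offs would come from the observed distributions of the cell populations, not from fixed numbers.

```python
import math


def circularity(area, perimeter):
    """4*pi*A / P^2: close to 1.0 for a circle, smaller for spread or irregular shapes."""
    return 4.0 * math.pi * area / perimeter ** 2


def classify_object(obj, debris_area=30.0, round_area=150.0, round_circularity=0.85):
    """Label one segmented object as 'debris', 'small_round', or 'keep'.

    All three cut-offs are hypothetical placeholders (area in pixels).
    """
    if obj["area"] < debris_area:
        return "debris"
    if obj["area"] <= round_area and circularity(obj["area"], obj["perimeter"]) >= round_circularity:
        return "small_round"
    return "keep"


# Toy measurements standing in for the output of a segmentation step.
objects = [
    {"id": 1, "area": 900.0, "perimeter": 200.0},  # large, spread cell
    {"id": 2, "area": 78.5, "perimeter": 31.4},    # small, nearly circular object
    {"id": 3, "area": 12.0, "perimeter": 20.0},    # tiny fragment
]
labels = {obj["id"]: classify_object(obj) for obj in objects}
kept = [obj for obj in objects if labels[obj["id"]] == "keep"]
```

Only the objects labeled `keep` would then be passed on to feature extraction and model construction.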
Third, cells vary greatly between cell lines, cell passages, and cell
origins. Therefore, the variation arising from the source of cellular images is also
extremely important. In our experience, such variation is extremely large;
therefore, the number of images should be large enough to provide enough cells
to minimize the standard deviation. Such quantity and variation in cellular
images are commonly neglected, mostly because of the cost and labor
involved in the experiment. Effective machine learning requires sufficient data for
cross-validation; therefore, the experimental design for image acquisition
is extremely important.
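One way to account for source variation during cross-validation is to hold out all images from one source at a time, so that a model is never tested on a cell line it was trained on. The sketch below illustrates this leave-one-source-out idea; it is not a description of the procedure used here, and the record layout and the `cell_line` key are hypothetical.

```python
from collections import defaultdict


def leave_one_source_out(records, source_key="cell_line"):
    """Yield (held_out_source, train_indices, test_indices), one fold per source."""
    by_source = defaultdict(list)
    for index, record in enumerate(records):
        by_source[record[source_key]].append(index)
    for source in sorted(by_source):
        test = by_source[source]
        # Training data: every image whose source differs from the held-out one.
        train = sorted(i for s, idx in by_source.items() if s != source for i in idx)
        yield source, train, test


# Toy image records; the names are placeholders.
records = [
    {"image": "img_001", "cell_line": "A"},
    {"image": "img_002", "cell_line": "A"},
    {"image": "img_003", "cell_line": "B"},
    {"image": "img_004", "cell_line": "B"},
    {"image": "img_005", "cell_line": "C"},
]
folds = list(leave_one_source_out(records))
for source, train, test in folds:
    print(source, train, test)
```

The same grouping could be applied to passages or donors by changing `source_key`. With only a few sources the number of folds is small, which is one more reason the acquisition design has to plan for enough independent sources.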
Fourth, cellular image processing commonly depends entirely on the
researcher's subjective judgment. In most cases, binarization is performed with
"a threshold." However, this threshold is commonly set to some value "considered OK by
the researcher after evaluating fewer than 20 images." Such a threshold is rarely
"thoroughly scanned," because most cellular image analysis software lacks such a
function. As a result, most cellular images are either processed individually and
differently with a "feeling-based threshold," or processed with a single threshold
that is "roughly decided." A high bias is therefore inherent in image processing
when image data are converted into numerical data.
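A threshold scan of this kind can be sketched in a few lines: every candidate threshold is applied to every image, and the resulting foreground fraction is summarized across the whole image set. The readout (foreground fraction), the step size, and the function names are assumptions for illustration; any other per-image measurement could be summarized the same way.

```python
from statistics import mean, pstdev


def foreground_fraction(image, threshold):
    """Fraction of pixels strictly brighter than the threshold."""
    pixels = [p for row in image for p in row]
    return sum(p > threshold for p in pixels) / len(pixels)


def scan_thresholds(images, thresholds):
    """Summarize the foreground fraction over all images for each threshold."""
    table = []
    for t in thresholds:
        fractions = [foreground_fraction(img, t) for img in images]
        table.append({"threshold": t, "mean": mean(fractions), "spread": pstdev(fractions)})
    return table


# Toy 2 x 3 grayscale images (0-255) standing in for real micrographs.
images = [
    [[10, 12, 200], [11, 210, 220]],
    [[20, 22, 180], [19, 21, 190]],
]
table = scan_thresholds(images, range(0, 256, 50))
for row in table:
    print(row)
```

Inspecting how the mean and the spread change across thresholds shows where the binarization is stable over the whole data set, in place of a value judged by eye on a handful of images.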
To reduce these four major biases, we applied original solutions before analysis. (i)
Image bias: we used an original seeding device for equal cell seeding, and acquired
more than 400 images per condition, including different view fields, wells, and time-