Changing the Face of ETL
Extract, transform, and load (ETL) processes are generally considered the
backbone of any BI implementation because they are responsible for
moving, cleansing, loading, and reviewing the quality of the data that
reports and analytics are based on in today's systems. The challenge
with today's ETL is that it can be difficult to adapt or adjust once a
large-scale implementation is in place.
As discussed throughout this topic, tools such as Pig and Sqoop provide
a less-robust, more-scalable, and more-flexible approach to moving data
in and around your environment. In most cases, you will have both your
traditional ETL platform, like SQL Server Integration Services (SSIS), and
an ETL process that works within your project implementation. These work
together to move the data back and forth between those environments.
Pig, in particular, provides a robust command-based solution. With this
solution, you can do text-based data cleansing and organization aggregation,
and you can use the MapReduce framework under the covers to scale the
work across many nodes and thus process and move large volumes of
data that a normal ETL server may struggle to handle in the same amount of
time. Traditional ETL servers use memory-intensive processes to load data
into memory, access it very quickly, and perform the appropriate operation
(such as cleansing, removing, sorting, aggregating, and so on). Arguably,
by combining the capabilities of Pig with additional tools, such as natural
language processing and other types of advanced analysis, developers can
do things in conjunction with the ETL process that would previously have
required a lot more code and a lot more effort to accomplish.
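
The cleanse-then-aggregate pattern described above can be sketched in a few lines. This is only an illustration in plain Python, not Pig Latin and not code from the text: the record layout (a region name and an amount separated by a comma) and the function names clean_record, aggregate, and run_job are all assumptions made for the example. It runs on a single machine; a framework such as MapReduce would distribute the same two steps across many nodes.

```python
from collections import defaultdict


def clean_record(line):
    """Cleansing step: turn one raw text line into (region, amount), or None.

    The "region,amount" layout is a hypothetical format chosen for this sketch.
    """
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 2 or not parts[0]:
        return None  # wrong number of fields or empty key
    region = parts[0].lower()  # normalize inconsistent casing
    try:
        amount = float(parts[1])
    except ValueError:
        return None  # non-numeric amount
    return region, amount


def aggregate(pairs):
    """Aggregation step: sum the amounts for each region."""
    totals = defaultdict(float)
    for region, amount in pairs:
        totals[region] += amount
    return dict(totals)


def run_job(lines):
    """Cleanse every line, drop unusable ones, then aggregate the rest."""
    cleaned = (clean_record(line) for line in lines)
    return aggregate(pair for pair in cleaned if pair is not None)


raw_lines = [
    " East , 10.5",
    "west,4",
    "EAST,2.5",
    "north,not-a-number",
    "malformed line",
    ",7",
]
totals = run_job(raw_lines)
print(totals)
```

Because each line is cleansed independently and the totals are simple sums, the work can in principle be split across nodes and the partial totals combined afterward, which is the property that lets this style of processing scale.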
What Does This Do to Business Intelligence?
With all these changes, you are probably asking yourself what this does to
BI. The important thing to remember is that nothing changes overnight. Yes,
developers do have many new tools and new ways to do some impressive
analytics and incredible dedicated visualizations, but these new tools will
continue to interact with existing platforms into which you have already
invested significant intellectual capital and organizational intellectual
property.
BI was originally designed as the analytics platform of the future. Now,
though, as the future looms (or as we pass through it), we require additional
scalability. BI has successfully demonstrated the value of these types of