The Inevitability of Disk-Based Data Protection
Disk is in widespread use as part of the data protection infrastructure of many large enterprises. Evolving business and regulatory mandates are imposing stringent SLAs on these organizations, pushing them to address backup window, RPO, RTO, and recovery reliability issues, and disk has a lot to offer in these areas. Technologies such as VTLs have made the integration of disk into existing data protection environments an operationally viable option.
Cost has historically been the single biggest obstacle to widespread integration of disk into existing data protection infrastructures, but the availability of SCO technologies such as single instancing, data de-duplication, and compression has brought the $/GB cost of usable disk capacity down significantly. SCO-based solutions first became available in 2004, and the SCO market hit $237M in revenue in 2007. Over the next five years, we expect revenue in the SCO space to surpass $2.2B, with the largest single market sub-segment being SCO VTLs (source: Taneja Group Next Generation Data Protection Emerging Markets Forecast, September 2008). If you are not using disk for data protection purposes today, and you are feeling pressure around backup window, RPO, RTO, or recovery reliability, you need to take another look at SCO VTLs. It is our opinion that within 1-2 years, SCO VTLs will be in widespread use throughout the enterprise. With data expected to continue to grow at 50% - 60% a year, the economics of SCO technology are simply too compelling to ignore.
A Brief Primer on SCO
Taneja Group has chosen the term SCO (Storage Capacity Optimization) to apply to the range of technologies used today to minimize the amount of raw storage capacity required to store a given amount of data. Data de-duplication is a common term among vendors, but it really describes only one set of algorithms used to capacity-optimize storage, and many de-duplication vendors use it alongside other technologies, such as compression, in a multi-step process to achieve the end result. That said, de-duplication is the primary technology that enables solutions to reach dramatic capacity optimization ratios such as 20:1 or more (at 20:1, 20 TB of backup data occupies just 1 TB of physical disk). Given the focus and attention on de-duplication - as well as the fact that it is at the heart of IBM’s TS7650G - let’s take a closer look.
At their most basic level, data de-duplication technologies break data down into smaller recognizable pieces (i.e., elements) and then look for redundancy. As elements come into the system, they are compared against an index that holds a list of the elements already stored in the system. When an incoming element is found to be a copy of an element that is already stored, the new element is eliminated and replaced by a pointer to the reference element. In secondary storage environments like backup, where backed-up data may change only 3-5% or less per day, there is a significant amount of redundancy that can be identified and removed (a 5% change rate implies a 95% data redundancy rate!). A simple sketch of this index-and-pointer scheme appears below.
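
To make the mechanism concrete, here is a minimal sketch in Python. The fixed 4 KB element size, SHA-1 fingerprints, and in-memory index are illustrative assumptions only; commercial implementations, the TS7650G included, use more sophisticated element boundaries, fingerprinting, and index structures.

    import hashlib
    import os

    CHUNK_SIZE = 4096  # hypothetical fixed-size elements; real products vary widely


    def deduplicate(streams):
        """Store one copy of each unique element; duplicates become pointers."""
        index = {}     # fingerprint -> slot of the reference element in the store
        store = []     # unique elements actually kept on disk
        pointers = []  # per-stream lists of pointers into the store

        for data in streams:
            refs = []
            for i in range(0, len(data), CHUNK_SIZE):
                element = data[i:i + CHUNK_SIZE]
                fingerprint = hashlib.sha1(element).hexdigest()
                if fingerprint not in index:      # never seen: store the element
                    index[fingerprint] = len(store)
                    store.append(element)
                refs.append(index[fingerprint])   # seen before: keep a pointer only
            pointers.append(refs)

        raw = sum(len(d) for d in streams)
        kept = sum(len(e) for e in store)
        print(f"raw: {raw} bytes, stored: {kept} bytes, ratio {raw / kept:.1f}:1")
        return store, pointers


    # Two simulated nightly backups of the same 1 MB data set, differing by ~5%:
    # night two contributes few new elements, mostly pointers.
    night1 = os.urandom(1024 * 1024)
    night2 = bytearray(night1)
    night2[:len(night1) // 20] = os.urandom(len(night1) // 20)  # ~5% change rate
    deduplicate([night1, bytes(night2)])

Run over two simulated nightly backups that differ by roughly 5%, the second night adds almost nothing new to the store, only pointers; repeated over weeks of largely unchanged backup data, this is how cumulative ratios like 20:1 build up.
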
De-duplication algorithms can operate at the file level (this is also referred to as single instancing) or at the sub-file level. Sub-file level de-duplication