HyperFactor, whose ability to map 1 PB of base capacity in main memory supports multiple petabytes of usable capacity. The fact that IBM can scale to this level against a single, global de-duplication repository is key: all other things being equal, IBM will achieve higher data reduction ratios with a global repository than vendors that scale to the same usable capacity but spread it across multiple repositories (one associated with each SCO VTL appliance).
And the TS7650G’s single-node performance and scalability mean that you can build out these large configurations with less hardware, creating simpler, less expensive deployments. Whether you’re consolidating multiple existing backup targets or creating a single backup target that can scale to petabytes of capacity, the TS7650G lets you do this very cost-effectively.
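The advantage of a single global repository can be illustrated with a simple thought experiment: a duplicate block is stored only once when every backup stream is matched against one index, but once per partition when capacity is split across independent repositories. The sketch below is purely illustrative and is not based on ProtecTIER internals; the hash-based matching and all names are assumptions.

```python
import hashlib

def dedup_ratio(streams, partitions):
    """Illustrative only: assign each backup stream to one of `partitions`
    independent repositories and compare logical vs. physical block counts.
    A single global repository corresponds to partitions=1."""
    repos = [set() for _ in range(partitions)]
    logical = physical = 0
    for i, stream in enumerate(streams):
        repo = repos[i % partitions]          # each stream lands on one appliance
        for block in stream:
            digest = hashlib.sha256(block).digest()
            logical += 1
            if digest not in repo:            # duplicates are only detected
                repo.add(digest)              # within the same repository
                physical += 1
    return logical / physical

# Two backup streams containing the same blocks:
shared = [bytes([b]) * 16 for b in range(100)]
streams = [shared, shared]
print(dedup_ratio(streams, partitions=1))  # global repository -> 2.0
print(dedup_ratio(streams, partitions=2))  # split repositories -> 1.0
```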
Availability. The introduction of clustering not only doubles single-system performance but also addresses the enterprise requirement for higher availability. IBM’s clustering technology provides a highly available environment that can tolerate the failure of a VTL node while maintaining access to all the data within the repository.
To provide the necessary levels of high availability, enterprise SCO VTL solutions also need to be able to ride through single disk failures. The TS7650G supports heterogeneous storage on the back end, and IBM recommends using the RAID capabilities of this back-end disk to provide high data availability. If higher levels of resiliency are desired, users can flexibly configure the storage subsystems accordingly. IBM’s Best Practices provide tools that recommend specific RAID configurations for its repository (metadata and user data) for optimal performance and resiliency.
Reliability. Two basic issues were identified earlier in this area: the risk of false positives and the verification of retrieved data. HyperFactor uses a unique approach to identify and confirm redundant elements. At a high level, HyperFactor performs a very low-latency “fly-by” looking for elements that resemble data it has already seen. A more in-depth analysis is then performed only on the elements identified as “similar,” while “new” elements go straight into the index before being written to the back-end storage. Competitive approaches execute their full “chunk evaluation algorithm” on every element, which generally means far more work per element, at a high latency cost, since a large percentage of references may require reads from disk. HyperFactor’s approach not only supports higher throughput but also identifies each element more reliably.
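The split between a lightweight similarity pass and a deeper confirmation step can be sketched roughly as follows. This is a generic similarity-based deduplication pattern, not IBM’s actual HyperFactor algorithm; every name, signature function, and data structure here is an assumption made for illustration.

```python
import hashlib
import zlib

class SimilarityDedupIndex:
    """Generic sketch of a two-phase dedup lookup: a cheap in-memory
    similarity check first, with a deeper comparison only on candidates.
    Not ProtecTIER/HyperFactor internals; all structures are illustrative."""

    def __init__(self):
        self.by_signature = {}   # coarse similarity signature -> stored element id
        self.store = {}          # element id -> original bytes

    def _signature(self, element: bytes) -> int:
        # Phase 1: cheap, in-memory "fly-by" signature (assumption: a coarse
        # digest of a prefix stands in for the real similarity search).
        return zlib.adler32(element[:64])

    def ingest(self, element: bytes) -> str:
        sig = self._signature(element)
        candidate_id = self.by_signature.get(sig)
        if candidate_id is not None:
            # Phase 2: in-depth comparison performed only on "similar" elements,
            # ruling out false positives before declaring a duplicate.
            if self.store[candidate_id] == element:
                return candidate_id            # confirmed duplicate; store nothing new
        # "New" element: index it immediately and write it to backing storage.
        element_id = hashlib.sha256(element).hexdigest()
        self.by_signature[sig] = element_id
        self.store[element_id] = element
        return element_id
```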
ProtecTIER retains metadata about each element, one piece of which is a cyclic redundancy check (CRC, or checksum). On reads, ProtecTIER assembles the required elements and, once each has been converted back into its original form, verifies its checksum to confirm that the data element read out of the repository is exactly the data element originally stored there.
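In outline, that read-path verification amounts to recomputing each element’s checksum after reconstruction and comparing it with the value recorded in metadata at write time. The sketch below is an assumption-laden illustration, not ProtecTIER code; the actual checksum algorithm and metadata layout are not described in this profile.

```python
import zlib

def verify_on_read(reconstructed: bytes, stored_crc: int) -> bytes:
    """Recompute the CRC of an element after it has been restored to its
    original form and compare it with the CRC kept in metadata.
    Illustrative only: CRC-32 is an assumed stand-in for the real checksum."""
    if zlib.crc32(reconstructed) != stored_crc:
        raise IOError("checksum mismatch: element differs from the data originally stored")
    return reconstructed
```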
The RAID capabilities of the underlying
storage subsystems provide yet another level