IBM TS7650G PROTECTIER DEDUPLICATION GATEWAY Overview - page 6 of 11
PRODUCT PROFILE
Copyright The TANEJA Group, Inc. 2008. All Rights Reserved
87 Elm Street, Suite 900 Hopkinton, MA 01748 Tel: 508-435-5040 Fax: 508-435-1530 www.tanejagroup.com
technologies that look most promising in
your environment, and don’t just run them
against a single backup. The throughput
performance of various SCO algorithms may
change over time as the indexes grow;
conventional hashing and content-aware
algorithms may actually suffer decreased
throughput once their index has outgrown
main memory capacity (something that often
happens around 20TB of base capacity with
conventional indexing algorithms). In
environments that do weekly full and daily
incremental backups, ratios will generally
improve over time, approaching a steady
state. The daily change rate of your data is a
critical determinant of the ratios you’ll
achieve over time, and if you’re like most
shops your daily rate will vary somewhat.
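To see why a conventional hash index can outgrow main memory at around that scale, consider a rough back-of-envelope estimate. The chunk size and per-entry overhead below are illustrative assumptions, not figures from any particular product:

```python
# Back-of-envelope estimate of in-memory index size for a
# hash-based deduplication system. All figures are illustrative
# assumptions, not vendor specifications.

def index_size_bytes(base_capacity_tb, avg_chunk_kb=8, bytes_per_entry=64):
    """Estimate index footprint: one entry per unique chunk.

    avg_chunk_kb    -- assumed average deduplication chunk size
    bytes_per_entry -- assumed fingerprint + location metadata + overhead
    """
    chunks = (base_capacity_tb * 1024**4) / (avg_chunk_kb * 1024)
    return chunks * bytes_per_entry

for tb in (1, 5, 20, 100):
    gb = index_size_bytes(tb) / 1024**3
    print(f"{tb:>4} TB base capacity -> ~{gb:,.0f} GB of index")
```

Under these assumptions, 20 TB of base capacity implies roughly 160 GB of index, which is why a purely in-memory index becomes untenable well before enterprise scale.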
Finally, understand whether the solution you’ve chosen supports what is called a “global” repository. Earlier, we stated that some sort of index is generally referenced as each element comes into the system. Architectures that allow multiple SCO VTLs to reference a single, global repository containing all the elements that have been seen before tend to offer better ratios than systems that maintain a separate, independently built index for each SCO VTL.
Architectures that support global repositories
tend to offer a better growth path as well,
since when the performance capabilities of a
single SCO VTL are outgrown, a new one can
be added and can immediately take
advantage of the index that is already there.
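The ratio advantage of a global repository can be sketched with a toy model. The function names and two-node setup below are hypothetical illustrations, not any product's interface:

```python
import hashlib

# Hypothetical sketch: two VTL nodes ingesting overlapping backup
# streams, with either separate per-node indexes or one shared
# ("global") index of chunk fingerprints.

def fingerprint(chunk: bytes) -> str:
    return hashlib.sha1(chunk).hexdigest()

def ingest(stream, index):
    """Store only chunks whose fingerprint is not already indexed."""
    stored = 0
    for chunk in stream:
        fp = fingerprint(chunk)
        if fp not in index:
            index.add(fp)
            stored += 1
    return stored

# Two nodes back up largely identical data (e.g. clones of a file set).
common = [bytes([i]) * 8192 for i in range(100)]
node_a_stream = common
node_b_stream = common  # duplicate data arriving at a second VTL

# Separate indexes: node B stores every chunk again.
a_idx, b_idx = set(), set()
separate = ingest(node_a_stream, a_idx) + ingest(node_b_stream, b_idx)

# Global index: node B finds every chunk already present.
g_idx = set()
shared = ingest(node_a_stream, g_idx) + ingest(node_b_stream, g_idx)

print(f"chunks stored with separate indexes: {separate}")  # 200
print(f"chunks stored with a global index:  {shared}")     # 100
```

In this contrived case the global index halves the stored chunk count; real-world gains depend on how much data actually overlaps across the streams the different VTLs see.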
High availability. In today’s 24x7
environments, even secondary data has to be
highly available so that stringent SLAs can be
met. SCO VTLs must not compromise that high availability as they are integrated into existing data protection infrastructures.
Once data is converted into a capacity-optimized form, it is not usable by applications until it is converted back into its original form. If there is a failure, whether of a component within an SCO VTL or of the entire SCO VTL, the data may not be available. For that reason, it is important to
support high availability solutions that can
ride through single points of failure. High
availability architectures allow maintenance
to be performed on-line as well, further
improving the overall availability of the
environment. Clustered architectures are a
good way to meet this need, and can
contribute to higher overall throughput as
well if a global repository is supported. Also look for support for various RAID options on the back-end storage to protect against disk failures.
Reliability. Because SCO VTLs effectively
convert data into an abbreviated form prior
to storing it, there is some conversion risk
that must be evaluated. How does the
system perform the conversion, and what is
the risk of false positives (two elements that are not exactly alike being identified as identical)? In SCO VTLs that use conventional
hashing methodologies, this risk is called out
as the “hash collision rate.” While nominal
hash collision rates may appear to be low
with conventional systems, if they are going
to be used in enterprise environments that
may be dealing with petabytes of usable
capacity, they need to be evaluated in light of
that level of scale.
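The scale argument can be made concrete with the standard birthday-bound approximation for collision probability. The chunk size and fingerprint widths below are illustrative assumptions for the sake of the calculation, not vendor figures:

```python
# Illustrative birthday-bound estimate of hash-collision probability
# for content fingerprints: for k unique chunks and an n-bit hash,
# P(at least one collision) is approximately k^2 / 2^(n+1).

def collision_probability(num_chunks, hash_bits):
    """Approximate P(at least one collision) ~ k^2 / 2^(n+1)."""
    return num_chunks**2 / 2**(hash_bits + 1)

petabyte = 1024**5
chunks = petabyte // 8192  # assumed 8 KB average chunk size: ~1.4e11 chunks

for name, bits in [("MD5 (128-bit)", 128), ("SHA-1 (160-bit)", 160)]:
    p = collision_probability(chunks, bits)
    print(f"{name}: P(collision) ~ {p:.1e} for {chunks:.2e} chunks")
```

Even at a petabyte of unique chunks the nominal probabilities are tiny, but the quadratic growth in the numerator is exactly why the evaluation must be redone at the capacity you actually plan to reach.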
When data is read back, it’s important to
verify the accuracy of the conversion process.
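One common way to perform that verification is to record a strong checksum of the original data at ingest and compare it against the reconstructed data on every read. This is a generic sketch of the pattern, not any vendor's implementation:

```python
import hashlib
import zlib

# Hypothetical sketch of read-back verification. zlib compression
# stands in for whatever capacity-optimization the system applies;
# the function names are illustrative, not a real VTL interface.

def store(data: bytes):
    """Capacity-optimize the data and keep a checksum of the original."""
    return zlib.compress(data), hashlib.sha256(data).hexdigest()

def read_verified(stored: bytes, checksum: str) -> bytes:
    """Reconstruct the data and fail loudly if it does not match."""
    data = zlib.decompress(stored)
    if hashlib.sha256(data).hexdigest() != checksum:
        raise IOError("read-back verification failed: data corrupted")
    return data

original = b"backup stream contents" * 1000
blob, digest = store(original)
assert read_verified(blob, digest) == original
print("read-back verification passed")
```

The design point is that verification happens against the original bytes, not the optimized form, so it catches errors introduced anywhere in the conversion and reconstruction path.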