IBM TS7650G PROTECTIER DEDUPLICATION GATEWAY Overview - page 4
PRODUCT PROFILE
Copyright The TANEJA Group, Inc. 2008. All Rights Reserved
87 Elm Street, Suite 900 Hopkinton, MA 01748 Tel: 508-435-5040 Fax: 508-435-1530 www.tanejagroup.com
in capacity optimized form, is the amount of
time it takes to both ingest the backup and to
perform the capacity optimization, not just
the time it takes to ingest the backup.
This dichotomy (in-line vs. post-processing) has key implications for overall system performance that may not be immediately evident. When an in-line vendor quotes a throughput number, that single figure tells you how long it takes to complete the backup and process the data into capacity-optimized form, at which point it is ready for any further processing (e.g. 600MB/sec can process roughly 2.16TB/hour). When a post-processing vendor quotes throughput, that generally refers to how long it takes to ingest the data and does not include the post-processing time required to capacity optimize it (e.g. 600MB/sec can ingest 2.16TB/hr, but additional time will be required to perform the post-processing). To understand whether a post-processing approach can meet your backup windows, you need to evaluate the total time required to both ingest the backup and perform the post-processing. Post-processing vendors may argue that since the post-processing is de-coupled from the backup, it doesn't matter how long it takes. In some environments that may be true. But if you have an 8-hour window to complete your backups and capacity optimize them before you clone data to tapes or replicate your backup sets to a remote site for DR purposes, and you cannot complete both the backup ingest and the post-processing within that 8-hour window, then the post-processing approach will impact your DR RPO.
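The arithmetic behind this comparison can be sketched as follows. The 600MB/sec figure comes from the text; the 12TB nightly backup size and the assumption that post-processing runs at the same rate as ingest are hypothetical, chosen only to illustrate how a quoted throughput number can fit the window while the total time does not.

```python
# Illustrative comparison (hypothetical figures): time until a backup is
# capacity-optimized and ready for tape cloning or DR replication.

MB_PER_TB = 1e6  # decimal units, matching "600MB/sec ~ 2.16TB/hour"

def hours_to_ready_inline(backup_tb, ingest_mb_s):
    """In-line: dedupe happens during ingest, so one pass covers everything."""
    return backup_tb * MB_PER_TB / ingest_mb_s / 3600

def hours_to_ready_post(backup_tb, ingest_mb_s, post_mb_s):
    """Post-processing (sequential): ingest first, then a separate dedupe pass."""
    ingest_h = backup_tb * MB_PER_TB / ingest_mb_s / 3600
    post_h = backup_tb * MB_PER_TB / post_mb_s / 3600
    return ingest_h + post_h

window_h = 8.0
backup_tb = 12.0  # hypothetical nightly backup size

inline_h = hours_to_ready_inline(backup_tb, 600)   # ~5.6 hours
post_h = hours_to_ready_post(backup_tb, 600, 600)  # ~11.1 hours

print(f"in-line: {inline_h:.1f}h, fits window: {inline_h <= window_h}")
print(f"post-process: {post_h:.1f}h, fits window: {post_h <= window_h}")
```

With these numbers, both systems "ingest" at 600MB/sec, yet only the in-line system has the backup capacity-optimized inside the 8-hour window.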
Without a doubt, in-line approaches require
less overall physical storage capacity than
post-process approaches. For a given
environment exhibiting a 10:1 capacity
optimization ratio, the system will write
100GB of data for every 1TB it backs up. A
post-process method will need to write that
1TB to disk first, then cycle it through post-
processing, eventually shrinking the storage
required to store that backup to 100GB.
Thus, post-processing systems must maintain spare capacity to allow for the initial ingest of data prior to the de-duplication process. Post-processing products clearly require more capacity for a given environment than in-line solutions to allow for this buffer, but the actual amount will vary based on the specific post-processing approach being used.
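The capacity math above, using the document's 10:1 ratio and 1TB backup, can be sketched as follows. The functions and the single-job buffer assumption are illustrative, not vendor specifications; real landing-zone sizing will vary as the text notes.

```python
# Spare-capacity sketch for a 10:1 capacity optimization ratio.
# Figures are illustrative, not vendor specs.

def inline_capacity_gb(backup_gb, dedupe_ratio):
    """In-line: only the capacity-optimized data ever hits disk."""
    return backup_gb / dedupe_ratio

def post_process_capacity_gb(backup_gb, dedupe_ratio, pending_jobs=1):
    """Post-process (sequential): the full backup lands on disk first,
    so peak usage includes buffer space for every job not yet
    deduplicated, plus the optimized copy."""
    return backup_gb * pending_jobs + backup_gb / dedupe_ratio

print(inline_capacity_gb(1000, 10))        # 100.0 GB per 1TB backed up
print(post_process_capacity_gb(1000, 10))  # 1100.0 GB peak before dedupe completes
```

The gap widens if multiple backup jobs can queue up ahead of the de-duplication pass, which is why buffer sizing depends on the specific post-processing design.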
Post-processing approaches introduce
additional time before a capacity optimized
backup is ready for further processing, such
as cloning to tape, distributing electronically
to a DR site, etc. If additional time and
capacity are available, then you may be
indifferent between the two approaches, but
if they are not, then this is something to
consider when evaluating solutions. Note
that some post-processing vendors allow the
post-processing to be started against a
particular backup job before it completes,
thereby reducing both the capacity and time
requirements that would otherwise be
associated with approaches which perform
these operations sequentially. In-line
approaches, however, will generally complete
the overall backup processing (ingestion +
capacity optimization) faster than post-
processing approaches since they complete
their work in a single pass.
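The overlapped-start optimization mentioned above can be modeled with a simple sketch. The rates, backup size, and half-hour head start are hypothetical assumptions used only to show why starting the de-duplication pass before ingest completes shrinks the total time relative to a strictly sequential approach.

```python
# Hypothetical model: post-processing starts shortly after ingest begins
# and runs concurrently, instead of waiting for the full ingest to finish.

MB_PER_TB = 1e6  # decimal units

def sequential_hours(backup_tb, ingest_mb_s, post_mb_s):
    """Strictly sequential: full ingest, then a full dedupe pass."""
    return (backup_tb * MB_PER_TB / ingest_mb_s +
            backup_tb * MB_PER_TB / post_mb_s) / 3600

def overlapped_hours(backup_tb, ingest_mb_s, post_mb_s, head_start_h=0.5):
    """Overlapped: dedupe begins head_start_h after the backup starts.
    It cannot finish before its own full pass completes, nor before
    ingest has delivered all the data."""
    ingest_h = backup_tb * MB_PER_TB / ingest_mb_s / 3600
    post_h = backup_tb * MB_PER_TB / post_mb_s / 3600
    return max(ingest_h, head_start_h + post_h)

print(f"sequential: {sequential_hours(12.0, 600, 600):.1f}h")  # ~11.1h
print(f"overlapped: {overlapped_hours(12.0, 600, 600):.1f}h")  # ~6.1h
```

Even with overlap, the post-processing pass adds some time beyond the ingest itself, which is why a single-pass in-line system generally finishes the combined work first.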