IBM TS7650G PROTECTIER DEDUPLICATION GATEWAY Overview - page 8
PRODUCT PROFILE
Copyright The TANEJA Group, Inc. 2008. All Rights Reserved
87 Elm Street, Suite 900 Hopkinton, MA 01748 Tel: 508-435-5040 Fax: 508-435-1530 www.tanejagroup.com
storage in a scant 4GB of RAM. This supports
the TS7650G’s industry-leading in-line,
single-node throughput because element
identification and referencing are performed
entirely in main memory; no disk accesses
are required. Competitive indexing
technologies such as hashing and
content-aware approaches have much less
efficient mapping algorithms, forcing them
to reference a disk-based index during the
capacity optimization process once they map
more than around 20TB of base capacity. This
explains why alternative capacity
optimization technologies generally suffer
decreased throughput as the repository
grows: they run very fast when all index
references can be handled in main memory,
but once they outgrow the available memory
and must touch disk, reference times can
slow down by two orders of magnitude.
HyperFactor’s efficient index mapping design
sets it apart, allowing it to scale linearly
for repositories up to 1PB in base capacity.
After HyperFactor completes the
de-duplication process, it compresses
elements before they are stored.
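The effect of an index outgrowing main memory can be sketched with a
rough model. The Python below is an illustration only, not a description
of HyperFactor or of any competing product: the function name, the
uniform-access assumption, and the 100x disk penalty (taken from the
"two orders of magnitude" figure above) are all assumptions.

```python
# Illustrative model only. The 100x disk penalty mirrors the "two orders
# of magnitude" figure quoted in the text; it is not a measurement.

def average_lookup_cost(index_size_gb, ram_gb, ram_cost=1.0, disk_penalty=100.0):
    """Return the mean cost of one index lookup, in units of one RAM lookup.

    Assumes lookups are spread uniformly across the index, so the share
    served from memory equals the share of the index that fits in RAM.
    """
    if index_size_gb <= 0:
        raise ValueError("index_size_gb must be positive")
    in_ram = min(1.0, ram_gb / index_size_gb)
    return in_ram * ram_cost + (1.0 - in_ram) * ram_cost * disk_penalty


if __name__ == "__main__":
    # Hypothetical index sizes against a fixed 4 GB of RAM.
    for size in (2, 4, 8, 16, 32):
        cost = average_lookup_cost(size, ram_gb=4)
        print(f"index {size:>2} GB, RAM 4 GB: {cost:6.2f}x an in-memory lookup")
```

Under these assumptions the cost stays flat while the index fits in
memory, then climbs steeply toward the full disk penalty as soon as it
does not, which is the throughput cliff described above.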
The Importance of SCO VTL Clustering
With this announcement, IBM is unveiling
gateway clustering along with support for a
global repository. Although IBM supports
only two-node configurations today, the
architecture is designed to support up to 16
nodes over time, providing a very scalable
growth path for high-end customers.
Clustered TS7650Gs present backup servers
with a single VTL image across which
single-system throughput can be scaled.
Based on data from ProtecTIER’s installed
base, many customers are seeing single-node
sustained throughput in the 450MB/sec range,
with peak throughput topping 600MB/sec. By
adding a second node and supporting a global
repository, IBM is pushing the sustained
throughput rate into the 900MB/sec range,
with peak throughput even higher. Because
the entire index is mapped into the main
memory of each node, it doesn’t matter which
node a backup stream hits: it will enjoy the
same high level of performance.
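The scaling claim can be made concrete with simple arithmetic. The
sketch below is an illustration under stated assumptions: the 450MB/sec
per-node figure is the sustained rate quoted above, while perfect linear
scaling, the function names, and the 20TB backup set are hypothetical
choices made for the example.

```python
# Illustrative arithmetic only. 450 MB/sec is the sustained single-node
# rate quoted in the text; perfect linear scaling and the 20 TB backup
# set used below are assumptions for the sake of the example.

def single_system_throughput(nodes, per_node_mb_s=450):
    """Sustained MB/sec against one global repository, assuming each
    node contributes its full single-node rate."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return nodes * per_node_mb_s


def backup_window_hours(backup_tb, nodes, per_node_mb_s=450):
    """Hours needed to ingest backup_tb terabytes (1 TB = 1,000,000 MB)."""
    total_mb = backup_tb * 1_000_000
    return total_mb / single_system_throughput(nodes, per_node_mb_s) / 3600


if __name__ == "__main__":
    for nodes in (1, 2):
        rate = single_system_throughput(nodes)
        hours = backup_window_hours(20, nodes)
        print(f"{nodes} node(s): {rate} MB/sec, 20 TB in {hours:.1f} hours")
```

Under these assumptions, a second node roughly halves the time needed to
ingest the same backup set into the same repository.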
When it comes to throughput in clustered
environments, there is an important
distinction between single-system and
aggregate throughput. Single-system
throughput is measured against a single
repository, access to which may be spread
across multiple VTLs and multiple processing
nodes. In the TS7650G’s case, multiple
gateways leverage a global repository, which
makes the single-node throughput number
additive as nodes are added to scale the
system. For example, a single-node TS7650G
can sustain 450MB/sec, while a two-node
cluster can sustain 900MB/sec, all while
accessing a single large repository.
Competitors talk about aggregate throughput
numbers for their clusters, which implies
that they do not support a global
repository. In these products, there is a
separate repository for each “node,” so the
performance numbers for each node are not
additive. Such products lead to independent
islands of storage, which limits capacity
optimization ratios to those achievable by a
single node. Enterprises that are looking to
consolidate their backup sets to improve
efficiencies and reduce management points
necessarily prefer solutions with high
single system