EPITOME(1) OpenBSD Reference Manual EPITOME(1)
NAME
epitome - deduplication services
DESCRIPTION
The epitome suite consists of several discrete pieces that provide stor-
age deduplication services. Deduplication is defined as the elimination
of redundant data.
epitome provides a number of services to enable three major archiving
technologies: CAS, SIS, and DEDUP. Since these three are often (ab)used,
epitome defines them as follows:
CAS (Content Addressable Storage)
CAS, also referred to as associative storage, is a mechanism for stor-
ing information that can be retrieved based on its content, not its
storage location. It is typically used for high-speed storage and re-
trieval of fixed content, such as documents stored for compliance with
government regulations and medical content.
CAS is a method to archive content and provide the issuer a UUID for
identification at a later time. The user or application using this
service is responsible for maintaining the UUID to content mapping
(e.g. UUID -> file).
CAS is not traditionally associated with dedup technologies; however,
in epitome it is a trivial addition. An added benefit when using CAS
is that content is never stored more than once on the physical media
because the chunks that make up the content are automatically deduped.
Inherent to this mechanism is also a data integrity element. The
unique fingerprint that identifies the data doubles as a hash for the
actual content.
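As a rough illustration of the properties described above, the following
Python sketch models a CAS store in memory. It is not epitome's
implementation or on-disk format: the class name ContentStore, the use of
SHA-256 as the fingerprint, and the in-memory dictionaries are all
assumptions made for the example.

```python
import hashlib
import uuid


class ContentStore:
    """Toy in-memory CAS store (hypothetical, not epitome's format)."""

    def __init__(self):
        self.blobs = {}    # fingerprint -> content bytes
        self.catalog = {}  # UUID string -> fingerprint

    def put(self, content: bytes) -> str:
        """Archive content and hand the issuer a UUID for later retrieval."""
        # SHA-256 is an assumption; the manual does not name the hash used.
        fingerprint = hashlib.sha256(content).hexdigest()
        # Identical content yields the same fingerprint, so it is kept once.
        self.blobs.setdefault(fingerprint, content)
        ident = str(uuid.uuid4())
        self.catalog[ident] = fingerprint
        return ident

    def get(self, ident: str) -> bytes:
        """Return the content for a UUID, verifying it against its hash."""
        fingerprint = self.catalog[ident]
        content = self.blobs[fingerprint]
        # The fingerprint doubles as an integrity check on retrieval.
        if hashlib.sha256(content).hexdigest() != fingerprint:
            raise ValueError("stored content does not match its fingerprint")
        return content
```

Archiving the same bytes twice returns two different UUIDs, but only one
copy of the content is held in the store. The caller still has to remember
which UUID belongs to which file.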
In an ideal world CAS is a mountable write-once filesystem that is
transparent to the user and lets the user associate policy with the
content. All stored content is immutable.
SIS (Single Instance Storage)
SIS is essentially application-level deduplication and is best
explained with an example. Consider user A sending user B an email:
apart from the mail header, both copies are identical, so a client (in
this case the mail server) that uses SIS would save the content only
once. Now imagine this email being sent to 100 people; the savings
become considerable.
The big difference between SIS and the other techniques is that the
application using it must conform to an API and be written specifical-
ly to interface with the dedup system.
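The mail example can be sketched in a few lines of Python. This is only
an illustration of the idea: MailStore, deliver() and read() are
hypothetical names, not an epitome API, and SHA-256 is an assumed
fingerprint function.

```python
import hashlib


class MailStore:
    """Toy mail store that keeps each distinct message body once."""

    def __init__(self):
        self.bodies = {}     # body fingerprint -> body bytes
        self.mailboxes = {}  # recipient -> list of (header, fingerprint)

    def deliver(self, recipient: str, header: str, body: bytes) -> None:
        # The application separates the per-recipient header from the
        # shared body itself; this is what makes SIS application specific.
        fingerprint = hashlib.sha256(body).hexdigest()
        self.bodies.setdefault(fingerprint, body)
        self.mailboxes.setdefault(recipient, []).append((header, fingerprint))

    def read(self, recipient: str, index: int):
        """Return (header, body) for one message in a mailbox."""
        header, fingerprint = self.mailboxes[recipient][index]
        return header, self.bodies[fingerprint]
```

Delivering one body to 100 recipients leaves 100 small mailbox entries but
a single stored body.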
DEDUP (Deduplication)
DEDUP is really the underlying technique for all incarnations of
hash-based storage. It uses a chunk-based hash to determine what is
duplicate. Where CAS has a content <-> hash association, DEDUP is
purely a hash for some arbitrary block of some arbitrary size.
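A minimal sketch of chunk-based deduplication follows. The fixed 4096
byte chunk size, the SHA-256 hash, and the function name dedup() are
assumptions chosen for the example; the manual does not state which chunk
size or hash epitome uses.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed block size, chosen only for illustration


def dedup(data: bytes, store: dict, chunk_size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size chunks and store each unique chunk once.

    Returns the ordered list of chunk hashes needed to rebuild data.
    """
    hashes = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        # A chunk whose hash is already present is a duplicate; skip it.
        store.setdefault(digest, chunk)
        hashes.append(digest)
    return hashes
```

Four chunks of input in which three are identical occupy only two slots
in the store, and joining the chunks named by the returned hash list
reproduces the input exactly.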
The limits of any of these technologies are directly linked to the amount
of metadata that is saved. The more metadata is saved, the more
capabilities can be implemented. For example, to make a CAS system there
needs to be some sort of UUID -> blocks association (e.g. a catalog). The
design of epitome allows for flexibility on the back-end so that one or
more front-ends can be used, provided that there is storage space for the
additional metadata and that the computational overhead is acceptable.
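The relationship between a dedup back-end and the extra metadata a CAS
front-end needs can be sketched as follows. The Archive class, its method
names, and the choice of SHA-256 and uuid4 are hypothetical; the sketch
only shows a catalog layered on top of a chunk store, not epitome's
actual metadata formats.

```python
import hashlib
import uuid


class Archive:
    """Toy chunk store (back-end) with a UUID catalog (front-end)."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks = {}   # back-end: chunk hash -> chunk bytes
        self.catalog = {}  # front-end metadata: UUID -> ordered chunk hashes

    def archive(self, data: bytes) -> str:
        """Store data as deduplicated chunks and return a new UUID."""
        hashes = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)
            hashes.append(digest)
        ident = str(uuid.uuid4())
        # The catalog entry is the additional metadata that turns a plain
        # dedup store into a CAS: without it the chunks cannot be reassembled.
        self.catalog[ident] = hashes
        return ident

    def restore(self, ident: str) -> bytes:
        """Rebuild the original data from its catalog entry."""
        return b"".join(self.chunks[h] for h in self.catalog[ident])
```

The catalog grows with every archived object even when no new chunks are
written, which is the storage and computational cost the paragraph above
refers to.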
The idea behind epitome is to provide a WORM-based (write once, read
many) archive/backup mechanism that is lossless and offers permanent
storage with inherent data protection properties. Additionally, epitome
provides several metadata formats and back-ends to suit different usage
models.
SEE ALSO
epitomize(1), eprepare(1), epitome(3)
HISTORY
epitome first appeared in OpenBSD 4.5.
AUTHORS
The epitome suite was written by Marco Peereboom <marco@peereboom.us>.
CAVEATS
The epitome suite is currently considered experimental.
Not everything mentioned in this manual has been implemented yet.
OpenBSD 4.4 October 6, 2008 2