Abstract—The current Web has no general mechanisms to make digital artifacts — such as datasets, code, texts, and images —
verifiable and permanent. For digital artifacts that are supposed to be immutable, there is moreover no commonly accepted method to
enforce this immutability. These shortcomings have a serious negative impact on the ability to reproduce the results of processes that
rely on Web resources, which in turn heavily impacts areas such as science where reproducibility is important. To solve this problem, we
propose trusty URIs containing cryptographic hash values. We show how trusty URIs can be used for the verification of digital artifacts,
in a manner that is independent of the serialization format in the case of structured data files such as nanopublications. We demonstrate
how the contents of these files become immutable, including dependencies on external digital artifacts, thereby extending the range
of verifiability to the entire reference tree. Our approach sticks to the core principles of the Web, namely openness and decentralized
architecture, and is fully compatible with existing standards and protocols. Evaluation of our reference implementations shows that
these design goals are indeed accomplished by our approach, and that it remains practical even for very large files.
INTRODUCTION
In many areas, and in particular in science, reproducibility
is important. Verifiable, immutable, and
permanent digital artifacts are an important ingredient
for making the results of automated processes reproducible,
but the current Web offers no commonly accepted
methods to ensure these properties. Endeavors
such as the Semantic Web to publish complex knowledge
in a machine-interpretable manner aggravate this
problem, as automated algorithms operating on large
amounts of data can be expected to be even more vulnerable
than humans to manipulated or corrupted content.
Without appropriate counter-measures, malicious actors
can sabotage or trick such algorithms by adding just a
few carefully manipulated items to large sets of input
data. To solve this problem, we propose an approach to
make items on the (Semantic) Web verifiable, immutable,
and permanent. This approach includes cryptographic
hash values in Uniform Resource Identifiers (URIs) and
adheres to the core principles of the Web, namely openness
and decentralized architecture. This article is an
extended and revised version of a conference paper [1].
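To illustrate the idea of including a cryptographic hash value in a URI, the following Python sketch appends an encoded SHA-256 digest of an artifact's bytes to a base URI and later checks retrieved content against it. The choice of SHA-256, the URL-safe Base64 encoding, the example URI, and the function names hash_suffix, make_hash_uri, and verify are assumptions made for illustration only. The sketch is not the trusty URI format itself, and it hashes raw bytes, so it is not independent of the serialization format.

```python
import base64
import hashlib

# Length of an unpadded URL-safe Base64 encoding of a 32-byte SHA-256 digest.
SUFFIX_LENGTH = 43


def hash_suffix(content: bytes) -> str:
    """Return the URL-safe Base64 encoding (without padding) of the SHA-256 digest."""
    digest = hashlib.sha256(content).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def make_hash_uri(base_uri: str, content: bytes) -> str:
    """Build a URI that ends with the hash of the content it identifies."""
    return base_uri + hash_suffix(content)


def verify(uri: str, content: bytes) -> bool:
    """Check that the hash at the end of the URI matches the given content."""
    return uri[-SUFFIX_LENGTH:] == hash_suffix(content)


if __name__ == "__main__":
    original = b"Malaria is transmitted by mosquitoes."
    # Hypothetical base URI, used only for this example.
    uri = make_hash_uri("http://example.org/artifact.", original)
    print(uri)
    print(verify(uri, original))                                   # True
    print(verify(uri, b"Malaria is transmitted by mosquitos."))    # False
```

Because the identifier itself carries the hash, anyone who obtains the content from any source can check it against the URI without having to trust the server that delivered it.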
A cryptographic hash value (sometimes called a cryptographic
digest) is a short, random-looking sequence of
bytes (or, equivalently, bits) that is calculated in a
complicated yet perfectly predictable manner from a
digital artifact such as a file. The same input always
leads to exactly the same hash value, whereas even a
minimally modified input yields a completely different
value. While there is an infinity of possible inputs that