vacationvorti.blogg.se - Internet iceberg encryption


Now let's suppose we were to implement all of the above with KMS as the storage backend. We remark that GeneratedEncryptionKey, as a pair of decrypted and encrypted keys, is required because we need both: the decrypted key to actually encrypt the output stream wherever we're doing the encrypted write itself, and the encrypted key to store in the manifest file after the fact.

EncryptionKey resolveEncryptionKey(EncryptionMetadata metadata)
List<EncryptionKey> resolveEncryptionKeyBatch(List<EncryptionMetadata> metadatas): optional, can just default to calling the individual resolve(.) multiple times.
GeneratedEncryptionKey generateEncryptionKey(String filePath)
List<GeneratedEncryptionKey> generateEncryptionKeyBatch(List<String> filePaths): optional, can just default to calling the individual generate(.) multiple times.

Rolling new versions of master keys is supported. If the user keeps a local key, it can only be used to decrypt that file.
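As a rough illustration of the interface described above, the sketch below shows how the two batch methods could default to calling the single-key methods in a loop. The type and method names come from the discussion; the fields on EncryptionMetadata, EncryptionKey, and GeneratedEncryptionKey are assumptions made only so that the example compiles, since the thread does not define them.

```java
import java.util.ArrayList;
import java.util.List;

// Placeholder types: the discussion names them but does not define their
// fields, so every field below is an assumption.
final class EncryptionMetadata {
    final String keyName;
    final byte[] encryptedKey;

    EncryptionMetadata(String keyName, byte[] encryptedKey) {
        this.keyName = keyName;
        this.encryptedKey = encryptedKey;
    }
}

final class EncryptionKey {
    final byte[] keyBytes;

    EncryptionKey(byte[] keyBytes) {
        this.keyBytes = keyBytes;
    }
}

// Pair of the decrypted key (used to encrypt the output stream) and the
// metadata carrying the encrypted key (stored in the manifest afterwards).
final class GeneratedEncryptionKey {
    final EncryptionKey decryptedKey;
    final EncryptionMetadata metadata;

    GeneratedEncryptionKey(EncryptionKey decryptedKey, EncryptionMetadata metadata) {
        this.decryptedKey = decryptedKey;
        this.metadata = metadata;
    }
}

interface KeyManager {
    EncryptionKey resolveEncryptionKey(EncryptionMetadata metadata);

    GeneratedEncryptionKey generateEncryptionKey(String filePath);

    // Optional: falls back to one resolve call per entry.
    default List<EncryptionKey> resolveEncryptionKeyBatch(List<EncryptionMetadata> metadatas) {
        List<EncryptionKey> keys = new ArrayList<>(metadatas.size());
        for (EncryptionMetadata metadata : metadatas) {
            keys.add(resolveEncryptionKey(metadata));
        }
        return keys;
    }

    // Optional: falls back to one generate call per file path.
    default List<GeneratedEncryptionKey> generateEncryptionKeyBatch(List<String> filePaths) {
        List<GeneratedEncryptionKey> keys = new ArrayList<>(filePaths.size());
        for (String filePath : filePaths) {
            keys.add(generateEncryptionKey(filePath));
        }
        return keys;
    }
}
```

A backend that supports real batch operations would override the two default methods so that a whole list costs one round trip instead of one per entry.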

The encryption never reuses a local key/iv pair. There is only one trip per file to the KMS during reading or writing. The master key stays in the KMS and is never given to the user.

When reading, use the metadata from the manifest to have the KMS decrypt the key for that file. Then use the decrypted key and iv to decrypt the file as needed.
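The read path can be sketched roughly as follows. Everything here is an assumption for illustration: FakeKms is a hypothetical in-memory stand-in for a real KMS, the local key is wrapped with the JDK's AESWrap cipher, and the file contents are encrypted with AES-GCM. The thread does not name a cipher or a KMS client API.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class ReadPathSketch {
    /** Hypothetical in-memory KMS; the master key never leaves this class. */
    static final class FakeKms {
        private final SecretKeySpec masterKey;
        int roundTrips = 0;

        FakeKms(byte[] rawMasterKey) {
            this.masterKey = new SecretKeySpec(rawMasterKey, "AES");
        }

        /** Used only to set up the demo; a writer would obtain this from the KMS. */
        byte[] wrapLocalKey(byte[] localKey) {
            try {
                Cipher wrap = Cipher.getInstance("AESWrap");
                wrap.init(Cipher.WRAP_MODE, masterKey);
                return wrap.wrap(new SecretKeySpec(localKey, "AES"));
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }

        /** The single KMS call made per file when reading. */
        byte[] decryptLocalKey(byte[] encryptedLocalKey) {
            roundTrips++;
            try {
                Cipher unwrap = Cipher.getInstance("AESWrap");
                unwrap.init(Cipher.UNWRAP_MODE, masterKey);
                return unwrap.unwrap(encryptedLocalKey, "AES", Cipher.SECRET_KEY).getEncoded();
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    /** AES-GCM in the given mode (Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE). */
    static byte[] gcm(int mode, byte[] key, byte[] iv, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new GCMParameterSpec(128, iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads one file: one KMS trip to recover the local key, then local decryption. */
    static byte[] read(FakeKms kms, byte[] encryptedLocalKey, byte[] iv, byte[] ciphertext) {
        byte[] localKey = kms.decryptLocalKey(encryptedLocalKey);
        return gcm(Cipher.DECRYPT_MODE, localKey, iv, ciphertext);
    }

    public static void main(String[] args) {
        FakeKms kms = new FakeKms(new byte[16]); // demo-only master key
        byte[] localKey = new byte[16];
        localKey[0] = 1;
        byte[] iv = new byte[12];
        byte[] ciphertext = gcm(Cipher.ENCRYPT_MODE, localKey, iv,
                "file contents".getBytes(StandardCharsets.UTF_8));
        byte[] plaintext = read(kms, kms.wrapLocalKey(localKey), iv, ciphertext);
        System.out.println(new String(plaintext, StandardCharsets.UTF_8));
        System.out.println("KMS round trips: " + kms.roundTrips);
    }
}
```

The reader only ever handles the wrapped key, the iv, and the recovered local key, which matches the property that the master key stays in the KMS.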


When you update the manifest for the file, add the key name, key version, encryption algorithm, iv, and encrypted local key.

When writing, call the KMS to generate a random local key and the corresponding encrypted bytes.

Define the key name in the Iceberg table metadata.
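A minimal sketch of the write steps, under stated assumptions: FakeKms is a hypothetical in-memory stand-in for the KMS, the local key is wrapped with the JDK's AESWrap cipher, the file is encrypted with AES-GCM, and ManifestEntry is an illustrative holder for the fields listed in the text (key name, key version, encryption algorithm, iv, encrypted local key). None of these names are Iceberg or KMS APIs.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class WritePathSketch {
    static final SecureRandom RANDOM = new SecureRandom();

    /** Hypothetical in-memory KMS; the master key never leaves this class. */
    static final class FakeKms {
        private final SecretKeySpec masterKey;

        FakeKms() {
            byte[] raw = new byte[16];
            RANDOM.nextBytes(raw);
            this.masterKey = new SecretKeySpec(raw, "AES");
        }

        /** One round trip: returns {plaintext local key, local key wrapped by the master key}. */
        byte[][] generateLocalKey() {
            try {
                byte[] local = new byte[16];
                RANDOM.nextBytes(local);
                Cipher wrap = Cipher.getInstance("AESWrap");
                wrap.init(Cipher.WRAP_MODE, masterKey);
                return new byte[][] {local, wrap.wrap(new SecretKeySpec(local, "AES"))};
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    /** The per-file fields the text says to add to the manifest. */
    static final class ManifestEntry {
        final String keyName;
        final int keyVersion;
        final String algorithm;
        final byte[] iv;
        final byte[] encryptedLocalKey;

        ManifestEntry(String keyName, int keyVersion, String algorithm,
                      byte[] iv, byte[] encryptedLocalKey) {
            this.keyName = keyName;
            this.keyVersion = keyVersion;
            this.algorithm = algorithm;
            this.iv = iv;
            this.encryptedLocalKey = encryptedLocalKey;
        }
    }

    static final class EncryptedFile {
        final byte[] ciphertext;
        final ManifestEntry entry;

        EncryptedFile(byte[] ciphertext, ManifestEntry entry) {
            this.ciphertext = ciphertext;
            this.entry = entry;
        }
    }

    /** Writes one file: fresh local key and iv, local encryption, manifest fields. */
    static EncryptedFile write(FakeKms kms, String keyName, int keyVersion, byte[] plaintext) {
        try {
            byte[][] localKey = kms.generateLocalKey(); // the only KMS trip for this file
            byte[] iv = new byte[12];
            RANDOM.nextBytes(iv);
            String algorithm = "AES/GCM/NoPadding";
            Cipher cipher = Cipher.getInstance(algorithm);
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(localKey[0], "AES"),
                    new GCMParameterSpec(128, iv));
            byte[] ciphertext = cipher.doFinal(plaintext);
            // Only the wrapped key goes into the manifest, never the plaintext key.
            return new EncryptedFile(ciphertext,
                    new ManifestEntry(keyName, keyVersion, algorithm, iv, localKey[1]));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        EncryptedFile file = write(new FakeKms(), "table-key", 1,
                "file contents".getBytes(StandardCharsets.UTF_8));
        System.out.println("ciphertext bytes: " + file.ciphertext.length);
        System.out.println("wrapped key bytes: " + file.entry.encryptedLocalKey.length);
    }
}
```

Because each call draws a new local key and a new iv, no key/iv pair is reused across files in this sketch.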

I haven't looked at the details of Palantir's hadoop-crypto library, but the approach looks good. I understand that column encryption takes file format support and that isn't available yet, although it will be available for ORC soon.

I suppose, though, that a file would only be able to be encrypted strictly one way or the other: if we encrypt the whole file, you more or less lose all the benefits of per-column encryption. Additionally, a key part of performance is reducing the number of round trips made to the key storage backend, particularly if the backend supports batch operations. So it's ideal if the KeyManager could support getting and putting multiple keys at once, and if the Spark Data Source and other Iceberg clients were implemented to contact the backend as few times as possible.


I was considering using Palantir's hadoop-crypto library to do the actual encryption portion of things. What do you think about this package?

Column encryption is interesting. On our side we haven't explored this yet, and thus would not really be able to handle per-column encryption; in the meantime we need to encrypt only at the top file layer. That is to say, our internal storage solution doesn't handle storing multiple keys to decrypt different portions of the same file. You'll also notice this in the hadoop-crypto library. So whatever solution we come up with should be able to handle either full-file encryption or per-column encryption.