
How can vaultless tokenization protect data in the cloud?

How do vaultless tokenization and standard tokenization differ, and what is the best way to use them for securing cloud data? Expert Dan Sullivan offers guidance and use cases.

What is the difference between vaultless tokenization and standard tokenization techniques when it comes to securing data in the cloud? Are there certain use cases when an enterprise should use one over the other?

For starters, tokenization in security is the process of substituting sensitive or confidential information with a token. It's important to ensure the original data cannot be derived from the token. A simple case of tokenization would involve substituting a Social Security number with a random string of nine digits. This kind of tokenization is used when the original data does not need to be recovered, but some data must be in place to satisfy other requirements; for example, a column cannot be empty during a data load process.
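The irreversible case can be sketched in a few lines. This is a minimal illustration, assuming no downstream process ever needs the real number and the target column only needs a nine-digit placeholder; the `tokenize_ssn` helper and the sample row are hypothetical. Because the digits come from a cryptographically secure random source and do not depend on the input, nothing about the original SSN can be derived from the token.

```python
import secrets

DIGITS = "0123456789"


def tokenize_ssn(ssn: str) -> str:
    """Replace an SSN with nine random digits that cannot be traced back to it."""
    if sum(c.isdigit() for c in ssn) != 9:
        raise ValueError("expected a nine-digit SSN")
    # secrets draws from the operating system's CSPRNG; the output
    # does not depend on the input at all, so it cannot be reversed.
    token = "".join(secrets.choice(DIGITS) for _ in range(9))
    # Keep the familiar NNN-NN-NNNN layout if the input used it.
    if "-" in ssn:
        return f"{token[:3]}-{token[3:5]}-{token[5:]}"
    return token


# Hypothetical data load: the ssn column must not be empty, but the
# real value is never needed again, so a random placeholder suffices.
rows = [{"name": "A. Customer", "ssn": "123-45-6789"}]
loaded = [{**row, "ssn": tokenize_ssn(row["ssn"])} for row in rows]
```

No mapping is stored anywhere, which is exactly what makes this form of tokenization unsuitable when the original value must later be recovered.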

In other use cases, the original data must be recoverable, such as when storing credit card data. The Payment Card Industry Data Security Standard (PCI DSS) dictates the controls that must be in place when storing credit card data, including the primary account number (PAN). Instead of copying PANs to multiple databases and locations, application developers can generate a token for each PAN and store the token instead. If a system that stores tokens, but not PANs, is compromised, the card numbers themselves are not exposed.
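A minimal sketch of issuing and resolving PAN tokens follows, assuming an in-memory Python dictionary stands in for the secured mapping store that this article calls a token vault. The `TokenVault` class and its methods are hypothetical, and keeping the last four digits of the PAN in the token is an illustrative choice, not a PCI DSS requirement. Downstream systems would store only the returned token; only the vault can map it back.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory token vault mapping tokens to PANs and back."""

    def __init__(self) -> None:
        self._token_to_pan: dict[str, str] = {}
        self._pan_to_token: dict[str, str] = {}

    def tokenize(self, pan: str) -> str:
        # This sketch assumes card-length numbers so there are enough
        # random digits to avoid exhausting the token space.
        if not (pan.isascii() and pan.isdigit()) or len(pan) < 12:
            raise ValueError("expected a PAN of at least 12 digits")
        # Reuse the existing token so each PAN maps to exactly one token.
        if pan in self._pan_to_token:
            return self._pan_to_token[pan]
        while True:
            # Random digits unrelated to the PAN, except the last four,
            # which are kept so they can still be shown on receipts.
            prefix = "".join(secrets.choice("0123456789") for _ in range(len(pan) - 4))
            token = prefix + pan[-4:]
            # Retry on collision with an issued token or with the PAN itself.
            if token not in self._token_to_pan and token != pan:
                break
        self._token_to_pan[token] = pan
        self._pan_to_token[pan] = token
        return token

    def detokenize(self, token: str) -> str:
        # Only systems allowed to query the vault can recover the PAN.
        return self._token_to_pan[token]
```

Every new PAN adds a row to the vault, so the store must be protected, highly available and able to grow with the number of cards tokenized.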

Since there may be times when the PAN itself is required, some applications need to store the original PAN alongside its token. These pairs can be stored in a relational database or in a key-value database; such stores are known as token vaults. Alternatively, a token may be generated from the original PAN and a secret key or parameter, so that the PAN can later be calculated from the token and the same secret. These are vaultless tokens.
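Vaultless tokens are often produced with format-preserving encryption, for example the FF1 mode specified in NIST SP 800-38G, which runs a ten-round Feistel network over digit strings. The sketch below is a simplified toy modeled on that structure, not FF1 itself: it assumes an HMAC-SHA256 round function and a hypothetical key, and it should not be used to protect real card data. It shows the defining property of a vaultless token: the PAN is recomputed from the token and the secret key, with no lookup table.

```python
import hashlib
import hmac

ROUNDS = 10  # ten rounds, mirroring FF1's round count


def _round_value(key: bytes, round_index: int, half: str, width: int) -> int:
    # Pseudorandom round function: HMAC-SHA256 over the round number and one half.
    mac = hmac.new(key, f"{round_index}:{half}".encode(), hashlib.sha256).digest()
    return int.from_bytes(mac, "big") % (10 ** width)


def _check(digits: str) -> None:
    if not (digits.isascii() and digits.isdigit()) or len(digits) < 2:
        raise ValueError("expected at least two decimal digits")


def tokenize(pan: str, key: bytes) -> str:
    """Map a digit string to another digit string of the same length."""
    _check(pan)
    u = len(pan) // 2
    v = len(pan) - u
    a, b = pan[:u], pan[u:]
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else v
        # Add the keyed round value to one half, then swap halves.
        c = (int(a) + _round_value(key, i, b, m)) % (10 ** m)
        a, b = b, str(c).zfill(m)
    return a + b


def detokenize(token: str, key: bytes) -> str:
    """Invert tokenize() by running the rounds backwards with the same key."""
    _check(token)
    u = len(token) // 2
    v = len(token) - u
    a, b = token[:u], token[u:]
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else v
        # Subtract the same round value that tokenize() added.
        c = (int(b) - _round_value(key, i, a, m)) % (10 ** m)
        a, b = str(c).zfill(m), a
    return a + b
```

Because the same key always yields the same token for a given PAN, any node holding the key can tokenize or detokenize independently; the flip side is that the key becomes the single secret that must be protected.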

Vaultless tokenization does not require a database to store key-value pairs, which reduces the time needed to complete a transaction that requires PAN recovery: calculations on tokens are generally faster than database lookups, which can have higher latency. In both cases, a resource must be secured and maintained. With token vaults, the database must be continuously available and able to scale to spikes in demand. With vaultless tokenization, the secret key must be protected. If the key has to be replaced, for example because it has been compromised, every token computed with the old key must be recomputed with the new one. Enterprises should weigh the pros and cons of token vaults and vaultless tokenization in light of their own requirements, their access to scalable database servers and the expected load on their applications.
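The cost of replacing a vaultless key can be sketched as a migration over every stored token. To keep the block short and self-contained, it uses a deliberately weak stand-in for a real format-preserving cipher (digit-wise addition of a key-derived keystream, which leaks patterns and must not be used in practice); the rotation bookkeeping is the point. The function names and keys are hypothetical.

```python
import hashlib
import hmac


def _keystream(key: bytes, length: int) -> list[int]:
    # Toy stand-in for a real FPE cipher: one decimal digit per HMAC byte.
    digest = hmac.new(key, b"token-keystream", hashlib.sha256).digest()
    if length > len(digest):
        raise ValueError("toy keystream supports at most 32 digits")
    return [byte % 10 for byte in digest[:length]]


def tokenize(pan: str, key: bytes) -> str:
    ks = _keystream(key, len(pan))
    return "".join(str((int(d) + k) % 10) for d, k in zip(pan, ks))


def detokenize(token: str, key: bytes) -> str:
    ks = _keystream(key, len(token))
    return "".join(str((int(d) - k) % 10) for d, k in zip(token, ks))


def rotate_key(tokens: list[str], old_key: bytes, new_key: bytes) -> dict[str, str]:
    """Re-issue every token under new_key; returns old token -> new token."""
    # Each token must be recovered with the old key and re-tokenized
    # with the new one; every system holding an old token must then
    # replace it with the new value.
    return {t: tokenize(detokenize(t, old_key), new_key) for t in tokens}
```

The old key is still needed until every stored token has been migrated, which is one reason key rotation in a vaultless system is an operational project rather than a configuration change.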

Join the conversation

Which type of tokenization does your organization use for cloud data, and why?
In addition to the excellent points made in the article, there are some additional key differences between vault-based and vaultless tokenization that readers should know about. For example, vault-based tokenization systems are based on DLTs (dynamic lookup tables, a mapping table between plaintext data and tokens). These tables typically only grow in size over time and never shrink, which makes their read/write performance degrade over time. Vaultless systems are based on SLTs (static lookup tables), which deliver constant performance over time.

In addition, vault-based tokenization systems use a centralized architecture because they need a single repository for the token tables (the vault) in order to resolve token collisions. This is a significant hindrance to the geographic distribution of such systems. Vaultless tokenization systems fundamentally support a distributed architecture in which the SLTs are distributed to and cached locally on individual nodes (the application or data nodes where data security policies are enforced). This makes such nodes completely independent of each other, thereby supporting geographically independent distribution and scalability.

While sensitive data can be secured in cloud environments using any viable cryptographic mechanism, vaultless tokenization also offers certain unique advantages when applied to CASB (cloud access security broker) products meant to secure sensitive enterprise data in public cloud environments. One such advantage is geographically distributed scalability. This is especially useful for multinational enterprises aiming to secure sensitive data in cloud applications. Such enterprises can deploy CASB instances closer to their regional offices, which not only eliminates the latency introduced by routing traffic through a centralized CASB (as with vault-based tokenization systems) but may also help address data residency requirements.
Raj Jain, Cloud Security Software Architect, Protegrity
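The static lookup table idea described in the comment above can be illustrated with a toy sketch. This is not Protegrity's algorithm, whose details are not given here; it assumes one random permutation of the digits 0-9 per position, chained on the previous token digit, and all names are hypothetical. The tables are generated once and copied to every node; tokenizing new values never writes to them, so they do not grow and nodes need no coordination. Anyone holding the tables can detokenize, so in a real system they would be kept as secret as an encryption key.

```python
import secrets


def build_static_tables(length: int) -> list[list[int]]:
    """Generate one random permutation of the digits 0-9 per position, once."""
    rng = secrets.SystemRandom()
    tables = []
    for _ in range(length):
        perm = list(range(10))
        rng.shuffle(perm)
        tables.append(perm)
    return tables


def _check(digits: str, tables: list[list[int]]) -> None:
    if len(digits) != len(tables) or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a digit string matching the table count")


def tokenize(value: str, tables: list[list[int]]) -> str:
    _check(value, tables)
    out, prev = [], 0
    for ch, table in zip(value, tables):
        # Offset by the previous token digit so repeated input digits
        # do not always produce the same token digit.
        tok = table[(int(ch) + prev) % 10]
        out.append(str(tok))
        prev = tok
    return "".join(out)


def detokenize(token: str, tables: list[list[int]]) -> str:
    _check(token, tables)
    out, prev = [], 0
    for ch, table in zip(token, tables):
        tok = int(ch)
        # table.index() inverts the permutation; then undo the chaining offset.
        out.append(str((table.index(tok) - prev) % 10))
        prev = tok
    return "".join(out)


# Built once and shipped unchanged to every node; each node can then
# tokenize and detokenize locally without a central vault.
SHARED_TABLES = build_static_tables(16)
```

The table size depends only on the value length, not on how many values have been tokenized, which is the property behind the constant-performance claim above.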