Simplify Cloud Data Security: A Deep Dive Into Protecting Sensitive Data in Java

  sonic0002        2023-04-28 21:22:10       1,076        0    

Featuring encryption, anonymization, hashing, and access control

Network security incidents occur regularly, and many of them involve data leakage. Data security has therefore attracted widespread concern, and the community keeps working on approaches that simplify it, especially the protection of sensitive data.

Sensitive data includes, but is not limited to, personally identifiable information (PII) such as names, ID numbers, passport numbers, and driver’s license numbers; contact information such as addresses and phone numbers; account credentials such as usernames, passwords, and PINs; and financial information such as credit card numbers, bank account numbers, and bank codes.

There are many aspects to properly protecting data. In the next section, I will cover data protection methods in Java, including encryption, anonymization, hashing, and access control.

The native implementations to secure sensitive data are similar across different language environments. So, let me take Java as an example and walk you through some of these aspects.

Data encryption

Java provides the Java Cryptography Architecture (JCA) and the Java Cryptography Extension (JCE) for encrypting and decrypting data. The most common encryption algorithms are AES and RSA. Let’s look at an AES example to see how it works:

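import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;

public class DataEncryption {
    public static void main(String[] args) throws Exception {
        // Generate a random 128-bit AES key.
        KeyGenerator keyGenerator = KeyGenerator.getInstance("AES");
        keyGenerator.init(128);
        SecretKey secretKey = keyGenerator.generateKey();

        // A bare "AES" transformation typically falls back to ECB mode, which
        // leaks patterns in the plaintext. Use GCM with a fresh random IV instead.
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        GCMParameterSpec gcmSpec = new GCMParameterSpec(128, iv);

        // Encrypt.
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, secretKey, gcmSpec);
        byte[] encryptedData = cipher.doFinal("Sensitive Data".getBytes(StandardCharsets.UTF_8));

        // Decrypt with the same key and IV.
        cipher.init(Cipher.DECRYPT_MODE, secretKey, gcmSpec);
        byte[] decryptedData = cipher.doFinal(encryptedData);

        System.out.println("Encrypted Data: " + Base64.getEncoder().encodeToString(encryptedData));
        System.out.println("Decrypted Data: " + new String(decryptedData, StandardCharsets.UTF_8));
    }
}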
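
One practical detail the example above leaves out is how the ciphertext travels. A mode such as AES-GCM needs a fresh random initialization vector (IV) for every message, and whoever decrypts needs that IV as well. A minimal sketch, assuming a hypothetical helper class named AesGcmBox (not part of the JCA) that simply prepends a 12-byte IV to the ciphertext, might look like this:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical helper for illustration; the message layout (IV || ciphertext) is an assumption.
public class AesGcmBox {
    private static final int IV_BYTES = 12;   // IV size commonly used with GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    // Generates a random 128-bit AES key.
    public static SecretKey newKey() throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    // Encrypts the text and returns IV followed by ciphertext (which includes the tag).
    public static byte[] encrypt(SecretKey key, String plaintext) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        return ByteBuffer.allocate(iv.length + ciphertext.length).put(iv).put(ciphertext).array();
    }

    // Splits the IV off the front of the message and decrypts the remainder.
    public static String decrypt(SecretKey key, byte[] message) throws Exception {
        byte[] iv = Arrays.copyOfRange(message, 0, IV_BYTES);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] plaintext = cipher.doFinal(message, IV_BYTES, message.length - IV_BYTES);
        return new String(plaintext, StandardCharsets.UTF_8);
    }
}
```

Calling AesGcmBox.decrypt(key, AesGcmBox.encrypt(key, text)) returns the original text. Because GCM is authenticated, decryption throws an exception if the message has been modified.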

Data anonymization

Data anonymization is a data protection technology that reduces the risk of data leakage by modifying or replacing sensitive data, ensuring that sensitive information is not exposed even if accessed by unauthorized users.

It is widely applied in scenarios like data sharing, testing, development, and analysis, and its methods include masking, synthetic data generation, data slicing, data perturbation, data generalization, etc. Among the above, masking is the most common one. It replaces parts of sensitive data with random characters, specific symbols, or other irrelevant information, such as hiding the middle digits of a phone number with asterisks.

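In Java, masking can be implemented with a regular-expression replacement. The example below keeps the first three and last four digits of an 11-digit phone number and hides the four in between: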

private static String maskPhoneNumber(String phoneNumber) {
    return phoneNumber.replaceAll("(\\d{3})\\d{4}(\\d{4})", "$1****$2");
}
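
Masking and generalization, two of the methods listed above, are both easy to sketch in plain Java. The class and method names below (AnonymizationSketch, maskEmail, generalizeAge) are hypothetical, and the rules they apply (keep only the first character of an email's local part, bucket ages into ten-year ranges) are assumptions chosen for illustration:

```java
// Hypothetical helper for illustration; the masking and bucketing rules are assumptions.
public class AnonymizationSketch {

    // Masking: keep the first character and the domain, hide the rest of the local part.
    public static String maskEmail(String email) {
        int at = email.indexOf('@');
        if (at <= 0) {
            return "****"; // not a recognizable address, so reveal nothing
        }
        return email.charAt(0) + "****" + email.substring(at);
    }

    // Generalization: replace an exact age with the ten-year range that contains it.
    public static String generalizeAge(int age) {
        if (age < 0) {
            throw new IllegalArgumentException("age must be non-negative");
        }
        int lower = (age / 10) * 10;
        return lower + "-" + (lower + 9);
    }
}
```

For example, maskEmail("alice@example.com") yields a****@example.com and generalizeAge(37) yields 30-39.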

Data hashing

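For some sensitive data, costly encryption is unnecessary when one-way hashing with an algorithm such as SHA-256 provides sufficient protection. Note that Base64 is an encoding, not a hash, and offers no protection by itself; it is used below only to render bytes as text. The following SHA-256 example adds a salt, which makes attacks based on precomputed hash tables much harder.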

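import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

// Hashes salt + value with SHA-256 and returns the digest as Base64 text.
private static String hashValue(String s, String salt) throws NoSuchAlgorithmException {
    MessageDigest messageDigest = MessageDigest.getInstance("SHA-256");
    messageDigest.update(salt.getBytes(StandardCharsets.UTF_8));
    byte[] hashedBytes = messageDigest.digest(s.getBytes(StandardCharsets.UTF_8));
    return Base64.getEncoder().encodeToString(hashedBytes);
}

// Generates a random 16-byte salt; store it next to the hash so the value can be re-checked later.
private static String generateSalt() {
    SecureRandom secureRandom = new SecureRandom();
    byte[] salt = new byte[16];
    secureRandom.nextBytes(salt);

    return Base64.getEncoder().encodeToString(salt);
}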
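
To check a submitted value against a stored hash, recompute the hash with the stored salt and compare the two results. The sketch below assumes a hypothetical class named HashCheck, which repeats the salted SHA-256 hashing shown above so that it stands alone. It compares with MessageDigest.isEqual so that the comparison time does not depend on where the two digests differ:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper for illustration.
public class HashCheck {

    // Same salted SHA-256 scheme as above: hash(salt + value), rendered as Base64.
    public static String hashValue(String s, String salt) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        digest.update(salt.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(digest.digest(s.getBytes(StandardCharsets.UTF_8)));
    }

    // Returns true if the candidate hashes to the stored value under the stored salt.
    public static boolean matches(String candidate, String salt, String storedHash)
            throws NoSuchAlgorithmException {
        byte[] expected = Base64.getDecoder().decode(storedHash);
        byte[] actual = Base64.getDecoder().decode(hashValue(candidate, salt));
        return MessageDigest.isEqual(expected, actual);
    }
}
```

For passwords specifically, a deliberately slow key-derivation function such as PBKDF2 is generally preferred over a single SHA-256 pass, because it raises the cost of brute-force guessing.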

Access control

Apart from protecting the sensitive data itself, sensible access rules and restrictions help a great deal.

In Java, access control is usually implemented with a framework such as Spring Security. A typical setup defines two user roles (USER and ADMIN) and restricts the paths through which each role can access data. Refer to this GitHub gist for a demonstration.
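
The idea of two roles with separately restricted paths can also be sketched without any framework. The following is plain Java, not Spring Security, and the class name PathAccessRules as well as the /user/ and /admin/ path prefixes are assumptions made for illustration:

```java
import java.util.List;
import java.util.Map;

// Hypothetical, framework-free sketch of role-based path restrictions.
public class PathAccessRules {
    public enum Role { USER, ADMIN }

    // Assumed layout: users may reach /user/ paths, admins may reach /user/ and /admin/ paths.
    private static final Map<Role, List<String>> ALLOWED_PREFIXES = Map.of(
            Role.USER, List.of("/user/"),
            Role.ADMIN, List.of("/user/", "/admin/"));

    // Denies by default: a path is allowed only if it starts with a prefix granted to the role.
    public static boolean canAccess(Role role, String path) {
        return ALLOWED_PREFIXES.getOrDefault(role, List.of())
                .stream()
                .anyMatch(path::startsWith);
    }
}
```

A framework adds what this sketch lacks, such as authentication, session handling, and declarative configuration, but the underlying decision is the same lookup from role to permitted paths.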

While most developers prefer open source tools because of their transparency, it can be argued that, for data privacy, non-open-source tools offer a level of reliability when something breaks. A free tool I was recently introduced to during a project is Piiano Vault, which provides a CLI and APIs for building secure applications.

It protects sensitive data by concentrating on the pseudonymization of data such as PII, PCI, PHI, KYC, and secrets. It uses field-level encryption, granular access control, tokenization, masking, and key rotation. Vault also centralizes the protection of sensitive data, so it helps with regulatory compliance and data sprawl.

Architecture

The core of Vault is the Vault Server, which is composed of a Control Server and a Data Server. The former manages schemas, transformations, and other configurations, while the latter handles all data-related operations.

(Architecture diagram from https://piiano.com/docs/architecture/components)

The Server provides REST APIs to interact with Vault. There are two concepts involved here: Collections and Objects.

A collection is a schema, comparable to a database table, that describes one kind of data to be protected, such as personal information or payment details.

An object is a record in a collection that holds values for the properties defined in the schema. These values can be tokenized and encrypted.

The Vault deployed in the cloud supports managing secrets with the KMS from AWS and GCP. Check their documentation for the specific deployment.

Installation

Get the license from their website.

Install from Docker.

docker run --rm --init -d \
  --name pvault-dev \
  -p 8123:8123 \
  -e PVAULT_SERVICE_LICENSE=${license} \
  piiano/pvault-dev:1.3.1

Prepare Vault CLI.

alias pvault="docker run --rm -i --add-host='host.docker.internal:host-gateway' -v $(pwd):/pwd -w /pwd piiano/pvault-cli:1.3.1"

To install Vault in Kubernetes, start by adding its Helm chart repository.

helm repo add piiano https://piiano.github.io/helm-charts

Vault has a CLI and can also be integrated with Java, Python, and TypeScript. In Java, it can be used in the following two ways:

  • java-sdk, generated with openapi-generator, includes all requests and responses of the REST API used to interact with the Server, such as CryptoApi, TokensApi, etc.
    openapi-generator generate -i openapi.yaml -g java -o java-sdk/
  • java-client, an official client that wraps the above API.

Tokenization

Tokenization replaces sensitive data with a non-sensitive token that can be handed to external systems, which makes it possible to work with that data in non-secure environments without exposing it. Only authorized users can exchange a token for the underlying data. Tokens enable you to send a non-sensitive reference to a value instead of the actual data.
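
The mechanism can be illustrated with a small in-memory sketch. It is not how Vault is implemented and it does not use Vault's API. The class TokenStore and its methods are hypothetical, random UUIDs stand in for tokens, and a single boolean stands in for a real authorization check:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory token store for illustration only.
public class TokenStore {
    // Maps each issued token to the sensitive value it stands for.
    private final Map<String, String> valuesByToken = new ConcurrentHashMap<>();

    // Stores the value and returns a random token that reveals nothing about it.
    public String tokenize(String sensitiveValue) {
        String token = UUID.randomUUID().toString();
        valuesByToken.put(token, sensitiveValue);
        return token;
    }

    // Returns the original value, but only to an authorized caller.
    public String detokenize(String token, boolean callerIsAuthorized) {
        if (!callerIsAuthorized) {
            throw new SecurityException("caller may not detokenize");
        }
        String value = valuesByToken.get(token);
        if (value == null) {
            throw new IllegalArgumentException("unknown token");
        }
        return value;
    }
}
```

The token can be stored in logs, analytics tables, or downstream services, while the sensitive value stays in one protected place.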

There are usually two ways to use Vault for tokenization in Java.

Tokenize at the ORM level with TokenizationListener.

It can be implemented by adding the respective annotations on the relevant model class and fields. Below is part of the example of tokenizing Name and PhoneNumber in User.

Tokenize with SDK via the APIs in the service.

See the following part for reference and demo-app-using-vault-sdk for the complete example.

Masking

Masking is called transformations in Vault, and each data_type has a corresponding transformer. The table below lists the built-in transformers in Vault.

From Piiano Vault Transformation

The data types in the table are common built-in data types defined in Vault for sensitive data. For instance, PHONE_NUMBER is a string of up to 15 digits with an optional leading ‘+’. Hyphens (‘-’) may separate groups of digits in the string, for example +1-123-4567890, which complies with E.164.
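
The PHONE_NUMBER rule can be expressed as a small validator. This is a sketch of the rule as stated in this article (at most 15 digits, an optional leading plus sign, single hyphens between groups of digits), not Vault's own validation code, and the class name PhoneNumberRule is hypothetical:

```java
import java.util.regex.Pattern;

// Hypothetical validator for illustration; it encodes the rule as described in the text.
public class PhoneNumberRule {
    // Optional '+', then digit groups separated by single hyphens.
    private static final Pattern SHAPE = Pattern.compile("\\+?\\d+(-\\d+)*");
    private static final int MAX_DIGITS = 15;

    public static boolean isValid(String value) {
        if (value == null || !SHAPE.matcher(value).matches()) {
            return false;
        }
        // Count digits only, ignoring the plus sign and hyphens.
        long digits = value.chars().filter(Character::isDigit).count();
        return digits <= MAX_DIGITS;
    }
}
```

With this rule, +1-123-4567890 is accepted, while a string with 16 digits, a doubled hyphen, or a letter is rejected.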

Vault does not store masked data. Instead, it computes the masked value from the parameters passed in each time the data is fetched. For example, the masked form of the email field is email.mask, which can be obtained with the following code:

try {
   QueryToken queryToken = new QueryToken().tokenIds(ImmutableList.of(tokenId));
   List<DetokenizedToken> detokenize = tokensClient.detokenize(queryToken, ImmutableList.of("email.mask"),
           emptyList(),DetokenizeParams.builder().collection(collection)
                   .accessReason(AppFunctionality).build());
   return (String)detokenize.get(0).getFields().get(propName);
} catch (ApiException e) {
   throw new RuntimeException("failed to detokenize a token", e);
}

Encryption and decryption

Encryption in Vault is fully transparent. It supports the encryption of field-level data, saves only the encrypted data in the database, and uses TLS to encrypt the data in transit.

Encryption and decryption from Java follow the same pattern as above, except that they go through CryptoApi. The client SDK includes an example:

public EncryptedValue encrypt(EncryptionRequest encryptionRequest) throws ApiException {
   List<EncryptedValue> encryptedValues = this.cryptoApi.encrypt(
           this.defaultParams.getCollection(),
           this.defaultParams.getAccessReason().getReason(),
           ImmutableList.of(encryptionRequest),
           this.defaultParams.getExpirationSecs(),
           this.defaultParams.getAccessReason().getAdhocReason(),
           this.defaultParams.isReloadCache()
   );

   if (encryptedValues.size() == 0) {
       return null;
   }
   return encryptedValues.get(0);
}

Identity and access management

By default, Vault provides permission control for resources such as fields ({collection-name}/properties/{property-name-or-transformation-binding}) and tokens ({collection-name}/tokens), and supports six operations: read, write, delete, search, tokenize, and detokenize.

To enable permission control, assign policies to the corresponding roles and bind the roles to users.

When adding new permission controls in Java, we have to create the AuthUser and its apiKey ourselves, configure AuthRole and AuthPolicy, and finally call IamApi to apply the new IAM configuration on the Vault server. See the following example:

public void addIAM() {
  IamApi api = new IamApi();
  APIKey key = api.regenerateUserApiKey("user");
  AuthConfig authConfig = new AuthConfig();
  AuthRole role = new AuthRole();
  role.addCapabilitiesItem("CapCollectionsReader");
  role.addPoliciesItem("PolReadAll");
  AuthUser user = new AuthUser();
  user.setRole("role");
  authConfig.putRolesItem(key, role);
  try {
     api.setIamConf(authConfig);
  } catch (ApiException ex) {
     log.info("failed to set iam config", ex);
  }
}

Conclusion

Protecting sensitive data is a crucial aspect of any application, and developers have several options for implementing these protections. Using pure Java code can be time-consuming and challenging, as it requires manual implementation of data pseudonymization, encryption, and access control. For that reason, I turned to a tool that simplifies sensitive data protection. Using tools like Vault, we can efficiently secure our sensitive data, reduce the scope of compliance challenges, and focus on building our applications, ultimately saving time and resources.

Thanks for reading!

Note: This post is republished on our site with the original author's permission. The original author is Stefanie Lai, who is currently a Spotify engineer and lives in Stockholm. The original post is published here.
