Hiding Data in DB2

Fine-grained access control is usually done in the database, but a programmable proxy can sometimes be used as an alternative. Learn more in this article.

Max Tardiveau

Feb. 22, 24 · Analysis

Like (1)

Save

2.8K Views

In this article, we'll talk about fine-grained access control in DB2 - hiding data that is, in fact, present in the database, but should not be accessible to certain users.

Fine-grained access control is usually done in the database itself as the data is being accessed, but it can also be done between the database server and the database client using a programmable proxy.

Each of these approaches has its advantages and disadvantages. Let's look at each one and contrast them.

DB2’s Fine-Grained Access Control Capabilities

DB2 has excellent fine-grained access control capabilities - among the best in the RDBMS world.

They come in two flavors:

Row and column access control (RCAC), which includes row permissions and column masks
Label-based access control (LBAC), which relies on security labels and security policies

There is some overlap between these two approaches. The biggest difference is that RCAC is code-based (it involves the execution of SQL code to determine whether a piece of data is accessible and how) whereas LBAC is data-based (it relies on declarations and labels, without any code).

Row and Column Access Control (RCAC)

RCAC, as its name suggests, consists of two parts:

Row permissions, which allow you to define who has access to which rows
Column masks, which allow you to define how certain columns are presented to the database client

By combining these, you can achieve the most common access control requirements.

Row Permissions

Row permissions are expressed as pieces of code that define which rows are visible. For instance:

     SQL 
   
 
 
   CREATE PERMISSION CustomersEuropeOnly ON DEMO.CUSTOMERS
    FOR ROWS WHERE
        VERIFY_GROUP_FOR_USER(SESSION_USER, 'SALES_EUROPE') AND
        COUNTRY IN ('BE', 'DK', 'ES', 'FR', 'GR', 'HN', 'IT')
    ENFORCED FOR ALL ACCESS
    ENABLE; 
  

This specifies that database users who belong to group SALES_EUROPE can see the rows in table DEMO.CUSTOMERS where the COUNTRY column has one of the specified values. These values could be looked up or computed from the database, and permissions can call SQL functions including secure user-defined functions, so there is quite a bit of flexibility.

Additional permissions can be added to the same table, and DB2 will automatically apply all of them and show only the rows that satisfy at least one permission.

This type of permission covers all access to the table: select, insert, update, and delete, directly or through views. For instance, if the user tries to insert a row that does not satisfy at least one permission, the insert will fail. Same, obviously, for update and delete.

Column Masks

Column masks are also expressed as pieces of code that specify how a column should be presented to the client. For instance:

     SQL 
   
 
 
   CREATE MASK CUSTOMERS_LAST_NAME_MASK ON DEMO.CUSTOMERS
  FOR COLUMN LAST_NAME RETURN
    CASE WHEN (VERIFY_ROLE_FOR_USER(SESSION_USER, 'ENGINEER') AND COUNTRY IN ('BE', 'ES'))
    THEN SUBSTR(LAST_NAME, 1, 1) || 'XXXX'
    ELSE LAST_NAME
END
  ENABLE; 
  

This specifies that, for database users who have the role ENGINEER, the column LAST_NAME in table DEMO.CUSTOMERS should be masked, with just the first character showing, followed by XXXX, but only for customers in certain countries.

You can only have one mask per column, so things can get painful to manage if you have a lot of logic.

Label-Based Access Control (LBAC)

Unlike permissions and masks in RCAC, which are defined by code, label-based access control is declarative.

It works by creating a system of label components, which can be organized either as unordered sets, e.g.:

     SQL 
   
   CREATE SECURITY LABEL COMPONENT COMPARTMENT
        SET {'RESEARCH', 'MANAGEMENT', 'FINANCE'};

As ordered arrays, where higher levels subsume lower levels, e.g.:

     SQL 
   
   CREATE SECURITY LABEL COMPONENT CLASSIFICATION_LEVEL
        ARRAY [ 'TOP SECRET', 'SECRET', 'CONFIDENTIAL', 'PUBLIC' ];

Or in trees, where again higher levels subsume lower levels:

     SQL 
   
 
 
   CREATE SECURITY LABEL COMPONENT REGION
TREE (
    'WORLD' ROOT,
        'AMERICAS' UNDER 'WORLD',
            'CANADA' UNDER 'AMERICAS',
            'USA' UNDER 'AMERICAS',
                'CALIFORNIA' UNDER 'USA',
                'TEXAS' UNDER 'USA',
                etc... 
  

This is quite powerful, though managing all these can quickly become a challenge.

Label components can then be assembled into labels, e.g.:

     SQL 
   
   CREATE SECURITY LABEL POLICY2024.RESEARCH_TOPSECRET_USA
   COMPONENT COMPARTMENT 'RESEARCH',
   COMPONENT CLASSIFICATION_LEVEL 'TOP SECRET',
   COMPONENT REGION 'USA';

These labels can then be used to protect individual columns:

     SQL 
   
 
 
   CREATE TABLE DEMO.CUSTOMERS (
    ID            INT NOT NULL,
    FIRST_NAME    NVARCHAR(100) NOT NULL,
    LAST_NAME     NVARCHAR(100) NOT NULL,
    COUNTRY       CHAR(2) NOT NULL,
    BALANCE       NUMERIC(12,2) NOT NULL SECURED WITH FINANCE_CONFIDENTIAL_WORLD,
    SECLABEL      DB2SECURITYLABEL NOT NULL,
    PRIMARY KEY (ID)
)
SECURITY POLICY POLICY2024; 
  

As well as individual rows:

     SQL 
   
   INSERT INTO DEMO.CUSTOMERS(ID, FIRST_NAME, LAST_NAME, COUNTRY, SECLABEL) VALUES
(21, 'Wernher', 'von Braun', 'US', SECLABEL_BY_NAME('POLICY2024', 'RESEARCH_TOPSECRET_USA'));

Finally, users, roles, and groups can be given access to the labels for reading, writing, or both:

     SQL 
   
   GRANT SECURITY LABEL POLICY2024.RESEARCH_TOPSECRET_USA
TO USER TJONES FOR READ ACCESS;

There is even more to LBAC, such as an exemption mechanism.

This is an elaborate system, which has complex rules governing things like security upgrades, default behaviors, backups and data movements, and everything else security auditors like to poke around. It is powerful enough to implement the vast majority of fine-grained access control needs, though scaling this system to cover large, complex requirements with many dimensions can be a significant challenge to design, implement, and manage.

Pros and Cons of DB2's Fine-Grained Access Control

Pros

Built into the database: No need for any other software, other than perhaps something to help manage all that security metadata
Rock-solid and certified: This is in use by many of the most security-conscious organizations around the world.
Automatically governs all relevant queries and updates

Cons

Adds an additional burden on the database: Most of these mechanisms are actually implemented by merging them into the query, which can result in some very complex (and expensive) queries if you have a lot of masks, permissions, and labels.
Does not cover certain data types, such as XML and LOBs
If a permission becomes invalid (which can easily happen in the case of complex permissions), it will block all access to the table for which it was defined. This can obviously be disruptive, and non-trivial to debug.
All these labels and permissions must be managed by a database user with the SECADM authority.

Many of the advantages and disadvantages of this solution come from the fact that it's a centralized solution, which is great for some scenarios, and less great for others.

Proxy-Based Fine-Grained Access Control

A very different approach consists of restricting data access between the database server and the database client, using a programmable proxy.

This is a more decentralized approach: some trusted clients may access the database directly, whereas some less trusted clients may have to go through the proxy. And of course, you may have different proxies for different types of clients.

Because this kind of access control is done outside of the database, it is a less powerful approach in some ways, but it does have certain advantages.

Restricting or Rewriting Queries

A proxy can intercept any SQL command on its way to the database and potentially change it or reject it outright.

For instance, the proxy can simply reject certain queries based on the application user, the data being accessed, the time of day, etc.

Going further, the proxy can rewrite queries so that they satisfy our data-hiding requirements. This rewriting can of course vary depending on who the user is, where the call is coming from, or any other useful conditions.

For instance, a proxy can trivially intercept a query like:

     SQL 
   
   SELECT ID, FIRST_NAME, LAST_NAME, BALANCE
FROM DEMO.CUSTOMERS 
WHERE COUNTRY = 'FR'

And rewrite it to:

     SQL 
   
   SELECT ID, FIRST_NAME, SUBSTR(LAST_NAME, 1, 1) || 'XXXX' AS LAST_NAME, 0 AS BALANCE
FROM DEMO.CUSTOMERS 
WHERE COUNTRY = 'FR' AND REGION <> 'Z'

This all happens before it gets sent to the database. This approach works best for known queries, although it is often possible to analyze the query dynamically and modify it as needed.

Modifying Result Sets

When the database responds to a query, the result sets or result values go through the proxy, which can manipulate them as needed.

For example, using Gallium Data as a proxy, a trivial result set filter might look like:

     JavaScript 
   
   if (row.COUNTRY === 'FR') {
    row.LAST_NAME = row.LAST_NAME.substring(0, 1) + "XXXX";
    row.BALANCE = 0;
}

This extra processing will add a small amount of latency, but that work needs to be done somewhere, and by doing it in the proxy, you're saving the database from having to do it.

The proxy has complete control over result sets and result values: it can modify rows, remove entire rows, or even insert new rows into a result set.

The Pros and Cons of Proxy-Based Data Hiding

Pros

Requires no changes to the database, the database server, or the clients. This is clearly the most attractive feature of this approach: nothing has to change in your existing system.
Puts no extra load on the database
Can cover stored procedure execution, including parameter values and returned result sets
Can be modulated based on the database user, the application user, the application's behavior, or any other factors
Different proxies can enforce different sets of constraints.
The constraints are administered outside of DB2 and require no special permissions in the database.

Cons

Slight performance cost (typically on the order of 5%-10% increased latency)
Limited in the case of free-form queries, since the proxy will never understand the queries as deeply as the database can
Cannot entirely hide some data (e.g., aggregate data can reveal the existence and value of some hidden data).

Conclusion

The easiest way to look at this is to ask yourself what will best serve your requirements: a centralized solution, or a decentralized solution?

If you're ok managing all your access control in DB2, then you should clearly go that way: it's extremely solid and flexible, and it's all built-in.
If you'd rather not change your database, or DB2's access control does not fulfill your requirements, then using a proxy may be the easiest route - sometimes the only route.

Database server security sql Management system

Published at DZone with permission of Max Tardiveau. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

Trending

Hiding Data in DB2

Fine-grained access control is usually done in the database, but a programmable proxy can sometimes be used as an alternative. Learn more in this article.

DB2’s Fine-Grained Access Control Capabilities

Row and Column Access Control (RCAC)

Row Permissions

Column Masks

Label-Based Access Control (LBAC)

Pros and Cons of DB2's Fine-Grained Access Control

Pros

Cons

Proxy-Based Fine-Grained Access Control

Restricting or Rewriting Queries

Modifying Result Sets

The Pros and Cons of Proxy-Based Data Hiding

Pros

Cons

Conclusion

Related

Partner Resources