Monday, August 25, 2014

Introduction to Oracle Label Security 12c

In the previous post, we looked at how to support fine-grained access control of tables using Oracle VPD technology. In this post, we are going to have a quick look at another technology with similar purpose called OLS (Oracle Label Security) which is built on top of VPD.

A commonly raised question is that if both VPD and OLS serve the same purpose, why do we need OLS? While VPD is a generic solution, OLS addresses a very specific and common use case in many government and corporate environment - allow access to data based on the need to know basis. Unlike VPD which requires administrator to write PL/SQL code for the policy function, OLS requires no programming. You can think of OLS as a out-of-the-box VPD with built-in policies.

What exact does OLS address?
Many government and business organizations, label their data using different sensitivity levels, such as top secret (TS), secret (S), confidential (C), unclassified (UC), depending on the perceived risk levels so that they can selectively protect most valuable assets of the organization. Users are assigned different clearance levels based on their job functions and seniority. Each user is allowed to access only those data records with the sensitivity level no more than the clearance level of the user. For example, if user tom is assigned the clearance level C, he can access data records with the sensitive levels C or UC only. He is not allowed to access data records with sensitivity levels greater than C, that is, TS or S.

How is OLS enforced?
OLS is a security in depth feature, meaning that users must first have the DAC table privileges necessary to access the OLS protected rows. On top that, there could be VPD policies that may be enforced as well. The following diagram shows the order of enforcement. (Not shown in the diagram: a database vault policy may be enforced on the table as well.)

Figure: OLS Enforcement

How does OLS work?
The following diagram shows the example that we will walk through later in this post. The table flight has data of different sensitivity level and based users' clearance levels they see different set of records even though they issue the same query. Row level access control is enforced transparent to end users.

Figure: OLS Example

How are row sensitivity labels assigned?
When a new record is inserted to an OLS protected table, there are three ways to assign sensitivity label to each row:
- Explicit specify a label (which should be between the maximum and minimum clearance of the user)
- Select 'LABEL_DEFAULT' option when creating the policy and the sessions ROW_LABEL value will be used automatically when inserting records
- Create a labeling function and specify when applying the policy on the table. This will automatically call your labeling function when inserting records

How are user clearance levels decided?
If it is a database user, SA_USER_ADMIN.SET_LEVELS procedure assigns a default level to each user. At the time of login user will have this clearance level. In a session, user can change the clearance level to a level less than the maximum level granted using SA_SESSION.SET_LABEL procedure.

If it is an application user who does not have a database user, SA_SESSION.SET_ACCESS_PROFILE procedure allows to set an existing database user's profile to be the profile of the application user.

A working example:
SQL> -- First do the test setup
SQL> conn sys/password@cdb1_pdb1 as sysdba
Connected.
SQL> create user gov identified by gov;
User created.

SQL> create user bob identified by bob;
User created.

SQL> create user tim identified by tim;
User created.

SQL> grant create session to gov, bob, tim;
Grant succeeded.

SQL> grant create table, unlimited tablespace to gov;
Grant succeeded.

SQL> conn gov/gov@cdb1_pdb1
Connected.
SQL> create table flight(flight# number, destination varchar2(100), payload varchar2(100));
Table created.

SQL> grant select, insert on flight to bob, tim, scott;
Grant succeeded.

SQL> -- insert test data
SQL> insert into flight values (505, 'Iraq', 'Weapon');
1 row created.
SQL> insert into flight values (506, 'Canada', 'Charcoal');
1 row created.
SQL> insert into flight values (706, 'Japan', 'Battery');
1 row created.
SQL> insert into flight values (501, 'Syria', 'Weapon');
1 row created.
SQL> insert into flight values (508, 'Israel', 'Jets');
1 row created.
SQL> insert into flight values (509, 'India', 'Aid');
1 row created.

SQL> 
SQL> conn lbacsys/password@cdb1_pdb1
Connected.
SQL> BEGIN
  2  -- Create OLS Policy
  3  -- Notice that the default_options is set to no_control to disable the policy
  4  -- in order add labels to the existing data items
  5  SA_SYSDBA.CREATE_POLICY(
  6    policy_name => 'ols_pol1',
  7    column_name => 'lb_col',
  8    default_options => 'no_control'
  9  );
 10  
 11  -- Create label component levels
 12  -- TOP_SECRET has the highest level of access
 13  SA_COMPONENTS.CREATE_LEVEL(
 14    policy_name => 'ols_pol1',
 15    level_num   => 4,
 16    short_name => 'TS',
 17    long_name   => 'top_secret'
 18  );
 19  
 20  
 21  SA_COMPONENTS.CREATE_LEVEL(
 22    policy_name => 'ols_pol1',
 23    level_num   => 3,
 24    short_name => 'S',
 25    long_name   => 'secret'
 26  );
 27  
 28  SA_COMPONENTS.CREATE_LEVEL(
 29    policy_name => 'ols_pol1',
 30    level_num   => 2,
 31    short_name => 'C',
 32    long_name   => 'confidential'
 33  );
 34  
 35  SA_COMPONENTS.CREATE_LEVEL(
 36    policy_name => 'ols_pol1',
 37    level_num   => 1,
 38    short_name => 'UC',
 39    long_name   => 'unclassified'
 40  );
 41  
 42  -- Create data labels
 43  SA_LABEL_ADMIN.CREATE_LABEL(
 44    policy_name => 'ols_pol1',
 45    label_tag   => 40,
 46    label_value => 'TS',
 47    data_label  => TRUE
 48  );
 49  
 50  SA_LABEL_ADMIN.CREATE_LABEL(
 51    policy_name => 'ols_pol1',
 52    label_tag   => 30,
 53    label_value => 'S',
 54    data_label  => TRUE
 55  );
 56  
 57  SA_LABEL_ADMIN.CREATE_LABEL(
 58    policy_name => 'ols_pol1',
 59    label_tag   => 20,
 60    label_value => 'C',
 61    data_label  => TRUE
 62  );
 63  
 64  SA_LABEL_ADMIN.CREATE_LABEL(
 65    policy_name => 'ols_pol1',
 66    label_tag   => 10,
 67    label_value => 'UC',
 68    data_label  => TRUE
 69  );
 70  
 71  -- Apply access_pol policy on table gov.flight
 72  SA_POLICY_ADMIN.APPLY_TABLE_POLICY(
 73    policy_name       => 'ols_pol1',
 74    schema_name       => 'gov',
 75    table_name        => 'flight',
 76    table_options  => null,
 77    label_function => null,
 78    predicate         => null
 79  );
 80  
 81  -- Add user authorizations (i.e. clearance levels)
 82  SA_USER_ADMIN.SET_LEVELS(
 83    policy_name => 'ols_pol1',
 84    user_name   => 'bob',
 85    max_level   => 'S',
 86    min_level   => 'UC',
 87    def_level   => 'S',
 88    row_level   => 'S'
 89  );
 90  
 91  SA_USER_ADMIN.SET_LEVELS(
 92    policy_name => 'ols_pol1',
 93    user_name   => 'tim',
 94    max_level   => 'UC',
 95    min_level   => 'UC',
 96    def_level   => 'UC',
 97    row_level   => 'UC'
 98  );
 99  END;
100  /
PL/SQL procedure successfully completed.

SQL> 
SQL> conn system/password@cdb1_pdb1
Connected.
SQL> -- Add labels to rows
SQL> update gov.flight set lb_col = char_to_label('ols_pol1', 'TS') where payload in ('Weapon');
2 rows updated.

SQL> update gov.flight set lb_col = char_to_label('ols_pol1', 'S') where payload in ('Jets');
1 row updated.

SQL> update gov.flight set lb_col = char_to_label('ols_pol1', 'C') where payload in ('Battery');
1 row updated.

SQL> update gov.flight set lb_col = char_to_label('ols_pol1', 'UC') where payload in ('Charcoal', 'Aid');
2 rows updated.

SQL> 
SQL> conn lbacsys/password@cdb1_pdb1
Connected.
SQL> BEGIN
  2  -- Now we change the policy to enfoce on read by first altering the policy
  3  -- and then removing and applying the policy again
  4  SA_SYSDBA.ALTER_POLICY(
  5    policy_name => 'ols_pol1',
  6    default_options => 'read_control, label_default'
  7  );
  8  
  9  SA_POLICY_ADMIN.REMOVE_TABLE_POLICY(
 10    policy_name => 'ols_pol1',
 11    schema_name => 'gov',
 12    table_name  => 'flight',
 13    drop_column => false
 14  );
 15  
 16  SA_POLICY_ADMIN.APPLY_TABLE_POLICY(
 17    policy_name => 'ols_pol1',
 18    schema_name => 'gov',
 19    table_name  => 'flight'
 20  );
 21  END;
 22  /

PL/SQL procedure successfully completed.

SQL> -- This is to demonstrate OLS can be bypassed with
SQL> -- special privileges. Here, scott is granted to read
SQL> -- the OLS protected table without any specific 
SQL> -- clearance level  
SQL> BEGIN
  2  SA_USER_ADMIN.SET_USER_PRIVS(
  3    policy_name  => 'ols_pol1',
  4    user_name    => 'scott',
  5    privileges   => 'READ'
  6  );
  7  END;
  8  /
PL/SQL procedure successfully completed.

SQL> column flight# format 9999;
SQL> column destination format a15;
SQL> column payload format a15;
SQL> 
SQL> 
SQL> conn bob/bob@cdb1_pdb1
Connected.
SQL> -- The following demonstrates how to change the row label
SQL> -- of the current session so that that label is used
SQL> -- when bob is inserting new records to gov.flight. 
SQL> select SA_SESSION.ROW_LABEL('ols_pol1') from DUAL;

SA_SESSION.ROW_LABEL('OLS_POL1')
--------------------------------
S

SQL> select SA_SESSION.LABEL('ols_pol1') from DUAL;

SA_SESSION.LABEL('OLS_POL1')
----------------------------
S

SQL> 
SQL> BEGIN
  2  SA_SESSION.SET_ROW_LABEL(
  3    policy_name  => 'ols_pol1',
  4    label        => 'UC'
  5  );
  6  END;
  7  /
PL/SQL procedure successfully completed.

SQL> -- The label of this row is automatically assigned as UC
SQL> insert into gov.flight (flight#, destination, payload) values (599, 'Peru', 'Medicine');
1 row created.

SQL> select flight#, destination, payload from gov.flight;

FLIGHT# DESTINATION     PAYLOAD
------- --------------- ---------------
    506 Canada          Charcoal
    706 Japan           Battery
    508 Israel          Jets
    509 India           Aid
    599 Peru            Medicine

SQL> 
SQL> conn tim/tim@cdb1_pdb1
Connected.
SQL> select flight#, destination, payload from gov.flight;

FLIGHT# DESTINATION     PAYLOAD
------- --------------- ---------------
    506 Canada          Charcoal
    509 India           Aid
    599 Peru            Medicine

SQL> 
SQL> conn scott/tiger@cdb1_pdb1
Connected.
SQL> select * from gov.flight;

FLIGHT# DESTINATION     PAYLOAD             LB_COL
------- --------------- --------------- ----------
    505 Iraq            Weapon                  40
    506 Canada          Charcoal                10
    706 Japan           Battery                 20
    501 Syria           Weapon                  40
    508 Israel          Jets                    30
    509 India           Aid                     10
    599 Peru            Medicine                10

7 rows selected.
SQL> -- Now we do the clean up
SQL> conn lbacsys/password@cdb1_pdb1
Connected.
SQL> BEGIN
  2  SA_SYSDBA.DROP_POLICY(
  3    policy_name => 'ols_pol1'
  4  );
  5  END;
  6  /
PL/SQL procedure successfully completed.

SQL> 
SQL> conn sys/password@cdb1_pdb1 as sysdba
Connected.
SQL> drop user gov cascade;
User dropped.
SQL> drop user bob cascade;
User dropped.
SQL> drop user tim cascade;
User dropped.

There is more to OLS:
I only scratched the surface of OLS. A data label is actually consists of three components:

1. Sensitivity level - has an ordering relationship
2. Compartment (optional) - used to segregate data within a level
3. Group (optional) - used to organizationally segregate data within a level - has a hierarchical relationship

In order for a user to able to access a labeled record, the following conditions must be satisfied:
a. User's clearance level should be at least as high as the the data record's sensitivity level
b. User has all the compartments associated with the data record
c. User belongs to one of the groups associated with the data record or a parent group of one of the associated groups

With the quick understanding of the OLS technology, I encourage you to look the Oracle documentation to learn more about OLS and its new powerful features.


Thursday, August 21, 2014

Introduction to Oracle Virtual Private Database (VPD) 12c

Before we get our hands on Virtual Private Databases (VPDs), let's try to understand if there is a real use case for VPD? Organizations have data of different sensitivity levels. For example, highly confidential trade secrets to classified publicly available news articles. From the point of view of databases, organizations needs to store data of different sensitivity levels in database tables and allow different users to access different set of data. For example, a manager is allowed to access records for all employees working under him/her whereas an engineer is allowed to access only his employment record. Oracle DAC privileges (or the privileges in any other major database system for that matter) either allow or deny access to the complete database objects such as tables, and views. There is no way to grant partial access to tables or views. What can we do about it?

One possible solution is to segregate data into different tables based on access control requirements. For the employee table example above, for each manager, one employee table with all of his/her subordinates could be created. That's a little too many tables! A better idea would be to create one table and multiple views on the same table for each manager. Even such an approach could be difficult to manage with so many objects and grants. Further, such approaches are not transparent to users as users need to know which table/view they can access in order to view employee records. Is there a magical way to do this segregation overcoming the above challenges? VPD is the answer! A bunch of smart people at Oracle got together and implemented a way to transparently present different records to different users based on context information. VPD has been around since Oracle 8i. Over time, now 12c, VPD has matured and evolved to support many powerful features. In this post, we are going to look at the very basic features of VPD as our goal is to get an idea of what VPD does. Once you have a hang of it, it is just a matter of referring to the user guide to absorb more advanced features of VPD.

So, what exactly VPD does? The basic idea behind VPD is really simple. It modifies user queries to add a predicate to show different set of records to different users. The query modification is transparent to users and based on the context information. The following diagram shows a very simple example.

Figure: A simple VPD example

As you can see from the above diagram, the context sensitive access policy is as follows:
"An employee can access her records and her subordinates' records if she is a manager"

Notice that VPD is transparent to database users. All employees execute the same query "select * from hr.emp". However, based on the access control policy, different users see different set of records. VPD does this by adding the predicate "upper(user) = SYS_CONTEXT('USERENV', 'CURRENT_USER') OR upper(manager) = SYS_CONTEXT('USERENV', 'CURRENT_USER'), where SYS_CONTEXT('USERENV', 'CURRENT_USER') returns the current database user.

Simple Code Example:
The following CDB setting is used to illustrate a dynamic VPD policy in action. (There are a few types of VPD polices such static, shared-static, and shared context-sensitive. These different types decide when to re-evaluate VPD polices as they use different caching strategies. For dynamic VPD policies, no caching strategy is used and policies are evaluated every time the a related statement is parsed or executed.)


Figure: CDB with a single PDB

Create the necessary users and grant privileges.
SQL> CONN sys/password@cdb1_pdb1 AS SYSDBA
Connected.

SQL> CREATE USER bob IDENTIFIED BY bob;
User created.

SQL> CREATE USER anne IDENTIFIED BY anne;
User created.

SQL> CREATE USER tim IDENTIFIED BY tim;
User created.

SQL> CREATE USER hr IDENTIFIED BY hr;
User created.

SQL> CREATE USER vpdadmin IDENTIFIED BY vpdadmin;
User created.

SQL> 
SQL> GRANT CREATE SESSION TO hr, bob, anne, tim, vpdadmin;
Grant succeeded.

SQL> GRANT CREATE TABLE, UNLIMITED TABLESPACE TO hr;
Grant succeeded.

Notice that one needs execute privilege on DBMS_RLS package in order to manage VPD policies.
SQL> GRANT EXECUTE ON DBMS_RLS TO vpdadmin;
Grant succeeded.

SQL> GRANT CREATE PROCEDURE TO vpdadmin;
Grant succeeded.

SQL> 
SQL> CONN hr/hr@cdb1_pdb1
Connected.
SQL> CREATE TABLE emp (ename VARCHAR2(50), title VARCHAR2(50), manager VARCHAR2(50), deptid NUMBER, salary NUMBER);
Table created.

SQL> INSERT INTO emp (ename, title, manager, deptid, salary) VALUES ('bob', 'manager', NULL, 1, 10000);
1 row created.
SQL> INSERT INTO emp (ename, title, manager, deptid, salary) VALUES ('alice', 'engineer', 'bob', 1, 8000);
1 row created.
SQL> INSERT INTO emp (ename, title, manager, deptid, salary) VALUES ('tim', 'engineer', 'bob', 1, 8000);
1 row created.
SQL> INSERT INTO emp (ename, title, manager, deptid, salary) VALUES ('eve', 'analyst', 'bob', 1, 7500);
1 row created.
SQL> INSERT INTO emp (ename, title, manager, deptid, salary) VALUES ('anne', 'manager', NULL, 2, 10500);
1 row created.

SQL> INSERT INTO emp (ename, title, manager, deptid, salary) VALUES ('denis', 'engineer', 'anne', 2, 9000);
1 row created.
SQL> INSERT INTO emp (ename, title, manager, deptid, salary) VALUES ('jack', 'engineer', 'anne', 2, 9500);
1 row created.

SQL> 
SQL> GRANT SELECT ON emp TO bob, anne, tim;
Grant succeeded.

Now vpdadmin user creates the VPD policy; first creates the policy function and then uses it to create the policy.
SQL> 
SQL> CONN vpdadmin/vpdadmin@cdb1_pdb1
Connected.
SQL> -- Should have two input arguments and a return value - all of them are
SQL> -- of type VARCHAR2
SQL> -- Return value should be a predicate (that fits in the WHERE clause).
SQL> CREATE OR REPLACE FUNCTION get_emp_pred
  2    (schema_v IN VARCHAR2, tbl_v IN VARCHAR2) RETURN VARCHAR2 IS
  3    BEGIN
  4      RETURN 'upper(ename) = SYS_CONTEXT(''USERENV'', ''CURRENT_USER'') OR upper(manager) = SYS_CONTEXT(''USERENV'', ''CURRENT_USER'')';
  5    END;
  6  /
Function created.


SQL> -- create VPD policy
SQL> -- Note: for INSERT statements, you need to specify update_check => true as well
SQL> BEGIN
  2    SYS.DBMS_RLS.ADD_POLICY(
  3      object_schema   => 'hr',
  4      object_name     => 'emp',
  5      policy_name     => 'emp_vpd1',
  6      function_schema => 'vpdadmin',
  7      policy_function => 'get_emp_pred',
  8      statement_types => 'select'
  9    );
 10  END;
 11  /
PL/SQL procedure successfully completed.

SQL> 
SQL> column ename format a10;
SQL> column title format a10;
SQL> column manager format a10;
SQL> column deptid format 99;
SQL> column salary format 999999;
SQL> 
SQL> CONN bob/bob@cdb1_pdb1
Connected.
SQL> select * from hr.emp;

ENAME      TITLE      MANAGER    DEPTID  SALARY
---------- ---------- ---------- ------ -------
bob        manager                    1   10000
alice      engineer   bob             1    8000
tim        engineer   bob             1    8000
eve        analyst    bob             1    7500

SQL> 
SQL> CONN tim/tim@cdb1_pdb1
Connected.
SQL> select * from hr.emp;

ENAME      TITLE      MANAGER    DEPTID  SALARY
---------- ---------- ---------- ------ -------
tim        engineer   bob             1    8000

SQL> 
SQL> CONN anne/anne@cdb1_pdb1
Connected.
SQL> select * from hr.emp;

ENAME      TITLE      MANAGER    DEPTID  SALARY
---------- ---------- ---------- ------ -------
anne       manager                    2   10500
denis      engineer   anne            2    9000
jack       engineer   anne            2    9500

Now we are going to clean up.
SQL> 
SQL> CONN vpdadmin/vpdadmin@cdb1_pdb1
Connected.
SQL> BEGIN
  2    SYS.DBMS_RLS.DROP_POLICY(
  3      object_schema => 'hr',
  4      object_name   => 'emp',
  5      policy_name   => 'emp_vpd1'
  6    );
  7  END;
  8  /

PL/SQL procedure successfully completed.

SQL> 
SQL> CONN sys/password@cdb1_pdb1 as SYSDBA
Connected.
SQL> DROP USER bob CASCADE;
User dropped.

SQL> DROP USER anne CASCADE;
User dropped.

SQL> DROP USER tim CASCADE;
User dropped.

SQL> DROP USER hr CASCADE;
User dropped.

SQL> DROP USER vpdadmin CASCADE;
User dropped.

Today we looked at a very basic example of VPD. In a real world application, a user defined context is utilized to construct the VPD predicate. In addition to row level control, VPD allows column level control  as well.

Wednesday, August 13, 2014

Is Oracle Data Redaction broken?

In blackhat 2014 and earlier in 2013, David Litchfield has pointed out that "Oracle data redaction is broken and trivial to bypass". Here is an exaggerated scribe of his blackhat 2014 talk. Is that true that you can trivially bypass data redaction? Yes, of course, if you are using ad-hoc queries. You can trivially write a join query to fetch sensitive data if you have direct access to the database. Does that mean it is broken? No, not really; the problem is that Oracle data redaction does not claim to protect against ad-hoc queries. It is designed to be a helper function for applications to redact sensitive data before showing to application users. It can be considered as an application layer security feature. It is not a Database security feature that prevents leakage of sensitive data from the database when an attacker has direct access to the database.

Let me provide you an analogy using encryption which is not the defense for all threats. Transparent database encryption techniques provided by many database vendors, including, Oralce, Microsoft and IBM, is designed to protect data at rest. That is, if an attacker steals or gets access to a hard drive where the data is stored, they won't be able to access the actual data as it is encrypted and they don't have keys to decrypt it. However, if the attacker can compromise an application that access these encrypted data, they get access to the plaintext data. Does that mean the transparent encryption provided by all these vendors is broken? Certainly not. The same applies for Oracle data redaction.

Without the data redaction technique, an app developer would do the following to mask sensitive data, for example, credit card numbers:

1. Fetch credit card numbers from the database
2. Write application code to mask credit card numbers
3. Apply the application defined masking function before pushing the data user interface (e.g. web browser)

From the development and maintenance point of view, there are some pratical issues with the above approach:
- App developers need to write custom redaction code
- If multiple apps uses similar redaction or accesses multiple databases, it is difficult to maintain or update these redaction so that consistency is maintained in the long run. There is no way to centrally control it. It may require constant code changes.

Figure: Oracle data redaction example (Source: oracle.com)


Oracle data redaction helps overcome these practical issues. Further, notice that app already has privilege to access full credit card numbers. The objective of Oracle data redaction is not to block access to these credit card numbers. If the application is compromised, the attacker can use trivial bypass techniques to get his hands on the credit card numbers. Instead, the objective is to defend against app user compromises. If an app user is compromised, the can use his/her credential to login to the application and get access only to the redacted credit card number. In short, Oracle data redaction is designed to show the least information to app users.

If you want to prevent leakage of sensitive data from the database, you need use other preventive and detective techniques such as Oracle database vault and Oracle audit vault and firewall. This is a good example why organizations should have defense in depth to protect their valuables in databases!

Monday, August 11, 2014

Why NoSQL databases?


As you probably know, adoption of NoSQL databases is on the rise; and we see more and more NoSQL data stores being developed. So, what is the secret? Why is there a rise in the adoption? Here are some of the key point I think of:

- Some applications do not require all the complicated features offered by RDBMS, instead they want to scale. For example, ranking pages based on search keywords. NoSQL databases scales out (horizontal scaling) better than RDBMS.
- RDBMS is poor at handling unstructured data such as documents, user reviews, comments, etc. Certain NoSQL databases, such as MongoDB, and couchDB, are built specifically for such purposes.
- For rapid development, there is an increased desire to avoid data/schema pre-design. For example, in MongoDB the developer specify the schema in the code itself.
- Simpler APIs to access the data. For example, key-value stores such as Redis and BerkelyDB have a very simple store and fetch methods.
- Most of the NoSQL databases are open-source and works/scales better on commodity servers compared to RDBMS.
- Law latency compared to RDBMS for certain operations.
- Some applications do not require strict consistency guarantees. They can live with weaker eventual consistency (eventually everyone sees the same data). For example, twitter/facebook feeds do not need to be super consistent across all users. NoSQL takes advantage of this fact to scale better.

It's all good with NoSQL and let's switch from RDBMS to NoSQL, or is it not? NoSQL is not for all applications. In fact, not for 90% of the current business use cases. So, when you decide to use NoSQL think twice about the requirements before you jump into it. Here are some pointers to consider before you dive into long term adoption of NoSQL:

Some dangers of NoSQL:
- No schema - NoSQL databases such as MongoDB do not use fixed schemas and the schemas are not defined inside the database; schemas are defined in the code itself. While it looks great to have such flexibility, in the long run, it can create management issues if these schemas are not document properly as there is no fixed schema to look at.
- Denormalized data - maintain relationship in the code; this can create data quality issues
- Inconsistent APIs by different NoSQL providers (compared to SQL) - switching between products is not easy. So, fully understand the capabilities and limitations of the NoSQL database you are going to use before you commit to it.

NoSQL is not for you when you want:
- Strong consistency guarantees
- To execute complex queries (for example, MongoDB does not support transactions, nor joins)
- To maintain normalized data at the database to support integrity of the data.
- Strong compliance and security features - currently NoSQL databases lacks rigorous security controls
- To use transactions

While the adoption of NoSQL databases continue to rise, it is important to understand that one specific NoSQL database or a RDBMS may not meet all your organizations requirements. Depending the on the type of data you are working with (e.g. structural, documents, key-values, etc.) and the type of workload (e.g. OLTP, OLAP, batch processing, search, stream processing, etc.), you may have to utilize multiple databases in your organization. The industry has identified this need and that is why we increasingly hear about polyglot persistence. It basically means that multiple databases are utilized to build applications. We will look into polyglot persistence in another blog post.

Friday, August 8, 2014

Introduction to Oracle 12c Multitenant Database - Part 1

Up until 12c, Oracle database had one instance can work with only one database (barring RAC - Real Application Clusters; multiple instances share one single physical database). Before we look at what 12c is, let's try to understand why one instance - one database approach is not optimal with the current advances in hardware and software. Usually in an organization, for different applications, one or more databases are used. Often, these databases run on multiple physical servers, possibly in different operating systems. Usually one database instance per one server. The servers are grossly underutilized. What can we do about it?

Figure: Oracle legacy database (11g and before) - one-to-one mapping

One possible solution is to run multiple instances-databases in the same server. It works to some extent to support a few database instances per machine, but the approach still is not so efficient in terms of resource utilization. The reason is that multiple instances do not share background processes, system/process memory or Oracle meta-data. Further, it is difficult to management many database instances given the fact that some organizations deploy thousands of them. This naturally leads to the question whether we can create a shared resource architecture to support multiple instances.

Figure: Oracle legacy database - each application uses multiple database instances (source: http://goo.gl/HznYKe)

Another possible solution is to isolate database files for different applications using a virtual private database (VPD); that way we do not have duplicate background processes, system/process memory or Oracle meta-data. However, it is difficult manage security, isolation, and transport data in VPD.

Having seen the pros and cons of the above two approaches, it is logical think if we can combine the best of both and create a better database system. Yes, you guessed right, that's exactly what Oracle 12c (c stands for cloud) has done! Oracle 12c is called CDB (Container Database) or Pluggable Database.

CDB consists of two levels. A root container called CDB$ROOT and one or more pluggable databases PDBs. A PDB is very similar to a legacy database except that it shares background processes, memory and meta-data with other PDBs. Unlike VPD based solution, at the same time, it provides stronger security and isolation controls, and further easier provisioning via cloning and portability via pluggable feature.

Figure: The same legacy deployment with new CDB - consolidated to one machine (source: http://goo.gl/HznYKe)

Think like a USB device to be a PDB and computers with USB ports as CDB container. Once the CDB$ROOT is up and running, you can plug in any number of PDBs to the CDB. There is a seed PDB called PDB$CDB which you can use to clone new PDBs. You can move (plug/unplug) PDBs between CDBs.  


Figure: CDB - Multitentant architecture

We will look further into the CDB concepts in the next blog post.

Wednesday, August 6, 2014

[Aug 2014] What kind of DBMS are currently being used/built?

What kind of databases are trending? (becoming popular more)

As you can see, there is an increasing popularity towards non relational database systems. I was surprised to see the graph DBMS curve. But then again, the rise of sensors and connected devices, which produce so much graph data, should be no surprise to see the above trend.

In the chart above, it may give you the wrong impression that organizations are moving away from relational RDBMSes. However, that is not the case. All DBMSes except RDBMS together makes only a small market (around 10%). Therefore, small change in popularity for non relational DBMS is very visible. To confirm this fact, look at the chart below. Still RDBMS rules the world! IMHO, it will remain this way for the foreseeable future. 



What kind of databases are people building these days? As you can see, most software companies invest on relational systems. However, at the same time, they are investing more on other types of database systems mainly due to the following facts:
1. Amount of data is getting bigger and bigger (e.g. "every two days we create as much data as up to 2003", Google CEO)
2. Data is more connected than ever before (e.g.: rise of social media, sensor networks, connected devices; Cisco estimates 50 billion connected devices by 2020.)
3. Data is unstructured or semi-structured (e.g.: Amazon review comments, tweets, blogs)


I encourage you to visit db-engines.com to see more drilled down stats.

(Source: http://db-engines.com/en/ranking_categories)

Is relational database (RDBMS) doomed in a sea of NoSQL?

Short Answer: No. Relational databases are created out of the need to support certain business needs which NoSQL databases cannot replace. NoSQL databases addresses business use cases where relational databases fall short. 

Long Answer: Now we are going to get in to the details:

RDBMS provides strong consistency guarantees: specifically, it guarantees ACID (Atomicity - all or nothing of the transaction is committed, Consistency - transactions leave the database in a consistent state - data stays valid in terms of integrity constraints, Isolation - each transaction is independent and Durability - once committed, the data stays ) properties. Due to the strong consistency guarantee, all users/apps see the same data at the same time. Downside of such strong guarantees is that they are more difficult to scale.

On the other hand, NoSQL systems support BASE (Basically Available - available as in CAP theorem, Soft state - state changes over time, Eventually consistent - with time all users will see the same output) properties which are less strict than ACID properties. Due to the loose consistency guarantee, NoSQL systems scale out easier than RDBMS. Further, unlike RDBMS systems, NoSQL systems do not have fixed schemas to work with. Developers decide the schema in their code which may lead to faster development. 

As you can see, RDBMS and NoSQL systems demonstrate different set of properties. When you want to have fast data at the expense of providing inaccurate data at times (e.g. Twitter/FB feeds) and/or scale based on changing demand, NoSQL is a better solution. However, when you want to have super valid/consistent data and support complex and dynamic queries, RDBMS is a better solution. In fact, there are already attacks on NoSQL based systems due to lack of consistency. The recent attack on flexcoin is one such example. This teaches us to use the right tool for the job at hand. Hence, it is less like to see the demise of RDBMS technology in the near future.

Tuesday, August 5, 2014

Life is like a camera man

Life is like a camera man
Focus on what is important .. you can neither do everything nor make everyone happy
Capture the good times .. positive attitude helps succeed in life
Share the captures with the loved ones.. sharing is caring
If you miss an opportunity, keep looking for the next one .. when one door closes, many other opens
Sometimes you have to wait patiently .. but the reward is high
With time you become better at taking shots .. practice makes you perfect
Develop from the negatives .. mistakes are what makes you stronger
And if it does not look good, take another shot.. never give up, try try and try
Recharge the camera regularly .. life is not a rat race .. stop and take a moment to enjoy as you shoot away
Take good care of the camera as it is too precious to loose .. your health is wealth

(Taken from elsewhere and altered)

[Thoughts] Why are we so reluctant to change?

I am no motivational speaker or a self-help guru .. but I was pondering about the fact that how reluctant we are to change. We are very comfortable to continue to what we do even if there is a new and better way of doing it. Why are we so reluctant to change? In my opinion, it is the culture and education that we are brought up. Our culture has taught us that failure is a bane; failure is look down upon. We fear failure when thinking about change. Further, our education system has taught us to follow the path others had followed before us. Taking risks or trying something new is not rewarded unless you succeed. It is so deeply rooted that we sometimes say "the known devil is better than the unknown angel" to avoid change. This has to change! Even the cooperate culture is no different. Organizations are also very reluctant to change even though the change provides many benefit. The following picture depicts this fact nicely:


Friday, August 1, 2014

[Cloud] Cloud Computing Definition

Since the term cloud computing is a relatively new term compared to other technologies in the IT field, different people/organizations have come up with different definitions. Out of all of them, in my opinion, NIST definition is by far the best in terms of clarity and completeness.

NIST definition of cloud computing is quite simple and upto the point:

  • Essential characteristics
    • On-demand self-service - provisioning of computing capabilities without any human intervention
    • Broad network access - capabilities are available over the network/any device
    • Resource pooling - multi-tenant model to serve multiple customers from pooled resources
    • Rapid elasticity - capabilities can be elastically provisioned or released based on demand
    • Measured service - resource usage can be monitored to provide metering capability
  • Service models
    • Software as a Service - consumers have access to applications running on provider's cloud - zero software installation; e.g. Salesforce, Oracle CRM
    • Platform as a Service - consumers have control over deployed apps - consumers are provided middleware running in provider's cloud
    • Infrastructure as a Service - consumers have control over OS, storage and deployed apps - consumers are provided virtual machines
The following diagram illustrate the above three service models:
(Source: Wikipedia)
  • Deployment models
    • Private cloud - provisioned for exclusively used by one organization
    • Community cloud - provisioned for used by one community with similar interest
    • Public cloud - provisioned for open use by general public - e.g. Amazon cloud service
    • Hybrid cloud - any combination of the above three
The following diagram from Microsoft provide some more information on the deployment models:
(Source: Microsoft.com)