#20 Oracle Database 21c – World's Best New-Age RDBMS

Oracle, the world's most popular database, has released a major upgrade: Oracle Database 21c. With this launch the RDBMS is back in the game, challenging use cases where NoSQL was assumed to be the best fit. Oracle Database 21c contains more than 200 new innovations, including in-database machine learning with AutoML, immutable blockchain tables, graph processing optimizations, in-memory optimizations, JavaScript support, dramatic improvements to JSON operations, sharding, multitenant enhancements, security improvements, and many more.

Oracle Database 21c is available on Oracle Cloud and can be deployed as a Database Service virtual machine (for clusters and single instances) or as a Bare Metal service (single instance). It is also available in the Oracle Cloud Always Free tier of Oracle Autonomous Database, with limited storage and CPU capacity. In addition, Oracle APEX (Application Express) Application Development, a new low-code service for quickly developing and deploying data-driven enterprise applications, is available with the database.

Data-Driven Future

Oracle has consistently invested in managing and storing data in a converged database, which is more efficient and productive than breaking data up across multiple single-purpose engines. A converged database also addresses data integrity, consistency and security issues. So what is a converged database? It is a multi-model, multitenant, multi-workload database: it supports the data model and access method each development team wants, without unneeded functionality getting in the way; it provides both the consolidation and the isolation these teams want but don't want to think about; and it excels at all the workloads (such as OLTP, analytics and IoT) these teams require. Oracle Database 19c was the world's first converged database.

Figure 1 Converged Oracle Database

Oracle Database 21c adds further innovation to Oracle's converged database and offers best-of-breed support for all data types (e.g. relational, JSON, XML, spatial, graph, OLAP), along with industry-leading performance, scalability, availability and security for operational, analytical and other mixed workloads. Oracle's converged strategy also ensures that developers benefit from all of Oracle Database 21c's key capabilities (e.g. ACID transactions, read consistency, parallel scans and DML, online backups), freeing them to focus on developing applications without having to worry about data persistence.

Figure 2 Multi-model innovation – Oracle Database

What’s new in Oracle Database 21c

This latest innovation release introduces several new features and enhancements that further extend database use cases, improve developer, analyst and data scientist productivity, and increase query performance. Listed below is a subset of what's new in Oracle Database 21c.

New Blockchain Tables

Blockchain as a technology has promised much in terms of solving many of the problems associated with the verification of transactions. While considerable progress has been made in bringing this technology to the enterprise, several problems remain, arguably the largest being the complexity of building applications that can support a distributed ledger. Oracle Database 21c addresses this problem with the introduction of blockchain tables. These tables operate like any normal heap table, but with a few important differences, the most notable being that rows are cryptographically hashed as they are inserted, ensuring that a row can no longer be changed at a later date.
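As a minimal sketch (the table and column names here are illustrative, not taken from the article), the DDL for a blockchain table uses retention and hashing clauses to control how rows are protected:

-- Rows cannot be deleted while locked, and the table cannot be dropped
-- until it has been idle for the stated number of days.
CREATE BLOCKCHAIN TABLE bank_ledger (
  bank_name      VARCHAR2(128),
  deposit_date   DATE,
  deposit_amount NUMBER
)
NO DROP UNTIL 31 DAYS IDLE
NO DELETE LOCKED
HASHING USING "SHA2_512" VERSION "v1";

-- Inserts work as on any heap table; each row is chained and hashed.
INSERT INTO bank_ledger VALUES ('Example Bank', SYSDATE, 100);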

Figure 3 Blockchain Table – Oracle Database

New Native JSON Datatype

Oracle Database has had JSON support since version 12c, storing JSON data in a VARCHAR2 column or a LOB (CLOB or BLOB). In Oracle Database 21c, JSON support is further enhanced with a native data type, JSON. Instead of having to parse JSON on read or update operations, parsing happens only on insert, and the JSON is then held in an internal binary format that makes access much faster. This can result in read and update operations being 4 to 5 times faster and updates to very large JSON documents being 20 to 30 times faster.
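A minimal sketch of the native type and of JSON_TRANSFORM (the function shown in Figure 5); the table and document below are illustrative only:

-- JSON is now a first-class column type, stored in a binary format.
CREATE TABLE purchase_orders (
  id NUMBER PRIMARY KEY,
  po JSON
);

INSERT INTO purchase_orders
VALUES (1, '{"customer":"ACME","status":"NEW","items":[{"sku":"X1","qty":2}]}');

-- Simple dot notation reads the binary JSON directly.
SELECT p.po.customer FROM purchase_orders p WHERE p.id = 1;

-- JSON_TRANSFORM updates a document in place, without round-tripping it to the client.
UPDATE purchase_orders
SET    po = JSON_TRANSFORM(po, SET '$.status' = 'SHIPPED')
WHERE  id = 1;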

Figure 4 Native JSON Datatype
Figure 5 function JSON_TRANSFORM

JSON Performance in Oracle Database

Here is a comparison of Oracle Database performance with MongoDB:

Benchmark configuration: Autonomous JSON Database with 8 OCPUs compared to MongoDB Atlas on M60, using the industry-standard Yahoo! Cloud Serving Benchmark (YCSB). Source of MongoDB results: https://www.mongodb.com/atlas-vs-amazon-documentdb/performance as of 8/12/2020.

New JavaScript Execution inside the Database

Oracle Database 21c adds support for JavaScript via the Oracle Multilingual Engine (MLE), powered by GraalVM. JavaScript is a ubiquitous scripting language that, among its many uses, enables richer user interaction in web applications and mobile apps. It’s one of the few languages that runs in a web browser and can be used to develop both client-side and server-side code. There is a large collection of existing JavaScript libraries for implementing complex programs, and JavaScript works in conjunction with popular development technologies such as JSON and REST. The MLE automatically maps JavaScript data types to Oracle Database data types and vice versa. The JavaScript code itself can execute PL/SQL (stored procedures) and SQL through a built-in JavaScript module.
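A minimal sketch of running JavaScript through the DBMS_MLE PL/SQL package (the snippet text is illustrative; output from console.log is routed to DBMS_OUTPUT):

SET SERVEROUTPUT ON
DECLARE
  ctx    DBMS_MLE.context_handle_t;
  source CLOB := q'~
    // Plain JavaScript evaluated inside the database
    let greeting = "Hello from JavaScript in Oracle Database 21c";
    console.log(greeting);
  ~';
BEGIN
  ctx := DBMS_MLE.create_context();          -- create an MLE execution context
  DBMS_MLE.eval(ctx, 'JAVASCRIPT', source);  -- run the snippet
  DBMS_MLE.drop_context(ctx);                -- release the context
END;
/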

Figure 6 JavaScript Execution in Oracle Database

More Innovations to Support Multi-Workload

Figure 7 Multi-workload – Oracle Database

In-Memory improvement

Database In-Memory has traditionally required a lot of manual effort from DBAs, developers and users. In 21c you can set INMEMORY_AUTOMATIC_LEVEL to HIGH, and all columns are automatically considered for in-memory analysis.
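For example, assuming the In-Memory column store is already enabled (INMEMORY_SIZE is non-zero), a single parameter change switches on fully automatic in-memory management:

-- All eligible columns become candidates for automatic in-memory population.
ALTER SYSTEM SET INMEMORY_AUTOMATIC_LEVEL = 'HIGH';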

In addition, in-memory hash joins have been optimized using SIMD vectorization, resulting in up to 10x higher speed, and columnar scans have been sped up for the case where not every column is in memory: the optimizer can now perform a hybrid scan and fetch projected column values from the row store when needed, instead of having to perform the entire scan on the row store.

Figure 8 In-Memory Column Store – Oracle Database

Persistent Memory Store

“Dramatic performance improvement for I/O-bound workloads”

Oracle Database 21c adds support for a Persistent Memory (PMEM) store on single-instance databases. This allows data and redo to be stored in local PMEM on commodity hardware. Oracle Database 21c runs SQL directly on data stored in the mapped PMEM file system, eliminating the need for a large buffer cache and shortening the code path, while providing much faster transaction durability and near-instant recovery. This feature is especially useful for workloads that tend to be I/O bound.

Figure 9 Persistent Memory Store – Oracle Database

AutoML

Oracle Database 21c makes it even simpler for data scientists and analysts to take advantage of in-database machine learning by providing Python machine learning interfaces to the database. This new client tool complements existing R and SQL interfaces already available. AutoML simplifies the development of predictive machine learning models by automating the model selection, feature selection, and parameter tuning processes required for building accurate models.

Automatically build and compare Machine Learning models

Figure 10 AutoML

More Innovations to Support Multi-Tenant

Figure 11 Multi-Tenant

From 21c onward, an Oracle database is always a multitenant container database (CDB) that holds the CDB root, one system-supplied seed pluggable database (PDB$SEED), and any number of user-created pluggable databases (PDBs). Users interact only with the PDBs, and a user may never realize that the PDB they are using isn't a standalone database. In the past, Oracle also supported non-CDB database instances, but that option is no longer supported in Oracle Database 21c.
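As a minimal sketch (the PDB name, admin password and file paths are placeholders), creating and opening a new PDB from the seed looks like this when connected to the CDB root:

-- Clone the seed PDB into a new pluggable database.
CREATE PLUGGABLE DATABASE sales_pdb
  ADMIN USER pdb_admin IDENTIFIED BY "ChangeMe_21c"
  FILE_NAME_CONVERT = ('/pdbseed/', '/sales_pdb/');

-- A new PDB starts in MOUNTED mode and must be opened before use.
ALTER PLUGGABLE DATABASE sales_pdb OPEN;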

Multi-tenant Enhancement

Figure 12 Data Guard Per-PDB Physical Standby

DbNest

DbNest provides enhanced security isolation for both container and pluggable databases. CDBs and PDBs each reside within their own security realm, called a "nest", which is enforced using advanced operating-system features. DbNest provides security isolation for the following:

  • Processes
  • CPU
  • Memory
  • Network
  • File access
Figure 13 DbNest

Conclusion

Oracle Database 21c is the world's leading converged database and comes with 200+ new features for all data-driven application use cases. Please explore the full list of features at https://docs.oracle.com/en/database/oracle/oracle-database/21/whats-new.html and experience the new-age RDBMS.


Reference

https://oracle.com/

Explore more at Teknonauts.com

#17 Google Logica – Language of Big Data

Google Logica is an open-source declarative logic programming language for data manipulation. Its predecessor, Yedalog, was also created at Google. Logica is aimed at data scientists, engineers and other specialists who want to use logic programming syntax when writing queries and pipelines to run on BigQuery.

Logica stands for Logic with aggregation.

Why check out Google Logica?

The Logica compiler compiles programs to StandardSQL, giving developers access to the power of the BigQuery engine with the usability of logic programming syntax. This is helpful because BigQuery is far more powerful than state-of-the-art native logic programming engines.

Explore Logica if you find yourself in one of the conditions below:

  • You are already using logic programming and require more computational power.
  • Readability is a concern for you when using SQL.
  • You are keen to learn logic programming and apply it to Big Data use cases.

Hearing about logic programming for the first time? What is it?

Logic programming is a declarative programming paradigm where the program is written as a set of executable logical statements.

This programming concept was first developed in academia in the late 1960s. Prolog and Datalog are the most prominent examples of logic programming languages.

Logica is one of the languages of the Datalog family.

Datalog and relational databases start from the same premise: conceptualize data as relations and data manipulation as a sequence of operations over those relations. Datalog and SQL differ, however, in how those operations are expressed: Datalog's syntax is inspired by mathematical notation, whereas SQL follows the syntax of natural language.

SQL was modeled on natural language to give people without formal programming training access to databases. This usability becomes costly when the logic you want to express is non-trivial; there are many examples on the internet of hard-to-read SQL queries that correspond to simple logic programs.

How does Google Logica work?

Logica compiles a logic program into a SQL expression, so it can be executed on BigQuery (a SQL engine).

Among database practitioners, Datalog and SQL are known to be very similar, and the conversion from Datalog to SQL and back is usually straightforward. However, there are a few small differences, for example in how disjunction and negation are handled. In Logica, Google has tried to keep the generated SQL structure as simple as possible, empowering users to write programs that execute very efficiently.
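As a rough illustration (this is the shape of the generated query, not the literal compiler output), the one-fact Hello World program Greet(greeting: "Hello world!") compiles to a trivial StandardSQL SELECT:

-- Each Logica predicate becomes a SELECT producing one column per named argument.
SELECT
  "Hello world!" AS greeting;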

How to learn?

Learn the basics of Google Logica with the Colab tutorial located in the tutorial folder of the repository, and see examples of using Logica in the examples folder.

The tutorial and examples show how to access Logica from Colab. You can also install the Logica command-line tool.

Setup Prerequisites

To run Logica programs on BigQuery you will need a Google Cloud project. Once you have a project, you can run Logica programs in Colab by providing your project ID.

To run Logica locally you need Python 3.

To execute Logica predicates from the command line you will need bq, the BigQuery command-line tool, which is installed as part of the Google Cloud SDK.

Installation

A Google Cloud project is the only thing you need to run Google Logica in Colab; see the Hello World example.

You can install the logica command with pip as follows.

# Install.
python3 -m pip install logica
# Run:
# To see usage message.
python3 -m logica
# To print SQL for HelloWorld program.
python3 -m logica - print Greet <<<'Greet(greeting: "Hello world!")'

If your PATH includes Python's bin folder, you will also be able to run it simply as:

logica - print Greet <<<'Greet(greeting: "Hello world!")'

Alternatively, you can clone GitHub repository:

git clone https://github.com/evgskv/logica
cd logica
./logica - print Greet <<<'Greet(greeting: "Hello world!")'

Code samples

Here are a couple of examples of what Logica code looks like.

Prime numbers

Find prime numbers less than 30.

Program primes.l:

# Define natural numbers from 1 to 29.
N(x) :- x in Range(30);
# Define primes.
Prime(prime: x) :-
  N(x),
  x > 1,
  ~(
    N(y),
    y > 1,
    y != x,
    Mod(x, y) == 0
  );
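Assuming bq and a Google Cloud project are configured as described above, the Prime predicate can then be printed or executed with commands along these lines:

# Show the SQL that the program compiles to.
python3 -m logica primes.l print Prime
# Run the predicate on BigQuery and show the resulting primes.
python3 -m logica primes.l run Prime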

Conclusion

The Logica tutorial is already available from Google.

Google Open Source Logica

Developers working with Google platforms and the Big Data world should start learning Logica; it is here to stay. Keep checking Teknonauts.com for more.

#3 Best Data Architecture for Modern Applications

Basics of Data Architecture

Data architecture is a critical component of enterprise applications: data does not just lead to information and knowledge, it defines the business strategy of an organization. It is the basis for providing any service in the digital world. Indicative characteristics of an information system, viewed around the data it holds, include:

  • Personal and sensitive data
  • Large size and a large number of datasets
  • Interdependent, complex and diverse data
  • Dynamic data, e.g. stock market data
  • Unstructured data, e.g. social media data
  • Short- and long-lived data

Because of these diverse characteristics, data architecture is a critical factor in designing any enterprise application.

Before we get into the architecture, let us talk about what is happening with data today. Why is the RDBMS limited in its use cases? Why is everyone talking about NoSQL? Why is everyone talking about Big Data and data analytics? Why has data science become a research discipline in its own right?

Over the past few decades we have collected huge amounts of data, but only the fraction considered meaningful was fitted into an RDBMS. Once we realized the potential of the ignored, unstructured mass of data, the world started shifting towards NoSQL, Big Data and data analytics. In addition, with the advancement of processing power, speed became one of the key drivers for accessing huge volumes of unstructured data.

The enterprise data architecture of modern applications is therefore a break from traditional data applications, where data is disconnected from other applications and from analytics.

The enterprise data architecture supports fast data created at a multitude of new endpoints, operationalizes the use of that data in applications, and moves data to a "data lake" where services are available for the deep, long-term storage and analytics needs of the enterprise. The enterprise data architecture can be represented as a data pipeline that unifies applications, analytics, and application interaction across multiple functions, products, and disciplines.

Modern Data and databases

Key to understanding the need for an enterprise data architecture is an examination of the “database universe” concept, which illustrates the tight link between the age of data and its value.

Most technologists view data as existing on a time continuum. In almost every business, data moves from function to function to inform business decisions at all levels of the organization. While data silos still exist, many organizations are moving away from the practice of dumping data into a database (e.g. Oracle, Postgres, DB2, MSSQL) and holding it statically for long periods of time before taking action.

Why Architecture Matters

Interacting with fast data is a fundamentally different process from interacting with big data at rest, and it requires systems that are architected differently. With the correct assembly of components, reflecting the reality that applications and analytics are merging, an enterprise data architecture can be built that meets the needs of both data in motion (fast) and data at rest (big).

Building high-performance applications that can take advantage of fast data is a new challenge. Combining these capabilities with big data analytics into an enterprise data architecture is increasingly becoming table stakes.

Objective of designing Enterprise Data Architecture

The objective of designing an enterprise data architecture is to deliver key service value to the enterprise. The value delivered by investing in the architecture can be evaluated along the following dimensions:

  • Cost savings
  • Efficiency
  • Service quality
  • Strategic control
  • Availability

To achieve this, two needs emerge for today's applications: FAST and BIG. Enterprise applications should be able to process big volumes of data and serve them as fast as possible to achieve the highest output.

Here is the reference architecture for a modern application, considering the above facts:

Reference Architecture for Modern Enterprise Application

The first thing to notice is the tight coupling of fast and big, although they are separate systems; they have to be, at least at scale. The database system designed to work with millions of event decisions per second is wholly different from the system designed to hold petabytes of data and generate extensive historical reports.

Big Data, the Enterprise Data Architecture, and the Data Lake

The big data portion of the architecture is centered around a data lake, the storage location in which the enterprise dumps all of its data. This component is a critical attribute for a data pipeline that must  capture all information. The data lake is not necessarily unique because of its design or functionality; rather, its importance comes from the fact that it can present an enormously cost-effective system to store everything. Essentially, it is a distributed file system on cheap commodity hardware.

Today, the Hadoop Distributed File System (HDFS) looks like a suitable candidate for this data lake, but it is by no means the only answer. There might be multiple winning technologies that provide solutions to this need.

The big data platform’s core requirements are to store historical data that will be sent or shared with other data management products, and also to support frameworks for executing jobs directly against the data in the data lake.

Necessary components for Enterprise Architecture

  • Business intelligence (BI) – reporting

Data warehouses do an excellent job of reporting and will continue to offer this capability. Some data will be exported to those systems and temporarily stored there, while other data will be accessed directly from the data lake in a hybrid fashion. These data warehouse systems were specifically designed to run complex report analytics, and do this well.

  • SQL on Hadoop

Much innovation is happening in this space, and the goal of many of these products is to displace the data warehouse. These systems still have a long way to go to approach the speed and efficiency of data warehouses, especially those with columnar designs. SQL-on-Hadoop systems exist for a couple of important reasons (see the illustrative query after this list):

  1. SQL is still the best way to query data
  2. Processing can occur without moving big chunks of data around
  • Exploratory analytics

This is the realm of the data scientist. These tools offer the ability to “find” things in data: patterns, obscure relationships, statistical rules, etc.

  • Job scheduling

  This is a loosely named group of job scheduling and management tasks that often occur in Hadoop. Many Hadoop use cases today involve pre-processing or cleaning data prior to the use of the analytics tools described above. These tools and interfaces allow that to happen.
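As promised above, here is a purely illustrative example of the kind of query a SQL-on-Hadoop engine runs directly against files in the data lake; the table and column names are hypothetical:

-- Aggregate raw sensor events in place, without exporting them to a warehouse.
SELECT device_id,
       COUNT(*)         AS readings,
       AVG(temperature) AS avg_temp
FROM   sensor_events
WHERE  event_date >= DATE '2021-01-01'
GROUP  BY device_id
ORDER  BY readings DESC
LIMIT  10;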

The big data side of the enterprise data architecture has gained huge attention in Modern Enterprise Applications. Few would debate the fact that Hadoop has sparked the imagination of what is possible when data is fully utilized. However, the reality of how this data will be leveraged is still largely unknown.

Integrating Traditional Enterprise Applications into the Enterprise Data Architecture

The new enterprise data architecture can coexist with traditional applications until the time at which those applications require the capabilities of the enterprise data architecture; they will then be merged into the data pipeline. The predominant way in which this integration occurs today, and will continue for the foreseeable future, is through an extract, transform, and load (ETL) process that extracts, transforms as required, and loads legacy data into the data lake where everything is stored. These applications will migrate to full-fledged fast + big data modern applications.

Conclusion

Understanding the promise and value of fast data is necessary, but it is not sufficient to guarantee success for enterprises implementing Big Data initiatives. Still, the technologies and skill sets needed to take advantage of fast data are critical for businesses and enterprises across the globe.

Fast data is a product of Big Data. While mining data to derive business insights opens opportunities for growth, much still needs to be accomplished: merely collecting vast amounts of data for exploration and analysis will not prepare a business to act in real time as data flows into the organization from millions of endpoints: sensors, mobile devices, connected systems, and the Internet of Things.

We have to understand the architectural requirements of fast data and Big Data separately and address their challenges with the right tools and technologies. But to gain the business advantage, we have to integrate them architecturally and serve applications with fast data processed from Big Data.

For more information, refer to the wiki and explore more at Teknonauts.

#9 Aadhaar Card Architecture – The World's Biggest Biometric Database

'Aadhaar' is undoubtedly one of the most important projects rolled out by the Government of India. Ambitious, one of a kind, and a game-changer, Aadhaar is on the path to delivering a national identity to every Indian resident, thereby triggering the much-desired governance system based on social inclusion, transparency and accountability. Considering India's population of 130 crores (1.3 billion), it is the world's biggest biometric database.

Aadhaar Card System Reference Architecture

The figure below depicts the Aadhaar system at a high level.

Figure: Aadhaar card system architecture

Use of Biometrics in Aadhaar

The Aadhaar biometric system design follows global best practices. UIDAI reviewed existing state-of-the-art biometric systems, consulted the world's top biometric experts, conducted a proof-of-concept study, and built a biometric system that is currently considered state of the art.

Multi-ABIS De-duplication System

Since de-duplication at this scale (1.3 billion residents) had not been previously attempted anywhere in the world, UIDAI decided to procure 3 ABIS (Automatic Biometric Identification System) software solutions to perform biometric de-duplication as a risk mitigation strategy.

Aadhaar is the first-ever multi-ABIS system implemented in the world, and this brings significant advantages.

Biometrics in Authentication

Fingerprint and iris are the biometric modalities that are being used by UIDAI to allow residents to authenticate themselves. Online biometric authentication is a 1:1 verification of the biometric(s) presented at the time of authentication with templates generated from the data collected during enrolment or biometric updates.

Enrolment Process Reference Architecture

The enrolment module handles the entire Aadhaar lifecycle, including initial enrolment, corrections, subsequent demographic/biometric updates, and the back-end workflow for handling enrolment exceptions. The diagram above depicts a high-level picture of the entire enrolment system.

Enrolment Client

The Aadhaar enrolment strategy is based on a multi-registrar model. For uniformity of data capture, process and security, it was essential that a standardized "Enrolment Client" (EC) software be created and given to all Registrars for use by their appointed Enrolment Agencies (EAs). The Enrolment Client software is provided by UIDAI for use in the field, both for first-time enrolment and for subsequent data updates.

Key features of the Enrolment Client:

Architecture Principles

The architecture principles used in building the Aadhaar system ensure openness, vendor neutrality, scalability, and security. Before introducing them, this section looks at software architecture trends over the last couple of decades and at some of the high-impact changes of recent years. This understanding is critical to appreciating the Aadhaar architecture and the reasoning behind its design decisions.

Architecture Evolution & Trends

Software architecture has evolved from the mainframe era to the cloud computing era, and as part of that change computing, storage, and programming technologies have also evolved. Monolithic architectures have given way to large-scale distributed computing; proprietary storage and compute have given way to commodity computing and large-scale, low-cost storage; user interfaces have changed from fixed, character-based green screens to highly interactive, gesture-based mobile interfaces; and near-zero connectivity has become pervasive connectivity. The massive increase in the amount of data managed within applications, from mere kilobytes and megabytes to petabytes and exabytes, has forced architects to rethink design choices for computing and data storage.

These changes have had a huge impact on building next-generation, large-scale applications. The subsequent sections explore these trends and changes, and set the context for understanding the Aadhaar architecture strategy.

Scale-up, Scale-out, and Open Scale-out

Over the last four decades, software system architecture has evolved from monolithic deployments on a single large server to highly distributed, heterogeneous deployments. This has resulted in a significant architecture and design shift from vendor-locked systems to highly open and distributed systems.

Scale-Up Architecture

In the 70s and 80s, large-scale systems were designed to be monolithic and deployed on a large mainframe-class machine. These systems were built using specific technologies provided by one vendor, who also provided the hardware on which the applications were deployed. Once you decided to build using the technology of one mainframe (or similar large-scale computing platform) vendor, then for years afterwards application scaling, system upgrades, maintenance, and so on depended completely on that particular vendor.

Such an architecture is referred to as scale-up architecture: the entire system needs to scale within one machine (by adding more computing power to it), and only with the technology provided by that vendor. Most of the time, significant upfront investment must be made to plan for future growth and scaling, instead of upgrading the system later when it is actually required.

Scale-Out Architecture

From the 90s, with the advent of client-server architecture, application architecture changed to allow different components of an application to be deployed on different machines and to scale independently. Most of these systems were built to run on Unix or Windows environments, and middleware technologies allowed components to communicate with each other across machines over a network. Large-scale applications deployed within enterprises, and early Internet applications, followed this architecture.

Aadhaar card Application Architecture

The Aadhaar application is primarily written in the Java™ language using open-source components and frameworks. The application is built in line with all the architecture principles described above.

Reference Client Architecture and Implementation

Enrolment Server Logical View

The Aadhaar enrolment server components are highly scalable, handling a million-plus enrolments every day, and are built to manage data stores of thousands of terabytes. The entire enrolment workflow is broken up into many logical stages.

Enrolment Biometric Subsystem

The Aadhaar system deploys 3 independent ABIS solutions, adhering to a common ABIS API, for multi-modal biometric de-duplication. The enrolment server therefore integrates with these solutions through the ABIS API and allocates de-duplication requests as per UIDAI's dynamic allocation policy (which uses ongoing accuracy and performance data to decide which solution gets the most de-duplication requests).

The architectural requirements of the multi-ABIS solution are as follows:

The following diagram depicts the multi-ABIS architecture, with the ABIS middleware orchestrating insert/identify flows between the 3 different ABIS solutions as per the UIDAI dynamic allocation policy.

Information Privacy & Security

Application security for the above architecture cuts across all places where an untrusted source or destination is used. The encrypted enrolment data file is uploaded into the DMZ to protect against Trojans and malware.

The enrolment/update data packets are encrypted by the client using public-key cryptography, with each data record carrying an HMAC that can identify any integrity violation of the data. Master keys are stored and managed within an HSM (Hardware Security Module) appliance. It must be noted that the enrolment packet is constructed in memory and encrypted prior to writing the file; none of the data, including biometrics and demographic data, is ever stored in unencrypted form. The packet is encrypted with a randomly generated AES-256 symmetric session key, and the key itself is encrypted with a 2048-bit public key selected from a bank of UIDAI public keys.

Data Model Aadhaar Card

The following diagram depicts the enrolment data model.

Other modules such as upload, ABIS middleware, data quality and manual adjudication use extended databases of their own, linked to the core data above via either the EID or the RefID. They can also be linked via the Aadhaar number itself for components that provide features at the Aadhaar level. These extended databases are typically component-level audits that can be archived after a period. The analytics (BI) module is used for long-term analytics and reporting, with all events stored in a Hadoop Hive Atomic Data Warehouse (ADW) for a long period of time. Various Hive and MapReduce jobs run on that atomic data warehouse to derive aggregate metrics.
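For flavour only, an aggregate over such an ADW might look like the HiveQL below; the table and column names are hypothetical illustrations, not the actual UIDAI schema:

-- Daily enrolment throughput per agency, derived from atomic events in Hive.
SELECT enrolment_agency,
       to_date(event_ts) AS event_day,
       COUNT(*)          AS packets_processed
FROM   enrolment_events
GROUP  BY enrolment_agency, to_date(event_ts);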

APIs wrap the core data models and provide a uniform way to access this data via EID/RefID/UID, irrespective of the state a record is in (being processed, Aadhaar allocated, or rejected). The Status Tracking API, Common Search API and Advanced Search API are these core APIs. Other services such as the e-KYC API and the e-Aadhaar service are built as wrappers on top of these core internal data-access APIs.

The diagram below depicts, at a high level, the technology stack used within the enrolment server module. In addition to those depicted, several open-source libraries are used throughout the system. The following table lists the technology stack of the Aadhaar enrolment server.

Resident Data Extraction

The enrolment server processes enrolment packets and stores the resident's demographic data in the UID Master, which contains demographic and photo data. Aadhaar data needed for authentication has to be stored in a database that offers fast reads and writes and high throughput. Since the RDBMS and XFS-based archival systems cannot serve resident data reads during authentication (100+ million reads a day) at the required performance, data is extracted from the enrolment data stores and placed in a high-performance, distributed read store such as HBase.

In addition, biometric authentication requires a biometric template gallery: the biometric feature sets extracted from raw biometric images. Hence, processing of biometric images also has to be performed as part of preparing data for authentication.

Technology Platform

Application modules are built on a common technology platform that contains frameworks for persistence, security, messaging, and so on. The platform standardizes on a technology stack based on open standards, using open source where prudent. A list of extensively used open-source technologies is given below:

Spring Framework – application container for all components and runtime

The diagram below depicts the e-Governance platform as a set of layered technology building blocks that are used to build applications.

Conclusion

The Aadhaar system is built purely as an identity platform that other applications, government and private, can take advantage of. A sound strategy and a strong technology backbone enabled the program to be launched ahead of plan in September 2010 and to reach a scale never before achieved by any biometric identity system in the world. In less than 4 years since launch, the Aadhaar system has grown in capability, and more than 600 million Aadhaar numbers have been issued so far.

The entire technology architecture behind Aadhaar is based on principles of openness, linear scalability, strong security and, most importantly, vendor neutrality. The Aadhaar software currently runs across two data centres within India managed by UIDAI, handling 1 million enrolments a day and, at peak, about 600 trillion biometric matches a day. In the coming years the Aadhaar system will cover the rest of the country, providing identity to more than 1.2 billion residents of India, and its electronic usage (authentication, e-KYC, etc.) is expected to grow exponentially.

Understanding how the Aadhaar system was architected, given its scale, security and ecosystem design, and most importantly the constraints of an e-Governance system, is a valuable learning experience.

Explore more at Teknonauts

Reference Link

Aadhaar developer section

Aadhaar card conceptualization

Aadhaar card Architecture

Disclaimer

The architectural concepts and components explained in this article are taken from public information (see the reference links above). We have provided our own explanation for knowledge purposes only.
