IBM’s Big Insights: A primer
In November 2013 I had the opportunity to attend a training session
about IBM’s Big Insights. Below are my notes about this
product.
What is Big Insights in a nutshell?
Big Insights is IBM’s Big Data platform. It consists of
an all-in-one Big Data infrastructure, with IBM’s flavor of Hadoop and its
ecosystem, as well as proprietary tools to query the data like JAQL and AQL,
and out-of-the-box connectors and interfaces called accelerators. We’ll review
these components in detail in the sections below.
Big Insights Hadoop infrastructure
Big Insights is built on a Hadoop infrastructure
(independent from vendors like Cloudera). It uses a released version of
Hadoop that is well tested, usually a bit older than trunk. However, it also differs
from the Apache version in some ways. Big Insights comes integrated with:
- GPFS (General Parallel File System, IBM’s alternative to HDFS) for its file system
- Adaptive MapReduce, an enhanced version of MapReduce
that attempts to optimize task execution through automatic job tuning
of speculative execution and task JVM reuse. MapReduce tasks become aware of
the global state of the job they are working in, which helps balance the
workload across map tasks.
- Zookeeper, HBase, Hive, Pig
Note that Big Insights is no longer bundled with
Cloudera’s CDH; IBM has its own version of Hadoop.
New query language: JAQL
Big Insights offers a language called JAQL, a functional
language that can interface with all of the Big Insights tools. It provides
APIs (or modules) for reaching out to external IBM and third-party
tools, such as relational databases, indexing services, text analytics, machine
learning etc. JAQL stands for JSON Query Language, because it is built
around JSON. Similar to Pig, Jaql automatically takes care of managing the
complexities of the MapReduce world to perform the work efficiently. However, it
also handles deeply nested semi-structured data.
Jaql can be executed either from its own shell, or from
within Eclipse.
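To give a feel for the kind of work described above, here is a minimal sketch in Python (not Jaql) of a filter-and-transform pass over nested, semi-structured JSON records. The sample data, the field names and the `order_totals` helper are all made up for illustration; this is only an analogy for the style of query Jaql is meant for, and it says nothing about Jaql's actual syntax or how it compiles to MapReduce.

```python
import json

# Hypothetical sample records; the field names are invented for illustration.
RAW = """
[
  {"user": "ann", "orders": [{"item": "book", "price": 12.5},
                             {"item": "pen", "price": 1.5}]},
  {"user": "bob", "orders": []},
  {"user": "cy",  "orders": [{"item": "lamp", "price": 30.0}]}
]
"""


def order_totals(records, min_total=0.0):
    """Keep users whose nested orders sum to more than min_total,
    and reshape each kept record into a flat summary."""
    results = []
    for rec in records:
        orders = rec.get("orders", [])  # tolerate records without orders
        total = sum(order["price"] for order in orders)
        if total > min_total:
            results.append({
                "user": rec["user"],
                "total": total,
                "items": [order["item"] for order in orders],
            })
    return results


records = json.loads(RAW)
totals = order_totals(records, min_total=10.0)
print(json.dumps(totals, indent=2))
```

The point of the sketch is that the records keep their nested shape until the query itself decides how to flatten them, which is the part that gets awkward in purely tabular tools.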
Big Insights Applications
Big Insights provides an environment for developing and
executing applications. A business user can launch existing applications from
the Web console, supply any input parameters and view results. These applications may be developed using Big
Insights’ development tooling which enables programmers to publish completed
applications through the Web console.
The BigInsights Eclipse tools include wizards, code
generators, context-sensitive help, and a test environment to simplify your
development efforts.
Workflow applications are run by Oozie as workflow jobs.
Big Sheets
Big Insights also comes with a spreadsheet-like interface for
interacting with Big Data the way business users would use Excel. To do so, it
presents a familiar interface (e.g. Pivot, Union, Intersection functions) that
allows users to gather, filter, combine, explore, and visualize data from
various sources. Big Sheets has
been designed for non-technical professionals to rapidly gain
insight and analysis from huge amounts of data (Big Sheets first executes
work on a simulated environment of sample data), and to act on those
insights in a timely manner. There is no need to understand database schemas
or a query language. Big Sheets also conveniently has a built-in
visualization module to chart and publish the results.
Another nice thing is that Big Sheets is integrated natively with the other
Big Insights components, so it’s easy to navigate between the different tools
that Big Insights provides; e.g. create an ETL job in Jaql and export the
results to Big Sheets.
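For readers who think in code rather than spreadsheets, the Union, Intersection and Pivot functions mentioned above can be sketched roughly as follows in Python. This is only an illustration of the general operations on small in-memory rows; the sample sheets, column names and helper functions are hypothetical, and the exact semantics of the Big Sheets functions may differ.

```python
from collections import defaultdict

# Hypothetical rows standing in for two sheets of data.
sheet_a = [
    {"region": "east", "product": "widget", "sales": 10},
    {"region": "west", "product": "widget", "sales": 7},
]
sheet_b = [
    {"region": "west", "product": "widget", "sales": 7},
    {"region": "east", "product": "gadget", "sales": 4},
]


def row_key(row):
    """Hashable identity for a row, independent of column order."""
    return tuple(sorted(row.items()))


def union(a, b):
    """Rows from either sheet, duplicates removed, first-seen order kept."""
    seen, out = set(), []
    for row in a + b:
        key = row_key(row)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def intersection(a, b):
    """Rows of sheet a that also appear in sheet b."""
    keys_b = {row_key(row) for row in b}
    return [row for row in a if row_key(row) in keys_b]


def pivot(rows, index, column, value):
    """Sum `value` into a nested table: one entry per `index`,
    one cell per distinct `column` value."""
    table = defaultdict(dict)
    for row in rows:
        cells = table[row[index]]
        cells[row[column]] = cells.get(row[column], 0) + row[value]
    return dict(table)


combined = union(sheet_a, sheet_b)
common = intersection(sheet_a, sheet_b)
by_region = pivot(combined, "region", "product", "sales")
print(by_region)
```

In Big Sheets these steps are done by pointing and clicking rather than by writing functions, which is the whole appeal for business users.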
Big Data Accelerators
Big Insights bundles some pre-built components
to accelerate development for specific use cases. The
accelerators generally provide business logic, data processing and
visualization. An example of this is the Social Data Analytics accelerator,
which provides a set of predefined elements
such as workbooks and dashboards to analyse social data.
Other Big Data tools
The IBM Big Data platform includes Big Sheets, but
also other tools like InfoSphere Streams for low-latency data, and an MPP
(Massively Parallel Processing) database. The wider IBM ecosystem also seems to
support Big Data: R is supported in Big Insights, Cognos supports Hive, Netezza
integrates with Streams. These systems offer complementary analytical
approaches.
IBM offers a free downloadable virtual machine to play with
Big Insights.
Overall a good experience, although one can easily get lost in
the sea of products IBM offers. On the other hand, tools like Big Sheets and the Accelerators
seem very valuable.