Tuesday, January 7, 2014

IBM's Big Insights training notes


IBM’s Big Insights: A primer


In November 2013 I had the opportunity to attend a training session on IBM’s Big Insights. Below are my notes on the product.


What is Big Insights in a nutshell?


Big Insights is IBM’s Big Data platform. It is an all-in-one Big Data infrastructure comprising IBM’s flavor of Hadoop and its ecosystem, proprietary tools to query the data, such as JAQL and AQL, and out-of-the-box connectors and interfaces called accelerators. We’ll review these components in detail in the sections below.

Big Insights Hadoop infrastructure

Big Insights is built on a Hadoop infrastructure that is independent from vendors like Cloudera. It uses a well-tested released version of Hadoop, usually a bit behind trunk, although it also differs from the Apache version in some ways. Big Insights comes integrated with:
-       GPFS (General Parallel File System, IBM’s alternative to HDFS) for its file system
-       Adaptive Map Reduce, an enhanced version of MR that attempts to optimize task execution through automatic tuning of speculative execution and task JVM reuse. Map Reduce tasks become aware of the global state of the job they are working in, which helps balance the workload across Map tasks.
-       Zookeeper, HBase, Hive, Pig
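
To make the load-balancing idea behind Adaptive Map Reduce more concrete, here is a toy simulation in Python. It is not IBM's implementation, just a sketch under my own assumptions: each input split has a known cost, a "static" scheduler hands splits out round-robin up front, and an "adaptive" scheduler gives the next split to whichever mapper becomes free first. The function names and the sample costs are made up for illustration.

```python
import heapq

def static_makespan(split_costs, num_mappers):
    """Assign splits round-robin up front; return the time the slowest mapper finishes."""
    loads = [0] * num_mappers
    for i, cost in enumerate(split_costs):
        loads[i % num_mappers] += cost
    return max(loads)

def dynamic_makespan(split_costs, num_mappers):
    """Give each split to the mapper that becomes free first (hypothetical adaptive policy)."""
    # Heap of (time at which the mapper is free, mapper id).
    heap = [(0, m) for m in range(num_mappers)]
    heapq.heapify(heap)
    for cost in split_costs:
        busy_until, mapper = heapq.heappop(heap)
        heapq.heappush(heap, (busy_until + cost, mapper))
    return max(busy_until for busy_until, _ in heap)

# Illustrative split costs: expensive and cheap splits alternate.
SPLIT_COSTS = [9, 1, 8, 1, 7, 1, 6, 1]

if __name__ == "__main__":
    print("static :", static_makespan(SPLIT_COSTS, 2))
    print("dynamic:", dynamic_makespan(SPLIT_COSTS, 2))
```

With two mappers, the static assignment finishes at time 30 because one mapper gets all the expensive splits, while the dynamic assignment finishes at time 17. The real product presumably does something more sophisticated, but the benefit of tasks knowing the overall state of the job is the same in spirit.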

Of note is the fact that Big Insights is not bundled with Cloudera’s CDH anymore; IBM has its own version of Hadoop.

New query language: JAQL

Big Insights offers a language called JAQL, a functional language that can interface with all of the Big Insights tools. It provides APIs (or modules) for reaching out to external IBM and third-party tools, such as relational databases, indexing services, text analytics, and machine learning. JAQL stands for JSON Query Language, because it is based on JSON. Similar to Pig, JAQL automatically takes care of the complexities of the MapReduce world so that the work is performed efficiently. It also handles deeply nested, semi-structured data.
JAQL can be executed either from its own shell or from within Eclipse.
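
Since I don't have a JAQL shell at hand, here is the flavor of a JAQL-style query expressed in plain Python: a functional pipeline that filters, flattens, and aggregates nested JSON records. This is only an analogy and not JAQL syntax; the records, field names, and the `count_tags_by_country` function are invented for the example.

```python
import json
from collections import defaultdict

# Hypothetical nested, semi-structured input records.
RAW = """
[
  {"user": {"name": "ann", "geo": {"country": "US"}}, "tags": ["hadoop", "hive"], "score": 7},
  {"user": {"name": "bob", "geo": {"country": "FR"}}, "tags": ["hadoop"], "score": 3},
  {"user": {"name": "cy",  "geo": {"country": "US"}}, "tags": ["hadoop"], "score": 9}
]
"""

def count_tags_by_country(records, min_score=5):
    """Filter -> expand nested arrays -> group and count, as one lazy pipeline."""
    # Filter step: keep only records with a high enough score.
    kept = (r for r in records if r["score"] >= min_score)
    # Transform/expand step: reach into nested fields, one row per tag.
    rows = ((r["user"]["geo"]["country"], tag) for r in kept for tag in r["tags"])
    # Group step: count rows per (country, tag) pair.
    counts = defaultdict(int)
    for key in rows:
        counts[key] += 1
    return dict(counts)

if __name__ == "__main__":
    print(count_tags_by_country(json.loads(RAW)))
```

In JAQL the equivalent stages would be chained together in a single expression, and the runtime would decide how to turn them into MapReduce jobs; here everything simply runs in one local process.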

Big Insights Applications

Big Insights provides an environment for developing and executing applications. A business user can launch existing applications from the Web console, supply any input parameters, and view the results. These applications can be developed with Big Insights’ development tooling, which lets programmers publish completed applications through the Web console.
The BigInsights Eclipse tools include wizards, code generators, context-sensitive help, and a test environment to simplify your development efforts.
Workflow applications are run by Oozie as workflow jobs.

Big Sheets

Big Insights also comes with a spreadsheet-like interface for interacting with Big Data much as business users would use Excel. It presents familiar functions (e.g. Pivot, Union, Intersection) that allow users to gather, filter, combine, explore, and visualize data from various sources. Big Sheets is designed to let non-technical professionals rapidly gain insight from huge amounts of data (Big Sheets first executes work on a sample of the data in a simulated environment) and act on those insights in a timely manner. There is no need to understand database schemas or a query language. Big Sheets also has a built-in visualization module to chart and publish the results.
Another nice thing is that Big Sheets is natively integrated with the other Big Insights components, so it’s easy to move between the different tools that Big Insights provides; for example, you can create an ETL job in JAQL and export the results to Big Sheets.
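
For readers who think in code rather than spreadsheets, the Union, Intersection, and Pivot functions mentioned above behave roughly like the following Python sketch. It only illustrates the operations themselves and says nothing about how Big Sheets implements them; the two small "sheets" and all function names are made up.

```python
from collections import defaultdict

def _key(row):
    """Hashable identity for a row so that duplicates can be detected."""
    return tuple(sorted(row.items()))

def union(sheet_a, sheet_b):
    """All distinct rows from both sheets, in first-seen order."""
    seen, out = set(), []
    for row in sheet_a + sheet_b:
        if _key(row) not in seen:
            seen.add(_key(row))
            out.append(row)
    return out

def intersection(sheet_a, sheet_b):
    """Rows of sheet_a that also appear in sheet_b."""
    keys_b = {_key(row) for row in sheet_b}
    return [row for row in sheet_a if _key(row) in keys_b]

def pivot(sheet, index, column, value):
    """Sum `value` for each (index, column) pair, like a spreadsheet pivot table."""
    table = defaultdict(lambda: defaultdict(int))
    for row in sheet:
        table[row[index]][row[column]] += row[value]
    return {k: dict(v) for k, v in table.items()}

# Two hypothetical sheets of topic mentions.
BLOGS = [
    {"topic": "hadoop", "month": "Nov", "mentions": 4},
    {"topic": "hive", "month": "Nov", "mentions": 2},
]
NEWS = [
    {"topic": "hive", "month": "Nov", "mentions": 2},
    {"topic": "hadoop", "month": "Dec", "mentions": 5},
]

if __name__ == "__main__":
    combined = union(BLOGS, NEWS)
    print(intersection(BLOGS, NEWS))
    print(pivot(combined, "topic", "month", "mentions"))
```

Running the same functions on a small slice of each sheet first (for example `BLOGS[:1]`) mimics the "work on sample data first" behavior: you check the shape of the result cheaply before running on everything.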


Big Data Accelerators

Big Insights bundles pre-built components that accelerate development for specific use cases. The accelerators generally provide business logic, data processing, and visualization. An example is the Social Data Analytics accelerator, which provides a set of predefined elements, such as workbooks and dashboards, for analyzing social data.

Other Big Data tools

The IBM Big Data platform includes Big Insights, but also other tools such as InfoSphere Streams for low-latency data and an MPP (Massively Parallel Processing) database. The wider IBM ecosystem also seems to support Big Data: R is supported in Big Insights, Cognos supports Hive, and Netezza integrates with Streams. These systems offer complementary analytical approaches.

IBM offers a free downloadable virtual machine to play with Big Insights.

Overall it was a good experience, although one can easily get lost in the sea of products IBM offers. On the other hand, tools like Big Sheets and the Accelerators seem very valuable.
