Data Integration , map Reduce algorithm , virtualisation relation and trends

30 Friday May 2014

Tags

DATA INTEGRATION, ETL, HADOOP, MAP REDUCE

In year 2011 This reply i did to a discussion. would later structure it into proper article.

As of 2010 data virtualization had begun to advance ETL processing. The application of data virtualization to ETL allowed solving the most common ETL tasks of data migration and application integration for multiple dispersed data sources. So-called Virtual ETL operates with the abstracted representation of the objects or entities gathered from the variety of relational, semi-structured and unstructured data sources. ETL tools can leverage object-oriented modeling and work with entities’ representations persistently stored in a centrally located hub-and-spoke architecture. Such a collection that contains representations of the entities or objects gathered from the data sources for ETL processing is called a metadata repository and it can reside in memory[1] or be made persistent. By using a persistent metadata repository, ETL tools can transition from one-time projects to persistent middleware, performing data harmonization and data profiling consistently and in near-real time.

———————————————————————————————————————————————-

– More then colmunar databases i see probalistic databases : link:http://en.wikipedia.org/wiki/Probabilistic_database

A probabilistic database is an uncertain database in which the possible worlds have associated probabilities. Probabilistic database management systems are currently an active area of research. “While there are currently no commercial probabilistic database systems, several research prototypes exist…”[1]

Probabilistic databases distinguish between the logical data model and the physical representation of the data much like relational databases do in the ANSI-SPARC Architecture. In probabilistic databases this is even more crucial since such databases have to represent very large numbers of possible worlds, often exponential in the size of one world (a classical database), succinctly.

————————————————————————————————————————————————

For Bigdata analysis the software which is getting popular today is IBM big data analytics

: http://www-01.ibm.com/software/data/infosphere/bigdata-analytics.html

I am writing about this too..already written some possible case study where and how to implement.

Understanding Big data PDF attached.

———————————————————————————————————————————————–

There are lot of other vendors which are also moving in products for cloud computing..in next release on SSIS hadoop feed will be available as source.

– Microstraegy and informatica already have it.

– this whole concept is based on mapreduce algorithm from google..There are online tutorials on mapreduce.(ppt attached)

—————————————————————————————————————————————–

Without a doubt, data analytics have a powerful new tool with the “map/reduce” development model, which has recently surged in popularity as open source solutions such as Hadoop have helped raise awareness.

Tool: You may be surprised to learn that the map/reduce pattern dates back to pioneering work in the 1980s which originally demonstrated the power of data parallel computing. Having proven its value to accelerate “time to insight,” map/reduce takes many forms and is now being offered in several competing frameworks.

If you are interested in adopting map/reduce within your organization, why not choose the easiest and best performing solution? ScaleOut StateServer’s in-memory data grid offers important advantages, such as industry-leading map/reduce performance and an extremely easy to use programming model that minimizes development time.

Here’s how ScaleOut map/reduce can give your data analysis the ideal map/reduce framework:

Industry-Leading Performance

ScaleOut StateServer’s in-memory data grids provide extremely fast data access for map/reduce. This avoids the overhead of staging data from disk and keeps the network from becoming a bottleneck.
ScaleOut StateServer eliminates unnecessary data motion by load-balancing the distributed data grid and accessing data in place. This gives your map/reduce consistently fast data access.
Automatic parallel speed-up takes full advantage of all servers, processors, and cores.
Integrated, easy-to-use APIs enable on-demand analytics; there’s no need to wait for batch jobs.

http://www.scaleoutsoftware.com/solutions/mapreduce/

A Day in Life of Business Intelligence (BI) Architect- part 1

30 Friday May 2014

Posted by sandyclassic in Uncategorized

≈ Leave a comment

Tags

BI, BI ARCHITECT, BUSINESS INTELLIGENCE, BUSINESS OBJECTS, COGNOS, COGNOS ARCHITECT, ETL, ETL ARCHITECT, SAP BO ARCHITECT

BI Architect most important responsibility is maintaining semantic Layer between Datawarehouse and BI Reports.
There are basically Two Roles of Architect: BI Architect or ETL Architect in data warehousing and BI. (ETL Architect in Future posts).
Semantic Layer Creation
Once data-warehouse is built and BI reports Needs to created. Then requirement gathering phase HLD High level design and LLD Low Level design are made.
Using HLD and LLD BI semantic layer is built in SAP BO its called Universe, in IBM Cognos using framework manager create Framework old version called catalogue, In Micro strategy its called project.
Once this semantic layer is built according to report data SQL requirements.
Note: Using semantic layer saves lot of time in adjustment of changed Business Logic in future change requests.
Real issues Example: Problems in semantic Layer creation like in SAP BO: Read

http://sandyclassic.wordpress.com/2013/09/18/how-to-solve-fan-trap-and-chasm-trap/

Report Development:
Reports are created using objects created by semantic layer.Complex reporting requirement for
1. UI require decision on flavour of reporting Tool like within
There are sets of reporting tool to choose from Like in IBM Cognos choose from Query Studio, Report Studio, Event Studio, Analysis Studio, Metric Studio.
2. Tool modification using SDK features are not enough then need to modify using Java/.net of VC++ API. At html level using AJAX javascript API or integrating with 3rd party API.
3. Report level macros/API for better UI.
4. Most important is data requirement my require Coding procedure at database or consolidations of various databases. Join Excel data with RDBMS and unstructured data using report level features. Data features may be more complex than UI.
5. user/data level security,LDAP integration.
6. Complex Scheduling of reports or bursting of reports may require modification using rarely Shell script or mostly Scheduling tool.
List is endless
Read More:
details of

http://sandyclassic.wordpress.com/2014/01/26/a-day-in-life-of-bi-engineer-part-2/

Integration with Third party and Security
After This BI’s UI has to fixed to reflect customer requirement. There might be integration with other products and seamless integration of users By LDAP. And hence Objects level security, User level security of report data according to User roles.
Like a Manager see report with data The same data may not be visible to clerk when he sees same report. Due filtering of data by user roles using User Level security.

BI over Cloud
setting BI over cloud Read blog.
Cloud Computing relation to Business Intelligence and Datawarehousing

Read :
1. http://sandyclassic.wordpress.com/2013/07/02/data-warehousing-business-intelligence-and-cloud-computing/

2. http://sandyclassic.wordpress.com/2013/06/18/bigdatacloud-business-intelligence-and-analytics/

Cloud Computing and Unstructured Data Analysis Using
Apache Hadoop Hive
Read: http://sandyclassic.wordpress.com/2013/10/02/architecture-difference-between-sap-business-objects-and-ibm-cognos/
Also it compares Architecture of 2 Popular BI Tools.

Cloud Data warehouse Architecture:
http://sandyclassic.wordpress.com/2011/10/19/hadoop-its-relation-to-new-architecture-enterprise-datawarehouse/

Future of BI
No one can predict future but these are directions where it moving in BI.
http://sandyclassic.wordpress.com/2012/10/23/future-cloud-will-convergence-bisoaapp-dev-and-security/

A day in life of BI Engineer part 2

30 Friday May 2014

Posted by sandyclassic in Uncategorized

≈ Leave a comment

Tags

ACTUATE, BI, BI MODELLING, BIRT, BUSINESS INTELLIGENCE, CLOUD, CLOUD COMPUTING, DATA DESIGN, DATA MODELLING, ETL, HADOOP, HIVE, IBM COGNOS, MICROSTRATEGY, ORACLE BI, SAP, SAP BO

Read Part1:

http://sandyclassic.wordpress.com/2014/01/26/a-day-in-life-of-business-intelligence-engineer/

Part 2:
First few days should understand business otherwise cannot create effective reports.
9:00 -10am Meet customer to understands key facts which affect business.
10-12 prepare HLD High level Document containing 10,000 feet view of requirement.
version 1. it may refined later subsequent days.
12-1:30 attend scrum meeting to update status to rest of team. co-ordinate with Team Lead, Architect and project Manager for new activity assignment for new reports.
Usually person handling one domain area of business would be given that domain specific reports as during last report development resource already acquired domain knowledge.
And does not need to learn new domain..otherwise if becoming monotonous and want to move to new area. (like sales domain report for Chip manufactuers may contain demand planning etc…)
1:30-2:00 document the new reports to be worked on today.
2:00-2:30 Lunch
2:30-3:30 Look at LLD and HLD of new reports. find sources if they exist otherwise Semantic layer needs to modified.
3:30-4:00 co-ordinate with other resource reports requirement with Architect to modify semantic layer, and other reporting requirements.
4:00-5:00 Develop\code reports, conditional formatting,set scheduling option, verify data set.
5:00-5:30 Look at old defects rectify issues.(if there is separate team for defect handling then devote time on report development).
5:30-6:00 attend defect management call and present defect resolved pending issue with Testing team.
6:00-6:30 document the work done. And status of work assigned.
6:30-7:30 Look at report pending issues. Code or research work around.
7:30-8:00 report optimisation/research.
8:00=8:30 Dinner return back home.
Ofcourse has to look at bigger picture hence need to see what reports other worked on.
Then Also needed to understand ETL design , design rules/transformations used for the project. try to develop frameworks and generic report/code which can be reused.
Look at integration of these reports to ERP (SAP,peopesoft,oracle apps etc ), CMS (joomla, sharepoint), scheduling options, Cloud enablement, Ajax-fying reports web interfaces using third party library or report SDK, integration to web portals, portal creation for reports.
So these task do take time as and when they arrive.

SAP Business Objects Architecture, Reporting Infrastructure and BODI ETL

30 Friday May 2014

Posted by sandyclassic in Uncategorized

≈ Leave a comment

Tags

BODI, ETL, Reporting Infrastructure, SAP Business Objects Architecture

SAP BO part1 overview slides…
1 Overview
2-10 16 reporting tools in detail
11- Universe design
12- SAP HANA
13- BO Admin
14- BO SDK
15 – performance tunning and optimisation

SAP Business Objects Trianing from Sandeep Sharma IIMK Smart City,IoT,Bigdata,Cloud,BI,DW

http://sandyclassic.wordpress.com/2011/11/04/architecture-and-sap-hana-vs-oracle-exadata-competitive-analysis/

http://sandyclassic.wordpress.com/2013/10/02/architecture-difference-between-sap-business-objects-and-ibm-cognos/

A Day in Life of Business Intelligence (BI) Architect- part 1

30 Friday May 2014

Posted by sandyclassic in Uncategorized

≈ Leave a comment

Tags

BI, BI ARCHITECT, BUSINESS INTELLIGENCE, BUSINESS OBJECTS, COGNOS, COGNOS ARCHITECT, ETL, ETL ARCHITECT, SAP BO ARCHITECT

http://sandyclassic.wordpress.com/2013/09/18/how-to-solve-fan-trap-and-chasm-trap/

http://sandyclassic.wordpress.com/2014/01/26/a-day-in-life-of-bi-engineer-part-2/

BI over Cloud
setting BI over cloud Read blog.
Cloud Computing relation to Business Intelligence and Datawarehousing

Read :
1. http://sandyclassic.wordpress.com/2013/07/02/data-warehousing-business-intelligence-and-cloud-computing/

2. http://sandyclassic.wordpress.com/2013/06/18/bigdatacloud-business-intelligence-and-analytics/

Cloud Data warehouse Architecture:
http://sandyclassic.wordpress.com/2011/10/19/hadoop-its-relation-to-new-architecture-enterprise-datawarehouse/

Future of BI
No one can predict future but these are directions where it moving in BI.
http://sandyclassic.wordpress.com/2012/10/23/future-cloud-will-convergence-bisoaapp-dev-and-security/

Aside

BI and Enterprise service Bus ESB, BPM orchestration and Process Re-engineering

30 Friday May 2014

Posted by sandyclassic in Uncategorized

≈ Leave a comment

Tags

BI, BPEL, BPM, BPR, ESB, ESM, ETL, PEGA, WEBLOGIC

Lets understand how Cognos product works internally

Most of BI product Architecture are almost similar internally.
BI Bus: Enterprise service Bus which surrounds all the services/servers which tool provide.
Typical ESB from Oracle BEA Aqualogic Stack engulfing many Web services looks like:
Now you can compare this popular ESB with BI internal Architecture.
you can read more about ESB at : http://docs.oracle.com/cd/E13171_01/alsb/docs20/concepts/overview.html
Under 4 tier system: A client connects the Web server (which is protected by firewall) using dispatcher. Dispatcher connects to Enterprise Service Bus (ESB) which surrounds all the application server services (Web services). ESB in case of cognos is Cognos BI Bus surrounds Web services Servers (like Report Server, Job server, Content Management server etc ). Mediation Layer Cognos BI Bus interacts with Non Java , C++ code which could not to converted or purposefully kept in C++ for may be more flexibility and speed
http://pic.dhe.ibm.com/infocenter/cbi/v10r1m1/index.jsp?topic=%2Fcom.ibm.swg.ba.cognos.crn_arch.10.1.1.doc%2Fc_arch_themulti-tierarchitecture.html

In case of SAP Business Objects (BO) ESB was not properly developed so an intermediate layer was created which works for interfacing between multiple servers like Job server, report server, page server etc. BO XI R2 came in pervius version was more in C++ to C++ to java bridge was created in ESB layer. Since Java was preferred language for coarse grain interoperability provided by web services. Each server was developed using web services.
interaction between web server was routed through BI Bus.
In latest version here u find a pipe connecting all components call Business Objects XI 3.1 Enterprise Infrastructure. Earlier version had different names. here you can see its connecting all server like Crystal report server, IFRS input file repository server( storing template of reports), OFRS Output file repository services, Program Job server(storing all programs which can be published on Portal (Infoview) ). This ESB does mediation between different server and achieves interoperability yet control of different components of products. This is in competitor product Cognos is called Cognos BI Bus.
http://bobi.blog.com/2013/06/02/sap-business-object-architecture-overview-and-comparatice-analysis/
For latest BO uses in memory product SAP HANA more about its competitors follow:
http://sandyclassic.wordpress.com/2011/11/04/architecture-and-sap-hana-vs-oracle-exadata-competitive-analysis/

In Micro-strategy there are two important server Intelligent server which creates cubes

More I will cover in later issues:
Oracle BI Architecture:
http://www.rittmanmead.com/2008/02/towards-a-future-oracle-bi-architecture/

Implementation OF BI system is not related to these product Architecture :
A typical BI system under implementation haveing componets of ETL, BI, databases, Web server, app server, production server, test/development server looks like:
More details: http://www.ibm.com/developerworks/patterns/bi/product-s390-web.html
Big Data Architecture:
From components perspective of ETL to BI implementation Aspect is little different

Hadoop Architecture layers:
http://sandyclassic.wordpress.com/2011/10/19/hadoop-its-relation-to-new-architecture-enterprise-datawarehouse/
http://codemphasis.wordpress.com/2012/08/13/big-data-parallelism-and-hadoopbasics/

Just like UDDI registry is repository of Web

Business Intelligence Technology Trends

~ The greatest WordPress.com site in all the land!

Tag Archives: ETL

Data Integration , map Reduce algorithm , virtualisation relation and trends

Industry-Leading Performance

A Day in Life of Business Intelligence (BI) Architect- part 1

SAP Business Objects Architecture, Reporting Infrastructure and BODI ETL

A Day in Life of Business Intelligence (BI) Architect- part 1

BI and Enterprise service Bus ESB, BPM orchestration and Process Re-engineering