BI & BIG DATA - Frequently Asked Questions (FAQ)

What is BI (Business Intelligence)?

BI is a technological method created to help companies make the best decisions for growth. It applies intelligence to the collection and interpretation of data (with the help of high-performance software), so that decisions rest on evidence rather than intuition. In other words, business intelligence is the set of practices that replaces "guessing" with well-founded decision making.

What is Big Data?

It consists of a large volume of data, much of it unstructured, such as data from social networks, web logs, and text data. In BI, it enters as another data source, which needs to go through the transformation process and be stored in the Data Warehouse to be analyzed.

What is a Data Source?

These are the spreadsheets, ERPs, CRMs, etc., from which data is taken to be inserted into the Data Warehouse. Data sources generally hold structured or semi-structured data, are designed to avoid redundancy, and are modeled for inserting and editing data, not for querying.

What is Data Mining?

While BI meets the already known needs of the business, Data Mining searches for information that is not being monitored yet, going through the data looking for patterns and anomalies.
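
As a minimal illustration of the "anomalies" part, the sketch below flags values that sit far from the mean of a series. The helper name `find_anomalies`, the threshold of 2 standard deviations, and the sales figures are all assumptions made up for this example; real data mining tools rely on much richer techniques.

```python
from statistics import mean, pstdev

def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold (hypothetical helper)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Made-up daily sales figures containing one obvious outlier
daily_sales = [100, 102, 98, 101, 99, 103, 97, 500]
anomalies = find_anomalies(daily_sales)
print(anomalies)
```

A fixed z-score threshold is only one simple choice; it works here because the sample is small and contains a single extreme value.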

What is Data Integration?

This is the stage where ETL takes place. It is the stage where data is taken from the data sources, transformed so that it makes sense together, and inserted into the Data Warehouse.

What is ETL and why is it important?

When we talk about BI (Business Intelligence), it is almost mandatory to talk about the ETL process. The acronym stands for Extract, Transform and Load. The process extracts data from external sources, transforms it to meet business needs, and loads it into the Data Warehouse or Data Mart. It is also used for data import and export demands.

EXTRACTION: This is the phase in which the data is extracted from the OLTPs and taken to the staging area, where it is converted into a single format.

TRANSFORMATION: This is the stage where we make adjustments to improve the quality of the data and consolidate data from two or more sources.

LOAD: consists of physically structuring and loading the data into the presentation layer following the dimensional model.

The ETL process today is considered one of the most important processes within a BI project. It is one of the most critical phases. It is where the intelligence is. It is where the rules regarding the business are defined and implemented.
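
The three phases above can be sketched in a few lines. This is only an illustration under stated assumptions: the two source lists (`crm_rows`, `erp_rows`), the `sales` table, and the cleaning rules are hypothetical, and an in-memory SQLite database stands in for the Data Warehouse.

```python
import sqlite3

# Hypothetical rows from two sources with inconsistent formats
crm_rows = [{"customer": " Alice ", "amount": "10.50"},
            {"customer": "BOB", "amount": "20"}]
erp_rows = [{"customer": "alice", "amount": "5.25"}]

def extract():
    """Extract: gather raw rows from every source into one staging list."""
    return crm_rows + erp_rows

def transform(rows):
    """Transform: normalize customer names and convert amounts to numbers."""
    return [(r["customer"].strip().title(), float(r["amount"])) for r in rows]

def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database stands in for the DW
load(conn, transform(extract()))

# After the transformation, "alice" and " Alice " consolidate into one customer
totals = dict(conn.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer"))
print(totals)
```

The business rule here (treating names that differ only by case or spacing as the same customer) lives in `transform`, which is why this phase is usually where most of the project's logic ends up.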

What is the Staging Area?

A temporary area that is usually in a relational database and is decoupled from the source. It has loose, unrelated tables where the data is transformed to be sent to the Data Warehouse.

What is the difference between Data Warehouse (DW) and Data Mart (DM)?

The difference between a DW and a DM basically consists of data volume, scope and focus. While the DW focuses on the organization as a whole, the DMs focus on a certain department or specific set of users. Building the warehouse can happen in two ways, and each approach has its pros and cons. The circumstances and particularities of each project will determine which one to use.

In the Top-Down approach, you first build the DW (corporate) and then create the DMs (departmental). In the Bottom-Up approach, you first create the DMs and then build the organization's DW.

Data Warehouse applications: The DW is a tool for executives, which aims to assist decision making at the strategic level through the manipulation of historical data. It is applicable to a wide range of companies from the most diverse segments.

Data Mart applications: The DM is a smaller tool that can likewise serve companies from the most diverse segments, but it targets a specific department of the company, such as sales or purchasing. Because it has a lower development cost, it can be a viable option for smaller companies and can be implemented module by module until it constitutes a DW.

What is Data Architecture?

Data architecture values the data asset base of organizations and requires a process of rationalization of data and associated flows. This initiative results in the development of the data organization and modifies the traditional view of Business Intelligence architecture.

What is Dimensional Modeling?

It is a form of data modeling that seeks to simplify the database and make queries faster for decision support systems.

What is the Star Schema or Star Model?

The star model consists of a fact table at the center, surrounded by dimension tables. The layout resembles a star, hence the name Star Schema.
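
A small sketch of this layout, using an in-memory SQLite database, might look like the following. The table names (`fact_sales`, `dim_product`, `dim_date`), the columns, and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- The fact table holds the metric (amount) plus one key per dimension
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    amount      REAL
);
INSERT INTO dim_product VALUES
    (1, 'Laptop', 'Hardware'), (2, 'Mouse', 'Hardware'), (3, 'Antivirus', 'Software');
INSERT INTO dim_date VALUES (1, 2023, 1), (2, 2023, 2);
INSERT INTO fact_sales VALUES
    (1, 1, 1000.0), (2, 1, 25.0), (3, 2, 50.0), (1, 2, 900.0);
""")

# A typical star query: join the fact to a dimension and aggregate the metric
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(sales_by_category)
```

Every dimension is exactly one join away from the fact table, which is what keeps queries against this kind of model short.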

What is the Snowflake Schema or Snowflake Model?

The Snowflake model also has a fact table surrounded by dimensions, but follows the principle of normalizing dimensions by removing low cardinality attributes and creating separate tables.
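
To make the normalization concrete, the sketch below moves a low-cardinality attribute (the product category) out of the product dimension into its own table. All table and column names are hypothetical, and an in-memory SQLite database is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The category attribute is moved out of dim_product into its own table
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    name         TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL
);
INSERT INTO dim_category VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO dim_product VALUES (1, 'Laptop', 1), (2, 'Mouse', 1), (3, 'Antivirus', 2);
INSERT INTO fact_sales VALUES (1, 1000.0), (2, 25.0), (3, 50.0);
""")

# Reaching the category now takes one more join than in a star layout
snowflake_totals = conn.execute("""
    SELECT c.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product  p ON p.product_key  = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category
    ORDER BY c.category
""").fetchall()
print(snowflake_totals)
```

The trade-off shown here is the usual one: each category name is stored once, at the cost of an extra join in every query that needs it.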

What is the Fact Constellation Schema or Fact Constellation Model?

Model with multiple fact tables that share dimensions, also known as a Galaxy Schema.

What is a Fact Table?

It is the main table in the Data Warehouse. It sits at the center of the Star Schema and is surrounded by dimensions. The fact table stores what has happened: it is the fact itself.

The fact table stores 2 things:

  • The metrics
  • The keys to the dimensions

What is a Dimension Table?

It describes the fact that occurred and contains the characteristics of the event. It qualifies, classifies or describes the metrics that are in the fact table.

The dimension stores 3 things:

  • A Surrogate Key
  • A Natural Key
  • The attributes
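
The relationship between the three items can be sketched as follows. The `CustomerDimension` class, the key format `CRM-0042`, and the attributes are hypothetical. In this sketch an attribute change simply overwrites the old value, which is only one of several possible policies.

```python
from itertools import count

class CustomerDimension:
    """Toy dimension that maps natural keys to surrogate keys (hypothetical)."""

    def __init__(self):
        self._next_key = count(1)  # surrogate keys: 1, 2, 3, ...
        self.rows = {}             # natural key -> dimension row

    def upsert(self, natural_key, **attributes):
        """Return the surrogate key, creating the row the first time it is seen."""
        row = self.rows.get(natural_key)
        if row is None:
            row = {"surrogate_key": next(self._next_key),
                   "natural_key": natural_key}
            self.rows[natural_key] = row
        row.update(attributes)  # overwrite or add descriptive attributes
        return row["surrogate_key"]

dim = CustomerDimension()
k1 = dim.upsert("CRM-0042", name="Alice", city="Lisbon")
k2 = dim.upsert("CRM-0099", name="Bob", city="Porto")
k3 = dim.upsert("CRM-0042", city="Madrid")  # same customer, updated attribute
print(k1, k2, k3)
```

The natural key comes from the source system, while the surrogate key is generated inside the warehouse and is the one the fact table would reference.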

What is a Cube?

Cube is a concept. It serves to manipulate and analyze a large volume of data from multiple perspectives and hypotheses. Cubes allow you to filter, slice and pivot data in real time, like in a pivot table.
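
The idea of looking at the same measure from several perspectives can be sketched with plain Python. The three dimensions (year, region, product), the sample rows, and the `aggregate` helper are assumptions made for the example, not the API of any OLAP product.

```python
from collections import defaultdict

# Each cell of the hypothetical cube is addressed by (year, region, product)
facts = [
    (2022, "North", "Laptop", 100),
    (2022, "South", "Laptop", 80),
    (2023, "North", "Laptop", 120),
    (2023, "North", "Mouse", 30),
    (2023, "South", "Mouse", 20),
]
DIMENSIONS = ("year", "region", "product")

def aggregate(rows, by, **filters):
    """Sum the measure grouped by the `by` dimensions, after filtering (slicing)."""
    totals = defaultdict(int)
    for row in rows:
        record = dict(zip(DIMENSIONS, row[:3]))
        if all(record[dim] == value for dim, value in filters.items()):
            totals[tuple(record[dim] for dim in by)] += row[3]
    return dict(totals)

by_region = aggregate(facts, by=("region",))                    # one perspective
by_product_2023 = aggregate(facts, by=("product",), year=2023)  # slice on year
print(by_region)
print(by_product_2023)
```

Changing the `by` tuple pivots the view, and adding a keyword filter slices it, without touching the underlying rows.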

What is Granularity?

It is the level of detail of the data. High granularity means highly detailed data. The grain is the lowest, most detailed level at which the data is stored.

What are Drill-Up, Drill-Down and Drill-Through?

Drill-Down: is when you go down the data hierarchy level, increasing the granularity and level of detail.

Drill-Up: is when you go up the data hierarchy level, decreasing the granularity and level of detail.

Drill-Through: Instead of moving vertically, like Drill-Down and Drill-Up, Drill-Through moves horizontally, from one report to another, while analyzing the same sample of data.
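
Drill-Down and Drill-Up can be pictured as summarizing the same rows at different levels of a hierarchy. The sketch below assumes a hypothetical year > quarter > month hierarchy and made-up amounts. The `summarize` helper is illustrative only.

```python
from collections import defaultdict

# Hypothetical sales at the finest grain: (year, quarter, month, amount)
sales = [
    (2023, "Q1", 1, 10), (2023, "Q1", 2, 20), (2023, "Q1", 3, 30),
    (2023, "Q2", 4, 40), (2023, "Q2", 5, 50),
]
HIERARCHY = ("year", "quarter", "month")

def summarize(rows, level):
    """Total the amounts at the given level of the hierarchy."""
    depth = HIERARCHY.index(level) + 1
    totals = defaultdict(int)
    for row in rows:
        totals[row[:depth]] += row[-1]  # key = hierarchy values down to `level`
    return dict(totals)

by_year = summarize(sales, "year")        # most aggregated view
by_quarter = summarize(sales, "quarter")  # Drill-Down from year
by_month = summarize(sales, "month")      # Drill-Down again, to the grain
print(by_year, by_quarter, by_month, sep="\n")
```

Reading the three results from bottom to top corresponds to a Drill-Up: the totals stay consistent, only the level of detail changes.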

What is Data Visualization?

It is the stage where the information is presented, with dashboards, graphs, and reports.

What is a Dashboard?

One of the data visualization tools. It is a panel that visually presents the most important and necessary information for decision making.

What is a Metric?

Anything that the company is going to measure is a metric. Metrics are always numbers, because they have to be countable, and these numbers come from the company's transactions.

What is a KPI (Key Performance Indicator)?

It is an index that measures, as a percentage, the variations that occur in the company.
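
One common way to express such a variation is the percentage change between two periods. The function name `percent_change` and the revenue figures below are assumptions for illustration. Actual KPIs are defined by each business.

```python
def percent_change(previous, current):
    """Percentage variation from `previous` to `current`."""
    if previous == 0:
        raise ValueError("percentage change is undefined when the previous value is 0")
    return (current - previous) / previous * 100

# Made-up revenue for two consecutive periods
revenue_growth = percent_change(previous=200_000, current=250_000)
print(f"Revenue growth: {revenue_growth:.1f}%")
```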

What is Machine Learning?

The combined use of large amounts of information and relatively simple learning algorithms makes it possible to solve problems that, until recently, were considered unsolvable. A major discipline of artificial intelligence, machine learning ranges from exploratory data analysis to the most sophisticated inference, classification and regression techniques. Machine learning allows companies to work with efficient predictive and prescriptive analysis models to anticipate and optimize their decision processes, costs and revenues.

What is Predictive Analysis for?

The concept of predictive analytics is closely linked to the notions of "data mining," which are already familiar in the sphere of business intelligence. Progress in algorithms today allows inferences to be extended beyond the analysis of retrospective trends. The goal now is to help companies obtain a potential, anticipatory result in order to then produce forecasting and decision-making methods automatically, based on results from data analysis.
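
As a deliberately simple illustration of moving from retrospective data to a forecast, the sketch below fits a straight line to past values by ordinary least squares and extrapolates one period ahead. The helper `fit_line` and the monthly revenue figures are hypothetical. Real predictive models are usually far more elaborate.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up revenue for months 1 to 4
months = [1, 2, 3, 4]
revenue = [110, 120, 130, 140]

slope, intercept = fit_line(months, revenue)
forecast_month_5 = slope * 5 + intercept  # extrapolate the trend one month ahead
print(forecast_month_5)
```

The fit only captures a linear trend in the sample, so the forecast is as good as the assumption that the trend continues.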

What is the role of the Data Scientist?

It is a rare, in-demand strategic profile. Data scientists enable technology companies and innovators to tackle the biggest problem of the new digital economy: the development of the data network.

What is Visual Thinking?

Without data visualization, it is not possible to interpret the results of Big Data analysis intelligibly and simply. The goal of data visualization is to direct the focus to what is most important in order to achieve quick and optimal decision making. The tools used in companies often offer a limited choice of graphical representations, which can prove ineffective and unimpressive. Visual thinking concerns the links between business intelligence, data visualization and the brain: the mechanisms at work when someone consults an analysis that contains graphical elements, a report or a dashboard.

What is User Experience for?

Consumer intelligence represents the development and merging of customer insight, interaction, personalization, and performance. The goal is to develop the 360-degree customer view by facilitating the aggregation and visualization of structured and unstructured data. In this way, the customer gets a holistic view of consumers (individuals, professionals and companies) made possible by massive data science (Big Data).