Rapidly Rebuild Enterprise BI with Power BI Dataflows
[this blog post was written before the release of Power BI datamarts]
Moving from a traditional business intelligence system to Power BI can be a large undertaking, with plenty of risk. Two challenges I’d like to address are (1) moving from a flat-table system to a model-based system, and (2) changing from centralized to decentralized reporting.
As a model-based tool, Power BI differs from many other approaches. Many systems, like Crystal Reports, OBIEE, and large Tableau deployments, use SQL queries embedded in reports to generate visuals, or they use custom SQL views or extracts to prepare data for a specific report. Large enterprises typically have data warehouses with data neatly organized into fact tables (records of transactions) and dimension tables (descriptive data about companies, people, or other features in the fact tables). In Power BI, the data model will ideally be a simple star schema: one fact table connected to many dimensions, or several fact tables, each connected to a shared set of dimensions. These models can be built from the data warehouse, but it takes time to learn how to build good models that can power many reports. It’s better to create a small flat table from several tables and a few columns than to make a massive model with complex relationships.
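As a rough illustration, here is the kind of small, report-sized flat table an analyst might build in Power Query (M) from warehouse views. This is only a sketch; the server, database, schema, view, and column names are hypothetical.

```
let
    // Hypothetical warehouse server, database, and view names
    Source   = Sql.Database("dw-server", "EnterpriseDW"),
    Sales    = Source{[Schema = "rpt", Item = "vw_FactSales"]}[Data],
    Customer = Source{[Schema = "rpt", Item = "vw_DimCustomer"]}[Data],

    // Join the fact view to one dimension and keep only the columns the report needs
    Joined   = Table.NestedJoin(Sales, {"CustomerKey"}, Customer, {"CustomerKey"}, "Customer", JoinKind.Inner),
    Expanded = Table.ExpandTableColumn(Joined, "Customer", {"CustomerName", "Region"}),
    Slim     = Table.SelectColumns(Expanded, {"OrderDate", "CustomerName", "Region", "SalesAmount"})
in
    Slim
```

The point is the shape of the result: a few tables, a few columns, sized for one visual or set of visuals, rather than a sprawling model.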
With the licensing model of Power BI, an organization can move from a centralized reporting model to a distributed one. This is a huge cultural change for organizations. While departments know what makes sense with their data, the world of facts, dimensions, and data modeling can be unfamiliar. Both technical users and business users will need to learn a new reporting tool and a new way of organizing reports.
Educational Case Study
Details here are for purposes of discussion only and reflect my personal thoughts and opinions. Readers are encouraged to form their own conclusions.
The internal reporting team for a large enterprise wanted to move from an older version of a traditional BI platform to Power BI. The project goals were to reduce license renewal costs and to decentralize reporting, which had become unsustainable. Above all, they wanted to consolidate from many tools to a single platform. The enterprise had an extensive on-premises data warehouse and operational data store, bringing together information from many different systems. They had thousands of reports used to support decision making on a weekly, monthly, and quarterly basis.
Working with Microsoft on the implementation, the project team learned that Power BI performs best with a data model. For the first phase of the project, however, they wanted to move reports to Power BI quickly; it was critical to limit the time users would have to split their attention between two systems. For each subject area, the team quickly put together large models drawing from views in the data warehouse. Because they wanted to make these models available to analysts quickly, they brought in the views as is, without simplifying and transforming the tables into star schemas. To connect these tables, they resorted to many-to-many relationships and bi-directional filtering. As a result, the project team spent a lot of time building enhancements for the large models and troubleshooting their confusing logic.
A Different Approach: dataflows [to simplify data warehouse tables]
The primary goal of the enterprise was to quickly standardize on Power BI. With this requirement, the simplest approach would be to enable departments to rebuild their flat tables in Power BI. This would give them immediate familiarity with the tool and their own data, while allowing the internal reporting team time to learn Power BI data modeling. Flat tables for traditional reports are typically small, bringing in just the tables and columns needed to create a specific visual or set of visuals.
Enterprises need to control access and limit the impact on the data warehouse. Instead of building data models, the internal reporting team could create Power BI dataflows. Dataflows are similar to traditional database views in that they give analysts access to data, but instead of residing in the database, these tables are stored as CSV files in the cloud. Rather than building large Power BI data models, the project team would provide analysts with data marts of dataflows, organized into Power BI workspaces by subject area. The analysts would then look at the SQL for the original reports and re-create those joins, filters, and formulas in Power BI. Analysts would need to be trained on how to do merges in Power Query to recreate the original joins, and to re-create the original filters so that the imported data stays right-sized. For larger tables, DirectQuery connections to the dataflows can be enabled by putting the dataflows in Premium capacity workspaces.
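To make that concrete, here is a minimal Power Query (M) sketch of how an analyst might recreate a report’s join and filter against dataflow entities. It assumes two queries named FactOrders and DimProduct already point at dataflow entities (connected through the Power Platform dataflows connector); the entity names, key column, expanded columns, and filter date are hypothetical.

```
// FactOrders and DimProduct are assumed to be existing queries that load
// dataflow entities; this query rebuilds the original report's SQL logic.
let
    // Recreate the original report's INNER JOIN as a Power Query merge
    Merged   = Table.NestedJoin(FactOrders, {"ProductKey"}, DimProduct, {"ProductKey"}, "Product", JoinKind.Inner),
    Expanded = Table.ExpandTableColumn(Merged, "Product", {"ProductName", "Category"}),

    // Recreate the original report's WHERE clause so the import stays right-sized
    Filtered = Table.SelectRows(Expanded, each [OrderDate] >= #date(2023, 1, 1))
in
    Filtered
```

Keeping the filter in the query, rather than only in the report, limits how much data is imported, which mirrors how the original SQL reports constrained their result sets.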
Conclusions
When making changes, it’s critical to limit scope to ensure success. With this dataflow data mart model, analysts quickly get their hands on the data and begin building reports in Power BI. The project team provides support in learning the tool and in re-creating the SQL logic of existing reports. This approach avoids creating supersets of data that are overly large and complicated (either as single tables or huge models). It also gives both the project team and analysts time to learn how to create robust, right-sized data models that take full advantage of Power BI’s strengths.
Moving to Power BI from a traditional system is a large, difficult process, and “anyone who says otherwise is selling something.” It’s a big topic, and some others have tackled this subject in different ways. I liked this paper by Josh Crittendon on Creating Efficient Power BI Datasets Over Snowflake. I also appreciate the phased migration case study provided by Microsoft. The hard work is figuring out how to build robust star schemas out of data warehouse tables of varying granularity. I plan to look at that in more depth.