Importing data into Power BI is a crucial step in creating meaningful visualizations and insights. Here are four ways to import data into Power BI along with best practices to impress your boss:
1. Data Source
Power BI, a business analytics solution offered by Microsoft, empowers users to create compelling data visualizations from diverse data sources. It allows for seamless data import from a wide range of sources, followed by data modeling and report generation with rich visuals. The supported data sources include:
|Data Source Type||Examples of Data Sources|
|File||Excel, flat files, XML, JSON, PDF, Sharepoint|
|Database||SQL Server, Oracle, MySQL, IBM Db2, IBM Informix, SAP HANA, Teradata, Snowflake, MariaDB, Google BigQuery, Amazon Redshift|
|Power Platform||Power BI datasets, workflows, Microsoft Dataverse|
|Azure||Azure SQL Database, Azure Analysis Services database, Azure Data Explorer, Azure HDInsight (HDFS), Azure Databricks, Azure Blob Storage|
|Online Services||Salesforce, GitHub, LinkedIn Sales Navigator, Dynamics 365, Microsoft Exchange Online, Emigo Data Source, Smartsheet, Google Analytics, Adobe Analytics|
|Other||Web, ODBC, OLE DB, Active Directory, SharePoint, Python, R scripts, Hadoop File (HDFS), TIBCO|
With Power BI, you can import data from these sources, create a data model, and generate reports and visuals for effective business analytics and data-driven decision making.
2. Data Model
Power BI offers different data models for connecting to both on-premises and cloud-based data sources. These data models include:
- Import data: In this data model, data from various sources is imported into Power BI and stored in Power BI Desktop or Power BI Service. Power BI creates an in-memory data model that allows for fast and efficient data visualization and analysis. Any changes to the data source after importing will not be reflected until the data is refreshed in Power BI.
- Direct Query: In this data model, Power BI connects directly to the data source and retrieves data in real-time when a report or visual is accessed. This allows for real-time data analysis and ensures that the most up-to-date data is used in the reports. Direct Query is ideal for large datasets that may require frequent data updates.
- Composite mode: This data model allows for a combination of both imported data and Direct Query. Users can create a report that combines data from multiple sources, where some data is imported into Power BI while other data is queried directly from the source. This provides flexibility in choosing the appropriate data model for different data sources or scenarios.
- Live Connection: In this data model, Power BI establishes a live connection to an external data source, such as SQL Server Analysis Services, Azure Analysis Services, or Power BI Report Server. This allows for real-time data analysis and ensures that any changes to the data source are immediately reflected in Power BI reports.
The choice of data model depends on the specific requirements of the data source, the frequency of data updates, and the desired level of real-time data analysis. Power BI provides flexibility in choosing the appropriate data model based on the needs of the business and the data source being used.
3. Importing a data model in Power BI Desktop
When using the Import mode in Power BI Desktop, data is imported into the Power BI cache, which is the default and standard method for developing data models and creating visualizations. Once the Power BI Desktop solution is saved, the imported data is stored on disk and loaded into the cache, allowing for data querying and visual preparation. The imported data is compressed using the VertiPaq compression engine based on column-store in-memory technology, which reduces the file size of the data model.
For example, if you are importing data from a 1 GB Excel sheet, your data model in Power BI Desktop may not acquire a 1 GB file size. Due to data compression, the solution file size of the Power BI Desktop (PBIX) file may be only a few MBs.
In the Import data mode, you can use one or more data sources to import data into your data model. For instance, you can import 5 million sales records from a 595 MB CSV file for demonstration purposes.
The benefits of using the Import data model in Power BI Desktop include speedy performance for data visualizations and support for all features such as Q&A and Quick Insights.
However, there are also some disadvantages of using the Import data model, including the requirement for sufficient memory and resources on your system for both loading data and refreshing data. Generally, the model per dataset size cannot exceed 1 GB, although Power BI Premium offers more dataset storage. Data refresh requires reloading the entire table, although Power BI Premium does offer a preview feature for incremental data loading.
In the DirectQuery model in Power BI, data is not imported into the Power BI cache. Instead, only metadata that defines the structure of the data model is retained. When querying the data model, Power BI uses native queries against the data source to fetch the data.
For example, if your data is stored in tables of a SQL Server database, you have two options in the data connectivity mode: Import and DirectQuery.
The DirectQuery mode is suitable for cases where data is stored in relational databases, such as Microsoft SQL Server, Oracle, Amazon Redshift, Azure Data Bricks, Azure SQL Database, Impala, Google BigQuery, Snowflake, Teradata, and SAP HANA.
The benefits of using the DirectQuery model in Power BI include:
- Metadata storage: The DirectQuery model allows you to avoid importing massive amounts of data into the Power BI model, which helps you avoid model size limitations in Power BI.
- Visualizations over large volumes of data: With DirectQuery, you can easily create visualizations over large volumes of data without the need to import all the data into the Power BI model.
- Real-time data: Since Power BI sends queries to the data source in real-time, you do not need to manually refresh data like in a data import model. Report users get the latest data while interacting with report filters and slicers, and you can use the Automatic page refresh feature for real-time reports.
- Minimal memory requirement: The DirectQuery model requires minimal memory for metadata load, as it does not store data in the Power BI cache.
However, there are some disadvantages of using the DirectQuery model, including:
- Limited data source integration: DirectQuery does not support the integration of data from multiple data sources. It is limited to querying data from a single data source.
- Performance dependency on underlying data source: The performance of the DirectQuery model depends on the underlying data source. If there is a resource crunch on the data source, it may impact the performance of the Power BI Desktop model.
- Limited DAX function support: DirectQuery has limited support for DAX functions, especially for measure and time intelligence expressions, compared to the Import data model.
- No calculated tables: DirectQuery does not support calculated tables in the data model.
- Limited feature support: DirectQuery does not support features such as Q&A and Quick Insights in Power BI.
- No 1 GB dataset limitation: Unlike the Import data model, the DirectQuery model does not have a limitation of 1 GB for a dataset, as it does not store data in the Power BI cache.
6. Composite model in the Power BI
In the Composite mode in Power BI, you can combine both Import and DirectQuery models to achieve the best performance. You can configure the storage mode of a table as Import, DirectQuery, or Dual mode. When a table is configured as Dual storage, it can use both Import and DirectQuery modes, and Power BI service determines the most efficient way to query the data.
The Composite model allows you to leverage the benefits of both Import and DirectQuery modes. For example, you can use Import mode for smaller datasets that can be easily loaded into the Power BI cache for fast performance, and use DirectQuery mode for larger datasets that can be queried directly from the data source in real-time. This enables you to optimize performance and responsiveness based on the size and complexity of your data.
7. Live Connection
On the other hand, in Live Connection mode, Power BI does not import data into the cache like the Import data mode, nor does it store a copy of metadata like the DirectQuery mode. Instead, it connects directly to an existing data model using a visualization tool. Live Connection is suitable for cases where you already have a data model in place, and it is supported for data sources such as SQL Server Analysis Service (SSAS) tabular or multi-dimensional, Azure Analysis Service (tabular model), and Power BI dataset.
Live Connection mode allows you to leverage the existing data model and perform real-time analysis and visualization without importing the data into Power BI. This can be useful when you have a large and complex data model already set up in a separate data source, and you want to use Power BI as a visualization tool to create interactive reports and dashboards based on that data model.
The advantages of using Live Connection mode in Power BI include:
- Faster performance compared to DirectQuery: Live Connection mode leverages the analytical engine performance of the data source, such as SSAS tabular model or Azure Analysis Service, which can result in faster query execution and data retrieval compared to DirectQuery mode.
- Efficient data model processing: The calculations and processing are done at the analytics engine of the data source, which is typically more efficient than the Power BI engine. This can result in optimized performance and improved scalability for large and complex datasets.
- Support for complex calculations with DAX: SSAS tabular model, used as a live connection data source, allows you to use Data Analysis Expressions (DAX) to create complex calculations for your dashboards and reports, providing advanced analytical capabilities.
- Compressed data storage in tabular format: The data in the Azure Analysis Service, when used as a live connection data source, is stored in a compressed tabular format, which can optimize storage and reduce the amount of data transferred between Power BI and the data source, resulting in improved performance.
The disadvantages of using Live Connection mode in Power BI include:
- Single data source limitation: Like DirectQuery mode, Live Connection mode supports only a single data source. If you have multiple data sources, you need to combine them in the SSAS tabular model or Azure Analysis Service before using it as a live connection in Power BI.
- Limited data transformation capabilities: Power Query Transformations, which allow data transformation and cleansing operations in Power BI, cannot be used in Live Connection mode. All data transformations must be handled in the analysis service model.
- Lack of relationship definition: Unlike Import or DirectQuery modes, you cannot define relationships between tables in the Power BI model when using Live Connection mode. Relationships must be defined in the analysis service model.
- No support for multi-dimensional models: Live Connection mode does not support multi-dimensional models in Azure Analysis Service. It is only supported for tabular models in both SSAS and Azure Analysis Service.