Power Query to build code free transformations in Azure Data Factory

By: John Miner | Updated: 2022-02-10 | Comments (2) | Related: Azure Data Factory


Problem

Azure Data Factory is a very popular extract, load and transform (ELT) tool. The copy activity is at the center of this design paradigm. Many citizen data engineers have experience with Power Query. The mash-up editor allows the citizen developer to create code free transformations on the data.

How can we leverage existing knowledge of Power Query to bring transformations to a standard ADF pipeline?

Solution

Microsoft announced the general availability of the Power Query activity in October of 2021. This feature had been in preview since 2019. In the past, it was referred to as wrangling data flows. How does Power Query execute in the existing pipeline framework that is available with Azure Data Factory?

I am going to use a quote from the online documentation that sums up how the M code executes.

"Data Factory translates M generated by the Power Query Online Mashup Editor into spark code for cloud scale execution by translating M into Azure Data Factory Data Flows. Wrangling data with Power Query and data flows are especially useful for data engineers or 'citizen data integrators'."

Today, we are going to explore how to use this activity in a pipeline that reads a web table, cleans up the data, and writes it into a SQL table.

Business Problem

January is a fun time for football fans in the United States. The AFC and NFC playoffs take place during the month of January with teams competing for a spot in the Super Bowl championship in early February. Today, the best six teams of the two conferences make it to the playoffs. The wild card or 1st round of the playoffs is used to pick two winners out of the teams ranked 3 to 6 in each conference. The two winners from each divisional round go to the conference championship or 3rd round. The two conference champions compete in one final game for the title of Super Bowl winner. Please see Wikipedia for some history around the post season playoffs.

Our manager has asked us to create a simple prototype to demonstrate how the new Power Query activity can be used to clean up (transform) data from this Wikipedia page. This proof of concept (POC) will be created using Azure Data Factory.

Power Query to build code free transformations in Azure Data Factory (1)

The above image shows how information will flow from the web table in Wikipedia to the database table in SQL Server. Because Spark is at the heart of the Power Query activity, an intermediate step must be used. Raw data will be landed in the bronze quality zone of a data lake. The refined (cleaned-up) data from our Power Query activity will be saved in the silver quality zone. We will use three separate pipelines during our prototype phase, one for each step of the data flow. Once we have a handle on this new control, we will combine the activities into one final pipeline.

Source Linked Service and Data Set

Before we can start developing a pipeline, we must define both the source and target linked services and data sets. The image below shows how objects are built upon each other to provide the final pipeline solution.

Power Query to build code free transformations in Azure Data Factory (2)

The first step is to define the source linked service. Use the manage icon within the Data Factory Studio to get access to the linked services screen. Please hit the new + icon to create the new definition. The image below shows that the web table is the correct object to create. One professional tip is to use the search button to find the correct linked service to create. I did a search on "we" which brought up two results. It is important to note that only a self-hosted integration runtime (SHIR) can access a table located on a web page. This will not work with the default Azure Integration Runtime.

Power Query to build code free transformations in Azure Data Factory (3)

It is good practice to follow a naming convention when defining the objects within Data Factory. The image below shows the newly created linked service called LS_WEB_TABLE_WIKIPEDIA that points to the correct Wikipedia page for Super Bowl information. Since this page is open to the whole world, we are using anonymous authentication. Please note that my self-hosted integration runtime is named SHIR-01. Another professional tip is to always test the connection. To complete the task, save the newly created object and publish if necessary.

Power Query to build code free transformations in Azure Data Factory (4)

The second step is to define the source data set. Use the author icon to access the factory resources. Click the new + icon to create a new dataset. Please select the web table as the source type.

Power Query to build code free transformations in Azure Data Factory (5)

Please save the dataset without testing. The reason behind this request is the fact that the web page has multiple web (HTML) tables. The image below shows the index selection on the web page to be [1]. I found this out by examining the HTML source code behind the page using my web browser. Check out this article on how to find this out using various web browsers. It is important to document your work by using the description area of all objects you create. The name of the new source data set is DS_WEB_TABLE_SUPERBOWLS. To complete the task, save the newly created object and publish if necessary.

Power Query to build code free transformations in Azure Data Factory (6)

The preview data button can be used to unit test the data set. The image below shows the results pulled in from the web page. There is a lot of clean up to be done on the columns in the table. It was decided that the last two columns in the table are not important and should be dropped from the data set.

Power Query to build code free transformations in Azure Data Factory (7)

Now that we have defined the source objects, we need to define the destination objects. However, at this point the Azure SQL server, Azure SQL database, and Azure SQL table do not exist.

Destination Database and Table

I already have a logical Azure SQL Server named svr4tips2020. This object has existed in my subscription for at least 2 years. I can connect to the server using SQL Server Management Studio and execute the following code to create a new database called dbs4tips2022.

-- Which database to use.
USE [master]
GO

-- Delete existing database
DROP DATABASE IF EXISTS [dbs4tips2022]
GO

-- Create new database
CREATE DATABASE [dbs4tips2022]
GO

Next, I want to create a schema called stage. Execute the following code in an SSMS query window to accomplish this task.

-- Which database to use.
USE [dbs4tips2022]
GO

-- Delete existing schema
DROP SCHEMA IF EXISTS [stage]
GO

-- Create new schema
CREATE SCHEMA [stage] AUTHORIZATION [dbo]
GO

To finish the task, we want to create the table to hold the Super Bowl data in the stage schema. Please execute the code below to create our target table.

-- Which database to use.
USE [dbs4tips2022]
GO

-- Delete existing table
DROP TABLE IF EXISTS [stage].[superbowl]
GO

-- Add new table
CREATE TABLE [stage].[superbowl]
(
    [game] nvarchar(8) NULL,
    [date] datetime NULL,
    [winning_team] nvarchar(32) NULL,
    [score] nvarchar(16) NULL,
    [losing_team] nvarchar(32) NULL,
    [venue] nvarchar(32) NULL,
    [city] nvarchar(32) NULL,
    [attendance] nvarchar(8) NULL
)
GO

Please repeat the above tasks to create an active schema and corresponding table. The image below shows the results. Was executing the create database statement without any options a wise thing to do?

Power Query to build code free transformations in Azure Data Factory (8)

By default, a database will be created with virtual CPUs using the General Purpose tier. As a result, this database for a quick proof of concept (POC) would cost us $380 a month or $4,560 per year. Please see the image below.

Power Query to build code free transformations in Azure Data Factory (9)

What happens if we leave this database deployed after the proof of concept? Our boss would not be happy about the monthly charge. Is there a better tier to use for testing new ideas? That is where the serverless computing tier shines. When we use the database's virtual CPUs, we get charged only for the hours that we use, not by the month or year. Yes, there is a cost for storage, but it is only $5 per month. See the image below for details.

Power Query to build code free transformations in Azure Data Factory (10)

Now that we have a destination database and table, let us define the destination objects in Azure Data Factory in the next section.

Destination Linked Service and Data Set

The first step is to define the destination linked service. Use the manage icon within the Data Factory Studio to get access to the linked services screen. Please hit the new + icon to create the new definition. The image below shows that the Azure SQL Database is the correct object to create. I used the search button to find the correct linked service to create. A search on the word "azure" brought up eight results, and I picked the correct object from the list to create.

Power Query to build code free transformations in Azure Data Factory (11)

The image below shows the options for the new linked service. The options that I used to create the destination linked service are the following: LS_ASQL_TIPS_2022 is the name of the object, the auto resolve integration runtime is the computing power, the server and database names were chosen from the current subscription, and the managed identity was used as the authentication method. When possible, use a managed identity to get out of the password maintenance cycle. Make sure you test the connection before moving on. Any errors will be related to either the server firewall or access privileges to the database. Resolve them to get a successful connection.

Power Query to build code free transformations in Azure Data Factory (12)

The second step is to define the destination data set. Use the author icon to access the factory resources. Click the new + icon to create a new dataset. Please select the Azure SQL Database as the source type.

Power Query to build code free transformations in Azure Data Factory (13)

Use the settings dialog box to configure the data set. Name the data set DS_ASQL_TABLE_SUPERBOWLS. It uses the linked service we previously defined above, and it points to the [stage].[superbowl] table.

Power Query to build code free transformations in Azure Data Factory (14)

Again, it is very important to document your work by using the description text box. We could parameterize the schema and table names if we think there is going to be re-use of the data set. For now, the information is hard coded in the text box.

Power Query to build code free transformations in Azure Data Factory (15)

The final step is to preview the destination data set. We can see that the Azure SQL Database table is currently empty.

Power Query to build code free transformations in Azure Data Factory (16)

Copy WEB table to ADLS file

The first leg of the data flow (information journey) is to read the source data from the web page and write this table information to a CSV file in the raw zone of the data lake. This action is required since a web table as a source is not supported by wrangling data flows (the Power Query activity). We are going to try using the first format in the list that is supported by the ADLS Gen2 connector.

Power Query to build code free transformations in Azure Data Factory (17)

Please create a new pipeline called PL_COPY_WEB_TABLE_2_ADLS_FILE, drag the copy activity onto the canvas, and name the object ACT_MT_CPY_WEB_TABLE_2_ADLS_FILE. The next few paragraphs go over how to configure the activity. I am giving the integration engine 5 minutes to copy the data and 1 retry attempt if the copy action fails. This might be overkill for this activity, but why not give the engine time to grab the data?

Power Query to build code free transformations in Azure Data Factory (18)

The source data set is the web table that we defined earlier. The image below shows the name chosen from the drop-down box.

Power Query to build code free transformations in Azure Data Factory (19)

I am assuming you know how to create both a linked service and data set for a CSV file. If not, please refer to my prior article that goes over supporting multiple file formats in Azure Data Lake using Azure Data Factory. The DS_ADLS2_BRONZE_CSVFILE_SUPERBOWLS data set points to a CSV file named games.csv located in the /bronze/superbowl directory. The sc4tips2021 container resides within the sa4tips2021 storage account. The image below shows the sink (target) being set to this data set.

Power Query to build code free transformations in Azure Data Factory (20)

Please test the pipeline by clicking the debug button, which tells the self-hosted integration engine to execute the code. The image below shows a successful execution.

Power Query to build code free transformations in Azure Data Factory (21)

If we browse to the ADLS Gen 2 container, we can download the newly created CSV file. What is going on with the directory location and storage container name seen below? I am cleaning up my MSDN subscription and consolidating storage accounts. Therefore, this image reflects that change.

Power Query to build code free transformations in Azure Data Factory (22)

The image below shows the CSV file stored in my sub-directory and currently opened in my favorite editor, Notepad++. Here is the spoiler alert. The fields that are pulled from Wikipedia have carriage returns (#cr) and line feeds (#lf) in the CSV (ASCII) formatted file. These characters will cause issues with the default settings of the Spark engine that executes the translated M language code. Therefore, I suggest using a parquet (binary) formatted file for the final program to work.

Power Query to build code free transformations in Azure Data Factory (23)

Now that we have the data stored in the bronze quality zone of the data lake, we can start to create the Power Query script to execute in the ADF activity.

Transform CSV file with Power Query

The second leg of the data flow (information journey) is to read the raw data from the bronze quality zone, transform the data using the Power Query (script) activity, and write this information to a CSV file in the refined quality zone of the data lake. Please see the documentation for details.

Please create a new pipeline named PL_TRANS_CLEANUP_CSVFILE, drag the new Power Query activity onto the canvas, and name the object ACT_PQ_FLOW_SUPERBOWL. Since wrangling data flows (the Power Query activity) run under a Spark engine, we need to pick a compute size. For now, please choose the default settings.

Power Query to build code free transformations in Azure Data Factory (24)

Please skip to the sink section of the activity. Click the add button to start defining the Power Query transformations. The image below shows the sink settings in which the output from the Power Query is mapped to a CSV file in the refined quality directory in the data lake. This data set is called DS_ADLS2_SILVER_CSVFILE_SUPERBOWLS.

Power Query to build code free transformations in Azure Data Factory (25)

The first action to perform within the online Power Query interface is to add a source data set. I am using the data set named DS_ADLS2_BRONZE_CSVFILE_SUPERBOWLS. By default, all rows and columns will be loaded into our familiar interface. If you are a Power BI report designer, you should be right at home.

Power Query to build code free transformations in Azure Data Factory (26)

We can see that the field named Game has unwanted characters at the end of the string. One common design pattern in the Power Query interface is to split a column into two fields on the delimiter. The first field contains the clean data while the second field is tossed away. The image below shows this action being performed on the Game field using the [ character.

Power Query to build code free transformations in Azure Data Factory (27)

The image below shows the results of our action. While the online graphical interface understands the chosen action, the underlying Spark engine does not support the syntax for data wrangling. Please see the online Microsoft documentation for which transformation functions are supported.

Power Query to build code free transformations in Azure Data Factory (28)

The data that is coming from the Wikipedia website is considered text. The destination of the data in this example is a database. We might want to transform the human readable date of January 15th, 1967, to 01/15/1967. The image below shows the conversion of a string from text to date as not being supported.
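For illustration only, the date conversion that the designer cannot translate is simple in an ordinary programming language. This Python sketch is not part of the ADF solution; it just shows the intended transformation:

```python
from datetime import datetime

# Parse the human readable date, then reformat it as MM/DD/YYYY.
parsed = datetime.strptime("January 15, 1967", "%B %d, %Y")
print(parsed.strftime("%m/%d/%Y"))  # 01/15/1967
```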

Power Query to build code free transformations in Azure Data Factory (29)

I am well versed in using the M language. Using the supported functions found in the documentation as a reference, I came up with the following script that will transform our data. Please use the advanced editor feature to replace the existing script with this one.

let
    Source = DS_ADLS2_BRONZE_CSVFILE_SUPERBOWLS,
    Step01 = Table.SelectRows(Source, each [Losing team] <> "To be determined"),
    Step02 = Table.ReplaceValue(Step01, "#(cr)#(lf)", " ", Replacer.ReplaceText, {"Game"}),
    Step03 = Table.ReplaceValue(Step02, "50", "L", Replacer.ReplaceText, {"Game"}),
    Step04 = Table.ReplaceValue(Step03, "#(cr)#(lf)", " ", Replacer.ReplaceText, {"Winning team"}),
    Step05 = Table.ReplaceValue(Step04, "#(cr)#(lf)", " ", Replacer.ReplaceText, {"Losing team"}),
    Step06 = Table.ReplaceValue(Step05, ",", "", Replacer.ReplaceText, {"Attendance"}),
    Step07 = Table.AddColumn(Step06, "game", each Text.Trim(Text.Start([#"Game"] & "[", Text.PositionOf([#"Game"] & "[", "[")))),
    Step08 = Table.AddColumn(Step07, "date", each Text.Trim(Text.Start([#"Date/Season"] & "(", Text.PositionOf([#"Date/Season"] & "(", "(")))),
    Step09 = Table.AddColumn(Step08, "winning_team", each Text.Trim(Text.Start([#"Winning team"] & "(", Text.PositionOf([#"Winning team"] & "(", "(")))),
    Step10 = Table.AddColumn(Step09, "score", each Text.Trim(Text.Start([#"Score"] & " ", Text.PositionOf([#"Score"] & " ", " ")))),
    Step11 = Table.AddColumn(Step10, "losing_team", each Text.Trim(Text.Start([#"Losing team"] & "(", Text.PositionOf([#"Losing team"] & "(", "(")))),
    Step12 = Table.AddColumn(Step11, "venue.1", each Text.Trim(Text.Start([#"Venue"] & "(", Text.PositionOf([#"Venue"] & "(", "(")))),
    Step13 = Table.AddColumn(Step12, "city.1", each Text.Trim(Text.Start([#"City"] & "(", Text.PositionOf([#"City"] & "(", "(")))),
    Step14 = Table.AddColumn(Step13, "venue", each Text.Trim(Text.Start([#"venue.1"] & "[", Text.PositionOf([#"venue.1"] & "[", "[")))),
    Step15 = Table.AddColumn(Step14, "city", each Text.Trim(Text.Start([#"city.1"] & "[", Text.PositionOf([#"city.1"] & "[", "[")))),
    Step16 = Table.RemoveColumns(Step15, {"Game", "Date/Season", "Winning team", "Score", "Losing team", "Venue", "City", "Referee", "Ref", "venue.1", "city.1"}),
    Step17 = Table.RenameColumns(Step16, {{"Attendance", "attendance"}}),
    Step18 = Table.TransformColumnTypes(Step17, {{"game", type text}, {"date", type text}, {"winning_team", type text}, {"score", type text}, {"losing_team", type text}, {"venue", type text}, {"city", type text}, {"attendance", type text}}),
    Step19 = Table.SelectColumns(Step18, {"game", "date", "winning_team", "score", "losing_team", "venue", "city", "attendance"})
in
    Step19
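Steps 07 through 15 of the script all rely on the same idiom: append the delimiter to the value, locate the delimiter's first position, and keep the trimmed text before it. Appending the delimiter up front guarantees that Text.PositionOf finds a match, so values without the delimiter pass through unchanged. Here is an illustrative Python sketch of that logic (not part of the pipeline):

```python
def text_before(value: str, delimiter: str) -> str:
    """Mimic the M idiom Text.Trim(Text.Start(v & d, Text.PositionOf(v & d, d)))."""
    padded = value + delimiter  # guarantee the delimiter is present
    return padded[:padded.index(delimiter)].strip()

print(text_before("I[A]", "["))                        # I
print(text_before("Green Bay Packers (1, 1-0)", "("))  # Green Bay Packers
print(text_before("no delimiter here", "["))           # no delimiter here
```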

If we execute the script within the graphical interface, we can see the correct output, as shown below.

Power Query to build code free transformations in Azure Data Factory (30)

Unlike a regular pipeline, debugging a mapping or wrangling data flow requires computing power for the Spark cluster. By default, this auto resolve integration runtime will stay active for 1 hour before terminating. See the image below for details.

Power Query to build code free transformations in Azure Data Factory (31)

If we save and execute the pipeline, we get the following error. This issue took me some time to figure out. I am going to let the suspense build before I give you the answer.

Power Query to build code free transformations in Azure Data Factory (32)

The issue resides in the fact that the Spark engine that executes the translated M language handles the malformed CSV file differently than the traditional pipeline. A traditional pipeline understands that a carriage return or line feed that occurs inside a quoted field should be ignored. Instead, the Spark engine splits the file into many more records than there are Super Bowls. The null value assertion is triggered by the fact that many of the resulting fields are null.
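The difference between the two parsing behaviors can be demonstrated locally with Python's standard csv module. The sample row below is a hypothetical fragment in the shape of the Wikipedia extract, with an embedded carriage-return/line-feed inside a quoted Game value:

```python
import csv
import io

# Hypothetical two-record CSV fragment with a line break inside a quoted field.
raw = 'Game,Winning team\r\n"I\r\n[A]",Green Bay Packers\r\n'

# A compliant CSV parser honors the quotes: one header row plus one data row.
rows = list(csv.reader(io.StringIO(raw, newline="")))
print(len(rows))  # 2

# Naive line splitting (similar to reading the file with default Spark
# settings) sees an extra, malformed record.
print(len(raw.splitlines()))  # 3
```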

A quick fix is to modify the file to remove any unwanted character sequences. Just download the file from the raw zone. Use a text editor to modify the file. Last, overwrite the existing file with an upload action.
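If editing by hand is tedious, the same cleanup can be scripted before the file is re-uploaded. The helper below is a hypothetical sketch, not part of the ADF solution: it re-parses the quoted CSV and replaces any embedded line breaks with spaces.

```python
import csv
import io

def flatten_quoted_newlines(text: str) -> str:
    """Replace line breaks inside quoted CSV fields with spaces."""
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([field.replace("\r\n", " ").replace("\n", " ")
                         for field in row])
    return out.getvalue()

# Hypothetical fragment with an embedded line break in a quoted field.
raw = 'Game,Winning team\r\n"I\r\n[A]",Green Bay Packers\r\n'
print(flatten_quoted_newlines(raw))
```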

Power Query to build code free transformations in Azure Data Factory (33)

Please debug the completed pipeline with the updated input file. It will now run to completion without any errors. We can use the Azure Portal to explore the files in the silver (refined) quality zone. We can see that the Spark engine created many output files. This is normal for a Spark program. We just need to use a directory path instead of a fully qualified file name when reading the data.

Power Query to build code free transformations in Azure Data Factory (34)

In summary, the online Power Query interface allows for transformations that are not available with wrangling data flows. Consult the documentation when designing your mashup script. Because Spark appends to the output directory, there will be an issue in the future with duplicate records. The easiest way to fix this problem is to insert a delete folder activity to remove the folder before writing the output.
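The delete-before-write pattern is easy to reason about: remove the output folder, let the next run recreate it, and only the latest run's files remain. A local Python simulation of the idea (hypothetical paths and file names, illustrative only):

```python
import os
import shutil
import tempfile

# Hypothetical local stand-in for the silver output folder.
silver = os.path.join(tempfile.mkdtemp(), "silver", "superbowl")

def write_run(folder, names):
    """Simulate one pipeline run: delete the folder, then write fresh files."""
    shutil.rmtree(folder, ignore_errors=True)  # the "delete folder" activity
    os.makedirs(folder)
    for name in names:
        open(os.path.join(folder, name), "w").close()

write_run(silver, ["part-00000.csv", "part-00001.csv"])
write_run(silver, ["part-00000.csv"])  # second run replaces, not appends
print(sorted(os.listdir(silver)))  # ['part-00000.csv']
```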

I did open an issue on the GitHub repository for the product team at Microsoft who maintains the code base for the Power Query activity. Either the functionality of parsing a malformed CSV file needs to be corrected for the Spark engine that runs the Power Query script, or a pre-processing step needs to be added to look for malformed files and display a more user-friendly error message.

Product Team Update

I want to thank the Microsoft product team for getting back to me on this issue. There are some advanced settings on the source data set when you specify the name in the Power Query designer. Please see the image in the last section that deals with the source data set. Use the settings icon to bring up this option box.

Power Query to build code free transformations in Azure Data Factory (35)

The first setting allows for multiple lines in the delimited data file as long as the text fields that contain the row delimiters are quoted. The second setting allows for two additional actions that can be used to post-process the file. Unfortunately, these settings are not documented online. I did a test after picking the above options. The pipeline completed successfully without any issues.

Copy ADLS file to SQL table

The third leg of the data flow (information journey) is to read the source data from the CSV file in the refined zone of the data lake and write this information to an Azure SQL Server table. Please drag a copy activity onto the canvas, name the object properly, add a valid description, set the time out value, and choose your retry settings. The ultimate name of the pipeline will be PL_COPY_CSVFILE_2_ASQL_TABLE.

Power Query to build code free transformations in Azure Data Factory (36)

The next step is to fill out the details about the source data set. I am going to hard code the output path. However, we can see a wild card is being used to find the CSV file. This is due to the fact that the Spark engine writes out multiple files. The CSV files can be found with the following search pattern: "*.csv".
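The wild card behaves like a file system glob: it matches every part file the Spark engine wrote while skipping bookkeeping files. A quick local Python illustration (the part file and marker names below are hypothetical):

```python
import glob
import os
import tempfile

# Hypothetical Spark output folder with two part files and a marker file.
folder = tempfile.mkdtemp()
for name in ["part-00000.csv", "part-00001.csv", "_SUCCESS"]:
    open(os.path.join(folder, name), "w").close()

# The "*.csv" pattern picks up only the data files.
matches = sorted(os.path.basename(p)
                 for p in glob.glob(os.path.join(folder, "*.csv")))
print(matches)  # ['part-00000.csv', 'part-00001.csv']
```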

Power Query to build code free transformations in Azure Data Factory (37)

The last step is to fill out the details about the destination data set. Make sure you choose the [stage].[superbowl] table as the destination location. We need to use a pre-copy script to truncate the destination table to prevent duplicates.
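The article does not list the pre-copy script itself; one reasonable choice, assuming the [stage].[superbowl] table created earlier, is a simple truncate:

```sql
-- Pre-copy script: empty the staging table before each load.
TRUNCATE TABLE [stage].[superbowl]
```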

Power Query to build code free transformations in Azure Data Factory (38)

We can use SQL Server Management Studio to verify the contents of the target table.

Power Query to build code free transformations in Azure Data Factory (39)

Redesign and unit test

We can see that using a CSV file format for intermediate file staging is problematic since it is a weak file format. A better solution is to use the Apache Parquet file format, which is considered a strong file format. The following components were used in the final solution.

Name | Description
LS_WEB_TABLE_WIKIPEDIA | Connection information for the web page.
LS_ADLS2_DATALAKE_STORAGE | Connection information for Azure Data Lake.
LS_ASQL_TIPS_2022 | Connection information for Azure SQL Database.
DS_WEB_TABLE_SUPERBOWLS | Source web table.
DS_ADLS2_BRONZE_PARQUET_FILE | Raw dataset. Parquet file format.
DS_ADLS2_SILVER_PARQUET_FILE | Refined dataset. Parquet file format.
DS_ASQL_TABLE_SUPERBOWLS | Destination Azure SQL Server table.
ACT_MT_CPY_WEB_TABLE_2_ADLS_FILE | Copy activity for stage 1 of the data journey.
ACT_GEN_DEL_SILVER_DIR | Remove refined folder to prevent duplicates.
ACT_PQ_FLOW_SUPERBOWL | Execute Power Query for stage 2 of the data journey.
PQ_CLEAN_SUPERBOWL_DATA | Actual Power Query.
ACT_MT_ADLS_FILE_2_ASQL_TABLE | Copy activity for stage 3 of the data journey.
PL_COPY_N_CLEAN_SUPERBOWL_DATA | Overall ADF pipeline object.

If we execute the pipeline program, we see a successful execution.

Power Query to build code free transformations in Azure Data Factory (40)

As a developer, I always want to verify the output of any program that is executed during testing. The image below shows the delimited file was created in the correct BRONZE folder. This is the raw data before transformation by the Power Query.

Power Query to build code free transformations in Azure Data Factory (41)

We can verify that the files were created in the SILVER folder using the Azure Portal. The four files created by the Spark engine are shown below.

Power Query to build code free transformations in Azure Data Factory (42)

Finally, we can verify if the data in the target table is correct. If you look closely at the winning and losing team columns, you can see additional information at the end. I fixed the files manually when we were prototyping the program. Now, we have to add code to fix this issue. The root cause of the issue is that the web page superscripts the team names with the division they are in.

Power Query to build code free transformations in Azure Data Factory (43)

The best way to fix this is to manually create a translation table within Power Query that has old and new values. This table can be joined at the end of the script to provide a lookup-and-replace step. I leave this exercise for you, the reader, to accomplish.
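The lookup-and-replace idea boils down to a mapping from dirty to clean values, where any name not in the table passes through untouched. The team names below are hypothetical examples, not values taken from the actual file:

```python
# Hypothetical translation table: scraped value -> clean value.
fixes = {
    "Green Bay PackersA": "Green Bay Packers",
    "Kansas City ChiefsA": "Kansas City Chiefs",
}

def clean_team(name: str) -> str:
    """Return the corrected team name, or the original if no fix exists."""
    return fixes.get(name, name)

print(clean_team("Green Bay PackersA"))  # Green Bay Packers
print(clean_team("Miami Dolphins"))      # Miami Dolphins
```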

Summary

I really like the fact that wrangling data flows allow the citizen developer to transform data using a code free environment. The preview release of this product was in 2019, and the general availability release was in October of 2021. However, the graphical interface that is shared with Power Query Online must be modified before wide acceptance is gained. Otherwise, end users will get in trouble by creating scripts that cannot be executed.

Today, we found out that the CSV file format is weak in nature. Any characters that represent a column or row delimiter will cause issues with the Spark engine, which interprets the generated M language script. There are some advanced source data set options that can fix this issue. While the Excel file format is better than CSV, we can't generate this file format as a destination in an Azure Data Factory pipeline. Additionally, it does have a limit on the number of rows of data that can be stored. The best file format is Apache Parquet, which is considered strong. This format allows for compression, is binary in nature, and does not have a limit on the number of rows.

In summary, I look forward to when the Microsoft product team updates the Power Query GUI (graphical interface) so that transformations that are not valid with wrangling data flows are not shown to the end user. If you are familiar with the M language, you will have no problems creating transformations that adhere to the restrictions outlined in the documentation. Have fun transforming data with wrangling data flows!

Next Steps
  • Check out these other Azure Data Factory articles
About the author

John Miner is a Data Architect at Insight Digital Innovation helping corporations solve their business needs with various data platform solutions.

View all my tips


