How to use Wrangling data flows in Azure Data Factory

Data transformation is an essential step in the data engineering process. Raw data must be cleaned, shaped, and structured before it is ready for analysis. In Azure Data Factory, there are two main approaches to transforming data: Mapping data flows and Wrangling data flows. This article demonstrates how to use Wrangling data flows to transform data in a simple, logical manner.

Wrangling data flows in Azure Data Factory provide an interactive, code-free user interface for cleaning, transforming, reshaping, and re-purposing your data. They let you perform data wrangling operations on a variety of data sources, including delimited text files, JSON files, and Excel spreadsheets.

If you would like to get the big picture of data transformation in Azure Data Factory, check out Data Transformation in Azure Data Factory – An overview.

Before we dive into wrangling data flows in Azure Data Factory

Before diving deeper, it is important to understand some key concepts and terms that we will use in this article. In Wrangling data flows, you can apply a wide range of transformations to manipulate your data, such as the following (an illustrative sketch follows the list):

  1. Filter: This transformation filters rows based on a condition.
  2. Split: This transformation splits a column into multiple columns based on a delimiter.
  3. Extract: This transformation extracts a substring from a column based on a pattern.
  4. Replace: This transformation replaces a value in a column with another value.
  5. Merge: This transformation merges multiple columns into one column based on a separator.
  6. Pivot: This transformation pivots data from a long format to a wide format.
  7. Un-pivot: This transformation un-pivots data from a wide format to a long format.
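
To make these operations concrete, here is a rough analogue in pandas. Note that this is not what a Wrangling data flow produces (behind the scenes the editor generates Power Query M); it is only a hedged sketch, with made-up column names and values, showing what each transformation does to a table.

```python
# Hedged pandas analogue of the transformations above; the Wrangling data
# flows editor itself generates Power Query M, not Python. Column names and
# values here are made up purely for illustration.
import pandas as pd

df = pd.DataFrame({
    "full_name": ["Lovelace, Ada", "Turing, Alan", "Hopper, Grace"],
    "city": ["London", "London", "New York"],
    "amount": [100.0, 250.0, 75.0],
})

filtered = df[df["amount"] > 80]                               # 1. Filter rows on a condition
split = df["full_name"].str.split(", ", expand=True)           # 2. Split a column by a delimiter
split.columns = ["last_name", "first_name"]
extracted = df["full_name"].str.extract(r"^(\w+)")             # 3. Extract a substring by pattern
replaced = df["city"].replace("London", "Greater London")      # 4. Replace one value with another
merged = split["first_name"] + " " + split["last_name"]        # 5. Merge columns with a separator
pivoted = df.pivot_table(index="city", values="amount", aggfunc="sum")  # 6. Pivot (long to wide/summary)
unpivoted = pivoted.reset_index().melt(id_vars="city")         # 7. Unpivot (wide back to long)
```

Each step in a Wrangling data flow behaves like one of these lines: it takes the current table and produces a new, transformed table.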

Now let us walk step by step through how to use a Wrangling data flow to transform data in Azure Data Factory.

1. Create a Wrangling data flow

The first step is to create a Wrangling data flow. To do this, follow these steps:

  1. Log in to your Azure portal and navigate to your Data Factory instance – https://azure.microsoft.com
  2. Click on the “Author & Monitor” button to open the Data Factory authoring interface.
  3. Click on the “Author” tab to open the authoring workspace.
  4. In the left-hand panel, click on “Data flows” and then click on the “New data flow” button.
  5. Give your data flow a name and select the data source you want to use.
  6. Click on the “Create” button to create your data flow.

2. Add a data source

The next step is to add a data source to your data flow. To do this, follow these steps:

  1. In the data flow canvas, click on the “Source” icon in the toolbar on the left.
  2. In the “Source settings” pane on the right, click on the “New source” button.
  3. Select the type of source you want to use from the list of available sources. For example, if you want to use a CSV file, select “Delimited text” from the list.
  4. Enter the connection information for your source. This will depend on the type of source you are using.
    • For instance, if you are using a CSV file, you will need to enter the file path and delimiter (a small local stand-in example follows this list).
  5. Once you have entered the connection information, click on the “OK” button to add the source to your data flow.
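
To make the connection settings for a delimited text source more tangible, here is a hedged local analogue: reading the same kind of file in Python with a path, a delimiter, and a header row. The file name and delimiter are hypothetical placeholders, not actual ADF settings.

```python
# Hedged, local stand-in for the delimited-text connection settings:
# the file path, the column delimiter, and whether the first row is a header.
# "data/sales.csv" and ";" are hypothetical placeholder values.
import pandas as pd

df = pd.read_csv(
    "data/sales.csv",  # file path (in ADF this points at the linked storage location)
    sep=";",           # column delimiter
    header=0,          # treat the first row as column names
)
print(df.head())
```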

3. Explore and clean your data

The next step is to explore and clean your data using the Wrangling Data Flows UI, which provides a rich set of features for this task, including:

  1. Data preview: View a sample of your data to see how it looks.
  2. Column profile: View statistics and metadata for each column in your data.
  3. Data transformations: Apply transformations to your data using a drag-and-drop interface.
  4. Data type conversion: Convert data types for columns.
  5. Missing value imputation: Impute missing values using various techniques.
  6. Outlier detection: Identify outliers in your data.
  7. Data sampling: Sample your data to test transformations.

Use the Wrangling Data Flows UI to explore and clean your data according to your business requirements; a short illustrative sketch of these activities follows.
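
As a rough, hedged illustration of the same activities outside the UI (the editor itself is code-free), the equivalent checks in pandas, on hypothetical data, might look like this:

```python
# Hedged pandas illustration of the explore-and-clean activities listed above.
# The data, column names, and thresholds are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "order_id": range(1, 11),
    "amount": [10, 12, 11, None, 13, 9, 250, 12, 11, 10],
})

print(df.head())                           # data preview
print(df.describe(include="all"))          # column profile: counts, stats, uniqueness
df["amount"] = pd.to_numeric(df["amount"])                    # data type conversion
df["amount"] = df["amount"].fillna(df["amount"].median())     # missing value imputation (median)

# crude outlier detection: flag values far from the mean
z = (df["amount"] - df["amount"].mean()) / df["amount"].std()
outliers = df[z.abs() > 2]

sample = df.sample(frac=0.5, random_state=42)                 # sampling to test transformations
```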

4. Create transformations

Once you have explored and cleaned your data, the next step is to create transformations to shape and structure your data. To do this, follow these steps:

  1. In the data flow canvas, select the column you want to transform.
  2. Click on the “Add column” button in the toolbar.
  3. Select the type of transformation you want to apply from the list of available transformations.
    • For instance, if you want to split a column into two columns, select “Split column” from the list.
  4. Configure the transformation settings in the pane on the right.
    • For instance, if you are adding a split column transformation, you will need to specify the delimiter.
  5. Once you have configured the transformation settings, click on the “OK” button to apply the transformation to your data.
  6. Repeat steps 1-5 for each transformation you want to apply to your data (a sketch of such a chain follows this list).
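
As a hedged sketch of what repeating these steps builds up (again only a pandas analogue on made-up columns, not anything the editor generates), a chain of split, replace, and filter transformations could look like this:

```python
# Hedged pandas analogue of chaining several transformations, as in step 6.
# Column names and values are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "customer": ["smith, john", "doe, jane", "roe, richard"],
    "region": ["uk", "us", "uk"],
})

transformed = (
    df.assign(                                            # split "customer" into two new columns
        last_name=df["customer"].str.split(", ").str[0],
        first_name=df["customer"].str.split(", ").str[1],
    )
    .assign(region=lambda d: d["region"].replace({"uk": "UK", "us": "US"}))  # replace values
    .query("region == 'UK'")                              # filter rows on a condition
    .drop(columns=["customer"])                           # drop the original column
)
print(transformed)
```

Each transformation operates on the output of the previous one, which mirrors how a Wrangling data flow builds up its sequence of steps.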

5. Save and run your data flow

Once you have created your transformations, the final step is to save and run your data flow. To do this, follow these steps:

  1. Click on the “Publish all” button in the toolbar to save your data flow.
  2. Click on the “Debug” button in the toolbar to run your data flow.
  3. In the “Debug settings” pane on the right, select the data integration runtime you want to use to run your data flow.
  4. Click on the “Create” button to create a new debug session.
  5. Wait for the debug session to start running. You can monitor the progress of your data flow in the “Output” pane on the right.
  6. Once your data flow has finished running, you can view the output in the “Data preview” pane on the right (a programmatic alternative is sketched below).
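
The Debug run above is an interactive authoring feature. Once the data flow is published inside a pipeline, that pipeline can also be triggered and monitored programmatically. The sketch below assumes the azure-identity and azure-mgmt-datafactory Python packages; the subscription, resource group, factory, and pipeline names are hypothetical placeholders.

```python
# Hedged sketch: trigger and monitor a published pipeline that contains the
# wrangling data flow. Assumes azure-identity and azure-mgmt-datafactory are
# installed; all resource names below are hypothetical placeholders.
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
factory_name = "<data-factory-name>"
pipeline_name = "<pipeline-with-wrangling-data-flow>"

client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Start a pipeline run and poll its status until it finishes.
run = client.pipelines.create_run(resource_group, factory_name, pipeline_name)
print("Started run:", run.run_id)

while True:
    pipeline_run = client.pipeline_runs.get(resource_group, factory_name, run.run_id)
    if pipeline_run.status not in ("Queued", "InProgress"):
        break
    time.sleep(15)

print("Final status:", pipeline_run.status)
```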

Additional tips for using Wrangling data flows in Azure Data Factory

Here are some additional details and tips to keep in mind when using wrangling data flows in Azure Data Factory:

  1. The Wrangling data flows tool supports data profiling, which lets you analyze the quality and characteristics of your data, such as value distributions, data types, and patterns.
  2. Using dynamic expressions and parameters in your wrangling data flows helps you build a more flexible and customizable data transformation process.
  3. To reduce duplication and improve consistency, you can reuse wrangling data flows across several pipelines and data factories.
  4. When designing a wrangling data flow, take into account the complexity and size of your data, as well as the resources required to process it. With Azure Monitor, you can monitor and optimize the performance and cost of your data preparation process.
  5. Wrangling data flows can be used in conjunction with other Azure services, such as Azure Databricks and Azure Machine Learning, to perform advanced analytics and machine learning on your prepared data.
  6. With the “Add to Data Flow” feature, you can add transformations from your wrangling data flow to a mapping data flow in order to create a new mapping data flow.

The data flow lineage capabilities of Wrangling data flows allow you to track how your data has been transformed, from the point of origin to the point of destination, as well as what changes have been made to it along the way.

Conclusion

Wrangling data flows provide a powerful way to transform data in Azure Data Factory. The Wrangling Data Flows UI offers a rich set of features that help you explore, clean, and transform your data in a variety of ways. By following the steps in this article, you should now have a solid understanding of how to use Wrangling data flows to transform your data, and you can start applying them in your own Data Factory.
