AZURE - DATA FACTORY
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
https://www.youtube.com/watch?v=i133n5y5DGo
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
What is Azure Data Factory?
It stores data with the help of Data Lake Storage. Any kind of data can be stored in Data Lake Storage; you can analyze the data, transform it with the help of pipelines, and publish the organized data.
You can also visualize the data with third-party frameworks such as Apache Spark and Hadoop.
What exactly is Data Factory ?
Data Factory falls under the integration domain of services in the Azure catalogue. It is a cloud-based data integration service: it orchestrates and automates the movement and transformation of data. It works heavily on the data that you store.
Let's look at the flow of work.
Firstly we have the dataset.
A dataset is nothing but the data that you have in the data store — the data that needs to be processed. You then pass it through a pipeline.
Now what does the pipeline do? The pipeline performs an operation on the data that transforms it, which could be a data movement or a data transformation. Data transformation is possible with U-SQL, stored procedures, or Hive.
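As a sketch of how this looks in practice, a Data Factory pipeline is authored as JSON. A minimal pipeline with a single copy activity might look like the following (all names here are placeholders, not from the video):

```json
{
  "name": "ExamplePipeline",
  "properties": {
    "activities": [
      {
        "name": "CopyInputToOutput",
        "type": "Copy",
        "inputs":  [ { "referenceName": "InputDataset",  "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "OutputDataset", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "SqlSource" },
          "sink":   { "type": "BlobSink" }
        }
      }
    ]
  }
}
```

The `inputs` and `outputs` reference datasets, which in turn reference linked services — the layering described in the notes above.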
After this is done you get an output dataset.
This output dataset contains data in a structured form, because it has already been transformed and structured in the pipeline stage.
Then it is given to linked services such as Azure Data Lake, Blob Storage, or SQL.
Linked services store the connection information that is needed to connect to an external source. Finally, you need a gateway, which connects your on-premises data to the cloud.
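For illustration, a linked service is also defined as JSON holding the connection information. A sketch for an Azure SQL Database linked service, with placeholder server, database, and credentials:

```json
{
  "name": "AzureSqlLinkedService",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:<server>.database.windows.net,1433;Database=<db>;User ID=<user>;Password=<password>;Encrypt=true"
    }
  }
}
```

Datasets then reference this linked service by name instead of repeating the connection details.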
Here your data can be analyzed or visualized with a number of different analytics tools.
What exactly is Azure Data Lake?
Data Lake is a highly scalable, distributed storage and file system. It is located in the cloud and works with multiple external analytics frameworks such as Hadoop, Apache Spark, and so on.
Let's have a look at it.
Firstly you have the incoming data, which comes from mobile devices, video, the web, social media, and so on.
The data from all these sources is sent to the Azure Data Lake Store and then provided to external frameworks.
There are two main concepts when it comes to Data Lake: one is storage and the other is analytics. The storage can be of unlimited size — terabytes, even petabytes. It stores a wide variety of data, structured or unstructured.
The other concept is Data Lake Analytics.
Here are two examples of how analytics works with Data Lake. You can monitor and analyze real-time data — for example, the data you are getting from vehicles or buildings — to optimize operations, respond to certain events, and generate alerts if something goes wrong. You can also monitor fraudulent transactions on your credit card, identify the geographical position of the card, or track how many transactions have taken place with it.
Let's see how we can move data from a SQL database to Blob storage on the cloud.
Steps
Creating the database:
You need to make a database, and for that you need a tool known as SSMS (SQL Server Management Studio).
Download and install SSMS. It is used to create the database that we will be transferring to Blob storage.
Now go into the dashboard of the Microsoft Azure portal. We need to create a data warehouse.
All resources -- Create a resource -- Databases -- SQL Data Warehouse
It is very important to remember the names of everything you enter on this screen.
For the server, we need to create a new one.
Go to the notification, then Go to resource. Then switch over to SSMS, which we already have open.
In SSMS, create a table and insert two rows into it.
Then go to Storage accounts in the Azure portal.
This is where the Blob storage will live.
Click on Blobs.
We need to create a blob container.
After creating the blob container, we need to create the Data Factory. Go to services and search for Data Factory.
Here we will click Copy Data.
For the source, select the SQL Data Warehouse.
Now we can select the table, which is DemoTab.
Then choose our destination — the Blob storage we created — and establish the connection.
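Behind the scenes, the Copy Data wizard generates JSON definitions for each side of the copy. A sketch of what the Blob-side dataset could look like (names, container path, and format are assumptions for illustration):

```json
{
  "name": "BlobOutputDataset",
  "properties": {
    "type": "AzureBlob",
    "linkedServiceName": { "referenceName": "BlobStorageLinkedService", "type": "LinkedServiceReference" },
    "typeProperties": {
      "folderPath": "output/",
      "format": { "type": "TextFormat", "columnDelimiter": "," }
    }
  }
}
```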
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
https://www.youtube.com/watch?v=WBYRl_nEj-8
Example setup of Data Factory and Data Lake
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Copy data from Azure SQL to Azure Data Lake:
- Create a new data warehouse in Azure.
- Install SSMS (SQL Server Management Studio) on our system to connect to our Azure SQL database.
- Create a new server for the data warehouse (configure and deploy the server).
- Using SSMS, connect to the server created for the Azure data warehouse using SQL authentication.
Hands-on for the operation we discussed:
Go to the Azure portal.
Dashboard -- Create a resource -- Databases -- SQL Data Warehouse -- Create a new resource group
Create a new server.
In the data warehouse we have created, copy the server name.
Now, using SQL Server Management Studio, connect to the server.
Inside the data warehouse we have created, create a table.
Now create a new Data Lake Storage, then create a new Data Factory.
Specify the source data store and create a linked service for it, then specify the destination data store and create a linked service for it.
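Each side of the copy therefore gets a linked service (the connection) plus a dataset (the data's location and shape). As a sketch, a dataset for the warehouse table might look like this (linked service name is a placeholder; the table name follows the demo):

```json
{
  "name": "WarehouseSourceDataset",
  "properties": {
    "type": "AzureSqlDWTable",
    "linkedServiceName": { "referenceName": "WarehouseLinkedService", "type": "LinkedServiceReference" },
    "typeProperties": { "tableName": "dbo.DemoTab" }
  }
}
```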
Creating storage: create the Data Lake Storage.
All services -- Storage -- Data Lake Storage
Create a new Data Lake Storage.
Now create a Data Factory: Dashboard -- Data Factory.
After the deployment of the Data Factory, click Go to resource.
Click Author & Monitor.
Now select Copy Data.
Give the copy task a name.
Now select the source data store, which is the data warehouse.
Create a new linked service for it by clicking the button on the right-hand side.
If you want to connect an on-premises system, you need to select a self-hosted integration runtime. Since we are not using an on-premises server — we are using a cloud database only, and we have already created our server — this is not needed here.
Test the connection .
Then click Next. Now select the data warehouse we created.
Click the dataset (the table) we created.
Click Next, then choose the destination data store.
Select Data Lake Storage.
Create a new linked service for it.
Now specify the Azure Data Lake Store account. This linked service authenticates with a service principal ID and key.
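The Data Lake Store linked service is where the service principal credentials end up. A sketch of its JSON, with every value a placeholder to be filled from your own subscription:

```json
{
  "name": "DataLakeLinkedService",
  "properties": {
    "type": "AzureDataLakeStore",
    "typeProperties": {
      "dataLakeStoreUri": "https://<account>.azuredatalakestore.net/webhdfs/v1",
      "servicePrincipalId": "<application-client-id>",
      "servicePrincipalKey": { "type": "SecureString", "value": "<client-secret>" },
      "tenant": "<tenant-id>"
    }
  }
}
```

The client ID and secret come from the app registration steps described next.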
We will see how to create the service principal ID:
Come back to the Azure dashboard -- Azure Active Directory.
Click on App registrations.
Registering a new application creates the service principal.
Now this Application ID is our client ID; paste it into the linked service.
Now for the service principal key: go to Certificates & secrets.
Click New client secret.
The secret value is the password.
Copy this password into the linked service and test the connection.
The connection is not successful because of access privileges on the Data Lake Store.
Let's fix it:
Storage -- Data Lake Storage -- Data explorer -- Access tab
Add the application (service principal) with the required permissions.
Now test the connection again.
Continue through the wizard: connect the destination (the optional settings can be skipped), choose the file format, then click Next and skip through the remaining pages.
Now let's see whether the copying process is done. You can edit this pipeline, or monitor it too.
Now let's see how to load the data from Data Lake into Power BI.
Power BI is a cloud-based analytics service for analyzing and visualizing data.
Processes in Power BI: it is used for connecting to your data and for shaping the data.