AZURE - DATA FACTORY

 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 https://www.youtube.com/watch?v=i133n5y5DGo

 

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 

Azure Data Factory?

It stores data with the help of Data Lake Storage, which can hold any kind of data. You can analyze the data, transform it with the help of pipelines, and publish the organized data.


You can also visualize the data with third-party applications such as Apache Spark and Hadoop.

What exactly is Data Factory?

Data Factory falls under the integration category of services in the Azure catalogue. It is a cloud-based data integration service. It works on the data you have stored: it orchestrates and automates the movement and transformation of that data.

It works heavily on the data that you store. 

Let's see the flow of the work.

First, we have the dataset.

The dataset is simply the data that you have in the data store. This is the data that needs to be processed, so you pass it through a pipeline.

What does the pipeline do? The pipeline performs an operation on the data, which could be a data movement or a data transformation. Data transformation can be done with U-SQL, stored procedures, or Hive.


After this is done, you get an output dataset.


This output dataset contains data in a structured form, because it has already been transformed and structured in the pipeline stage.
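The flow so far can be pictured with a small, purely illustrative Python sketch. It does not call any Azure API. The function names (`run_pipeline`, `drop_empty_rows`, `normalize_names`) and the sample rows are made up for this example; the only point is that an input dataset passes through pipeline activities and comes out as a structured output dataset.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Activity = Callable[[List[Row]], List[Row]]


def drop_empty_rows(rows: List[Row]) -> List[Row]:
    """Hypothetical activity: discard rows that have no name."""
    return [row for row in rows if row.get("name", "").strip()]


def normalize_names(rows: List[Row]) -> List[Row]:
    """Hypothetical activity: trim whitespace and title-case the name."""
    return [{**row, "name": row["name"].strip().title()} for row in rows]


def run_pipeline(dataset: List[Row], activities: List[Activity]) -> List[Row]:
    """Pass the input dataset through each activity in order."""
    for activity in activities:
        dataset = activity(dataset)
    return dataset


# Made-up input dataset with one messy row and one empty row.
input_dataset = [
    {"id": "1", "name": "  alice "},
    {"id": "2", "name": ""},
    {"id": "3", "name": "BOB"},
]
output_dataset = run_pipeline(input_dataset, [drop_empty_rows, normalize_names])
print(output_dataset)
```

In Data Factory the activities would be things like a copy step or a Hive job rather than Python functions, but the shape of the flow is the same.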

Then it is given to linked services such as Azure Data Lake, Blob Storage, or SQL.

 




These linked services store the information that is needed to connect to an external source. After that, you need a gateway.
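To make "stores connection information" a bit more concrete, here is a minimal sketch of what a linked service definition can look like when written out as JSON. It is built as a plain Python dictionary. The layout (`name`, `properties`, `type`, `typeProperties`) and the `AzureBlobStorage` type name follow the Data Factory JSON format as I understand it, so check them against the official documentation. The service name and the connection string are placeholders.

```python
import json
from typing import Any, Dict


def blob_linked_service(name: str, connection_string: str) -> Dict[str, Any]:
    """Build a linked service definition shaped like Data Factory JSON."""
    return {
        "name": name,
        "properties": {
            # Assumed type name for a Blob Storage linked service.
            "type": "AzureBlobStorage",
            "typeProperties": {"connectionString": connection_string},
        },
    }


# Placeholder values only; never hard-code a real account key.
definition = blob_linked_service(
    "DemoBlobLinkedService",
    "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>",
)
print(json.dumps(definition, indent=2))
```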

And finally you have the cloud.


Here your data can be analyzed or visualized with a number of different analytics tools, such as Apache Spark and Hadoop.


What exactly is Azure Data Lake?

Data Lake is a storage system, or file system, that is highly scalable and distributed. It is located in the cloud and works with multiple analytics frameworks, including external frameworks such as Hadoop, Apache Spark, and so on.


Let's have a look at it.

First, you have the output dataset, which is data from mobile, video, web, social media, and so on.


The data from all these sources is sent to the Azure Data Lake Store, and then it is provided to external frameworks.


There are two main concepts when it comes to Data Lake: one is storage and the other is analytics. The storage can be of unlimited size, holding terabytes or petabytes of data. It stores a wide variety of data, which can be structured or unstructured.


The other concept is Data Lake Analytics.

There are two examples of how analytics works when it comes to Data Lake. First, you can monitor and analyze real-time data, for example the data you are getting from vehicles or buildings. This can be used to optimize how they work, respond to certain events, and generate alerts if something goes wrong. Second, you can monitor fraudulent transactions on your credit card, identify the geographical position of your card, or even track how many transactions have taken place with your card.
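As a toy illustration of the credit card example, the sketch below counts transactions per card and flags cards used in more than one country. This is not Data Lake Analytics code; the rules, the threshold, the `find_alerts` name, and the sample transactions are all invented here to show the kind of check such an analytics job might run.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Transaction = Tuple[str, str, float]  # (card_id, country, amount)


def find_alerts(transactions: List[Transaction], max_count: int = 3) -> Dict[str, List[str]]:
    """Return alert messages per card using two simple, made-up rules."""
    counts: Dict[str, int] = defaultdict(int)
    countries: Dict[str, Set[str]] = defaultdict(set)
    for card_id, country, _amount in transactions:
        counts[card_id] += 1
        countries[card_id].add(country)

    alerts: Dict[str, List[str]] = {}
    for card_id in counts:
        messages = []
        # Rule 1: too many transactions in the observed window.
        if counts[card_id] > max_count:
            messages.append(f"{counts[card_id]} transactions in window")
        # Rule 2: the same card was used in more than one country.
        if len(countries[card_id]) > 1:
            used_in = ", ".join(sorted(countries[card_id]))
            messages.append("used in multiple countries: " + used_in)
        if messages:
            alerts[card_id] = messages
    return alerts


sample = [
    ("card-1", "IN", 20.0),
    ("card-1", "IN", 35.5),
    ("card-2", "IN", 10.0),
    ("card-2", "US", 900.0),
]
alerts = find_alerts(sample)
print(alerts)
```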

Let's see how we can move data from a SQL database to Blob storage on the cloud.

Steps 

Creating the database with SSMS

You need to create a database, and for that you need software known as SSMS (SQL Server Management Studio).



Download and install it. It is used to create the database that we will be transferring to Blob storage.

Now go to the dashboard of Microsoft Azure. We need to create a data warehouse.

All resources > Create a resource > Databases

 

SQL Data Warehouse 

It is very important to remember the names of everything you enter on the creation screen.


For the server, we need to create a new one.

Go to the notification and click Go to resource. Then we go to SSMS, which is already open.

After that, create a table and insert two rows into it.
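The notes do not record the actual table definition, so here is a stand-in sketch of the "create a table and insert two rows" step. It uses Python's built-in `sqlite3` module with an in-memory database, which is not Azure SQL Data Warehouse; in SSMS you would run the equivalent `CREATE TABLE` and `INSERT` statements against your own server. The table name `DemoTab` and its columns are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database as a local stand-in; this is NOT Azure SQL Data Warehouse.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()

# Hypothetical table layout; the notes only say a table with two rows was created.
cursor.execute("CREATE TABLE DemoTab (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cursor.executemany(
    "INSERT INTO DemoTab (id, name) VALUES (?, ?)",
    [(1, "first row"), (2, "second row")],
)
connection.commit()

# Read the rows back to confirm the inserts worked.
rows = cursor.execute("SELECT id, name FROM DemoTab ORDER BY id").fetchall()
print(rows)
connection.close()
```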

Then go to Storage accounts in the Azure portal.

 


This is where the Blob storage will be created.

Click on Blobs.

We need to create a blob.

After creating the blob, we need to create the data factory. Go to All services and search for Data Factory.


And here we will click Copy Data 



Here we select SQL Data Warehouse.



Now we can select the table, which is Demo Tab.


Now move on to the destination, which is the Blob storage created earlier, and establish the connection to it.
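Behind the Copy Data wizard, Data Factory ends up with a pipeline that contains a copy activity. The sketch below builds such a definition as a Python dictionary, just to show the moving parts: one input dataset, one output dataset, a source type, and a sink type. The overall layout and the `SqlDWSource` and `BlobSink` type names follow the Data Factory JSON format as I understand it and should be checked against the documentation. The pipeline, activity, and dataset names are hypothetical.

```python
import json
from typing import Any, Dict


def copy_pipeline(name: str, source_dataset: str, sink_dataset: str) -> Dict[str, Any]:
    """Build a pipeline definition with a single Copy activity."""
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": "CopySqlDwToBlob",  # hypothetical activity name
                    "type": "Copy",
                    "inputs": [
                        {"referenceName": source_dataset, "type": "DatasetReference"}
                    ],
                    "outputs": [
                        {"referenceName": sink_dataset, "type": "DatasetReference"}
                    ],
                    "typeProperties": {
                        "source": {"type": "SqlDWSource"},  # assumed source type
                        "sink": {"type": "BlobSink"},  # assumed sink type
                    },
                }
            ]
        },
    }


pipeline = copy_pipeline("DemoCopyPipeline", "DemoTabDataset", "DemoBlobDataset")
print(json.dumps(pipeline, indent=2))
```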

 

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

https://www.youtube.com/watch?v=WBYRl_nEj-8

 Example setup of Data Factory and Data Lake

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 Copy data from Azure SQL to Azure Data Lake:

  •  We need to create a new data warehouse in Azure.
  •  Then we need to install SSMS (SQL Server Management Studio) on our system to connect to our Azure SQL database.
  •  Create a new server in the data warehouse (configure and deploy the server).
  •  Using SSMS, connect to the server created in the Azure data warehouse using SQL authentication.

 Hands-on for the operation we discussed.

 Go to the Azure portal.

Dashboard -- Create a resource -- Databases -- SQL Data Warehouse -- Create a new resource group.



Create a new server:


In the data warehouse we have created, copy the server name.


Now, using SQL Server Management Studio, connect to the server.
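The copied server name is the same value a program would use to reach the server. As a rough sketch, the helper below assembles an ODBC-style connection string for SQL authentication from the server name, database, user, and password. It only builds the string and does not connect to anything. The driver name, the server and database names, and the credentials are placeholders, and `build_connection_string` is a name made up for this example.

```python
def build_connection_string(server_name: str, database: str, user: str, password: str) -> str:
    """Assemble an ODBC-style connection string for SQL authentication."""
    parts = {
        "Driver": "{ODBC Driver 17 for SQL Server}",  # assumed driver name
        "Server": f"tcp:{server_name},1433",
        "Database": database,
        "Uid": user,
        "Pwd": password,
        "Encrypt": "yes",
        "TrustServerCertificate": "no",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


# Placeholder values; use the server name copied from the portal and your own login.
connection_string = build_connection_string(
    "demo-server.database.windows.net", "DemoWarehouse", "demo_admin", "<password>"
)
print(connection_string)
```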


Inside the data warehouse we have created, you need to create a table.


Now create a new Data Lake Storage, then create a new data factory.

Specify the source data store and create a linked service for it, then specify the destination data store and create a linked service for it.
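The relationship between a data store, its linked service, and a dataset can be sketched as two small JSON-style definitions, one for the source and one for the destination. Each dataset points at its linked service by name. The layout and the type names `AzureSqlDWTable` and `AzureDataLakeStoreFile` are my understanding of the Data Factory JSON format, so verify them before relying on them. All names, the table, and the folder path are hypothetical.

```python
import json
from typing import Any, Dict


def make_dataset(name: str, dataset_type: str, linked_service: str,
                 type_properties: Dict[str, Any]) -> Dict[str, Any]:
    """Build a dataset definition that references a linked service by name."""
    return {
        "name": name,
        "properties": {
            "type": dataset_type,
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            "typeProperties": type_properties,
        },
    }


# Source: a table in the data warehouse (assumed type and table name).
source_dataset = make_dataset(
    "SourceTable", "AzureSqlDWTable", "SqlDwLinkedService", {"tableName": "DemoTab"}
)
# Destination: a folder in the Data Lake store (assumed type and folder path).
sink_dataset = make_dataset(
    "LakeOutput", "AzureDataLakeStoreFile", "DataLakeLinkedService",
    {"folderPath": "demo/output"},
)
print(json.dumps([source_dataset, sink_dataset], indent=2))
```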

Creating storage: create the Data Lake storage.

All services -- Storage -- Data Lake Storage


Create a new Data Lake storage.


Now create a data factory: Dashboard -- Data factory


After the data factory is deployed, click Go to resource.

Click on Author & Monitor 


Now select Copy Data.

Assign the task a name.

Now select a source data store, which is the data warehouse.



Create a new linked service for it by clicking the button on the right-hand side, and choose the data warehouse.

If you want to connect to an on-premises system, you need to select a self-hosted integration runtime. We are not using an on-premises server. We are using a cloud database only, and we have already created our server.
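In the JSON behind a linked service, the choice of integration runtime shows up as an optional `connectVia` reference. The sketch below adds that reference only for the on-premises case and leaves a cloud linked service untouched. The `connectVia` layout reflects my understanding of the Data Factory format. The runtime name `SelfHostedIR`, the linked service contents, and the helper name are assumptions for illustration.

```python
import copy
from typing import Any, Dict


def with_integration_runtime(linked_service: Dict[str, Any], on_premises: bool,
                             runtime_name: str = "SelfHostedIR") -> Dict[str, Any]:
    """Return a copy of the linked service, adding connectVia only when on-premises."""
    result = copy.deepcopy(linked_service)
    if on_premises:
        result["properties"]["connectVia"] = {
            "referenceName": runtime_name,  # hypothetical runtime name
            "type": "IntegrationRuntimeReference",
        }
    return result


# Placeholder linked service for the cloud data warehouse.
cloud_service = {
    "name": "SqlDwLinkedService",
    "properties": {
        "type": "AzureSqlDW",
        "typeProperties": {"connectionString": "<connection string>"},
    },
}

cloud_result = with_integration_runtime(cloud_service, on_premises=False)
onprem_result = with_integration_runtime(cloud_service, on_premises=True)
print("cloud has connectVia:", "connectVia" in cloud_result["properties"])
print("on-premises has connectVia:", "connectVia" in onprem_result["properties"])
```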

Test the connection.


Then click Next and select the database we created.

Click the dataset we created.




This is the dataset. Click Next and move on to the destination data store.

Select Data Lake.


Create a new linked service for it.


Now enter the Azure Data Lake Storage details.


We will see how to create the service principal ID:

Come back to the Azure dashboard -- Azure Active Directory


Click on App registrations.




This shows how to create the service principal name.

The Application ID is our new client ID.

Paste it into the service principal ID field.


Now for the service principal key: go to Certificates & secrets.


Click New client secret.


The generated value is the password.


Copy this password, paste it into the service principal key field, and test the connection.
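The values collected in these steps (Application ID and client secret, plus the tenant ID of the directory) are what the Data Lake linked service needs. Here is a hedged sketch of that definition as a Python dictionary. The property names (`dataLakeStoreUri`, `servicePrincipalId`, `servicePrincipalKey`, `tenant`) and the `AzureDataLakeStore` type follow the Data Factory JSON format as I understand it. The account name, the IDs, and the `DEMO_CLIENT_SECRET` environment variable are placeholders invented for this example.

```python
import os
from typing import Any, Dict


def data_lake_linked_service(name: str, account_name: str, client_id: str,
                             client_secret: str, tenant_id: str) -> Dict[str, Any]:
    """Build a Data Lake linked service that authenticates with a service principal."""
    return {
        "name": name,
        "properties": {
            "type": "AzureDataLakeStore",  # assumed type name
            "typeProperties": {
                "dataLakeStoreUri": f"https://{account_name}.azuredatalakestore.net/webhdfs/v1",
                "servicePrincipalId": client_id,  # the Application (client) ID
                "servicePrincipalKey": {"type": "SecureString", "value": client_secret},
                "tenant": tenant_id,
            },
        },
    }


# Read the secret from a hypothetical environment variable instead of hard-coding it.
secret = os.environ.get("DEMO_CLIENT_SECRET", "<client-secret>")
lake_service = data_lake_linked_service(
    "DataLakeLinkedService", "demolake", "<application-id>", secret, "<tenant-id>"
)
# Print only non-secret fields.
print(lake_service["properties"]["type"])
print(lake_service["properties"]["typeProperties"]["dataLakeStoreUri"])
```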

The connection is not successful because of access privileges in the Data Lake store.

Let's fix it.

Storage -- Data Lake Storage --


Click Data explorer.


Click on the Access tab.


Add the application registered above so that it has access.


Now test the connection again.

Now lets connect the 

You can skip this .

Now the file format is 

Click Next and skip this page.

Next page.

Now let's see whether the copy process is done. You can edit this pipeline, or you can monitor it too.
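Checking whether the copy is done amounts to polling the run status until it reaches a final state. The sketch below shows that loop in isolation. The status lookup is passed in as a function, and here it is faked with a fixed sequence, so nothing talks to Azure. The status strings match the ones the monitoring view shows, as far as I know. `wait_for_run` and the `TimedOut` label are made up for this example.

```python
import time
from typing import Callable

# Run states treated as final in this sketch.
TERMINAL_STATES = {"Succeeded", "Failed", "Cancelled"}


def wait_for_run(get_status: Callable[[], str], poll_seconds: float = 0.0,
                 max_polls: int = 20) -> str:
    """Poll get_status until the run reaches a terminal state or polls run out."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
    # Not a Data Factory status; just this sketch's label for giving up.
    return "TimedOut"


# Hypothetical stand-in for a real status lookup: replays a fixed sequence.
fake_statuses = iter(["Queued", "InProgress", "InProgress", "Succeeded"])
final_status = wait_for_run(lambda: next(fake_statuses))
print(final_status)
```

With a real status lookup you would pass a longer `poll_seconds` so the loop does not hammer the service.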

Now let's see how to load the data from Data Lake into Power BI.

Power BI is a cloud-based analytics service for analyzing and visualizing data.

Let's discuss the processes in Power BI: it is used for connecting to your data and for shaping the data.































 

 

 

 



