Tag - ETL-Model

Effortless Data Migration Using MySQL Federated Engine
Mar 13, 2024

Scenario: If someone say you Hey, can you transfer one of MySQL data to another MySQL data and we think about SSIS or other Thing if yes then these article made for you to reduce your effort and save your time Introduction: In the dynamic landscape of database management, the need to seamlessly access and integrate data from multiple sources has become paramount. Whether it's consolidating information from disparate servers or synchronizing databases for backup and redundancy, MySQL offers a robust solution through its querying capabilities. In this guide, we delve into the art of fetching data from one MySQL server to another using SQL queries. This method, often overlooked in favor of complex data transfer mechanisms, provides a streamlined approach to data migration, enabling developers and database administrators to efficiently manage their resources. Through a combination of MySQL's versatile querying language and the innovative use of the FEDERATED storage engine, we'll explore how to establish connections between servers, replicate table structures, and effortlessly transfer data across the network. From setting up the environment to executing queries and troubleshooting common challenges, this tutorial equips you with the knowledge and tools to navigate the intricacies of cross-server data retrieval with ease. As we know We gonna use FEDERATED feature of MySQL workbench so first we need to check that our workbench support FEDERATED engine or not?   Simply open workbench and run below code show engines;   It shows all engines and check our system support FEDERATED OR NOT   If your system also not support don't worry we gonna enable it Open your folder where you save MySQL serve file In my case it in my C drive C>ProgramData>MySQL>MySQL Server 8.0>my.ini    open it in notepad++ or preferable software    Insert FEDERATED key word in script like below   Now need to restart MySQL Press Window+R button and paste services.msc press ok> find MySQL and restart it Now go to workbence and run show engines;  code   Now your FEDERATED engine get supported It show like below   Now our system Support FEDERATED engine This same process need to apply on destination side because both server (from source to destination server) need to support FEDERATED engine Now we make sure to we have permission of access source server for that we need to make user and and give permission of database and tables   Below code demonstrate to make user and give permission to user CREATE USER 'hmysql'@'192.168.1.173' IDENTIFIED BY 'Hardik...'; GRANT ALL PRIVILEGES ON *.* TO 'hmysql'@'192.168.1.173' WITH GRANT OPTION; FLUSH PRIVILEGES;   Now make connection of that user(we make above on source side) on destination server(our system)    Click on plus(+) icon as shown in image and fill all detail   Below image is for detail of user connection   After filling details our user added like below image   Go to user(hardikmysql) and find from which table we want to take data using MySQL query    Here i am taking 'actor' table from 'sakila' database which look like below   Now we need to run FEDERATED query on our system(destination server) with url string   Our MySQL query like below CREATE TABLE `actor` ( `actor_id` smallint unsigned NOT NULL AUTO_INCREMENT, `first_name` varchar(45) NOT NULL, `last_name` varchar(45) NOT NULL, `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`actor_id`), KEY `idx_actor_last_name` (`last_name`) ) ENGINE=FEDERATED default charset=utf8mb4 CONNECTION='mysql://hmysql:[email protected]:3306/sakila/actor';   Here main part is below ENGINE=FEDERATED default charset=utf8mb4 CONNECTION='mysql://hmysql:[email protected]:3306/sakila/actor';   Here 'mysql' is mandatory for connection string you can not use other word. 'hmysql' is user name 'Hardik...'  is password for user '192.168.1.173' is server adderess '3306' is port number 'sakila' is database name 'actor' is table name   Now run above table code and you get data in our system(destination server)    

Azure Databricks CSV to SQL
Apr 14, 2023

In this blog, we will explore Azure Databricks, a cloud-based analytics platform, and how it can be used to parse a CSV file from Azure storage and then store the data in a database. Additionally, we will also learn how to process stream data and use Databricks notebook in Azure Data Pipeline.   Azure Databricks Overview Azure Databricks is an Apache Spark-based analytics platform that provides a collaborative workspace for data scientists, data engineers, and business analysts. It is a cloud-based service that is designed to handle big data and allows users to process data at scale. Databricks also provides tools for data analysis, machine learning, and visualization. With its integration with Azure Storage, Azure Data Factory, and other Azure services, Azure Databricks can be used to build end-to-end data processing pipelines.   Parsing CSV File from Azure BlobStorage to Database using Azure Databricks Azure Databricks can be used to parse CSV files from Azure Storage and then store the data in a database. Here are the steps to accomplish this:   Configure Various Azure Components 1. Create Azure Resource Group Image 1 2. Create Azure DataBricks Resource  Image 2 3. Create SQL Server Resource  Image 3 4. Create SQL Database Resource Image 4 5. Create Azure Storage Account  Image 5 6. Create Azure DataFactory Resource  Image 6 7. Launch Databricks Resource Workspace  Image 7 8. Create Computing Cluster  Image 8 9. Create New Notebook  Image 9   Parsing CSV File from Azure Storage to Database using Azure Databricks Azure Databricks can be used to parse CSV files from Azure Storage and then store the data in a database. Here are the steps to accomplish this: 1. Create a cluster: First, create a cluster in Azure Databricks as above. A cluster is a group of nodes that work together to process data. 2. Import all the necessary models in the databricks notebook  %python from datetime import datetime, timedelta from azure.storage.blob import BlobServiceClient, generate_blob_sas, BlobSasPermissions import pandas as pd import pymssql import pyspark.sql Code 1 3. Mount Azure Storage: Next, mount the Azure Storage account in Databricks as follows #Configure Blob Connection storage_account_name = "storage" storage_account_access_key="***********************************" blob_container = "blob-container" Code 2 4. Establish The DataBase Connection #DB connection conn = pymssql.connect(server='****************.database.windows.net', user='*****', password='*****', database='DataBricksDB') cursor = conn.cursor() Code 3 5. Parse CSV file: Once the storage account is mounted, you can parse the CSV file using the following code #get a list of all blob from the container blob_list = [] for blob_i in container_client.list_blobs(): blob_list.append(blob_i.name) # print(blob_list)      df_list = [] #Generate SAS key for each file and load to the dataframe  for blob_i in blob_list:     print(blob_i)     sas_i = generate_blob_sas(account_name = storage_account_name,                              container_name = blob_container,                              blob_name = blob_i,                              account_key = storage_account_access_key,                              permission = BlobSasPermissions(read=True),                              expiry = datetime.utcnow() + timedelta(hours=12))       sas_url = 'https://' + storage_account_name +'.blob.core.windows.net/' + blob_container + '/' +blob_i     print(sas_url)          df=pd.read_csv(sas_url)     df_list.append(df) Code 4 6. Transform and Store data in a database: Finally, you can store the data in a database using the following code #Truncate Table Sales Truncate_Query = "IF EXISTS (SELECT * FROM sysobjects WHERE name='sales' and xtype='U') truncate table sales" cursor.execute(Truncate_Query) conn.commit()   # SQL Query For Table Creation create_table_query = "IF NOT EXISTS (SELECT * FROM sysobjects WHERE name='sales' and xtype='U') CREATE TABLE sales (REGION  varchar(max),COUNTRY  varchar(max),ITEMTYPE  varchar(max),SALESCHANNEL  varchar(max),ORDERPRIORITY  varchar(max),ORDERDATE  varchar(max),ORDERID  varchar(max),SHIPDATE  varchar(max),UNITSSOLD  varchar(max),UNITPRICE  varchar(max),UNITCOST  varchar(max),TOTALREVENUE  varchar(max),TOTALCOST  varchar(max),TOTALPROFIT  varchar(max))IF NOT EXISTS (SELECT * FROM sysobjects WHERE name='sales' and xtype='U') CREATE TABLE sales (REGION  varchar(max),COUNTRY  varchar(max),ITEMTYPE  varchar(max),SALESCHANNEL  varchar(max),ORDERPRIORITY  varchar(max),ORDERDATE  varchar(max),ORDERID  varchar(max),SHIPDATE  varchar(max),UNITSSOLD  varchar(max),UNITPRICE  varchar(max),UNITCOST  varchar(max),TOTALREVENUE  varchar(max),TOTALCOST  varchar(max),TOTALPROFIT  varchar(max))" cursor.execute(create_table_query) conn.commit()   #Insert Data From Main DataFrame for rows in df_combined.itertuples(index=False,name=None):     row = str(list(rows))     row_data = row[1:-1]     row_data = row_data.replace("nan","''")     row_data = row_data.replace("None","''") insert_query = "insert into sales (REGION,COUNTRY,ITEMTYPE,SALESCHANNEL,ORDERPRIORITY,ORDERDATE,ORDERID,SHIPDATE,UNITSSOLD,UNITPRICE,UNITCOST,TOTALREVENUE,TOTALCOST,TOTALPROFIT) values ("+row_data+")"     print(insert_query)     cursor.execute(insert_query) conn.commit() Code 5 As, Shown here The data from all the files is loaded to the SQL server Table Image 10   Azure Databricks notebook can be used to process stream data in Azure Data Pipeline. Here are the steps to accomplish this: 1. Create a Databricks notebook: First, create a Databricks notebook in Azure Databricks. A notebook is a web-based interface for working with code and data. 2. Create a job: Next, create a job in Azure Data Factory to execute the notebook. A job is a collection of tasks that can be scheduled and run automatically. 3. Configure the job: In the job settings, specify the Azure Databricks cluster and notebook that you want to use. Also, specify the input and output datasets. 4. Write the code: In the Databricks notebook, write the code to process the stream data. Here is an example code: #from pyspark.sql.functions import window stream_data = spark.readStream \     .format("csv") \     .option("header", "true") \     .schema("<schema>") \     .load("/mnt/<mount-name>/<file-name>.csv")   stream_data = stream_data \     .withWatermark("timestamp", "10 minutes") \     .groupBy(window("timestamp", "10 Code 6   How To Use Azure Databrick notebook in Azure Data Factory pipeline and configure the DataFlow Pipeline Using it. Image 11 1. Create ADF Pipeline  Image 12 2. Configure Data Pipeline  Image 13 3. Add Trigger To the PipeLine  Image 14 4. Configure the trigger  Image 15   These capabilities make Azure Databricks an ideal platform for building real-time data processing solutions. Overall, Azure Databricks provides a scalable and flexible solution for data processing and analytics, and it's definitely worth exploring if you're working with big data on the Azure platform. With its powerful tools and easy-to-use interface, Azure Databricks is a valuable addition to any data analytics toolkit.