Category - AWS

Data Storage in AWS S3 Bucket with Glue
Feb 29, 2024

In today's data-driven world, efficient storage and management of data are paramount for businesses of all sizes. There are now more data sources than ever, and the need for analytics has grown rapidly, so a reliable and scalable data management setup is essential. Amazon Web Services (AWS) offers a robust set of tools for this. They include Amazon Simple Storage Service (S3), a highly durable and scalable object storage service, and AWS Glue, a fully managed extract, transform, and load (ETL) service whose jobs are defined as scripts.

In this blog, we'll discuss the process of storing and archiving data using an AWS S3 bucket and an AWS Glue script. We'll explore the benefits of this approach and provide a step-by-step guide to help you set up your own data storage and archiving solution.

Setting Up Data Storage and Archiving with AWS S3 and AWS Glue: Let's learn how to create an S3 bucket and an AWS Glue script.

Step 1) Create an AWS S3 Bucket: Create an S3 bucket in the AWS Management Console. Choose a globally unique bucket name, select the appropriate Region and access settings for your requirements, and configure the necessary permissions. This bucket will be used to store the archived data.

Step 2) Configure Lifecycle Policies: Once the bucket is created, we can manage the lifecycle and replication policies of the stored objects. Open the bucket, go to the Management tab, and configure lifecycle and replication rules as required. Lifecycle rules can move objects to cheaper storage classes (for example, S3 Standard-IA or S3 Glacier) or delete them after a certain period.

Step 3) Develop an AWS Glue Script: Now that we have a place to store the archived data, the next step is to develop an AWS Glue script that performs the necessary ETL operations on your data. This may include extracting data from various sources, transforming it into the desired format, and loading it into your S3 bucket.
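Steps 1 and 2 can also be scripted instead of clicked through in the console. The following is a minimal sketch using boto3's S3 client. The bucket name, Region, `archive/` prefix and the 30/90/365-day schedule are illustrative assumptions, not values from this guide. The lifecycle rule is built by a plain function so it can be checked without AWS credentials. boto3 is only imported when the bucket is actually created.

```python
# Minimum object age (days) S3 accepts for a lifecycle transition to STANDARD_IA
MIN_STANDARD_IA_DAYS = 30


def build_lifecycle_configuration(prefix="archive/", ia_days=30, glacier_days=90, expire_days=365):
    """Return a lifecycle configuration in the shape expected by
    boto3's put_bucket_lifecycle_configuration(LifecycleConfiguration=...).
    The prefix and day counts are assumed example values."""
    if ia_days < MIN_STANDARD_IA_DAYS:
        raise ValueError("objects must be at least 30 days old before moving to STANDARD_IA")
    if not ia_days < glacier_days < expire_days:
        raise ValueError("expected ia_days < glacier_days < expire_days")
    return {
        "Rules": [
            {
                "ID": "archive-tiering",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_days},
            }
        ]
    }


def create_archive_bucket(bucket_name, region):
    """Create the bucket (Step 1) and attach the lifecycle rule (Step 2)."""
    import boto3  # imported lazily so the builder above works without the AWS SDK

    s3 = boto3.client("s3", region_name=region)
    create_args = {"Bucket": bucket_name}
    if region != "us-east-1":  # us-east-1 must not send a LocationConstraint
        create_args["CreateBucketConfiguration"] = {"LocationConstraint": region}
    s3.create_bucket(**create_args)
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=build_lifecycle_configuration(),
    )
```

Keeping the rule in a separate function makes it easy to review or version-control the retention policy independently of the call that applies it.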
AWS Glue supports Python (PySpark) for defining ETL jobs (Scala is also supported), which makes it flexible and easy to use for developers and data engineers.

Here's a detailed breakdown of how to develop an AWS Glue script:

Create a Glue Job: In the AWS Glue console, navigate to the "Jobs" section and create a new job (older versions of the console label this button "Add job"). Give the job a name and select an IAM role that grants Glue the permissions it needs to access your data sources and write to the S3 bucket. AWS also provides AWS Glue Studio, a visual, no-code interface for creating, running, and monitoring Glue jobs.

Define Data Sources & Destination: Identify the data sources you'll be working with. These can include relational databases, data lakes, or streaming sources such as Amazon Kinesis. AWS Glue supports a wide range of data sources, so you can extract data from diverse platforms.

We only have to configure the source, any transformations, and the destination with the proper connection details. For example, for a relational database source we provide either a JDBC connection to the server or a Data Catalog table. Once the connection succeeds, we can enter the schema and object name and preview the data in the console.

After the source and destination are configured, AWS Glue automatically generates the ETL script, which we can view in the Script tab.

Write Glue Script: We can also write the Python code that defines our ETL operations directly in the script editor. Let's understand it with an example. We will connect to a SQL Server database and execute a stored procedure (SP) that loads data into a table. We will then archive the data from that table into our S3 bucket.
The structure of the script is shown below:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from py4j.java_gateway import java_import

# Placeholder names: the same Glue connection is used for the SP call and the table read
CONNECTION_NAME = "ConnectionName"
DB_NAME = "DB_NAME"

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read the JDBC URL and credentials stored in the Glue connection
source_jdbc_conf = glueContext.extract_jdbc_conf(CONNECTION_NAME)

java_import(sc._gateway.jvm, "java.sql.Connection")
java_import(sc._gateway.jvm, "java.sql.DatabaseMetaData")
java_import(sc._gateway.jvm, "java.sql.DriverManager")
java_import(sc._gateway.jvm, "java.sql.SQLException")

# Run the stored procedure that loads the data into Data_Table
conn = sc._gateway.jvm.DriverManager.getConnection(
    source_jdbc_conf.get("url") + ";databaseName=" + DB_NAME,
    source_jdbc_conf.get("user"),
    source_jdbc_conf.get("password"),
)
try:
    # Assumes the procedure takes no parameters. If it does, use "(?)" placeholders
    # and bind each one (e.g. cstmt.setString(1, value)) before calling execute().
    cstmt = conn.prepareCall("{call dbo.sptoGetthedatandtransferintotable}")
    cstmt.execute()
    cstmt.close()
finally:
    conn.close()

# Script generated for node SQL Server table
SQLServertable_node1 = glueContext.create_dynamic_frame.from_options(
    connection_type="sqlserver",
    connection_options={
        "useConnectionProperties": "true",
        "dbtable": "Data_Table",
        "connectionName": CONNECTION_NAME,
    },
    transformation_ctx="SQLServertable_node1",
)

# Script generated for node S3 bucket (S3 bucket names cannot contain underscores)
S3bucket_node3 = glueContext.write_dynamic_frame.from_options(
    frame=SQLServertable_node1,
    connection_type="s3",
    format="glueparquet",
    connection_options={"path": "s3://s3-newcreatedbucket/"},
    format_options={"compression": "snappy"},
    transformation_ctx="S3bucket_node3",
)

job.commit()

Step 4) Version Control: AWS Glue also supports version control of job scripts through Git integration, so we can track and manage changes.
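If you prefer to automate the job rather than use the console, the sketch below creates it with boto3's Glue client, starts a run, and polls the run state. Steps 5 and 6 describe doing the same actions from the console. All of the following are hypothetical examples and assumptions:

- The job name, IAM role ARN and script location.
- The choice of Glue 4.0 with two G.1X workers.
- The 30-second polling interval.

The polling loop takes a callable, so it can be exercised without AWS.

```python
import time

# Run states after which a Glue job run will not change any more
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def build_job_definition(job_name, role_arn, script_s3_path, connection_name):
    """Keyword arguments for boto3's glue.create_job() for a Spark ETL job.
    Worker type/count and Glue version are assumed example values."""
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",
        "WorkerType": "G.1X",
        "NumberOfWorkers": 2,
        # The JDBC connection the script reads from (CONNECTION_NAME in the script)
        "Connections": {"Connections": [connection_name]},
        # Publishes job metrics to CloudWatch (useful for Step 6)
        "DefaultArguments": {"--enable-metrics": "true"},
    }


def wait_for_run(get_state, poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Call get_state() until it returns a terminal state; raise TimeoutError otherwise."""
    waited = 0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"job run still {state} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds


def create_and_run(job_name, role_arn, script_s3_path, connection_name, region):
    import boto3  # imported lazily so the helpers above work without the AWS SDK

    glue = boto3.client("glue", region_name=region)
    glue.create_job(**build_job_definition(job_name, role_arn, script_s3_path, connection_name))
    run_id = glue.start_job_run(JobName=job_name)["JobRunId"]
    return wait_for_run(
        lambda: glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]["JobRunState"]
    )
```

For scheduled runs, a Glue trigger or an Amazon EventBridge schedule would replace the explicit `start_job_run` call.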
Step 5) Run the Glue Job: Once you're satisfied with the script's functionality, you can run the Glue job on demand or schedule it to run at specific intervals. AWS Glue will execute the script, extract data from the defined sources, perform the transformations, and load the transformed data into the specified S3 bucket.

Step 6) Monitor Job Execution: Monitor job runs in the AWS Glue console or through Amazon CloudWatch. You can track metrics such as run time, success/failure status, and resource utilization to make sure your ETL processes are running smoothly.

After following these steps, you should be able to store and archive data in an S3 bucket using an AWS Glue script. Before wrapping up, let's look at the benefits of AWS S3 and AWS Glue.

Benefits of Using AWS S3 and AWS Glue:
Scalability: AWS S3 provides virtually unlimited storage capacity, and individual objects can be up to 5 TB in size, so you can scale your storage seamlessly as your data grows.
Durability: S3 is designed for 99.999999999% (eleven nines) durability of stored objects. If you store 100 billion objects in S3, you can on average expect to lose about one object per year (10^11 objects × 10^-11 annual loss rate = 1). This makes your data highly resilient against loss.
Cost-effectiveness: With AWS S3 you pay only for the storage you use (plus requests and data transfer), with no capacity to provision up front. This makes it a cost-effective solution for businesses of all sizes.
Simplified Management: AWS Glue automates data discovery, transformation, and loading, streamlining data management and reducing the need for manual intervention.
Integration: Both AWS S3 and AWS Glue integrate with other AWS services, such as Amazon RDS, Amazon Redshift, Amazon Athena, and Amazon EMR, allowing you to build comprehensive data pipelines and analytics workflows.
Availability: S3 Standard stores data redundantly across multiple devices in at least three Availability Zones, so the failure of a disk, or even of a whole facility, does not make your data inaccessible. It is designed for 99.99% availability, so your data is there whenever you need it.

To summarize, by leveraging AWS S3 and AWS Glue you can build a robust data storage and archiving solution that is scalable, durable, and cost-effective. Whether you're dealing with large volumes of data or need to automate the archiving of historical data, AWS provides the tools and services you need to streamline your data management workflows. Start exploring the possibilities today and unlock the full potential of your data with AWS.

Thank you for your visit. We hope this blog was helpful and that you found what you were looking for. Best of luck!

AWS Cognito Login: Easy Setup Tips
Jan 31, 2024

To set up AWS Cognito for the registration/login flow, follow these steps:

First Flow: User Registration in Cognito

1. Install the following NuGet packages in your .NET project:
<PackageReference Include="Amazon.AspNetCore.Identity.Cognito" Version="3.0.1" />
<PackageReference Include="Amazon.Extensions.Configuration.SystemsManager" Version="5.0.0" />
<PackageReference Include="AWSSDK.SecretsManager" Version="3.7.101.27" />

Declare the AWS configuration values in appsettings.json. The Cognito identity provider reads them from the "AWS" section:
"AWS": {
  "Region": "me-south-1",
  "UserPoolClientId": "UserPoolClientId",
  "UserPoolClientSecret": "UserPoolClientSecret",
  "UserPoolId": "me-south-pool"
}

Additional configuration: register the Cognito identity services and authentication in the Program/Startup file to enable sign-in with Cognito. For example, use services.AddCognitoIdentity() and app.UseAuthentication().

2. Inject CognitoUserPool and CognitoUserManager<CognitoUser> into the controller, and create a CognitoUser object for the new login ID:
private readonly CognitoUserPool _pool;
private readonly CognitoUserManager<CognitoUser> _userManager;

var user = _pool.GetUser(registerUserRequest.LoginId);

3. Add user attributes (email, phone number, custom attributes) using user.Attributes.Add(). Custom attributes such as custom:branch_code must first be defined in the user pool.
user.Attributes.Add(CognitoAttribute.Email.AttributeName, registerUserRequest.Email);
user.Attributes.Add(CognitoAttribute.PhoneNumber.AttributeName, registerUserRequest.Mobile);
user.Attributes.Add("custom:branch_code", registerUserRequest.BranchCode);
user.Attributes.Add("custom:preferred_mode", preferedMode);

4. Create the user:
var cognitoResponse = await _userManager.CreateAsync(user, registerUserRequest.Password);
Check cognitoResponse.Succeeded to determine whether the user was created successfully. If it was not, cognitoResponse.Errors lists the reasons.
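When the app client has a secret (UserPoolClientSecret above), Cognito expects a SECRET_HASH with each request. The .NET library handles this for you. If you call the Cognito API directly, you must compute it yourself: Base64(HMAC-SHA256(key = client secret, message = username + client ID)).

Below is a minimal Python sketch using boto3's `cognito-idp` client and its `sign_up` action. It assumes the same attributes as above. The client ID, secret and attribute values in the example are hypothetical.

```python
import base64
import hashlib
import hmac


def secret_hash(username: str, client_id: str, client_secret: str) -> str:
    """Base64(HMAC-SHA256(key=client_secret, message=username + client_id))."""
    digest = hmac.new(
        client_secret.encode("utf-8"),
        (username + client_id).encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def sign_up_kwargs(client_id, client_secret, username, password, email, phone, branch_code):
    """Keyword arguments for boto3's cognito-idp sign_up() call."""
    return {
        "ClientId": client_id,
        "SecretHash": secret_hash(username, client_id, client_secret),
        "Username": username,
        "Password": password,
        "UserAttributes": [
            {"Name": "email", "Value": email},
            {"Name": "phone_number", "Value": phone},  # E.164 format: +<country code><number>
            {"Name": "custom:branch_code", "Value": branch_code},  # must exist in the pool schema
        ],
    }


def register(region, **kwargs):
    import boto3  # imported lazily so the helpers above work without the AWS SDK

    client = boto3.client("cognito-idp", region_name=region)
    return client.sign_up(**sign_up_kwargs(**kwargs))
```

A SHA-256 digest is 32 bytes, so the Base64 SECRET_HASH is always 44 characters long. That makes a quick sanity check when debugging "Unable to verify secret hash" errors.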
Second Flow: User Login with Cognito

1. Find the user in Cognito by login ID:
var cognitoUser = await _userManager.FindByIdAsync(loginUserRequest.LoginId);

2. Build the SRP authentication request with the user's password:
var authRequest = new InitiateSrpAuthRequest
{
    Password = loginUserRequest.Password
};

3. Call StartWithSrpAuthAsync. The response carries the session ID (authResponse.SessionID) and, if MFA is enabled, the challenge to answer next:
var authResponse = await cognitoUser.StartWithSrpAuthAsync(authRequest);

4. If MFA is enabled in the user pool, validate the one-time code sent to the user (SMS MFA in this example):
var mfaRequest = new RespondToMfaRequest
{
    SessionID = validateLoginUserRequest.SessionId,
    MfaCode = validateLoginUserRequest.Otp,
    ChallengeNameType = ChallengeNameType.SMS_MFA
};
authResponse = await cognitoUser.RespondToMfaAuthAsync(mfaRequest);

Extract the tokens from the Cognito response (AccessToken is also available):
authResponse.AuthenticationResult.IdToken
authResponse.AuthenticationResult.RefreshToken

Forgot Password Flow

1. Find the user by LoginId and call ForgotPasswordAsync. Cognito then sends a confirmation code to the user:
var user = await _userManager.FindByIdAsync(loginUserRequest.LoginId);
await user.ForgotPasswordAsync();

2. Confirm the new password with the code the user received:
await user.ConfirmForgotPasswordAsync(confirmationCode, newPassword);

These are the main AWS Cognito authentication methods; use them as your application requires.
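The MFA and forgot-password steps map to Cognito's RespondToAuthChallenge and ConfirmForgotPassword API actions. The sketch below builds those requests for boto3's `cognito-idp` client, assuming an app client with a secret. The identifiers in the example (client ID, secret, session, codes) are hypothetical. `extract_tokens` returns None while another challenge is still pending.

```python
import base64
import hashlib
import hmac


def secret_hash(username, client_id, client_secret):
    """Base64(HMAC-SHA256(key=client_secret, message=username + client_id))."""
    digest = hmac.new(client_secret.encode("utf-8"),
                      (username + client_id).encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def sms_mfa_response_kwargs(client_id, client_secret, session, username, code):
    """Arguments for respond_to_auth_challenge() answering an SMS_MFA challenge."""
    return {
        "ClientId": client_id,
        "ChallengeName": "SMS_MFA",
        "Session": session,  # session returned with the challenge
        "ChallengeResponses": {
            "USERNAME": username,
            "SMS_MFA_CODE": code,
            "SECRET_HASH": secret_hash(username, client_id, client_secret),
        },
    }


def confirm_forgot_password_kwargs(client_id, client_secret, username, code, new_password):
    """Arguments for confirm_forgot_password() after forgot_password() sent a code."""
    return {
        "ClientId": client_id,
        "SecretHash": secret_hash(username, client_id, client_secret),
        "Username": username,
        "ConfirmationCode": code,
        "Password": new_password,
    }


def extract_tokens(response):
    """Return the tokens from an auth response, or None if a challenge is pending."""
    result = response.get("AuthenticationResult")
    if result is None:
        return None
    return {name: result.get(name) for name in ("IdToken", "AccessToken", "RefreshToken")}


def call_cognito(region, operation, kwargs):
    import boto3  # imported lazily; e.g. operation="respond_to_auth_challenge"

    client = boto3.client("cognito-idp", region_name=region)
    return getattr(client, operation)(**kwargs)
```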

Quick Tips: Managing Expired Tokens
Jan 01, 2024

Here, I will explain how to prevent users from using expired tokens in a .NET Core application. Checking token expiration is crucial for the security of your application.

Here's a general outline of how you can achieve this:

1. Configure Token Expiration: When generating a token such as a JWT, set an expiration time for it. This is typically done during token creation. For a JWT, the Expires value becomes the exp claim:
var tokenDescriptor = new SecurityTokenDescriptor
{
    Expires = DateTime.UtcNow.AddMinutes(30) // Set expiration time (use UTC)
};

2. Register the Token Validation Middleware: Add middleware that validates the token, including its expiration time, on each request. Register it in the Startup or Program file:
public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
{
    app.UseMiddleware<TokenExpirationMiddleware>();
}

3. Token Expiration Middleware: Implement the middleware that validates the token. Note the following validation parameters:
ValidateIssuerSigningKey: set to true, so the token's signature is verified against the issuer signing key.
IssuerSigningKey: the symmetric key (a byte array) used both to sign and to verify the JWT.
ValidateIssuer and ValidateAudience: set to false, so issuer and audience validation is skipped.
ClockSkew: set to TimeSpan.Zero, so there is no tolerance for clock differences. With the default skew of five minutes, a token is still accepted for up to five minutes after it expires. With zero skew, it is rejected as soon as its exp time has passed.
ValidateLifetime is true by default, so ValidateToken itself rejects an expired token by throwing SecurityTokenExpiredException (a SecurityTokenException).
using System;
using System.IdentityModel.Tokens.Jwt;
using System.Linq;
using System.Net;
using System.Text;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.IdentityModel.Tokens;

public class TokenExpirationMiddleware
{
    private readonly RequestDelegate _next;

    public TokenExpirationMiddleware(RequestDelegate next)
    {
        _next = next;
    }

    public async Task Invoke(HttpContext context)
    {
        // Read the bearer token from the Authorization header, if present
        var token = context.Request.Headers["Authorization"].FirstOrDefault()?.Split(" ").Last();
        if (token != null)
        {
            var tokenHandler = new JwtSecurityTokenHandler();
            // Replace with the issuer's actual secret key; HS256 needs at least 32 bytes (256 bits)
            var key = Encoding.ASCII.GetBytes("YourSecretKey");
            var tokenValidationParameters = new TokenValidationParameters
            {
                ValidateIssuerSigningKey = true,
                IssuerSigningKey = new SymmetricSecurityKey(key),
                ValidateIssuer = false,
                ValidateAudience = false,
                ClockSkew = TimeSpan.Zero
            };
            try
            {
                // Validates the signature and, since ValidateLifetime defaults to true, the expiry
                tokenHandler.ValidateToken(token, tokenValidationParameters, out var securityToken);
                // Explicit expiry check: ValidTo is in UTC, so compare it with DateTime.UtcNow
                if (securityToken is JwtSecurityToken jwtSecurityToken && jwtSecurityToken.ValidTo < DateTime.UtcNow)
                {
                    context.Response.StatusCode = (int)HttpStatusCode.Unauthorized;
                    return;
                }
            }
            catch (Exception ex) when (ex is SecurityTokenException || ex is ArgumentException)
            {
                // Expired, wrongly signed, or malformed token
                context.Response.StatusCode = (int)HttpStatusCode.Unauthorized;
                return;
            }
        }
        await _next(context);
    }
}

This works as expected for a token that is still within its validity period. When an expired token is supplied, the middleware responds with 401 Unauthorized. You can also inspect a token's expiration (exp) claim at https://jwt.io/. By following these steps, you can ensure that users cannot use expired tokens in your .NET Core application.
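To make the checks above concrete outside ASP.NET, here is a small standard-library Python sketch. It issues and validates an HS256 JWT, verifying the HMAC signature and then applying the same expiry rule the middleware relies on. A token counts as expired once the current time is later than exp + clock skew, so a skew of 0 mirrors ClockSkew = TimeSpan.Zero.

This is an illustration only. It does not reproduce every check JwtSecurityTokenHandler performs (for example nbf, issuer and audience), and the names used are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims: dict, key: bytes) -> str:
    """Create a compact JWT (header.payload.signature) signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url_encode(signature)


class TokenRejected(Exception):
    """Raised where the middleware would answer 401 Unauthorized."""


def validate_hs256(token: str, key: bytes, clock_skew: int = 0, now: Optional[float] = None) -> dict:
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError as exc:  # wrong segment count, bad base64 or bad JSON
        raise TokenRejected("malformed token") from exc
    if not isinstance(header, dict) or header.get("alg") != "HS256" or not isinstance(claims, dict):
        raise TokenRejected("unexpected token format")
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        raise TokenRejected("invalid signature")
    current = time.time() if now is None else now
    exp = claims.get("exp")
    if exp is None or current > exp + clock_skew:  # skew 0 = no tolerance
        raise TokenRejected("token expired")
    return claims
```

Passing `now` explicitly makes the expiry rule easy to test. In real use it defaults to the current time.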

Quick Setup: Kafka with ELK Integration
Aug 17, 2020

Apache Kafka is the most common buffering solution deployed together with the ELK Stack. Kafka sits between the log shippers and the indexing layer. There it acts as a buffer that decouples data collection from the components that process and index it.

In this blog, we'll see how to deploy all the components required to set up a resilient log pipeline with Apache Kafka and the ELK Stack:
Filebeat: collects logs and forwards them to a Kafka topic.
Kafka: brokers the data flow and queues it.
Logstash: consumes the data from the Kafka topic, processes it, and ships it to Elasticsearch.
Elasticsearch: indexes the data.
Kibana: used for analyzing the data.

My environment: To perform the steps below, I set up a single Ubuntu 18.04 VM on AWS EC2 using local storage. In real-life scenarios, you will probably run these components on separate machines. I started the instance in a public subnet of a VPC and set up a security group that allows access from anywhere over SSH and TCP 5601 (for Kibana). Finally, I allocated a new Elastic IP address and associated it with the running instance. The example logs used in this tutorial are Apache access logs, but you could also use VPC Flow Logs, ALB access logs, and so on.

Log in to your Ubuntu system as a user with sudo privileges. To reach a remote Ubuntu server, connect over SSH; Windows users can use PuTTY or PowerShell.

Elasticsearch 7.x ships with a bundled JDK, but Kafka needs Java, so install OpenJDK 11 first:
sudo apt install openjdk-11-jdk-headless

Check that the installation succeeded with the following command:
java --version
openjdk 11.0.3 2019-04-16
OpenJDK Runtime Environment (build 11.0.3+7-Ubuntu-1ubuntu218.04.1)
OpenJDK 64-Bit Server VM (build 11.0.3+7-Ubuntu-1ubuntu218.04.1, mixed mode, sharing)
Step 1: Installing Elasticsearch
We will start by installing the main component in the stack, Elasticsearch. Since version 7.x, Elasticsearch is bundled with Java, so we can go straight to adding Elastic's signing key.

Download and install the public signing key:
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

You may need to install the apt-transport-https package before proceeding:
sudo apt-get install apt-transport-https

Next, add the repository definition to the system:
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list

Install the Elasticsearch Debian package:
sudo apt-get update && sudo apt-get install elasticsearch

Before starting Elasticsearch, we need to apply some basic configuration in its configuration file, /etc/elasticsearch/elasticsearch.yml:
sudo nano /etc/elasticsearch/elasticsearch.yml

Because all the components run on this one EC2 instance, we will bind Elasticsearch to localhost so that it is not exposed outside the machine.
We also need to define the private IP of our EC2 instance as the initial master-eligible node:
network.host: "localhost"
http.port: 9200
cluster.initial_master_nodes: ["<InstancePrivateIP>"]

Save the file and start Elasticsearch:
sudo service elasticsearch start

To confirm that everything is working as expected, point curl at http://localhost:9200 (curl http://localhost:9200). You should see output similar to the following. Give Elasticsearch a minute or two to start before worrying about a missing response:
{
  "name" : "elasticsearch",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "W_Ky1DL3QL2vgu3sdafyag",
  "version" : {
    "number" : "7.2.0",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "508c38a",
    "build_date" : "2019-06-20T15:54:18.811730Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Step 2: Installing Logstash
Next up is the "L" in ELK, Logstash. Since we already added the Elastic repository, installing Logstash takes a single command:
sudo apt-get install logstash -y

Next, we will configure a Logstash pipeline that pulls our logs from a Kafka topic, processes them, and ships them to Elasticsearch for indexing.
Create a new pipeline config file:
sudo nano /etc/logstash/conf.d/apache.conf

Add the following configuration:
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["apache"]
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  geoip {
    source => "clientip"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}

As you can see, we're using the Logstash Kafka input plugin to define the Kafka host and the topic Logstash should consume from. We apply some filtering to the logs and ship the data to our local Elasticsearch instance.

Step 3: Installing Kibana
Let's move on to the next component in the ELK Stack, Kibana. As before, we use a simple apt command to install it:
sudo apt-get install kibana

Next, open the Kibana configuration file at /etc/kibana/kibana.yml and make sure the following settings are defined. Because Elasticsearch is bound to localhost, Kibana must reach it through localhost as well:
server.port: 5601
server.host: "<INSTANCE_PRIVATE_IP>"
elasticsearch.hosts: ["http://localhost:9200"]

Then enable and start the Kibana service:
sudo systemctl enable kibana
sudo systemctl start kibana

We also need to install Filebeat:
sudo apt install filebeat

Open Kibana in your browser at http://<PUBLIC_IP>:5601. You will be presented with the Kibana home page.
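To see what the grok and date filters in the Logstash pipeline do to each event, here is a small Python approximation. It extracts the main %{COMBINEDAPACHELOG} fields with a regular expression and parses the timestamp with the equivalent of the "dd/MMM/yyyy:HH:mm:ss Z" pattern.

This is a simplified stand-in written for illustration, not Logstash's actual grok pattern. It does not attempt the geoip lookup.

```python
import re
from datetime import datetime, timezone

# Simplified approximation of Logstash's %{COMBINEDAPACHELOG} grok pattern
COMBINED_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Return the extracted fields, or None if the line does not match
    (Logstash would tag such an event with _grokparsefailure)."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["response"] = int(event["response"])
    event["bytes"] = None if event["bytes"] == "-" else int(event["bytes"])
    # Equivalent of the date filter pattern "dd/MMM/yyyy:HH:mm:ss Z", normalized to UTC
    parsed = datetime.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    event["@timestamp"] = parsed.astimezone(timezone.utc).isoformat()
    return event


if __name__ == "__main__":
    sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
              '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
              '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
    print(parse_combined(sample)["@timestamp"])  # 2000-10-10T20:55:36+00:00
```

Running a few real log lines through a parser like this before deploying the pipeline is a cheap way to catch lines that grok would fail on.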

Quick Tips: Terraform Infrastructure as Code
Jul 13, 2020

You may have heard of infrastructure as code (IaC), but do you know what it is? Why do we need it? What are its benefits? Is it safe and secure?

What is Infrastructure as Code (IaC)?
Infrastructure as code (IaC) means managing and upgrading your environments through configuration files rather than manual changes. Terraform provides infrastructure as code for provisioning, compliance, and management across any public cloud, private data center, and third-party service. It enables teams to write, share, manage, and automate infrastructure using version control. It supports automated policy enforcement for security, compliance, and operational best practices. It also lets developers provision the infrastructure they need from within their own workflows. From a business perspective, IaC has a high impact: increased productivity, reduced risk, and reduced cost.

Why do we use Infrastructure as Code (IaC)?
Configuration language: Terraform uses HCL, a simple human-readable configuration language, to define the desired topology of infrastructure resources.
VCS Integration: Write, version, review, and collaborate on Terraform code using your preferred version control system.
Workspaces: Workspaces decompose monolithic infrastructure into smaller components, or "micro-infrastructures", which can be aligned to teams for role-based access control.
Variables: Granular variables allow easy reuse of code and enable dynamic changes to scale resources and deploy new versions.
Runs: Terraform uses two-phase provisioning: a plan (dry run) and an apply (execution). Plans can be inspected before execution to ensure expected behavior and safety.
Infrastructure State: The state file is a record of the currently provisioned resources. State versions provide a history of the infrastructure in which incremental changes can be inspected. In Terraform Cloud/Enterprise, they are also encrypted at rest.
Policy as Code: Sentinel, a policy-as-code framework available in Terraform Cloud/Enterprise, automates multi-cloud governance.
What are the benefits of Infrastructure as Code (IaC)?
IaC lets infrastructure teams test applications in staging or development environments early in the development cycle.
IaC saves you time and money.
Version history: the code records when the infrastructure was changed and by whom. Without it, we would have to ask the infrastructure admin to dig through logs, which is very time-consuming. Once the code is checked into version control, we get an incremental history of who changed what.
You can use IaC to build, update, and manage any cloud, infrastructure, or service.
Terraform makes it easy to reuse configurations for similar infrastructure, helping you avoid mistakes and save time. We can use the same configuration code for the development, staging, and production environments.
Terraform supports many providers, each usable with just a few lines of code. Major providers include AWS, Azure, GitHub, GitLab, Google Cloud Platform, VMware, Docker, and 200+ more.

First, we have to declare which provider our code is written for. Here is basic code to configure a provider:
provider "aws" {
  region = "us-west-2"
  ## PROVIDE CREDENTIALS
}

A simple example that creates an EC2 instance with just a few lines of code:
resource "aws_instance" "ec2_instance" {
  ami                    = "ami-*******"
  instance_type          = "t2.micro"
  vpc_security_group_ids = [aws_security_group.*****.id]
  key_name               = aws_key_pair.****.id
  tags = {
    Name = "New-EC2-Instance"
  }
}

To create the EC2 instance in AWS, we use Terraform's four core commands for checking and applying infrastructure changes: init, plan, apply, and destroy.

1. Init
$ terraform init
As the name suggests, this command initializes the working directory.
It downloads the provider plugins the configuration needs (into the .terraform directory) and initializes the backend where Terraform will keep its state. The terraform.tfstate file itself is written later, when apply creates resources.

2. Plan
$ terraform plan
Much like compiling code in some languages, this step checks the configuration for errors. It then compares the configuration with the current state and shows what will happen when the changes are applied: which resources will be created, changed, or destroyed, and with what configuration.

3. Apply
$ terraform apply
Now it is time to run the script. This command executes the plan (after asking for confirmation) and applies the changes to our infrastructure, creating the resources we have written in the code.

4. Destroy
$ terraform destroy
This command removes the resources managed by the configuration. When we no longer need a resource, we run this command to destroy it, so we stop paying for it.
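Because plan and apply are separate phases, a plan can be reviewed, or checked by a script, before anything changes. The sketch below is a hypothetical Python helper around the Terraform CLI, not part of Terraform itself. It runs `terraform init` and `terraform plan -out=tfplan` non-interactively, then reads `terraform show -json tfplan` and counts the planned actions per resource. A replacement appears as a delete/create pair of actions.

```python
import json
import subprocess
from collections import Counter


def summarize_plan(plan_json: dict) -> Counter:
    """Count planned actions from `terraform show -json <planfile>` output."""
    counts = Counter()
    for change in plan_json.get("resource_changes", []):
        actions = change["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue  # nothing will change for this resource
        if actions in (["delete", "create"], ["create", "delete"]):
            counts["replace"] += 1
        else:
            counts[actions[0]] += 1  # "create", "update" or "delete"
    return counts


def plan(workdir="."):
    """Run init and plan without prompts and summarize the saved plan."""
    def terraform(*args):
        return subprocess.run(
            ["terraform", *args], cwd=workdir, check=True, capture_output=True, text=True
        ).stdout

    terraform("init", "-input=false")
    terraform("plan", "-input=false", "-out=tfplan")
    return summarize_plan(json.loads(terraform("show", "-json", "tfplan")))
```

Applying the saved file with `terraform apply tfplan` then executes exactly the plan that was reviewed.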
