Querying Data in S3 Using Amazon S3 Select

We can use Amazon S3 Select to retrieve a subset of data from an S3 object using simple SQL statements. Because S3 Select filters the data on the S3 side and returns only the matching subset, it can reduce an application's data transfer cost and latency. Amazon S3 Select supports data stored in CSV, JSON, or Apache Parquet format.
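Note that Amazon S3 Select has some limitations: for example, each query runs against a single object, and only a subset of SQL is supported. If you need a more powerful query tool, consider Amazon Athena.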
In this post, I am going to build a simple API that fetches data using Amazon S3 Select. You can find the complete project at this link.
Use Case
Let’s say we have a CSV file with employee information such as EmpId, FirstName, LastName, and Salary, and we want to query salary information: for example, the salary of a particular employee or the average salary across all employees.
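The two example questions map to short S3 Select statements. The sketch below is only an illustration: it assumes the CSV has a header row with the column names listed above, so columns can be referenced by name, and the helper function names are hypothetical. S3 Select reads CSV fields as strings, so the salary column is cast to a number before averaging.

```python
def salary_by_emp_id_query(emp_id: str) -> str:
    """Return SQL that selects the salary of one employee (hypothetical helper)."""
    # Double any single quote (standard SQL escaping) so the value
    # stays inside the string literal.
    safe_id = str(emp_id).replace("'", "''")
    return f"SELECT s.Salary FROM S3Object s WHERE s.EmpId = '{safe_id}'"


def average_salary_query() -> str:
    """Return SQL that computes the average salary across all rows."""
    # CSV values arrive as strings, so cast Salary before aggregating.
    return "SELECT AVG(CAST(s.Salary AS FLOAT)) FROM S3Object s"


if __name__ == "__main__":
    print(salary_by_emp_id_query("1"))
    print(average_salary_query())
```

`S3Object` is the fixed table name S3 Select uses for the object being queried; `s` is just an alias.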

Steps
- Creating an Amazon S3 bucket
- Implementing an API using AWS Lambda
- Integrating Amazon API Gateway with the Lambda function
- Testing
Creating an Amazon S3 bucket
In this step, we create an Amazon S3 bucket to store the CSV file. You can find a sample CSV file in the “data” folder. The following code creates an S3 bucket and uploads the sample file.
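As an illustration rather than the project's own code, the same two actions can be sketched with boto3. The function name, bucket name, region, and file paths here are all assumptions; the S3 client is passed in so the helper itself has no hard dependency on boto3.

```python
def create_bucket_and_upload(s3_client, bucket: str, region: str,
                             local_path: str, key: str) -> None:
    """Create a bucket in the given region and upload one local file to it."""
    if region == "us-east-1":
        # us-east-1 is the default location, so no LocationConstraint
        # is passed for it.
        s3_client.create_bucket(Bucket=bucket)
    else:
        s3_client.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    # upload_file(Filename, Bucket, Key) streams the local file to S3.
    s3_client.upload_file(local_path, bucket, key)
```

With boto3 installed and AWS credentials configured, it could be called as `create_bucket_and_upload(boto3.client("s3", region_name="eu-west-1"), "my-employee-data-bucket", "eu-west-1", "data/employees.csv", "employees.csv")`, where the bucket and file names are placeholders.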
Implementing an API Using AWS Lambda
In this step, we are going to build a simple API to retrieve data from S3. The Lambda handler runs different SQL queries using S3 Select.
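A minimal sketch of such a handler in Python with boto3 is shown here; the project's actual handler may differ. The query names, the `query` and `empId` query-string parameters, and the `BUCKET_NAME` and `OBJECT_KEY` environment variables are assumptions made for illustration. The handler accepts an optional client argument so it can be exercised without AWS access.

```python
import json
import os

# Hypothetical query names mapped to S3 Select statements.
QUERIES = {
    "salary": "SELECT s.Salary FROM S3Object s WHERE s.EmpId = '{emp_id}'",
    "average": "SELECT AVG(CAST(s.Salary AS FLOAT)) FROM S3Object s",
}


def run_select(s3_client, bucket, key, expression):
    """Run one S3 Select query on a CSV object and return the result text."""
    response = s3_client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        # USE treats the first CSV row as column names.
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        # Results come back as one JSON document per line.
        OutputSerialization={"JSON": {}},
    )
    chunks = []
    for event in response["Payload"]:
        # The event stream also carries Stats and End events; keep Records only.
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Join bytes before decoding so a record split across events stays intact.
    return b"".join(chunks).decode("utf-8")


def handler(event, context, s3_client=None):
    """Lambda entry point: pick a query by name and return the rows as JSON."""
    if s3_client is None:
        import boto3  # imported lazily so the module loads without boto3
        s3_client = boto3.client("s3")

    params = event.get("queryStringParameters") or {}
    name = params.get("query", "average")
    if name not in QUERIES:
        return {"statusCode": 400, "body": json.dumps({"error": "unknown query"})}

    # Double single quotes so the id stays inside the SQL string literal.
    emp_id = str(params.get("empId", "")).replace("'", "''")
    expression = QUERIES[name].format(emp_id=emp_id)

    text = run_select(
        s3_client,
        os.environ.get("BUCKET_NAME", ""),
        os.environ.get("OBJECT_KEY", "employees.csv"),
        expression,
    )
    rows = [json.loads(line) for line in text.splitlines() if line.strip()]
    return {"statusCode": 200, "body": json.dumps(rows)}
```

Mapping a small set of named queries to fixed statements, instead of accepting raw SQL from the caller, keeps the API surface narrow; only the employee id is interpolated.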
CDK code for defining the Lambda function