What is the S3 API?
Amazon Simple Storage Service (Amazon S3) is an elastically-scalable public cloud storage service. It was the first service introduced by AWS in 2006. S3 is based on object storage technology—it stores information as objects, which can contain any type of data, together with metadata that can help identify it. S3 storage does not have a hierarchical structure like a file system does. Instead, it organizes using a flat hierarchy of “buckets”—each bucket with a unique identifier that lets you find and access it.
Customers and end users can interact with Amazon S3 through APIs, including REST and SOAP interfaces, which help make S3 flexible and language-neutral. The API provides the ability to programmatically store, retrieve, list, delete, and move objects in S3 buckets.
In this article, you will learn:
- S3 API Common Request Headers
- S3 API Common Response Headers
- Amazon S3 API Common Actions
- Authenticating Requests with AWS Signature Version 4
- Amazon S3 API Code Examples
This is part of an extensive series of articles about S3 Storage.
S3 API Common Request Headers
An Amazon S3 request typically contains the following headers:
- Authorization—information needed to authenticate the request, depending on your signing method (see the Authenticating Requests section below)
- Content-Length—this is required for PUT operations and any other operation that loads XML, for example access control lists (ACLs) and logging. Supply the length of the message without header as defined in RFC 2616.
- Content-Type—type of resource, needed if the request content is included in the body of the message.
- Content-MD5—base64-encoded 128-bit MD5 message digest, as specified in RFC 1864. This can be used as an integrity check, to verify the data received is the same as the data sent.
- Expect—use this header when using 100-continue and including a request body. This header tells the API endpoint that if the client receives acknowledgement, it will also send the body of the request.
S3 API Common Response Headers
Here are response headers that are common to most AWS S3 responses.
Name | Description | Data Type | Default Value |
Content-Length | Length of response body in bytes. | String | N/A |
Content-Type | MIME type of request content. For example: Content-Type: text/html; charset=utf-8 |
String | N/A |
Connection | Notifies the client if server connection is currently open. Values can be open or close. | Enum | None |
Date | Timestamp of the response from S3 responded, for example: Tue, 01 Jun 2021 14:00:00 GMT |
String | None |
ETag | Represents a version of the object. It reflects changes to the contents of an object, not to the metadata. In some cases, the ETag can be an MD5 digest of object data. | String | N/A |
Server | The name of the server that created the response. | String | AmazonS3 |
Amazon S3 API Common Actions
GetObject
This action is used to retrieve objects from S3. Note that you must first have READ access to an object before you can use GET. To return an object without using an authorization header, you need to grant READ access to an anonymous user.
CopyObject
This action is used to create a copy of an object currently stored in S3. Note that any copy request must be authenticated before it is authorized. You need read access to the source object as well as write access to the destination bucket.
PutObject
This action helps you add an object into your bucket. To do this, you need WRITE permissions on the bucket. However, note that Amazon S3 is designed as a distributed system. This means that when receiving multiple write requests for a single object—simultaneously—the system overwrites all objects except the last one written.
S3 does not offer object locking, but there are ways to introduce this option. You can either build object locking into your application layer or, alternatively, you can use versioning.
HeadObject
This action can retrieve metadata from your object without returning the object itself. It is highly useful if you only need the metadata of the object. Before you can use HEAD, you need to make sure you have READ access.
ListObjects
This option can return some or all—up to 1,000—of the objects inside a single bucket. To return a subset of the objects, you can use the request parameters as selection criteria. You should design your application to parse the contents of a response. The application also needs to know how to appropriately handle responses.
CreateBucket
This option can create new S3 buckets. Keep in mind this requires authenticating requests with an Access Key ID, which you receive when you register with S3. This is because the system does not allow anonymous requests to create buckets. Whoever creates a bucket becomes the owner of the bucket.
GetBucketPolicy
This action can return the policy of a certain bucket. To use this option, make sure that the service making the API call has the permission to get the specified bucket (the IAM permission is called GetBucketPolicy). Keep in mind that you can only use this action is the service making the API call belongs to the same account owner who originally created the bucket.
CreateMultipartUpload
This action can initiate a multipart upload and return an upload ID. The returned upload ID ihelps associate all components in a specific multipart upload.
UploadPart
This action can upload one part of a multipart upload. This first requires starting a multipart upload. S3 will return an upload ID for the multipart operation, which you must include in the upload part request.
See the documentation for a list of all S3 API operations.
Authenticating Requests with AWS Signature Version 4
Interactions with Amazon S3 may be either anonymous or authenticated.
Depending on how you sign your requests, AWS Signature Version 4 offers several benefits:
- Verification of requester’s identity—every request must have a signature to be authenticated. The signature is created using access keys, and a security token is also required when using temporary credentials.
- Protection of data in transit—the request signature is calculated using the request elements, to protect the request in transit. When Amazon S3 receives a request, it calculates the signature based on specified request elements, so if a request includes a component that doesn’t match the signature calculation, S3 will reject it.
- Prevention of request reuse—the signed portions of a request remain valid for 15 minutes once the request is sent. If an unauthorized user has access to a signed request, they can modify unsigned portions during this 15 minute window, and the request will still be accepted as valid. To prevent this scenario, you can use HTTPS to send requests, sign request headers and bodies, and set up AWS policies with the s3:x-amz-content-sha256 condition key so that users must sign request bodies.
Authentication data can be expressed using a variety of methods, including:
- HTTP Authorization—the HTTP Authorization header is the standard authentication method for Amazon S3 requests. It is a requirement of any Amazon S3 REST operation, with the exception of browser-based uploads that use POST requests.
- Query strings—the parameters in a query string can be used to express a URL request. The query parameters provide the necessary request information, including authentication data. This is sometimes called a presigned URL, because the URL contains the request signature, and it is useful for embedding clickable links in HTML, which remain valid for up to a week.
- Browser-based uploads—you can use HTTP POST requests to perform browser-based uploads to Amazon S3. A POST request allows you to directly upload content from your browser.
See the documentation for more details about S3 authentication.
Amazon S3 API Code Examples
Here are a few examples showing how to work with the API in two common programming languages—Java and Python. In addition to these languages, S3 provides SDKs for C++, Go, JavaScript, .NET, Node.js, PHP, and Ruby.
AWS SDK for Java
The code below was provided as part of the AWS Java SDK documentation.
To create a bucket, use the createBucket method in the AmazonS3 client. The method returns the new bucket, or an exception if the bucket exists. See the code example below.
To list buckets, use the listBucket method, which returns a list of buckets. See the code example below.
AWS SDK for Python
The following examples were provided as part of the Boto3 documentation. The methods below are provided as part of the S3 Client, Bucket, and Object classes.
To upload files, use one of these methods.
- Use upload_file method, which accepts three parameters: file name, bucket name, and object name. This method is suitable for large files, because it can split them into smaller parts and upload different parts in parallel.
- The upload_fileobj method, which accepts a file object in binary format (not text format).
To download files, use the parallel methods to the ones shown above:
- Use the download_file method, passing the name of the bucket, name of the object to download, and local file name to save to. This is suitable for large files.
- Use the download_fileobj method, to download a writable object, provided in binary mode.
Meet Cloudian: S3-Compatible, Massively Scalable On-Premise Object Storage
Cloudian® HyperStore® is a massive-capacity object storage device that is fully compatible with the Amazon S3 API. It can store up to 1.5 Petabytes in a 4U Chassis device, allowing you to store up to 18 Petabytes in a single data center rack. HyperStore comes with fully redundant power and cooling, and performance features including 1.92TB SSD drives for metadata, and 10Gb Ethernet ports for fast data transfer.
HyperStore is an object storage solution you can plug in and start using with no complex deployment. It also offers advanced data protection features, supporting use cases like compliance, healthcare data storage, disaster recovery, ransomware protection and data lifecycle management.
Learn more about Cloudian® HyperStore®.