Enhancing Object Storage Analytics: Adding Metadata Labels to S3 Images with TensorFlow

Posted by Gary Mirfield on April 17, 2020

Gary Ogasawara
CTO, Cloudian

Object storage is known for its scalability and easy-to-use S3 APIs, but making object data useful for analytics often requires adding metadata about the objects.  This article describes a case study of adding and then using S3 object metadata with Cloudian’s HyperStore Analytics Platform (HAP).  Starting with images stored in HyperStore object storage, we use a TensorFlow machine learning model to identify what each image depicts, attach those labels to the image as S3 metadata, and then automatically index and search the object metadata using ElasticSearch and Kibana.

[Image: TensorFlow pod diagram]

INPUT
Unlabeled images stored in HyperStore S3 bucket.

OUTPUT
Images with metadata labels describing what’s in each image, stored back in HyperStore and indexed in ElasticSearch.

METHOD
Use a TensorFlow deep learning model to determine labels of what’s in the image.
Use HyperStore’s ElasticSearch plugin to make metadata searchable and visualizable.

In an S3 bucket named “images,” we upload about 300 images of common items, including animals, vehicles, and household goods.  An object store is a convenient home for an image collection because it stores large amounts of data easily and economically.

[Image: images bucket]

HyperStore Analytics Platform (HAP) is a software package composed of Apache Spark, TensorFlow, and optional applications like this image recognition system.  HAP is managed by Kubernetes, and its Pods are typically deployed on the same hardware nodes as HyperStore.  By locating the analytics processing close to the data, HAP with HyperStore takes advantage of data locality and an edge-hub topology for efficient and timely processing.  It’s fast, because processing happens as close as possible to where the data is generated; cheap, because network transfer costs for an upload and subsequent downloads are minimal; and secure, because the data can be kept private and protected.

The image recognition process reads each object from the S3 bucket and classifies it by applying the TensorFlow model.  The S3 list-objects API is used to iterate over the objects in a bucket.  Each object is first checked to confirm that its Content-Type is an image type and that its size does not exceed a threshold.  The image is then scaled to a fixed size, and the model is executed using code based on TensorFlow’s LabelImage class.  The model used for image recognition is the pre-trained Inception 5h, which recognizes 1,000 classes of images from ImageNet.
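The iterate-and-check step described above can be sketched in Python as follows. This is a minimal illustration, not HAP's actual code: it assumes a boto3-style S3 client is passed in (its `get_paginator("list_objects_v2")` and `head_object` calls), and the function name `iter_image_keys` and the 10 MB threshold are hypothetical, since the article does not state the real threshold.

```python
from typing import Any, Iterator

# Hypothetical size threshold; the article does not give the real value.
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def iter_image_keys(s3: Any, bucket: str, max_bytes: int = MAX_IMAGE_BYTES) -> Iterator[str]:
    """Yield keys in `bucket` that look like images and are small enough to classify.

    `s3` is assumed to behave like a boto3 S3 client.
    """
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # A page for an empty bucket has no "Contents" entry.
        for obj in page.get("Contents", []):
            if obj["Size"] > max_bytes:
                continue  # too large: skip before issuing a HEAD request
            head = s3.head_object(Bucket=bucket, Key=obj["Key"])
            if head.get("ContentType", "").startswith("image/"):
                yield obj["Key"]
```

With boto3 installed, a client pointed at an S3-compatible endpoint (for example `boto3.client("s3", endpoint_url=...)`, where the endpoint is site-specific) could be passed as `s3`. The size check runs first because the listing already reports each object's size, while the Content-Type needs an extra request per object.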

Below are examples of input images and the classification output produced by the image recognition process: each label with its associated probability.

[Image: red fox]
2925 [main] INFO com.cloudian.hap.LabelImage images/fox.jpg:
red fox (58.52% likely)
kit fox (39.54% likely)
coyote (0.73% likely)
grey fox (0.71% likely)
red wolf (0.25% likely)

[Image: image meta data]

5095 [main] INFO com.cloudian.hap.LabelImage images/iphone.jpeg:
cellular telephone (41.24% likely)
hand-held computer (40.34% likely)
pay-phone (7.52% likely)
iPod (3.88% likely)
remote control (1.54% likely)
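As a rough illustration of how output like the lines above could be produced, the sketch below ranks a model's class probabilities and formats the top five. It is an assumption-laden sketch rather than the LabelImage code itself; `top_labels` and `format_label` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple


def top_labels(probs: Sequence[float], labels: Sequence[str], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def format_label(label: str, prob: float) -> str:
    """Format one result in the style of the log lines shown in the article."""
    return f"{label} ({prob * 100:.2f}% likely)"


if __name__ == "__main__":
    # Probabilities taken from the fox example in the article.
    labels = ["coyote", "red fox", "kit fox", "grey fox", "red wolf"]
    probs = [0.0073, 0.5852, 0.3954, 0.0071, 0.0025]
    for label, prob in top_labels(probs, labels):
        print(format_label(label, prob))
```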

Some configurations to control the classifier:

[Image: image classifier]

The image labels and their associated probabilities are added to the object as S3 user-defined metadata, where the key is the prefix “imgtag_” plus the label (e.g., “red fox”) and the value is the associated probability (e.g., “0.59”).  The label is URL-encoded to ASCII to conform to the metadata key requirements; notably, the <SPACE> character is converted to ‘+’.  To update an existing object’s user-defined metadata, the S3 Copy Object API is used with the x-amz-metadata-directive: REPLACE header.  The object and its metadata are now stored in HyperStore S3.  This example, an S3 GET command on bucket “images” and object “fox.jpg”, shows the user-defined metadata output:
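The key encoding and the in-place metadata update can be sketched as below. This is a sketch under stated assumptions: `s3` is taken to be a boto3-style client (its `head_object` and `copy_object` calls), and the helper names `labels_to_metadata` and `attach_labels` are hypothetical. Because REPLACE discards the metadata already on the object, the sketch reads the existing user metadata and Content-Type first and sends them back with the new tags; whether HAP does the same is not stated in the article.

```python
from typing import Any, Dict, Mapping
from urllib.parse import quote_plus

TAG_PREFIX = "imgtag_"  # prefix named in the article


def labels_to_metadata(labels: Mapping[str, float]) -> Dict[str, str]:
    """Map {"red fox": 0.5852} to {"imgtag_red+fox": "0.59"}."""
    return {TAG_PREFIX + quote_plus(label): f"{prob:.2f}" for label, prob in labels.items()}


def attach_labels(s3: Any, bucket: str, key: str, labels: Mapping[str, float]) -> Dict[str, str]:
    """Copy the object onto itself with the label metadata added."""
    head = s3.head_object(Bucket=bucket, Key=key)
    # Start from the existing user metadata so REPLACE does not drop it.
    metadata = dict(head.get("Metadata", {}))
    metadata.update(labels_to_metadata(labels))
    s3.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        Metadata=metadata,
        MetadataDirective="REPLACE",
        # Re-send the Content-Type, which REPLACE would otherwise reset.
        ContentType=head.get("ContentType", "binary/octet-stream"),
    )
    return metadata
```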

[Image: metadata output]
[Image: metadata summary]

HyperStore can index object metadata in ElasticSearch.  Once the metadata is in ElasticSearch, Kibana can be used for data exploration.

[Image: ElasticSearch results]

Here’s an example query to find all images where the label “kit fox” has a probability greater than 0.4.  The Kibana query bucketname:images AND userMetadata.imgtag_kit+fox>0.4 returns 2 objects:

[Image: Kibana query 1]

If you don’t care what type of “fox” it is, you can use wildcards.  The Kibana query bucketname:images AND userMetadata.imgtag\*fox\*:* returns 13 objects:

[Image: Kibana query 2]
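Since the metadata key is the URL-encoded label, a threshold query has to encode the label the same way. The small helper below builds such a query string; the field names `bucketname` and `userMetadata` come from the queries in this article, while the function name `label_query` is hypothetical.

```python
from urllib.parse import quote_plus


def label_query(bucket: str, label: str, min_prob: float) -> str:
    """Build a Kibana query for objects whose label probability exceeds min_prob."""
    # Encode the label exactly as it was encoded when the metadata key was written.
    field = "userMetadata.imgtag_" + quote_plus(label)
    return f"bucketname:{bucket} AND {field}>{min_prob}"


if __name__ == "__main__":
    print(label_query("images", "kit fox", 0.4))
```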

S3 object stores like HyperStore have made it possible to store petabytes of data, so the focus can now turn to making that data usable for analytics. HAP provides a convenient way to move the compute to the data and, as in this use case, to add metadata to the object data.  In the same spirit, we are developing more use cases to enhance object storage analytics, including processing streaming data and other machine learning tasks.

 
