Chapter 8. Cloud Storage: object storage

  • What is object storage?

  • What is Cloud Storage?

  • Interacting with Cloud Storage

  • Access control and lifecycle configuration

  • Choosing a globally unique bucket name

  • Deciding whether Cloud Storage is a good fit

 

Suppose you are building an application that involves storing an image, such as a photo.

Where do you put that photo?

You may have gone with the easiest place: right in your database or on your local filesystem.

That works at first, but it becomes harder to manage as the number and size of files grow. This is the problem that object storage services aim to solve.

The primary design goal of these systems is to hide the complexity of the underlying disks and data centers and instead provide a simple API for uploading and retrieving files.


Google Cloud Storage is the default object storage system on Google Cloud Platform (GCP).


8.1. Concepts



Conceptually, Cloud Storage is key-value storage for large values, with automatic replication and caching around the world.


8.1.1. Buckets and objects


A bucket is a container that stores your data.

Each bucket has a globally unique name, a geographical location, and a storage class.

You can think of buckets as “disks,” except that each “disk” is extraordinarily large.

Behind the scenes, a bucket is replicated and spread across many physical disks to maintain high levels of durability and availability.


Object storage tends to be one of the most common and most standardized cloud services.



Objects are the files that you put inside a bucket.

Each object has a unique name inside its bucket and, as on typical file systems, slashes (/) are treated specially so that you can browse objects as if they were organized in directories.

Each object in a bucket must not be larger than 5 terabytes.
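The directory-style browsing described above can be sketched locally. This is a minimal model, assuming object names are flat keys and that "directories" are only a naming convention built on the slash; the function name and the sample object names are hypothetical, and nothing here calls Cloud Storage.

```python
def list_directory(object_names, prefix="", delimiter="/"):
    """Return (files, subdirectories) that sit directly under `prefix`.

    Object names are plain strings; a "directory" is just a shared prefix
    that ends with the delimiter.
    """
    files, subdirs = [], set()
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something follows a delimiter, so this is a nested "directory".
            subdirs.add(prefix + head + delimiter)
        elif rest:
            files.append(name)
    return sorted(files), sorted(subdirs)


# Hypothetical object names in one bucket.
objects = [
    "readme.txt",
    "photos/logo.png",
    "photos/2021/cat.jpg",
    "photos/2021/dog.jpg",
]

top_files, top_dirs = list_directory(objects)
photo_files, photo_dirs = list_directory(objects, prefix="photos/")
print(top_files, top_dirs)
print(photo_files, photo_dirs)
```

Browsing "into" photos/ is only a second call with a longer prefix; no directory object exists anywhere in this model.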




Locations
Every bucket has a location.

Buckets exist either at the regional level or spread across multiple regions.

VMs can only exist in a single place, but data can be copied and live in multiple places simultaneously.

To ensure data is always closest to where you or your customers are, create a multiregional bucket (for example, set the location to “United States”). A multiregional bucket is by definition replicated across several regions.

If you are concerned about latency between your VMs and your data on GCS, you might want to choose a specific region (for example, us-east1) for your data.


Latency is the delay between a user's action and a web application's response to that action. In networking terms, it is often measured as the round-trip time: the total time it takes for a data packet to travel to its destination and back.


If you make a mistake and put a bucket far away from your VMs, you’ll end up paying a premium for reading your data due to cross-region network transfer fees.

8.2. Storing data in Cloud Storage

First you have to create a bucket.


Bucket names need to be globally unique.


In the Cloud Console, select Storage, then Browser.

Click +Create bucket.

Give the bucket a globally unique name, e.g. uconnstamford.
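Whether a name is globally unique can only be decided by Cloud Storage itself, but the basic shape of a name can be checked locally first. The sketch below is a rough pre-check under stated assumptions: it encodes a simplified subset of the naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit), which should be verified against the current documentation. The function name is hypothetical.

```python
import re

# Simplified subset of the bucket naming rules (an assumption to verify):
# 3-63 characters, lowercase letters, digits, '-', '_' and '.',
# starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the simplified shape check.

    This says nothing about global uniqueness, which only the service knows.
    """
    return _BUCKET_NAME.fullmatch(name) is not None


for candidate in ["uconnstamford", "UConnStamford", "ab", "-bucket"]:
    print(candidate, looks_like_valid_bucket_name(candidate))
```

A name that passes this check can still be rejected at creation time if someone else already owns it.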












Cloud Storage currently has a separate command-line tool called gsutil. Even though it’s a different command, it’s still installed and updated with the Cloud SDK.

After creating the bucket, you can confirm that it exists by listing your buckets with gsutil ls:


john_iacovacci1@cloudshell:~ (uconn-engr)$ gsutil ls

gs://staging.uconn-engr.appspot.com/

gs://uconn-engr.appspot.com/

gs://uconnstamford/

gs://us.artifacts.uconn-engr.appspot.com/

john_iacovacci1@cloudshell:~ (uconn-engr)$


Now upload a simple text file with gsutil.

First, create a sample file by using the Linux echo command and standard output redirection:


echo "This is my first file!" > my_first_file.txt


cat my_first_file.txt


john_iacovacci1@cloudshell:~ (uconn-engr)$ cat my_first_file.txt

This is my first file!

john_iacovacci1@cloudshell:~ (uconn-engr)$



Copy the file into your bucket with gsutil cp:

gsutil cp my_first_file.txt gs://uconnstamford/


john_iacovacci1@cloudshell:~ (uconn-engr)$ gsutil cp my_first_file.txt gs://uconnstamford/

Copying file://my_first_file.txt [Content-Type=text/plain]...

/ [1 files][   23.0 B/   23.0 B]

Operation completed over 1 objects/23.0 B.

john_iacovacci1@cloudshell:~ (uconn-engr)$




The file (called an object in this context) made its way into your newly created bucket.
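Two details in the gsutil output above can be reproduced locally: the reported size (23.0 B) and the content type (text/plain). The sketch below assumes that the content type is inferred from the file extension, in the same spirit as Python's mimetypes module; it does not call Cloud Storage or gsutil.

```python
import mimetypes

# The same text that `echo "This is my first file!" > my_first_file.txt`
# writes; echo appends a trailing newline.
contents = "This is my first file!\n"
data = contents.encode("utf-8")

# Guess the content type from the file extension only.
content_type, _encoding = mimetypes.guess_type("my_first_file.txt")

print(len(data))      # size in bytes
print(content_type)   # guessed content type
```

The 22 visible characters plus the newline account for the 23 bytes in the transcript.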


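The same operations are available from code. For Node.js, install the Cloud Storage client library:

npm install @google-cloud/storage@0.2.0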



8.3. Choosing the right storage class


Cloud Storage offers several storage classes that you can configure for a bucket.


Storage classes come with different performance characteristics (both latency and availability), as well as different prices. 

8.3.1. Multiregional storage


Multiregional storage is the most commonly used option and the one likely to fit the needs of most applications. The flip side is that it’s also the most expensive of the options available because it replicates data across several regions inside the chosen location.


8.3.2. Regional storage

The regional storage class is like a slimmed-down version of the multiregional storage class. Instead of replicating data across several regions inside an area (for example, “United States”), this class replicates the data across different zones inside a single region.

The trade-offs are lower availability and higher latency to destinations far away from the region.

8.3.3. Nearline storage

Nearline storage is designed to match the data-archival use case by making a few key trade-offs that you shouldn’t notice if you’re using the data as intended.


8.3.4. Coldline storage

Coldline storage sits at the extreme end of the data-archival spectrum.

It is meant for data that you rarely need, primarily in the case of a serious disaster.

For example, if you keep all transaction logs for the past year, that data would be a much better fit for Coldline storage.
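The trade-offs between the four classes can be summarized as a small decision helper. This is only a sketch: the function name is hypothetical, and the two read-frequency thresholds are placeholder assumptions chosen for illustration, not figures from this chapter or from Google's pricing.

```python
# Placeholder thresholds (assumptions for illustration only).
COLDLINE_MAX_READS_PER_YEAR = 1    # "rarely need", e.g. disaster recovery
NEARLINE_MAX_READS_PER_YEAR = 12   # archival data read occasionally


def suggest_storage_class(reads_per_year: float, serve_multiple_regions: bool) -> str:
    """Suggest a storage class from expected access frequency and audience."""
    if reads_per_year <= COLDLINE_MAX_READS_PER_YEAR:
        return "coldline"
    if reads_per_year <= NEARLINE_MAX_READS_PER_YEAR:
        return "nearline"
    # Frequently accessed data: pick by where the readers are.
    return "multiregional" if serve_multiple_regions else "regional"


print(suggest_storage_class(0.5, False))   # year-old transaction logs
print(suggest_storage_class(6, True))      # occasional archive reads
print(suggest_storage_class(1000, True))   # photos served to customers
print(suggest_storage_class(1000, False))  # data read by VMs in one region
```

In practice the choice also depends on price and availability requirements, which this helper ignores.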



8.4. Access control


This section covers how to control who’s able to access or modify the data after it’s stored.


8.4.1. Limiting access with ACLs


You can interact with your data while authorized as a service account.

But what about when you want to allow others to access your data?

How do you restrict who can do what?

By default, everything you create is locked down to be accessible by only those people who have access to your project.

If you add someone else to your project for other purposes, they will also have access to your data in Cloud Storage.



Cloud Storage allows fine-grained access control of your buckets and objects through a security mechanism called Access Control Lists (ACLs).


Table 8.2. Description of roles for Cloud Storage

You control access to your buckets and objects by assigning the roles (Reader, Writer, and Owner) to different actors.

You can edit the ACL for your bucket in the Cloud Console by clicking the vertical three-dot button on the far right in your list of buckets and selecting Edit bucket permissions.

The same dialog is where you set a bucket for public access.





Adding access to a specific user means they’ll need to log in with Google’s traditional login, so they’ll need to have a Google account.

In addition to adding access to individuals, Cloud Storage also allows you to control access based on a few other things:

  • User allUsers, as you might expect, refers to anyone. If you give Reader access to the allUsers user entity, the resource will be readable by anyone who asks for it.

  • User allAuthenticatedUsers is similar to allUsers, but refers to anyone who’s logged in with their Google account.

  • Groups (for example, mygroup@googlegroups.com) refer to all members of a specific Google Group. This allows you to grant access once and then control further access based on group membership.

  • Domains (for example, mydomain.com) refer to a Google Apps managed domain name. If you use Google Apps, this is a quick way to limit access to only those who are registered as users in your domain.
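The entity types in the list above can be modeled as a small access check. This is a minimal sketch, not the Cloud Storage API: the "user:", "group:", and "domain:" prefixes are notation invented for this example, the user dictionary is hypothetical, and treating Owner as including Writer and Reader is a simplifying assumption.

```python
# Simplifying assumption: each role includes the ones below it.
ROLE_RANK = {"READER": 1, "WRITER": 2, "OWNER": 3}


def entity_matches(entity, user):
    """`user` is None for an anonymous caller, else a dict with
    'email' and 'groups' keys (a hypothetical shape for this sketch)."""
    if entity == "allUsers":
        return True
    if user is None:
        return False  # every other entity needs a logged-in account
    if entity == "allAuthenticatedUsers":
        return True
    kind, _, value = entity.partition(":")
    if kind == "user":
        return user["email"] == value
    if kind == "group":
        return value in user["groups"]
    if kind == "domain":
        return user["email"].endswith("@" + value)
    return False


def is_allowed(acl, user, needed_role):
    """True if any ACL entry matches the caller with a sufficient role."""
    return any(
        entity_matches(entity, user) and ROLE_RANK[role] >= ROLE_RANK[needed_role]
        for entity, role in acl
    )


acl = [
    ("allUsers", "READER"),
    ("group:mygroup@googlegroups.com", "WRITER"),
    ("domain:mydomain.com", "OWNER"),
]
alice = {"email": "alice@example.com", "groups": ["mygroup@googlegroups.com"]}
bob = {"email": "bob@mydomain.com", "groups": []}

print(is_allowed(acl, None, "READER"))   # anonymous read
print(is_allowed(acl, alice, "WRITER"))  # via group membership
print(is_allowed(acl, bob, "OWNER"))     # via managed domain
```

Granting access to a group or domain once means later changes happen through membership, not through further ACL edits.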

Default object ACLs

In addition to granting permissions on both buckets and objects, Cloud Storage allows you to decide up front what ACLs should be set on newly created objects in the form of a bucket’s default object ACLs.

Predefined ACLs

As you might expect, a few common scenarios entail quite a bit of clicking (or typing) to get configured. To make this easier, Cloud Storage has predefined ACLs that you can set using the gsutil command-line tool.



ACL best practices

Now that you understand ACLs, here are a few best practices for managing them and choosing the right permissions for your buckets and objects:

  • When in doubt, give the minimum access possible. This is a general security guideline but definitely relevant to controlling access to your data on Cloud Storage.

  • The Owner permission is powerful, so be careful with it. Owners can change ACLs and metadata, which means that unless you trust someone to grant further access appropriately, you shouldn’t give them the Owner permission.

  • Allowing access to the public is a big deal, so do it sparingly. It’s been said before that after something is on the internet, it’s there forever.

  • Default ACLs happen automatically, so choose sensible defaults.


8.4.2. Signed URLs


It turns out that sometimes you don’t want to add someone to the ACL forever, but rather want to give someone access for a fixed amount of time.


Signed URLs take an intent to do an operation (for example, download a file) and sign that intent with a credential that has access to do the operation. This allows someone with no access at all to present this time-limited pass as their credential to do exactly what the pass says they can do.
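The idea of "signing an intent" can be illustrated with a small self-contained sketch. It is not the Cloud Storage signing scheme: it swaps in an HMAC with a shared secret where a service account's private key would be used, and the secret, function names, and URL layout are all hypothetical.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"demo-secret"  # hypothetical stand-in for a private key


def _signature(method, path, expires):
    # The signed "intent": which operation, on which object, until when.
    message = f"{method}\n{path}\n{expires}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_url(method, path, lifetime_seconds, now=None):
    """Return a URL that carries its own expiry and signature."""
    now = int(time.time()) if now is None else now
    expires = now + lifetime_seconds
    return f"{path}?expires={expires}&signature={_signature(method, path, expires)}"


def verify(method, url, now=None):
    """Accept only an unexpired URL whose signature matches the request."""
    now = int(time.time()) if now is None else now
    path, _, query = url.partition("?")
    params = dict(pair.split("=", 1) for pair in query.split("&"))
    expires = int(params["expires"])
    expected = _signature(method, path, expires)
    return now <= expires and hmac.compare_digest(expected, params["signature"])


url = sign_url("GET", "/uconnstamford/my_first_file.txt", 60, now=1000)
print(verify("GET", url, now=1030))   # within the lifetime
print(verify("GET", url, now=2000))   # after expiry
print(verify("PUT", url, now=1030))   # different operation than was signed
```

The holder of the URL needs no account or ACL entry; the signature proves that someone with access approved exactly this operation until the stated time.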


To create a signed URL, you’ll need a private key, so jump over to the IAM & Admin section and select Service accounts.

Create a new service account, making sure to have Google generate a new private key in JSON format.


