
Ever wondered how fast YouTube's recommendation system is?


How accurately Google Maps pins your location or helps you search for one?

Or even

How quickly Google's search engine returns almost accurate results for your search?

To be what they are today, these systems need significantly large amounts of data so they can run models such as pattern recognition, prediction systems, key-value search, etc.

Storing this much data requires resilient databases that are highly available, durable, fast, and able to handle high read/write throughput.

Isn’t Cloud Spanner the same thing?

Partly yes, but here are a few things that are good to know when the two are compared:


Under the right circumstances, Cloud Spanner can be a great choice, but when it is chosen without proper investigation, a lot can go wrong, making it a terrible choice.

As with other migrations, you cannot simply lift and shift databases onto Spanner.

An organization that uses a traditional RDBMS will typically rely on features such as:

  • Sequences
  • Foreign Keys
  • UDFs (user-defined functions)
  • Triggers
  • Stored procedures
  • ACLs

and a few others. Migrating these traditional databases to Cloud Spanner requires us to understand whether Spanner offers the same features and, if not, what the equivalents are.

Let us look at a few examples:


Google Cloud Spanner, often termed a NewSQL database for its ability to offer features of both relational and non-relational databases, has been generally available since June 2017.

Data in Cloud Spanner is distributed across multiple machines and zones, which is how it achieves high availability. Spanner can dynamically split data into chunks and move them across zones and regions to tolerate failures.

Here are the top 5 concepts you may want to know before starting with Cloud Spanner:

Split points

Spanner is a distributed database, which means that as the database grows, data gets divided into smaller chunks based on key-column ranges. …
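To make the idea of range-based splitting concrete, here is a minimal toy sketch in Python. It is not Spanner's actual algorithm (Spanner decides on its own when and where to split); the class name `RangeSplitTable`, the `max_rows_per_split` threshold, and the integer keys are all assumptions made purely for illustration.

```python
import bisect


class RangeSplitTable:
    """Toy model of key-range splitting; NOT Spanner's real algorithm."""

    def __init__(self, max_rows_per_split: int = 4):
        # Hypothetical threshold: a split is divided once it exceeds this size.
        self.max_rows_per_split = max_rows_per_split
        # Sorted keys; each one marks the first key of a new split.
        self.split_points = []
        # splits[i] holds the sorted keys that fall into key range i.
        self.splits = [[]]

    def split_index(self, key: int) -> int:
        """Return the index of the split whose key range contains `key`."""
        return bisect.bisect_right(self.split_points, key)

    def insert(self, key: int) -> None:
        idx = self.split_index(key)
        rows = self.splits[idx]
        bisect.insort(rows, key)
        if len(rows) > self.max_rows_per_split:
            # Divide the oversized split in the middle of its key range.
            mid = len(rows) // 2
            left, right = rows[:mid], rows[mid:]
            self.splits[idx:idx + 1] = [left, right]
            self.split_points.insert(idx, right[0])


table = RangeSplitTable(max_rows_per_split=4)
for key in range(1, 11):
    table.insert(key)

print(table.split_points)
print(table.splits)
```

Each split owns a contiguous range of keys, and a lookup only needs a binary search over the split points to find the chunk that holds a given key.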

Whether you are developing BI dashboards or analyzing data to make critical business decisions, you need to access data from various data marts, data warehouses or transactional databases in order to perform aggregation or derive statistics as necessary for the report.

If you are working with BigQuery as your data warehouse and Cloud SQL as your RDBMS, you often need a way to access that data for your analytical needs.

One way of doing it is to write federated queries from BigQuery, where Cloud SQL is treated as an external data source and you enclose your query within a function…
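As a rough illustration, BigQuery's documented table function for federated queries is `EXTERNAL_QUERY`, which takes a connection ID and the SQL to run on the external database. The sketch below only builds the query text in Python; the helper name `build_federated_query`, the connection ID, and the `customers` table are hypothetical, and the inner SQL is assumed to fit on a single line.

```python
def build_federated_query(connection_id: str, inner_sql: str) -> str:
    """Wrap a Cloud SQL query in BigQuery's EXTERNAL_QUERY table function.

    Assumes `inner_sql` is a single-line statement written in the Cloud SQL
    database's own dialect.
    """
    # Escape backslashes and double quotes so the inner SQL survives as a
    # double-quoted BigQuery string literal.
    escaped = inner_sql.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT *\n"
        f'FROM EXTERNAL_QUERY("{connection_id}", "{escaped}");'
    )


query = build_federated_query(
    "my-project.us.my-cloudsql-connection",      # hypothetical connection ID
    "SELECT customer_id, name FROM customers",   # hypothetical Cloud SQL table
)
print(query)
```

The resulting text can be pasted into the BigQuery console or passed to a client library; the outer `SELECT` runs in BigQuery, while the quoted statement runs on the Cloud SQL instance behind the connection.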

Cloud SQL falls under the category of Database as a Service (DBaaS). In other words, Cloud SQL is a virtual machine hosted on Google Compute Engine that runs a MySQL image, giving you access to a MySQL server without having to SSH into it.

Whether you are using MySQL on-prem or in the cloud, Database Migration Service (DMS) provides connectivity options such as IP allowlists and VPC peering to migrate your data. The choice of connectivity method depends on the source database and where it resides.

Database migration has to happen in phases or waves. Outlining the order of various tasks and allocating…

If you are already familiar with FaaS (Function-as-a-Service), then Cloud Functions needs no introduction.

Cloud Functions can be written using the Node.js, Python, Go, Java, .NET, and Ruby language runtimes.
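For a sense of what such a function looks like, here is a minimal sketch of an HTTP-triggered function for the Python runtime. In that runtime the handler receives a Flask request object; the function name `hello_http`, the `name` query parameter, and the local stand-in request are assumptions for illustration only, and newer deployments typically register the handler through the Functions Framework rather than relying on the bare function name.

```python
from types import SimpleNamespace


def hello_http(request):
    """HTTP entry point; `request` is expected to expose Flask-style `args`."""
    # Read an optional ?name=... query parameter, falling back to a default.
    name = request.args.get("name", "World")
    return f"Hello, {name}!"


# Local stand-in for the request object, so the handler can be tried
# without deploying anything (hypothetical test harness).
fake_request = SimpleNamespace(args={"name": "Spanner"})
print(hello_http(fake_request))
```

Because the handler is a plain function, it can be unit tested locally with a stub request before it is deployed.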



Cloud Dataprep is a managed service on Google Cloud, developed by Google's partner Trifacta. Cloud Dataprep helps analysts reduce the time required to prepare data for analysis. Within the console, you have tools that help explore, cleanse, and transform data without configuring or provisioning servers.

Dataprep comes in three editions: Cloud Dataprep Premium, Standard, and Legacy. Dataprep jobs run on Dataflow. When the job execution completes, Dataflow creates a template file in Cloud Storage (in the temp directory).

Let us look at how we can use Dataprep to load data from GCS to BigQuery with a few transformations. …

Cloud Data Fusion is a fully managed, code-free, web-based data integration service that helps build data pipelines quickly and efficiently. A Data Fusion instance is composed of several Google Cloud services such as Cloud Storage, Persistent Disk, and Cloud Key Management Service. These services are used to develop and manage pipelines built on Data Fusion.

When you run a data pipeline, Data Fusion automatically provisions a Dataproc cluster, runs the pipeline, and tears down the cluster once the job is completed. Users may also choose to run pipelines on existing Dataproc clusters.
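The provision, run, and tear-down lifecycle described above can be pictured with a small Python sketch. This is only a conceptual model and does not call Data Fusion or Dataproc; the names `ephemeral_cluster`, `run_pipeline`, and `oracle-to-bq` are hypothetical, and tearing the cluster down even when the run fails is an assumption of this sketch rather than something the text states.

```python
from contextlib import contextmanager


@contextmanager
def ephemeral_cluster(name: str, log: list):
    """Model a cluster that exists only for the duration of one pipeline run."""
    log.append(f"provision {name}")
    try:
        yield name
    finally:
        # Assumption of this sketch: the cluster is removed on failure too.
        log.append(f"tear down {name}")


def run_pipeline(pipeline: str, log: list) -> None:
    """Run one pipeline on a freshly provisioned, short-lived cluster."""
    with ephemeral_cluster(f"{pipeline}-cluster", log) as cluster:
        log.append(f"run {pipeline} on {cluster}")


events = []
run_pipeline("oracle-to-bq", events)  # hypothetical pipeline name
for event in events:
    print(event)
```

Running on an existing cluster would correspond to skipping the context manager and passing in a cluster name that outlives the run.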

Before we start extracting data from Oracle…

Whether you want to migrate your on-prem data to the cloud or move it between cloud providers, teams within an organization should come together, choose the service that fits best, estimate costs, plan, and execute.

Storage Transfer Service, or simply Transfer Service, is one of GCP's offerings that helps you move data from on-prem, from a different cloud, or even from one GCS bucket to another. You can create and process transfer operations to Cloud Storage in a fully managed, serverless way.

No code — No infrastructure

Users may choose either to perform a one-time transfer or to schedule recurring transfers.
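The difference between the two modes can be sketched as the request body for a bucket-to-bucket transfer job. The Python below only builds a dictionary and does not call the service. The field names follow my reading of the Storage Transfer Service REST `transferJobs` resource, where a schedule whose end date equals its start date runs once; please verify them against the current documentation. The helper name, project ID, and bucket names are hypothetical.

```python
from datetime import date
from typing import Optional


def as_api_date(d: date) -> dict:
    """Convert a Python date into the {year, month, day} shape used in the body."""
    return {"year": d.year, "month": d.month, "day": d.day}


def build_transfer_job(project_id: str, source_bucket: str, sink_bucket: str,
                       start: date, end: Optional[date] = None) -> dict:
    """Build a transfer job body; with no `end` date the job runs one time."""
    # Assumption: a schedule ending on its start date means a one-time transfer.
    last_day = start if end is None else end
    return {
        "projectId": project_id,
        "description": f"Copy gs://{source_bucket} to gs://{sink_bucket}",
        "status": "ENABLED",
        "transferSpec": {
            "gcsDataSource": {"bucketName": source_bucket},
            "gcsDataSink": {"bucketName": sink_bucket},
        },
        "schedule": {
            "scheduleStartDate": as_api_date(start),
            "scheduleEndDate": as_api_date(last_day),
        },
    }


# Hypothetical project and bucket names.
one_time = build_transfer_job("my-project", "src-bucket", "dst-bucket",
                              date(2030, 1, 1))
recurring = build_transfer_job("my-project", "src-bucket", "dst-bucket",
                               date(2030, 1, 1), end=date(2030, 1, 31))
print(one_time["schedule"])
print(recurring["schedule"])
```

The same body can be produced without code through the console, which is where the "no code, no infrastructure" promise applies; the sketch is only meant to show what distinguishes a one-time job from a scheduled one.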


SP Kumar Rachumallu

Lead Programmer at Novartis Healthcare Pvt Ltd.
