Working with Big Query datasets within Jupyter notebook is no different to work with any other DB using Jupyter. In 3 steps we can query data stored in BQ from Jupyter notebook.
Before we read or query data, we need to make sure to meet few prerequisites.
- You should have an active google cloud account (BQ Sandbox works too). You can use an existing project or create new project.
- You should have Jupyter notebook installed on your computer.
- You should enable Big query API either from cloud console or from Cloud Shell.
To enable Big Query API through cloud console: Go to navigation menu -> click APIs & Services. Once you are there, click + Enable APIs and Services (Highlighted below). In search bar, enter BigQuery API and click Enable.
Alternatively you can activate the API from cloud shell by using the below command.
gcloud services list #Lists the APIs that are enabled already
gcloud services enable bigquery.googleapis.com #To enable BQ API
In order to make request to the API enabled in Step#1, you need to create a service account and get an authentication file for your Jupyter Notebook. To do so, navigate to Credentials tab under APIs and Services console and click Create Credentials tab and then Service account under drop down.
Enter Service account name and description. You can use BigQuery Admin role under Grant this service account access to project. Click Done. You can now see the new service account under Credentials screen. Click pencil icon beside the new service account you have created and click Add Key to add auth key. Please choose JSON and click CREATE. It will download the JSON file with auth key info. (Download path will be used to authenticate)
You are now all set to use big query within Jupyter notebook. Before starting to query data, install the following from command prompt.
pip install google-cloud
pip install google-cloud-bigquery[pandas]
pip install google-cloud-storage
Querying public dataset (bigquery-public-data.austin_crime.crime) to get the top crimes by year:
#Set environment variables for your notebook
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'C:/Users/rachusr1/Downloads/bq_in_Jupyter.json'#Imports google cloud client library and initiates BQ service
from google.cloud import bigquery
bigquery_client = bigquery.Client()#Write Query on BQ
QUERY = """
COUNT(unique_key) AS Crime_Count
"""#Run the query and write result to a pandas data frame
Query_Results = bigquery_client.query(QUERY)
df = Query_Results.to_dataframe()#View top few rows of result
Data analysts who work with Jupyter notebook day in day out can leverage this form of querying data which helps them to be more productive.
Please note this is one of many ways to query bigquery data from Jupyter.