Five Key Lessons for Google Earth Engine Beginners
Hands-On Insights from a Python API user
Land cover map for the Paute water basin in Ecuador for the year 2020. Image created using the Google Earth Engine Python API and Geemap. Data source: Friedl, M., Sulla-Menashe, D. (2022); Lehner, B., Grill, G. (2013) and Lehner, B., Verdin, K., Jarvis, A. (2008).
Introduction
As a climate scientist, I find Google Earth Engine (GEE) a powerful tool in my toolkit: no more downloading heavy satellite images to my computer.
GEE's primary API is JavaScript, although Python users can also access a powerful API to perform similar tasks. Unfortunately, there are fewer materials for learning GEE with Python.
However, I love Python. Since I learned that GEE has a Python API, I have imagined a world of possibilities combining GEE's powerful cloud-processing capabilities with Python frameworks.
The five lessons come from my most recent project, which involved analyzing water balance and drought in a water basin in Ecuador. Nevertheless, the tips, code snippets and examples could apply to any project.
The story presents each lesson following the sequence of any data analysis project: data preparation (and planning), analysis, and visualization.
I also provide some general advice that applies regardless of the programming language you use.
This article is aimed at GEE beginners and assumes an understanding of Python and some geospatial concepts.
Lesson 1: Get familiar with GEE functions
If you know Python but are new to GEE (like me some time ago), you should know that GEE has optimized functions for processing satellite images. We won’t delve into the details of these functions here; you should check the official documentation.
However, my advice is to first check whether a GEE function can perform the analysis you want to conduct. When I first started using GEE, I used it as a catalogue for finding data, relying only on its basic functions. I would then write Python code for most of the analyses. While this approach can work, it often leads to significant challenges, which I will discuss in later lessons.
Don’t limit yourself to learning only the basic GEE functions. If you know Python (or coding in general), the learning curve for these functions is not very steep. Try to use them as much as possible; it is worth it in terms of efficiency.
A final note: GEE functions even support machine learning tasks. These GEE functions are easy to implement and can help you solve many problems. Only when you cannot solve your problem with these functions should you consider writing Python code from scratch.
As an example for this lesson, consider the implementation of a clustering algorithm.
Example code with GEE functions
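import ee

ee.Initialize()  # Newer library versions may need ee.Initialize(project="your-cloud-project")

# Assumes clustering_image (an ee.Image) and galapagos_aoi (an ee.Geometry) are already defined
# Sample the image to create input for clustering
sample_points = clustering_image.sample(
    region=galapagos_aoi,
    scale=30,  # Scale in meters
    numPixels=5000,  # Number of points to sample
    geometries=False,  # Don't include geometry to save memory
)
# Train a k-means clusterer with 5 clusters (unsupervised)
clusterer = ee.Clusterer.wekaKMeans(5).train(sample_points)
# Assign every pixel of the image to a cluster
result = clustering_image.cluster(clusterer)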
Example code with Python
import numpy as np
from osgeo import gdal, gdal_array

# Tell GDAL to throw Python exceptions and register all drivers
gdal.UseExceptions()
gdal.AllRegister()
# Open the .tiff file (with exceptions enabled, a missing file already raises here)
img_ds = gdal.Open('Sentinel-2_L2A_Galapagos.tiff', gdal.GA_ReadOnly)
if img_ds is None:
    raise FileNotFoundError("The specified file could not be opened.")
# Prepare an empty array to store the image data for all bands
img = np.zeros(
    (img_ds.RasterYSize, img_ds.RasterXSize, img_ds.RasterCount),
    dtype=gdal_array.GDALTypeCodeToNumericTypeCode(img_ds.GetRasterBand(1).DataType),
)
# Read each band (GDAL bands are 1-indexed) into the corresponding slice of the array
for b in range(img_ds.RasterCount):
    img[:, :, b] = img_ds.GetRasterBand(b + 1).ReadAsArray()
print("Shape of the image with all bands:", img.shape)  # (height, width, num_bands)
# Reshape for processing: one row per pixel, one column per band
new_shape = (img.shape[0] * img.shape[1], img.shape[2])  # (num_pixels, num_bands)
X = img.reshape(new_shape)
print("Shape of reshaped data for all bands:", X.shape)  # (num_pixels, num_bands)
The first block of code is not only shorter, but it also handles large satellite datasets more efficiently, because GEE functions are designed to scale across the cloud.
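The Python block stops once the pixels are reshaped into a (num_pixels, num_bands) array; the clustering itself is still to come. In practice you would hand that array to a library such as scikit-learn's KMeans, but a minimal from-scratch sketch of Lloyd's k-means in plain Python gives a sense of how much work ee.Clusterer.wekaKMeans does for you. It is an illustration under my own assumptions, not how GEE or Weka implement the algorithm: the deterministic farthest-first seeding, the function names and the six toy pixels are hypothetical choices made to keep the example small.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two band vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(pixels, k, n_iter=20):
    """Cluster pixel vectors (one tuple of band values per pixel) with Lloyd's k-means.

    Centroids are seeded by farthest-first traversal so the result is deterministic.
    Returns (labels, centroids).
    """
    # Seeding: start from the first pixel, then repeatedly add the pixel that is
    # farthest from all centroids chosen so far
    centroids = [list(pixels[0])]
    while len(centroids) < k:
        farthest = max(pixels, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(pixels)
    for _ in range(n_iter):
        # Assignment step: attach each pixel to its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in pixels
        ]
        # Update step: move each centroid to the mean of its assigned pixels
        for c in range(k):
            members = [p for p, lab in zip(pixels, new_labels) if lab == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(band) / len(members) for band in zip(*members)]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
    return labels, centroids


# Toy "image" of six pixels with two bands each, forming two obvious groups
pixels = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
labels, centroids = kmeans(pixels, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

A real Sentinel-2 scene has millions of pixels, and a pure-Python loop like this would be far too slow for it, which is exactly why pushing the work to GEE pays off.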
While GEE’s functions are powerful, understanding the limitations of cloud processing is crucial when scaling up your project.
Lesson 2: Understand the limitations of GEE cloud processing capabilities
Access to free cloud computing resources to process satellite images is a blessing. However, it’s not surprising that GEE imposes limits to ensure fair resource distribution. If you plan to use it for a non-commercial large-scale project (e.g. researching deforestation in the Amazon region) and intend to stay within the free-tier limits, you should plan accordingly. My general guidelines are:
- Limit the size of your regions: divide them and work in batches. I didn’t need to do this in my project because I was working with a single small water basin. However, if your project involves large geographical areas, this would be the first logical step.
- Optimize your scripts by prioritizing GEE functions (see Lesson 1).
- Choose datasets that let you save computing power. For example, in my last project, I used the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS). The original dataset has a daily temporal resolution. However, it also comes in a version called “PENTAD”, which aggregates precipitation into pentads (six per month, most of them five days long); each value is the total precipitation for that pentad. Using this dataset allowed me to save computing power by processing the compact version without sacrificing the quality of my results.
- Examine the description of your dataset, as it might reveal scaling factors that could save computing power. For instance, in my water balance project, I used Moderate Resolution Imaging Spectroradiometer (MODIS) data, specifically the MOD16 dataset, a readily available evapotranspiration (ET) product. According to the documentation, the stored values must be multiplied by a scale factor of 0.1 to obtain ET in physical units. Scaling factors reduce storage requirements by letting a product store its values as compact integers.
- If worse comes to worst, be prepared to compromise. Reduce the resolution of the analyses if the standards of the study allow it. For example, the “reduceRegion” GEE function lets you summarize the values of a region (sum, mean, etc.). Its “scale” parameter sets the pixel size, in meters, at which the analysis runs. For instance, if your satellite data has a resolution of 10 m and GEE can’t process your analysis, you can set the scale parameter to a coarser resolution (e.g. 50 m).
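Two of these guidelines are easy to sketch in plain Python. The first helper below splits a bounding box into a grid of tiles that you could process one batch at a time (in GEE, each tuple could become an ee.Geometry.Rectangle). The second shows how daily precipitation collapses into pentads, assuming the CHIRPS convention of six pentads per month, with the sixth running from day 26 to the end of the month; check the dataset description before relying on it. The function names and the example bounding box are hypothetical.

```python
import calendar


def split_bbox(west, south, east, north, n_cols, n_rows):
    """Split a lon/lat bounding box into n_cols x n_rows tiles for batch processing.

    Returns a list of (west, south, east, north) tuples, row by row from the south.
    """
    dx = (east - west) / n_cols
    dy = (north - south) / n_rows
    return [
        (west + i * dx, south + j * dy, west + (i + 1) * dx, south + (j + 1) * dy)
        for j in range(n_rows)
        for i in range(n_cols)
    ]


def pentad_sums(daily, year, month):
    """Aggregate one month of daily precipitation (mm) into six pentad totals.

    Assumes the CHIRPS pentad convention: pentads 1-5 cover days 1-5, 6-10, ...,
    21-25, and pentad 6 covers day 26 to the end of the month.
    """
    n_days = calendar.monthrange(year, month)[1]
    if len(daily) != n_days:
        raise ValueError(f"expected {n_days} daily values, got {len(daily)}")
    # Zero-based day offsets where each pentad starts (the last entry closes pentad 6)
    bounds = [0, 5, 10, 15, 20, 25, n_days]
    return [sum(daily[bounds[i]:bounds[i + 1]]) for i in range(6)]


# Hypothetical bounding box (lon/lat degrees) split into a 2 x 2 grid of tiles
for tile in split_bbox(0, 0, 2, 1, n_cols=2, n_rows=2):
    print(tile)

# A made-up February with 1 mm of rain every day
print(pentad_sums([1] * 28, 2021, 2))  # [5, 5, 5, 5, 5, 3]
```

Six pentad values replace 28 to 31 daily images per month, roughly a fivefold reduction in the number of images to process, which is where the savings in computing power come from.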
As an example from my water balance and drought project, consider the following block of code:
import ee

# Assumes ee.Initialize() has been called and that MSI_collection (an ee.ImageCollection
# with an 'MSI' band) and pauteBasin (an ee.Geometry) are already defined
# Reduce the collection to a single image (mean MSI over the time period)
MSI_mean = MSI_collection.select('MSI').mean().clip(pauteBasin)
# Use reduceRegion to calculate the min and max
stats = MSI_mean.reduceRegion(
    reducer=ee.Reducer.minMax(),  # Reducer to get min and max
    geometry=pauteBasin,  # Region of interest (the basin)
    scale=500,  # Scale in meters (coarser than the native 10 m)
    maxPixels=1e9,  # Maximum number of pixels to process
)
# Bring the results to the client as a Python dictionary
min_max = stats.getInfo()
# Print the min and max values
print('Min and Max values:', min_max)
In my project, I used Sentinel-2 imagery to calculate a moisture stress index (MSI). Then, I applied the “reduceRegion” GEE function to summarize the MSI values over the basin.
In my case, I needed to find the maximum and minimum MSI values to check if my results made sense. The following plot shows the MSI values spatially distributed in my study region.
The original image has a 10 m resolution, and GEE struggled to process the data at that scale. Therefore, I used the scale parameter to lower the resolution to 500 m. After this change, GEE was able to process the data.
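The savings from coarsening are easy to quantify: the number of pixels scales with 1/scale², so going from 10 m to 500 m cuts it by a factor of (500 / 10)² = 2,500. The sketch below computes pixel counts for a hypothetical 1,000 km² area and mimics aggregation with a simple block mean. It is only an analogy: GEE resamples at the requested scale through its own image pyramids, and reduceRegion stops at 10 million pixels by default unless maxPixels is raised, as in the code above. The helper names are hypothetical.

```python
def pixel_count(area_km2, scale_m):
    """Approximate number of pixels needed to cover an area at a given pixel size (m)."""
    return area_km2 * 1_000_000 / scale_m**2


def block_mean(grid, factor):
    """Downsample a 2D grid by averaging non-overlapping factor x factor blocks.

    Rows and columns that do not fill a complete block are dropped.
    """
    rows = len(grid) // factor
    cols = len(grid[0]) // factor
    return [
        [
            sum(grid[r * factor + i][c * factor + j] for i in range(factor) for j in range(factor))
            / factor**2
            for c in range(cols)
        ]
        for r in range(rows)
    ]


area_km2 = 1_000  # hypothetical basin area, for illustration only
print(pixel_count(area_km2, 10))   # 10000000.0 pixels at 10 m
print(pixel_count(area_km2, 500))  # 4000.0 pixels at 500 m

# A tiny 4 x 4 "image" averaged into 2 x 2 blocks
print(block_mean([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]], 2))
```

One side effect matters when checking min and max values: averaging smooths out extremes, so the range of a mean-aggregated image can only shrink relative to the original.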
Lesson 3: Prioritize using ready-to-use analysis products
I am obsessed with data quality. As a result, I use data but rarely trust it without verification. I like to invest time in ensuring the data is ready for analysis. However, don’t let image corrections paralyze your progress.
My tendency to invest too much time in image corrections stems from learning remote sensing and image corrections “the old way”. By this, I mean using software that assists in applying atmospheric and geometric corrections to images.
Nowadays, scientific agencies supporting satellite missions can deliver images with a high level of preprocessing. In fact, a great feature of GEE is its catalogue, which makes it easy to find ready-to-use analysis products.
Preprocessing is often the most time-consuming task in a data science project. Therefore, it must be appropriately planned and managed.
The best approach before starting a project is to establish data quality standards. Based on your standards, allocate enough time to find the best product (which GEE facilitates) and apply only the required corrections (e.g. cloud masking).
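Cloud masking is often the only correction a Level-2A product still needs, and it usually comes down to testing a few bits in a quality band. As an illustration, the Sentinel-2 QA60 band flags opaque clouds in bit 10 and cirrus in bit 11. The sketch below applies that test to plain integers; in GEE you would express the same logic on an ee.Image with bitwiseAnd and updateMask. The bit positions come from the QA60 band description, so confirm them on the catalogue page of the collection you use, because other products encode quality differently. The helper names are hypothetical.

```python
OPAQUE_CLOUD_BIT = 10  # QA60 bit 10: opaque clouds (per the band description)
CIRRUS_BIT = 11        # QA60 bit 11: cirrus clouds (per the band description)
CLOUD_MASK = (1 << OPAQUE_CLOUD_BIT) | (1 << CIRRUS_BIT)


def is_clear(qa60_value):
    """Return True if neither the opaque-cloud nor the cirrus bit is set."""
    return (qa60_value & CLOUD_MASK) == 0


def mask_clouds(values, qa60):
    """Replace pixel values flagged as cloudy in QA60 with None."""
    return [v if is_clear(q) else None for v, q in zip(values, qa60)]


# Three made-up pixels: clear, opaque cloud (bit 10), and both cloud bits set
print(mask_clouds([0.12, 0.35, 0.40], [0, 1024, 3072]))  # [0.12, None, None]
```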
Lesson 4: Check the GEE catalogue thoroughly to avoid starting from scratch
If you love programming in Python (like me), you might often find yourself coding everything from scratch.
As a PhD student just starting out with coding, I wrote a script to perform a t-test over a study region. Later, I discovered a Python library that performed the same task. When I compared my script’s results with the library’s, they matched. However, using the library from the start would have saved me time.
I’m sharing this lesson to help you avoid the same mistake with GEE. Here are two examples from my water balance project.
Example 1
To calculate the water balance in my basin, I needed ET data. ET is not an observed variable (like precipitation); it must be calculated.
The ET calculation is not trivial. You can look up the equations in textbooks and implement them in Python. However, some researchers have published papers related to this calculation and shared their results with the community.
This is where GEE comes in. The GEE catalogue provides not only observed data (as I initially thought) but also many derived products and modelled datasets (e.g. reanalysis data, land cover, vegetation indices, etc.). Guess what? I found a ready-to-use global ET dataset in the GEE catalogue. A lifesaver!
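Once a ready-made ET product is available, the core of a simplified water balance is short. Over a given period, precipitation P is split into evapotranspiration ET, runoff Q and storage change ΔS (P = ET + Q + ΔS), so P − ET estimates the water left for runoff and storage. The sketch below assumes monthly basin totals that have already been extracted and aligned in time (the numbers are made up), with ET still stored as raw integers that need the MOD16 scale factor of 0.1 mentioned in Lesson 2. It is a simplified illustration with a hypothetical helper name, not the full analysis.

```python
ET_SCALE = 0.1  # MOD16 ET scale factor, taken from the dataset documentation


def water_surplus(precip_mm, et_raw):
    """Return P - ET per period, in mm, after scaling the raw ET integers.

    precip_mm: precipitation totals in mm (e.g. summed CHIRPS values)
    et_raw:    raw ET values as stored in the product (scaled integers)
    """
    if len(precip_mm) != len(et_raw):
        raise ValueError("precipitation and ET series must have the same length")
    return [p - e * ET_SCALE for p, e in zip(precip_mm, et_raw)]


# Made-up monthly basin totals, for illustration only
precip = [120.0, 95.5, 60.0]
et_raw = [850, 900, 1000]  # 85.0, 90.0 and 100.0 mm after scaling
print([round(x, 1) for x in water_surplus(precip, et_raw)])  # [35.0, 5.5, -40.0]
```

A negative value, as in the third month, means evapotranspiration exceeded precipitation, so the basin drew on stored water during that period.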
Example 2
I also consider myself a Geographic Information System (GIS) professional. Over the years, I’ve acquired a substantial amount of GIS data for my work, such as water basin boundaries in shapefile format.
In my water balance project, my intuition was to import my water basin boundary shapefile to my GEE project. From there, I transformed the file into a Geopandas object and continued my analysis.
In this case, I wasn’t as lucky as in Example 1. I lost precious time trying to work with this Geopandas object, which I could not integrate well with GEE. Ultimately, this approach didn’t make sense: the GEE catalogue already includes an easy-to-handle product for water basin boundaries.
Thus, a key takeaway is to maintain your workflow within GEE whenever possible.
Lesson 5: Use Geemap for plotting
As mentioned at the beginning of this article, integrating GEE with Python libraries can be incredibly powerful.
However, even for simple analyses and plots, the integration doesn’t seem straightforward.
This is where Geemap comes in. Geemap is a Python package designed for interactive geospatial analysis and visualization with GEE.
I also found that it can help create static plots in Python. I made the plots for my water balance and drought project with GEE and Geemap, including the images in this story.
Summary
GEE is a powerful tool. However, for a beginner, pitfalls are inevitable. This article provides tips and tricks to help you start on the right foot with the GEE Python API.
References
European Space Agency (2025). Harmonized Sentinel-2 MSI: MultiSpectral Instrument, Level-2A.
Friedl, M., Sulla-Menashe, D. (2022). MODIS/Terra+Aqua Land Cover Type Yearly L3 Global 500m SIN Grid V061 [Data set]. NASA EOSDIS Land Processes Distributed Active Archive Center. Accessed 2025-01-15 from https://doi.org/10.5067/MODIS/MCD12Q1.061
Lehner, B., Verdin, K., Jarvis, A. (2008). New global hydrography derived from spaceborne elevation data. Eos, Transactions, AGU, 89(10): 93–94.
Lehner, B., Grill, G. (2013). Global river hydrography and network routing: baseline data and new approaches to study the world’s large river systems. Hydrological Processes, 27(15): 2171–2186. Data is available at www.hydrosheds.org
Five Key Lessons for Google Earth Engine Beginners was originally published in Towards Data Science on Medium.