Catalog Scan
The command requake scan_catalog is the core of Requake’s
catalog-based repeater search. It compares every event in an earthquake
catalog against its neighbours and identifies
pairs with highly similar waveforms — the building blocks of
repeating-earthquake families.
The scan is single-station: each event pair is compared using one
seismic trace at a time (catalog_trace_id). If multiple trace IDs
are configured, the station closest to the event pair is selected.
The scan is parallelized across multiple CPU cores to handle large catalogs efficiently.
This page describes how the scan works, what configuration parameters control it, and how to get the best performance out of it.
Overview
Given a catalog of \(N\) events, the scan proceeds in three stages:
Spatial grouping: events are compared only if their epicentres lie within a configurable search radius (catalog_search_range). This reduces the naive \(\mathcal{O}(N^2)\) pair count to a much smaller set of candidates.
Waveform retrieval: for each candidate pair, the required waveform windows are fetched (from FDSN web services, a local archive, or the on-disk cache) and cut around the P arrival.
Cross-correlation: the two waveforms are band-pass filtered between cc_freq_min and cc_freq_max Hz to isolate the frequency band of interest, then cross-correlated in the time domain. The maximum correlation coefficient \(CC_\mathrm{max}\), together with the optimal lag, trace ID, and inter-event distance, is stored in the output database for every candidate pair.
All the parameters mentioned above are described in the configuration file.
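To give a sense of scale for the naive pair count, a catalog of \(N\) events contains \(N(N-1)/2\) distinct pairs. The catalog size below is only an illustrative figure, not one taken from Requake:

\[
\frac{N(N-1)}{2} = \frac{10\,000 \times 9\,999}{2} = 49\,995\,000 \quad \text{for } N = 10\,000 .
\]

Restricting comparisons to events within catalog_search_range keeps only the pairs that are close enough to be plausible repeaters.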
Spatial grouping
For each event in the catalog, Requake builds a list of neighbouring
events whose epicentral distance is within catalog_search_range
kilometres. Only these candidate pairs are passed to the waveform stage.
The spatial search uses a k-d tree (implemented via scipy.spatial.cKDTree) over the 3‑D Cartesian coordinates of the events on the unit sphere, which makes the neighbour lookup very fast even for large catalogs.
If catalog_search_range is set to zero (or negative), every
possible event pair is considered — useful for small catalogs, but
impractical for large ones.
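The neighbour search described in this section can be sketched as follows. This is a minimal illustration, not Requake's implementation: the function name candidate_pairs, the Earth radius constant, and the conversion from kilometres to a chord length on the unit sphere are assumptions made for the example. Only scipy.spatial.cKDTree itself is named in the text above.

```python
import numpy as np
from scipy.spatial import cKDTree

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumed value for this sketch


def candidate_pairs(lats_deg, lons_deg, search_range_km):
    """Return sorted (i, j) index pairs, i < j, within search_range_km.

    Hypothetical helper: the name and signature are not part of Requake.
    """
    lat = np.radians(np.asarray(lats_deg, dtype=float))
    lon = np.radians(np.asarray(lons_deg, dtype=float))
    n = len(lat)
    if search_range_km <= 0:
        # Zero or negative radius: every possible pair is a candidate.
        return [(i, j) for i in range(n) for j in range(i + 1, n)]
    # Epicentres as 3-D Cartesian points on the unit sphere.
    xyz = np.column_stack((
        np.cos(lat) * np.cos(lon),
        np.cos(lat) * np.sin(lon),
        np.sin(lat),
    ))
    # A great-circle distance d corresponds to a straight-line chord of
    # length 2 * sin(d / (2 * R)) between points on the unit sphere.
    angle = min(search_range_km / EARTH_RADIUS_KM, np.pi)
    chord = 2.0 * np.sin(angle / 2.0)
    tree = cKDTree(xyz)
    return sorted(tree.query_pairs(r=chord))
```

For three events on the equator at longitudes 0°, 0.05° and 1°, a 10 km radius keeps only the first two events as a candidate pair, while a radius of zero returns all three pairs.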
Waveform retrieval
For each candidate pair, Requake retrieves a short waveform window
around the P‑wave arrival for one or more seismic traces
(catalog_trace_id). The window starts cc_pre_P seconds before
the theoretical P arrival and lasts cc_trace_length seconds.
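The window definition can be written out as a small calculation. The sketch below assumes the P arrival time is already known. The function name waveform_window is hypothetical, and the numeric values in the usage note are illustrative rather than Requake defaults.

```python
from datetime import datetime, timedelta


def waveform_window(p_arrival, cc_pre_P, cc_trace_length):
    """Return (start, end) of the window cut around the P arrival.

    Hypothetical helper: the window starts cc_pre_P seconds before the
    P arrival and has a total length of cc_trace_length seconds.
    """
    start = p_arrival - timedelta(seconds=cc_pre_P)
    end = start + timedelta(seconds=cc_trace_length)
    return start, end
```

With a P arrival at 00:00:10, cc_pre_P = 5 and cc_trace_length = 25, the window runs from 00:00:05 to 00:00:30. cc_trace_length is measured from the window start, so only 20 seconds of it follow the P arrival.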
Waveforms can come from three sources, in order of priority:
On-disk SQLite cache (OUTDIR/waveform_cache.sqlite): if catalog_waveform_disk_cache_enabled is true, previously fetched waveforms are reused. Use requake wfcache prefetch to populate this cache before the scan.
In-memory cache (sized by catalog_waveform_cache_size): waveforms for recently processed events are kept in RAM to avoid repeated fetches.
FDSN web services or local archives: if a waveform is not found in either cache, it is downloaded from the configured FDSN dataselect service or read from a local SDS directory or per-event data folder.
If multiple trace IDs are configured (comma-separated), Requake selects the station closest to the event pair for each comparison.
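One plausible reading of "closest to the event pair" is the station with the smallest mean epicentral distance to the two events. The sketch below illustrates that reading only. The distance criterion, the function names, and the trace IDs in the usage note are assumptions for the example and may differ from what Requake actually does.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumed value for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def closest_trace_id(stations, event1, event2):
    """Pick the trace ID whose station is nearest to the event pair.

    stations maps trace ID -> (lat, lon); events are (lat, lon) tuples.
    Assumption: "nearest" means smallest mean distance to both events.
    """
    def mean_distance(trace_id):
        slat, slon = stations[trace_id]
        return 0.5 * (haversine_km(slat, slon, *event1)
                      + haversine_km(slat, slon, *event2))
    return min(stations, key=mean_distance)
```

For example, with hypothetical stations "XX.NEAR..HHZ" at (10.0, 10.0) and "XX.FAR..HHZ" at (20.0, 20.0), a pair of events near (10.1, 10.1) is compared on "XX.NEAR..HHZ".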
Cross-correlation and pair selection
The cross-correlation step is controlled by the Processing parameters
section of the configuration file. The
relevant parameters are:
cc_pre_P: seconds of signal before the P arrival
cc_trace_length: total waveform window length in seconds
cc_freq_min, cc_freq_max: band-pass filter corners in Hz
cc_max_shift: maximum allowed lag in seconds
cc_allow_negative: when true, the largest absolute value of the correlation function is returned, be it positive (correlation) or negative (anti-correlation)
Each waveform pair undergoes the following processing:
Both traces are band-pass filtered between cc_freq_min and cc_freq_max Hz.
They are cross-correlated in the time domain using obspy.signal.cross_correlation.correlate, allowing a maximum lag of cc_max_shift seconds to account for travel-time differences.
The maximum normalised cross-correlation coefficient \(CC_\mathrm{max}\) is extracted with obspy.signal.cross_correlation.xcorr_max.
Every candidate pair is written to the output database with its \(CC_\mathrm{max}\), optimal lag, trace ID, and inter-event distance.
When cc_allow_negative is true, the value of the cross-correlation
function with the largest absolute magnitude, whether positive or
negative, is stored, so both correlated and anti-correlated pairs are recorded.
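The lag search and the effect of cc_allow_negative can be illustrated with plain NumPy. This is a stand-in sketch, not Requake's code and not the ObsPy functions named above. The function name cc_max_lag and its arguments are hypothetical, the lag is expressed in samples rather than seconds, and band-pass filtering is left out for brevity.

```python
import numpy as np


def cc_max_lag(a, b, max_shift_samples, allow_negative=False):
    """Return (lag_in_samples, cc) for two equal-length traces.

    Hypothetical helper. A positive lag means that `a` is delayed
    with respect to `b`. Requires max_shift_samples < len(b).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    # Normalise so that identical traces give a coefficient of 1.
    norm = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    full = np.correlate(a, b, mode="full") / norm
    zero = len(b) - 1  # index of zero lag in the full correlation
    # Keep only lags within +/- max_shift_samples.
    cc = full[zero - max_shift_samples: zero + max_shift_samples + 1]
    lags = np.arange(-max_shift_samples, max_shift_samples + 1)
    if allow_negative:
        idx = np.argmax(np.abs(cc))  # largest magnitude, either sign
    else:
        idx = np.argmax(cc)          # largest positive value only
    return int(lags[idx]), float(cc[idx])
```

A value of cc_max_shift in seconds would be converted to samples by multiplying by the sampling rate of the traces. With allow_negative=True, a trace and its polarity-reversed, shifted copy yield a coefficient close to -1 at the correct lag, whereas allow_negative=False reports only the best positive match.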
Parallel execution
The scan uses multiple worker processes to process pairs in parallel.
The number of workers is controlled by catalog_scan_nprocs:
set it to 0 (the default) for automatic selection (one fewer than
the number of available CPU cores), to 1 to run serially, or to any
other positive number to use exactly that many workers.
Each worker maintains its own in-memory waveform cache (sized by
catalog_waveform_cache_size_parallel, or derived automatically)
to minimise inter-process communication.
Resuming an interrupted scan
If the scan is interrupted (e.g., by a network outage or a user
pressing Ctrl‑C), running requake scan_catalog again will detect
the existing pairs in the database and offer to resume from where it
stopped. Use --force-continue to skip the prompt in scripts.
Slurm clusters
Requake automatically detects when it is running inside a Slurm job
(via the SLURM_JOB_ID environment variable) and adapts its behaviour:
Worker count. When catalog_scan_nprocs is set to 0 (automatic) and Slurm is detected, the number of workers is taken from the Slurm allocation (SLURM_CPUS_PER_TASK, SLURM_CPUS_ON_NODE, or SLURM_JOB_CPUS_PER_NODE, checked in that order). Unlike local runs, the full allocated count is used without subtracting one.
Progress logging. Progress messages include the Slurm job ID, process ID, and node list, making it easier to monitor jobs in cluster logs.
Non-interactive mode. If a scan is interrupted on a cluster, Requake will not prompt for input (since stdin is typically not available). Use --force to overwrite an existing scan or --force-continue to resume it.
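The worker-count rules for local and Slurm runs can be summarised in a short sketch. It is an illustration under stated assumptions, not Requake's code: the function name resolve_nprocs is hypothetical, only the leading integer of each Slurm variable is read (values such as "16(x2)" can occur in SLURM_JOB_CPUS_PER_NODE), and falling back to the local rule when no Slurm variable is usable is a choice made for this example.

```python
import os
import re

SLURM_CPU_VARS = (
    "SLURM_CPUS_PER_TASK",
    "SLURM_CPUS_ON_NODE",
    "SLURM_JOB_CPUS_PER_NODE",
)


def resolve_nprocs(catalog_scan_nprocs, environ=None, cpu_count=None):
    """Return the number of worker processes to start (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    if catalog_scan_nprocs > 0:
        # Explicit setting: 1 runs serially, larger values run in parallel.
        return catalog_scan_nprocs
    if "SLURM_JOB_ID" in environ:
        # Inside a Slurm job: use the full allocation, checked in order.
        for name in SLURM_CPU_VARS:
            # Assumption: only the leading integer is meaningful here.
            match = re.match(r"\d+", environ.get(name, ""))
            if match and int(match.group()) > 0:
                return int(match.group())
    # Local run (or no usable Slurm variable): leave one core free.
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    return max(1, cpu_count - 1)
```

On an 8-core workstation this yields 7 workers for the automatic setting, whereas inside a Slurm job with SLURM_CPUS_ON_NODE=16 it yields 16.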
Note
Slurm integration has been developed and tested on the IPGP S-CAPAD platform, which we gratefully acknowledge.
Performance tips
Prefetch waveforms. For large catalogs relying on FDSN sources, run requake wfcache prefetch before the scan. This downloads all required waveform windows once and stores them in the local SQLite cache, letting the scan read from disk instead of the network.
Tune the search radius. A larger catalog_search_range finds more candidate pairs but increases runtime. Choose a value that reflects the maximum expected distance between repeating events in your study area.
Limit memory. If you hit memory limits, reduce catalog_waveform_cache_size and let the on-disk cache handle persistence instead.