Another Propagator…
Tensorgator is a CUDA-accelerated orbit propagator written in Python.
pip install tensorgator
Tensorgator (Tensor Propagator) uses Numba-CUDA to take advantage of GPU parallelism for orbit propagation. My goal with this project is to enable students, researchers, and hobbyists to model large constellations and push the frontier of constellation design. It can be installed and run in a Google Colab notebook on the free GPU tier.
As a preview, the video above shows a 40,000-satellite scenario that can be generated from the Tensorgator examples directory. The propagation takes only 0.41 s, while rendering the animation in matplotlib takes a few minutes more.
1. What it can do:
Large Constellations: 500,000 satellites × 500 timesteps in 21 seconds, tested on Google Colab with a T4 GPU (15 GB VRAM)
Satellite Visibility Analysis: Determine satellite coverage and visibility from the ground
Coordinate Transformations: Fast ECI/ECEF conversions with CUDA acceleration
CPU Fallback: Support for CPU mode when CUDA is not available
2. What it can’t do:
Accurately predict real satellite positions: Only a simple analytical perturbation is modeled. Higher-order perturbations (drag, solar radiation pressure, etc.) are not included. Do not use this tool to manage a real constellation; use Tensorgator for design only.
That unique use case you care about: this tool is a skeleton for DIY customization, so if you want to model optics, comm links, or maneuvers, fork the project and add it in!
3. Show me:
Here is a simulation of a few megaconstellations and 2M pieces of simulated space junk. Tensorgator can propagate this snapshot in time in under 2 seconds.
Another example generates coverage contours for a constellation of 10 satellites in randomly distributed orbits and 10 in an evenly spaced Walker constellation with T/P/F = 10/10/3. The maximum gap duration is the longest time during the simulation when no satellite has access to the ground point. This simulation takes ~5 s to generate gap contours on a dense grid of ground points.
The figure below shows a 25-satellite, 5-plane constellation in an evenly spaced Walker configuration. The coverage percentage metric is the fraction of the simulation time during which a ground point is covered by at least one satellite.
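For reference, both metrics can be computed from a per-ground-point visibility table. Here is a minimal NumPy sketch; the visible matrix and dt timestep are placeholders for illustration, not Tensorgator's actual API:

import numpy as np

# Hypothetical inputs: visible[sat, t] is True when satellite sat sees the
# ground point at step t; dt is the uniform timestep in seconds.
def coverage_stats(visible: np.ndarray, dt: float):
    covered = visible.any(axis=0)            # at least one satellite in view per step
    coverage_pct = 100.0 * covered.mean()    # percentage of simulation time covered

    # Longest run of consecutive uncovered steps -> maximum gap duration.
    max_gap_steps, run = 0, 0
    for c in covered:
        run = 0 if c else run + 1
        max_gap_steps = max(max_gap_steps, run)
    return coverage_pct, max_gap_steps * dt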
4. Why do it?
Commercial satellite modeling software (that I won’t mention here) costs $100k/year in licensing fees. On top of that, these commercial offerings struggle with large constellations and usually have cumbersome scripting integration.
In my experience an analytical J2 model is the sweet spot between accuracy and speed. Why chase accuracy when you are modeling a conceptual design?
Existing free alternatives had two major drawbacks:
I was not fluent in the language they were written in (C++/Rust/Fortran)
They were not performant for large constellations (1,000+ satellites) or very long time periods (> 1 year)
Overall, Tensorgator is designed to be high performance and simple to use. You shouldn’t need to be an expert programmer to research constellation design.
5. How does it work?
Tensorgator uses a few tricks to achieve its fast propagation times. I’ll use a factory analogy that helped me understand how to optimize for GPU performance.
To maximize the output of our factory, we want the following:
Parallel assembly lines (Satellite positions are computed in parallel for maximum GPU utilization)
Uniform manufacturing steps (All threads execute the same operations simultaneously)
Distribution centers (Preallocated input/output shapes minimize memory transactions)
Parallel Assembly Lines:
CUDA provides a built-in hierarchy for organizing parallel execution on GPU hardware:
Thread - worker - CUDA core (RTX 3080: 8,704)
Block - assembly line - Streaming Multiprocessor (RTX 3080: 68)
Grid - factory - GPU (RTX 3080: 1)
Each thread computes the position of one satellite at one time step.
Each block contains a 16 x 16 grid of threads (256 satellite-time step calculations)
The grid uses a 2D array of blocks to cover all satellite-time combinations (see the sketch below).
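Here is roughly what that mapping looks like in Numba-CUDA. The kernel body is only a placeholder that advances the mean anomaly of an unperturbed orbit, and the array names are made up for illustration; the point is the (satellite, timestep) to (thread x, thread y) layout and the 16 × 16 launch configuration:

import math
import numpy as np
from numba import cuda

MU = 398600.4418  # Earth's gravitational parameter [km^3/s^2]

@cuda.jit
def mean_anomaly_kernel(a, M0, times, out):
    # One thread per (satellite, timestep) pair.
    i, j = cuda.grid(2)                        # i -> satellite index, j -> timestep index
    if i < out.shape[0] and j < out.shape[1]:
        n = math.sqrt(MU / a[i] ** 3)          # mean motion
        out[i, j] = M0[i] + n * times[j]       # placeholder: propagate mean anomaly only

n_sats, n_times = 40_000, 500
a = np.full(n_sats, 7000.0)                    # semi-major axes [km]
M0 = np.random.uniform(0, 2 * np.pi, n_sats)   # initial mean anomalies [rad]
times = np.linspace(0.0, 5400.0, n_times)      # sample times [s]
out = np.zeros((n_sats, n_times))

threads_per_block = (16, 16)                                # one "assembly line" of 256 workers
blocks = (math.ceil(n_sats / 16), math.ceil(n_times / 16))  # 2D grid covering all combinations
mean_anomaly_kernel[blocks, threads_per_block](a, M0, times, out)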
Uniform manufacturing steps
Tensorgator uses a contour-integral method to solve Kepler's equation. This method was selected to eliminate thread divergence:
Newton-Raphson: Variable convergence time (2-15 iterations per thread) depending on orbital elements
Contour integral: Fixed computation (exactly 10 sample points per thread)
All threads complete simultaneously, so the GPU isn’t waiting for slower-converging cases.
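To show the principle, here is a minimal NumPy version of the contour-integral solve. The GPU kernel presumably uses a fixed-sample, real-arithmetic variant of the same idea, so treat this as a sketch of the math rather than the actual implementation:

import numpy as np

def kepler_contour(M, e, n_pts=10):
    # Solve E - e*sin(E) = M for E (sketch; assumes 0 < M < pi and 0 < e < 1).
    # The root lies in [M, M + e], so wrap a circular contour around that bracket.
    center = M + 0.5 * e
    radius = 0.5 * e * 1.0001 + 1e-12           # small pad keeps the root strictly inside
    theta = 2.0 * np.pi * (np.arange(n_pts) + 0.5) / n_pts
    z = center + radius * np.exp(1j * theta)    # fixed number of sample points on the contour
    dz = 1j * radius * np.exp(1j * theta)       # dz/dtheta (the constant dtheta cancels in the ratio)
    f = z - e * np.sin(z) - M                   # Kepler residual
    fp = 1.0 - e * np.cos(z)                    # its derivative
    # Residue-theorem ratio: (sum of roots inside) / (number of roots inside) = the root itself.
    return (np.sum(z * fp / f * dz) / np.sum(fp / f * dz)).real

E = kepler_contour(1.0, 0.3)   # eccentric anomaly for M = 1.0 rad, e = 0.3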
Distribution centers
The inputs and outputs for the simulation all have fixed sizes and are read from and written to sequential addresses in memory.
Input tensors:
elements[satellite, orbital_params]
→ Shape: (N_satellites, 6)
times[timestep]
→ Shape: (N_times,)
Output tensor:
positions[satellite, time, xyz]
→ Shape: (N_satellites, N_times, 3)
The 2D grid organizes threads so each satellite maps to a column and each time step maps to a row. This means neighboring threads work on consecutive satellites, allowing the GPU to bundle their memory requests together into larger, more efficient data transfers.
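In practice the "distribution center" amounts to allocating fixed-shape arrays once and moving data in a few bulk transfers. A hypothetical host-side sketch using Numba's device arrays (the orbital-element ordering shown is an assumption, not Tensorgator's documented layout):

import numpy as np
from numba import cuda

n_sats, n_times = 100_000, 500

# Allocate once with fixed shapes instead of growing Python structures on the fly.
elements = np.zeros((n_sats, 6), dtype=np.float32)             # e.g. [a, e, i, raan, argp, M0] per satellite
times = np.linspace(0.0, 86_400.0, n_times, dtype=np.float32)  # one day of sample times [s]

d_elements = cuda.to_device(elements)                          # one bulk host -> device copy
d_times = cuda.to_device(times)
d_positions = cuda.device_array((n_sats, n_times, 3), dtype=np.float32)  # preallocated output

# ... launch the propagation kernel here, writing into d_positions ...

positions = d_positions.copy_to_host()                         # one bulk device -> host copy at the end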
Misc Compiler Optimizations
Fast math: Aggressive floating-point optimizations
FP32 precision: Default single-precision (FP64 available for higher accuracy)
No-Python mode: JIT compilation without Python overhead (CPU fallback; see the sketch below)
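With Numba these options look roughly like the following; this is illustrative, not Tensorgator's exact configuration:

import numpy as np
from numba import cuda, njit

# GPU path: fastmath enables aggressive floating-point optimizations,
# and float32 arrays keep the arithmetic in single precision.
@cuda.jit(fastmath=True)
def gpu_square(x, out):
    i = cuda.grid(1)
    if i < x.size:
        out[i] = x[i] * x[i]

# CPU fallback: nopython-mode JIT, no Python-object overhead in the hot loop.
@njit(fastmath=True, cache=True)
def cpu_square(x, out):
    for i in range(x.size):
        out[i] = x[i] * x[i]

x = np.linspace(0.0, 1.0, 1_000_000, dtype=np.float32)
out = np.empty_like(x)
if cuda.is_available():
    gpu_square[x.size // 256 + 1, 256](x, out)
else:
    cpu_square(x, out)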
6. Want to improve it?
Feel free to fork or add to https://github.com/ApoPeri/TensorGator.
In building this library, I’ve drawn inspiration from the following projects:
Hapsira GPU (Poliastro Fork), ESA’s dsgp4, oSTK (C++), nyxspace (Rust)