Process Update: June 2021
For the past several months I've been working to improve the recall of my naive moving object processing system. Said another way, I'm trying to make sure I'm detecting all known objects that are detectable in a set of observation frames. I figure I'm not in a good position to detect anything unknown until I'm consistently detecting everything already known that is visible with TESS' capabilities. Since I haven't written anything up in a while, I thought it'd be a good time to outline the changes that have accumulated since I wrote the naive MOPS post. So here's what's new — at least the bigger changes that stand out and that I remember.
1. 16 frame composite of 5σ detections. Links with 6 detection subset OD solves that match known objects are shown. The number of unique detections that solve per link is shown in parentheses in the image. These are all the known objects during this observation window at magnitude 19.5 or brighter.
First a brief discussion of the results in the images above
The two underlying images above are the same. They are 16 thresholded difference images calculated from single TESS observations of the southern hemisphere sky off the ecliptic and plotted on top of one another. The difference between them is that image 1 highlights known objects I've detected and solved orbits for while image 2 shows candidate links I've found without known object matches that have at least 6 detection OD solutions. I don't expect any of these image 2 solves are real objects. They're just the output of the process that I use to begin to validate candidate links. If they were real objects they'd have more than one 6 detection OD solution per link, which they do not.
So let me explain that last statement a little further. My process doesn't OD solve all detections in a candidate link at once. Instead it selects subset combinations of 6 detections from the link and tries to solve those. If you can find numerous different 6 detection subset solutions for a candidate link then you can begin to take that candidate link seriously. The number of unique detections that appear in one solution or another is the parenthesized value in the images above. For image 1 (known objects) you see links with 11s and 12s in parentheses. In image 2 it's all 6s in parentheses, which means that I probably just discovered noise that happens to solve for a few short-ish candidate links. I should note that these dozen or so 'solves' in image 2 result from OD testing just over ten thousand candidate 6 detection subsets across five thousand plus total candidate links, each with a minimum of 6 detections. So spurious solutions are still quite rare given the number of candidate links you're testing.
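To make that parenthesized count concrete, here's a tiny sketch (the detection IDs and variable names are made up for illustration, not from my actual pipeline) of how the unique detection count for a link falls out of its solved subsets:

```python
# Two hypothetical 6-detection subsets of one candidate link that both
# produced an OD solution. They share five detections and differ in one.
solved_subsets = [
    (1, 2, 3, 4, 5, 6),
    (1, 2, 3, 4, 5, 7),
]

# The parenthesized value in the images: how many unique detections
# participate in at least one successful solve for this link.
unique_detections = set().union(*solved_subsets)
print(len(unique_detections))  # 7
```

A link whose every solved subset is the same 6 detections stays at 6, which is exactly the image 2 situation.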
Now let's talk about the evolution of the moving object processing system that created these images and results.
Standard deviation filter for thresholding
The obvious and simple way to threshold a difference image is to calculate the standard deviation for the image and take as candidate detections pixel values that are greater than some multiple of that standard deviation. That's what I was doing previously. Now, for an N×N image I'm calculating N² standard deviations, one per pixel, using a sliding 200x200 (for example) pixel window centered on each pixel. This is just a simple standard deviation filter that returns a local area standard deviation for each and every pixel in the difference image. Then, instead of thresholding based on a global difference image standard deviation, I threshold based on the local standard deviation I've calculated for each pixel.
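As a sketch of the idea (my actual implementation may differ; the window size and σ multiple are just the examples from above), a local standard deviation filter can be built from two box filters using Var[x] = E[x²] − E[x]², which is much faster than looping over pixels:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_std_threshold(diff_image, window=200, nsigma=5.0):
    """Threshold a difference image against a sliding-window standard
    deviation instead of a single global value.

    Local variance comes from two box-filter passes via
    Var[x] = E[x^2] - E[x]^2, clipped at zero to absorb floating-point
    round-off before the square root.
    """
    img = diff_image.astype(np.float64)
    mean = uniform_filter(img, size=window, mode="reflect")
    mean_sq = uniform_filter(img * img, size=window, mode="reflect")
    local_std = np.sqrt(np.clip(mean_sq - mean * mean, 0.0, None))
    return img > nsigma * local_std
```

With this, a faint source in a quiet region can clear its local 5σ bar even though the global 5σ bar, inflated by the noisy regions, would bury it.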
Why do this? The short answer is it seems to work better. The more descriptive answer is that even after background subtraction and bright source masking you can still have noisier and quieter regions in your difference image. If you use a global standard deviation, the noisier regions (the regions with larger variance) will effectively bury the signal in the quieter regions. The standard deviation filter, while it takes quite a bit longer to compute, nicely handles this problem. I've also found I can push out to 5σ more easily without getting 'dead zones' where there's no signal (i.e. detections).
Longer candidate link length constraint with looser velocity clustering constraint
Previously I required linking to connect 4 detections to create a candidate link and then relinked those candidate links with other candidate links when they had 3 overlapping detections to extend the candidate link to more detections. I now require 6 detections to build a link (meaning that at least 6 detections must be moving at approximately the same velocity with respect to one another to qualify as a candidate). And relinking now requires 5 overlapping detections. At the same time I'm loosening the velocity clustering constraint. Basically I'm just turning the dials available to me to see what works best to create a set of candidate links to test with orbit determination. This technique seems to be working better than shorter links with tighter velocity clustering constraints.
Linking the ISPY database first
How do I know what's working better? Instead of running the linker on my entire difference image detection set, I first run it on the detections that match up to ISPY's object database for the region of sky I'm looking at. This does two things. First, it allows me to see what known objects are translating to individual detections in my difference images. Second, it allows me to run the linker on the substantially reduced dataset of matched detections. This way I can adjust my thresholding and linking constraints to make sure I detect and link all the objects ISPY is already aware of. It's much faster than tweaking parameters and iterating on the full detection dataset which includes many detections that will turn out to be noise.
Linking by velocity and span sets
This is an implementation optimization; it's not a change in how linking works but in how I call the linker. Instead of searching over all potential velocities at once with a long pixel span constraint, I break the search out into sets of velocities (e.g. 5-25 pixels per day, 25-50 pixels per day, etc.). That way I can intelligently constrain the maximum span I need to search over for each set. For example, if you're looking at the 5-25 ppd range over 4 days of observation, you can exclude detections that are more than 4x25=100 pixels from the detection pixel you're calculating velocities relative to, because they're too far away to have moved that much in that amount of time for the velocity range you're searching. Previously I wasn't excluding these spans, which was needless extra computation that slowed linking down.
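The span bound itself is just maximum velocity times the observation window. A minimal sketch of the banded search loop (the band edges and function names are illustrative, not my actual code):

```python
def max_span_pixels(v_max_ppd, window_days):
    """Upper bound on how far a detection moving at most v_max_ppd
    pixels per day can travel over the observation window."""
    return v_max_ppd * window_days

# Velocity bands in pixels per day, searched one at a time.
bands = [(5, 25), (25, 50), (50, 100)]
window_days = 4

for v_min, v_max in bands:
    span = max_span_pixels(v_max, window_days)
    # A real linker call would go here; the point is that the 5-25 ppd
    # band only ever needs to pair detections within 100 pixels.
    print(f"{v_min}-{v_max} ppd -> max span {span} px")
```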
Relinking with overlaps when it adds a new frame
Relinking consolidates candidate links with N or more overlapping detections. Occasionally, however, it blows up. By that I mean if you start combining candidate links that overlap without any other constraints you occasionally create a link that grows to include all remaining detections. It's less likely to happen for larger N's, but it can still happen if you're testing for combinations over a large enough set of detections. I implemented a simple fix for this that only relinks candidate links into new longer links when the newly added detection (or detections) are in an observation frame not already included in the existing candidate link. That way the link can't grow much beyond the total number of observation frames.
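The guard described above can be sketched as a small set operation. Here a link is modeled as a set of (frame_id, detection_id) pairs; the representation and names are my illustration, not the pipeline's actual data structures:

```python
def should_relink(link_a, link_b, min_overlap=5):
    """Merge link_b into link_a only if they overlap enough AND the
    non-overlapping detections bring at least one new observation frame."""
    overlap = link_a & link_b
    if len(overlap) < min_overlap:
        return False
    frames_a = {frame for frame, _ in link_a}
    new_frames = {frame for frame, _ in link_b - link_a}
    # Refuse the merge when every added detection sits in a frame the
    # link already covers -- that's how runaway links grow.
    return not new_frames <= frames_a
```

Since every accepted merge must add a frame, a link can't grow much past the total number of observation frames.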
OpenOrb solves with TESS position
It's kind of ridiculous how long it took me to discover this error. I had always had problems with OpenOrb not being able to solve links with more than 4 or 5 detections. Fast moving objects were harder to solve than slow moving ones. I understood the problem was most likely because I couldn't specify the observer location (i.e. TESS) to OpenOrb. I thought I was stuck solving and combining very short arc link solves. It turns out I was just specifying the TESS position incorrectly to OpenOrb the whole time! I was using a capital 'S' in column 15 in the observer position line in my MPC file when I needed to use a lower case 's'. What a dumb oversight! Since I fixed that I've been able to solve links of arbitrary length.
OpenOrb RA/Dec standard deviation tuning
OpenOrb requires the specification of a whole host of parameters to solve orbits efficiently and accurately. I cheated a bit and just grabbed Rubin's oorb.conf file from their OpenOrb repo on GitHub. Of course Rubin's astrometric precision is way better than TESS', so the RA and Dec standard deviation estimate in the configuration file was a bit too tight. Rubin specified 1 arcsecond, but when I used that value some detections that matched known objects didn't solve. If I dial it up to 2 arcseconds, known objects solve, but I get solves that appear non-physical too. By stepping the standard deviation down 0.1 arcseconds at a time I was able to find the cutoff point where known object detections begin to have trouble solving: 1.5 arcseconds. That value is lower than I would have guessed, but that's probably a good thing as it will keep the number of non-physical solves down.
Sampled combination OD solves to reduce compute
Once I've built up a number of candidate links to test with orbit determination, I begin validating them by selecting 6 detection subsets of the candidate link and testing those subsets with OpenOrb. But if the candidate link you're testing has, say, 20 detections in it, there are C(20,6)=38,760 subset links of length 6 to test. That's way too many — especially when you've got thousands of other candidate links to test after it.
To reduce the number of orbits that I need to try to solve to validate the candidate link I do two things. First, if there are too many detections in the candidate link (e.g. the detection count is much greater than the number of observation times), I re-run linking on the candidate link detections alone. I start with loose constraints and iteratively tighten them up until I've reduced the combination count of the candidate link set to a reasonable value to sample from. Once the number of possible combinations has been reduced, I choose the maximum number of samples I want to take from that set of combinations and test those. For example, say there are 1,000 combinations of 6 detections for a candidate link and I specify that I want to test a maximum of 500 orbits. I then randomly sample 500 of those 1,000 6 detection subsets. In practice this works quite well in finding solvable orbits.
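A sketch of the sampling step (function and variable names are mine, not from the pipeline; it assumes re-linking has already pruned the link enough that the full combination list is cheap to materialize):

```python
import math
import random
from itertools import combinations

def sample_subsets(detections, k=6, max_orbits=500, seed=42):
    """Return up to max_orbits randomly sampled k-detection subsets of a
    candidate link. Only sensible once the combination count has been
    reduced to something materializable."""
    all_subsets = list(combinations(detections, k))
    if len(all_subsets) <= max_orbits:
        return all_subsets
    return random.Random(seed).sample(all_subsets, max_orbits)

# Sanity check on the combinatorics from above.
print(math.comb(20, 6))  # 38760
```

Each sampled subset then goes to OpenOrb, and the more of them that solve, the more seriously the candidate link gets taken.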
The searching continues and the tools evolve. I should also mention the reasoning for directing the search at the part of the sky I'm generally looking at now. I'm working through more of these off-ecliptic regions of the southern hemisphere for two reasons. One is that they have low residual background levels. The detection process is easier when you don't have a significant background signal competing with faint objects. The second reason is speculative, but I believe there are fewer southern hemisphere surveys so the likelihood of stumbling upon a new object might be higher? That's definitely wishful thinking, but it's a stab at trying to "look where others aren't already looking." It's probably worth actually trying to calculate the least observed regions of sky available in the TESS dataset, but until I think of a good way to do that I'll probably continue to explore this quieter region of sky to see what I can detect.