Picking parameters
One of the reasons I got interested in programming was to get smarter about broken workflows like this one from a generic seismic interpretation tool (I'm thinking of Poststack-PAL, but does that even exist any more?)...
- I want to make a coherence volume, which requires me to choose a window length.
- I use the default on a single line and see how it looks, then try some other values at random.
- I can't remember what I did, so I get systematic: I try 8 ms, 16 ms, 32 ms, and 64 ms, saving each result with _XXms appended to its name so I can remember what I did.
- I display them side by side, but the windows are annoying to line up and resize, so instead I do it once, display them one at a time, grab screenshots, and import the images into PowerPoint because, let's face it, I'll need that slide eventually anyway.
- I can't decide between 16 ms and 32 ms so I try 20 ms, 24 ms, and 28 ms as well, and do it all again, and gaaah I HATE THIS STUPID SOFTWARE
There has to be a better way.
Stumbling towards optimization
Regular readers will know that this is the time to break out the IPython Notebook. Fear not: I will focus on the outcomes here — for the real meat, go to the Notebook. Or click on these images to see larger versions, and code.
Let's run through using the Canny edge detector in scikit-image, a brilliant image-processing Python library. The algorithm uses the derivative of a Gaussian to compute the gradient, and I have to choose three parameters. First, we'll try to optimize 'sigma', the width of the Gaussian. Let's try the default value of 1:
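A minimal sketch of the sort of thing in the notebook; I'm using scikit-image's built-in camera() image as a stand-in for a seismic slice, so the image and plotting details are my assumptions:

```python
import matplotlib.pyplot as plt
from skimage import data, feature

# Stand-in for a 2D seismic slice; swap in your own array.
image = data.camera()

# Canny with everything left at the defaults, i.e. sigma=1.
edges = feature.canny(image, sigma=1)

plt.imshow(edges, cmap='gray')
plt.title('sigma = 1')
plt.show()
```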
Clearly, there is too much noise in the result. Let's try the interval method that drove me crazy in desktop software:
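Something like this sweep; the exact sigma values are my guess, but a doubling interval, just like the XXms trials above, is the idea:

```python
import matplotlib.pyplot as plt
from skimage import data, feature

image = data.camera()  # stand-in image again

# Double sigma each time, as in the desktop workflow.
sigmas = [1, 2, 4, 8, 16]

fig, axes = plt.subplots(1, len(sigmas), figsize=(15, 3))
for ax, sigma in zip(axes, sigmas):
    ax.imshow(feature.canny(image, sigma=sigma), cmap='gray')
    ax.set_title('sigma = {}'.format(sigma))
    ax.axis('off')
plt.show()
```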
Well, I think something between 8 and 16 might work. I could compute the average intensity of each image, choose a value in between them, and then use the sigma that gives that result. OK, it's a horrible hack, but the answer turns out to be 10:
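Here's a sketch of the hack, assuming 'average intensity' means the fraction of edge pixels in the binary result, and using a simple grid search (the notebook's exact method may differ):

```python
import numpy as np
from skimage import data, feature

image = data.camera()

def edge_density(sigma):
    """Fraction of pixels flagged as edges: a proxy for the
    'average intensity' of the binary edge image."""
    return feature.canny(image, sigma=sigma).mean()

# Aim halfway between the densities of the two candidate results...
target = (edge_density(8) + edge_density(16)) / 2

# ...and grid-search for the sigma whose density lands closest to it.
sigmas = np.linspace(8, 16, 33)
densities = np.array([edge_density(s) for s in sigmas])
best_sigma = sigmas[np.argmin(np.abs(densities - target))]
print(best_sigma)
```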
But the whole point of scientific computing is the efficient application of informed human judgment. So let's add some interactivity, which lets us explore the 3D parameter space in a near-parallel rather than purely serial way:
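A sketch of the idea with ipywidgets (the descendant of the IPython widgets available at the time); the slider ranges and the use of quantile thresholds are my choices, not the notebook's:

```python
from ipywidgets import interact
import matplotlib.pyplot as plt
from skimage import data, feature

image = data.camera()

@interact(sigma=(1.0, 16.0), low=(0.0, 0.5, 0.01), high=(0.0, 1.0, 0.01))
def show_edges(sigma=8.0, low=0.1, high=0.2):
    # use_quantiles=True lets the thresholds be given as fractions,
    # which keeps the sliders meaningful for any image.
    edges = feature.canny(image, sigma=sigma,
                          low_threshold=low,
                          high_threshold=max(low, high),  # keep high >= low
                          use_quantiles=True)
    plt.imshow(edges, cmap='gray')
    plt.show()
```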
I finally feel like we're getting somewhere... But it still feels a bit arbitrary. I still don't know whether I'm getting the optimal result.
What can I try next? I could extend the 'goal seek' option and come up with a more sophisticated cost function. If I could define one well enough (for edge detection, as for coherence, contrast might be what I care about), then I could potentially just find the best answer, in much the same way that a digital camera autofocuses; indeed, many cameras simply look for the highest-contrast image. But goal seeking begs the question if the cost function is too literal. I mean, you sort of have to know the answer, or something about the answer, before you can find it.
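For instance, here's a sketch of an autofocus-style goal seek; the contrast metric (variance of the Laplacian of the smoothed image) and the bounds are my assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace
from scipy.optimize import minimize_scalar
from skimage import data

image = data.camera().astype(float)

def contrast(sigma):
    """Autofocus-style sharpness: variance of the Laplacian of the
    Gaussian-smoothed image. Higher means more contrast."""
    return laplace(gaussian_filter(image, sigma)).var()

# Maximize contrast by minimizing its negative over a range of sigmas.
result = minimize_scalar(lambda s: -contrast(s),
                         bounds=(1, 16), method='bounded')
print(result.x)
```

And notice the trap: with a metric this literal, maximum contrast just means minimum smoothing, so the optimizer crawls straight to the lower bound. That's exactly the question-begging problem.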
Social machines
Social machines are the hot new thing in computing (Big Data is so 2013). Perhaps instead I can turn to other humans, in my social and professional networks. I could...
- Ask my colleagues — perhaps my company has a knowledge sharing network I can go to.
- Ask t'Internet — I could ask Twitter, or my friends on Facebook, or a seismic interpretation group in LinkedIn. Better yet, Earth Science Stack Exchange!
- What if the software I was using just told me what other people had used for these parameters? Maybe this is only one step up from the programmer's default... especially if most people just use the programmer's default.
- But what if people could rate the outcome of the algorithm? What if their colleagues or managers could rate the outcome? Then I could weight the results with these ratings (see the sketch after this list).
- What if there were a game that involved optimizing images (OK, maybe a bit of a stretch... maybe more like a Mechanical Turk task)? Then we might have a vast crowd of people all interested in really pushing the edge of what is intuitively reasonable, and maybe exploring the part of the parameter space I'm most interested in.
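To make the rating idea concrete, here's a toy sketch; the numbers are entirely invented:

```python
import numpy as np

# Hypothetical crowd data: sigmas that other interpreters chose, and
# the ratings (say, 1-5 stars) their results earned.
sigmas = np.array([4.0, 8.0, 8.0, 10.0, 12.0, 16.0])
ratings = np.array([2, 4, 5, 5, 3, 1])

# Rating-weighted average: well-rated choices pull the suggestion harder.
suggestion = np.average(sigmas, weights=ratings)
print(suggestion)  # a crowd-informed starting point, not an answer
```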
What if I could combine the best of all these approaches? Interactive exploration, with guided optimization, constrained by some cost function or other expectation. That could be interesting, but unfortunately I have absolutely no idea how that would work. I do think the optimization workflow of the future will contain all of these elements.
What do you think? Do you have an awesome way to optimize the parameters of seismic attributes? Do you have a vision for how it could be better? It occurs to me this could be a great topic for a future hackathon...