Shooting into the dark

Part of what makes uncertainty such a slippery subject is that it conflates several concepts that are better kept apart: precision, accuracy, and repeatability. People often mention the first two, less often the third.

It's clear that precision and accuracy are different things. If someone's shooting at you, for instance, it's better that they are inaccurate but precise so that every bullet whizzes exactly 1 metre over your head. But, though the idea of one-off repeatability is built into the concept of multiple 'readings', scientists often repeat experiments and this wholesale repeatability also needs to be captured. Hence the third drawing.

One of the things I really like in Peter Copeland's book Communicating Rocks is the accuracy-precision-repeatability figure (here's my review). He captured this concept very nicely, and gives a good description too. There are two weaknesses though, I think, in these classic target figures. First, they portray two dimensions (spatial, in this case), when really each measurement we make is on a single axis. So I tried re-drawing the figure, but on one axis:

The second thing that bothers me is that there is an implied 'correct answer'—the middle of the target. This seems reasonable: we are trying to measure some external reality, after all. The problem is that when we make our measurements, we do not know where the middle of the target is. We are blind.

If we don't know where the bullseye is, we cannot tell the difference between precise and imprecise. But if we don't know the size of the bullseye, we also do not know how accurate we are, or how repeatable our experiments are. Both of these things are entirely relative to the nature of the target. 
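A toy calculation makes this asymmetry concrete. With made-up readings, the spread (precision) can be computed from the readings alone, but the bias (accuracy) requires knowing the true value, the bullseye we never see:

```python
import statistics

TRUE_VALUE = 100.0  # the 'bullseye', which in practice we never know

# Two sets of repeated readings (entirely hypothetical numbers):
precise_but_biased = [104.9, 105.1, 105.0, 104.8, 105.2]  # tight, but off target
accurate_but_noisy = [93.0, 108.0, 96.0, 105.0, 98.0]     # centred, but scattered

for name, readings in [("precise/biased", precise_but_biased),
                       ("accurate/noisy", accurate_but_noisy)]:
    spread = statistics.stdev(readings)            # precision: needs only the readings
    bias = statistics.mean(readings) - TRUE_VALUE  # accuracy: needs the bullseye
    print(f"{name}: spread = {spread:.2f}, bias = {bias:+.1f}")
```

Note that `spread` never touches `TRUE_VALUE`: precision is knowable in the dark, accuracy is not.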

What can we do? Sound statistical methods can help us, but most of us don't know what we're doing with statistics (be honest). Do we just need more data? No. More expensive analysis equipment? No.

No, none of this will help. You cannot beat uncertainty. You just have to deal with it.

This is based on an article of mine in the February issue of the CSEG Recorder. Rather woolly, even for me, it's the beginning of a thought experiment about doing a better job dealing with uncertainty. See Hall, M (2012). Do you know what you think you know? CSEG Recorder, February 2012. Online in May. Figures are here. 

A mixing board for the seismic symphony

Seismic processing is busy chasing its tail. OK, maybe an over-generalization, but researchers in the field are very skilled at finding incremental—and sometimes great—improvements in imaging algorithms, geometric corrections, and fidelity. But I don't want any of these things. Or, to be more precise: I don't need any more. 

Reflection seismic data are infested with filters. We don't know what most of these filters look like, and we've trained ourselves to accept and ignore them. We filter out the filters with our intuition. And you know where intuition gets us.

If I don't want reverse-time, curved-ray migration, or 7-dimensional interpolation, what do I want? Easy: I want to see the filters. I want them perturbed and examined and exposed. Instead of soaking up whatever is left of Moore's Law with cluster-hogging precision, I would prefer to see more of the imprecise stuff. I think we've pushed the precision envelope to somewhere beyond the net uncertainty of our subsurface data, so that the quality and sharpness of the seismic image are not, in most cases, the weak point of an integrated interpretation.

So I don't want any more processing products. I want a mixing board for seismic data.

To fully appreciate my point of view, you need to have experienced a large seismic processing project. It's hard enough to process seismic, but if there is enough at stake—traces, deadlines, decisions, or just money—then it is almost impossible to iterate the solution. This is rather ironic, and unfortunate. Every decision, from migration aperture to anisotropic parameters, is considered, tested, and made... and then left behind, never to be revisited.

Linear seismic processing flow

But this linear model, in which each decision is cemented onto the ones before it, seems unlikely to land on the optimal solution. Our fateful string of choices may lead us to a lovely spot, with a picnic area and clean toilets, but the chances that it is the global maximum, which might lie in a distant corner of the solution space, seem slim. What if the spherical divergence was off? Perhaps we should have interpolated to a regularized geometry. Did we leave some ground roll in the data? 

Look, I don't know the answer. But I know what it would look like. Instead of spending three months generating the best-ever migration, we'd spend three months (maybe less) generating a universe of good-enough migrations. Then I could sit at my desk and—at least with first order precision—change the spherical divergence, or see if less aggressive noise attenuation helps. A different migration algorithm, perhaps. Maybe my multiples weren't gone after all: more Radon!

Instead of looking along the tunnel of the processing flow, I want the bird's eye view of all the possibilities.

If this sounds impossible, that's because it is impossible, with today's approach: process in full, then view. Why not just do this swath? Ray trace on the graphics card. Do everything in memory and make me buy 256GB of RAM. The Magic Earth mentality of 2001—remember that?

Am I wrong? Maybe we're not even close to good-enough, and we should continue honing, at all costs. But what if the gains to be made in exploring the solution space are bigger than whatever is left for image quality?

I think I can see another local maximum just over there...

Mixing board image: iStockphoto.

The map that changed the man

This is my contribution to the Accretionary Wedge geoblogfest, number 43: My Favourite Geological Illustration. You can read all about it, and see the full list of entries, at In the Company of Plants and Rocks. To quote Hollis:

All types of geological illustrations qualify — drawings, paintings, maps, charts, graphs, cross-sections, diagrams, etc., but not photographs.  You might choose something because of its impact, its beauty, its humor, its clear message or perhaps because of a special role it played in your life.  Let us know the reasons for your choice!

The map that changed the man

In 1987, at the age of 16, I became a geologist wannabe. A week on Rùm (called Rhum at the time) with volcanologist Steve Sparks convinced me that it was the most complete science of nature, being a satisfying stew of physics, chemistry, geomorphology, cosmology, fluid dynamics, and single malt whisky. One afternoon, he showed me cross-beds in the Torridonian sandstones on the shore of Loch Scresort, and identical cross-beds in the world-famous layered gabbros in the magma chamber of a Palaeogene volcano. 

View of Rum image by Southside Images, see below for credit.

But I was just a wannabe. So I studied hard at school and went off to the University of Durham. The usual studying and non-studying ensued, during which I discovered which parts of the science drew me in. There were awesome field trips, boring crystallography lectures, and tough structural geology labs. And at the end of the second year, there was the 6-week independent mapping project.

As far as I know, independent mapping projects sensu stricto are a British phenomenon. I hope they still exist. Two groups decided the UK, while offering incredible basemaps and rich geological literature, was too soggy. One group went to the French Alps, where carbonates legend Maurice Tucker would be vacationing and available for advice; the other group decided that was too easy and went off to the wild mountains of northern Spain and the thrust front of the Pyrenees, where no-one was vacationing and no-one would be available for anything. Guess which group I was in.

To say we were green would be like saying geologists think beer is OK. I hitchhiked there (but only had one creepy ride). We lived in tents (but in a peach orchard). It was July, and 35 degrees Celsius on a cool day (but there was a lake). We had no money (but lots of coloured pencils). It wasn't so bad. We all fell in love with Spain. 

Anyway, long story short, I made this map. It's no good, but that's not the point. It's my map. It's the map that turned me from wannabe into actual (if poor). It doesn't really need any commentary. It took hours and hours of scratching with Rotring Rapidographs on drawing film, then colouring the Diazo print by hand. This sounds like ancient history, but the methods I used to create it were already on the verge of extinction—the following year I started using Adobe Illustrator for draughting, and now I use Inkscape. And while some field tools have changed (of course we were not armed with laptops, Google Earth, GPS, or digital cameras), others are pure and true and timeless. Whack, whack,...

The ring of my hammer on Late Cretaceous limestones is still echoing through the Pyrenees. 

Geological map of the Embalse de Santa Ana, Alfarras, Spain; click to enlarge.

My map of the geology around the Embalse de Santa Ana. Hand-drawn by me in 1992, though I admit it looks like it's from 1892. Click for a larger view. View of Rùm by flickr user Southside Images, licensed CC-BY-NC-SA.

Please sir, may I have some processing products?

Just like your petrophysicist, your seismic processor has some awesome stuff that you want for your interpretation. She has velocities, fold maps, and loads of data. For some reason, processors almost never offer them up — you have to ask. Here is my processing product checklist:

A beautiful seismic volume to interpret. Of course you need a volume to tie to wells and pick horizons on. These days, you usually want a prestack time migration. Depth migration may or may not be something you want to pay for. But there's little point in stopping at poststack migration because if you ever want to do seismic analysis (like AVO for example), you're going to need a prestack time migration. The processor can smooth or enhance this volume if they want to (with your input, of course). 

Unfiltered, attribute-friendly data. Processors like to smooth things with filters like fxy and fk. They can make your data look nicer, and easier to pick. But they mix traces and smooth potentially important information out—they are filters after all. So always ask for the unfiltered data, and use it for attributes, especially for computing semblance and any kind of frequency-based attribute. You can always smooth the output if you want.

Limited-angle stacks. You may or may not want the migrated gathers too—sometimes these are noisy, and they can be cumbersome for non-specialists to manipulate. But limited-angle stacks are just like the full stack, except with fewer traces. If you did prestack migration they won't be expensive, so get them exported while you have the processor's attention and your wallet open. Which angle ranges you ask for depends on your data and your needs, but get at least three volumes, and be careful when you get past about 35° of offset.

Rich, informative headers. Ask to see the SEG-Y file header before the final files are generated. Ensure it contains all the information you need: acquisition basics, processing flow and parameters, replacement velocity, time datum, geometry details, and geographic coordinates and datums of the dataset. You will not regret this and the data loader will thank you.
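As a quick sanity check on what you receive, the textual header is easy to inspect yourself. This sketch assumes a standard SEG-Y layout, where the first 3200 bytes are forty 80-character 'card images', traditionally EBCDIC-encoded (the filename below is hypothetical):

```python
def read_segy_textual_header(path):
    """Read the first 3200 bytes of a SEG-Y file: the textual header,
    forty 'cards' of 80 characters, usually EBCDIC-encoded."""
    with open(path, "rb") as f:
        raw = f.read(3200)
    text = raw.decode("cp037")  # EBCDIC code page; use 'ascii' for ASCII headers
    return [text[i:i + 80] for i in range(0, 3200, 80)]

# Usage sketch:
# for card in read_segy_textual_header("survey.sgy"):
#     print(card)
```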

Processing report. Often, they don't write this until they are finished, which is a shame. You might consider asking them to write up a shared Google Docs or a private wiki as they go. That way, you can ensure you stay engaged and informed, and can even help with the documentation. Make sure it includes all the acquisition parameters as well as all the processing decisions. Those who come after you need this information!

Parameter volumes. If you used any adaptive or spatially varying parameters, like anisotropy coefficients for example, make sure you have maps or volumes of these. Don't forget time-varying filters. Even if it was a simple function, get it exported as a volume. You can visualize it with the stacked data as part of your QC. Other parameters to ask for are offset and azimuth diversity.

Migration velocity field (get to know velocities). Ask for a SEG-Y volume, because then you can visualize it right away. It's a good idea to get the actual velocity functions as well, since they are just small text files. You may or may not use these for anything, but they can be helpful as part of an integrated velocity modeling effort, and for flagging potential overpressure. Use with care—these velocities are processing velocities, not earth measurements.

The SEG's salt model, with velocities. Image: Sandia National Labs.

Surface elevation map. If you're on land, or the sea floor, this comes from the survey and should be very reliable. It's a nice thing to add to fancy 3D displays of your data. Ask for it in depth and in time. The elevations are often tucked away in the SEG-Y headers too—you may already have them.

Fold data. Ask for fold or trace density maps at important depths, or just get a cube of all the fold data. While not as illuminating as illumination maps, fold is nevertheless a useful thing to know and can help you make some nice displays. You should use this as part of your uncertainty analysis, especially if you are sending difficult interpretations on to geomodelers, for example. 

I bet I have missed something... is there anything you always ask for, or forget and then have to extract or generate yourself? What's on your checklist?

Bring it into time

A student competing in the AAPG's Imperial Barrel Award recently asked me how to take seismic data, and “bring it into depth”. How I read this was, “how do I take something that is outside my comfort zone, and make it fit with what is familiar?” Geologists fear the time domain. Geology is in depth, logs are in depth, drill pipe is in depth. Heck, even X and Y are in depth. Seismic data relate to none of those things; useless, right?

It is excusable for the under-initiated, but this concept of “bringing [time domain data] into depth” is an informal fallacy. Experienced geophysicists understand this because depth conversion, in all of its forms and derivatives, is a process that introduces a number of known unknowns. It is easier for others to be dismissive of these nuances, or to ignore them. So early-onset discomfort with the travel-time domain ensues. It is easier to stick to a domain that doesn’t cause such mental backflips; a kind of spatial comfort zone.

Linear in time

However, the unconverted should find comfort in one property where the time domain is advantageous: it is linear. In contrast, many drillers and wireline engineers are quick to point out that measured depth is not necessarily linear. Perhaps time is an even more robust, more linear domain of measurement (if there is such a concept). And, as a convenient result, a world of possibilities emerges out of time-linearity: time-series analysis, digital signal processing, and computational mathematics. Repeatable and mechanical operations on data.
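As a small illustration of those repeatable, mechanical operations, here is a sketch of a naive time-to-depth conversion using an interval-velocity function. All numbers are hypothetical; real velocity fields come from processing or wells:

```python
# Convert two-way travel time (TWT) to depth with interval velocities.
dt = 0.004                                 # TWT sample interval in seconds
v_int = [1800.0, 2200.0, 2600.0, 3100.0]   # interval velocities in m/s

depth = 0.0
depths = []
for v in v_int:
    depth += v * dt / 2.0   # one-way distance: half the two-way time
    depths.append(depth)

print(depths)  # cumulative depth at each time sample, in metres
```

The whole operation is a running sum because the time axis is regularly sampled; the depth axis that comes out is not.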

Boot camp in time

The depth domain isn’t exactly omnipotent. A colleague, who started her career as a wireline-engineer at Schlumberger, explained to me that her new-graduate training involved painfully long recitations and lecturing on the intricacies of depth. What is measured depth? What is true vertical depth? What is drill-pipe stretch? What is wireline stretch? And so on. Absolute depth is important, but even with seemingly rigid sections of solid steel drill pipe, it is still elusive. And if any measurement requires a correction, that measurement has error. So even working in the depth domain has its peculiarities.

Few of us ever get the privilege of such rigorous training in the spread of depth measurements. Sitting on the back of the rhetorical wireline truck, watching the coax-cable unpeel into the wellhead. Few of us have lifted a 300 pound logging tool, to feel the force that it would impart on kilometres of cable. We are the recipients of measurements. Either it is a text file, or an image. It is what it is, and who are we to change it? What would an equivalent boot camp for travel-time look like? Is there one?

In the filtered earth, even the depth domain is plastic. Travel-time is the only absolute.

More than a blueprint

"This company used to function just fine without any modeling."

My brother, an architect, paraphrased his supervisor this way one day; perhaps you have heard something similar. "But the construction industry is shifting," he noted. "Now, my boss needs to see things in 3D in order to understand. Which is why we have so many last minute changes in our projects. 'I had no idea that ceiling was so low, that high, that color, had so many lights,' and so on."

The geological modeling process is often an investment with the same goal. I am convinced that many are seduced by the appeal of an elegantly crafted digital design, the wow factor of 3D visualization. Seeing is believing, but in the case of the subsurface, seeing can be misleading.

Not your child's sandbox! Photo: R Weller.

Building a geological model is fundamentally different from building a blueprint, or at least it should be. First of all, a geomodel will never be as accurate as a blueprint, even after the last well has been drilled. The geomodel is more akin to the apparatus of an experiment; literally the sandbox and the sand. The real lure of a geomodel is to explore and evaluate uncertainty. I am ambivalent about the compelling visualizations that drop out of geomodels; they partially stand in the way of this high potential. Perhaps they are too convincing.

I reckon most managers, drillers, completions folks, and many geoscientists are really only interested in a better blueprint. If that is the case, they are essentially behaving only as designers. That mindset drives a conflict any time the geomodel fails to predict future observations. A blueprint does not have space for uncertainty; it's not defined that way. A model, however, should have uncertainty and simplifying assumptions built right in.

Why are the narrow geological assumptions of the designer so widely accepted and, in particular, so enthusiastically embraced by the industry? Science's failure to keep up with technology is one factor. Our preference for simple and quickly understood explanations is another. Geology, in its wondrous complexity, does not conform to such easy reductions.

Despite popular belief, this is not a blueprint.

We gravitate towards a single solution precisely because we are scared of the unknown. Treating uncertainty is more difficult than omitting it, and a range of solutions is somehow less marketable than precision (accuracy and precision are not the same thing). It is easier because if you have a blueprint, rigid, with tight constraints, you have relieved yourself from asking what if?

  • What if the fault throw was 20 m instead of 10 m?
  • What if the reservoir was oil instead of water?
  • What if the pore pressure increases downdip?

The geomodelling process should be undertaken for the promise of invoking questions. Subsurface geoscience is riddled with inherent uncertainties, uncertainties that we aren't even aware of. Maybe our software should have a steel-blue background turned on as default, instead of the traditional black, white, or gray. It might be a subconscious reminder that unless you are capturing uncertainty and iterating, you are only designing a blueprint.

If you have been involved with building a geologic model, was it a one-time rigid design, or an experimental sandbox of iteration?

The photograph of the extensional sandbox experiment is used with permission from Roger Weller of Cochise College. Image of geocellular model from the MATLAB Reservoir Simulation Toolbox (MRST) from SINTEF applied mathematics, which has recently been released under the terms of the GNU General Public License! The blueprint is © nadla and licensed from iStock. None of these images are subject to Agile's license terms.

Open up

After a short trip to Houston, today I am heading to London, Ontario, for a visit with Professor Burns Cheadle at the University of Western Ontario. I’m stoked about the trip. On Saturday I’m running my still-developing course on writing for geoscientists, and tomorrow I’m giving the latest iteration of my talk on openness in geoscience. I’ll post a version of it here once I get some notes into the slides. What follows is based on the abstract I gave Burns.

A recent survey by APEGBC's Innovation magazine revealed that geoscience is not among the most highly respected professions. Only 20% of people surveyed had a ‘great deal of respect’ for geologists and geophysicists, compared to 30% for engineers, and 40% for teachers. This is far from a crisis, but as our profession struggles to meet energy demands, predict natural disasters, and understand environmental change, we must ask: how can we earn more trust? Perhaps more openness can help. I’m pretty sure it can’t hurt.

Many people first hear about ‘open’ in connection with software, but open software is just one point on the open compass. And even though open software is free, and can spread very easily in principle, awareness is a problem—open source marketing budgets are usually small. Open source widgets are great, but far more powerful are platforms and frameworks, because these allow geoscientists to focus on science, not software, and collaborate. Emerging open frameworks include OpendTect and GeoCraft for seismic interpretation, and SeaSeis and BotoSeis for seismic processing.

If open software is important for real science, then open data are equally vital because they promote reproducibility. Compared to the life sciences, where datasets like the Human Genome Project and Visible Human abound, the geosciences lag. In some cases, the pieces exist already in components like government well data, the Open Seismic Repository, and SEG’s list of open datasets, but they are not integrated or easy to find. In other cases, the data exist but are obscure and lack a simple portal. Some important plays, of global political and social as well as scientific interest, have little or no representation: industry should release integrated datasets from the Athabasca oil sands and a major shale gas play as soon as possible.

Open workflows are another point, because they allow us to accelerate learning, iteration, and failure, and thus advance more quickly. We can share easily but slowly and inefficiently by publishing, or attending meetings, but we can also write blogs, contribute to wikis, tweet, and exploit the power of the internet as a dynamic, multi-dimensional network, not just another publishing and consumption medium. Online readers respond, get engaged, and become creators, completing the feedback loop. The irony is that, in most organizations, it’s easier to share with the general public, and thus competitors, than it is to share with colleagues.

The fourth point of the compass is in our attitude. An open mindset recognizes our true competitive strengths, which typically are not our software, our data, or our workflows. Inevitably there are things we cannot share, but there’s far more that we can. Industry has already started with low-risk topics for which sharing may be to our common advantage—for example safety, or the environment. The question is, can we broaden the scope, especially to the subsurface, and make openness the default, always asking, is there any reason why I shouldn’t share this?

In learning to embrace openness, it’s important to avoid some common misconceptions. For example, open does not necessarily mean free-as-in-beer. It does not require relinquishing ownership or rights, and it is certainly not the same as public domain. We must also educate ourselves so that we understand the consequences of subtle and innocuous-seeming clauses in licences, for example those pertaining to non-commerciality. If we can be as adept in this new language as many of us are today in intellectual property law, say, then I believe we can accelerate innovation in energy and build trust among our public stakeholders.

So what are you waiting for? Open up!

Ten things I loved about ScienceOnline2012

I spent Thursday and Friday at the annual Science Online unconference at North Carolina State University in Raleigh, NC. I had been looking forward to it since peeking in on—and even participating in—sessions last January at ScienceOnline2011. As soon as I had emerged from the swanky airport and navigated my way to the charmingly peculiar Velvet Cloak Inn I knew the first thing I loved was...

Raleigh, and NC State University. What a peaceful, unpretentious, human-scale place. And the university campus and facilities were beyond first class. I was born in Durham, England, and met my wife at university there, so I was irrationally prepared to have a soft spot for Durham, North Carolina, and by extension Raleigh too. And now I do. It's one of those rare places I've visited and known at once: I could live here. I was still basking in this glow of fondness when I opened my laptop at the hotel and found that the hard drive was doornail dead. So within 12 hours of arriving, I had...


The filtered earth

Ground-based image (top left) vs Hubble's image. Click for a larger view.

One of the reasons for launching the Hubble Space Telescope in 1990 was to eliminate the filter of the atmosphere that affects earth-bound observations of the night sky. The results speak for themselves: more than 10 000 peer-reviewed papers using Hubble data, around 98% of which have citations (only 70% of all astronomy papers are cited). There are plenty of other filters at work on Hubble's data: the optical system, the electronics of image capture and communication, space weather, and even the experience and perceptive power of the human observer. But it's clear: eliminating one filter changed the way we see the cosmos.

What is a filter? Mathematically, it's a subset of a larger set. In optics, it's a wavelength-selection device. In general, it's a thing or process which removes part of the input, leaving some output which may or may not be useful. For example, in seismic processing we apply filters which we hope remove noise, leaving signal for the interpreter. But if the filters are not under our control, if we don't even know what they are, then the relationship between output and input is not clear.

Imagine you fit a green filter to your petrographic microscope. You can't tell the difference between the scene on the left and the one on the right—they have the same amount and distribution of green. Indeed, without the benefit of geological knowledge, the range of possible inputs is infinite. If you could only see a monochrome view, and you didn't know what the filter was, or even if there was one, it's easy to see that the situation would be even worse. 

Like astronomy, the goal of geoscience is to glimpse the objective reality via our subjective observations. All we can do is collect, analyse and interpret filtered data, the sifted ghost of the reality we tried to observe. This is the best we can do. 

What do our filters look like? In the case of seismic reflection data, the filters are mostly familiar: 

  • the design determines the spatial and temporal resolution you can achieve
  • the source system and near-surface conditions determine the wavelet
  • the boundaries and interval properties of the earth filter the wavelet
  • the recording system and conditions affect the image resolution and fidelity
  • the processing flow can destroy or enhance every aspect of the data
  • the data loading process can be a filter, though it should not be
  • the display and interpretation methods control what the interpreter sees
  • the experience and insight of the interpreter decides what comes out of the entire process

Every other piece of data you touch, from wireline logs to point-count analyses, and from pressure plots to production volumes, is a filtered expression of the earth. Do you know your filters? Try making a list—it might surprise you how long it is. Then ask yourself if you can do anything about any of them, and imagine what you might see if you could. 
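One of the filters in the list above, the earth acting on the wavelet, is often sketched with the convolutional model: to first order, the recorded trace is the reflectivity series filtered by the source wavelet (noise and every other filter ignored). A minimal, illustrative version:

```python
def convolve(wavelet, reflectivity):
    """Plain full convolution: the simplest statement of the
    convolutional model of the seismic trace."""
    n = len(wavelet) + len(reflectivity) - 1
    trace = [0.0] * n
    for i, w in enumerate(wavelet):
        for j, r in enumerate(reflectivity):
            trace[i + j] += w * r
    return trace

wavelet = [0.5, 1.0, 0.5]             # toy symmetric wavelet
reflectivity = [0, 0, 1.0, 0, -0.5]   # two reflectors of opposite sign
print(convolve(wavelet, reflectivity))
```

Each reflector comes out wearing a copy of the wavelet; where reflectors are closer together than the wavelet is long, the copies interfere, and the filter starts hiding the geology.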

Hubble image is public domain. Photomicrograph from Flickr user Nagem R., licensed CC-BY-NC-SA. 

What do you mean by average?

I may need some help here. The truth is, while I can tell you what averages are, I can't rigorously explain when to use a particular one. I'll give it a shot, but if you disagree I am happy to be edificated. 

When we compute an average we are measuring the central tendency: a single quantity to represent the dataset. The trouble is, our data can have different distributions, different dimensionality, or different type (to use a computer science term): we may be dealing with lognormal distributions, or rates, or classes. To cope with this, we have different averages. 

Arithmetic mean

Everyone's friend, the plain old mean. The trouble is that it is, statistically speaking, not robust. This means that it's an estimator that is unduly affected by outliers, especially large ones. What are outliers? Data points that depart from some assumption of predictability in your data, from whatever model you have of what your data 'should' look like. Notwithstanding that your model might be wrong! Lots of distributions have important outliers. In exploration, the largest realizations in a gas prospect are critical to know about, even though they're unlikely.
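A quick illustration of this non-robustness, with hypothetical porosity readings and one spurious value:

```python
import statistics

# Hypothetical porosity readings; the last value is a bad measurement.
data = [0.12, 0.14, 0.13, 0.15, 0.13, 0.95]

print(statistics.mean(data))    # 0.27, dragged far above the typical reading
print(statistics.median(data))  # 0.135, barely notices the outlier
```

One rogue value doubles the mean; the median shrugs it off.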

Geometric mean

Like the arithmetic mean, this is one of the classical Pythagorean means. It is always equal to or smaller than the arithmetic mean. It has a simple geometric visualization: the geometric mean of a and b is the side of a square having the same area as the rectangle with sides a and b. Clearly, it is only meaningfully defined for positive numbers. When might you use it? For quantities with exponential distributions — permeability, say. And this is the only mean to use for data that have been normalized to some reference value. 
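For instance, with hypothetical permeabilities spanning several orders of magnitude, the geometric mean gives a far more representative 'typical' value than the arithmetic mean:

```python
import math
import statistics

# Hypothetical permeabilities in mD, roughly lognormal:
perms = [0.1, 1.0, 10.0, 100.0]

arith = statistics.mean(perms)
# Geometric mean: exponential of the mean of the logs, equal to (product)^(1/n)
geom = math.exp(statistics.mean(math.log(p) for p in perms))

print(arith)  # 27.775, dominated by the largest value
print(geom)   # 3.162..., i.e. sqrt(10), the central order of magnitude
```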

Harmonic mean

The third and final Pythagorean mean, always equal to or smaller than the geometric mean. It's sometimes (by 'sometimes' I mean 'never') called the subcontrary mean. It tends towards the smaller values in a dataset; if those small numbers are outliers, this is a bug, not a feature. Use it for rates: if you drive 10 km at 60 km/hr (10 minutes), then 10 km at 120 km/hr (5 minutes), then your average speed over the 20 km is 80 km/hr, not the 90 km/hr the arithmetic mean might have led you to believe.
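The worked example above can be checked directly with Python's statistics module:

```python
import statistics

speeds = [60.0, 120.0]  # km/hr over two equal 10 km legs

print(statistics.harmonic_mean(speeds))  # 80.0, matching the worked example
print(statistics.mean(speeds))           # 90.0, the misleading answer
```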

Median average

The median is the central value in the sorted data. In some ways, it's the archetypal average: the middle, with 50% of values being greater and 50% being smaller. If there is an even number of data points, then it's the arithmetic mean of the middle two. In a probability distribution, the median is often called the P50. In a positively skewed distribution (the most common one in petroleum geoscience), it is larger than the mode and smaller than the mean.

Mode average

The mode, or most likely, is the most frequent result in the data. We often use it for what are called nominal data: classes or names, rather than the cardinal numbers we've been discussing up to now. For example, the name Smith is not the 'average' name in the US, as such, since most people are called something else. But you might say it's the central tendency of names. One of the commonest applications of the mode is in a simple voting system: the person with the most votes wins. If you are averaging data like facies or waveform classes, say, then the mode is the only average that makes sense. 
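In Python, for example, the statistics module will happily take the mode of nominal data:

```python
import statistics

# Hypothetical facies picks, a nominal (categorical) variable:
facies = ["sand", "shale", "sand", "carbonate", "sand", "shale"]

print(statistics.mode(facies))  # 'sand', the only sensible 'average' here
```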

Honourable mentions

Most geophysicists know about the root mean square, or quadratic mean, because it's a measure of magnitude independent of sign, so works on sinusoids varying around zero, for example. 

RMS = √( (x₁² + x₂² + ⋯ + xₙ²) / n )
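For instance, the RMS of one full cycle of a unit-amplitude sinusoid comes out near 1/√2 ≈ 0.707, even though its arithmetic mean is essentially zero:

```python
import math

# Sample one full cycle of a unit sinusoid:
n = 1000
samples = [math.sin(2 * math.pi * k / n) for k in range(n)]

rms = math.sqrt(sum(x * x for x in samples) / n)
print(rms)  # approximately 0.7071, i.e. 1/sqrt(2)
```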

Finally, the weighted mean is worth a mention. Sometimes this one seems intuitive: if you want to average two datasets, but they have different populations, for example. If you have a mean porosity of 19% from a set of 90 samples, and another mean of 11% from a set of 10 similar samples, then it's clear you can't simply take their arithmetic average — you have to weight them first: (0.9 × 0.19) + (0.1 × 0.11) = 0.182, or about 18%. But other times, it's not so obvious you need the weighted sum, like when you care about the perception of the data points.
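The weighting is just each mean multiplied by its sample fraction. Using the porosities and sample counts from the example above (19% from 90 samples, 11% from 10):

```python
# Weighted mean of two porosity averages with different sample counts.
means = [0.19, 0.11]
counts = [90, 10]

weighted = sum(m * n for m, n in zip(means, counts)) / sum(counts)
print(weighted)  # 0.182, i.e. about 18%
```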

Are there other averages you use? Do you see misuse and abuse of averages? Have you ever been caught out? I'm almost certain I have, but it's too late now...

There is an even longer version of this article in the wiki. I just couldn't bring myself to post it all here.