Société de Calcul Mathématique, SA
Robust mathematical modeling
We continue here the description of our joint Research Program with several Companies, Institutions and Universities.
A model is a set of rules, or formulas, which try to represent the behavior of a given phenomenon. For instance, if you throw an object upwards, you may wish to know how long it will take before it hits the ground and where it will fall: this will be given by mathematical formulas.
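As a minimal sketch of such formulas, assume an idealized situation that the text does not spell out: no air resistance, constant gravity, flat ground, and a launch from ground level. The function name `flight` and the numerical values are illustrative only.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)

def flight(speed, angle_deg, g=G):
    """Return (time of flight in s, horizontal distance in m).

    Idealized model: launch from ground level, flat ground, no air resistance.
    """
    angle = math.radians(angle_deg)
    vx = speed * math.cos(angle)   # horizontal velocity, constant in this model
    vy = speed * math.sin(angle)   # initial vertical velocity
    t = 2.0 * vy / g               # height vy*t - g*t^2/2 returns to zero at this time
    return t, vx * t

t, x = flight(20.0, 45.0)
print(f"time of flight: {t:.2f} s, landing distance: {x:.1f} m")
```

This is the "rough gravity model" kind of answer: quick to write, and only as good as its assumptions.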
Another example is the propagation of a disease: you may wish to know how many people will be infected after a certain number of days. This is likely to depend upon a large number of parameters: type of disease, category of population, habits, temperature, and so on.
Usually, there is already a good deal of empirical knowledge around any given phenomenon: mankind was not born yesterday. So why should we build mathematical models? There are three reasons:
Three classes appear:
These questions should be addressed in this order. Indeed, "what do we want?" comes first, because it determines the whole structure of the model: should it be very precise? Should it be coarse? For the fall of an object, for instance, you will not build the same model at all if you want to predict the arrival point with a precision of 100 m or with a precision of 1 mm. In the first case, a rough gravity model will be enough, and it will take you five minutes to complete. In the second, you will likely need very precise properties of the atmosphere (pressure, temperature at various heights, wind speed), which you will never obtain: you might spend years at it, and it won't work. No model, at present, is capable of predicting with a precision of 1 mm the fall of an object thrown, say, from 1 km away.
Then, the laws come second: once the objectives have been defined, one tries to figure out which parameters come into play. It might be the wind for a falling object, age for a disease, and so on. How do these parameters act? Under what laws?
Finally, the data come third. In order to build and validate a model, numerical data are of course necessary, but these data should not be collected until the first two questions are answered: what objectives? What laws?
Indeed, if you start collecting data from scratch, without thinking about what you want to do, very likely you will always find that you do not have enough, so you will never start thinking. And you will end up with an enormous amount of useless data.
Anyone should, and indeed anyone does. When you run some simulations of your taxes and find out that you had better donate something to your children, or claim that you are taking care of your grandparents, this is genuine mathematical modeling, and you can legitimately be proud of yourself.
It's just like plumbing: you can buy a big drill and start making some big holes. But in some cases, it is better to leave it to professionals.
Once the model is built in theory, the numerical part comes: for instance, for the propagation of pollution, you would divide the zone into squares 1 km on a side, and find how each square receives some amount of pollution, adds to it, and passes it to its neighbors. Then, the whole thing is put into a computer, which allows some visualization: you might see a map on the screen, showing how the pollution propagates over a whole country.
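The grid scheme just described can be sketched as follows. The exchange rule is an assumption made purely for illustration, not a law given in the text: at each time step a square keeps a fraction `keep` of its pollution and passes the rest in equal shares to its four neighbors. The grid size, the value of `keep` and the single source are likewise hypothetical.

```python
def step(grid, keep=0.6):
    """One time step: each square keeps a fraction `keep` of its pollution
    and passes the rest in equal shares to its four neighbors.
    Shares sent across the edge of the zone are lost."""
    rows, cols = len(grid), len(grid[0])
    new = [[0.0] * cols for _ in range(rows)]
    share = (1.0 - keep) / 4.0
    for i in range(rows):
        for j in range(cols):
            amount = grid[i][j]
            new[i][j] += keep * amount
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    new[ni][nj] += share * amount
    return new

# A 5 x 5 zone with a single source of 100 units in the central square.
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 100.0
for _ in range(2):
    grid = step(grid)
for row in grid:
    print(" ".join(f"{v:6.2f}" for v in row))
```

The printed rows are a crude text version of the map one would display on screen. A real model would replace the exchange rule with whatever laws (wind, terrain, deposition) the objectives call for.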
These two steps, numerical implementation and computer implementation, are just as important as the initial theoretical model. They should receive exactly the same amount of attention. If one of them is poorly made, the whole process will be affected. For instance, if the numerical implementation is too coarse, it won't reveal some details, locally important, that might be required. On the other hand, if it is too fine, if for instance the zone is divided into squares of 1 m instead of 1 km, the computer will take hours, for a result which will not be more precise if the laws do not permit this precision (what do we know about contamination?).
So the whole procedure is an art: the art of mathematical modeling, just as Donald Knuth spoke of "The Art of Computer Programming". It is far from being a science.
If one wants a mathematical model to be effective, one cannot afford to be sloppy on any of its aspects. Let's say it very clearly: Nature is always very complicated, and even if we do our best all the way through, being very careful at each step, we hardly succeed in producing anything satisfactory. Let's always remember to be modest.
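To give an order of magnitude for the cost of an overly fine grid, here is a small calculation for a hypothetical square zone 100 km on a side (the zone size is an assumption chosen for illustration).

```python
def cell_count(zone_side_km, cell_side_m):
    """Number of square cells needed to cover a square zone."""
    cells_per_side = round(zone_side_km * 1000 / cell_side_m)
    return cells_per_side ** 2

coarse = cell_count(100, 1000)  # hypothetical 100 km zone, 1 km squares
fine = cell_count(100, 1)       # same zone, 1 m squares
print(coarse, fine, fine // coarse)
```

Whatever the size of the zone, going from 1 km squares to 1 m squares multiplies the number of cells by one million, before even counting the shorter time steps a finer grid usually requires.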
There is a natural tendency to build precise models, which can end up as theorems. A theorem is a well-proven edifice: if the assumptions are exactly this, I can prove that the outcome will be exactly that. For instance, if I can prove that the solutions to a problem, such as the propagation of a blast, tend to zero at infinity, I will not have to worry about getting protection if I am far enough away: this is satisfactory, intellectually speaking. But what were the assumptions, and what do I mean by "far enough"?
In real life, the three requirements we mentioned earlier are never correctly fulfilled: the objectives are unclear or contradictory, the laws are unknown, the data are missing or corrupted.
A robust mathematical model (in short RMM) is, by definition, a model which takes these uncertainties into account. It will work, it will give something, even if the objectives are unclear, even if the laws are uncertain, even if the data are corrupted.
You might think of garbage collection, and see it as a variation of the "traveling salesman" problem: find the shortest path through all houses in a city. So you would try to locate each container precisely (perhaps using GPS), each road which is closed for repair (which requires real-time information), find the present position of each truck, and then you would launch some gigantic algorithm, in order to find the shortest path, or perhaps the quickest: this would take hours. And this would be totally useless, because the true problem of the companies which collect garbage is the total cost of ownership of the trucks over a year.
You might want to pack oranges in a clever way: a problem well known to the National Science Foundation in the US! Then you would measure each orange precisely (they are not spherical and they are not equal), and you would launch some gigantic algorithm, which would tell you, within 12 hours of computation, that indeed you can put 68 oranges in a box where a worker with no mathematical training puts 67 in 10 seconds. That's great mathematics.
No, precisely, this is not what robust mathematical models will do. A robust mathematical model will tell you in seconds:
More refined models may also give probabilities, which are more informative than just an interval. An interval is built from extreme values, but you may be happy to know, for instance, that in 95% of cases a smaller interval will suffice: this is the probabilistic approach.
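The difference between the two kinds of answer can be sketched on the falling-object example. Everything numerical here is an assumption made for illustration: the idealized range formula (no air resistance, flat ground), the Gaussian uncertainties on launch speed and angle, and the sample size.

```python
import math
import random

def landing_distance(speed, angle_deg, g=9.81):
    """Idealized range of a projectile (no air resistance, flat ground)."""
    return speed ** 2 * math.sin(2.0 * math.radians(angle_deg)) / g

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = sorted(
    landing_distance(rng.gauss(20.0, 1.0), rng.gauss(45.0, 3.0))
    for _ in range(10_000)
)

# Interval built from the extreme values observed in the sample.
extreme = (samples[0], samples[-1])

# Central interval containing about 95% of the sample:
# drop the lowest 2.5% and the highest 2.5% of the values.
n = len(samples)
cut = n * 25 // 1000
central = (samples[cut], samples[n - cut - 1])

print(f"extreme values: {extreme[0]:.1f} m to {extreme[1]:.1f} m")
print(f"95% of cases:   {central[0]:.1f} m to {central[1]:.1f} m")
```

The second interval is narrower than the first, at the price of accepting that about one case in twenty falls outside it.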