In a recent judgement, the English Court of Appeal not only rejected the Sherlock Holmes doctrine discussed above, but also denied that probability can be used to express uncertainty about events that have either happened or not.
The court is probably doing something useful here. While Bayesian methods are the foundation of thinking and reasoning about the world, we perform these kinds of inferences intuitively. Paradoxically, because our intuition for this is so well developed, the mathematically untrained person can often do it better without any math than with it!
So for example, suppose I find at a murder scene a red cap, a Camel cigarette, a size-14 shoe print, and a blurry picture showing that the guy is 6'4". As the prosecutor, I can say in court: only 4% of people smoke Camels, only 8% of men wear size-14 shoes, only 1% own a red cap, and only 3% of men are 6'4"! Multiply these together, and you see the probability is about 1 in a million. This is your guy!
It's very hard for a defense attorney to argue with this, even though we all know in our gut that the evidence above is ridiculously weak. It is hard to explain why it is ridiculously weak without a long discussion of the selection process: the biases in keeping certain pieces of evidence and discarding others, and the ability of police to attach irrelevant matching details to the case after finding the suspect. All these things make the 1 in a million more like 1 in 2, or 1 in 3.
Here you see the main problem with Bayesian reasoning in a courtroom (rather than a science laboratory): the method is applied with a political end, and so it is not done honestly. The right way is to state how you selected the evidence you are presenting, and how you found your suspect. If you found your suspect by matching these traits against a pool of a million people, you have no evidence at all. This is exactly what a jury's intuition says on seeing such ridiculous evidence: you don't know anything. But using Bayesian multiplication (inappropriately), a lawyer can try to persuade the jury that the evidence is much, much better, since only about five pieces of evidence, each with 10% prevalence and each matching the suspect, are required for scientific certainty, or certainty beyond reasonable doubt. Yet precisely this kind of vague evidence is the easiest to spuriously attach to a case.
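The arithmetic behind that persuasion trick takes only a few lines to sketch. The five-traits-at-10% figure and the pool of a million people are the numbers from the example above; everything else is just multiplication:

```python
# Five pieces of vague evidence, each matching ~10% of the
# population, multiplied as if they were independent:
p_match = 0.10 ** 5
print(p_match)  # ~1e-05, which sounds like near-certainty

# But if the suspect was found by searching a pool of a million
# people (the scenario above), matches arise by chance:
pool = 1_000_000
expected_matches = pool * p_match
print(expected_matches)  # ~10 people match by chance alone, so the
                         # "match" by itself says little about guilt
```

The point of the second step is the one the honest version of the argument requires: the tiny probability only matters if you can also state how the suspect was found.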
So the use of Bayesian probability in courtrooms is almost always a way of lying with statistics, a way of making weak evidence seem stronger by multiplying likelihoods inappropriately. It's another version of the conspiracy theorist's "What are the chances of THAT??" Often the chances are very good, because THAT is very fungible; it could be any of a billion coincidences.
The fact is that these probabilities are very hard for people unskilled in statistics to estimate, and the manipulation of statistics by attorneys is easy to do and hard to counter.
So I would argue that if you have good evidence, present it in such a way that it looks good intuitively as well. I agree with this judge's judgement, or rather, I defer to their experience.
I agree with your point that probability can be used to mislead nonexperts and should be avoided when possible in court, but that isn't the reason the justices give. From one of the quotes in the linked article,
"The chances of something happening in the future may be expressed in terms of percentage. Epidemiological evidence may enable doctors to say that on average smokers increase their risk of lung cancer by X%. But you cannot properly say that there is a 25 per cent chance that something has happened: ... Either it has or it has not."
Many, if not most, statisticians would consider that statement incorrect.
The 1-in-a-million figure from your example actually looks more like a p-value than a posterior probability of guilt. If you look at the evidence as a Bayesian with a uniform prior over the population of potential suspects, you just end up narrowing that population down to the subset who are 6'4", wear red caps, smoke Camels, and wear size-14 shoes. Which is a reasonable thing to do, if you've previously established that the perpetrator left all those pieces of evidence. If you know the perpetrator lives in California, and there are, say, 30 6'4", size-14-shoe-wearing, red-cap-wearing, Camel-smoking people in the state, then the posterior probability that any particular one of them is your guy is only 1/30, even though the p-value (the probability that a randomly selected Californian would have left all these things) is really small.
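The contrast is easy to make concrete. This sketch uses the 30 matching people hypothesized above, plus a state population of roughly 38 million, which is my own rough figure for California:

```python
# Hypothetical numbers from the California example above.
state_population = 38_000_000
matching_people = 30  # 6'4", size 14, red cap, Camel smoker

# The p-value-like quantity: the chance that a randomly selected
# Californian would match the whole profile.
p_value = matching_people / state_population
print(f"{p_value:.1e}")  # ~7.9e-07 -- "really small"

# The Bayesian posterior with a uniform prior over the matching
# subset: each of the 30 is equally likely to be the perpetrator.
posterior = 1 / matching_people
print(f"{posterior:.3f}")  # ~0.033 -- nowhere near proof of guilt
```

The same evidence thus yields a number under one in a million or a number around one in thirty, depending on which question you ask, which is exactly why the two are so easy to conflate in front of a jury.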
While I agree with your main point, I'm not convinced that the court is inadvertently doing something helpful, because some types of evidence, like DNA fingerprints, just can't be presented without statistics. And, as the example above shows, non-Bayesian measures of evidence like p-values can be even harder to understand than posterior probabilities.
The court opinion does say what I said about misleading with statistics (although I read it only after writing the answer, just to verify what I said). Their analysis is that it is simply wrong to use Bayesian reasoning to conclude "We have eliminated all other causes for the fire, therefore it was a cigarette" in order to assign liability to the person who ostensibly smoked the cigarette: you need positive evidence that it was a cigarette! Their complaint is not with the principles of Bayesian reasoning, but with the careless use of this reasoning in the real world, where determinations of fault must be made.
In the case they examined, the "careless cigarette" assumption was simply unwarranted, and they correctly rejected it, even though a bogus Bayesian argument made it seem certain.
In cases such as fingerprint analysis, the court doesn't need to judge the Bayesian reasoning itself; an expert is called to do this for them, and then I am sure it is done properly and scientifically. At least, it's supposed to be possible to call impartial experts to evaluate the statistics in court testimony; then it isn't lawyers making up figures, but technical people arguing over how good the evidence is, and this is somewhat more trustworthy. So I don't think the court is making any statement at all about how to do Bayesian reasoning in fingerprint identification or DNA identification; they are just barring the kind of heuristic Bayesianism that is so often used to mislead and misdirect.