Combining multiple theories with 5 $\sigma$ confidence level

Sadly I am not a physicist, but I am interested in the topic. Please have mercy on me if you find my question trivial or dumb. Here it comes:

As far as I understand, physicists express their certainty about something in multiples of "sigma". The Higgs boson exists with a confidence level of 5 sigma. I tried to understand what 5 $\sigma$ means. Here is what I think it means:

If you have a theory and you try to prove it, you need an experiment. You then repeat the experiment over and over again and measure what you see. If your measurements tell you that what you observed is what you expected with a probability of 99.9999426697% (5 $\sigma$), then you can declare your theory correct and you have a discovery. Is that correct so far?

In science you are almost always using the work of others. What if your 5 sigma theory depends on a lot of other 5 $\sigma$ theories? Won't this reduce the probability of correctness? Let's say the "Higgs Boson Theory" depends on 10000 other 5 $\sigma$ theories. Wouldn't that reduce the probability of correctness by a factor of 10000? Isn't that a problem? Wouldn't it be better to vary the needed confidence level depending on how many different theories/discoveries your own theory depends on?

This is not what sigma means; sigma means the opposite. It's a measure of how likely it is that the known background stuff produced the signal that you see by random chance.

1 sigma means that the probability that the event is due to random chance is about 32%, 2 sigma means 5%, 3 sigma means 0.3%, 4 sigma means 0.006% and 5 sigma means 0.00006%, so less than a 1 in a million chance that it was a fluke.
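If you want to check these percentages yourself, here is a minimal sketch. It assumes the background fluctuations are Gaussian and counts deviations in either direction (the two-sided tail), which is the convention that matches the numbers quoted here. Particle physics papers often quote the one-sided tail instead, which is half of each value. The function name `two_sided_tail` is just my label for illustration.

```python
import math

def two_sided_tail(n_sigma: float) -> float:
    """Chance that a Gaussian variable lands at least n_sigma standard
    deviations from its mean, in either direction (assumed convention)."""
    return math.erfc(n_sigma / math.sqrt(2))

# Print the fluke probability for 1 through 5 sigma as a percentage.
for n in range(1, 6):
    print(f"{n} sigma: {100 * two_sided_tail(n):.7f}%")
```

The 5 sigma line comes out below 0.0001%, which is the "less than 1 in a million" figure.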

To understand this, suppose you are looking at a blue chair in a white room, but the room is dim, so you only see a few photons. The white floor sends out all colors of photons, while the chair only sends out blue. The chance that the floor sends out a blue photon is 10%. You see a blue photon, then another, then another, all coming from a spot in the room. You then see a red photon coming from another spot and a blue photon from another spot.

You reach 5 sigma confidence when you see about 10 blue photons from the same chair-shaped region in the room, and no red photons. Supposing it is categorically impossible for the chair to emit non-blue photons, you reach 5 sigma certainty that a chair does not occupy a spot when you see one non-blue photon coming from that spot (actually, infinite sigma certainty).

Why do I say you need 10 blue photons and not just 6? Six photons is 1 in a million after all, isn't it? It's because of selection bias: there might be 10,000 different qualitative places where the chair can be, or there might be no chair, and you are selecting through the photons, looking for clumps of blue. If you look at 10,000 clumps of 4 photons, you will find about one clump of 4 blues just by random chance (10,000 × 0.1^4 = 1). So these 4 photons don't count. The next six give you the 5 sigma certainty.
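The arithmetic of the chair example can be sketched in a few lines. The 10% blue chance and the 10,000 candidate places come from the example itself. Treating every place and every photon as independent is my simplifying assumption, and the function names are made up for illustration.

```python
import math

P_BLUE_FROM_FLOOR = 0.10  # from the example: chance a floor photon is blue
N_PLACES = 10_000         # from the example: candidate chair positions

def chance_all_blue(k: int) -> float:
    """Chance that k photons from one empty spot are all blue."""
    return P_BLUE_FROM_FLOOR ** k

def expected_fake_clumps(k: int, n_places: int = N_PLACES) -> float:
    """Average number of empty spots showing k blue photons in a row."""
    return n_places * chance_all_blue(k)

def chance_any_fake_clump(k: int, n_places: int = N_PLACES) -> float:
    """Chance that at least one empty spot shows k blues
    (assumes the spots are independent)."""
    # log1p/expm1 keep precision when the per-spot chance is tiny.
    return -math.expm1(n_places * math.log1p(-chance_all_blue(k)))

for k in (4, 6, 10):
    print(k, expected_fake_clumps(k), chance_any_fake_clump(k))
```

With 4 photons you expect about one fake clump somewhere in the room. With 6 the chance of a fake is still around 1%. Only at 10 does the chance of a fake anywhere in the room drop to roughly 1 in a million.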

The reason scientists set the bar for a discovery at 5 sigma is that experimental groups sift through mountains of data, and they are looking for interesting deviations. Just by random chance, like finding 4 blue photons from a place where there is no chair, they find things that are superficially 2 or 3 sigma significant all the time, and these things are nothing at all. These things are not really 3 sigma significant if the probability were accounted for correctly, because of all the searching and sifting that went on to find them.
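To put a rough number on "not really 3 sigma significant", here is a sketch that converts a superficial significance into an effective one after many looks at the data. The count of 100 looks is purely hypothetical, the looks are assumed independent, and the two-sided Gaussian convention is assumed throughout. Real analyses handle this correction with more care.

```python
import math
from statistics import NormalDist

def local_p(n_sigma: float) -> float:
    """Two-sided Gaussian fluke probability for a single look."""
    return math.erfc(n_sigma / math.sqrt(2))

def global_p(n_sigma: float, n_looks: int) -> float:
    """Chance that at least one of n_looks independent looks
    fluctuates by n_sigma or more."""
    return -math.expm1(n_looks * math.log1p(-local_p(n_sigma)))

def p_to_sigma(p: float) -> float:
    """Convert a two-sided fluke probability back into a sigma value."""
    return NormalDist().inv_cdf(1 - p / 2)

N_LOOKS = 100  # hypothetical number of places the analysis searched
for s in (3, 5):
    p = global_p(s, N_LOOKS)
    print(f"{s} sigma after {N_LOOKS} looks: p = {p:.3g}, "
          f"effective sigma = {p_to_sigma(p):.2f}")
```

Under these assumptions a superficial 3 sigma bump shrinks to barely more than 1 sigma, while a 5 sigma bump still sits near 4 sigma.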

If experimentalists could easily quantify the human tendency to find patterns, we could probably get by with a "real honest-to-goodness 3-sigma" confidence level. But there are so many possible patterns. You didn't see the blue photons but you saw 5 red photons: maybe the chair is painted red! You saw red and yellow photons: maybe the paint is orange! You have a lot of theory freedom, so you can easily get 3 sigma evidence for some theory you make up to fit the data. So if this is your theory, you need to wait for 5 more red photons before declaring the result, just to be sure you really see a red chair.

The way it works in practice is that you need to make a prediction based on 3 sigma data that is then confirmed by independent 3 sigma data, and then you are confident. But it always helps to replicate the result. The 5 sigma threshold is pretty safe: people have been burned by 4 sigma and 3 sigma events many times, but not so much by 5 sigma.

This sort of thing is very useful for the general public. If there is a catastrophic isolated event, like a financial meltdown, a terrorist act, or a string of crimes, it is good to get a sense of what the sigma is on the event before changing policy drastically. It could have been an unlucky coincidence.