The short answer (tl;dr)
The relevant historical context for your question is the long quest of logic to provide foundations for all of mathematics. Zermelo's axiomatic set theory displaced contenders like type theory and won the race early on, because logicians in this tradition developed metalogical tools (model theory, proof theory) to investigate axiom systems. Zermelo's contribution came around 1908 and, through the work of Fraenkel, Skolem and others in the mid-1920s (before Gödel's completeness result), it quickly became the standard set theory we know today, Zermelo–Fraenkel set theory + Choice (ZFC).
That ZFC became a pure first-order theory is due to David Hilbert's early work on a subsystem of logic which he called restricted functional calculus (effectively today's first-order logic) and to Thoralf Skolem, who in 1923 gave the original first-order axiomatization of Zermelo set theory. Axiomatic set theory effectively became a dominant first-order theory in the mid-1930s and has remained first-order to this day. The majority of set theorists value the properties of first-order logic (completeness, compactness, etc.) highly. The fact that first-order set theory deviates from mathematical practice is actually seen as a feature, not as a bug.
The long answer
It seems trivial today, but in order to adopt first-order logic (FOL), one has to be able to isolate it from second-order logic (SOL) or higher-order logics. And this possibility was itself an important conquest. Up to Principia Mathematica, several versions of second-order logic (including infinitary logic) were commonly employed by logicians without much care. FOL and SOL were not really distinguished (distinguishable?) until the relative merits and vices of each one were investigated. Or, to state it the other way around: It was the foundationalist quest which led to the study not only of the expressibility, but also of the properties of various fragments, and it was through this avenue that FOL and SOL were told apart.
1. Two traditions in logic
It seems to me that it was Russell's discovery of the famous paradox in 1901 in Cantor's naive set theory (discovered independently by Zermelo, who communicated it to Hilbert) that started it all. Since the paradox also appeared in Frege's formalized version of naive set theory, logicians started to devise various ways of avoiding the problem and built new set-theoretic approaches. The two most important proposals which fixed these (and other) paradoxes were Russell's type-theoretic set theory and Zermelo's set theory.
These two solutions are commonly regarded as expressions of two different "strains" within logic, the Peano-Frege-Whitehead-Russell tradition and the (Peirce)-Schröder-Hilbert-Zermelo tradition.
2. Metalogical research
The important point is that these two traditions scored unevenly in the above-mentioned task of investigating logical fragments and their properties. Of the two, the latter was, in Lakatosian terms, the progressive research program because it was interested from the start in metalogical questions, while the former was not.
To understand why this is the case, it helps to remember that the first tradition is commonly identified with logicism, a conception which defines the raison d'être of logic as the task of providing a foundation for all of mathematics. For most logicists this implied that it was impossible to "stand outside" of logic and thereby to study it as a system (in the way that one might, for example, study the real numbers). This had severe consequences: Russell and Whitehead
- lacked any conception of a metalanguage
- explicitly denied the possibility of independence proofs for their axioms
- believed it impossible to prove that substitution is generally applicable in type theory
- insisted that the principle of mathematical induction cannot be used to prove theorems about their system of logic
It is hard nowadays to understand what doing logic means without a language-metalanguage distinction and a syntax-semantics distinction!
3. The emergence of FOL
It is not a coincidence, I think, that it was in this metatheoretical setting, and in the Schröder-Hilbert-Zermelo tradition, that logic was put on the operating table to be dissected, so to speak. It was a direct consequence of the need to investigate metalogically the properties (soundness, completeness, compactness, consistency, categoricity etc.) of various logical fragments: In 1918 Bernays gave the first rigorous proof of completeness for one such subsystem of logic, namely propositional logic. Another fragment turned out to be what we now call FOL. Specifically, it was first developed in 1917 under the name restricted functional calculus by Hilbert as a subsystem of his functional calculus (effectively a ramified type theory), but was only published in 1928 in the classic textbook coauthored with Ackermann, Grundzüge der theoretischen Logik, where it was still treated as a subsystem.
In fact, the case was far from settled. While the original Zermelo set theory can be interpreted as being a FO theory (with the separation axiom replaced by an axiom scheme with an axiom for each FO formula), according to Zermelo's own conception it was a second-order theory (with the separation axiom as a single axiom). Zermelo remained a strong proponent of a second-order set theory. Indeed, most logicians used different fragments of logic for different tasks, and they did not shy away from employing second-order theories or anomalous FO theories (i.e. including infinitary operations).
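To make the contrast between the two readings of separation concrete, here is a sketch in modern notation. It is not Zermelo's or Skolem's own formulation, and the symbols (φ, X, a, b, and the parameters p_1, …, p_n) are my choice for illustration.

```latex
% First-order reading: an axiom SCHEME, i.e. one axiom for every
% first-order formula \varphi(x, p_1, \dots, p_n) of the language of
% set theory in which b does not occur free.
\forall p_1 \dots \forall p_n \, \forall a \, \exists b \, \forall x \,
  \bigl( x \in b \leftrightarrow ( x \in a \land \varphi(x, p_1, \dots, p_n) ) \bigr)

% Second-order reading: a SINGLE axiom that quantifies over all
% properties X of sets, not only over the definable ones.
\forall X \, \forall a \, \exists b \, \forall x \,
  \bigl( x \in b \leftrightarrow ( x \in a \land X(x) ) \bigr)
```

The scheme only guarantees the subsets of a that some FO formula can pick out, whereas the single second-order axiom (under standard semantics) ranges over every property, and hence every subset.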
The fact that set theory is today a FO theory is probably due to Thoralf Skolem. In 1923 Skolem presented the original FO axiomatization of Zermelo set theory. Now, there is a standard view that sees a Skolem-Gödel "axis" as urging the adoption of a FO set theory and as being responsible for its ultimate establishment. It is, however, not clear at all that this was the case, i.e. that Skolem (and Gödel) were pushing for an FO object language in set theory. While Skolem was critical of second-order set theory as a foundation of mathematics, he proved his (downward) theorem to hold in FOL. He called the result (in a countable model it is true that there is an uncountable set) a philosophical, albeit not formal, paradox, and used it to argue that FOL could not serve as a foundation of mathematics either:
I believed that it was so clear that axiomatization in terms of sets was not a satisfactory ultimate foundation of mathematics that mathematicians would, for the most part, not be very much concerned with it. But in recent times I have seen to my surprise that so many mathematicians think that these axioms of set theory provide the ideal foundation for mathematics; therefore it seemed to me that the time had come for a critique. (Skolem 1922)
(There is even the suspicion that Skolem's axiomatization was FO by chance…! There is some good evidence that a lot of the finest minds of their time, like Fraenkel and von Neumann, had trouble developing a real understanding of the difference between FOL and SOL in the mid-1920s!).
And Gödel, although he argued for a FO *meta*language, used a variant of type theory in his famous paper in 1931!
To be sure, it is undeniable that both Skolem and Gödel contributed important theorems that helped establish FOL, but they did not actually argue for it. The truth seems to be that there is no simple success story with heroes to be told here. A more correct account would probably involve multiple causal factors.
The OP's statement that
the completeness of first-order logic was only proved by Kurt Gödel at a time when first-order logic had already displaced second-order logic
is, however, incorrect. FO axiomatizations of set theory became dominant only from the mid-1930s onward. There is a hypothesis to the effect that this timeline should be correlated with Tarski's important contributions to model theory (truth, logical consequence). On this view, FOL became standard not (only) because of its intrinsic qualities, but because it was shown to have a particularly nice model theory.
4. Why (not) first-order set theory?
Nevertheless, I still wonder why […] why the inability of first-order logic to characterize infinite structure is not considered a problem.
Well, a pragmatic answer is that it is not considered a problem because of FOL's inability ;)
As it is not possible to characterize (i.e. axiomatize categorically) infinite structures in FOL, as you say, set theorists simply work with the intended model and care about non-standard models only when they're needed. That's as good as it gets, I'm afraid.
The more general dispute is about weighing the merits and vices of FOL and SOL. At first glance it seems that
- FOL
  + complete, compact, nice model theory
  - deviating from mathematical practice
- SOL
  + adherent to mathematical practice
  - completeness does not hold
Since nobody would dispute the merits of FOL, it comes down to the question of whether adherence to mathematical practice is really a good thing, and of how heavily the loss of FOL's merits should weigh. From my experience with logicians:
- the supporters of FOL would deem a language without FOL's merits as too much of a loss. In addition, they don't see the deviation from mathematical practice as a vice, but as a feature. This might be a remnant of the finitary tradition in logic: Logic is required to be more strict than mathematics (and its practice) in order to serve as a foundation for it.
- the supporters of SOL wouldn't deem the loss of completeness etc. as fatal. They see set theory not so much as a foundation of mathematics. Instead, set theory should be more a description of mathematics, i.e. the more adherent to mathematical practice, the better (=more precise) the description becomes.
- some see a middle way between the two by adopting another FO set theory like Morse–Kelley set theory. MK, which allows proper classes along with sets, is syntactically almost identical to second-order ZFC, but differs considerably in its semantics.
Take your pick :)
Sources and further readings: