Learning, Meta-Learning, and the Foundation of Logic

Mar 8, 2024 - 22 minute read
Philosophy
English - logic - empiricism - meta-thinking - learning - machine learning

In my previous discussion on the hard problem of consciousness, I mentioned that to handle the hard problem, we should not really try to solve the problem. Instead, we ascend to a meta-level of thinking to see that the way of thinking about consciousness which leads to the hard problem is not a fruitful way of thinking. At the end of another of my articles, on logic, I also hinted at the possibility of justifying logic itself from experience, which can be thought of as a kind of meta-learning. Nevertheless, I have not thoroughly explained the idea of meta-thinking or meta-learning. Therefore, in this article, I will dive deeper into the meaning of meta-thinking and explain how logic can be justified pragmatically from experience.

What is meta?

A visual representation of meta-thinking.

Although the word “meta” does not have a single meaning across all uses, many of its uses share a similar sense. Meta-thinking, simply stated, means thinking about thinking. Meta-cognition, similarly, means cognition of one’s own cognitive processes. Meta-languages are languages used to discuss another language. In meta-mathematics, one studies mathematics itself with mathematics. Common to these cases is that by going “meta”, we leave our usual level of thinking and doing, so that the “ways” of thinking and doing themselves become the subject under discussion.

Going up one level of reasoning can be very helpful in some cases. Some nice examples are Gödel’s incompleteness theorems. The study of pure mathematics, unlike calculation, usually focuses on proving theorems: the art of deriving statements from statements. Thus, it is just a matter of time before mathematicians start to ask questions like “Can we prove all the true statements?” or “Does a certain set of axioms lead to contradiction?” This is when meta-mathematics comes into play. By going up one level, the structure formed by statements and their derivations itself becomes a mathematical structure (a formal system) that can be studied. Gödel’s incompleteness theorems are thus theorems in a meta-theory that describe properties of the studied formal systems.

.  .  .

What does “meta” have to do with justifying logic itself through experience? I would like to start by examining a famous problem in philosophy, infinite regress, to illustrate the kind of problem I want to solve. Then, I will point out some related phenomena in the field of machine learning, as concrete examples of meta-learning, and show how this idea can be applied to logic.

What the Tortoise Said to Achilles

In an article titled “What the Tortoise Said to Achilles,” Lewis Carroll demonstrated that treating an inference rule such as modus ponens as just another premise leads to an infinite regress. The argument is presented as a dialogue between Achilles and the Tortoise, in which Achilles tries to convince the Tortoise that a segment of the proof of Euclid’s first proposition is valid. If you are not familiar with the first proposition and its proof, the following video would help:

The discussion between Achilles and the Tortoise focuses on the argument presented at timestamp 1:33–1:49 of the video, which can be expressed as follows:

(A) If two things are equal to the same thing, then they are equal to each other.
(B) The two sides of this triangle are equal to the same side.
(Z) The two sides of this triangle are equal to each other.

Achilles, relying on his intuition as anyone would, claims that it should be obvious that Z follows from A and B. However, the Tortoise now plays the skeptic who is not convinced that accepting A and B should lead to accepting Z. The Tortoise then challenges Achilles to compel him to accept Z. Achilles’s solution is to ask the Tortoise to grant him this additional proposition on top of A and B:

(C) If A and B are true, Z must be true.

Achilles thinks that if the Tortoise accepts A, B, and C, then Z should follow logically from them. However, the Tortoise argues that deriving Z from A, B, and C is yet another hypothetical, so yet another proposition must be granted:

(D) If A, B, and C are true, Z must be true.

Now the reader should be able to see where this is going. The list grows indefinitely, adding one proposition after another, while the Tortoise remains skeptical that the accepted propositions derive Z. This infinite regress is rooted in the Tortoise’s refusal to apply the inference rule modus ponens:

$$ \frac{P, P\to Q}{\therefore Q} $$

If we take modus ponens for granted (and also grant that the sides of a triangle are “things”), then accepting A and B allows us to conclude that Z is true. The core problem to be solved is then: if someone were to reject modus ponens, what reasons or justifications could we use to convince that person otherwise?
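The Tortoise’s trick becomes clearer if we separate the premises from the machinery that operates on them. Below is a minimal sketch of my own (not from Carroll’s text) in which modus ponens lives in the inference loop rather than in the list of premises. Adding propositions like C and D to the premise list does nothing by itself, because only the loop ever produces new conclusions:

```python
def forward_chain(facts, implications):
    """Modus ponens as an operation: from P and (P -> Q), derive Q.

    The rule is implemented by this loop itself; it is not one of the
    premises, so the Tortoise cannot demand it be granted as one.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for p, q in implications:
            if p in known and q not in known:
                known.add(q)
                changed = True
    return known

# B as a fact, A encoded as the implication B -> Z.
print(forward_chain({"B"}, [("B", "Z")]))  # {'B', 'Z'}
```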

More generally, we can ask: if someone were to reject some rule of inference, a particular formal logic, or any framework of thinking, how could we justify these devices that form the foundations of our thinking?

I think the reason Achilles failed to convince the Tortoise is that he effectively did nothing more than restate modus ponens. Justifying a way of thinking, however, requires starting from something more fundamental. We have to depart from discussions of pure abstraction and focus on the role logic plays in the full, rich reality.

To answer the above question, I would like to start with some observations from machine learning (ML) and artificial intelligence (AI). I think ML and AI research is a great resource for philosophers due to its close relation to traditional philosophical fields like epistemology and ontology. Moreover, in AI research, researchers do not simply talk about ideas but put them to work. The designed agents do not merely operate on a trivial example like deriving “therefore, Socrates is mortal” through a simple syllogism; they go through the full process of perceiving, reasoning, and acting to achieve a practical goal. Observing this process helps us escape discussions of pure abstraction and instead see how logic (or other ideas) can be practically applied. By distancing ourselves from the thing that is thinking (the robot agent), we are also naturally placed in an easier position for meta-thinking.

In the next section, I will first introduce some ideas from ML and AI. Then, I will generalize them to give us an answer to the aforementioned question.

Many Levels of Learning

A modern, typical machine learning procedure goes as follows:

  1. Get a dataset. Usually, a dataset contains many pieces of information, among which there are relations we are interested in learning. Our goal is then to learn these relations based on the examples presented. For example, in supervised learning, we are given many pairs of examples $(x_i,y_i)$, and we try to learn the relation between the input variables $x$ and the target variable $y$.
  2. Choose a model. Such models usually have a fixed form (such as $y(x) = w^Tx$ in the simplest case) with parameters to be learned from the dataset (which would be $w$ in $y(x)$). In deep learning, this corresponds to choosing an architecture for the neural network.
  3. Apply a training algorithm to the model and dataset to choose the parameters that best fit the dataset, producing a model that can be used for inference (i.e., making predictions of $y$ from $x$).

The “learning” part happens mostly in the third step, where the training algorithm repeatedly evaluates the current model on the target task and updates the parameters to gradually improve its performance. Abstractly and roughly speaking, the word “learning” here means an iterative process of acting or observing, during which information (or, say, experience) about the target task is accumulated to allow improvement in performance. The chosen architecture, together with the training algorithm, can then be seen as an epistemological framework within which the machine works to learn the given task.
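To make the three steps concrete, here is a minimal sketch of my own with synthetic data (a toy example, not tied to any particular library’s training API), fitting the linear model $y(x) = w^Tx$ by gradient descent:

```python
import numpy as np

# Step 1: get a dataset of (x_i, y_i) pairs (synthetic here).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

# Step 2: choose a model with a fixed form, y(x) = w^T x,
# with parameters w to be learned.
w = np.zeros(3)

# Step 3: a training algorithm iteratively updates w to improve
# performance on the target task (mean squared error here).
lr = 0.1  # a hyperparameter: chosen by hand, fixed during training
for step in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the MSE loss
    w -= lr * grad

print(w)  # approaches true_w as training proceeds
```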

Nevertheless, just as one may not be sure whether we are justified in applying an inference rule or following some formal logic in our reasoning, we cannot be sure that the chosen architecture or training algorithm gives the best result for the chosen task. These architectures and training algorithms usually contain “hyperparameters” that are chosen by hand and fixed throughout the training process. Changes to these hyperparameters may change the architecture that is fed into the training algorithm or change the behavior of the training algorithm itself. These hyperparameters make whatever version of the algorithm we are using merely a member of a family of algorithms, which is in turn just one family out of many. What, then, justifies using a specific algorithm with a specific hyperparameter?

In the early days, these algorithms and hyperparameters were chosen based on the practitioner’s experience. The most naive approach would be to blindly try many algorithms and many hyperparameters and select the best one. Smarter approaches may involve (intuitively or explicitly) modeling the relation between a hyperparameter and the target performance through some experiments, so that the performance can be optimized according to this model. In some cases, the choice can also be assisted by theoretical discussion of the properties of these algorithms and hyperparameters. Overall, the practitioner is performing a “learning” process of their own, gradually gaining information about the target task so that better architectures or algorithms can be selected. This is a form of meta-learning, since the practitioner is learning which ways of learning (architectures and algorithms) are better.

Such meta-learning can be carried out not only by a single person but also by a community. The academic system, consisting of people attending conferences and publishing articles, implements a computing machine in which algorithms for learning are compared and selected. The popular algorithms that survive, having proved their usefulness in terms of performance, speed, simplicity, interpretability, or ease of access, are the outputs of this giant optimizing machine. The popular default choices of hyperparameters for many algorithms can also be seen as a result of meta-learning: they are settings that have been found to work across the many tasks tried by the machine learning community.

The above describes one of the ways humans perform meta-learning. As for meta-learning discussed in the context of machine learning, it usually concerns techniques like grid search, neural architecture search, and Bayesian optimization. These algorithms are formalized and automated versions of the meta-learning originally performed by humans.
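As a sketch of the most naive of these techniques, here is grid search over two hypothetical hyperparameters; `train_and_evaluate` is a placeholder of my own standing in for a full training run:

```python
from itertools import product

def train_and_evaluate(lr, width):
    # Placeholder: in practice this would train a model with the given
    # hyperparameters and return its validation score (higher is better).
    # A made-up formula stands in for the real run here.
    return -(lr - 0.01) ** 2 - ((width - 64) ** 2) / 1e4

learning_rates = [0.001, 0.01, 0.1]
widths = [32, 64, 128]

# The meta-level loop: learning which way of learning works best,
# by trying every combination and keeping the best performer.
best_score, best_config = float("-inf"), None
for lr, width in product(learning_rates, widths):
    score = train_and_evaluate(lr, width)
    if score > best_score:
        best_score, best_config = score, (lr, width)

print(best_config)  # (0.01, 64) under this toy scoring
```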

The first takeaway is that, generally speaking, any algorithm that performs learning is itself learned. This includes the algorithm that performs meta-learning. Therefore, in addition to algorithms that learn how to learn, we can also have algorithms that learn how to learn how to learn, and so on. In other words, the adoption of an algorithm is justified by its previous success in performing a task. These are empirical and pragmatic decisions.

Another takeaway is that to say one algorithm is better than another, the algorithms must be evaluated and compared relative to a goal, utility, or performance measure. All levels of learning can proceed with respect to this utility. Of course, one is free to adopt any framework for one’s learning. However, it should be clear that some frameworks will fail or perform worse with respect to a chosen goal. Therefore, as long as a goal or utility is chosen, the choice of algorithms or frameworks is not arbitrary.

.  .  .

Meta-Learning for Reasoning

If the frameworks for learning are themselves learned, why couldn’t frameworks of reasoning like logic also be learned (and thus justified empirically and pragmatically)? That is, why not view logic as a tool that is gradually sharpened through a process of applying it, seeing how well it performs, and updating it based on its performance?

Prior to the deep learning era, AI researchers tried various frameworks for building agents that can reason, including propositional logic, first-order logic, fuzzy logic, and probabilistic reasoning. An agent adopting any of these for reasoning can also be described, in fancy philosophical terms, as making a particular ontological and epistemological commitment [1].

Of course, these frameworks for thinking are usually not one-size-fits-all; each has its features and limits. Among the frameworks mentioned above, propositional logic is the simplest. In propositional logic, we only talk about propositions (such as “the weather is rainy”) and compound propositions formed from propositions and logical connectives (e.g., logical AND and OR). Propositions are assumed to take on only the values true or false. Due to this simplicity, an obvious limit of the framework is its lack of quantifiers (i.e., $\forall, \exists$), which prevents it from concisely expressing general patterns (such as physical laws that apply for all time $t$). First-order logic builds on the idea of objects and relations instead of just propositions, allowing the world to be modeled in more detail. It also allows the use of quantifiers, which solves the aforementioned issue of propositional logic. However, these two frameworks still do not directly support reasoning under uncertainty, which is where probabilistic reasoning comes into play.

Just as algorithms for learning are themselves learned, researchers here are performing a kind of meta-learning when selecting frameworks of thinking for these agents. Different frameworks are tested in different scenarios and selected for their effectiveness or efficiency. Although the meta-learning presented here seems limited to selecting between well-developed frameworks like propositional logic or first-order logic, in principle we can also have meta-learning that learns and develops the forms of logic themselves. For example, I can picture an evolutionary algorithm consisting of multiple agents implementing different inference rules, which selects among them based on their success at certain tasks. As AI progresses, I believe we will see better proof of my claim when we get past “language modeling” and arrive at agents that can naturally invent new language to achieve their goals creatively.
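To make that evolutionary picture concrete, here is a toy sketch of my own (the world, facts, and rule pool are all made up for illustration): each agent is a set of candidate inference rules, fitness rewards deriving true conclusions and penalizes false ones, and selection with mutation tends to keep only the rules that track the toy world:

```python
import random

# Toy world: what is actually true, and the facts an agent observes.
TRUTH = {"rain", "wet", "slippery"}
FACTS = {"rain"}
# Pool of candidate inference rules; some lead to false conclusions here.
CANDIDATES = [("rain", "wet"), ("wet", "slippery"),
              ("rain", "fire"), ("fire", "smoke")]

def derive(rules):
    """Apply an agent's rules to the facts until nothing new follows."""
    known = set(FACTS)
    changed = True
    while changed:
        changed = False
        for a, b in rules:
            if a in known and b not in known:
                known.add(b)
                changed = True
    return known

def fitness(rules):
    derived = derive(rules)
    # Reward true derived conclusions, penalize false ones.
    return len(derived & TRUTH) - 2 * len(derived - TRUTH)

def mutate(rules):
    """Randomly toggle one candidate rule in the agent's rule set."""
    rules = set(rules)
    rules.symmetric_difference_update({random.choice(CANDIDATES)})
    return frozenset(rules)

random.seed(0)
population = [frozenset(random.sample(CANDIDATES, 2)) for _ in range(20)]
for generation in range(30):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]                     # selection
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(10)]   # variation

print(sorted(max(population, key=fitness)))  # tends to keep the true rules
```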

Current AIs (as far as I know) cannot learn logic by themselves. Instead, they acquire it by modeling our language (which contains logical sentences) or through researchers directly building a formal logic into an agent’s design. In other words, the knowledge behind these useful forms of thinking is first developed by humans and then transferred to machines. Therefore, if we wish to trace the origin of logical thinking, we still have to discuss how humans could have learned to reason logically.

Innate or Learned?

To discuss how logical thinking (or any thinking framework) started in humans, it seems impossible to avoid the question of learned vs. innate. Is logical thinking learned after birth, or is it built directly into newborns? Some philosophers seem to love these questions. However, I think they are not really that interesting.

The reason is that even if something is innate, it is still “learned”, just in a different way. Whatever is innate is passed down through our genes, and whatever “knowledge” or structure the genes encode is a product of evolution, a learning algorithm that has operated for billions of years. The subject of learning here is not a person, but the gene pool that includes the person as a temporary trial. Such learning is done by the gene pool adding mutated members at each generation (trial) and removing members that die off (error). Through this repeated trial and error, the gene pool gradually gains information about the environment, and its distribution is iteratively updated toward solutions suitable for surviving in the current environment. It is not just humans that perform learning, but any system possessing this adaptive and optimizing character. For each of us, learning does not “start” after birth but simply “continues” from what has been (meta-)learned through evolution. Once one understands this general idea of learning, the learned-vs.-innate debate instantly dissolves. It seems no more interesting than the question “When does the river end and the sea begin?”

Reasonably, we should not expect the origin of logical thinking to be purely innate or purely learned, as if the two were mutually exclusive. Instead, the latter continues from the former, and their effects combine to give rise to logical thinking. This is similar to the machine learning case above: the “learning” does not begin at the usual “training” stage, for many meta-learning processes have already taken place by the time the researchers choose the algorithms or architectures.

A Speculative History of Human Thinking

Tracing the origin of logical thinking is no trivial work. As concluded in the previous section, we should look at both the innate factors (i.e., results learned prior to one’s birth) and the learned ones. The former correspond to the bodily structures and mechanisms we inherit through our genes, and the latter correspond to the knowledge we acquire after birth. Studying either of these factors, again, is highly non-trivial. Even after all the efforts of neuroscientists, our understanding of how our brain (which has an innate structure and mechanism that allows learning) performs all its amazing functions is still limited in its detail. And how early people developed certain concepts, passed them down, and refined them through collaborative work (i.e., the learned part) is largely a lost history. Due to these difficulties, the following discussion necessarily involves many speculations of my own.

First, let us examine the innate components that might have contributed to the eventual appearance of logical thinking. From our highly evolved genes, we inherit bodily structures proven useful for survival. Part of what makes us so good at survival is our advanced ability to model, and thus understand, the world around us, which allows us to predict the future and make decisions. My theory is that logic was initially used primarily to describe and model the world. Thus, innate abilities that aid us in building world models also count as contributing to the appearance of logic. These innate abilities include (but are not limited to): our perceptual devices, the initial macrostructure of the brain, and the neural wiring rules (i.e., how our brain rewires itself in response to experience) that implement learning. I now briefly describe each of these.

The first and foremost of these innate abilities is our perceptual devices. To learn anything about the world that does not come pre-installed, we must have perceptual devices that allow states of the world to be perceived and represented in us, such that important features are kept and high-level features are constructed from raw sensory data. Second, for the sensory data to be used effectively, the signals must flow through regions of our brain, whose macrostructure is determined by innate developmental programs. The organization of our brain regions allows information coming from different senses to be integrated, producing a rich and unified representation of the world. Third, for us to have memories of the world, and to learn how the world works, our brain must change in response to outer stimuli. These neuron-modifying rules allow us to understand, through repeated exposure, how events in the world are associated with each other. For example, after experiencing similar events, we can predict that a cup falling to the ground may break, or that dragging a chair across the ground produces noise.

Using the abilities described in the previous paragraph, we gradually build world models implicitly (non-linguistically). The neural connections implementing these models guide our thoughts, leading us from one idea to another. They operate mostly at an unconscious level, neither spoken nor written down.

Before formal logic could appear, language had to first come into being. The first set of words spoken probably consisted of the words our ancestors needed most. It might have included short, simple imperatives with meanings similar to “give”, “go”, or “come”. Simple declarative utterances that convey the state of the world or the existence of objects, such as “lion” or “water”, may also have been used. These initial utterances may have had little or no grammar; the order of the words spoken may have corresponded to the order in which the events occurred, or perhaps followed no order at all.

Such a lack of grammar limits a language’s ability to express complex relations reliably. Consider the idea expressed in the sentence “I cut Bob with a knife”. Without grammar, it might have been expressed as something like “I cut Bob knife”, leaving it unclear how I, Bob, and the knife participated in the action of cutting. Did I cut Bob with a knife, did I cut some stones to make Bob a knife, or did Bob cut me with a knife? Without grammar, there is simply too much room for misunderstanding. Modern English solves this problem with several rules: the person performing the cut comes before the verb; the thing being cut comes after the verb; and the tool used for cutting is marked with the preposition “with”. This grammar uses both the positions of words and special markers (prepositions) to express clearly how different elements participate in an action.

The basic elements of formal logic, such as “and”, “or”, and “if…, then”, probably arose from a process similar to the invention of grammar rules. They are tools invented so that the states and dynamics of the world can be described more precisely, and so that complex requests and inquiries can be conveyed. The invention of increasingly complex linguistic tools should naturally follow humans’ developing a richer understanding of the world. The concept of “and” was probably created to express that several things coexist, or to request several things together. “Or”, in turn, was perhaps created to present options for another person to choose from, to express multiple possible outcomes, or even to express possible states of the past when performing inferences. As for “if…, then”, it probably arose when people had an intuitive understanding of the dynamics of the world and could make predictions about what would happen. It may also have been used to tell others what to do in hypothetical situations.

I apologize for laying down so many speculations seemingly out of nowhere. However, the central idea, of which I am confident, is that linguistic constructs were initially built from a wealth of experience and served (and so were justified by) pragmatic purposes, which include coordinating behavior between people and describing the states of the world in detail.

With these informal logics built into people (through a lot of learning) as intuitions, and the corresponding linguistic tools in place, the next step is to formalize logic. I think the formalization of logic involves another layer of learning, in which the intuitions are examined carefully and included in the formal rules if they give “correct results”. By “intuition”, I mean how the mind, out of habits derived from experience, jumps to its conclusions in a certain way. And by “correct results”, I mean results for which your mind can find no counterexamples.

The process described in the last paragraph can often be carried out without the philosopher collecting more empirical evidence from the external world, and thus it is often deemed independent of experience. However, we should not forget that the intuitions themselves are ultimately built from experience. Thus, the eventual emergence of formal logic is certainly not independent of previous experience. Furthermore, as I mentioned in the section “Meta-Learning for Reasoning”, even after a formal logic has been developed, it remains inseparable from experience. Whenever a formal logic is applied, it is examined for its success in solving the problem at hand. The shortcomings of a formal logic, or more generally of a thinking framework, may be discovered, and improvements can be made. A new system may also be proposed to compete with it. It is in this ongoing learning process that logic continues to be shaped by experience and thus is never separated from it.

So, What Justifies Logic?

In order to justify something, there must be some assumptions, basic values, or utility that I accept, so that an argument can start from there. For the justification of logic, such a value would be to derive conclusions that stay true to reality, and to regulate our thought so that consistency remains even when dealing with abstract concepts not directly related to reality. The systems of logic, or thinking frameworks generally, that we use today have usually proven their ability to deliver these values, since they grew out of, and are continually developed with, experience. For anyone feeling the need to justify logic: if you accept the basic values mentioned above, then I hope this lengthy article, which attempts to explain how experience develops systems of logic with respect to these values, has served as a somewhat satisfying justification. For those who do not accept these values, I have no way to provide a justification, and honestly, I do not feel the need to do so.

Conclusions

In many of the discussions on logic I have read, either the rules of logic are written down without explanation, or, even when explanations are given, logic is claimed to be innate or simply an intuition. Thus, the rules governing our everyday and serious thoughts are often treated as independent of experience, or at least, it is often left unclear how they relate to experience.

This article is my attempt at connecting ideas such as logic with experience, thus providing a justification for using logic. The crucial tool for this task is a general idea of what learning is, which I introduced in the section “Many Levels of Learning”. To reiterate briefly, there are several ways in which logical thinking relates to experience and learning. First, the innate capabilities that we are born with, which allow learning while we are alive, can be seen as the product of a learning process operating at an evolutionary scale rather than an individual one. Second, the rich experiences we collect in our lives allow models of the world to be learned. Such world models are part of what we call intuitions, which direct our minds from one idea to another at a subconscious level. Formal logic is then a tool invented by selecting, refining, and packaging a set of these intuitions. Such a tool, even after its invention, stays connected to experience and learning, for tools are continually put to use and improved when their application is unsatisfactory. Tools are also compared with other tools, and the worse ones gradually fade out of sight. These two processes are, again, instances of learning.

Another thing I have done along the way, using the general idea of learning, is to unify innate and learned abilities. Now we see that what is innate is also learned, just at a different scale. They are simply different levels of adaptivity that contribute to our success. Innate abilities need not have a special status; they are just prior knowledge we inherited. Even though some of this prior knowledge may be so good that it practically requires no further improvement, we should nevertheless maintain an open mindset: these pieces of prior knowledge, just like any knowledge acquired after birth, are always subject to new information and can be changed.

Overall, I hope to have delivered a satisfying justification for logic, or even further, an account of knowledge fully based on empiricism.

.  .  .

References
  1. Stuart Russell and Peter Norvig. 2009. Artificial Intelligence: A Modern Approach (3rd ed.). Prentice Hall Press, USA.