If you don’t know what ChatGPT is, great! Suffice it to say it’s a kind of chat bot driven by artificial intelligence. You give it some text–for instance, a chat message–and it gives you back some text that’s meant to be a reasonable response to or elaboration of what you said to it.

There is a lot to say on this topic of whether computer software can think or write. To begin with, I voted “no” on the survey and was happy to see that so far 81% of respondents did the same. Here is the rationale I posted:

- ChatGPT does not have a conception of what is going on in the world. It is a word-emitter that tricks human minds into thinking it does. In other words, it’s a kind of complex automaton, a marionette. The fact that the action of it is complex enough to fool us into thinking it “knows” something does not mean it does.
- ChatGPT is as likely to emit false information as true information (perhaps more so; has this been assessed?)
- ChatGPT does not have deductive or inductive logical reasoning capabilities; nor does it have any “drive” to follow these principles
- Human papers are for human writers to communicate to human readers. It seems to me that the only argument in favor of including ChatGPT in this process is a misguided drive to speed up the process even more than publish-or-perish has. In fact it should be *slowed down* and made *more careful*
- The present interest in ChatGPT is almost entirely driven by investor-fueled hype. It’s where investors are running after the collapse of cryptocurrency/web3. There is a nice interview with Timnit Gebru on the Tech Won’t Save Us podcast, titled “Don’t Fall for the AI Hype” that goes into this if you’re curious. As computer scientists, we should not be chasing trends like this.

I followed the list above with a second comment when one of the respondents commented that this is [[just how things are now::https://www.nature.com/articles/d41586-023-00107-z]]:

The state of the world is what we make it. As computer scientists, we have a responsibility to hold the line against this maniacal hype. It’s insane, in an almost literal sense, for us to follow the herd on this particular issue. We know better. We know all about just-so stories. We know about mechanical Turks. We’ve been through half a dozen or more AI hype cycles where the latest thing, whether it be expert systems or case-based reasoning or cybernetics or the subsumption architecture or neural networks or Bayesian inference or or or….was going to replace the human mind. None of them have done that, to a one, and ChatGPT won’t either.

Briefly my take is this: as computer scientists in or near the field of “artificial intelligence”, we have been through this before, where some new discovery was poised to change how we think about everything. We know better, or at least we should know better. Are we really going to voluntarily do this again:

[[Who is Lucy and who is Charlie Brown in this analogy? And what is the football?::rmn]]

when we can–and I think should–opt out?

Putting aside the question of resisting hype, and putting aside that the hype is almost surely being consciously generated by rich and powerful actors who have their own reasons for forcing this conversation to happen, it's worth asking on a purely technical level what something like ChatGPT actually *is* capable of doing.

[[And if these systems can think and feel why are we abusing them to write marketing text or copyedit without pay?::lmn]] Without diving into the technical weeds, which I think obscures the forest, I’d offer what I think is a better, and older, and perhaps clearer question: is it possible for a text-to-text map derived solely from existing textual data to *think*, *feel*, *create*, or *write*?

[[These sorts of questions are manifestations of the sorites paradox and probably have no fully satisfying resolution::rmn]] If so, how do we explain the transition from a noisy lookup table–because that’s surely what these models are early in their training–to something that can *think*? Do we really believe the ability to *think* somehow emerges from massive data sets? When, exactly, does that transition arise? How much data is needed before the ability to *think* “pops”? I think we can say that at the very least these questions have barely been asked, let alone answered, and it’s premature to assert the answer before exploring the question. [[These arguments are old and not really resolved::https://plato.stanford.edu/entries/chinese-room/]].

If not–if we don’t believe ChatGPT can do anything like *think*–then we have to confess ChatGPT is only a complicated machine. Which is fine for what it is. There are lots of complicated machines that do wonderful things for us and ChatGPT surely has its uses too. But co-authoring computer science papers is not one of those uses because it is simply incapable of playing that role, and we should stop pretending it can by entertaining these questions.

I will be teaching the same seminar again this coming summer. I’ll probably change it around a bit just for fun, but the format seemed to work pretty well and I don’t want to break anything. Here is the blurb if you’re interested:

"Chaos", as a mathematical term, refers to a phenomenon in highly-interconnected systems whereby they become sensitive to small perturbations in their conditions. Sometimes called "the butterfly effect", this sensitivity can make such systems difficult to understand and predict; even the act of observing them can change their future course significantly. We live our lives literally surrounded by chaotic systems, most notably the weather and climate systems. We are also increasingly turning our human-made systems, like the internet, the electric grid, and the financial system, into complex and potentially chaotic systems. In this seminar, we will explore chaos theory as a field of study, and we will explore chaotic systems in practice using a variety of computational tools, with a view towards developing a deeper appreciation of our complex world.

Several of the students pointed out they liked the simulators I collected for the class, so I thought I would share some here. They should all run in your web browser.

- Recursive drawing tool
- Evolving 2d cars
- Traffic simulator
- Birds flocking (boids) NetLogo simulation
- Ants gathering food NetLogo simulation
- Termites gathering wood chips NetLogo simulation
- Hotelling's law NetLogo simulation

A few additional sites that we did not use in the class but are similar:

- Three layer feedforward network simulator
- An IFS renderer
- Percolation example on NetLogo Web
- A Boids simulator
- Another Boids simulator
- An L-system simulator
- A reaction-diffusion model simulator

My personal favorite is the recursive drawing tool. I managed to get something reminiscent of a Hilbert curve:

…and almost managed a Sierpinski triangle:

Toby Schachman, who built this as his thesis project, has a lot of fascinating work on his web site. Recursive drawing seems to have been inspired by Bret Victor’s work e.g. on Drawing Dynamic Visualizations.

Visit `https://yourdomain/yourpresentation/?print-pdf` in a web browser. In other words, go to the URL you would use to view your Reveal.js slides, add a trailing `/` if there isn’t one, and put `?print-pdf` at the end.

From there, print the file to PDF in whichever way you normally do that. I use `Ctrl-P`, then select “Print to PDF” from the pulldown, then hit OK.

In the end you should have a PDF file that looks more or less like your slides. There will probably be formatting issues. Don’t worry about those yet.

Next, open the PDF file in LibreOffice. LibreOffice should convert it to a LibreOffice Draw file. File→Save As this file with the usual `.odg` extension.

Literally change the name of the file you saved in the previous step from `yourfile.odg` to `yourfile.odp`.

Open the resulting `.odp` file in LibreOffice again. This time it should open as an Impress presentation instead of a drawing. Now you can File→Save As a PowerPoint file, and you’re done.

Remember those formatting issues you may have noticed before? You might have to fix them now. In my case, they magically went away. With any luck you will also enjoy this magic.

"Chaos", as a mathematical term, refers to a phenomenon in highly-interconnected systems whereby they become sensitive to small perturbations in their conditions. Sometimes called "the butterfly effect", this sensitivity can make such systems difficult to understand and predict; even the act of observing them can change their future course significantly. We live our lives literally surrounded by chaotic systems, most notably the weather and climate systems. We are also increasingly turning our human-made systems, like the internet, the electric grid, and the financial system, into complex and potentially chaotic systems. In this seminar, we will explore chaos theory as a field of study, and we will explore chaotic systems in practice using a variety of computational tools, with a view towards developing a deeper appreciation of our complex world.

Before I started college ??? years ago I attended a similar type of seminar and found it so engaging that I still think about it. I'm going to try to pay that forward a little.

Best worst-case does not have this quality if the set of metrics over which the worst-case is being taken can increase. The simple reason is that if I have two possible solutions $\alpha$ and $\beta$, which one appears better can flip-flop as new metrics arise and lower the perceived best worst case of both, as in:

$$
\begin{array}{c|ccc}
 & t_1 & t_2 & t_3 \\
\hline
\alpha & 5 & 3 & 3 \\
\beta & 4 & 4 & 2
\end{array}
$$

In the context of $\{t_1\}$, $\alpha$ has a best worst case of 5 while $\beta$ has 4, and $\alpha$ looks better. In the context of $\{t_1,t_2\}$, though, $\alpha$'s value drops to 3 while $\beta$'s remains at 4, and $\beta$ looks better. However, in the context of $\{t_1,t_2,t_3\}$, $\alpha$ looks better again because its value stays at 3 but $\beta$'s drops to 2. Thus, in a situation where we were only increasing information (adding $t_i$), we changed our minds about $\alpha$ twice–the hallmark of a non-monotonic solution concept. Flip-flops like this can make the dynamics of [[coevolutionary algorithms]] difficult to understand and manage.
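The flip-flop in the table can be checked mechanically. Here is a minimal Python sketch (the helper names are my own, not taken from any of the papers discussed):

```python
# Outcomes of alpha and beta against tests t1..t3, copied from the table.
alpha = {"t1": 5, "t2": 3, "t3": 3}
beta  = {"t1": 4, "t2": 4, "t3": 2}

def best_worst_case(values, context):
    """Worst (minimum) outcome of a candidate over the tests in `context`."""
    return min(values[t] for t in context)

def preferred(context):
    """Which candidate has the better best worst-case in this context?"""
    a = best_worst_case(alpha, context)
    b = best_worst_case(beta, context)
    return "alpha" if a > b else "beta" if b > a else "tie"

print(preferred(["t1"]))                # alpha
print(preferred(["t1", "t2"]))          # beta
print(preferred(["t1", "t2", "t3"]))    # alpha again: a flip-flop
```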

Work on [[coevolutionary free lunches]] directly considered best worst-case and developed techniques for working with these non-monotonic [[solution concepts]]. One dangling thread from that work is that the techniques only make sense if there are a finite number of possible fitness values. Looking at the example above, it's clear that if there were a lower bound on the fitness values $\alpha$ and $\beta$ could take, then the flip flopping would eventually stop. There'd be theorems bounding how many times flip flopping could occur in an information-increasing path, and so there might be some extended notion of monotonicity that might apply. Travis Service started into these ideas with his notion of *biased solution concepts*.

I did some work arguing that monotonic solution concepts can/should be thought of as convex for a certain interpretation of that word, and that they relate to something called the lower order on powersets of partial orders. Here's my thinking, beginning with the lower order. Let $\partial$ be the solution concept, and write $s\in\partial C_U$ to mean that $s$ is a solution in the set of configurations buildable from $U$, $C_U$, where $U\subset S$ and $C$ is some configuration-building functor. Sevan introduced a notion he called *weak preference*, which you can define like so: solution $\beta$ is weakly preferred to solution $\alpha$ if, for every context in which $\alpha$ is a solution, there is always a larger context in which $\beta$ is a solution. Using the notation just introduced, $\alpha\in\partial C_U$ implies there is a $V$ with $U\subset V$ such that $\beta\in\partial C_V$. For any solution $\alpha$, define

$$\mathscr{U}_{\alpha} = \{\,C_U \mid \alpha\in\partial C_U\,\}$$

In other words, $\mathscr{U}_{\alpha}$ is the set of all configuration sets that have $\alpha$ as a solution.

We have that $\beta$ is weakly preferred to $\alpha$ if $\mathscr{U}_{\alpha}\leq^{\flat}\mathscr{U}_{\beta}$, where $\leq^{\flat}$ is the lower order on sets of sets coming from $\subset$. [[The lower order means that for any set in the lower guy, there exists a superset in the larger guy::rsn]]
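A quick Python sketch of the lower order (illustrative code of my own, representing configuration sets as frozensets):

```python
def lower_leq(A, B):
    """The lower order on sets of sets: A is below B iff every set in A
    has a superset in B (frozenset's <= operator is subset-or-equal)."""
    return all(any(a <= b for b in B) for a in A)

# U_alpha and U_beta here are made-up stand-ins for the contexts in which
# alpha and beta are solutions.
U_alpha = {frozenset({1}), frozenset({1, 2})}
U_beta  = {frozenset({1, 2}), frozenset({1, 2, 3})}

print(lower_leq(U_alpha, U_beta))  # True: beta is weakly preferred to alpha
print(lower_leq(U_beta, U_alpha))  # False
```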

Now about convexity and monotonicity. One way to state that $\partial$ is non-monotonic is that there is a sequence $C_U\subset C_V\subset C_W$ and a solution $\alpha$ such that $\alpha\in\partial C_U$, $\alpha\not\in\partial C_V$, and $\alpha\in\partial C_W$. Thinking in terms of the map $\alpha\mapsto\mathscr{U}_{\alpha}$, $C_U$ and $C_W$ are in $\mathscr{U}_{\alpha}$ while $C_V$ is not. A convex subset of an ordered set is one in which this does not happen: if $a$ and $c$ are in the set, then so are all $b$ that lie between $a$ and $c$ in the order. So, a non-monotonic solution concept has at least one $\alpha$ for which $\mathscr{U}_{\alpha}$ is not convex, and conversely, if all $\mathscr{U}_\alpha$ are convex then the solution concept is monotonic.

I like this formulation in part because it allows us to talk about local monotonicity. Solution concepts can be monotonic in some places and non-monotonic in others. It seems likely that some amount of non-monotonicity does not affect algorithm performance appreciably.
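To make the convexity reading concrete, here is a small Python sketch (my own illustrative code) that checks whether a solution's membership along an increasing chain of contexts follows the convex pattern, i.e., never drops out and then returns:

```python
def convex_along_chain(flags):
    """flags[i] says whether the solution is in the i-th context of an
    increasing chain. Convex means: no True, then False, then True again."""
    seen_true = dropped_out = False
    for f in flags:
        if f and dropped_out:
            return False          # came back after dropping out: not convex
        if f:
            seen_true = True
        elif seen_true:
            dropped_out = True
    return True

# alpha from the earlier table: a solution in contexts 1 and 3 but not 2.
print(convex_along_chain([True, False, True]))   # False: non-monotonic here
print(convex_along_chain([True, True, False]))   # True: monotonic behavior
```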

I published this work in a 2007 paper, *Thoughts On Solution Concepts*. There have been a few followups of note. I especially like the papers by Travis Service and by Achim Jung and Jon Rowe. Service introduces a notion he calls *biased solution concepts* and connects those to [[coevolutionary free lunches]]. Jung and Rowe develop a suggestion I'd made that [[interactive domains]] and [[solution concepts]] could be fruitfully conceived of in domain-theoretic terms.

First, a punchline: [[coevolutionary algorithms]] should consist of the specification of an [[interactive domain::interactive-domains]], a solution concept, and some form of evolutionary dynamics over the interactive domain that aims to find solutions as specified by the solution concept.

When optimizing functions, there are a number of possibilities for specifying exactly what you're trying to accomplish. If you have some function $f\colon S\rightarrow\mathbb{R}$, "maximizing" it could mean:

- Finding the value $r\in\mathbb{R}$ that $f$ actually takes on some (unspecified) input and that is greater than or equal to every other value the function takes
- Finding an element $s\in S$ such that $f(s) = r$
- Finding all elements $s\in S$ such that $f(s) = r$

The last, or possibly the second to last, is sometimes called arg max. All this puts aside things like suprema, or the possibility that $f$ never takes a maximum value.
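In Python terms, with $f$ stored as a dictionary (a toy example of my own), the three readings look like this:

```python
f = {"a": 1, "b": 3, "c": 3}

r = max(f.values())                          # the maximum value of f
some_s = max(f, key=f.get)                   # one arg max
all_s = {s for s, v in f.items() if v == r}  # all arg maxes

print(r, some_s, all_s)
```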

When it comes to a two-input function like $p\colon S\times T\rightarrow R$, a simplest-possible [[interactive domain::interactive-domains]], a new layer of complexity is added:

- $R$ may have incomparable elements
- It is not specified whether we are seeking an $s\in S$, a $t\in T$, a pair $(s,t)\in S\times T$, or something else such as an ensemble

What we do with both these points needs to be clarified before we can begin to talk about optimizing. The [[interactive domain::interactive-domains]] definition specifies who gets the value from a function like $p$ (second point), while the solution concept specifies what solutions are selected given all this information.
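On the first point, here is a tiny Python illustration (my own) of an outcome set with incomparable elements, using the product order on pairs:

```python
def leq(r1, r2):
    """Product order on pairs: componentwise comparison."""
    return r1[0] <= r2[0] and r1[1] <= r2[1]

a, b = (3, 1), (1, 3)
print(leq(a, b), leq(b, a))  # False False: a and b are incomparable
```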

As with max or arg max, solution concepts are in a sense polymorphic. Rather than try to write down what they are in full generality using symbols, I'll just say that a solution concept should be such that it specifies a well-defined collection of solutions for any view/state of an [[interactive domain::interactive-domains]].

In the case of a single function $p\colon S\times T\rightarrow R$ interpreted as giving values to elements of $S$ such that higher up $R$'s order is better, then a solution concept might take a form like $\partial\colon\mathscr{P}(S)\times \mathscr{P}(T)\rightarrow\mathscr{P}(S)$. In algorithms that seek Nash equilibria, the output might be $\mathscr{P}(\Lambda^S)$ (sets of mixtures of elements of $S$) instead. In algorithms that seek Pareto non-dominated fronts, the output might be $\mathscr{P}(\mathscr{P}(S))$ (there are technical reasons why it does not suffice to output a single Pareto front). So a slight generalization is to say the solution concept maps the "raw material" from the [[interactive domain::interactive-domains]] or search algorithm to a collection of configurations; the solution concept, or some other device, will have to specify what these configurations are then.
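As one concrete instance of the simplest signature $\partial\colon\mathscr{P}(S)\times \mathscr{P}(T)\rightarrow\mathscr{P}(S)$, here is a best worst-case solution concept sketched in Python (the names and representation are mine):

```python
def best_worst_case_concept(S, T, p):
    """Return the subset of S whose worst outcome over T is maximal."""
    worst = {s: min(p(s, t) for t in T) for s in S}
    target = max(worst.values())
    return {s for s in S if worst[s] == target}

# Toy payoff: candidate s scores s - t against test t.
S, T = {2, 5, 7}, {1, 4}
print(best_worst_case_concept(S, T, lambda s, t: s - t))  # {7}
```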

Sevan Ficici's PhD dissertation develops a notion of monotonic solution concept and shows these have nice ratcheting properties. I've done some work on [[coevolutionary free lunches]] that develops techniques for (some) non-monotonic solution concepts.

Elena Popovici and I extended and grounded the theoretical paper of Wolpert and Macready and actually exhibited pseudocode of a free lunch algorithm. Later, Elena and Ezra Winston extended that result to include algorithm mechanisms, something W&M did not consider, landing a paper about it in the *Theoretical Computer Science* journal.

A key construct in these proofs is an algorithm history or trace. We considered best worst-case optimization of a simple test-based problem, with [[interactive domains]] taking the form $p\colon S\times T\rightarrow R$, where $S$ and $R$ are finite, $R$ is totally ordered with higher up the order being better, and the value of $p$ goes to elements of $S$. The solution concept was expected to output a single element of $S$ (or, to be more precise, a set of singletons of $S$). A history or trace, then, is a finite sequence $((s_1,t_1),(s_2,t_2),\dots,(s_n,t_n))$ of pairs in which no pair ever appears twice, though individual $s_i$ may repeat (similarly for $t_i$), together with the output of $p$ on each pair. We followed W&M in being algorithm agnostic and focused only on the output mechanism, the salient question being: given such a history, what element of $S$ should the algorithm output as its “answer”? Their answer, which is developed in their paper, was a kind of Bayes optimality property, where the expectations are taken with respect to all possible extensions of the problem consistent with the history observed.

What Elena and I first observed is that in a fairly wide set of circumstances (sticking with best worst case, though), this complicated-sounding condition boiled down to a greedy criterion. In other words, the Bayes optimal way to choose which solution to output, given a history, is along the lines of: output an $s_i$ that has the best worst-case value according to the history among those that have been tested most, unless that best worst-case is the worst value possible, in which case output an arbitrary solution not in the history (or all of them if you're looking for all). What Elena and Ezra showed is that, again in a wide (though more restricted) set of circumstances, the optimal way to choose which solution to test next–the algorithm mechanism–is to choose the solution you would have output from the history up to that point, and test it with an arbitrary test that does not create a repeat in the history. These mechanisms can be implemented in a straightforward way.
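Here is a sketch of that greedy output rule in Python; this is my own reading of the criterion, not code from the papers:

```python
from collections import defaultdict

def greedy_output(history, S, worst_value):
    """history: list of ((s, t), outcome) pairs with no repeats.
    Output a most-tested s with the best worst-case so far, unless that
    best worst-case is the worst possible value, in which case output
    an arbitrary candidate not yet in the history."""
    outcomes = defaultdict(list)
    for (s, t), r in history:
        outcomes[s].append(r)
    if not outcomes:
        return next(iter(S))
    most = max(len(rs) for rs in outcomes.values())
    cands = {s: min(rs) for s, rs in outcomes.items() if len(rs) == most}
    best_s = max(cands, key=cands.get)
    if cands[best_s] == worst_value:
        untested = S - set(outcomes)
        if untested:
            return next(iter(untested))
    return best_s

history = [(("a", "t1"), 2), (("a", "t2"), 1), (("b", "t1"), 3)]
print(greedy_output(history, {"a", "b", "c"}, worst_value=0))  # "a"
```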

I feel compelled to point out, though, that when you run them, or sit and think about them for a minute, these mechanisms are disappointing. In practice, such an algorithm will repeatedly test the same candidate solution over and over again until it is found to have the worst case value; then it will select an arbitrary new candidate, and repeat. Doing this has nice properties when considered in aggregate over all problems in the problem class, but it's unsatisfactory in practice because it does not explore.

Best worst-case is well known to be a non-monotonic solution concept, and thus stymies algorithms or theories that rely on a monotonicity condition (which I’ve argued is a form of convexity). This line of work is important, then, not just because it exhibits provably optimal algorithms, but also because it fills a theoretical gap.

Through my PhD dissertation work I came to understand that the mechanism by which that impact is realized is that interactions function as measurements, and that how any two agents compare can change fundamentally (e.g., change order) depending on the interactions in which they participate. I developed a way of capturing that information in what I termed *coordinate systems*, a notion that led to DECA as a published algorithm.

An interactive domain consists of entity sets $X_1,\ldots,X_n$ together with a function $p\colon X_1\times\cdots\times X_n\rightarrow R$ into an ordered set $R$, where:

- each $i$ with $1\leq i\leq n$ is a *domain role*
- an element $x\in X_i$ is an *entity* (playing the domain role $i$)
- a tuple $(x_1,\ldots,x_n)\in X_1\times\cdots\times X_n$ is an *interaction*
- the value $p(x_1,\ldots,x_n)\in R$ is an *outcome* (of the interaction)
- the ordered set $R$ is the *outcome set*

These are really only interesting from the perspective of coevolutionary algorithms when $n\geq 2$. When $n=1$, you have one or more single-variable functions, meaning something that looks like an optimization problem or a multi-objective optimization problem as opposed to a co-optimization problem.

It is important to recognize that an interactive domain does not specify a solution concept–in other words, what one might want to find–only the structure of its information. As an analogy, a function $f\colon S\rightarrow\mathbb{R}$ does not specify enough information to be optimized; you'd also have to know whether you're trying to minimize or maximize, whether you're seeking one or all solutions, whether you're looking for an argument or a value, etc. The function is like an interactive domain; the arg max (for instance) is the solution concept.
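A minimal Python rendering of the definition (the representation choices here are mine, not from the literature):

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence

@dataclass
class InteractiveDomain:
    roles: Sequence[set]          # the entity sets X_1, ..., X_n
    p: Callable[..., Any]         # p: X_1 x ... x X_n -> R, R an ordered set

    def outcome(self, *interaction):
        """Evaluate p on an interaction (one entity per domain role)."""
        assert len(interaction) == len(self.roles)
        return self.p(*interaction)

# A two-role domain: candidates vs. tests, outcome = did the candidate pass?
domain = InteractiveDomain(
    roles=[{1, 2, 3}, {2, 3}],
    p=lambda s, t: s >= t,
)
print(domain.outcome(3, 2))  # True
```

Note that, per the point above, nothing in this structure says what a *solution* is; a solution concept has to be layered on top.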

- The book chapter titled Coevolutionary Principles in the Handbook of Natural Computing goes into depth about interactive domains and their role in coevolutionary algorithms