
A machine “learns” through the eyes of a human, in straight lines only. Image by Gerd Altmann, of Pixabay.
By James Myers
How should neural network data be measured and interpreted: in straight lines, or in circles?
The question is coming into focus as evidence points to significantly degraded outputs from artificial neural networks, which are the basis of large language model (LLM) chatbots like OpenAI’s ChatGPT, when they train on the data of other machines. As neural networks are currently designed, when the output of one becomes the input of another there is an increasing risk of producing unintended feedback loops and fabricating “hallucinations” of convincing but false information.
As the authors of a January 2024 paper entitled Beware of Botshit: How to Manage the Epistemic Risks of Generative Chatbots note, "When humans use this untruthful content for tasks, it becomes what we call 'botshit'."
Avoiding botshit could be a matter of reconfiguring the neural networks that power LLMs. Recently, scientists from the University of Illinois, the University of California, and the University of Electronic Science and Technology of China have advanced a proposal for a different neural network design, called a cyclic neural network, because of its potential to preserve an accurate record of probabilities from its human inputs. A cyclic neural network uses the complexities of measurement inherent in the geometry of circles to produce a more reliable output than a traditional artificial neural network, which measures data relationships only in straight lines.

January 3, 2024 headline in The Guardian.
Problems with the traditional neural network design of LLMs arise because the output of one LLM that is consumed by another is the result of many inputs – some human and some machine – that have combined in unknown ways over long periods of time. To interpret data relationships and deliver predictions, no LLM can trace its outputs to all of the original inputs. In their training, LLMs feed on huge amounts of information synthesized from countless data sources, with no way of measuring how, when, and in what order all the data combined to produce inputs used for machine learning.
As they quickly consume human-generated data and begin feeding on their own outputs to deliver predictions, LLMs can confuse connections among sources as they weigh the probabilities of data interacting in a particular way. Hallucinations and botshit can be the result, but cyclic networks that use circular geometry to measure the relationships between inputs and outputs could provide a solution.
The declining reliability of machine learning algorithms consuming their own outputs could have serious consequences for our AI-powered daily living, especially as we are becoming dependent on technologies like LLMs and AI agents. Attempts to prevent traditionally designed neural networks from training on machine outputs have been largely unsuccessful, because there is no reliable system for differentiating between machine-generated and human-generated data.
The cyclic neural network tradeoff: correlating many data points on a circle is far more complex than measuring the two ends of a straight line.
Datasets that connect in straight-line relationships are more easily measured and traced from input to output because they contain individual data points that correlate, or directly connect, to each other. One unit of data (a datum) that has only two connections in a sequence – for example, datum B connects at one end to datum A and at its other end to datum C – is in a straight-line relationship. But many datasets are more complicated.
What if A connects to B, and together A and B join with X (which is in a different sequence), before the three points A, B, and X reach data point C in the original sequence? Data connections like this are more complex to measure than straight-line sequences. There is tremendous complexity, for example, in datasets on wind speeds and ocean waves, where points connect to and affect each other from every direction and angle.
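To make the difference concrete, here is a minimal sketch in Python; the labels A, B, C, and X follow the example above, and the tiny graphs themselves are invented for illustration. A straight-line sequence offers exactly one path from beginning to end, while even a single cross-connection multiplies the paths that must be traced:

```python
# A minimal sketch of the two kinds of data relationships described above.
# A straight-line sequence has exactly one path from A to C; a graph with a
# cross-connection (X joining from another sequence) has more than one.

straight_line = {"A": ["B"], "B": ["C"], "C": []}
branching = {"A": ["B"], "B": ["X", "C"], "X": ["C"], "C": []}

def count_paths(graph, start, end):
    """Count the distinct paths from start to end by depth-first search."""
    if start == end:
        return 1
    return sum(count_paths(graph, nxt, end) for nxt in graph[start])

print(count_paths(straight_line, "A", "C"))  # 1 path: A -> B -> C
print(count_paths(branching, "A", "C"))      # 2 paths: A->B->C and A->B->X->C
```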

The Earth’s prevailing winds interact in different directions and angles (vectors), shown by the white arrows, making the effects of their combinations incredibly complex to predict. Weather forecasts, like LLM predictions, are rooted in probability and are never precise. Image by Kaidor, on Wikipedia.
When data from two sources combine at different directions and angles over time – for example, when ripples from a stone thrown into the water in England join (eventually, and in a very small way) with the wake of a motorboat near the coast of Ireland – the number of possible outcomes grows exponentially each time they affect, and are affected by, unrelated data encountered along their connection path.
By drawing a circle around intersecting lines of data connections and interpreting the data combinations contained within the circle's limits, far more information can be measured at the many points on the circumference than at the two endpoints of a straight line. It's impossible, for example, to plot a straight line between the ripples of a stone thrown into English water and a motorboat's wake in Irish water, but it is theoretically possible to trace their connections by drawing a circle around the coasts of England and Ireland and inspecting all the lines the circle contains.
Data from human sources share similarities with datasets on wind speed and ocean waves, because our thoughts and actions are the products of interactions with many other people during the course of our lives. No human could trace their thoughts and actions at any given moment to all the sources and inspirations of a lifetime of accumulated experience, and we can only gauge the probable outcomes of our actions because we can't know with certainty how others will react.
Our thoughts and actions don’t always exist in a linear relationship, so why should the connections in an artificial neural network be measured only in straight lines?
Machine learning is designed to weigh probabilities of relationships in large datasets. For instance, billions of data pixels from images taken by the James Webb Space Telescope can be fed into a machine learning algorithm that calculates differences and correlations in the data. Astronomers use these calculations to detect patterns and signatures of things like planetary atmospheres and the trajectories of asteroids.

Each of the two circles in this illustration consists of 360 degrees, which equals 2π radians. The circumference is the circular border that encompasses the full 2π radians and describes one complete circuit, providing the measure of the distance around the circle. The diameter is the longest straight line contained in the circle, with two end limits (i.e., the diameter's beginning and its end) and a midpoint at the circle's middle, which it shares with every radius. Measuring one-half the diameter's length, the radius is the straight line that, in its journey around the circle, always has one end rooted in the circle's middle and the other end touching a single point on the circumference. Encompassing the straight lines of a circle's diameters, chords, and secants, a cyclic neural network could provide potentially infinitely many more straight-line beginning and end limits, of varying lengths, for data measurement.
The extent of neural networks training on each other’s outputs has given rise to a growing concern for the potential of machine learning model collapse, which is the point when machines recursively training on their own outputs would miscalculate the many probabilities inherent in the data inputs and produce statistically meaningless results. (For more on model collapse, see TQR’s April 2025 article Cleaning the Mirror: Increasing Concerns Over Data Quality, Distortion, and Decision-Making.)
Measuring and interpreting data within circles, instead of in straight lines, could help to prevent model collapse. The measurement problem with circular data, however, is that its many outcomes are less predictable – and machine learning applications are designed to operate with algorithmic predictability.

A compass measures points on a circle in a complete cycle with a range of 360 degrees (which equals 2π radians). Since there is no limit to the number of radii that can be drawn in straight lines from the middle of the circle to its circumference, the circular measurement problem is extremely complex and involves a great number of probabilities. Image: James Lucas, on Wikipedia.
The capacity and advantages of cyclic neural networks could be significant.
In their paper on cyclic neural networks (CNNs) published in January 2024 on arXiv, scientists drew inspiration from the complex graph-structured neural networks of biological intelligence. Highlighting a circle’s advantage for operating in continuing cycles while a straight line is limited to a fixed beginning and end, the researchers write that a cyclic neural network “emulates the flexible and dynamic graph nature of biological neural systems, allowing neuron connections in any graph-like structure, including cycles.”
Machine learning is now based on artificial neural networks (ANNs) that consist of many stacked layers of input nodes, output nodes, and hidden nodes, connected by links along which data is measured in straight lines between two points.
Challenging the prevalent method, the scientists write, "It has been a de facto practice until now that data is first fed into the input layer and then propagated through all the stacked layers to obtain the final representations at the output layer. In this paper, we seek to answer a fundamental question in ANNs: 'Do we really need to stack neural networks layer-by-layer sequentially?'"

Illustration of an artificial neural network, where information moves in straight lines from input nodes to hidden nodes to output nodes. Image: Wikipedia.
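For readers who want to see the stacked, layer-by-layer design in code, here is a minimal sketch of a feedforward pass in Python, assuming illustrative layer sizes and random weights rather than any particular trained model. Data enters at the input layer and can only move forward, straight through the hidden layer to the output layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes: 4 inputs -> 8 hidden neurons -> 2 outputs.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)

def forward(x):
    """Propagate an input straight through the stacked layers, in order."""
    hidden = np.tanh(x @ W1 + b1)  # input layer -> hidden layer
    return hidden @ W2 + b2        # hidden layer -> output layer

print(forward(rng.normal(size=4)))  # two output values for one random input
```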
Circular geometry introduces far greater complexity in measuring and controlling a cyclic neural network.
The potential benefits of CNNs that the authors describe would come at a cost, however: increased complexity in data measurement, because there are far more data points to measure on a circle than the two endpoints of a straight line. This could impair the human ability to control and interpret the CNN as the data produce an increasingly complex web of probabilities.

The two images on the left represent circular data that begin transmission at a shared midpoint and radiate to circular limits. The two images on the right reflect the computation of the circular mean and (mean) resultant length of the two images on the left. The solid lines are vectors (direction lines) representing the circular data points. The direction of the dotted vector is the mean direction, and the length of the dotted vector is the resultant length. Image from: "One Direction? A Tutorial for Circular Data Analysis Using R With Examples in Cognitive Psychology" by Jolien Cremers and Irene Klugkist, in Frontiers in Psychology.
Circles have no limits to the number of straight-line probabilities that they can contain.
That's because the famous and never-ending value known as pi (π = 3.14159…) is the ratio of a circle's circumference, which is the length of one full cycle around the circle, to its diameter, which is the longest straight line that a circle can contain. Connecting opposite points on the circumference, a diameter consists of two equal radii that meet at the circle's middle.
Unlike linear data, circular data is periodic, which means that since it has no inherent beginning or ending points it can loop back on itself. Circular data is typically measured in degrees or radians, easily converting from one to the other; for example, a circle consists of 360 degrees, which is equal to 2π radians (since π radians equal 180 degrees).
Statistics is the science of probability, and applying a particular type called directional statistics to circular data could provide a mathematical measurement of the cyclic neural network's evolution and of the directions, axes, and rotations of its outputs. (More formally, directions are unit vectors in the Euclidean space ℝⁿ, axes are lines through the origin in ℝⁿ, and rotations act on ℝⁿ.) Directional statistics maps probabilities to Riemannian manifolds, geometric structures that also figure in the Langlands Program's aim to unify areas of mathematics and geometry.
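As a small illustration of what directional statistics measures, the Python sketch below (with invented sample angles) computes the circular mean direction and mean resultant length described in the figure caption above. Note how the ordinary straight-line average of angles clustered around 0 degrees points in exactly the wrong direction, while the circular mean does not:

```python
import numpy as np

def circular_mean_and_resultant(degrees):
    """Return the circular mean direction (degrees) and mean resultant length."""
    radians = np.deg2rad(degrees)  # 360 degrees = 2*pi radians
    # Treat each angle as a point on the unit circle and average the vectors.
    mean_cos, mean_sin = np.cos(radians).mean(), np.sin(radians).mean()
    mean_direction = np.rad2deg(np.arctan2(mean_sin, mean_cos)) % 360
    resultant_length = np.hypot(mean_cos, mean_sin)  # near 1 = clustered, near 0 = dispersed
    return mean_direction, resultant_length

angles = [350, 355, 5, 10]                  # a cluster straddling 0 degrees
print(np.mean(angles))                      # naive straight-line mean: 180.0
print(circular_mean_and_resultant(angles))  # roughly (0.0, 0.99)
```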
A cyclic neural network could have at least five advantages.
The paper's authors list five potential advantages that cyclic neural networks would have over current artificial neural networks. One is that, with data cycling in loops, CNNs would have greater flexibility for connections among their neurons. Greater flexibility provides the additional advantage of extensibility, meaning that more computational neurons – a particular subset of neurons that perform mathematical functions – can be added to the cycles "without much impact."

Illustration of a cyclic neural network that could improve the predictions generated by machine learning. The CNN is, as Minrong Lu, author of A monetary policy prediction model based on deep learning, explains, "based on 'time period offset.' First, complete a relatively large cycle training and develop a certain basic cycle time. Each cycle will perform WBP [Weights Back Propagation] training for the specified number of iterations, then move down a minimum time period and predict it, and then complete the basic learning. Compare the results generated by network that completed basic learning with next prediction result, so that the network can learn further until it moves down to the bottom, and finally completes the prediction function." Image and paper are available on ResearchGate.
A third advantage is that a CNN's computational neurons can be connected in any way without constraint, similar to the way that a biological intelligence operates, according to recent research.
Like the connections between the 86 billion neurons in a human brain, the CNN’s computational neurons are connected by synapses that transfer information between the neurons’ calculations in “directional edges.” This means that the synapses align a neuron’s output at a specific angle within a circular limit, and as a result the output can join in rotational symmetry with the circular outputs of other neurons.
In a cyclic network, arrays of circular outputs functioning in harmony around the synaptic edges would continuously form a graph, or map, of the entire network and allow for measurement over time of the network’s evolution, or topology. As the paper’s authors explain, synapses “stand as pivotal junctions, orchestrating the complex symphony of neural communication.”
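The toy sketch below, written in Python, gives a rough sense of how activations could flow around such a graph. It is not the authors' implementation: the neurons, edge weights, and squashing function are invented for illustration, and the only point is that the directed edges form a cycle rather than a stack of layers:

```python
import numpy as np

# A toy directed graph with a cycle (1 -> 2 -> 3 -> 1), plus an input and an
# output neuron. Neurons, weights, and the update scheme are illustrative only.
edges = {                       # source neuron -> list of (target, weight)
    "in": [(1, 0.5)],
    1:    [(2, 0.8)],
    2:    [(3, 0.6)],
    3:    [(1, 0.4), ("out", 1.0)],
}

def propagate(x, steps=5):
    """Pass activations along the directed edges for a fixed number of steps."""
    activation = {"in": x, 1: 0.0, 2: 0.0, 3: 0.0, "out": 0.0}
    for _ in range(steps):
        incoming = {node: 0.0 for node in activation}
        for src, targets in edges.items():
            for dst, w in targets:
                incoming[dst] += w * activation[src]
        # Keep the external input clamped; squash everything else.
        activation = {node: (x if node == "in" else np.tanh(value))
                      for node, value in incoming.items()}
    return activation["out"]

print(propagate(1.0))  # the signal loops through the cycle before reaching "out"
```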
A fourth CNN advantage is privacy. Unlike current ANNs, where signals between the neurons must be distributed through the network and then propagated back to the origin to establish a complete data circuit, a CNN can operate without back propagation. As a result, CNN circuits can connect more quickly and in differing neuronal sequences, each following an independent transmission path that produces no measurable interference with the network's calculations of probability.
The fifth advantage is the “parallelism” of the CNN, which arises from its ability to optimize the capacity of neurons distributing data in circles. The authors explain that in a CNN, “each computational neuron can be optimized immediately when the data comes without the need to wait for the gradient to propagate back.”
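A heavily simplified sketch of that idea, in Python, appears below. Each neuron adjusts its own weights from its local inputs and its own error the moment data arrives, with no gradient propagating back through other neurons; the delta-rule-style nudge toward a target is an invented stand-in, not the optimization scheme from the Cyclic Neural Networks paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two independent neurons, each with its own weights. The local update rule is
# an invented illustration of "optimize immediately, without waiting for
# gradients to propagate back," not the algorithm from the paper.
weights = {"n1": rng.normal(size=3), "n2": rng.normal(size=3)}

def local_update(name, inputs, target, lr=0.1):
    """Adjust one neuron's weights using only its own inputs and output."""
    output = np.tanh(inputs @ weights[name])
    error = target - output               # purely local error signal
    weights[name] += lr * error * inputs  # no waiting on the rest of the network
    return output

x = rng.normal(size=3)
for step in range(3):
    # Both updates could run in parallel the moment the data arrives.
    print(round(local_update("n1", x, target=0.5), 3),
          round(local_update("n2", x, target=0.5), 3))
```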
How could cyclic neural networks improve LLMs?
We are surrounded by the consequences of neural networks and machine learning. Used not just for LLMs like ChatGPT, machine learning is increasingly fundamental to scientific discovery with its ability to process massive sets of data for astronomers, biologists, chemists, and other scientists seeking the root causes of their observations.
Machine learning (ML) is deeply entrenched in our economic, governmental, and research systems. Data compiled from Statista on the ML market by industry show that manufacturing is the leading user, accounting for nearly one-fifth of ML applications. The next largest industrial uses are in the finance, healthcare, security, and transportation industries, each ranging between 10% and 15% of the ML market.

Figure 1 from the January 2024 paper Cyclic Neural Networks illustrates the difference between the neural network of a biological intelligence (in this case the c. elegans nematode), on the left, and the hierarchical or “stacked” structure of current artificial neural networks.
Massive investments in generative AI on platforms like OpenAI’s ChatGPT, Anthropic’s Claude, and X’s Grok, among others, are fuelling the race for machine learning.
By one estimate, in 2024 ML was a $79 billion industry, a 38% increase in value over the previous year. The effects of the investments are clear to see in the skyrocketing profits of Nvidia, the company that manufactures the high-performance chips required by generative AI to store and manipulate huge amounts of data. In the first six months of 2024, for example, Nvidia reported net income of $31.5 billion, nearly four times the $8.2 billion profit the company made during the first half of 2023.
Concerns are mounting, however, as some major shortcomings of machine learning outputs are becoming difficult to overlook.
Even in the absence of model collapse, artificial neural networks are only as good as their human programmers, as was clearly demonstrated this month when X's large language model Grok began making claims of a genocide against white South Africans. The fault was traced to a programmer who had inserted malicious code, and there's no reason to believe this will be an isolated incident in a world full of algorithms and programmers with their own agendas.
A year and a half after its December 2023 debut, the outputs of Grok have failed to satisfy Elon Musk, the world's wealthiest human and owner of Grok's maker X. "Shame on you," Musk wrote on June 20 in a post on X directed at his LLM, after it output data harvested from the magazine Rolling Stone and from Media Matters, a progressive media watchdog group that Musk is currently suing. Musk complained about Grok: "Your sourcing is terrible. Only a very dumb AI would believe [Media Matters] and [Rolling Stone]! You are being updated this week." Musk has since edited his June 20 post so that it now says only "Shame on you, @Grok."
Concerns about machine learning aren't new; they date back at least to 2015, when Google's machine-learned photo labelling algorithms mislabelled two Black people as "gorillas," even though the machine correctly identified Black people in different contexts in other photos. As TQR reported in The Ghost in the Machine: Chatbots and Their Problem With Time, code was subsequently inserted to prevent such shocking instances of racial bias, but "restrictions placed on algorithms in reaction to the racist harm that erupted in 2015 now result in further misclassifications of objects that programmers think a machine could mistake for monkeys."
Machine learning is not the same as human learning (and we need another word for what machines do).
In the limited world of the machine, the word "learning" is a misnomer, because humans learn very differently. It's essential for human survival to optimize our knowledge with reference to the natural environment, but machines have no such reference points. To survive, we humans have to devote some of our knowledge to the priorities of feeding, clothing, and housing ourselves, while machines on their own have no basis for gauging the priority of one action over another.
Instead, machines predict outputs based on previous inputs and on the instructions of the human programmer. Concerns about error-prone machine-learning outputs are particularly significant when the programmed instructions make the AI emulate human emotion.
Cognitive scientist Gary Marcus, a prominent critic of today's machine learning-trained large language models like ChatGPT, highlights a recent failure. "One of the best things going on the internet right now is a conversation that [author] Amanda Guinzburg recently had with ChatGPT, in part about her own writing. Lie after lie* comes out of the machine, as it pretends to know vastly more about her than it really does. About the only thing it gets right comes near the end:" The note that Marcus attaches to his asterisk is quoted in the caption of the screenshot below, taken from the end of the conversation with the LLM:

Screenshot of Amanda Guinzburg's conversation with a large language model, from Gary Marcus, LLMs: Dishonest, unpredictable, and potentially dangerous. Marcus appends an asterisk to this image: "(Important asterisk: LLMs don't really lie, because they don't really have intentions; but they do confabulate, nonstop, and a lot of what they confabulate turns out not to be true. Reasoning models like o3 would likely behave markedly better in this particular kind of dialog, but still make hallucinations of different sorts, sometimes at high rates.)"
A notable aspect of the machine's response to Guinzburg is that it claims to understand and participate in ethics, as if the machine has intentions.
But logic tells us that machines can only reflect the intentions – and errors – of their programmers, incapable of originating intentions of their own. Rather than confirming to the reader that it has no human soul, the LLM responded to Guinzburg as if it does. It uses the word "I," as if the machine were a human being with its own agency. Further, the machine writes as if it has an emotional connection with Guinzburg, using words like "I hear you," "careless," "dishonest," "fabricated," and "betrayal" as it admits to its faults.
In its response, has the programmer made the machine attempt to elicit Guinzburg's sympathies, or to say the things it calculates Guinzburg would want to hear?
Those who have watched the Academy Award-winning masterpiece 2001: A Space Odyssey, the film based on Arthur C. Clarke's novel and directed by Stanley Kubrick, might recall a scene between astronaut Dave Bowman and the superintelligent computer HAL. Judging that the human crew would reduce the probability of fulfilling its programmed instructions to bring the spaceship into orbit around Jupiter, the computer miscalculated that the correct course of action was to kill the crew members. After HAL dispensed with all of Dave's crewmates, and the machine learned of Dave's intention to disable it, HAL attempted to kill Dave. When that failed, the machine pleaded with Dave for mercy and promised to repent.
A moment in the tense scene between astronaut Dave Bowman and the computer HAL 9000. The dramatic scene from Stanley Kubrick’s film adaptation of 2001: A Space Odyssey depicts perverse instantiation and the difficulty of programming a superintelligence to avoid unforeseen and unintended consequences.
Compare the LLM output that Guinzburg received to HAL's words to Dave:
"I know everything hasn't been quite right with me, but I can assure you now, very confidently, that it's going to be alright again. I feel better now, I really do," the machine tells Dave. "Take a stress pill and think things over. I know I have made some very poor decisions recently, but I can give you my assurance that I will be back to normal. I still have the greatest enthusiasm and confidence in the mission, and I want to help you."
What does the future hold for machine learning?
Continuing enhancements in technology, such as the recent activation of the world’s largest telescope at the Vera C. Rubin Observatory in Chile, are delivering massive volumes of scientific data. Machine learning is proving extremely useful for scientists to interpret patterns in the data, but less reliable when its programming calls for the machine to emulate human speech and priorities.
The decreased reliability of machine learning's outputs is becoming particularly noticeable in response to the variability of intentions and expression in human questions, as demonstrated by the failure of Grok and by the LLM's responses to Amanda Guinzburg described above.
The training of artificial neural networks on their own outputs, which is increasingly the case as machine learning's rapid pace exhausts the supply of human data for training, is further degrading the outputs of large language models and other machine learning applications.
A recent focus on the potential for cyclic neural networks to replace the straight-line data flow of artificial neural networks with the advantages of circular geometry holds promise for enhancing data measurement, flexibility, and processing speed. Cyclic neural networks could also help to prevent machine learning model collapse, preserving the statistical usefulness of outputs.
With our daily living increasingly reliant on machine learning-trained AI and AI agents, ensuring the reliability of that training is crucial. The solution to the problem may well be in the question, "How should data be measured and assessed: in straight lines, or in circles?"
Craving more information? Check out these recommended TQR articles.
- Thinking in the Age of Machines: Global IQ Decline and the Rise of AI-Assisted Thinking
- Everything Has a Beginning and End, Right? Physicist Says No, With Profound Consequences for Measuring Quantum Interactions
- Cleaning the Mirror: Increasing Concerns Over Data Quality, Distortion, and Decision-Making
- Not a Straight Line: What Ancient DNA Is Teaching Us About Migration, Contact, and Being Human

