AI Compute At The Speed Of Light
Optics Scale Better
If you make a 700-watt GPU 100x faster, but do nothing about its energy efficiency, you'd be creating a chip that melts immediately after turning on.
To make AI hardware 100x faster, you need a radical rethink of the fundamentals:
TPUs, based on a concept from the late 1970s, were a great leap forward. They are systolic arrays that reduce memory bandwidth issues by processing batches of data in parallel. This approach cuts down on memory accesses and improves efficiency. However, traditional digital systolic arrays hit a wall: scaling them up brings diminishing returns in energy efficiency. At a certain point, their energy consumption becomes the limiting factor.
Researchers and startups are overcoming this limit by building analog systolic arrays. Analog circuits scale their energy consumption with the perimeter of a chip rather than the area, making them more efficient as they grow. However, the resistance and capacitance of the analog circuits delay the signal. As you scale up, signal propagation takes longer and longer, slowing down the clock speeds more and more.
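To make the perimeter-versus-area argument concrete, here is a minimal toy model. It assumes a square n x n array in which a digital design pays for every multiply-accumulate (area), while an analog design pays only for the input and output conversions at the edges (perimeter). The constants and function names are hypothetical, chosen for illustration only, and are not Neurophos figures.

```python
# Toy scaling model; all constants are hypothetical, in arbitrary energy units.
E_DIGITAL_MAC = 1.0  # assumed cost of one digital multiply-accumulate
E_EDGE_IO = 10.0     # assumed cost of one analog input or output conversion


def digital_energy_per_mac(n: int) -> float:
    """Digital n x n array: total energy grows with area (n * n MACs)."""
    total = n * n * E_DIGITAL_MAC
    return total / (n * n)


def analog_energy_per_mac(n: int) -> float:
    """Analog n x n array: total energy grows with perimeter (n inputs + n outputs)."""
    total = 2 * n * E_EDGE_IO
    return total / (n * n)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, digital_energy_per_mac(n), analog_energy_per_mac(n))
```

Under these assumptions the digital cost per operation stays flat as the array grows, while the analog cost per operation falls in proportion to 1/n. The model deliberately leaves out the signal delay described above, which is the drawback that grows with array size.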
Neurophos has a different solution: optical systolic arrays. Instead of relying on electrons moving through silicon, Neurophos uses light, which moves at, well, the speed of light. This removes the signal propagation delays that slow analog circuits and allows these arrays to be clocked at 100 GHz or more, regardless of how big they get.
Our breakthrough lies in our metasurface technology: a new kind of optical modulator that is 8,000 times smaller than the standard ones used in silicon photonics. The result is an optical systolic array that scales speed and efficiency without the usual trade-offs. Our chips will be ultra-fast, and come with the energy efficiency that keeps them from melting.
Imagine a future where we could deliver the power of 100 GPUs in the footprint and power consumption of a single GPU. That’s what Neurophos is building.
The goal is to develop AI accelerators that operate at the speed of light and support multiple numerical formats while remaining model-agnostic, adapting seamlessly to new advancements without retraining. This isn’t just about boosting performance now; it’s about creating a versatile, future-proof chip that supports an ever-evolving landscape of AI models.
By breaking free from the constraints of traditional electronics and harnessing the power of light, Neurophos is on track to make AI a hundred times faster, more affordable, and more energy-efficient. The future of AI hardware isn’t just bright, it’s blinding.
The Tensor Core Wishlist
- Dense. Small
- Volatile. Transformer compatible
- Analog. High energy efficiency
- Photonic. Zero signal propagation latency
- Black box operation. No need for hardware specific training
- Supports both fixed and floating point. Flexible
- Easily manufacturable and co-packaged. Affordable
Neurophos' Metasurface
The beating heart
Metasurface
Neurophos completely reinvented optical modulators to make them 8,000 times smaller and CMOS-manufacturable at scale.
Our metasurface consists of over a million optical modulator "pixels" whose states we can arbitrarily reset within one µs.
Using The Metasurface To Revive An Old Concept
The Goodman Engine
In short
The Goodman Engine is a purely optical way of doing vector-matrix multiplications.
The cylindrical lenses manipulate light to perform matrix operations optically, projecting and multiplying the input vector with the matrix, and then summing the results via optical focus.
In Detail: From Left to Right
1D Vector of Light and Cylindrical Lens: Start with a 1D vector of light that passes through a cylindrical lens. This lens spreads the light across another dimension of space, effectively creating a 2D representation of the vector.
Projection onto Holographic Film: The spread-out light is then projected onto a two-dimensional holographic film. This film represents your matrix, with each point on it corresponding to a different component of the matrix.
Point-Wise Multiplication: When the light hits the holographic film, a point-wise multiplication is performed between the copied input vector and the matrix represented in the holographic film.
Second Cylindrical Lens for Focus and Sum: After the point-wise multiplication, the light passes through a second cylindrical lens. This lens is oriented along the other dimension from the first and focuses the light instead of spreading it. Focusing the light onto a line adds up the contributions along that dimension, which corresponds to a summation operation.
Result of Vector-Matrix Multiplication: The end result is the vector-matrix multiplication of the input vector with the matrix represented in the 2D holographic film.
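The steps above map onto ordinary arithmetic: spread (copy) the vector, multiply point-wise by the matrix, then sum along one axis. The sketch below mirrors that sequence in plain Python. The function name goodman_vmm and the row-per-output layout are assumptions made for illustration, and the sketch models only the math, not the optics (no noise, precision limits, or light-intensity constraints).

```python
def goodman_vmm(x, matrix):
    """Illustrative vector-matrix product following the Goodman Engine steps.

    x:      input vector of length n (the 1D vector of light)
    matrix: list of m rows, each of length n (the 2D film / modulator array)
    Returns a list of m outputs, one per matrix row.
    """
    if any(len(row) != len(x) for row in matrix):
        raise ValueError("every matrix row must have the same length as x")

    # First cylindrical lens: spread the 1D vector across a second
    # dimension, so every matrix row sees its own copy of the input.
    spread = [list(x) for _ in matrix]

    # Light passing through the film: point-wise multiplication of each
    # copied input element with the matching matrix element.
    modulated = [
        [s * w for s, w in zip(spread_row, matrix_row)]
        for spread_row, matrix_row in zip(spread, matrix)
    ]

    # Second cylindrical lens: focusing along each row sums its products.
    return [sum(row) for row in modulated]


if __name__ == "__main__":
    x = [1, 2, 3]
    matrix = [[1, 0, 0], [0, 1, 1], [2, 2, 2]]
    print(goodman_vmm(x, matrix))  # [1, 5, 12]
```

In the optical version all of the copies, products, and sums happen at once as light travels through the lenses, whereas this sketch computes them one after another.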
Improving Efficiency
Fold it in half
Folding the Setup: The first fold brings the 1D vector of light, cylindrical lenses, and 2D holographic film closer, placing them on the same plane. The second fold arranges the input vectors, output vectors, and 2D matrix representation very close together, minimizing physical distance and communication overhead.
3D Stacking on CMOS Chip: This folded configuration is then 3D stacked on a standard CMOS chip, connected by a dense through-silicon via (TSV) array, enabling high-bandwidth communication.
Benefits: The folded design compresses the setup into a compact, efficient layout, reducing latency, enhancing scalability, and allowing integration with current semiconductor processes.
A Folded Goodman Engine With a Metasurface at Its Heart