Archit3ch's comments | Hacker News

I'm tempted to put together an FPAA with Tiny Tapeout, but it likely won't fit in the allocated area.

TT allows you to pay more and build multi-block designs.

Check the switching-speed specification and the shared I/O bank configuration.

The project has a narrow scope of use cases. =3


Switching speed: should be good enough for audio in the kHz range, even for off-chip control.

Analog I/O pins: definitely limited, even if you purchase the highest option available (6).


> Problem is that NVIDIA literally makes the only sane graphics/compute APIs.

Hot take: Metal is more sane than CUDA.


I'm having a hard time taking seriously an API that uses atomic types rather than atomic functions. But at least it seems to be better than Vulkan/OpenGL/DirectX.

Same, Metal is a clean and modern API.

Is anyone here doing Metal compute shaders on iPad? Any tips?


Are you sure? I had not used Windows for years and assumed "Run Anyway" would work. Last month, I tested running an unsigned (self-signed) .MSIX on a different Windows machine. It's a 9-step process to get through the warnings: https://www.advancedinstaller.com/install-test-certificate-f...

Perhaps .exe is easier, but I wouldn't subject the wider public (or even power users) to that.

So yeah, Azure Trusted Signing or an EV certificate is the way to go on Windows.


While this is the "standard" macOS App structure, it is not the only one that works.

IIRC, you can put stuff in arbitrary subfolders as long as you configure the RPATHs correctly. This works and passes notarization. I came across libname.dylib in the nonstandard location AppName.app/Contents/Libraries (not to be confused with /Library, or with the recommended Frameworks location). However, there are basically no benefits compared to using the recommended directory structure, and none of the 100+ macOS apps installed on my system have a /Libraries directory.


AFAIK (and not technically relevant), iOS is very strict about this when submitting to the App Store, and they’re not at all clear about it either. I had some very confusing and frustrating errors with self-built frameworks containing dynamic libraries. You also seem to be forbidden from using bare .dylib files and must use the .framework format.

It’s picked up automatically at submission rather than at review, but it is a completely undocumented requirement.


MicroPython? Are you doing digitally-controlled analog? :)


> this includes the solving of dense systems of equations

Is there even dedicated hardware for LU?


There is no need for dedicated hardware for LU, because for big matrices LU can be reduced to matrix-matrix multiplications of smaller submatrices.

LU for small matrices and most other operations with small matrices are normally better done in the vector units.
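
A rough NumPy sketch of how the reduction works (a right-looking blocked LU without pivoting; block size and names are mine, purely illustrative). The small diagonal-panel factorization is exactly the part that stays in the vector units, while nearly all the flops land in the trailing matrix-matrix update:

    import numpy as np

    def blocked_lu(A, nb=64):
        """Right-looking blocked LU without pivoting (illustration only)."""
        A = A.astype(np.float64)
        n = A.shape[0]
        for k in range(0, n, nb):
            e = min(k + nb, n)
            # Unblocked LU on the small diagonal block (vector-unit territory).
            for j in range(k, e):
                A[j+1:e, j] /= A[j, j]
                A[j+1:e, j+1:e] -= np.outer(A[j+1:e, j], A[j, j+1:e])
            if e < n:
                L11 = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
                U11 = np.triu(A[k:e, k:e])
                # Triangular solves for the block row (U12) and block column (L21).
                A[k:e, e:] = np.linalg.solve(L11, A[k:e, e:])
                A[e:, k:e] = np.linalg.solve(U11.T, A[e:, k:e].T).T
                # The dominant cost: one large matrix-matrix multiply per step.
                A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]
        return A  # L (unit lower) and U packed into one array

For large n the trailing update dominates the flop count, which is why general matmul hardware is enough.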


There is a mild lack of context here. If you have a single vector and want to solve LUx = b, all you actually need is matrix-vector multiplication. It's the batched LUX = B case, where X and B are matrices, that needs matrix-matrix multiplication.

For those who don't know: one of the most useful properties of triangular matrices is that the diagonal blocks of a block partition are themselves triangular. This means you can solve for a subset of x using the first triangular block. Since that sub-vector of x is now known, you can forward-multiply it against the non-triangular blocks that take it as input and subtract the result from the b vector. This is the same as removing those rows and columns from the triangular matrix; the remaining matrix stays triangular, so you can keep repeating the process until the entire system is solved.
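
A minimal NumPy sketch of that block forward substitution (block size and names are mine, just to make the recursion concrete):

    import numpy as np

    def block_forward_solve(L, B, nb=64):
        """Solve L X = B for lower-triangular L, one diagonal block at a time."""
        n = L.shape[0]
        X = np.array(B, dtype=np.float64)
        for k in range(0, n, nb):
            e = min(k + nb, n)
            # The diagonal block is itself triangular: solve it directly.
            X[k:e] = np.linalg.solve(L[k:e, k:e], X[k:e])
            # Forward-multiply the solved block into the remaining right-hand
            # side and subtract, leaving a smaller triangular problem.
            X[e:] -= L[e:, k:e] @ X[k:e]
        return X

With a single right-hand side each update is just a matrix-vector product; with many right-hand sides (X and B matrices), the same update becomes the matrix-matrix multiply mentioned above.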


> The fp64 and fp32 performance is needed for physical simulations

In the very unlikely case where

1) You need fp64 Matrix-Matrix products for physical simulations

2) You bought the MI355X accelerator instead of hardware better suited for the task

you can still emulate it with the Ozaki scheme.
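
For anyone curious, a toy NumPy illustration of the splitting trick behind the Ozaki scheme (this is a per-element mantissa split, not the real block-exponent variant, and the partial products are simply done in fp64 here instead of on low-precision matmul units):

    import numpy as np

    def split_mantissa(M, num_slices=3, bits=18):
        # Split M into slices whose entries each keep ~`bits` mantissa bits,
        # so every pairwise product of slices fits a narrower format.
        slices, rest = [], M.astype(np.float64)
        for _ in range(num_slices):
            with np.errstate(divide="ignore"):
                expo = np.where(rest != 0.0,
                                np.floor(np.log2(np.abs(rest))), 0.0)
            scale = np.exp2(expo - (bits - 1))
            chunk = np.round(rest / scale) * scale  # keep the top `bits` bits
            slices.append(chunk)
            rest = rest - chunk
        return slices

    rng = np.random.default_rng(0)
    A = rng.standard_normal((256, 256))
    B = rng.standard_normal((256, 256))
    A_s, B_s = split_mantissa(A), split_mantissa(B)

    # Each partial product A_i @ B_j could run on low-precision matmul units;
    # accumulating all of them reconstructs (most of) the fp64 result.
    C = sum(Ai @ Bj for Ai in A_s for Bj in B_s)
    print(np.max(np.abs(C - A @ B)))  # tiny residual from the dropped remainder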


What hardware is better suited for the task? In FLOPS per dollar, Nvidia is in retreat just as much as AMD when it comes to fp64.


ARMv9 Scalable Matrix Extension (SME). Apple has had outer-product matrix hardware (AMX) since 2019, but you cannot buy the chips by themselves.
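
"Outer-product matrix hardware" in practice: the accumulator tile is built up from rank-1 updates, one per column of A / row of B. A quick NumPy illustration (sizes are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 16))
    B = rng.standard_normal((16, 8))

    # Accumulate the result tile as a sum of outer products, the way
    # SME/AMX-style units accumulate into their tile registers.
    C = np.zeros((8, 8))
    for k in range(A.shape[1]):
        C += np.outer(A[:, k], B[k, :])

    assert np.allclose(C, A @ B)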


Yeah, I saw the presentations at SC25, but I wasn't able to get anyone to commit to being able to buy them in the next year or three. Right now I have two open RFPs and nobody is bidding ARM.


Anyone doing this in OpenGL?


I'm not sure I understand this. Most puzzles are number-crunching and have very little to do with graphics (maybe one or two do), so no, OpenGL usually isn't used, AFAIK.

Of course, folks may use it to visualise the puzzles but not to solve them.


You definitely could do it all in shaders. People have done crazier things.


Among all the other problems with this... they describe [1] their contributions as "steering the AI" and "keeping it honest", which evidently they did not do.

[1] https://discourse.julialang.org/t/ai-generated-enhancements-...

