Archit3ch's comments | Hacker News

I'm tempted to put together an FPAA with Tiny Tapeout, but it likely won't fit in the allocated area.

TT allows you to pay more and build multi-block designs.

Check the switching-speed specification and the shared I/O bank configuration.

The project has a narrow scope of use cases. =3


Switching speed: should be good enough for audio in the kHz range, even for off-chip control.

Analog I/O pins: definitely limited, even if you purchase the highest option available (6).


> Problem is that NVIDIA literally makes the only sane graphics/compute APIs.

Hot take: Metal is more sane than CUDA.


I'm having a hard time taking seriously an API that uses atomic types rather than atomic functions. But at least it seems to be better than Vulkan/OpenGL/DirectX.

Same, Metal is a clean and modern API.

Is anyone here doing Metal compute shaders on iPad? Any tips?


Are you sure? I had not used Windows for years and assumed "Run Anyway" would work. Last month, I tested running an unsigned (self-signed) .MSIX on a different Windows machine. It's a 9-step process to get through the warnings: https://www.advancedinstaller.com/install-test-certificate-f...

Perhaps .exe is easier, but I wouldn't subject the wider public (or even power users) to that.

So yeah, Azure Trusted Signing or an EV certificate is the way to go on Windows.


While this is the "standard" macOS App structure, it is not the only one that works.

IIRC, you can put stuff in arbitrary subfolders as long as you configure the RPATHs correctly. This works and passes notarization. I came across libname.dylib in the nonstandard location AppName.app/Contents/Libraries (not to be confused with /Library, or with the recommended Frameworks location). However, there are basically no benefits compared to using the recommended directory structure, and none of the 100+ macOS apps installed on my system have a /Libraries directory.


AFAIK (and not technically relevant), iOS is very strict about this when submitting to the App Store, and they’re not at all clear about it either. I had some very confusing and frustrating errors with self-built frameworks containing dynamic libraries. You also seem to be forbidden from using bare .dylib files and must use the .framework format.

It’s picked up automatically at submission rather than at review, but it is a completely undocumented requirement.


MicroPython? Are you doing digitally-controlled analog? :)


> this includes the solving of dense systems of equations

Is there even dedicated hardware for LU?


There is no need for dedicated hardware for LU, because for big matrices LU can be reduced to matrix-matrix multiplications of smaller submatrices.

LU for small matrices and most other operations with small matrices are normally better done in the vector units.
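
A rough NumPy sketch of how the reduction works (a right-looking blocked LU without pivoting; block size and names are mine, purely illustrative). The small diagonal-panel factorization is exactly the part that stays in the vector units, while nearly all the flops land in the trailing matrix-matrix update:

    import numpy as np

    def blocked_lu(A, nb=64):
        """Right-looking blocked LU without pivoting (illustration only)."""
        A = A.astype(np.float64)
        n = A.shape[0]
        for k in range(0, n, nb):
            e = min(k + nb, n)
            # Unblocked LU on the small diagonal block (vector-unit territory).
            for j in range(k, e):
                A[j+1:e, j] /= A[j, j]
                A[j+1:e, j+1:e] -= np.outer(A[j+1:e, j], A[j, j+1:e])
            if e < n:
                L11 = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
                U11 = np.triu(A[k:e, k:e])
                # Triangular solves for the block row (U12) and block column (L21).
                A[k:e, e:] = np.linalg.solve(L11, A[k:e, e:])
                A[e:, k:e] = np.linalg.solve(U11.T, A[e:, k:e].T).T
                # The dominant cost: one large matrix-matrix multiply per step.
                A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]
        return A  # L (unit lower) and U packed into one array

For large n the trailing update dominates the flop count, which is why general matmul hardware is enough.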


There is a mild lack of context here. If you have a single vector and want to solve LUx = b, all you actually need is matrix-vector multiplication. It's the batched LUX = B case, where X and B are matrices, that needs matrix-matrix multiplication.

For those who don't know: one of the most useful properties of triangular matrices is that the diagonal blocks of a block partition are themselves triangular. This means you can solve for a subset of x using the first triangular block. Since that sub-vector of x is now known, you can forward-multiply it against the non-triangular blocks that take it as input and subtract the result from the b vector. This is the same as removing those rows and columns from the triangular matrix; the remaining matrix stays triangular, so you can keep repeating the process until the entire system is solved.
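
A minimal NumPy sketch of that block forward substitution (block size and names are mine, just to make the recursion concrete):

    import numpy as np

    def block_forward_solve(L, B, nb=64):
        """Solve L X = B for lower-triangular L, one diagonal block at a time."""
        n = L.shape[0]
        X = np.array(B, dtype=np.float64)
        for k in range(0, n, nb):
            e = min(k + nb, n)
            # The diagonal block is itself triangular: solve it directly.
            X[k:e] = np.linalg.solve(L[k:e, k:e], X[k:e])
            # Forward-multiply the solved block into the remaining right-hand
            # side and subtract, leaving a smaller triangular problem.
            X[e:] -= L[e:, k:e] @ X[k:e]
        return X

With a single right-hand side each update is just a matrix-vector product; with many right-hand sides (X and B matrices), the same update becomes the matrix-matrix multiply mentioned above.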


> The fp64 and fp32 performance is needed for physical simulations

In the very unlikely case where

1) You need fp64 Matrix-Matrix products for physical simulations

2) You bought the MI355X accelerator instead of hardware better suited for the task

you can still emulate it with the Ozaki scheme.
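
For anyone curious, a toy NumPy illustration of the splitting trick behind the Ozaki scheme (this is a per-element mantissa split, not the real block-exponent variant, and the partial products are simply done in fp64 here instead of on low-precision matmul units):

    import numpy as np

    def split_mantissa(M, num_slices=3, bits=18):
        # Split M into slices whose entries each keep ~`bits` mantissa bits,
        # so every pairwise product of slices fits a narrower format.
        slices, rest = [], M.astype(np.float64)
        for _ in range(num_slices):
            with np.errstate(divide="ignore"):
                expo = np.where(rest != 0.0,
                                np.floor(np.log2(np.abs(rest))), 0.0)
            scale = np.exp2(expo - (bits - 1))
            chunk = np.round(rest / scale) * scale  # keep the top `bits` bits
            slices.append(chunk)
            rest = rest - chunk
        return slices

    rng = np.random.default_rng(0)
    A = rng.standard_normal((256, 256))
    B = rng.standard_normal((256, 256))
    A_s, B_s = split_mantissa(A), split_mantissa(B)

    # Each partial product A_i @ B_j could run on low-precision matmul units;
    # accumulating all of them reconstructs (most of) the fp64 result.
    C = sum(Ai @ Bj for Ai in A_s for Bj in B_s)
    print(np.max(np.abs(C - A @ B)))  # tiny residual from the dropped remainder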


What hardware is better suited for the task? In FLOPS per dollar, Nvidia is in retreat just as much as AMD when it comes to fp64.


ARMv9 Scalable Matrix Extension (SME). Apple has had outer-product matrix hardware (AMX) since 2019, but you cannot buy the chips by themselves.
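
"Outer-product matrix hardware" in practice: the accumulator tile is built up from rank-1 updates, one per column of A / row of B. A quick NumPy illustration (sizes are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 16))
    B = rng.standard_normal((16, 8))

    # Accumulate the result tile as a sum of outer products, the way
    # SME/AMX-style units accumulate into their tile registers.
    C = np.zeros((8, 8))
    for k in range(A.shape[1]):
        C += np.outer(A[:, k], B[k, :])

    assert np.allclose(C, A @ B)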


Yeah, I saw the presentations at SC25, but I wasn't able to get anyone to commit to being able to buy them in the next year or three. Right now I have two open RFPs and nobody is bidding ARM.


Anyone doing this in OpenGL?


I'm not sure I understand this. Most puzzles are number-crunching and have very little to do with graphics (maybe one or two do), so no, OpenGL usually isn't used, AFAIK.

Of course, folks may use it to visualise the puzzles but not to solve them.


You definitely could do it all in shaders. People have done crazier things.


Among all the other problems with this... they describe [1] their contributions as "steering the AI" and "keeping it honest", which evidently they did not do.

[1] https://discourse.julialang.org/t/ai-generated-enhancements-...

