A new game is working on the latest revision: Cursed Castilla.
First working game
After some hard work on the shader recompiler, shadps4 can now run Sonic Mania. As you can see in the video, there are still minor issues, but we are working to sort them out 🙂
Added constant buffers
The latest PR, 147, added constant buffers, and a new demo is now working.
Our GPU engine started to work
We have exciting news today: it appears that our shader recompiler has started to work. Although our results are still minimal, it is a good step towards getting more things to work.
Below you can see a sample that uses a vertex shader and a fragment shader.
Stay tuned for more updates soon 🙂
More v0.0.4 progress
More interesting PRs have landed in our git repository these days.
First, we got "Rewrite thread local storage implementation" from The Turtle:
It’s not uncommon for PS4 guest applications to launch and use many threads, which also necessitates handling thread local storage properly. On x86, thread local accesses are performed by loading the pointer stored in the fs segment register. This is a problem, as Windows doesn’t allow you to change the value of this register to what the guest expects. (Not quite true, see first reply.)
On master this is handled with a simple exception handler that will patch the value of the destination register with a thread_local buffer. This works fine but will be a problem later on. Obviously the performance impact is pretty large for any access. In addition, the new texture cache that does fault tracking also needs a custom exception handler, so they end up conflicting. Also, guest apps can use negative offsets when accessing the buffer, so the current implementation would trigger UB in these cases.
This PR attempts to fix all of the above by using assembly trampolines instead of the exception handler. For storing the TLS image pointer, a new TLS slot is allocated from the parent process, and the logic from Wine’s TlsGetValue is used to retrieve the value. This means we also don’t have to rely on undefined/unused space in the TEB structure to store our data. Each mov instruction that reads from the FS segment is patched with a jump to a trampoline that loads the actual pointer.
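The point about negative offsets can be illustrated with a small sketch. It assumes the usual x86-64 layout, where the thread pointer sits right after the TLS image and the slot at offset 0 holds the thread pointer itself, so thread local variables end up at negative offsets. `GuestTlsBlock`, `ThreadPointer` and `Resolve` are hypothetical names for this example, not code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical per-thread TLS block (illustration only, not shadps4 code).
// The TLS image is copied in front of the thread pointer, so guest accesses
// such as fs:[-8] land inside the image.
class GuestTlsBlock {
public:
    GuestTlsBlock(const std::uint8_t* image, std::size_t image_size)
        : storage_(image_size + sizeof(void*)), image_size_(image_size) {
        assert(image != nullptr);
        std::memcpy(storage_.data(), image, image_size);
        // Assumed layout: the slot at offset 0 stores the thread pointer itself.
        void* self = ThreadPointer();
        std::memcpy(ThreadPointer(), &self, sizeof(self));
    }

    // Copying would leave a stale self pointer, so it is disabled here.
    GuestTlsBlock(const GuestTlsBlock&) = delete;
    GuestTlsBlock& operator=(const GuestTlsBlock&) = delete;

    // The value a trampoline would hand back in place of the fs base.
    std::uint8_t* ThreadPointer() { return storage_.data() + image_size_; }

    // Resolves a guest offset relative to the thread pointer.
    // Negative offsets are valid and point into the TLS image.
    std::uint8_t* Resolve(std::ptrdiff_t offset) {
        assert(offset >= -static_cast<std::ptrdiff_t>(image_size_));
        assert(offset < static_cast<std::ptrdiff_t>(sizeof(void*)));
        return ThreadPointer() + offset;
    }

private:
    std::vector<std::uint8_t> storage_;
    std::size_t image_size_;
};
```

With a 4-byte image, `Resolve(-4)` returns the first image byte and `Resolve(0)` returns the slot holding the thread pointer. A buffer that starts at the thread pointer would have nothing valid in front of it, which is the undefined behaviour the PR description mentions.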
While at it, this PR also fixed a problem with fault tracking that caused crashes in the pngdec demo. The tracking was being performed at the texture cache page size, when it should be done on a 4KB boundary like the host/guest. It also bumped the cache page size to vastly reduce the number of page table accesses.
Second comes the PR "gnmdriver: basic functionality extension" from psucien:
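Tracking on a 4KB boundary comes down to some page arithmetic, sketched below. `kTrackPageBits`, `PageRange` and `PagesCovering` are hypothetical names chosen for this example. The actual constants and helpers in the PR may differ.

```cpp
#include <cassert>
#include <cstdint>

// Assumed tracking granularity: 4 KiB pages, as described above.
constexpr std::uint64_t kTrackPageBits = 12;
constexpr std::uint64_t kTrackPageSize = 1ULL << kTrackPageBits;

struct PageRange {
    std::uint64_t first_page;  // index of the first 4 KiB page touched
    std::uint64_t page_count;  // number of 4 KiB pages touched
};

// Returns the 4 KiB pages that the byte range [addr, addr + size) overlaps.
inline PageRange PagesCovering(std::uint64_t addr, std::uint64_t size) {
    assert(size > 0);
    const std::uint64_t first = addr >> kTrackPageBits;
    const std::uint64_t last = (addr + size - 1) >> kTrackPageBits;
    return PageRange{first, last - first + 1};
}
```

For example, a 2-byte access at `0x1FFF` straddles two pages, so `PagesCovering(0x1FFF, 2)` reports a `page_count` of 2 starting at page 1.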
This adds implementation for the next commonly used driver functions:
sceGnmComputeWaitOnAddress
sceGnmDispatchDirect
sceGnmDispatchIndirect
sceGnmDrawIndexOffset
sceGnmInsertPopMarker
sceGnmInsertPushMarker
sceGnmUpdatePsShader350
sceGnmUpdateVsShader
Functions related to HW state initialization and indirect draw calls will come in later updates to this PR.
Submission-related functionality will be reworked in a separate PR, as it requires changes in the GPU frontend.
Another PR, "Psf info + stack allocation", comes from shadow:
Fix stack allocation: currently we have a lot of crashes with the default stack allocation. The /stack flag increases the stack and commit area, so let’s hope it will solve all the related crash issues.
Print param.sfo at startup: we can now print the game ID, title, required firmware version and app version at the start of the log file. We will also need this info for the savedata and sceAppContent modules in the future (a savedata PR is on its way).
Yet another PR from shadow, for Sonic Mania work, addresses the following issues:
Flexible memory: a mostly dummy implementation of flexible memory mapping, but it allows games to go further
CreateThread: it appears that threads are sometimes nameless
sceUserServiceGetEvent: implemented a fake login event, which should be enough for now
And lastly, one more PR from shadow for dummy np* modules and the screenshot module, which adds stubs for np* functions.
Stay tuned for more updates soon 🙂
Shadps4 v0.0.4 progress (continued)
We have some interesting PRs this week.
First, we got "Rewrite videoout library and bringup new vulkan backend" from The Turtle:
On master the video/graphics code is relatively hard to understand, as it’s split into multiple folders and directories without much cohesion on what everything does. In addition, the texture cache is extremely basic and works based on hashing: it tracks changes to memory regions by computing their hashes. This is fine for simple demos, but when real games are put to the test, hashing large blocks of memory every draw call isn’t going to be fun. The Vulkan code was also a bit fragile and broken under Wayland, needing a hack to make it synchronize properly.
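To make the cost of the hashing approach concrete, here is a minimal sketch of hash-based change tracking. It uses 64-bit FNV-1a purely as a stand-in. The source does not say which hash master used, and `HashTrackedRegion` and `CheckModified` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 64-bit FNV-1a, used here only as an example hash function.
inline std::uint64_t Fnv1a(const std::uint8_t* data, std::size_t size) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
    for (std::size_t i = 0; i < size; ++i) {
        hash ^= data[i];
        hash *= 1099511628211ULL;  // FNV prime
    }
    return hash;
}

// Tracks one guest memory region by remembering the hash of its contents.
class HashTrackedRegion {
public:
    HashTrackedRegion(const std::uint8_t* data, std::size_t size)
        : data_(data), size_(size), last_hash_(Fnv1a(data, size)) {
        assert(data != nullptr || size == 0);
    }

    // Returns true if the contents changed since the previous check.
    // Every call reads the whole region, which is the cost described above.
    bool CheckModified() {
        const std::uint64_t current = Fnv1a(data_, size_);
        const bool changed = current != last_hash_;
        last_hash_ = current;
        return changed;
    }

private:
    const std::uint8_t* data_;
    std::size_t size_;
    std::uint64_t last_hash_;
};
```

Calling `CheckModified()` once per draw call for every cached image means touching every byte of every image each time, even when nothing changed. That is why this approach scales poorly from small demos to real games.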
So this PR does 3 things: it reworks video_out to be more accurate based on my reverse engineering, fully reworks the Vulkan backend to have better abstractions that will make the 3D engine implementation easier, and fully reworks the texture caching system to be based on fault tracking.
The first part is mostly self-explanatory. The implementation has been split into a separate class for easier state management and some additional error codes have been added, but the result isn’t all that different from before. Presentation now occurs in the game thread instead of the window thread, which makes things a bit easier. In the future there should be a separate GPU thread to handle all the extra work, but that isn’t needed here.
The new Vulkan backend is based on the Citra one and uses vulkan-hpp instead of raw Vulkan, as it’s a little less verbose and solves the previous license problem: the C headers are licensed under Apache, which is incompatible with GPLv2. Vulkan-Hpp, on the other hand, is licensed under MIT as well, which is OK. Like Citra, initialization is handled by the Instance class, where all extensions are also loaded. The Scheduler has been ported as well, as it will prove useful for parallel shader building in the future and makes validation layer performance a bit less miserable.
The main change here, however, is the texture cache. When a new image is stored in the cache, the region it owns is marked as protected using an mprotect call. This means that any reads or writes from the guest will go through the texture cache’s exception handler, which allows it to decide on the appropriate action (either invalidation or flush). When the image is requested again, it is validated with an upload and reprotected. In general that’s a relatively clean way to handle readbacks and the related accesses that emulating a UMA system entails. In the future this can be tuned to be better suited to the PS4’s memory model, but it’s a good base.
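The protect, fault, invalidate and re-upload cycle described above can be sketched as a small state machine. This is a simulation only: it makes no real mprotect call and installs no signal or exception handler, it models only guest writes (invalidation, not flush), and `FaultTrackedCache`, `TrackedImage` and their members are hypothetical names rather than the PR's classes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

struct TrackedImage {
    std::uint64_t size = 0;
    bool is_protected = false;  // stands in for an mprotect'ed guest range
    bool dirty = false;         // guest wrote to it since the last upload
    int uploads = 0;            // how many times it was (re)uploaded
};

class FaultTrackedCache {
public:
    // Stores a new image: upload it once and protect its guest memory.
    void Insert(std::uint64_t addr, std::uint64_t size) {
        assert(size > 0);
        TrackedImage& image = images_[addr];
        image.size = size;
        Upload(image);
    }

    // What the fault handler would do for a guest write. Returns true if
    // the fault hit a protected image, which is then invalidated and
    // unprotected so that further writes no longer fault.
    bool OnGuestWrite(std::uint64_t fault_addr) {
        for (auto& entry : images_) {
            TrackedImage& image = entry.second;
            const bool inside = fault_addr >= entry.first &&
                                fault_addr - entry.first < image.size;
            if (inside && image.is_protected) {
                image.dirty = true;
                image.is_protected = false;
                return true;
            }
        }
        return false;
    }

    // Requests an image again: re-upload only if it was invalidated.
    const TrackedImage& Request(std::uint64_t addr) {
        TrackedImage& image = images_.at(addr);
        if (image.dirty) {
            Upload(image);
        }
        return image;
    }

private:
    static void Upload(TrackedImage& image) {
        ++image.uploads;
        image.dirty = false;
        image.is_protected = true;  // reprotect after validation
    }

    std::map<std::uint64_t, TrackedImage> images_;
};
```

An image that the guest never touches is uploaded exactly once, no matter how often it is requested. That is the advantage over re-hashing the memory on every draw call.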
Second, we got "video_core: Add basic command list processing" from The Turtle:
It implements a few PM4 commands and the gnm submit call. This means that guest applications will no longer be stuck waiting for a command buffer label. Right now the commands don’t do anything; actual functionality will be added in future PRs. Gnmdriver functions that write private packets have been implemented as close to the real module disassembly as possible.
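As a rough idea of what basic command list processing involves, the sketch below walks a buffer of PM4 type-3 packets without executing them. It assumes the commonly documented type-3 header layout (packet type in bits 31:30, body dword count minus one in bits 29:16, opcode in bits 15:8). `Pm4Packet`, `MakeType3Header` and `WalkCommandList` are hypothetical names, and the opcode values in the usage note are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One decoded packet: opcode plus the location of its body in the buffer.
struct Pm4Packet {
    std::uint32_t opcode;
    std::size_t body_offset;  // index of the first body dword
    std::size_t body_dwords;  // number of body dwords
};

// Builds a type-3 header under the layout assumed above.
inline std::uint32_t MakeType3Header(std::uint32_t opcode,
                                     std::uint32_t body_dwords) {
    assert(body_dwords >= 1);
    return (3u << 30) | (((body_dwords - 1) & 0x3FFFu) << 16) |
           ((opcode & 0xFFu) << 8);
}

// Splits a command list into packets. Nothing is executed; unknown packet
// types and truncated packets simply stop the walk.
inline std::vector<Pm4Packet> WalkCommandList(
    const std::vector<std::uint32_t>& dwords) {
    std::vector<Pm4Packet> packets;
    std::size_t pos = 0;
    while (pos < dwords.size()) {
        const std::uint32_t header = dwords[pos];
        if ((header >> 30) != 3u) {
            break;  // only type-3 packets are handled in this sketch
        }
        const std::size_t body = ((header >> 16) & 0x3FFFu) + 1;
        if (pos + 1 + body > dwords.size()) {
            break;  // truncated packet
        }
        packets.push_back(Pm4Packet{(header >> 8) & 0xFFu, pos + 1, body});
        pos += 1 + body;
    }
    return packets;
}
```

Feeding it `{MakeType3Header(0x10, 2), 0, 0, MakeType3Header(0x20, 1), 5}` yields two packets, the second starting its body at index 4. A real processor would dispatch on `opcode` at this point instead of just recording it.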
What does all of the above mean in general? Shadps4 is getting ready to process some real graphics from the GPU (currently we have only framebuffer demos working). So what are the next steps?
The next things to follow will probably be:
Shader compiler
Rendering code
Stay tuned for more updates soon!
Shadps4 v0.0.4 progress
We have some very interesting progress on our new W.I.P. version.
Most retail games come with bundled libc and libfios2 libraries. Up to version 0.0.3, shadps4 HLE'd part of the libc library. From now on, if a game provides bundled libc and libfios2 libraries, we load them natively. That means we can be more accurate in libc emulation (it saves us from needing more than 3000 HLE functions), which can help us progress faster!
Stay tuned for more progress report soon!
Shadps4 v0.0.3 released
A new release of shadps4. The date was not chosen by luck: it is 21 years after the first pcsx2 release (23 March 2002) and, by accident, my (shadow's) birthday.
The most important features of this release are Linux support and the ability to run a few OpenOrbis demos (helloworld, graphics, pngdec, sound).
A more detailed list is below:
-Switching to std::thread
-Use unique_ptr where possible
-Replace printf/scanf with type safe fmt
-Implemented sceKernelGetProcessTime
-Implemented sceKernelGetProcessTimeCounter , sceKernelGetProcessTimeCounterFrequency
-Pause emu with P button
-Timers rewritten with std::chrono
-Added sceSystemServiceGetStatus
-Initial FileSystem implementation
-Initial TLS work
-New logging implementation
-Some functions implemented for userService,systemService
-Added sceAudioOut module and output using SDL audio
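The std::chrono-based timers and the process time functions from the list above could look roughly like the sketch below. It assumes that the process time is reported as microseconds since the process started. `ProcessClock` and its members are hypothetical names, not the emulator's actual implementation.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper that measures time since it was constructed,
// standing in for "time since the guest process started".
class ProcessClock {
public:
    using Clock = std::chrono::steady_clock;  // monotonic, never jumps back

    ProcessClock() : start_(Clock::now()) {}

    // Elapsed time in microseconds (assumed unit for the process time).
    std::uint64_t ElapsedMicroseconds() const {
        const auto elapsed = Clock::now() - start_;
        assert(elapsed.count() >= 0);
        return static_cast<std::uint64_t>(
            std::chrono::duration_cast<std::chrono::microseconds>(elapsed)
                .count());
    }

    // Raw counter value in native clock ticks.
    std::uint64_t Counter() const {
        return static_cast<std::uint64_t>((Clock::now() - start_).count());
    }

    // Native ticks per second, so that Counter() / CounterFrequency()
    // gives seconds.
    static constexpr std::uint64_t CounterFrequency() {
        return static_cast<std::uint64_t>(Clock::period::den) /
               static_cast<std::uint64_t>(Clock::period::num);
    }

private:
    Clock::time_point start_;
};
```

Using `steady_clock` rather than `system_clock` keeps the values monotonic even if the host's wall clock is adjusted while the emulator is running.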
Third release of shadps4. Several OpenOrbis demos are working.