What is the difference between a JTAG and an RGH?
Status: Offline
Joined: May 05, 2011 (13 Year Member)
Posts: 863
Reputation Power: 39
I've always wondered: what is the difference between a JTAG and an RGH?
#2. Posted:
Status: Offline
Joined: Feb 17, 2011 (13 Year Member)
Posts: 989
Reputation Power: 42
TunaSalad wrote: I've always wondered: what is the difference between a JTAG and an RGH?
RGH uses a glitch chip to run unsigned code, and these consoles take time to boot (anywhere from instant to minutes).
JTAG uses wires to run unsigned code, and these consoles boot instantly.
Check out the sticky for more info
#3. Posted:
Status: Offline
Joined: Oct 03, 2013 (11 Year Member)
Posts: 1,409
Reputation Power: 64
Technical Explanation:
RGH:
THE EXPLOIT:
Gligli was able to get the CPU to get past the CD authentication by pulsing the reset at just the right time, for just the right length. The idea is the successful authentication eventually boils down to a 'branch-if-register-equals-zero' instruction on a certain CPU register. If this register has been reset, it will think the check passed. The problem is other CPU registers have likely been reset by the glitch. If any of these registers contain values that are needed for continued booting, the system will crash. If no 'critical' registers got reset, the system will successfully continue loading the CD.
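To make the idea concrete, here is a toy model of the glitch, not the real boot code. The register names, the `critical` set and the function name are all made up for illustration; the only thing taken from the explanation above is that the check boils down to "is this register zero?" and that the reset pulse may zero other registers too.

```python
def boot_after_glitch(registers, check_reg, critical, cleared):
    """Toy model of the reset glitch.

    registers: dict of register name -> value just before the check
    check_reg: register tested by the 'branch-if-equals-zero' instruction
    critical:  registers whose values are still needed to keep booting
    cleared:   registers that the reset pulse happened to zero
    """
    # Apply the glitch: every register hit by the pulse reads back as zero.
    regs = {name: (0 if name in cleared else value)
            for name, value in registers.items()}

    # The authentication result is just a compare against zero.
    if regs[check_reg] != 0:
        return "auth failed"

    # The check passed, but losing a needed register still kills the boot.
    if any(name in cleared for name in critical):
        return "crash"

    return "boot continues"


# Hypothetical register contents: r3 holds a non-zero "hash mismatch" result.
REGS = {"r3": 1, "r4": 0x1234, "r5": 7}
```

With this model, clearing only `r3` lets the boot continue, clearing `r3` and a critical `r4` crashes, and no glitch at all fails the check, which matches the three outcomes described above.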
HOW IT'S DONE:
Cjack's research discovered some methods of slowing down the CPU, including asserting CPU_PLL_BYPASS. Asserting this signal slows CPU execution down by a factor of 128. Gligli used this to his advantage to slow the CPU execution down enough that he could successfully exploit the system, originally using an external microcontroller. In order to make it more user-friendly, the exploit was ported to a CPLD that ran off the 48 MHz SMC standby clock. This clock was chosen simply because it was convenient. The clock is fast enough that it provides sufficient temporal resolution to pulse the reset appropriately for the exploit.
HOW IT'S DONE (SLIMS):
Unfortunately, Gligli and Tiros were unable to locate CPU_PLL_BYPASS on the Trinity system, and decided to take a different approach. Instead of using CPU_PLL_BYPASS to slow down the core directly, they changed the frequency of the CPU reference clock (from the HANA) going into the CPU PLL which generates the core clock. So, if you slow down the CPU reference clock from 100 MHz to 100/128 ~= 781 kHz, then you've accomplished the same thing: the core will run at the same speed as if you had used CPU_PLL_BYPASS. Unfortunately this doesn't work. Tiros found the HANA PLL wasn't stable at arbitrarily slow frequencies. It would wander and drift randomly around the target frequency. This meant the CPU execution speed was constantly changing, and when you tried to time the reset pulse, the CPU was rarely at the correct instruction anymore.
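The numbers above can be checked with a bit of arithmetic. This is only a back-of-the-envelope sketch: the factor of 128, the 48 MHz CPLD clock and the 100 MHz reference clock come from the text, while the 3.2 GHz nominal core clock is my assumption (it is not stated in this post) and the function names are made up.

```python
CORE_HZ = 3_200_000_000   # ASSUMPTION: nominal core clock, not given in the post
SLOWDOWN = 128            # factor quoted for CPU_PLL_BYPASS
CPLD_HZ = 48_000_000      # SMC standby clock feeding the CPLD
REF_HZ = 100_000_000      # CPU reference clock from the HANA


def slowed_core_hz(core_hz=CORE_HZ, factor=SLOWDOWN):
    """Core clock while the slowdown is active."""
    return core_hz / factor


def cpld_ticks_per_slow_cycle(cpld_hz=CPLD_HZ):
    """How many CPLD clock ticks fit into one slowed CPU cycle."""
    return cpld_hz / slowed_core_hz()


def slowed_ref_khz(ref_hz=REF_HZ, factor=SLOWDOWN):
    """Reference clock needed to get the same slowdown on slims, in kHz."""
    return ref_hz / factor / 1000


print(slowed_core_hz())             # 25000000.0
print(cpld_ticks_per_slow_cycle())  # about 1.92
print(slowed_ref_khz())             # 781.25
```

Under that assumption the CPLD gets roughly two clock ticks per slowed CPU cycle, which is the "sufficient temporal resolution" mentioned above. It also shows why a drifting reference clock on slims ruins the timing: the pulse is counted in CPLD ticks, so any change in CPU speed moves the target instruction.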
JTAG:
To understand this new hack, let's first look at what made the KK exploit possible: a fatal bug in the Hypervisor's Syscall Handler, introduced in the 4532 kernel update. The original post linked to a write-up (external link omitted in this archive) which explains the problem in great detail.
The KK exploit exploited the kernel bug by modifying an unsigned shader to do a series of so-called memory exports, an operation where the GPU can write the results of a pixel or vertex shader into physical memory. The shader was written to overwrite the Idle-thread context to make the kernel jump at a certain position in memory, with some registers under our control. In order to control all registers, a second step was necessary, this time by jumping into the interrupt restore handler. This finally allows all CPU general purpose registers to be filled with determined values. The program counter could be restored to a syscall instruction in the kernel, with register values prefilled so that they would trigger the exploit.
The exploit basically allows jumping into any 32-bit address in hypervisor space. To jump into an arbitrary location, we just used a "mtctr, bctr" instruction pair in the hypervisor, which would redirect execution flow into any 64-bit address. This is important, since we need to clear the upper 32 bits (i.e., set the MSB to disable the HRMO), because the code we want to jump to is in unencrypted memory.
This code would usually load a second-stage loader, for example XeLL, into memory, and start it. XeLL would then attempt to catch all cpu threads (because just the primary thread is affected by our exploit), and load the user code, for example from DVD.
So, the following memory areas are involved:
- Idle Thread context, at 00130360 in physical memory
This stores the stack pointer (and some other stuff) when the idle thread was suspended. By changing the stack pointer, and then waiting for the kernel to switch to the idle thread, the stack pointer can be brought into our control. Part of the context switch is also a context restore, based on the new stack pointer.
- Context restore, part 1, arbitrary location, KK expl. uses 80130AF0
The thread-context restore doesn't restore all registers, but lets us control the NIP (the "next instruction" pointer). We set up NIP to point to the interrupt context restore, which does an SP-relative load of most registers.
- Context restore, part 2, same base location as part 1
We just re-use the same stack pointer, because the areas where the first context restore and the interrupt context restore load from do not overlap. The second context restore allows us to pre-set all registers with arbitrary 64 bit values.
- The HV offset, at 00002080 for syscall 0x46 on 4532
Because of the HV bug, we can write this offset into unencrypted memory, giving us the possibility to jump into any location in the hypervisor space (i.e. with a certain "encryption prefix"). We usually write 00000350 here, which points to a "mtctr %r4; bctr" instruction pair in hypervisor, which lets us jump to %r4.
- Our loader code, at an arbitrary location
This code will be executed from hypervisor. It's the first of our code which will be executed. %r4 on the syscall entry has to point to this code.
Only the idle thread context and the HV offset have fixed addresses. It's easily possible to merge this so that only two distinct blocks need to be written into memory, but it's not possible to merge this into a single block.
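As a quick summary of the list above, here are the memory areas written out as data. The addresses are copied from the list; the `Area` class, the field names and the helper are just my own illustration, not anything from the exploit code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Area:
    name: str
    address: Optional[int]  # None means "anywhere we like"
    fixed: bool             # True if the exploit cannot choose the address


AREAS = [
    Area("idle thread context", 0x00130360, fixed=True),
    # 0x80130AF0 is simply the location the KK exploit picked.
    Area("context restore, part 1", 0x80130AF0, fixed=False),
    Area("context restore, part 2", 0x80130AF0, fixed=False),  # same base
    Area("HV offset (syscall 0x46 on 4532)", 0x00002080, fixed=True),
    Area("loader code", None, fixed=False),
]

# Value usually written at the HV offset: points at "mtctr %r4; bctr".
HV_OFFSET_VALUE = 0x00000350


def fixed_area_names(areas=AREAS):
    """Names of the areas whose address cannot be moved."""
    return [a.name for a in areas if a.fixed]


print(fixed_area_names())
```

The two fixed entries are the reason two distinct blocks are the minimum: everything movable can be packed next to the idle thread context, but the HV offset sits somewhere else entirely.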
Fortunately, the NAND controller allows doing DMA reads where the payload data is split from the "ECC"-data. Each page has 512 bytes of payload, and 16 bytes of ECC data. Thus, a single DMA read can be used to load all required memory addresses. We chose the Payload to read the Idle Thread Context, the Context Restores and the loader code. The ECC data will carry the HV offset.
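The payload/ECC split is easy to picture with a few lines of code. This is a rough sketch only: the 512 and 16 byte sizes are from the text, but the assumption that the ECC bytes directly follow the payload in a raw page, and the function name, are mine.

```python
PAYLOAD_SIZE = 512  # payload bytes per page (from the text)
ECC_SIZE = 16       # "ECC" bytes per page (from the text)


def split_page(raw_page: bytes):
    """Split one raw page into (payload, ecc).

    ASSUMPTION: the raw page is stored as payload followed by ECC.
    """
    if len(raw_page) != PAYLOAD_SIZE + ECC_SIZE:
        raise ValueError("expected a 528-byte raw page")
    return raw_page[:PAYLOAD_SIZE], raw_page[PAYLOAD_SIZE:]


# Example page: recognisable payload bytes, then 16 bytes of 0xEE as "ECC".
raw = bytes(range(256)) * 2 + bytes([0xEE]) * ECC_SIZE
payload, ecc = split_page(raw)
print(len(payload), len(ecc))
```

Because the controller sends the two parts to two different DMA targets, one read can hit two unrelated memory regions, which is exactly what the two fixed addresses require.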
To do a DMA read, the following NAND registers need to be written:
ea00c01c address for payload
ea00c020 address for ECC
ea00c00c address inside NAND
ea00c008 command: read DMA (07)
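Put together, the register list can be sketched as a sequence of writes. The register addresses and the command value 07 are from the list; the write order (command last), the function name and the example arguments are assumptions for illustration, and nothing here touches real hardware.

```python
REG_PAYLOAD_ADDR = 0xEA00C01C  # DMA target for the payload
REG_ECC_ADDR = 0xEA00C020      # DMA target for the "ECC" data
REG_NAND_ADDR = 0xEA00C00C     # address inside NAND
REG_COMMAND = 0xEA00C008       # command register
CMD_READ_DMA = 0x07


def dma_read_writes(payload_target, ecc_target, nand_address):
    """Return the (register, value) writes for one DMA read.

    ASSUMPTION: the command is written last so that the addresses are
    already in place when the read starts.
    """
    return [
        (REG_PAYLOAD_ADDR, payload_target),
        (REG_ECC_ADDR, ecc_target),
        (REG_NAND_ADDR, nand_address),
        (REG_COMMAND, CMD_READ_DMA),
    ]


# Hypothetical example: payload over the idle thread context, ECC data over
# the HV offset, reading from a made-up NAND address.
for reg, value in dma_read_writes(0x00130360, 0x00002080, 0x00000200):
    print(f"{reg:08x} <- {value:08x}")
```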
The System Management Controller (SMC) is an 8051 core inside the Southbridge. It manages the power sequencing, and is always active when the Xbox 360 has (standby or full) power applied. It controls the frontpanel buttons, has a real-time clock, decodes IR, controls temperatures and fans and the DVD-ROM tray. It talks with the frontpanel board to set the LEDs. When the system is running, the kernel can communicate with the SMC, for example to query the real-time clock, open the DVD tray etc. This happens over a bidirectional FIFO (at ea001080 / ea001090). See the XeLL SMC code for details.
The SMC can read the NAND, because it requires access to a special NAND page which contains a SMC config block. This block contains calibration information for the thermal diodes, and the thermal targets etc. The 8051 core has access to NAND registers, which are mapped into the 8051 SFRs. It uses the same protocol as the kernel uses, so it writes an address, does a "READ" command, and then reads the data out of the "DATA" registers.
It could also do a "READ (DMA)" command. So by hacking the SMC, we could make the box do the exploit without any shader - the SMC can access the NAND controller all the time, even when the kernel is running (though it will likely interfere with the kernel). So we just trigger the DMA read when the kernel has been loaded, and everything is fine.
Right?
Well, that would be too easy. While most NAND registers are mapped, the DMA address registers (1c, 20) are not. We can DMA, but only to the default address of zero (or wherever the kernel last DMAed into). Fail.
The GPU, the (H)ANA (the "scaler" - which in fact doesn't scale at all, it's "just" a set of DACs, and, since Zephyr, a DVI/HDMI encoder), the Southbridge and the CPU have their JTAG ports exposed on the board. They are unpopulated headers, but the signals are there. CPU JTAG is a different (complex) story, and SB JTAG doesn't offer much functionality. ANA JTAG is boring since the ANA doesn't sit on any interesting bus. That leaves GPU JTAG.
GPU JTAG was reverse-engineered until a point where arbitrary PCI writes are possible, up to a certain point. So that makes it possible to talk to each PCI device in the system, including the NAND controller. So we can simply use THAT instead of the SMC to start the DMA?
Right?
Well, not quite. The problem is that the "VM code", the code which does a lot of system initialization, like the memory (that code is also responsible for generating the 01xx "RROD"-Errors), sets a certain bit in some GPU register, which disables the JTAG interface. The VM code is executed way before the kernel is active. So this is fail, too.
But the combination works - by programming the DMA target address via JTAG, and launching the attack via SMC. The attack can be launched as soon as the kernel is running, and quite early on the kernel queries the SMC for the RTC. We abuse this call to start the attack instead, which is a perfect point for us.
But how do we run an exploitable kernel at all? Most machines are updated already. Let me refresh your knowledge about the boot process again:
1BL (Bootrom)
Buried deep inside the CPU die, this ~32 KB of ROM code is responsible for reading the 2BL from NAND flash and decrypting it into the embedded SRAM in the CPU. It verifies the hash of the decrypted image against a signed block at the beginning of the 2BL, and will stop execution if this hash mismatches. This code also contains a number of test functions, which can be activated by pulling the 5 "POST IN" pins, which are available on the backside of the PCB. None of these tests looks particularly interesting (from an exploitation perspective) - they mostly seem to be related to the FSB (the bus between CPU and GPU). This code is fixed, and all systems use identical code here.
2BL ("CB")
This code is usually located at 0x8000 in NAND flash. It's decrypted by 1BL, and runs from internal SRAM.
It does a basic hardware initialization, and contains the "fuse check code", which verifies the "2BL version". The fuses store the expected version. The 2BL stores a "Version" and an "AllowedMask" (=bitfield), and these are usually stored at addresses 0x3B1 / 0x3B2..0x3B3.
Version  AllowedMask  Xenon             Zephyr  Falcon            Jasper  Notes
2        0003         1888, 1901, 1902
4                     1920                                                "new zeropair code"
5        0010         1921              4558    5760, 5761, 5770  6712    TA-fixed
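Reading those two fields out of a decrypted 2BL could look roughly like this. The offsets 0x3B1 and 0x3B2..0x3B3 are from the text; treating them as offsets into the decrypted image, reading the mask as big-endian, and the function name are my assumptions.

```python
import struct

VERSION_OFFSET = 0x3B1  # one byte
MASK_OFFSET = 0x3B2     # two bytes, 0x3B2..0x3B3


def read_2bl_version(image: bytes):
    """Return (version, allowed_mask) from a decrypted 2BL image.

    ASSUMPTIONS: offsets are relative to the start of the image and the
    16-bit mask is stored big-endian.
    """
    version = image[VERSION_OFFSET]
    (allowed_mask,) = struct.unpack_from(">H", image, MASK_OFFSET)
    return version, allowed_mask


# Fake image carrying the values from the last table row (5 / 0010).
fake = bytearray(0x400)
fake[VERSION_OFFSET] = 5
fake[MASK_OFFSET:MASK_OFFSET + 2] = b"\x00\x10"

version, mask = read_2bl_version(bytes(fake))
print(version, f"{mask:04x}")
```

How exactly the fuse check combines the burned fuses with the mask is not described here, so the sketch stops at extracting the fields.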
It then verifies the pairing information stored in the 2BL header. Part of this verification is a checksum check of the NAND area which was used to load the SMC code from.
It also contains a virtual machine and some code to run on this machine. The virtual machine code, which is pretty complicated, does the following things:
- Initialization of the PCI-Bridge
- Disable the GPU PCIE JTAG test port
- initialize the serial port
- talk to the SMC to clear the "handshake"-bit
- initialize memory
- hopefully not: generate RROD if memory init fails
After that, the external (512MB) memory will be initialized and usable. 2BL then decrypts the 4BL into this memory. Memory encryption will already be enabled - no executable code is *ever* written unencrypted.
4BL ("CD")
This code is responsible for checking and unpacking 5BL, as well as applying update patches. First, the fuses are read to determine the console "Update Sequence", a number which basically counts the number of updates installed. Since updates are, in the same way as 2BL, paired to a console, this makes it possible to configure the console so that no old update will be used. So each update slot stores the maximum value of burned fuses (well, essentially the exact value). The base kernel also has an associated value, usually zero, but this can be changed in the 2BL pairing data block. This is what the timing attack increments, in order to revert to the 1888 kernel.
5BL ("HV/Kernel")
The HV and kernel are merged into a single image, which is compressed with a proprietary algorithm (LDIC).
6BL ("CF"), 7BL ("CG")
This is part of a system upgrade. Each console has a so-called "Base Kernel", which is the 1888 kernel which was available on launch back in 2005. Then there are two "update slots" - areas of 64k each (128k on Jasper), which contain a 6BL and 7BL. 6BL is code which applies the update, using a clever delta-compression. 7BL is the actual delta-compressed update, essentially a binary diff.
Oh, updates are >64k. So only the first 64k are actually stored in the update slots, the rest is stored in the filesystem as a special file. Since 6BL doesn't contain a filesystem parser, a blockmap is added in 6BL which points to the sectors which contain the rest of the update.
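The slot-plus-blockmap idea can be sketched like this. Only the 64k slot size is from the text; the 16 KB sector size, the free-sector list and all names are hypothetical, and the real on-flash format is certainly more involved.

```python
SLOT_SIZE = 64 * 1024    # update slot size (128k on Jasper)
SECTOR_SIZE = 16 * 1024  # HYPOTHETICAL sector size, for illustration only


def place_update(update, free_sectors, slot_size=SLOT_SIZE,
                 sector_size=SECTOR_SIZE):
    """Split an update into the slot part plus sectors listed in a blockmap.

    Returns (slot_data, sectors, blockmap): sectors maps a sector number
    to its chunk of data, blockmap lists the sector numbers in order.
    """
    slot_data, rest = update[:slot_size], update[slot_size:]
    chunks = [rest[i:i + sector_size]
              for i in range(0, len(rest), sector_size)]
    if len(chunks) > len(free_sectors):
        raise ValueError("not enough free sectors for the update")
    blockmap = list(free_sectors[:len(chunks)])
    return slot_data, dict(zip(blockmap, chunks)), blockmap


def read_update(slot_data, sectors, blockmap):
    """Reassemble the update by following the blockmap, no filesystem needed."""
    return slot_data + b"".join(sectors[n] for n in blockmap)


update = bytes(64 * 1024 + 20 * 1024)  # dummy 84 KB update
slot_data, sectors, blockmap = place_update(update, [40, 41, 42])
print(len(slot_data), blockmap)
```

The point of the blockmap is visible in `read_update`: it only needs sector numbers, so the loader never has to understand the filesystem the rest of the update lives in.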
Zero-Pairing
Now there is a special situation: If the 2BL pairing block is all-zero, the pairing block will not be checked. However, a bit is set so that the kernel doesn't boot the dashboard binary, but a special binary called "MfgBootLauncher", where "Mfg" probably stands for "Manufacturing". So this is a leftover of the production process, where the flash image is used on all hardware, probably also before any CPU-key has been programmed.
Abusing this feature lets us easily produce a flash image which runs on all hardware. However, 4BL won't look at update slots when it detects this mode, so we end up in the 1888 base kernel. And we can't run the dashboard, so it's impossible to escape this mode.
Previously, this has been deemed very uninteresting, because first the 1888 isn't exploitable by the KK exploit, and second because it's impossible to run the KK game anyway.
However, starting with 2BL version 1920, an interesting thing happened: The encryption key for 4BL is generated with the help of the CPU-key now. That means that without the CPU-key, it's not possible to decrypt the 4BL anymore. Note that each 2BL has exactly a single valid 4BL binary - 2BL contains a hardcoded hash for the 4BL, and doesn't use RSA.
However, when zero'ed pairing data is detected, the CPU-key is NOT used in this process, just like it was previously. That also means that you cannot just zero out the pairing data anymore - the 4BL would be decrypted with the wrong key then. Instead you need to decrypt the 4BL (which requires knowing the CPU key), and re-encrypt it with the old algorithm.
However, 1920 was susceptible to the timing attack - so a CPU-key recovery was possible on one console, which allowed us to decrypt the 1920 4BL. That 4BL shows a very interesting change: Whenever zero-pairing is detected, the update slots are not ignored anymore. Instead, if the update-slots are zero-paired as well, they are applied.
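The zero-pairing behaviour described so far boils down to a small decision table. This sketch only restates the prose: the function and key names are invented, and using a single version number to stand for the whole 2BL/4BL set is a simplification on my part.

```python
def boot_plan(pairing_block: bytes, slots_zero_paired: bool, cb_version: int):
    """Summarise what the boot chain does for a given pairing block.

    cb_version stands in for the 2BL/4BL set (e.g. 1888 or 1920).
    """
    zero_paired = not any(pairing_block)  # all bytes are zero

    if not zero_paired:
        # Normal console: pairing is checked, dashboard boots.
        return {"check_pairing": True,
                "launch": "dashboard",
                "use_update_slots": True}

    # Zero-paired: no pairing check, MfgBootLauncher instead of dashboard.
    # Before 1920 the update slots are ignored; from 1920 on they are
    # applied if they are zero-paired as well.
    use_slots = cb_version >= 1920 and slots_zero_paired
    return {"check_pairing": False,
            "launch": "MfgBootLauncher",
            "use_update_slots": use_slots}


print(boot_plan(bytes(16), True, 1888))
print(boot_plan(bytes(16), True, 1920))
```

The second call is the interesting case: still MfgBootLauncher, but now with update slots applied, which is what makes reaching a later kernel possible.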
This change allows us to boot any kernel, provided we have a (1920 and up) 2BL/4BL set which runs on that machine. This is very important, because we can now build an image which runs into the 4532 kernel, regardless of how many update fuses are set. However, the 2BL revocation process must be passed, so we are still not completely independent of the fuses. But since we use zero-pairing, the SMC hash doesn't matter anymore (there are other ways to work around the SMC hash problem, like the TA, but we get this for free). Still, we boot into the MfgBootLauncher (into the 4532 version now, which does a red/green blinking thingie - you'll notice once you see it, it's very unique and doesn't look like any RROD or so). But thanks to the SMC/JTAG hack described above, we can launch our attack from this state.
Newer consoles (which have the TA fix) don't run 1920 anymore. They run, for example, 1921. The problem is that we cannot run HV code on these machines, so we don't know the CPU key. However, when comparing the 1921 and 1920 2BL (which we can still decrypt), the only change is the addition of the timing attack fix (i.e. replacing two memcmp instances with a memdiff function). Also, we know the expected hash value of the decrypted 4BL. Based on a 1920 4BL, a guess at what had changed functionally, and the new size of the 4BL, we were able to guess the modifications, which yielded an image which passes the 2BL hash check. Note that this is not a hash collision - we merely derived the exact image by applying the changes between the 1920 2BL and the 1921 2BL to the 1920 4BL, yielding the 1921 4BL.
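For anyone wondering why swapping memcmp for another compare function stops a timing attack, here is the general idea in Python. This is not the console's code, and I am assuming that memdiff behaves like a constant-time compare; both functions and the step counter are illustrative only.

```python
def early_exit_compare(expected: bytes, guess: bytes):
    """memcmp-style compare: stops at the first differing byte.

    Returns (equal, steps). The number of steps leaks how many leading
    bytes of the guess were correct.
    """
    steps = 0
    for a, b in zip(expected, guess):
        steps += 1
        if a != b:
            return False, steps
    return len(expected) == len(guess), steps


def full_compare(expected: bytes, guess: bytes):
    """Constant-time style compare: always looks at every byte."""
    if len(expected) != len(guess):
        return False, 0
    diff = 0
    steps = 0
    for a, b in zip(expected, guess):
        steps += 1
        diff |= a ^ b  # accumulate differences without branching
    return diff == 0, steps


secret = b"AAAA"
print(early_exit_compare(secret, b"ABAA"))  # (False, 2)
print(full_compare(secret, b"ABAA"))        # (False, 4)
```

With the early-exit version an attacker can guess a hash one byte at a time by measuring how long the check runs; with the full compare every wrong guess takes the same time. In real Python code you would use `hmac.compare_digest` from the standard library rather than rolling your own.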
The 1921 2BL theoretically runs on all machines so far, even TA-proof ones. But it crashes on Zephyr, Falcon and Jasper. The reason is the VM code, which doesn't cover the different GPUs (Xenon has 90nm GPU, Zephyr and Falcon have 80nm, Jasper has 60nm, so there are 3 GPU revisions in total).
But the step from 1921 to, say, 4558, is even smaller. It's just the different version number, plus a slight difference in the memcpy code, which again can be ported over from 2BL.
Jasper's 67xx is a different thing, since this code adds support for the largeblock flash used in "Arcade"-Jasper units. We have used some magic to retrieve this code.
So we now have ALL 4BL versions. Isn't that great? It means that ALL machines can run the 4532 kernel. The good news is also that the 4532 kernel supports falcon consoles, and runs long enough to also work on jasper consoles (because we exploit way before the different GPU is touched at all).
Credits to Team-Xecuter.
RGH:
THE EXPLOIT:
Gligli was able to get the CPU to get past the CD authentication by pulsing the reset at just the right time, for just the right length. The idea is the successful authentication eventually boils down to a 'branch-if-register-equals-zero' instruction on a certain CPU register. If this register has been reset, it will think the check passed. The problem is other CPU registers have likely been reset by the glitch. If any of these registers contain values that are needed for continued booting, the system will crash. If no 'critical' registers got reset, the system will successfully continue loading the CD.
HOW ITS DONE:
Cjack's research discovered some methods of slowing down the CPU, including asserting CPU_PLL_BYPASS. Asserting this signal slows CPU execution down by a factor of 128. Gligli used this to his advantage to slow the CPU execution down enough that he could successfully exploit the system, originally using an external microcontroller. In order to make it more user-friendly, the exploit was ported to a CPLD, that ran off the 48 MHz SMC standby clock. This clock was chosen simply because it was convienient. The clock is fast enough that it provides sufficient temporal resolution to pulse the reset appropriately for the exploit.
HOW ITS DONE (SLIMS):
Unfortunately, Gligli and Tiros were unable to locate CPU_PLL_BYPASS on the Trinity system, and decided to take a different approach. Instead of using the CPU_PLL_BYPASS to slow down the core directly, they changed the frequency of the CPU reference clock (from the HANA) going into the CPU PLL which generates the core clock. So, if you slow down the CPU reference clock from 100 MHz to 100/128 ~= 781 Khz, then you've accomplished the same thing, the core will run the same speed as if you had used CPU_PLL_BYPASS. Unfortunately this doesn't work. Tiros found the HANA PLL wasn't stable at arbitrarily slow frequencies. It would wander and drift randomly around the target frequency. This meant the CPU execution speed was constantly changing, and when you tried to time the reset pulse, the CPU was rarely at the correct instruction anymore.
JTAG:
To understand this new hack, let's first look at what made the KK exploit possible: A fatal bug in the Hypervisor's Syscall Handler, introduced in the 4532 kernel update. For more details, take a look at [ Register or Signin to view external links. ] which explains the problem in great detail.
The KK exploit exploited the kernel bug by modifying an unsigned shader to do a series of so-called memory exports, an operation where the GPU can write the results of a pixel or vertex shader into physical memory. The shader was written to overwrite the Idle-thread context to make the kernel jump at a certain position in memory, with some registers under our control. In order to control all registers, a second step was necessary, this time by jumping into the interrupt restore handler. This finally allows all CPU general purpose registers to be filled with determined values. The program counter could be restored to a syscall instruction in the kernel, with register values prefilled so that they would trigger the exploit.
The exploit basically allows jumping into any 32-bit address in hypervisor space. To jump into an arbitrary location, we just used a "mtctr, bctr"-register pair in hypervisor, which would redirect execution flow into any 64-bit address. This is important, since we need to clear the upper 32bit (i.e., set the MSB to disable the HRMO), since the code we want to jump to is in unencrypted memory.
This code would usually load a second-stage loader, for example XeLL, into memory, and start it. XeLL would then attempt to catch all cpu threads (because just the primary thread is affected by our exploit), and load the user code, for example from DVD.
So, the following memory areas are involved:
- Idle Thread context, at 00130360 in physical memory
This stores the stack pointer (and some other stuff) when the idle thread was suspended. By changing the stack pointer, and then waiting for the kernel to switch to the idle thread, the stack pointer can be brought into our control. Part of the context switch is also a context restore, based on the new stack pointer.
- Context restore, part 1, arbitrary location, KK expl. uses 80130AF0
The thread-context restore doesn't restore all registers, but let's us control the NIP (the "next instruction" pointer). We setup NIP to point to the interrupt context restore, which does a SP-relative load of most registers.
- Context restore, part 2, same base location as part 1
We just re-use the same stack pointer, because the areas where the first context restore and the interrupt context restore load from do not overlap. The second context restore allows us to pre-set all registers with arbitrary 64 bit values.
- The HV offset, at 00002080 for syscall 0x46 on 4532
Because of the HV bug, we can write this offset into unencrypted memory, giving us the possibility to jump into any location in the hypervisor space (i.e. with a certain "encryption prefix"). We usually write 00000350 here, which points to a "mtctr %r4; bctr" instruction pair in hypervisor, which lets us jump to %r4.
- Our loader code, at an arbitrary location
This code will be executed from hypervisor. It's the first of our code which will be executed. %r4 on the syscall entry has to point to this code.
Only the the idle thread context and the HV offset have fixed addresses. It's easily possible to merge this so that only two distinct blocks needs to be written into memory, but it's not possible to merge this into a single block.
Fortunately, the NAND controller allows doing DMA reads where the payload data is split from the "ECC"-data. Each page has 512 bytes of payload, and 16 bytes of ECC data. Thus, a single DMA read can be used to load all required memory addresses. We chose the Payload to read the Idle Thread Context, the Context Restores and the loader code. The ECC data will carry the HV offset.
To to a DMA read, the following NAND registers need to be written:
ea00c01c Address for Payload
ea00c020 Adresss for ECC
ea00c00c address inside NAND
ea00c008 command: read DMA (07)
The System Management Controller (SMC) is a 8051 core inside the Southbridge. It manages the power sequencing, and is always active when the Xbox 360 has (standby or full) power applied. It controls the frontpanel buttons, has a Realtime clock, decodes IR, controls temperatures and fans and the DVDROM tray. It talks with the frontpanel board to set the LEDs. When the system is running, the kernel can communicate with the SMC, for example to query the realtime clock, open the dvd-tray etc. This happens over a bidirectional FIFO (at ea001080 / ea001090). See the XeLL SMC code for details.
The SMC can read the NAND, because it requires access to a special NAND page which contains a SMC config block. This block contains calibration information for the thermal diodes, and the thermal targets etc. The 8051 core has access to NAND registers, which are mapped into the 8051 SFRs. It uses the same protocol as the kernel uses, so it writes an address, does a "READ" command, and then reads the data out of the "DATA" registers.
It could also do a "READ (DMA)"-command. So by hacking the SMC, we could make the box do the exploit, without any shader - the SMC can access the NAND controller all the time, even when the kernel is running (though it will likely interfere with the kernel). So, just we just trigger the DMA read when the kernel has been loaded, and everything is fine.
Right?
Well, that would be too easy. While most NAND registers are mapped, tyhe DMA address registers (1c, 20) are not. We can DMA, but only to the default address of zero (or wherever the kernel last DMAed into). Fail.
The GPU, the (H)ANA (the "scaler" - which in fact doesn't scale at all, it's "just" a set of DACs, and, since Zephyr, a DVI/HDMI encoder), the Southbridge and the CPU have their JTAG ports exposed on the board. They are unpopulated headers, but the signals are there. CPU JTAG is a different (complex) story, and SB JTAG doesn't offset much funcationality. ANA JTAG is boring since the ANA doesn't sit on any interesting bus. That leaves GP
JTAG.
GPU JTAG was reverse-engineered until a point where arbitrary PCI writes are possible, up to a certain point. So that makes it possible to talk to each PCI device in the system, including the NAND controller. So we can simply use THAT instead of the SMC to start the DMA?
Right?
Well, not quite. The problem is that the "VM code", the code which does a lot of system initialization, like the memory (that code is also responsible for generating the 01xx "RROD"-Errors), sets a certain bit in some GPU
register, which disables the JTAG interface. The VM code is executed way before the kernel is active. So this is fail, too.
But the combination works - by programming the DMA target address via JTAG, and launching the attack via SMC. The attack can be launched as soon as the kernel is running, and quite early, it does query the SMC for the RTC. We abuse this call to start the attack instead, which is a perfect point for us.
But how do we run an exploitable kernel at all? Most machines are updated already. Let me refresh your knowledge about the boot process again:
1BL (Bootrom)
Buried deep inside the CPU die, this ~32kb of ROM code is responsible for reading the 2BL from NAND-flash and decrypts it into the embedded SRAM in the CPU. It verifies the hash of the decrypted image with a signed block at the beginning of the 2BL, and will stop execution of this hash mismatches. This code also contains a number of test functions, which can be activated by pulling the 5 "POST IN"-pins, which are available on the backside of the PCB. None of these tests looks particulary interesting (from an exploitation perspective) - they mostly seem to be related to the FSB (the bus between CPU and GPU). This code is fixed, and all systems use identical code here.
2BL ("CB")
This code is usually located at 0x8000 in NAND flash. It's decrypted by 1BL, and runs from internal SRAM.
It does a basic hardware initialization, and contains the "fuse check code", which verifies the "2BL version". The fuses store the expected version. The 2BL stores a "Version" and a "AllowedMask" (=bitfield), and this is usually stored at address 0x3B1 / 0x3B2..0x3B3.
Xenon Zephyr Falcon Jasper
2 0003 1888, 1901, 1902
4 1920 "new zeropair code"
5 0010 1921 4558 5760,5761,5770 6712 TA-fixed
It then verifies the pairing information stored in the 2BL header. Part of this verification is a checksum check of the NAND area which was used to load the SMC code from.
It also contains a virtual machine and some code to run on this machine. The virtual machine code, which is pretty complicated, does the following things:
- Initialization of the PCI-Bridge
- Disable the GPU PCIE JTAG test port
- initialize the serial port
- talk to the SMC to clear the "handshake"-bit
- initialize memory
- hopefully not: generate RROD if memory init fails
After that, the external (512MB) memory will be initialized and usable. 2BL then decrypts the 4BL into this memory. Memory encryption will already be enabled - no executable code is *ever* written unencrypted.
4BL ("CD")
This code is responsible for checking and unpacking 5BL, as well as applying update patches. First, the fuses are read to determine the console "Update Sequence", a number which basically counts the number of updates installed. Since updates are, in the same way as 2BL, paired to a console, this allows to configure the console in a way that no old update will be used. So each update slot stores the maximum value of burned fuses (well, essentially the exact value). The base kernel also has an associated value, usually zero, but this can be changed in the 2BL pairing data block. This is what the timing-attack increments, in order to revert to the 1888 kernel.
5BL ("HV/Kernel")
The HV and kernel are merged into a single image, which is compressed with a proprietary algorithm (LDIC).
6BL ("CF"), 7BL ("CG")
This is part of a system upgrade. Each console has a so-called "Base Kernel", which is the 1888 kernel which was available on launch back in 2005. Then there are two "update slots" - areas of 64k each (128k on Jasper), which contain a 6BL and 7BL. 6BL is code which applies the update, using a clever delta-compression. 7BL is the actual delta-compressed update, essentially a binary diff.
Oh, updates are >64k. So only the first 64k are actually stored in the update slots, the rest is stored in the filesystem as a special file. Since 6BL doesn't contain a filesystem parser, a blockmap is added in 6BL which points to the sectors which contain the rest of the update.
Zero-Pairing
Now there is a special situation: If the 2BL pairing block is all-zero, the pairing block will not be checked. However, a bit is set so that the kernel doesn't boot the dashboard binary, but a special binary called "MfgBootLauncher", where "Mfg" probably stands for "Manufacturing". So this is a leftover of the production process, where the flash image is used on all hardware, probably also before any CPU-key has been programmed.
By abusing this feature, this allows us easily to produce a flash image which runs on all hardware. However, 4BL won't look at update slots when it detects this mode, so we end up in the 1888 base kernel. And we can't run the dashboard, so it's impossible to escape this mode.
Previously, this has been deemed very uninteresting, because first the 1888 isn't exploitable by the KK exploit, and second because it's impossible to run the KK game anyway.
However, starting with 2BL version 1920, an interesting thing happened: The encryption key for 4BL is generated with the help of the CPU-key now. That means that without the CPU-key, it's not possible to decrypt the 4BL anymore. Note that each 2BL has exactly a single valid 4BL binary - 2BL contains a hardcoded hash for the 4BL, and doesn't use RSA.
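The "hardcoded hash instead of RSA" point can be sketched like this. SHA-1 is used purely as a stand-in (the post does not name the hash function), and accept_4bl is a hypothetical name.

```python
import hashlib
import hmac


def accept_4bl(candidate: bytes, expected_digest: bytes) -> bool:
    """Accept a decrypted 4BL only if it hashes to the digest baked into 2BL.

    Assumption: SHA-1 stands in for whatever hash the real 2BL uses.
    """
    digest = hashlib.sha1(candidate).digest()
    return hmac.compare_digest(digest, expected_digest)


if __name__ == "__main__":
    good = b"example 4BL image"
    baked_in = hashlib.sha1(good).digest()
    print(accept_4bl(good, baked_in))            # the one valid image
    print(accept_4bl(good + b"\x00", baked_in))  # anything else is rejected
```

Because the digest is fixed inside 2BL, there is exactly one image that passes, which is why each 2BL has a single valid 4BL.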
However, when zeroed pairing data is detected, the CPU-key is NOT used in this process, which is how it worked previously. That also means that you cannot just zero out the pairing data anymore - the 4BL would be decrypted with the wrong key then. Instead you need to decrypt the 4BL (which requires knowing the CPU key), and re-encrypt it with the old algorithm.
However, 1920 was susceptible to the timing attack - so a CPU-key recovery was possible on one console, which allowed us to decrypt the 1920 4BL. That 4BL shows a very interesting change: Whenever zero-pairing is detected, the update slots are not ignored anymore. Instead, if the update-slots are zero-paired as well, they are applied.
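The behaviour change can be summed up in a small decision function. This is only a model of what the post describes: the names, the 16-byte pairing blocks in the example, and the return shape are made up, and the normal (paired) path is reduced to "all slots are candidates", leaving out the pairing and sequence checks.

```python
def is_zero(block: bytes) -> bool:
    """True if every byte of the pairing block is zero."""
    return not any(block)


def plan_boot(bl2_version, pairing, slot_pairings):
    """Return (launcher, indexes of update slots that get considered)."""
    if not is_zero(pairing):
        # Normal retail path; the usual checks are not modeled here.
        return "dashboard", list(range(len(slot_pairings)))
    if bl2_version < 1920:
        # Old behaviour: zero-pairing ignores the update slots entirely.
        return "MfgBootLauncher", []
    # 1920 and up: zero-paired update slots are applied as well.
    usable = [i for i, p in enumerate(slot_pairings) if is_zero(p)]
    return "MfgBootLauncher", usable


if __name__ == "__main__":
    zero, paired = bytes(16), b"\x01" * 16
    print(plan_boot(1888, zero, [zero, zero]))
    print(plan_boot(1920, zero, [zero, paired]))
```

Either way a zero-paired image lands in MfgBootLauncher; the difference is only which kernel it is running underneath.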
This change allows us to boot any kernel, provided we have a (1920 and up) 2BL/4BL set which runs on that machine. This is very important, because we can now build an image which runs into the 4532 kernel, regardless of how many update fuses are set. However, the 2BL revocation process must still be passed, so we are not completely independent of the fuses. But since we use zero-pairing, the SMC hash doesn't matter anymore (there are other ways to work around the SMC hash problem, like the TA, but we get this for free). Still, we boot into the MfgBootLauncher (into the 4532 version now, which does a red/green blinking thingie - you'll notice once you see it, it's very unique and doesn't look like any RROD or so). But thanks to the SMC/JTAG hack described above, we can launch our attack from this state.
Newer consoles (which have the TA fix) don't run 1920 anymore. They run, for example, 1921. The problem is that we cannot run HV code on these machines, so we don't know the CPU key. However, when comparing the 1921 and 1920 2BL (which we can still decrypt), the only change is the addition of the timing attack fix (i.e. replacing two memcmp instances with a memdiff function). Also, we know the expected hash value of the decrypted 4BL. Based on a 1920 4BL, a guess at what has changed functionally, and the new size of the 4BL, we were able to guess the modifications, which yields an image which passes the 2BL hash check. Note that this is not a hash collision - we merely derived the exact image by applying the changes between the 1920 2BL and the 1921 2BL to the 1920 4BL, yielding the 1921 4BL.
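For anyone wondering why swapping memcmp for a "memdiff" kills the timing attack, here is a generic illustration. It is not the actual 2BL code (which the post does not show); both functions also return how many bytes they looked at, so the leak is visible without a stopwatch.

```python
def early_exit_equal(a: bytes, b: bytes):
    """memcmp-style compare: stops at the first mismatching byte.

    Returns (equal, bytes_examined). The second value depends on how many
    leading bytes were correct, which is what a timing attack measures.
    """
    if len(a) != len(b):
        return False, 0
    for i in range(len(a)):
        if a[i] != b[i]:
            return False, i + 1
    return True, len(a)


def constant_time_equal(a: bytes, b: bytes):
    """memdiff-style compare: always walks the whole buffer."""
    if len(a) != len(b):
        return False, 0
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y  # accumulate differences, never branch on them
    return diff == 0, len(a)


if __name__ == "__main__":
    secret = bytes([1, 2, 3, 4])
    guess = bytes([1, 2, 0xFF, 4])
    print(early_exit_equal(secret, guess))     # work done reveals the position
    print(constant_time_equal(secret, guess))  # same work for every guess
```

With the early-exit version an attacker can find the right value one byte at a time by watching how long the check takes; with the second version every wrong guess looks the same.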
The 1921 2BL theoretically runs on all machines so far, even TA-proof ones. But it crashes on Zephyr, Falcon and Jasper. The reason is the VM code, which doesn't cover the different GPUs (Xenon has a 90nm GPU, Zephyr and Falcon have 80nm, Jasper has 65nm, so there are 3 GPU revisions in total).
But the step from 1921 to, say, 4558, is even smaller. It's just the different version number, plus a slight difference in the memcpy code, which again can be ported over from 2BL.
Jasper's 67xx is a different thing, since this code adds support for the largeblock flash used in "Arcade"-Jasper units. We have used some magic to retrieve this code.
So we now have ALL 4BL versions. Isn't that great? It means that ALL machines can run the 4532 kernel. The good news is also that the 4532 kernel supports Falcon consoles, and runs long enough to also work on Jasper consoles (because we exploit way before the different GPU is touched at all).
Credits to Team-Xecuter.
- 3useful
- 0not useful
#4. Posted:
Status: Offline
Joined: May 05, 2011 (13 Year Member)
Posts: 863
Reputation Power: 39
Thank you very much, sir.
I feel more educated
Edit: Oh wow, I'll read through that one in a minute xD
Thank you
- 0useful
- 0not useful
#5. Posted:
Status: Offline
Joined: Jan 24, 2014 (10 Year Member)
Posts: 138
Reputation Power: 8
TGK wrote Technical Explanation:
Alot of information... would read but...
Aint nobody got time for that! xD
RGH = 10secs-5mins+ boot time
Jtag = Instant boot time cx
simple as that
(i'm aware you said "technical explanation" but Jesus! thats alot of info) o.o
- 0useful
- 0not useful
#6. Posted:
Status: Offline
Joined: Oct 03, 2013 (11 Year Member)
Posts: 1,409
Reputation Power: 64
BonkersLobbies wrote TGK wrote snip
Alot of information... would read but...
Aint nobody got time for that! xD
RGH = 10secs-5mins+ boot time
Jtag = Instant boot time cx
simple as that
(i'm aware you said "technical explanation" but Jesus! thats alot of info) o.o
It's a good read if you want to get educated about the technicals of how it works.
- 0useful
- 0not useful