

# Intel<sup>®</sup> Open Source HD Graphics Programmers' Reference Manual (PRM)

### Volume 14: Observability Performance Counters

For the 2014-2015 Intel Atom<sup>™</sup> Processors, Celeron<sup>™</sup> Processors and Pentium<sup>™</sup> Processors based on the "Cherry Trail/Braswell" Platform (Cherryview/Braswell graphics)

October 2015, Revision 1.1



### **Creative Commons License**

**You are free to Share** - to copy, distribute, display, and perform the work under the following conditions:

- **Attribution.** You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
- **No Derivative Works.** You may not alter, transform, or build upon this work.

### **Notices and Disclaimers**

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Implementations of the I2C bus/protocol may require licenses from various entities, including Philips Electronics N.V. and North American Philips Corporation.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.

\* Other names and brands may be claimed as the property of others.

**Copyright © 2015, Intel Corporation. All rights reserved.** 



## **Table of Contents**

- Trace
- Performance Visibility
- Motivation For Hardware-Assisted Performance Visibility
- HW Support
- Performance Counter Registers
- Performance Counter Reporting
- MI_REPORT_PERF_COUNT
- Aggregating Counters
- Flexible EU Event Counters



## Trace

This section contains the following contents:

• Performance Visibility

### **Performance Visibility**

### **Motivation For Hardware-Assisted Performance Visibility**

As the focus on GFX performance and programmability has increased over time, the need for hardware (HW) support to rapidly identify bottlenecks in HW and efficiently tune the work sent to it has become correspondingly important. This part of the PRM describes the HW support for Performance Visibility.

#### **HW Support**

This section describes the reporting counters and registers that provide hardware support for Performance Visibility.

#### **Performance Counter Registers**

The following Performance Statistics registers must be part of the power context:

- OAPERF\_A0 Aggregate Perf Counter A0
- OAPERF\_A0\_UPPER Aggregate Perf Counter A0 Upper DWord
- OAPERF\_A1 Aggregate Perf Counter A1
- OAPERF\_A1\_UPPER Aggregate Perf Counter A1 Upper DWord
- OAPERF\_A2 Aggregate Perf Counter A2
- OAPERF\_A2\_UPPER Aggregate Perf Counter A2 Upper DWord
- OAPERF\_A3 Aggregate Perf Counter A3
- OAPERF\_A3\_UPPER Aggregate Perf Counter A3 Upper DWord
- OAPERF\_A4 Aggregate Perf Counter A4
- OAPERF\_A4\_UPPER Aggregate Perf Counter A4 Upper DWord
- OAPERF\_A5 Aggregate Perf Counter A5
- OAPERF\_A5\_UPPER Aggregate Perf Counter A5 Upper DWord
- OAPERF\_A6 Aggregate Perf Counter A6
- OAPERF\_A6\_UPPER Aggregate Perf Counter A6 Upper DWord
- OAPERF\_A7 Aggregate Perf Counter A7
- OAPERF\_A7\_UPPER Aggregate Perf Counter A7 Upper DWord
- OAPERF\_A8 Aggregate Perf Counter A8
- OAPERF\_A8\_UPPER Aggregate Perf Counter A8 Upper DWord
- OAPERF\_A9 Aggregate Perf Counter A9



- OAPERF\_A9\_UPPER Aggregate Perf Counter A9 Upper DWord
- OAPERF\_A10 Aggregate Perf Counter A10
- OAPERF\_A10\_UPPER Aggregate Perf Counter A10 Upper DWord
- OAPERF\_A11 Aggregate Perf Counter A11
- OAPERF\_A11\_UPPER Aggregate Perf Counter A11 Upper DWord
- OAPERF\_A12 Aggregate Perf Counter A12
- OAPERF\_A12\_UPPER Aggregate Perf Counter A12 Upper DWord
- OAPERF\_A13 Aggregate Perf Counter A13
- OAPERF\_A13\_UPPER Aggregate Perf Counter A13 Upper DWord
- OAPERF\_A14 Aggregate Perf Counter A14
- OAPERF\_A14\_UPPER Aggregate Perf Counter A14 Upper DWord
- OAPERF\_A15 Aggregate Perf Counter A15
- OAPERF\_A15\_UPPER Aggregate Perf Counter A15 Upper DWord
- OAPERF\_A16 Aggregate Perf Counter A16
- OAPERF\_A16\_UPPER Aggregate Perf Counter A16 Upper DWord
- OAPERF\_A17 Aggregate Perf Counter A17
- OAPERF\_A17\_UPPER Aggregate Perf Counter A17 Upper DWord
- OAPERF\_A18 Aggregate Perf Counter A18
- OAPERF\_A18\_UPPER Aggregate Perf Counter A18 Upper DWord
- OAPERF\_A19 Aggregate Perf Counter A19
- OAPERF\_A19\_UPPER Aggregate Perf Counter A19 Upper DWord
- OAPERF\_A20 Aggregate Perf Counter A20
- OAPERF\_A20\_UPPER Aggregate Perf Counter A20 Upper DWord
- OAPERF\_A21 Aggregate Perf Counter A21
- OAPERF\_A21\_UPPER Aggregate Perf Counter A21 Upper DWord
- OAPERF\_A22 Aggregate Perf Counter A22
- OAPERF\_A22\_UPPER Aggregate Perf Counter A22 Upper DWord
- OAPERF\_A23 Aggregate Perf Counter A23
- OAPERF\_A23\_UPPER Aggregate Perf Counter A23 Upper DWord
- OAPERF\_A24 Aggregate Perf Counter A24
- OAPERF\_A24\_UPPER Aggregate Perf Counter A24 Upper DWord
- OAPERF\_A25 Aggregate Perf Counter A25
- OAPERF\_A25\_UPPER Aggregate Perf Counter A25 Upper DWord
- OAPERF\_A26 Aggregate Perf Counter A26



- OAPERF\_A26\_UPPER Aggregate Perf Counter A26 Upper DWord
- OAPERF\_A27 Aggregate Perf Counter A27
- OAPERF\_A27\_UPPER Aggregate Perf Counter A27 Upper DWord
- OAPERF\_A28 Aggregate Perf Counter A28
- OAPERF\_A28\_UPPER Aggregate Perf Counter A28 Upper DWord
- OAPERF\_A29 Aggregate Perf Counter A29
- OAPERF\_A29\_UPPER Aggregate Perf Counter A29 Upper DWord
- OAPERF\_A30 Aggregate Perf Counter A30
- OAPERF\_A30\_UPPER Aggregate Perf Counter A30 Upper DWord
- OAPERF\_A31 Aggregate\_Perf\_Counter\_A31
- OAPERF\_A31\_UPPER Aggregate Perf Counter A31 Upper DWord
- OAPERF\_A32 Aggregate\_Perf\_Counter\_A32
- OAPERF\_A33 Aggregate\_Perf\_Counter\_A33
- OAPERF\_A34 Aggregate\_Perf\_Counter\_A34
- OAPERF\_A35 Aggregate\_Perf\_Counter\_A35
- OAPERF\_B0 Boolean\_Counter\_B0
- OAPERF\_B1 Boolean\_Counter\_B1
- OAPERF\_B2 Boolean\_Counter\_B2
- OAPERF\_B3 Boolean\_Counter\_B3
- OAPERF\_B4 Boolean\_Counter\_B4
- OAPERF\_B5 Boolean\_Counter\_B5
- OAPERF\_B6 Boolean\_Counter\_B6
- OAPERF\_B7 Boolean\_Counter\_B7



#### **Performance Counter Reporting**

When either the MI\_REPORT\_PERF\_COUNT command is received or the internal report trigger logic fires, a snapshot of the performance counter values is written to memory. The format used by HW for such reports is selected using the Counter Select field within the OACONTROL register. The organization and number of report formats vary per project and are detailed in the following section. In the following layouts, the RPT\_ID is always stored in the lowest addressed DWORD.
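
Because the RPT\_ID always occupies the lowest addressed DWORD, software can identify a report before decoding the rest of it. The following is a minimal sketch only: the helper name `oa_report_id` is hypothetical, and it assumes the report is read on a little-endian host (such as x86) from a CPU-visible mapping of the report buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not defined by the PRM): return the RPT_ID of one
 * OA report. `report` must point at the first byte of the report, i.e.
 * its lowest addressed DWORD. */
static uint32_t oa_report_id(const void *report)
{
    uint32_t rpt_id;

    assert(report != NULL);
    /* memcpy avoids alignment and strict-aliasing problems on raw buffers. */
    memcpy(&rpt_id, report, sizeof rpt_id);
    return rpt_id;
}
```

All remaining DWORDs of the report depend on the format chosen through the Counter Select field, so this sketch deliberately decodes nothing beyond the first DWORD.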

OA contains logic to control when performance counter values are reported to memory. This functionality is controlled using the OA report trigger and OA start trigger registers. More detailed register descriptions are included in the Hardware Programming interface. The block diagram below illustrates the logic these registers control.



Note that counters which are 40 bits wide are split in the report format into low DWORD and high byte chunks, both for simplicity of HW implementation and for SW-friendly alignment of report data. The performance counter read that is logically done before writing out report data for these 40-bit counters is guaranteed to be an atomic operation; the counter data is simply swizzled as it is packed into the report.
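
As an illustration of the low DWORD / high byte split, the sketch below recombines the two pieces into a single 40-bit value and computes the difference between two snapshots. The function and macro names are hypothetical, the position of each piece inside a report depends on the selected report format and is not shown, and the modulo-2^40 wraparound handling is an assumption that this section does not state explicitly.

```c
#include <assert.h>
#include <stdint.h>

#define OA_COUNTER_BITS 40
#define OA_COUNTER_MASK ((UINT64_C(1) << OA_COUNTER_BITS) - 1)

/* Rebuild a 40-bit counter from the low DWORD and high byte of a report. */
static uint64_t oa_counter40(uint32_t low_dword, uint8_t high_byte)
{
    return ((uint64_t)high_byte << 32) | low_dword;
}

/* Difference between two snapshots of the same counter.
 * Assumption: the counter wraps modulo 2^40, so masking the unsigned
 * subtraction tolerates at most one wrap between the snapshots. */
static uint64_t oa_counter40_delta(uint64_t start, uint64_t end)
{
    assert(start <= OA_COUNTER_MASK && end <= OA_COUNTER_MASK);
    return (end - start) & OA_COUNTER_MASK;
}
```

Because the hardware read is atomic, the two pieces taken from one report always belong to the same counter value, so no retry loop is needed when recombining them.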



#### MI\_REPORT\_PERF\_COUNT


#### **Aggregating Counters**

The table below describes the desired high-level functionality of each of the aggregating counters.

Note that there is no counter of 2x2s sent to the pixel shader. This is based on the assumption that the pixel shader invocation pipeline statistics counter also increments for partially lit 2x2s, so a duplicate performance counter is not required.

| Counter # | Event                                         | Description                                                                  |
|-----------|-----------------------------------------------|------------------------------------------------------------------------------|
| A0        | Render Engine Busy                            | Render engine is not idle.<br>GPU Busy aggregate counter doesn't increment under the following conditions:<br>1. Context Switch in Progress.<br>2. GPU stalled on executing MI_WAIT_FOR_EVENT.<br>3. GPU stalled on executing MI_SEMAPHORE_MBOX.<br>4. RCS idle but other parts of GPU active (e.g. only media engines active). |
| A1        | # of Vertex Shader Threads Dispatched         | Count of VS threads dispatched to EUs                                        |
| A2        | # of Hull Shader Threads Dispatched           | Count of HS threads dispatched to EUs                                        |
| A3        | # of Domain Shader Threads Dispatched         | Count of DS threads dispatched to EUs                                        |
| A4        | # of GPGPU Threads Dispatched                 | Count of GPGPU threads dispatched to EUs                                     |
| A5        | # of Geometry Shader Threads Dispatched       | Count of GS threads dispatched to EUs                                        |
| A6        | # of Pixel Shader Threads Dispatched          | Count of PS threads dispatched to EUs                                        |
| A7        | Aggregating EU counter 0                      | User-defined (details in <u>Flexible EU Event Counters</u> section)          |
| A8        | Aggregating EU counter 1                      | User-defined (details in <u>Flexible EU Event Counters</u> section)          |
| A9        | Aggregating EU counter 2                      | User-defined (details in <u>Flexible EU Event Counters</u> section)          |
| A10       | Aggregating EU counter 3                      | User-defined (details in <u>Flexible EU Event Counters</u> section)          |
| A11       | Aggregating EU counter 4                      | User-defined (details in <u>Flexible EU Event Counters</u> section)          |
| A12       | Aggregating EU counter 5           | User-defined (details in <u>Flexible EU Event Counters</u> section)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| A13       | Aggregating EU counter 6           | User-defined (details in <u>Flexible EU Event Counters</u> section) |
| A14       | Aggregating EU counter 7           | User-defined (details in <u>Flexible EU Event Counters</u> section)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| A15       | Aggregating EU counter 8           | User-defined (details in <u>Flexible EU Event Counters</u> section)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| A16       | Aggregating EU counter 9           | User-defined (details in <u>Flexible EU Event Counters</u> section)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| A17       | Aggregating EU counter 10          | User-defined (details in <u>Flexible EU Event Counters</u> section) |
| A18       | Aggregating EU counter 11          | User-defined (details in <u>Flexible EU Event Counters</u> section)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| A19       | Aggregating EU counter 12          | User-defined (details in <u>Flexible EU Event Counters</u> section)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| A20       | Aggregating EU counter 13          | User-defined (details in <u>Flexible EU Event Counters</u> section)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| A21       | 2x2s Rasterized                    | Count of the number of samples of 2x2 pixel blocks generated from the input geometry before any pixel-level tests have been applied. (Please note that 2x2s may be in terms of pixels or in terms of samples depending on project but are consistent between A21-A27.)                                                                                                                                                                                                                                                                                                                                                                                               |
| A22       | 2x2s Failing Fast pre-PS Tests     | Count of the number of samples failing fast "early" (i.e. before pixel shader execution) tests (counted at 2x2 granularity). (Please note that 2x2s may be in terms of pixels or in terms of samples depending on project but are consistent between A21-A27.) |
| A23       | 2x2s Failing Slow pre-PS Tests     | Count of the number of samples failing slow "early" (i.e. before pixel shader execution) tests (counted at 2x2 granularity). (Please note that 2x2s may be in terms of pixels or in terms of samples depending on project but are consistent between A21-A27.) If a 2x2 sample partially fails the Z/STC test (i.e. some pixels fail and some pixels pass), the OA slow fail counter value will be incorrect. |
| A24       | 2x2s Killed in PS                  | Number of samples entirely killed in the pixel shader as a result of explicit instructions in the kernel (counted at 2x2 granularity). (Please note that 2x2s may be in terms of pixels or in terms of samples depending on project but are consistent between A21-A27.)<br>Behavior of this counter changes when MSAA is enabled based on PS dispatch mode (per-sample versus per-pixel). This leads to discrepancies in how A24/A25 increment versus how A21-A23 and A26/A27 increment when both MSAA and per-pixel PS dispatch are enabled.<br>Counter may be inaccurate when the pixel shader outputs an output mask (e.g. DX11 oMask declaration). |
| A25       | 2x2s Failing post-PS Tests                  | Number of samples that entirely fail "late" tests (i.e. tests that can only be performed after pixel shader execution). Counted at 2x2 granularity. (Please note that 2x2s may be in terms of pixels or in terms of samples depending on project but are consistent between A21-A27.)<br>Counter may be inaccurate when the pixel shader is allowed to modify the output mask (e.g. DX11 oMask declaration).<br>Behavior of this counter changes when MSAA is enabled based on PS dispatch mode (per-sample versus per-pixel). This leads to discrepancies in how A24/A25 increment versus how A21-A23 and A26/A27 increment when both MSAA and per-pixel PS dispatch are enabled. |
| A26       | 2x2s Written To Render Target               | Number of samples that are written to render target (counted at 2x2 granularity). MRT case will report multiple writes per 2x2 processed by the pixel shader. (Please note that 2x2s may be in terms of pixels or in terms of samples depending on project but are consistent between A21-A27.) |
| A27       | Blended 2x2s Written to Render Target       | Number of blendable samples that are written to render target (counted at 2x2 granularity). MRT case will report multiple writes per 2x2 processed by the pixel shader. (Please note that 2x2s may be in terms of pixels or in terms of samples depending on project but are consistent between A21-A27.) |
| A28       | 2x2s Requested from Sampler                 | Aggregated total 2x2 texel blocks requested from all EUs to all instances of sampler logic. |
| A29       | Sampler L1 Misses                           | Aggregated misses from all sampler L1 caches. Please note that the number of L1 accesses varies with requested filtering mode and in other implementation specific ways. Hence it is not possible in general to draw a direct relationship between A28 and A29. However, a high number of sampler L1 misses relative to texel 2x2s requested frequently degrades sampler performance. |
| A30       | SLM Reads                                   | Total read requests from an EU to SLM (including reads generated by atomic operations).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| A31       | SLM Writes                                  | Total write requests from an EU to SLM (including writes generated by atomic operations).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| A32       | Other Shader Memory Accesses                | Reserved, can generate per HDC version by looking at (hdc_cput == 1) && (hdc_dest[2:0] == 0b000 \|\| hdc_dest[2:0] == 0b010). |
| A32       | Other Shader Memory Accesses                | Aggregated total requests from all EUs to memory surfaces other than render target or texture surfaces (e.g. shader constants). |
| A34       | Atomic Accesses                             | Aggregated total atomic accesses from all EUs. This counter increments on atomic accesses to both SLM and URB.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| A35       | Barrier Messages                            | Aggregated total completed barriers (one per barrier).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| A35       | Barrier Messages                            | Aggregated total kernel barrier messages from all EUs (one per thread in barrier). |



#### **Flexible EU Event Counters**

EU performance events are often most interesting when aggregated across all EUs, and many of them apply only to certain APIs (e.g. hull shader kernel statistics apply only when running a DX11+ workload). CHV, BSW therefore adds some flexibility to the aggregated counters coming from the EU array.

The following block diagram shows the high-level flow that generates each flexible EU event.

Note that programming flexible EU events differently on different EUs is not supported, because the output from each EU is eventually merged into a single OA counter anyway.





#### **Supported Increment Events**

| Increment Event                                                       | Encoding | Notes                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
|-----------------------------------------------------------------------|----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| EU FPU0 Pipeline<br>Active                                            | 0b0000   | Signal that is high on every EU clock where the EU FPU0 pipeline is actively executing a Gen ISA instruction.<br>Please note that FPU0 in this EU is the closest match to previous Gen EU's FPU pipe. |
| EU FPU1 Pipeline<br>Active                                            | 0b0001   | Signal that is high on every EU clock where the EU FPU1 pipeline is actively executing a Gen ISA instruction.<br>Please note that FPU1 in this EU is the closest match to previous Gen EU's EM pipe. |
| EU SEND Pipeline<br>Active                                            | 0b0010   | Signal that is high on every EU clock where the EU send pipeline is actively executing a Gen ISA instruction. Only fine event filters 0b0000, 0b0101, 0b0110, 0b0111, 0b1000, 0b1001, and 0b1010 are supported with this increment event. |
| EU FPU0 & FPU1<br>Pipelines<br>Concurrently Active                    | 0b0011   | Signal that is high on every EU clock where the EU FPU0 and FPU1 pipelines are both actively executing a Gen ISA instruction. Only coarse event filters 0b0000, 0b0111, and 0b1000 are supported with this increment event. Only fine event filters 0b0000, 0b0111, 0b1000, 0b1001, and 0b1010 are supported with this increment event. |
| Some EU Pipeline<br>Active                                            | 0b0100   | Signal that is high on every EU clock where at least one EU pipeline is actively executing a Gen ISA instruction. Only coarse event filters 0b0000, 0b0111, and 0b1000 are supported with this increment event. Only fine event filters 0b0000, 0b0101, 0b0110, 0b0111, 0b1000, 0b1001, and 0b1010 are supported with this increment event. |
| At Least 1 Thread<br>Loaded But No EU<br>Pipeline Active              | 0b0101   | Signal that is high on every EU clock where at least one thread is loaded but no EU pipeline is actively executing a Gen ISA instruction. Only coarse event filters 0b0000, 0b0111, and 0b1000 are supported with this increment event. Only fine event filters 0b0000, 0b0111, 0b1000, 0b1001, and 0b1010 are supported with this increment event. |
| Threads loaded<br>integrator == max<br>threads for current<br>HW SKU  | 0b1000   | Implies an accumulator that increases every EU clock by the number of loaded threads. The signal pulses high for one clock when the accumulator exceeds a multiple of the number of thread slots (e.g. for an 8-thread EU, the signal pulses high on every clock where the increment causes a 3-bit accumulator to overflow). Only coarse event filters 0b0000, 0b0111, and 0b1000 are supported with this increment event. Only fine event filters 0b0000, 0b0111, 0b1000, 0b1001, and 0b1010 are supported with this increment event. |
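
The "threads loaded integrator" event is easier to follow with a small model. The Python sketch below is one possible reading of the table note, not a description of the hardware: the function name `integrator_pulses` and the parameter `thread_slots` are hypothetical, and it assumes that the accumulator keeps its remainder after each overflow.

```python
def integrator_pulses(loaded_per_clock, thread_slots=8):
    """Model the pulse signal of the threads-loaded integrator.

    loaded_per_clock: number of loaded threads on each EU clock.
    thread_slots: thread slots per EU (8 matches the 3-bit example).
    Assumption: the accumulator wraps and keeps the remainder on overflow.
    """
    acc = 0
    pulses = []
    for loaded in loaded_per_clock:
        if not 0 <= loaded <= thread_slots:
            raise ValueError("loaded thread count out of range")
        acc += loaded                  # add this clock's loaded threads
        pulse = acc >= thread_slots    # the increment overflowed the accumulator
        if pulse:
            acc -= thread_slots        # keep only the wrapped remainder
        pulses.append(pulse)
    return pulses


# A fully occupied EU pulses every clock; a half-occupied EU every other clock.
full = sum(integrator_pulses([8] * 10))
half = sum(integrator_pulses([4] * 10))
```

Under these assumptions the number of pulses is roughly the sum of loaded threads divided by the number of thread slots, so a counter driven by this event would track average thread occupancy rather than raw clocks.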



#### **Supported Coarse Event Filters**

| Coarse Event Filter                     | Encoding | Notes                                                                                                                             |
|-----------------------------------------|----------|-----------------------------------------------------------------------------------------------------------------------------------|
| No mask                                 | 0b0000   | Never masks increment event.                                                                                                      |
| Currently executing thread came from VS | 0b0001   | Masks increment event unless the FFID which dispatched the currently executing thread equals FFID of VS.                          |
| Currently executing thread came from HS | 0b0010   | Masks increment event unless the FFID which dispatched the currently executing thread equals FFID of HS.                          |
| Currently executing thread came from DS | 0b0011   | Masks increment event unless the FFID which dispatched the currently executing thread equals FFID of DS.                          |
| Currently executing thread came from GS | 0b0100   | Masks increment event unless the FFID which dispatched the currently executing thread equals FFID of GS.                          |
| Currently executing thread came from PS | 0b0101   | Masks increment event unless the FFID which dispatched the currently executing thread equals FFID of PS.                          |
| Currently executing thread came from TS | 0b0110   | Masks increment event unless the FFID which dispatched the currently executing thread equals FFID of TS.                          |
| Row = 0                                 | 0b0111   | Masks increment event unless the row ID for this EU is 0 (the control register is in the TDL, so the row ID only has to be checked within the quarter-slice). |
| Row = 1                                 | 0b1000   | Masks increment event unless the row ID for this EU is 1 (the control register is in the TDL, so the row ID only has to be checked within the quarter-slice). |
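
The coarse filter table can be read as a predicate that decides whether an increment event is let through. The following Python sketch shows that reading. The names `COARSE_FILTERS` and `coarse_filter_passes` are hypothetical, and the dispatching fixed function is represented by a short string such as `"VS"` because the actual FFID values are not given in this section.

```python
# Coarse event filter encodings from the table above.
# None means "no mask"; otherwise (kind, value) describes the condition.
COARSE_FILTERS = {
    0b0000: None,
    0b0001: ("stage", "VS"),
    0b0010: ("stage", "HS"),
    0b0011: ("stage", "DS"),
    0b0100: ("stage", "GS"),
    0b0101: ("stage", "PS"),
    0b0110: ("stage", "TS"),
    0b0111: ("row", 0),
    0b1000: ("row", 1),
}


def coarse_filter_passes(encoding, dispatch_stage, row_id):
    """Return True if the increment event is NOT masked by the coarse filter.

    dispatch_stage: stage that dispatched the executing thread (assumed name).
    row_id: row ID of this EU.
    """
    if encoding not in COARSE_FILTERS:
        raise ValueError(f"undefined coarse filter encoding: {encoding:#06b}")
    rule = COARSE_FILTERS[encoding]
    if rule is None:
        return True                      # "No mask" never masks the event
    kind, value = rule
    if kind == "stage":
        return dispatch_stage == value   # thread must come from this stage
    return row_id == value               # EU must sit in this row
```

For example, `coarse_filter_passes(0b0101, "PS", 0)` is true, while the same filter applied to a thread dispatched by VS masks the event.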



#### **Fine Event Filters**

| Fine Event Filter                                          | Encoding | Notes                                                                                                                                                                                                                                                                                                                                                                             |
|------------------------------------------------------------|----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| None                                                       | 0b0000   | Never mask increment event.                                                                                                                                                                                                                                                                                                                                                       |
| Cycles where<br>hybrid instructions<br>are being executed  | 0b0001   | Masks increment event unless the instruction(s) being executed on the pipeline(s) selected by the increment event are hybrid instructions.<br>Filter behaves unreliably when shader ISA uses 64-bit immediate values. |
| Cycles where<br>ternary instructions<br>are being executed | 0b0010   | Masks increment event unless the instruction(s) being executed on the pipeline(s) selected by the increment event are ternary instructions.                                                                                                                                                                                                                                       |
| Cycles where<br>binary instructions<br>are being executed  | 0b0011   | Masks increment event unless the instruction(s) being executed on the pipeline(s) selected by the increment event are binary instructions.                                                                                                                                                                                                                                        |
| Cycles where mov<br>instructions are<br>being executed     | 0b0100   | Masks increment event unless the instruction(s) being executed on the pipeline(s) selected by the increment event are mov instructions.                                                                                                                                                                                                                                           |
| Cycles where<br>sends start being<br>executed              | 0b0101   | Masks increment event unless the instruction(s) being executed on the pipeline(s) selected by the increment event are send start of dispatch. Note that if this fine event filter is used in combination with increment events not related to the EU send pipeline (e.g. FPU0 active), the associated flexible event counter will increment in an implementation-specific manner. |
| EU# = 0b00                                                 | 0b0111   | Masks increment event unless the EU number for this EU is 0b00.                                                                                                                                                                                                                                                                                                                   |
| EU# = 0b01                                                 | 0b1000   | Masks increment event unless the EU number for this EU is 0b01.                                                                                                                                                                                                                                                                                                                   |
| EU# = 0b10                                                 | 0b1001   | Masks increment event unless the EU number for this EU is 0b10.                                                                                                                                                                                                                                                                                                                   |
| EU# = 0b11                                                 | 0b1010   | Masks increment event unless the EU number for this EU is 0b11.                                                                                                                                                                                                                                                                                                                   |
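
Because several increment events support only a subset of the coarse and fine filters, it can be useful to validate a combination before programming it. The Python sketch below encodes only the restrictions stated in the Supported Increment Events table. The names `RESTRICTIONS` and `combination_supported` are hypothetical, and an entry of `None` means that the table states no restriction, which is not the same as a guarantee that every encoding works.

```python
# Filter sets named in the increment event notes.
ROW_ONLY_COARSE = frozenset({0b0000, 0b0111, 0b1000})
EU_ONLY_FINE = frozenset({0b0000, 0b0111, 0b1000, 0b1001, 0b1010})
SEND_FINE = EU_ONLY_FINE | {0b0101, 0b0110}

# increment event encoding -> (allowed coarse filters, allowed fine filters)
# None = no restriction stated in the table.
RESTRICTIONS = {
    0b0000: (None, None),                     # EU FPU0 pipeline active
    0b0001: (None, None),                     # EU FPU1 pipeline active
    0b0010: (None, SEND_FINE),                # EU send pipeline active
    0b0011: (ROW_ONLY_COARSE, EU_ONLY_FINE),  # FPU0 and FPU1 both active
    0b0100: (ROW_ONLY_COARSE, SEND_FINE),     # some EU pipeline active
    0b0101: (ROW_ONLY_COARSE, EU_ONLY_FINE),  # thread loaded, no pipe active
    0b1000: (ROW_ONLY_COARSE, EU_ONLY_FINE),  # threads loaded integrator
}


def combination_supported(increment, coarse, fine):
    """Check a (increment event, coarse filter, fine filter) combination."""
    if increment not in RESTRICTIONS:
        raise ValueError(f"undefined increment event: {increment:#06b}")
    allowed_coarse, allowed_fine = RESTRICTIONS[increment]
    coarse_ok = allowed_coarse is None or coarse in allowed_coarse
    fine_ok = allowed_fine is None or fine in allowed_fine
    return coarse_ok and fine_ok
```

For instance, `combination_supported(0b0010, 0b0101, 0b0101)` (send pipeline active, PS threads, send start) is accepted, whereas `combination_supported(0b0011, 0b0001, 0b0000)` is rejected because the concurrent FPU0 and FPU1 event supports only the row-based coarse filters.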