
Competition Phases
This competition invites researchers and professionals to develop a comprehensive system design for a chip optimized to efficiently run state-of-the-art computer vision models, specifically Deep Neural Networks (DNNs). Participants are expected to propose a design that addresses real-world application requirements, supported by clear justifications and a well-structured breakdown. The goal is to ensure that even fresh graduates can implement and test the proposed system effectively. The competition places a strong emphasis on in-depth research, culminating in a practical chip design tailored for real-world computer vision tasks, and backed by a robust software and firmware stack.
Phase I
Research & System Design: Active
Computer vision applications are varied and include essential tasks such as object detection, image segmentation, motion tracking, scene reconstruction, and anomaly detection, among others. Each of these applications requires specific neural network operations that demand high processing efficiency and low latency to be effective in real time. Our ultimate goal is an ASIC chip designed for computer vision that supports a diverse set of operations, enabling these applications to run smoothly and accurately.
The ASIC chip is essentially a SoC integrating a CPU, an interconnect, and various accelerators such as NPUs, GPUs, DPUs, etc.
Such a SoC should be able to run Linux and Python. It should also be able to run inference using a well-known stack such as PyTorch, Keras, TensorFlow, or TVM.
The SoC must also demonstrate its capability by performing competitively on the MLPerf Inference Benchmark.
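As a concrete (illustrative, not mandated) example of this software requirement, the sketch below shows the kind of off-the-shelf PyTorch inference workload such a SoC would be expected to run under Linux; the model choice (torchvision's ResNet-50) and the input shape are assumptions made only for demonstration.

```python
# Minimal PyTorch inference sketch: the kind of workload the SoC's
# Linux/Python software stack should be able to execute end to end.
# Model choice and input shape are illustrative assumptions.
import torch
import torchvision.models as models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()

# A single 224x224 RGB image batch; on the target SoC this tensor would
# come from a camera pipeline or a benchmark harness (e.g., MLPerf Inference).
dummy_input = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    logits = model(dummy_input)

print("Top-1 class index:", logits.argmax(dim=1).item())
```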
Phase II
Modelling: TBD
Operations that need hardware acceleration on such a chip include convolutions and depthwise separable convolutions, as they are critical for feature extraction in Convolutional Neural Networks (CNNs). This support is essential, given that convolutions are computation-intensive but crucial for detecting patterns, edges, and textures across applications like object detection and face recognition. Additionally, attention mechanisms found in Vision Transformers [1] (ViTs) are increasingly used in state-of-the-art models, as they improve performance on complex recognition and classification tasks by allowing the model to focus on important image regions. These mechanisms rely on efficient matrix multiplication and memory management, both of which should be optimized on the chip.
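To make these operator requirements concrete, the following sketch expresses a standard convolution, a depthwise separable convolution, and a scaled dot-product attention step in PyTorch; the shapes and channel counts are arbitrary assumptions chosen only to illustrate the operator mix the accelerator must execute efficiently.

```python
# Illustrative PyTorch kernels for the operator classes named above; all
# shapes and channel counts are arbitrary assumptions for demonstration.
import torch
import torch.nn as nn

x = torch.randn(1, 32, 56, 56)                 # N, C, H, W feature map

# Standard convolution: the dominant operation in CNN feature extraction.
conv = nn.Conv2d(32, 64, kernel_size=3, padding=1)

# Depthwise separable convolution: depthwise (groups = channels) + pointwise 1x1.
depthwise = nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32)
pointwise = nn.Conv2d(32, 64, kernel_size=1)

y_conv = conv(x)
y_dws = pointwise(depthwise(x))

# ViT-style attention reduces to batched matrix multiplies plus a softmax,
# which stresses the chip's matmul throughput and memory hierarchy.
tokens = torch.randn(1, 196, 256)              # batch, sequence, embedding dim
q, k, v = tokens, tokens, tokens
scores = q @ k.transpose(-2, -1) / (256 ** 0.5)
attn_out = scores.softmax(dim=-1) @ v
```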
Phase III
Development: TBD
The ASIC should also support batch normalization and activation functions (such as ReLU, Swish, and GELU) to enable fast, smooth training and inference cycles. Pooling layers and residual connections are also necessary, as they facilitate hierarchical feature extraction and allow for deeper networks, which are common in advanced applications. Moreover, many computer vision applications, such as real-time object detection and semantic segmentation, also require the chip to handle upsampling and downsampling operations, which allow the network to focus on different spatial resolutions. Since many computer vision applications are deployed in resource-constrained environments, the chip must also support quantization and mixed-precision arithmetic. These techniques reduce the model size and enhance power efficiency, which is particularly advantageous for edge applications like autonomous vehicles and surveillance systems. By incorporating these capabilities, the ASIC can deliver high performance and energy efficiency, ensuring it meets the diverse needs of modern computer vision tasks across a wide range of devices and platforms.
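One way to picture this operator set is the small residual block sketched below; the block structure, channel counts, and use of PyTorch's CPU autocast as a stand-in for reduced-precision execution are all assumptions made for illustration, not part of any reference design.

```python
# Illustrative residual block covering the layer types listed above.
# Block structure, channel counts, and precision choice are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)      # batch normalization
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.gelu(self.bn1(self.conv1(x)))    # GELU activation
        out = self.bn2(self.conv2(out))
        out = F.relu(out + x)                    # residual connection + ReLU
        out = F.max_pool2d(out, 2)               # downsampling (pooling)
        out = F.interpolate(out, scale_factor=2) # upsampling
        return out

block = ResidualBlock().eval()
x = torch.randn(1, 64, 56, 56)

# Mixed precision: autocast runs eligible ops in a reduced-precision dtype,
# mirroring the FP16/INT8-style support expected from the accelerator.
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = block(x)
```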
Phase IV
Verification: TBD
A complete software stack is required to map high-level models to the programming interface of the target accelerator. Frameworks like TensorFlow and PyTorch allow easy model expression and scalable training, enabling seamless compilation for deployment on the target accelerator. Below is an example of an open-source deep learning acceleration stack called VTA.
- Figure 1: The VTA stack showing the various hardware and software components involved in processing workloads.
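As a hedged illustration of what such a stack looks like in practice, the sketch below lowers a small PyTorch model through TVM's Relay front end (the compiler framework VTA plugs into); the generic "llvm" target is a placeholder for the code generator a proposed accelerator would supply.

```python
# Sketch of lowering a high-level model through a compiler stack (TVM/Relay),
# analogous to the VTA flow in Figure 1. The "llvm" target is a stand-in
# for the proposed accelerator's backend. Assumes recent torchvision and TVM.
import torch
import torchvision.models as models
import tvm
from tvm import relay

model = models.resnet18(weights=None).eval()
example_input = torch.randn(1, 3, 224, 224)
scripted = torch.jit.trace(model, example_input)

# Import the traced graph into Relay's intermediate representation.
mod, params = relay.frontend.from_pytorch(scripted, [("input", (1, 3, 224, 224))])

# Compile for the target; a real submission would swap in the chip's own target.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)
```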
Objectives

Complete System Specs
Define system specs clearly showing interaction and integration requirements between system components.

Application-Focused Design
Create a research-backed design for a computer vision application that addresses specified application requirements.

Justification
Provide a well-supported explanation for how and why this design meets the requirements.

Educational Task Breakdown
Break down the design into structured tasks that fresh graduates in software, hardware, and firmware can understand and implement.

Testing & Validation Plan
Develop a robust testing and validation plan to guide implementation and ensure quality.
Contest Submission Guidelines
Research & Design Proposal
- Research: Participants are expected to conduct in-depth research on state-of-the-art SoC design targeting computer vision. They may choose an already existing design that meets the requirements; alternatively, they have room for innovation and creativity to build upon the state of the art. In the latter case, a justification based on thorough analysis is additionally required, and this effort is taken into consideration when evaluating a contest entry. Accordingly, the first section should provide a brief literature review of prior chip-architecting efforts.
- Design Proposal: Concludes the research subsection. Participants are expected either to select one paper or a collection of papers presented in the research subsection as the candidate for their proposal, or to present an innovation of their own in which they modify the design to yield better figures of merit.
- Justification: Provides justification for selecting the candidate paper and/or the innovative modifications introduced by the participant. It must be based on an elaborate comparison between the references used by the participant.
Applicability
- The proposed system must be technically realizable. Participants are required to submit a script, implemented in Python or any other suitable programming language, that simulates the execution of inference on an open-source model of their choice, targeting the proposed instruction set architecture (ISA) of the chip. Specifically, given a representative model—such as ResNet—implemented in a common machine learning framework (e.g., PyTorch or TensorFlow), participants must demonstrate that the model can be mapped to and executed on their proposed hardware design.
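A minimal sketch of what such a script could look like is given below; the instruction names (LOAD, CONV2D, GEMM, BNORM, RELU, POOL, STORE) and the layer-to-instruction mapping are entirely hypothetical placeholders, since each participant defines their own ISA and mapping flow.

```python
# Hypothetical sketch of mapping a PyTorch model's layers onto a proposed
# ISA and "executing" them in a simple simulator. All instruction names
# and mapping rules below are placeholders for a participant-defined ISA.
import torch.nn as nn
import torchvision.models as models

# Placeholder mapping from framework layer types to hypothetical ISA opcodes.
LAYER_TO_OPCODE = {
    nn.Conv2d: "CONV2D",
    nn.Linear: "GEMM",
    nn.BatchNorm2d: "BNORM",
    nn.ReLU: "RELU",
    nn.MaxPool2d: "POOL",
}

def compile_to_isa(model: nn.Module) -> list[str]:
    """Walk the model and emit one hypothetical instruction per known layer."""
    program = ["LOAD input"]
    for name, layer in model.named_modules():
        opcode = LAYER_TO_OPCODE.get(type(layer))
        if opcode:
            program.append(f"{opcode} {name}")
    program.append("STORE output")
    return program

def simulate(program: list[str]) -> None:
    """Trivial simulator: report instruction counts per opcode."""
    counts: dict[str, int] = {}
    for instr in program:
        opcode = instr.split()[0]
        counts[opcode] = counts.get(opcode, 0) + 1
    for opcode, n in counts.items():
        print(f"{opcode:8s} x {n}")

if __name__ == "__main__":
    simulate(compile_to_isa(models.resnet18(weights=None)))
```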
Ownership
- Any part or component of the system should be obtainable, meaning that if a component in the system belongs to a certain vendor, it should be possible to acquire that component under an appropriate license agreement at no or minimal cost.
Task-Oriented
- System tasks should be well described, and each task should be scoped so that a single engineer can be allocated to it; interactions and integration with other tasks should be clearly explained.