
Competition Phases
This competition invites researchers and professionals to develop a comprehensive system design for a chip optimized to efficiently run state-of-the-art computer vision models, specifically Deep Neural Networks (DNNs). Participants are expected to propose a design that addresses real-world application requirements, supported by clear justifications and a well-structured breakdown. The goal is to ensure that even fresh graduates can implement and test the proposed system effectively. The competition places a strong emphasis on in-depth research, culminating in a practical chip design tailored for real-world computer vision tasks, and backed by a robust software and firmware stack.
Phase I
Research & System Design: Active
Computer vision applications are varied and include essential tasks such as object detection, image segmentation, motion tracking, scene reconstruction, and anomaly detection, among others. Each of these applications requires specific neural network operations that demand high processing efficiency and low latency to be effective in real time. Our ultimate goal is an ASIC chip designed for computer vision that supports a diverse set of operations, enabling these applications to run smoothly and accurately.
The ASIC chip is essentially a SoC integrating a CPU, an interconnect, and various accelerators such as NPUs, GPUs, DPUs, etc.
Such a SoC should be able to run Linux and Python. It should also be able to run inference using a well-known stack such as PyTorch, Keras, TensorFlow, or TVM.
The SoC must also demonstrate its capability by performing competitively on the MLPerf Inference Benchmark.
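As a concrete (illustrative, not mandated) example of this software requirement, the sketch below shows the kind of off-the-shelf PyTorch inference workload such a SoC would be expected to run under Linux; the model choice (torchvision's ResNet-50) and the input shape are assumptions made only for demonstration.

```python
# Minimal PyTorch inference sketch: the kind of workload the SoC's
# Linux/Python software stack should be able to execute end to end.
# Model choice and input shape are illustrative assumptions.
import torch
import torchvision.models as models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()

# A single 224x224 RGB image batch; on the target SoC this tensor would
# come from a camera pipeline or a benchmark harness (e.g., MLPerf Inference).
dummy_input = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    logits = model(dummy_input)

print("Top-1 class index:", logits.argmax(dim=1).item())
```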
Phase II
Modelling: TBD
Operations that need hardware acceleration on such a chip include convolutions and depthwise separable convolutions, as they are critical for feature extraction in Convolutional Neural Networks (CNNs). This support is essential, given that convolutions are computation-intensive but crucial for detecting patterns, edges, and textures across applications like object detection and face recognition. Additionally, attention mechanisms found in Vision Transformers [1] (ViTs) are increasingly used in state-of-the-art models, as they improve performance on complex recognition and classification tasks by allowing the model to focus on important image regions. These mechanisms rely on efficient matrix multiplication and memory management, both of which should be optimized on the chip.
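To make these operator requirements concrete, the following sketch expresses a standard convolution, a depthwise separable convolution, and a scaled dot-product attention step in PyTorch; the shapes and channel counts are arbitrary assumptions chosen only to illustrate the operator mix the accelerator must execute efficiently.

```python
# Illustrative PyTorch kernels for the operator classes named above; all
# shapes and channel counts are arbitrary assumptions for demonstration.
import torch
import torch.nn as nn

x = torch.randn(1, 32, 56, 56)                 # N, C, H, W feature map

# Standard convolution: the dominant operation in CNN feature extraction.
conv = nn.Conv2d(32, 64, kernel_size=3, padding=1)

# Depthwise separable convolution: depthwise (groups = channels) + pointwise 1x1.
depthwise = nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32)
pointwise = nn.Conv2d(32, 64, kernel_size=1)

y_conv = conv(x)
y_dws = pointwise(depthwise(x))

# ViT-style attention reduces to batched matrix multiplies plus a softmax,
# which stresses the chip's matmul throughput and memory hierarchy.
tokens = torch.randn(1, 196, 256)              # batch, sequence, embedding dim
q, k, v = tokens, tokens, tokens
scores = q @ k.transpose(-2, -1) / (256 ** 0.5)
attn_out = scores.softmax(dim=-1) @ v
```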
Phase III
Development: TBD
The ASIC should also support batch normalization and activation functions (such as ReLU, Swish, and GELU) to enable fast, smooth training and inference cycles. Pooling layers and residual connections are also necessary, as they facilitate hierarchical feature extraction and allow for deeper networks, which are common in advanced applications. Moreover, many computer vision applications, such as real-time object detection and semantic segmentation, also require the chip to handle upsampling and downsampling operations, which allow the network to focus on different spatial resolutions. Since many computer vision applications are deployed in resource-constrained environments, the chip must also support quantization and mixed-precision arithmetic. These techniques reduce the model size and enhance power efficiency, which is particularly advantageous for edge applications like autonomous vehicles and surveillance systems. By incorporating these capabilities, the ASIC can deliver high performance and energy efficiency, ensuring it meets the diverse needs of modern computer vision tasks across a wide range of devices and platforms.
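One way to picture this operator set is the small residual block sketched below; the block structure, channel counts, and use of PyTorch's CPU autocast as a stand-in for reduced-precision execution are all assumptions made for illustration, not part of any reference design.

```python
# Illustrative residual block covering the layer types listed above.
# Block structure, channel counts, and precision choice are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)      # batch normalization
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.gelu(self.bn1(self.conv1(x)))    # GELU activation
        out = self.bn2(self.conv2(out))
        out = F.relu(out + x)                    # residual connection + ReLU
        out = F.max_pool2d(out, 2)               # downsampling (pooling)
        out = F.interpolate(out, scale_factor=2) # upsampling
        return out

block = ResidualBlock().eval()
x = torch.randn(1, 64, 56, 56)

# Mixed precision: autocast runs eligible ops in a reduced-precision dtype,
# mirroring the FP16/INT8-style support expected from the accelerator.
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = block(x)
```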
Phase IV
Verification: TBD
A complete software stack is required to map high-level models to the programming interface of the target accelerator. Frameworks like TensorFlow and PyTorch allow easy model expression and scalable training, enabling seamless compilation for deployment on the target accelerator. Below is an example of an open-source deep learning acceleration stack called VTA.
- Figure 1: The VTA stack showing the various hardware and software components involved in processing workloads.
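As a hedged illustration of what such a stack looks like in practice, the sketch below lowers a small PyTorch model through TVM's Relay front end (the compiler framework VTA plugs into); the generic "llvm" target is a placeholder for the code generator a proposed accelerator would supply.

```python
# Sketch of lowering a high-level model through a compiler stack (TVM/Relay),
# analogous to the VTA flow in Figure 1. The "llvm" target is a stand-in
# for the proposed accelerator's backend. Assumes recent torchvision and TVM.
import torch
import torchvision.models as models
import tvm
from tvm import relay

model = models.resnet18(weights=None).eval()
example_input = torch.randn(1, 3, 224, 224)
scripted = torch.jit.trace(model, example_input)

# Import the traced graph into Relay's intermediate representation.
mod, params = relay.frontend.from_pytorch(scripted, [("input", (1, 3, 224, 224))])

# Compile for the target; a real submission would swap in the chip's own target.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)
```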
Objectives

Complete System Specs
Define system specs clearly showing interaction and integration requirements between system components.

Application-Focused Design
Create a research-backed design for a computer vision application that addresses specified application requirements.

Justification
Provide a well-supported explanation for how and why this design meets the requirements.

Educational Task Breakdown
Break down the design into structured tasks that fresh graduates in software, hardware, and firmware can understand and implement.

Testing & Validation Plan
Develop a robust testing and validation plan to guide implementation and ensure quality.
Contest Submission Guidelines
Research & Design Proposal
- Research: Participants are expected to conduct in-depth research on state-of-the-art SoC design targeting computer vision. They may choose an already existing design that meets the requirements; alternatively, they have room for innovation and creativity to build upon the state of the art. In the latter case, a justification based on thorough analysis is additionally required, and this effort is taken into consideration when evaluating a contest entry. Accordingly, the first section should provide a brief literature review of prior chip-architecting efforts.
- Design Proposal: Concludes the research subsection. Participants are expected either to select one paper or a collection of papers presented in the research subsection as the candidate for their proposal, or to present an innovation of their own in which they modify the design to yield better figures of merit.
- Justification: Provides justification for selecting the candidate paper and/or the innovative modifications introduced by the participant. It must be based on an elaborate comparison between the references used by the participant.
Applicability
- The proposed system must be technically realizable. Participants are required to submit a script, implemented in Python or any other suitable programming language, that simulates the execution of inference on an open-source model of their choice, targeting the proposed instruction set architecture (ISA) of the chip. Specifically, given a representative model—such as ResNet—implemented in a common machine learning framework (e.g., PyTorch or TensorFlow), participants must demonstrate that the model can be mapped to and executed on their proposed hardware design.
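A minimal sketch of what such a script could look like is given below; the instruction names (LOAD, CONV2D, GEMM, BNORM, RELU, POOL, STORE) and the layer-to-instruction mapping are entirely hypothetical placeholders, since each participant defines their own ISA and mapping flow.

```python
# Hypothetical sketch of mapping a PyTorch model's layers onto a proposed
# ISA and "executing" them in a simple simulator. All instruction names
# and mapping rules below are placeholders for a participant-defined ISA.
import torch.nn as nn
import torchvision.models as models

# Placeholder mapping from framework layer types to hypothetical ISA opcodes.
LAYER_TO_OPCODE = {
    nn.Conv2d: "CONV2D",
    nn.Linear: "GEMM",
    nn.BatchNorm2d: "BNORM",
    nn.ReLU: "RELU",
    nn.MaxPool2d: "POOL",
}

def compile_to_isa(model: nn.Module) -> list[str]:
    """Walk the model and emit one hypothetical instruction per known layer."""
    program = ["LOAD input"]
    for name, layer in model.named_modules():
        opcode = LAYER_TO_OPCODE.get(type(layer))
        if opcode:
            program.append(f"{opcode} {name}")
    program.append("STORE output")
    return program

def simulate(program: list[str]) -> None:
    """Trivial simulator: report instruction counts per opcode."""
    counts: dict[str, int] = {}
    for instr in program:
        opcode = instr.split()[0]
        counts[opcode] = counts.get(opcode, 0) + 1
    for opcode, n in counts.items():
        print(f"{opcode:8s} x {n}")

if __name__ == "__main__":
    simulate(compile_to_isa(models.resnet18(weights=None)))
```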
Ownership
- Any part or component of the system should be obtainable, meaning that if a component in the system belongs to a certain vendor, it should be possible to acquire that component under an appropriate license agreement at no or minimal cost.
Task-Oriented
- System tasks should be well described, and each task should be scoped so that a single engineer can be allocated to it; interactions and integration with other tasks should be clearly explained.