Core Transformer Question & Answers
April 20, 2023 By Wat Electrical

This article lists 50 Core Transformer MCQs for engineering students. All the Core Transformer Questions & Answers given below include a hint and a link wherever possible to the relevant topic. This is helpful for users who are preparing for their exams or interviews, and for professionals who would like to brush up on the fundamentals of the Core Transformer.

A core transformer is a type of neural network architecture that uses the self-attention mechanism to process sequential data, such as natural language sentences or audio signals. The core transformer architecture consists of a series of identical layers, each of which includes a self-attention mechanism followed by a feedforward neural network. The self-attention mechanism enables the model to attend to different parts of the input sequence based on their relevance to the task at hand, while the feedforward network provides a non-linear transformation of the attention output. The output of each layer is then fed into the next layer, allowing the model to capture increasingly complex relationships between the input tokens.

One of the key advantages of the core transformer architecture is its ability to process sequential data in parallel, rather than sequentially like traditional recurrent neural networks. This makes it much more efficient to train and allows it to handle longer input sequences. Additionally, the self-attention mechanism enables the model to capture both short-range and long-range dependencies between the input tokens, making it well suited for tasks that require a global understanding of the input sequence.

Overall, the core transformer architecture has proven to be highly effective and has achieved state-of-the-art performance on many natural language processing tasks. It has also been extended and modified in various ways to address different applications and improve its performance, such as by incorporating convolutional neural networks or using different types of attention mechanisms.
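Before the questions begin, it helps to keep the basic attention computation in mind: queries, keys, and values are projected from the input embeddings, pairwise scores are normalized with softmax, and the result is a weighted sum of the value vectors (this is what questions 1, 10, 20, 30, and 40 below refer to). The NumPy sketch that follows is a minimal illustration under the standard scaled dot-product formulation; it is not taken from this article, and all names and sizes (W_q, W_k, W_v, d_model, d_k) are assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over one sequence.

    X: (seq_len, d_model) input embeddings
    W_q, W_k, W_v: (d_model, d_k) projection matrices (illustrative names)
    """
    Q = X @ W_q                               # query vectors
    K = X @ W_k                               # key vectors
    V = X @ W_v                               # value vectors
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise token similarities
    weights = softmax(scores, axis=-1)        # normalized attention scores
    return weights @ V                        # weighted sum of value vectors

# Tiny usage example with random parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                  # 5 tokens, d_model = 16
W_q, W_k, W_v = (rng.normal(size=(16, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)   # (5, 8)
```

In the full architecture this same pattern is repeated per head and per layer, which is what the later multi-head attention and encoder-layer questions build on.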
1). The ____________________ projects the input embedding to a query vector that is used to compute attention scores with the key vector?
a) Query vector  b) Key vector  c) Value vector  d) Weighted sum

2). Which one of the following components in the core transformer converts input tokens into fixed-length vectors?
a) Positional encoding  b) Input embedding  c) Encoder layers  d) Multi-head attention

3). A mechanism that allows the model to attend to different parts of the input sequence based on their relevance to the task at hand is known as ________________?
a) Self-attention mechanism  b) Multi-head attention  c) Positional encoding  d) Encoder layers

4). Which one of the following maps input tokens to fixed-length vectors?
a) Positional encoding  b) Self-attention mechanism  c) Multi-head attention  d) Input embedding

5). Distributed representation and context-awareness are the principles of _______________?
a) Positional encoding  b) Self-attention mechanism  c) Multi-head attention  d) Input embedding

6). Computing attention scores for each input token is the operation of __________________?
a) Input embedding  b) Self-attention mechanism  c) Residual connections  d) Layer normalization

7). What is the purpose of a core in a transformer?
a) To insulate the windings from each other  b) To provide a path for the magnetic flux  c) To connect the primary and secondary windings  d) To regulate the output voltage

8). Which type of core material has the lowest core loss?
a) Laminated  b) CRGO  c) Amorphous  d) Ferrite

9). The large-scale language models that are based on the transformer architecture are known as ________________?
a) Decoder layers  b) Transformer-based language models  c) Pretraining and fine-tuning  d) Encoder layers

10). The ____________________ projects the input embedding to a key vector that is used to compute attention scores with the query vector?
a) Query vector  b) Key vector  c) Value vector  d) Weighted sum

11). Which one of the following components adds information about the position of each token in the input sequence?
a) Positional encoding  b) Input embedding  c) Encoder layers  d) Multi-head attention

12). A series of layers that each include a self-attention mechanism, a multi-head attention mechanism, and a feedforward neural network is known as ________________?
a) Decoder layers  b) Transformer-based language models  c) Pretraining and fine-tuning  d) Encoder layers

13). Which one of the following adds position information to input embeddings?
a) Positional encoding  b) Self-attention mechanism  c) Multi-head attention  d) Input embedding

14). Computing attention scores for each head and concatenating the results is the operation of __________________?
a) Input embedding  b) Multi-head attention  c) Residual connections  d) Layer normalization

15). A two-stage training process where a transformer-based language model is first pre-trained on a large corpus of text and then fine-tuned on a smaller task-specific dataset is known as ________________?
a) Decoder layers  b) Transformer-based language models  c) Pretraining and fine-tuning  d) Encoder layers
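Questions 12 and 14 above, and several in the set that follows (residual connections, layer normalization, the feedforward sublayer), describe how a single encoder layer is assembled. The sketch below is a simplified NumPy illustration of that assembly in the post-norm ("Add & Norm") style of the original Transformer, not a reference implementation; the output projection after head concatenation and the learned scale and shift of layer normalization are omitted, and every parameter name and size is an assumption made for the example.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean and unit variance
    # (the learned scale and shift parameters are omitted here).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def multi_head_attention(X, heads):
    # Each head has its own (W_q, W_k, W_v); the heads attend in parallel
    # and their outputs are concatenated.
    outputs = []
    for W_q, W_k, W_v in heads:
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
        outputs.append(weights @ V)
    return np.concatenate(outputs, axis=-1)

def encoder_layer(X, heads, W1, b1, W2, b2):
    # Sublayer 1: multi-head self-attention + residual connection + layer norm.
    X = layer_norm(X + multi_head_attention(X, heads))
    # Sublayer 2: position-wise feedforward network (ReLU) + residual + norm.
    ff = np.maximum(0.0, X @ W1 + b1) @ W2 + b2
    return layer_norm(X + ff)

# Usage with made-up sizes: 4 heads of width 4 concatenate back to d_model = 16.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 16))
heads = [tuple(rng.normal(size=(16, 4)) for _ in range(3)) for _ in range(4)]
W1, b1 = rng.normal(size=(16, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 16)), np.zeros(16)
print(encoder_layer(X, heads, W1, b1, W2, b2).shape)   # (5, 16)
```

Stacking several such layers, each feeding its output into the next, gives the encoder stack the questions refer to.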
Core Transformer MCQ for Quiz

16). Improving model convergence and reducing internal covariate shift are the principles of _______________?
a) Feedforward neural network  b) Self-attention mechanism  c) Residual connections  d) Layer normalization

17). Looking up embeddings for each input token is the operation of __________________?
a) Input embedding  b) Self-attention mechanism  c) Residual connections  d) Layer normalization

18). Applying a feedforward network to the output of the attention layer is the operation of __________________?
a) Feedforward neural network  b) Residual connections  c) Layer normalization  d) Multi-head attention

19). Which type of winding configuration is commonly used in high voltage transformers?
a) Wound  b) Layer  c) Helical  d) Cylindrical

20). The ____________________ projects the input embedding to a value vector that is used to compute the weighted sum of the input embeddings based on the attention scores?
a) Query vector  b) Key vector  c) Value vector  d) Weighted sum

21). Which one of the following components consists of self-attention and feedforward neural network components?
a) Positional encoding  b) Input embedding  c) Encoder layers  d) Multi-head attention

22). A series of layers that each include a self-attention mechanism and a feedforward neural network is known as ________________?
a) Self-Attention Mechanism  b) Multi-Head Attention  c) Positional Encoding  d) Encoder Layers

23). Which one of the following computes attention scores based on pairwise similarities between input tokens?
a) Positional encoding  b) Self-attention mechanism  c) Multi-head attention  d) Input embedding

24). Preventing vanishing gradients and stabilizing training are the principles of _______________?
a) Feedforward neural network  b) Self-attention mechanism  c) Residual connections  d) Input embedding

25). Adding the output of each sublayer to its input is the operation of __________________?
a) Feedforward neural network  b) Residual connections  c) Layer normalization  d) Multi-head attention

26). What is the purpose of insulation in a transformer?
a) To reduce the size and weight of the transformer  b) To prevent electrical breakdown between windings  c) To increase the efficiency of the transformer  d) To reduce the cost of the transformer

27). What is the purpose of a tap changer in a transformer?
a) To adjust the frequency of the transformer  b) To adjust the voltage ratio of the transformer  c) To provide a path for the magnetic flux  d) To insulate the windings from each other

28). Which winding arrangement has a better cooling effect?
a) Layer  b) Helical  c) Disc  d) Cylindrical

29). Introducing additional complexity and modelling power are the principles of _______________?
a) Feedforward neural network  b) Self-attention mechanism  c) Multi-head attention  d) Input embedding

30). The ____________________ computes the similarity between the query vector and key vector for each pair of input tokens and normalizes the scores using the softmax function?
a) Query vector  b) Key vector  c) Value vector  d) Attention scores
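Positional encoding appears in questions 11 and 13 above and again in questions 42, 47, and 48 below: since self-attention by itself is order-agnostic, position information is added directly to the input embeddings. As one common concrete choice (an illustrative sketch, not something this quiz specifies), the NumPy code below builds the fixed sinusoidal encodings used in the original Transformer.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sinusoidal encodings in the style of the original Transformer.

    Even dimensions use sine and odd dimensions use cosine, with wavelengths
    in a geometric progression, so both absolute and relative positions can
    be recovered from the encoding.
    """
    pos = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dim = np.arange(d_model)[None, :]            # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dim // 2)) / d_model)
    angles = pos * angle_rates
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles[:, 0::2])
    enc[:, 1::2] = np.cos(angles[:, 1::2])
    return enc

# The encodings are simply added to the input embeddings (illustrative sizes).
embeddings = np.zeros((10, 16))                  # placeholder token embeddings
X = embeddings + sinusoidal_positional_encoding(10, 16)
print(X.shape)                                   # (10, 16)
```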
Core Transformer MCQ for Exams

31). Which one of the following components computes multiple sets of attention scores in parallel?
a) Positional encoding  b) Input embedding  c) Encoder layers  d) Multi-head attention

32). Attending to multiple parts of the input sequence simultaneously is the principle of _______________?
a) Positional encoding  b) Self-attention mechanism  c) Multi-head attention  d) Input embedding

33). Normalizing the output of each sublayer before adding the residual connection is the operation of __________________?
a) Feedforward neural network  b) Residual connections  c) Layer normalization  d) Multi-head attention

34). Cold-rolled grain-oriented silicon steel is the material of which component?
a) CRGO core  b) Amorphous core  c) Both a and b  d) None of the above

35). In which one of the following transformers is the inductance higher?
a) Shell transformer  b) Core transformer  c) Both a and b  d) None of the above

36). What is the purpose of interleaved windings in a transformer?
a) To reduce the leakage inductance  b) To increase the voltage ratio of the transformer  c) To reduce the size and weight of the transformer  d) To improve the cooling effect

37). Which type of winding is commonly used in high power transformers?
a) Wound  b) Layer  c) Helical  d) Cylindrical

38). What are the main types of core losses in transformers?
a) Copper losses and hysteresis losses  b) Eddy current losses and hysteresis losses  c) Eddy current losses and copper losses  d) Hysteresis losses and magnetic losses

39). Which type of transformer is more likely to experience core losses?
a) Low voltage transformers  b) High voltage transformers  c) Step-up transformers  d) Step-down transformers

40). The ____________________ computes the weighted sum of the value vectors using the attention scores, resulting in a context vector that captures the most important information in the input sequence for the given task?
a) Query vector  b) Key vector  c) Value vector  d) Weighted sum

41). Which one of the following components applies a non-linear transformation to the output of the attention layer?
a) Feedforward neural network  b) Input embedding  c) Encoder layers  d) Multi-head attention

42). A technique for adding information about the position of each token in the input sequence to the input embeddings is known as ________________?
a) Self-Attention Mechanism  b) Multi-Head Attention  c) Positional Encoding  d) Encoder Layers

43). Attending to different parts of the input sequence and capturing long-range dependencies are the principles of _______________?
a) Positional encoding  b) Self-attention mechanism  c) Multi-head attention  d) Input embedding

44). The electrical resistivity is high in which component?
a) CRGO core  b) Amorphous core  c) Both a and b  d) None of the above

45). Which one of the following components normalizes the output of each layer before feeding it into the next layer?
a) Feedforward neural network  b) Residual connections  c) Layer normalization  d) Multi-head attention

46). Which one of the following transformers is used in high voltage applications?
a) Core transformer  b) Shell transformer  c) Both a and b  d) None of the above

47). Adding positional encodings to input embeddings is the operation of __________________?
a) Input embedding  b) Positional encoding  c) Residual connections  d) Layer normalization

48). Capturing absolute and relative position information is the principle of _______________?
a) Positional encoding  b) Self-attention mechanism  c) Multi-head attention  d) Input embedding

49). A variation of the self-attention mechanism that computes multiple sets of attention scores in parallel is known as ________________?
a) Self-Attention Mechanism  b) Multi-Head Attention  c) Positional Encoding  d) Encoder Layers

50). Which one of the following components allows gradients to flow more easily through the network?
a) Feedforward neural network  b) Residual connections  c) Encoder layers  d) Multi-head attention

For More MCQs:
Single Phase Transformer Question & Answers
Shell Type Transformer Question & Answers
Transformer Question & Answers