Core Transformer Question & Answers
April 20, 2023, By Wat Electrical

This article lists 50 Core Transformer MCQs for engineering students. The questions below are helpful for users who are preparing for their exams or interviews, and for professionals who would like to brush up on the fundamentals of the core transformer.

A core transformer is a type of neural network architecture that uses the self-attention mechanism to process sequential data, such as natural language sentences or audio signals. The architecture consists of a series of identical layers, each of which includes a self-attention mechanism followed by a feedforward neural network. The self-attention mechanism enables the model to attend to different parts of the input sequence based on their relevance to the task at hand, while the feedforward network applies a non-linear transformation to the attention output. The output of each layer is then fed into the next layer, allowing the model to capture increasingly complex relationships between the input tokens (a minimal code sketch of this attention computation appears after question 8 below).

One of the key advantages of the core transformer architecture is its ability to process sequential data in parallel, rather than sequentially like traditional recurrent neural networks. This makes it much more efficient to train and allows it to handle longer input sequences. Additionally, the self-attention mechanism enables the model to capture both short-range and long-range dependencies between the input tokens, making it well suited for tasks that require a global understanding of the input sequence.

Overall, the core transformer architecture has proven to be highly effective and has achieved state-of-the-art performance on many natural language processing tasks. It has also been extended and modified in various ways to address different applications and improve its performance, such as by incorporating convolutional neural networks or using different types of attention mechanisms.

1). The ____________________ projects the input embedding to a query vector that is used to compute attention scores with the key vector?
a) Query vector
b) Key vector
c) Value vector
d) Weighted sum

2). Which one of the following components in the core transformer converts input tokens into fixed-length vectors?
a) Positional encoding
b) Input embedding
c) Encoder layers
d) Multi-head attention

3). A mechanism that allows the model to attend to different parts of the input sequence based on their relevance to the task at hand is known as ________________?
a) Self-attention mechanism
b) Multi-head attention
c) Positional encoding
d) Encoder layers

4). Which one of the following maps input tokens to fixed-length vectors?
a) Positional encoding
b) Self-attention mechanism
c) Multi-head attention
d) Input embedding

5). Distributed representation and context-awareness are the principles of _______________?
a) Positional encoding
b) Self-attention mechanism
c) Multi-head attention
d) Input embedding

6). Computing attention scores for each input token is the operation of __________________?
a) Input embedding
b) Self-attention mechanism
c) Residual connections
d) Layer normalization

7). What is the purpose of a core in a transformer?
a) To insulate the windings from each other
b) To provide a path for the magnetic flux
c) To connect the primary and secondary windings
d) To regulate the output voltage

8). Which type of core material has the lowest core loss?
a) Laminated
b) CRGO
c) Amorphous
d) Ferrite
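Several of the questions above concern the query, key, and value projections and the attention scores computed from them. The following is a minimal sketch of single-head scaled dot-product self-attention in Python with NumPy; the matrix shapes, random weights, and function names are illustrative assumptions, not part of any particular library.

```python
# A minimal sketch of single-head scaled dot-product self-attention.
# All shapes and weights below are illustrative, not from any real model.
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_k) projection matrices."""
    Q = X @ Wq                               # project inputs to query vectors
    K = X @ Wk                               # project inputs to key vectors
    V = X @ Wv                               # project inputs to value vectors
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # pairwise query-key similarity, scaled
    weights = softmax(scores)                # normalize each row into attention weights
    return weights @ V                       # weighted sum of values = context vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                          # 5 tokens, embedding size 8
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # (5, 4): one context vector per token
```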
9). The large-scale language models that are based on the transformer architecture are known as ________________?
a) Decoder layers
b) Transformer-based language models
c) Pretraining and fine-tuning
d) Encoder layers

10). The ____________________ projects the input embedding to a key vector that is used to compute attention scores with the query vector?
a) Query vector
b) Key vector
c) Value vector
d) Weighted sum

11). Which one of the following components adds information about the position of each token in the input sequence?
a) Positional encoding
b) Input embedding
c) Encoder layers
d) Multi-head attention

12). A series of layers that each include a self-attention mechanism, a multi-head attention mechanism, and a feedforward neural network is known as ________________?
a) Decoder layers
b) Transformer-based language models
c) Pretraining and fine-tuning
d) Encoder layers

13). Which one of the following adds position information to input embeddings?
a) Positional encoding
b) Self-attention mechanism
c) Multi-head attention
d) Input embedding

14). Computing attention scores for each head and concatenating them is the operation of __________________?
a) Input embedding
b) Multi-head attention
c) Residual connections
d) Layer normalization

15). A two-stage training process where a transformer-based language model is first pre-trained on a large corpus of text and then fine-tuned on a smaller task-specific dataset is known as ________________?
a) Decoder layers
b) Transformer-based language models
c) Pretraining and fine-tuning
d) Encoder layers

Core Transformer MCQ for Quiz

16). Improving model convergence and reducing internal covariate shift are the principles of _______________?
a) Feedforward neural network
b) Self-attention mechanism
c) Residual connections
d) Layer normalization

17). Looking up the embedding for each input token is the operation of __________________?
a) Input embedding
b) Self-attention mechanism
c) Residual connections
d) Layer normalization

18). Applying a feedforward network to the output of the attention layer is the operation of __________________?
a) Feedforward neural network
b) Residual connections
c) Layer normalization
d) Multi-head attention

19). Which type of winding configuration is commonly used in high voltage transformers?
a) Wound
b) Layer
c) Helical
d) Cylindrical

20). The ____________________ projects the input embedding to a value vector that is used to compute the weighted sum of the input embeddings based on the attention scores?
a) Query vector
b) Key vector
c) Value vector
d) Weighted sum

21). Which one of the following components consists of self-attention and feedforward neural network components?
a) Positional encoding
b) Input embedding
c) Encoder layers
d) Multi-head attention

22). A series of layers that each include a self-attention mechanism and a feedforward neural network is known as ________________?
a) Self-attention mechanism
b) Multi-head attention
c) Positional encoding
d) Encoder layers

23). Which one of the following computes attention scores based on pairwise similarities between input tokens?
a) Positional encoding
b) Self-attention mechanism
c) Multi-head attention
d) Input embedding

24). Preventing vanishing gradients and stabilizing training are the principles of _______________?
a) Feedforward neural network
b) Self-attention mechanism
c) Residual connections
d) Input embedding

25). Adding the output of each sublayer to its input is the operation of __________________?
a) Feedforward neural network
b) Residual connections
c) Layer normalization
d) Multi-head attention
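Questions 11 and 13 above refer to positional encoding, which adds position information to the input embeddings before the encoder layers. Below is a minimal sketch of the fixed sinusoidal positional encoding described in the original Transformer paper; the sequence length, embedding size, and variable names are illustrative assumptions.

```python
# A minimal sketch of the fixed sinusoidal positional encoding from the
# original Transformer paper; seq_len and d_model here are illustrative.
import numpy as np

def positional_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) matrix of sinusoidal position encodings."""
    pos = np.arange(seq_len)[:, None]              # token positions 0..seq_len-1
    i = np.arange(0, d_model, 2)[None, :]          # even embedding dimensions
    angles = pos / np.power(10000.0, i / d_model)  # one frequency per dimension pair
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # sine on even dimensions
    pe[:, 1::2] = np.cos(angles)                   # cosine on odd dimensions
    return pe

embeddings = np.random.default_rng(0).normal(size=(5, 8))  # hypothetical token embeddings
inputs = embeddings + positional_encoding(5, 8)            # add position information
print(inputs.shape)                                        # (5, 8)
```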
26). What is the purpose of insulation in a transformer?
a) To reduce the size and weight of the transformer
b) To prevent electrical breakdown between windings
c) To increase the efficiency of the transformer
d) To reduce the cost of the transformer

27). What is the purpose of a tap changer in a transformer?
a) To adjust the frequency of the transformer
b) To adjust the voltage ratio of the transformer
c) To provide a path for the magnetic flux
d) To insulate the windings from each other

28). Which winding arrangement has the best cooling effect?
a) Layer
b) Helical
c) Disc
d) Cylindrical

29). Introducing additional complexity and modelling power are the principles of _______________?
a) Feedforward neural network
b) Self-attention mechanism
c) Multi-head attention
d) Input embedding

30). The ____________________ computes the similarity between the query vector and key vector for each pair of input tokens and normalizes the scores using the softmax function?
a) Query vector
b) Key vector
c) Value vector
d) Attention scores

Core Transformer MCQ for Exams

31). Which one of the following components computes multiple sets of attention scores in parallel?
a) Positional encoding
b) Input embedding
c) Encoder layers
d) Multi-head attention

32). Attending to multiple parts of the input sequence simultaneously is the principle of _______________?
a) Positional encoding
b) Self-attention mechanism
c) Multi-head attention
d) Input embedding

33). Normalizing the output of each sublayer before adding the residual connection is the operation of __________________?
a) Feedforward neural network
b) Residual connections
c) Layer normalization
d) Multi-head attention

34). Cold-rolled grain-oriented silicon steel is the material of the ________________ component?
a) CRGO core
b) Amorphous core
c) Both a and b
d) None of the above

35). In which one of the following transformers is the inductance higher?
a) Shell transformer
b) Core transformer
c) Both a and b
d) None of the above

36). What is the purpose of interleaved windings in a transformer?
a) To reduce the leakage inductance
b) To increase the voltage ratio of the transformer
c) To reduce the size and weight of the transformer
d) To improve the cooling effect

37). Which type of winding is commonly used in high power transformers?
a) Wound
b) Layer
c) Helical
d) Cylindrical

38). What are the main types of core losses in transformers?
a) Copper losses and hysteresis losses
b) Eddy current losses and hysteresis losses
c) Eddy current losses and copper losses
d) Hysteresis losses and magnetic losses

39). Which type of transformer is more likely to experience core losses?
a) Low voltage transformers
b) High voltage transformers
c) Step-up transformers
d) Step-down transformers

40). The ____________________ computes the weighted sum of the value vectors using the attention scores, resulting in a context vector that captures the most important information in the input sequence for the given task?
a) Query vector
b) Key vector
c) Value vector
d) Weighted sum

41). Which one of the following components applies a non-linear transformation to the output of the attention layer?
a) Feedforward neural network
b) Input embedding
c) Encoder layers
d) Multi-head attention

42). A technique for adding information about the position of each token in the input sequence to the input embeddings is known as ________________?
a) Self-attention mechanism
b) Multi-head attention
c) Positional encoding
d) Encoder layers

43). Attending to different parts of the input sequence and capturing long-range dependencies are the principles of _______________?
a) Positional encoding
b) Self-attention mechanism
c) Multi-head attention
d) Input embedding
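Questions 14 and 31 above deal with multi-head attention, which runs several attention heads in parallel, concatenates their outputs, and projects the result back to the model dimension. The sketch below illustrates this in Python with NumPy; the head count, dimensions, and function names are illustrative assumptions.

```python
# A minimal sketch of multi-head attention: several heads computed in
# parallel, concatenated, and projected back to the model dimension.
# All dimensions, weights, and names are illustrative assumptions.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention (see the earlier sketch)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

def multi_head_attention(X, heads, Wo):
    """heads: list of (Wq, Wk, Wv) triples, one per head; Wo: output projection."""
    outputs = [attention(X, Wq, Wk, Wv) for Wq, Wk, Wv in heads]  # heads run independently
    return np.concatenate(outputs, axis=-1) @ Wo  # concatenate heads, project to d_model

rng = np.random.default_rng(1)
d_model, d_k, n_heads = 8, 4, 2
heads = [tuple(rng.normal(size=(d_model, d_k)) for _ in range(3)) for _ in range(n_heads)]
Wo = rng.normal(size=(n_heads * d_k, d_model))
X = rng.normal(size=(5, d_model))
print(multi_head_attention(X, heads, Wo).shape)   # (5, 8): same shape as the input
```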
44). The electrical resistivity is high in the ________________ component?
a) CRGO core
b) Amorphous core
c) Both a and b
d) None of the above

45). Which one of the following components normalizes the output of each layer before feeding it into the next layer?
a) Feedforward neural network
b) Residual connections
c) Layer normalization
d) Multi-head attention

46). Which one of the following transformers is used in high voltage applications?
a) Core transformer
b) Shell transformer
c) Both a and b
d) None of the above

47). Adding positional encodings to input embeddings is the operation of __________________?
a) Input embedding
b) Positional encoding
c) Residual connections
d) Layer normalization

48). Capturing absolute and relative position information is the principle of _______________?
a) Positional encoding
b) Self-attention mechanism
c) Multi-head attention
d) Input embedding

49). A variation of the self-attention mechanism that computes multiple sets of attention scores in parallel is known as ________________?
a) Self-attention mechanism
b) Multi-head attention
c) Positional encoding
d) Encoder layers

50). Which one of the following components allows gradients to flow more easily through the network?
a) Feedforward neural network
b) Residual connections
c) Encoder layers
d) Multi-head attention
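Questions 45, 47, and 50 concern residual connections and layer normalization, which wrap each sublayer of an encoder layer. The following sketch ties the earlier pieces together into a single post-norm encoder layer in Python with NumPy; it reuses multi_head_attention() from the previous sketch, and all dimensions and weight initializations are illustrative assumptions.

```python
# A minimal sketch of one post-norm Transformer encoder layer: each sublayer
# (multi-head attention, then a feedforward network) is wrapped in a residual
# connection followed by layer normalization. multi_head_attention() is
# defined in the previous sketch; all dimensions and weights are illustrative.
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each token's features to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def feedforward(x, W1, b1, W2, b2):
    """Position-wise feedforward network with a ReLU non-linearity."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

def encoder_layer(X, heads, Wo, W1, b1, W2, b2):
    # Sublayer 1: self-attention, with residual connection and layer norm.
    X = layer_norm(X + multi_head_attention(X, heads, Wo))
    # Sublayer 2: feedforward network, with residual connection and layer norm.
    return layer_norm(X + feedforward(X, W1, b1, W2, b2))

rng = np.random.default_rng(2)
d_model, d_k, d_ff, n_heads = 8, 4, 16, 2
heads = [tuple(rng.normal(size=(d_model, d_k)) for _ in range(3)) for _ in range(n_heads)]
Wo = rng.normal(size=(n_heads * d_k, d_model))
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
X = rng.normal(size=(5, d_model))
print(encoder_layer(X, heads, Wo, W1, b1, W2, b2).shape)  # (5, 8)
```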