Core Transformer Question & Answers
April 20, 2023 By Wat Electrical

This article lists 50 Core Transformer MCQs for engineering students. All the Core Transformer Questions & Answers given below include a hint and a link wherever possible to the relevant topic. This is helpful for users who are preparing for their exams or interviews, and for professionals who would like to brush up on the fundamentals of the Core Transformer.

A core transformer is a type of neural network architecture that uses the self-attention mechanism to process sequential data, such as natural language sentences or audio signals. The core transformer architecture consists of a series of identical layers, each of which includes a self-attention mechanism followed by a feedforward neural network. The self-attention mechanism enables the model to attend to different parts of the input sequence based on their relevance to the task at hand, while the feedforward network provides a non-linear transformation of the attention output. The output of each layer is then fed into the next layer, allowing the model to capture increasingly complex relationships between the input tokens.

One of the key advantages of the core transformer architecture is its ability to process sequential data in parallel, rather than sequentially like traditional recurrent neural networks. This makes it much more efficient to train and allows it to handle longer input sequences. Additionally, the self-attention mechanism enables the model to capture both short-range and long-range dependencies between the input tokens, making it well suited for tasks that require a global understanding of the input sequence.

Overall, the core transformer architecture has proven to be highly effective and has achieved state-of-the-art performance on many natural language processing tasks. It has also been extended and modified in various ways to address different applications and improve its performance, such as by incorporating convolutional neural networks or using different types of attention mechanisms.
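Before the questions begin, it helps to keep the basic attention computation in mind: queries, keys, and values are projected from the input embeddings, pairwise scores are normalized with softmax, and the result is a weighted sum of the value vectors (this is what questions 1, 10, 20, 30, and 40 below refer to). The NumPy sketch that follows is a minimal illustration under the standard scaled dot-product formulation; it is not taken from this article, and all names and sizes (W_q, W_k, W_v, d_model, d_k) are assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over one sequence.

    X: (seq_len, d_model) input embeddings
    W_q, W_k, W_v: (d_model, d_k) projection matrices (illustrative names)
    """
    Q = X @ W_q                               # query vectors
    K = X @ W_k                               # key vectors
    V = X @ W_v                               # value vectors
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise token similarities
    weights = softmax(scores, axis=-1)        # normalized attention scores
    return weights @ V                        # weighted sum of value vectors

# Tiny usage example with random parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                  # 5 tokens, d_model = 16
W_q, W_k, W_v = (rng.normal(size=(16, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)   # (5, 8)
```

In the full architecture this same pattern is repeated per head and per layer, which is what the later multi-head attention and encoder-layer questions build on.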
1). The ____________________ projects the input embedding to a query vector that is used to compute attention scores with the key vector?
a) Query vector  b) Key vector  c) Value vector  d) Weighted sum

2). Which one of the following components in the core transformer converts input tokens into fixed-length vectors?
a) Positional encoding  b) Input embedding  c) Encoder layers  d) Multi-head attention

3). A mechanism that allows the model to attend to different parts of the input sequence based on their relevance to the task at hand is known as ________________?
a) Self-attention mechanism  b) Multi-head attention  c) Positional encoding  d) Encoder layers

4). Which one of the following maps input tokens to fixed-length vectors?
a) Positional encoding  b) Self-attention mechanism  c) Multi-head attention  d) Input embedding

5). Distributed representation and context-awareness are the principles of _______________?
a) Positional encoding  b) Self-attention mechanism  c) Multi-head attention  d) Input embedding

6). Computing attention scores for each input token is the operation of __________________?
a) Input embedding  b) Self-attention mechanism  c) Residual connections  d) Layer normalization

7). What is the purpose of a core in a transformer?
a) To insulate the windings from each other  b) To provide a path for the magnetic flux  c) To connect the primary and secondary windings  d) To regulate the output voltage

8). Which type of core material has the lowest core loss?
a) Laminated  b) CRGO  c) Amorphous  d) Ferrite

9). The large-scale language models that are based on the transformer architecture are known as ________________?
a) Decoder layers  b) Transformer-based language models  c) Pretraining and fine-tuning  d) Encoder layers

10). The ____________________ projects the input embedding to a key vector that is used to compute attention scores with the query vector?
a) Query vector  b) Key vector  c) Value vector  d) Weighted sum

11). Which one of the following components adds information about the position of each token in the input sequence?
a) Positional encoding  b) Input embedding  c) Encoder layers  d) Multi-head attention

12). A series of layers that each include a self-attention mechanism, a multi-head attention mechanism, and a feedforward neural network is known as ________________?
a) Decoder layers  b) Transformer-based language models  c) Pretraining and fine-tuning  d) Encoder layers

13). Which one of the following adds position information to input embeddings?
a) Positional encoding  b) Self-attention mechanism  c) Multi-head attention  d) Input embedding

14). Computing attention scores for each head and concatenating the results is the operation of __________________?
a) Input embedding  b) Multi-head attention  c) Residual connections  d) Layer normalization

15). A two-stage training process where a transformer-based language model is first pre-trained on a large corpus of text and then fine-tuned on a smaller task-specific dataset is known as ________________?
a) Decoder layers  b) Transformer-based language models  c) Pretraining and fine-tuning  d) Encoder layers
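Questions 12 and 14 above, and several in the set that follows (residual connections, layer normalization, the feedforward sublayer), describe how a single encoder layer is assembled. The sketch below is a simplified NumPy illustration of that assembly in the post-norm ("Add & Norm") style of the original Transformer, not a reference implementation; the output projection after head concatenation and the learned scale and shift of layer normalization are omitted, and every parameter name and size is an assumption made for the example.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean and unit variance
    # (the learned scale and shift parameters are omitted here).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def multi_head_attention(X, heads):
    # Each head has its own (W_q, W_k, W_v); the heads attend in parallel
    # and their outputs are concatenated.
    outputs = []
    for W_q, W_k, W_v in heads:
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
        outputs.append(weights @ V)
    return np.concatenate(outputs, axis=-1)

def encoder_layer(X, heads, W1, b1, W2, b2):
    # Sublayer 1: multi-head self-attention + residual connection + layer norm.
    X = layer_norm(X + multi_head_attention(X, heads))
    # Sublayer 2: position-wise feedforward network (ReLU) + residual + norm.
    ff = np.maximum(0.0, X @ W1 + b1) @ W2 + b2
    return layer_norm(X + ff)

# Usage with made-up sizes: 4 heads of width 4 concatenate back to d_model = 16.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 16))
heads = [tuple(rng.normal(size=(16, 4)) for _ in range(3)) for _ in range(4)]
W1, b1 = rng.normal(size=(16, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 16)), np.zeros(16)
print(encoder_layer(X, heads, W1, b1, W2, b2).shape)   # (5, 16)
```

Stacking several such layers, each feeding its output into the next, gives the encoder stack the questions refer to.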
Core Transformer MCQ for Quiz

16). Improving model convergence and reducing internal covariate shift are the principles of _______________?
a) Feedforward neural network  b) Self-attention mechanism  c) Residual connections  d) Layer normalization

17). Looking up embeddings for each input token is the operation of __________________?
a) Input embedding  b) Self-attention mechanism  c) Residual connections  d) Layer normalization

18). Applying a feedforward network to the output of the attention layer is the operation of __________________?
a) Feedforward neural network  b) Residual connections  c) Layer normalization  d) Multi-head attention

19). Which type of winding configuration is commonly used in high voltage transformers?
a) Wound  b) Layer  c) Helical  d) Cylindrical

20). The ____________________ projects the input embedding to a value vector that is used to compute the weighted sum of the input embeddings based on the attention scores?
a) Query vector  b) Key vector  c) Value vector  d) Weighted sum

21). Which one of the following components consists of self-attention and feedforward neural network components?
a) Positional encoding  b) Input embedding  c) Encoder layers  d) Multi-head attention

22). A series of layers that each include a self-attention mechanism and a feedforward neural network is known as ________________?
a) Self-Attention Mechanism  b) Multi-Head Attention  c) Positional Encoding  d) Encoder Layers

23). Which one of the following computes attention scores based on pairwise similarities between input tokens?
a) Positional encoding  b) Self-attention mechanism  c) Multi-head attention  d) Input embedding

24). Preventing vanishing gradients and stabilizing training are the principles of _______________?
a) Feedforward neural network  b) Self-attention mechanism  c) Residual connections  d) Input embedding

25). Adding the output of each sublayer to its input is the operation of __________________?
a) Feedforward neural network  b) Residual connections  c) Layer normalization  d) Multi-head attention

26). What is the purpose of insulation in a transformer?
a) To reduce the size and weight of the transformer  b) To prevent electrical breakdown between windings  c) To increase the efficiency of the transformer  d) To reduce the cost of the transformer

27). What is the purpose of a tap changer in a transformer?
a) To adjust the frequency of the transformer  b) To adjust the voltage ratio of the transformer  c) To provide a path for the magnetic flux  d) To insulate the windings from each other

28). Which winding arrangement has a better cooling effect?
a) Layer  b) Helical  c) Disc  d) Cylindrical

29). Introducing additional complexity and modelling power are the principles of _______________?
a) Feedforward neural network  b) Self-attention mechanism  c) Multi-head attention  d) Input embedding

30). The ____________________ computes the similarity between the query vector and key vector for each pair of input tokens and normalizes the scores using the softmax function?
a) Query vector  b) Key vector  c) Value vector  d) Attention scores
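Positional encoding appears in questions 11 and 13 above and again in questions 42, 47, and 48 below: since self-attention by itself is order-agnostic, position information is added directly to the input embeddings. As one common concrete choice (an illustrative sketch, not something this quiz specifies), the NumPy code below builds the fixed sinusoidal encodings used in the original Transformer.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sinusoidal encodings in the style of the original Transformer.

    Even dimensions use sine and odd dimensions use cosine, with wavelengths
    in a geometric progression, so both absolute and relative positions can
    be recovered from the encoding.
    """
    pos = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dim = np.arange(d_model)[None, :]            # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dim // 2)) / d_model)
    angles = pos * angle_rates
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles[:, 0::2])
    enc[:, 1::2] = np.cos(angles[:, 1::2])
    return enc

# The encodings are simply added to the input embeddings (illustrative sizes).
embeddings = np.zeros((10, 16))                  # placeholder token embeddings
X = embeddings + sinusoidal_positional_encoding(10, 16)
print(X.shape)                                   # (10, 16)
```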
Core Transformer MCQ for Exams

31). Which one of the following components computes multiple sets of attention scores in parallel?
a) Positional encoding  b) Input embedding  c) Encoder layers  d) Multi-head attention

32). Attending to multiple parts of the input sequence simultaneously is the principle of _______________?
a) Positional encoding  b) Self-attention mechanism  c) Multi-head attention  d) Input embedding

33). Normalizing the output of each sublayer before adding the residual connection is the operation of __________________?
a) Feedforward neural network  b) Residual connections  c) Layer normalization  d) Multi-head attention

34). Cold-rolled grain-oriented silicon steel is the material of which component?
a) CRGO core  b) Amorphous core  c) Both a and b  d) None of the above

35). In which one of the following transformers is the inductance higher?
a) Shell transformer  b) Core transformer  c) Both a and b  d) None of the above

36). What is the purpose of interleaved windings in a transformer?
a) To reduce the leakage inductance  b) To increase the voltage ratio of the transformer  c) To reduce the size and weight of the transformer  d) To improve the cooling effect

37). Which type of winding is commonly used in high power transformers?
a) Wound  b) Layer  c) Helical  d) Cylindrical

38). What are the main types of core losses in transformers?
a) Copper losses and hysteresis losses  b) Eddy current losses and hysteresis losses  c) Eddy current losses and copper losses  d) Hysteresis losses and magnetic losses

39). Which type of transformer is more likely to experience core losses?
a) Low voltage transformers  b) High voltage transformers  c) Step-up transformers  d) Step-down transformers

40). The ____________________ computes the weighted sum of the value vectors using the attention scores, resulting in a context vector that captures the most important information in the input sequence for the given task?
a) Query vector  b) Key vector  c) Value vector  d) Weighted sum

41). Which one of the following components applies a non-linear transformation to the output of the attention layer?
a) Feedforward neural network  b) Input embedding  c) Encoder layers  d) Multi-head attention

42). A technique for adding information about the position of each token in the input sequence to the input embeddings is known as ________________?
a) Self-Attention Mechanism  b) Multi-Head Attention  c) Positional Encoding  d) Encoder Layers

43). Attending to different parts of the input sequence and capturing long-range dependencies are the principles of _______________?
a) Positional encoding  b) Self-attention mechanism  c) Multi-head attention  d) Input embedding

44). The electrical resistivity is high in which component?
a) CRGO core  b) Amorphous core  c) Both a and b  d) None of the above

45). Which one of the following components normalizes the output of each layer before feeding it into the next layer?
a) Feedforward neural network  b) Residual connections  c) Layer normalization  d) Multi-head attention

46). Which one of the following transformers is used in high voltage applications?
a) Core transformer  b) Shell transformer  c) Both a and b  d) None of the above

47). Adding positional encodings to input embeddings is the operation of __________________?
a) Input embedding  b) Positional encoding  c) Residual connections  d) Layer normalization

48). Capturing absolute and relative position information is the principle of _______________?
a) Positional encoding  b) Self-attention mechanism  c) Multi-head attention  d) Input embedding

49). A variation of the self-attention mechanism that computes multiple sets of attention scores in parallel is known as ________________?
a) Self-Attention Mechanism  b) Multi-Head Attention  c) Positional Encoding  d) Encoder Layers

50). Which one of the following components allows gradients to flow more easily through the network?
a) Feedforward neural network  b) Residual connections  c) Encoder layers  d) Multi-head attention

For More MCQs:
Single Phase Transformer Question & Answers
Shell Type Transformer Question & Answers
Transformer Question & Answers