    • #36512
      Anastasios Asimakopoulos
          • What did you discover when using the tool? (you can report any findings from your searches)
          • How easy did you find the tool?
          • How would you use the tool (course/materials development, hands-on tasks in class, self-access for students, etc.) and why?
        • #37621
          Judith Gorham

              The n-grams tool seems useful for looking at discourse markers where there is more than one word, such as ‘In addition’. What I’d like to do is compare how different disciplines use these. Can I do that?

              • #37641
                Anastasios Asimakopoulos

                    Hi @judith-gorham, yes, that’s a great idea. The n-grams function can be used to identify various text-oriented lexical bundles such as transition signals (in contrast to the), structuring signals (in the present study), resultative signals (as a result of) and framing signals (on the basis of) – see this week’s text by Hyland (2018) for more types of lexical bundles. You can narrow down your search to one of the four broad disciplinary categories by selecting AH, SS, PS or LS under Subcorpus; unfortunately, narrowing it down to one specific discipline is not available without an institutional or personal login. I have attached two screenshots, but the second one applies only if you have access via your institution. How were you planning to use the tool?

                    However, a way around this is to use Frequency. Once you create your n-gram list, you can go to the concordance of a specific n-gram e.g. on the other hand, and click Frequency – Text Types to see how frequent it is in individual disciplines. This is from this week’s guide, so give it a go and let me know if you have any questions.

                • #37826
                  Rhian Webb

                      @anastasios & @judith-gorham

                      Hi both! I have just finished marking 70 writing papers written by Chinese students and they seem to really like writing ‘last but not least’ as a transition signal in a topic sentence. I wanted to know why they do this because I had not taught them to use it over the past 3 months of the course. I wanted to see if N-grams could help me to see the frequency for this multi-word expression, so I checked the BAWE and the BASE. But alas, I could not find the expression in either corpus.

                      I did discover the frequency for multi-word expressions starting with [last] for both written and spoken register.


                      Last week I … frequency 16
                      Last week we … frequency 13
                      Last few years … frequency 25
                      lasting welfare settlement … frequency 19

                      I was really surprised by [lasting welfare settlement, 2.28 per million tokens], as I have no idea what the context could be for this high-frequency expression! So I checked, and it was only used in a History & Sociology essay question: Why did the Swedish left succeed in creating a lasting Welfare Settlement in the 1930s while the British left did not achieve the same goal until the late 1940s?

                      I really got concerned that a corpus can be so heavily skewed towards what is in it and not really be a true reflection of how frequently we write [last] in written form.

                      So I used the word list to see what it came up with and the results made more sense to me.

                      lemma [lastly] – frequency 166. From the concordance lines, the word or punctuation mark that most often preceded [lastly] was [and] or a full stop. This felt about right to me.

                      lemma [last] – frequency 38. From the concordance lines, the word most often preceding [last] was [at], which also surprised me!
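The check described above, looking at what comes immediately before a node word in the concordance lines, can be sketched outside the tool as well. This is only a minimal illustration in Python: the sample sentences are invented, the tokenisation is a simple regular expression rather than Sketch Engine's own, and `left_neighbours` is a hypothetical helper name.

```python
import re
from collections import Counter


def left_neighbours(text: str, node: str) -> Counter:
    """Count the token (word or punctuation mark) right before each hit of `node`."""
    # Lowercase everything, then split into words and single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node and i > 0:
            counts[tokens[i - 1]] += 1
    return counts


# Invented sample text, not taken from BAWE or BASE.
sample = (
    "The method was slow. Lastly, it was costly. "
    "We tested it twice and lastly we compared the results. "
    "At last the data arrived."
)

print(left_neighbours(sample, "lastly").most_common())
print(left_neighbours(sample, "last").most_common())
```

On real data the counts would of course come from the corpus itself; the point is only that "most frequent word to the left" is a simple tally over concordance hits.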

                      I think the Word List is easy enough to use with students. I wonder if the students would be as surprised as I was by the findings. Perhaps that surprise would lead them to recognise that what they believe is frequently used in written or spoken language is, in fact, not frequent at all. They could then look for a more frequent and more appropriate word, such as, in this case, [lastly] as opposed to [last but not least] or [last].

                      I think this could work out quite nicely next semester with my students!!! :yes:

                      • #37908
                        Anastasios Asimakopoulos

                            Hi @rmwebb thank you for this wonderful reflection. There’s definitely a lot to unpack in your comments.

                            From experience, I can tell that Last but not least is one of those IELTS writing task 2 tricks where students introduce their last point and try to get as many words in as possible! As you discovered, Last but not least is not very frequent; it occurs 32 times in BAWE with a relative frequency of 3.12 per million tokens (link), while Finally occurs 754 times with a relative frequency of 90.45 per million (that’s roughly 30 times more frequent). What this shows is that naturally occurring language is different from what published teaching/learning materials present, and this is where corpora can come into play and be used by students and teachers to challenge syllabi and practices. I was doing a similar activity this week on a course I am taking, where we were examining spoken interactions in EFL coursebooks and extracts from a spoken corpus; it’s literally like you are looking at two different languages :mail: I still get a little irked when I hear students say In my opinion because of how rarely I use it or hear colleagues and friends use it.

                            Now, back to Last but not least. Looking at the Frequency – Text Types, we can see that none of these 32 examples of Last but not least were written by students with English as their L1:

                            My guess would be that academic writers would not want to ‘waste’ four words on a transition when they need to prioritise saying as much as they can in terms of content and being critical.
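For anyone curious about the arithmetic behind "per million tokens", here is a minimal Python sketch. The function name `per_million` and the 10-million-token corpus in the first example are assumptions made for illustration; only the two relative frequencies in the second example come from the post above.

```python
def per_million(hits: int, corpus_tokens: int) -> float:
    """Relative frequency: hits normalised to a corpus of one million tokens."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return hits * 1_000_000 / corpus_tokens


# Hypothetical corpus: 25 hits in 10 million tokens.
print(per_million(25, 10_000_000))  # 2.5

# Ratio of the two relative frequencies quoted in the post
# (Finally vs Last but not least).
ratio = 90.45 / 3.12
print(round(ratio, 1))  # 29.0
```

Normalising in this way is what makes counts comparable across corpora or subcorpora of different sizes.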

                            Now, regarding n-grams, I could perhaps clarify that the longer the n-gram, the more specific the phrase becomes and therefore the less frequent it is. In addition, the n-grams function doesn’t let us specify the part of speech or form, e.g. last as an adjective or adverb; it simply pulls out any recurrent patterns. That’s why you found the adjective lasting in lasting welfare settlement. So, it’s not a matter of skewness. I am also guessing that you typed last, not Last? See the difference it makes here – link
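The two points above (longer n-grams are rarer, and last is not the same string as Last) can be seen in a toy n-gram counter. This is a sketch in plain Python on an invented sentence, not a description of how Sketch Engine computes its lists; `ngram_counts` is a hypothetical helper.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Invented example text, split on whitespace.
tokens = "Last week we met and last week we met again".split()
lowered = [t.lower() for t in tokens]

# The 3-gram occurs twice ...
print(ngram_counts(tokens, 3)[("week", "we", "met")])          # 2
# ... but the longer, case-sensitive 4-gram only once ...
print(ngram_counts(tokens, 4)[("last", "week", "we", "met")])  # 1
# ... unless we ignore case, which merges "Last" and "last".
print(ngram_counts(lowered, 4)[("last", "week", "we", "met")]) # 2
```

An n-gram can never be more frequent than the shorter sequences it contains, which is why four-word expressions such as last but not least tend to have low counts.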

                            But to go back to what you are saying, students can examine their ‘beliefs’ and background knowledge to see how it compares to authentic student writing. That’s the fun part of discovery learning – being surprised ;)

                        • #38028
                          Ana Vucicevic

                              Hiii @anastasios, I opted for N-gram this time and I agree with @judith-gorham regarding the use of this function for investigating discourse markers. That was my first thought.

                              I have some doubts with respect to some terms, so I would be really grateful if you could clarify these for me. First of all, what is the difference between reference and focus corpus/frequency? Secondly, can you illustrate the search for sem, textpart and lempos attributes?


                            • #38051
                              Anastasios Asimakopoulos

                                  Hi @ana93

                                  I am assuming your first question is about the Keywords function. This function allows you to create a list of words (or phrases) that are typical of the corpus when compared to a different corpus and are determined using statistical measures such as the Simple Math Parameter (SMP) – see the formula here if you are interested in the maths behind it. So, focus corpus is the corpus we are examining while reference corpus is the corpus we are comparing it with. We aren’t covering it in the course because the free version of Sketch Engine doesn’t include many corpora for meaningful comparisons. However, one could compare a BAWE subcorpus (focus) with the entire BAWE (reference).
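As a rough illustration of the maths, here is a Python sketch that assumes the commonly documented form of the simple maths score: the frequency per million in the focus corpus plus a smoothing constant, divided by the frequency per million in the reference corpus plus the same constant. The function name and all the example figures are hypothetical; please check the linked formula before relying on this.

```python
def simple_maths(fpm_focus: float, fpm_reference: float, n: float = 1.0) -> float:
    """Keyness score, assuming the form (fpm_focus + n) / (fpm_reference + n).

    fpm_* are frequencies per million tokens; n is the smoothing parameter.
    """
    return (fpm_focus + n) / (fpm_reference + n)


# Hypothetical word: 90 per million in the focus corpus, 9 in the reference.
print(simple_maths(90.0, 9.0))        # 9.1

# A word absent from the reference corpus still gets a finite score.
print(simple_maths(5.0, 0.0))         # 6.0

# A larger n damps the score, shifting attention to more common words.
print(round(simple_maths(90.0, 9.0, n=100.0), 2))  # 1.74
```

The smoothing constant is what avoids division by zero and lets the user lean the keyword list towards rarer or more common words.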

                                  Regarding your second question about Frequency, I can give you a quick overview:

                                  • lempos is a combination of lemma (lem) and part of speech (pos) e.g. take-V, good-J, system-N, etc. I rarely use them myself if I am honest; I’d rather have students figure out the part of speech from the context.
                                  • textpart allows you to specify which part of the text you want to see the word in e.g. abstract, headings, bibliography, running-text (the main text), etc. Please note that this feature depends on how a corpus was designed and created; not all corpora have this.
                                  • sem refers to semantic tags. BAWE has been semantically tagged using the UCREL Semantic Analysis System (USAS); see the USAS documentation for more information. You can basically search for certain topics (semantic fields) using CQL (Unit 6) if you are interested in looking at semantically related vocabulary, e.g. [sem="X2.1.*"] (thought/belief) – see concordance here. It’s a little advanced and, given the short nature of the course, we focus on lexico-grammatical structures when teaching CQL. Obviously, the limitation is that this tagging is automatic and cannot differentiate between various meanings of words; that’s why qualitative analysis of concordance lines is necessary.
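To make the lempos and sem attributes a little more concrete, here is a small Python sketch. It is not Sketch Engine code: the token list and its tags are invented for illustration, `split_lempos` and `match_sem` are hypothetical helpers, and I am assuming that a CQL value such as "X2.1.*" behaves like a regular expression matched against the whole tag.

```python
import re


def split_lempos(lempos: str) -> tuple[str, str]:
    """Split a lempos such as 'take-V' into (lemma, part-of-speech code)."""
    lemma, pos = lempos.rsplit("-", 1)  # split on the last hyphen only
    return lemma, pos


def match_sem(tokens, pattern: str):
    """Return words whose semantic tag fully matches the regex `pattern`."""
    rx = re.compile(pattern)
    return [word for word, tag in tokens if rx.fullmatch(tag)]


print(split_lempos("take-V"))        # ('take', 'V')
print(split_lempos("well-being-N"))  # ('well-being', 'N')

# Invented (word, semantic tag) pairs; real tags come from the tagged corpus.
toy_tokens = [
    ("believe", "X2.1"),
    ("policy", "G1.1"),
    ("assume", "X2.1"),
    ("clever", "X2.2+"),
]
print(match_sem(toy_tokens, "X2.1.*"))  # ['believe', 'assume']
```

As noted above, the tags themselves are assigned automatically, so the matched words still need checking in context.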

                                  I hope this information helps a little.

                                  • #38097
                                    Ana Vucicevic

                                        Actually @anastasios it helps a lot! I tend to get confused with the terms :scratch:

                                        For the time being, I am just going to give a wide berth to the sem function, although it really sounds interesting. :D

                                    • #38572
                                      Samuel Pealing

                                          I worked on using the wordlist function to explore the frequency of different words in specific discipline groups. Most of my classes are with business students, so identifying the most frequently used nouns, verbs and adjectives, and then exploring the collocations and frequently used constructions of these seems quite useful for learners who, like mine, keep a personal vocabulary log of useful language.


                                          For example, by searching for nouns in the SS part of BAWE, we can see that the 7th most frequently used word is policy. From there, I can select Word Sketch and see that, in terms of modifiers, fiscal, monetary and economic are three of the words most frequently used with policy. This can lead into an investigation of the differences between these three words, which all have something to do with money, and that could form the basis of a classroom activity.


                                          In conclusion, the wordlist feature can be a great jumping-off point if you’re looking for inspiration.
