The n-grams tool seems useful for looking at discourse markers where there is more than one word, such as ‘In addition’. What I’d like to do is compare how different disciplines use these. Can I do that?
Hi @judith-gorham, yes, that’s a great idea. The n-grams function can be used to identify various text-oriented lexical bundles such as transition signals (‘in contrast to the’), structuring signals (‘in the present study’), resultative signals (‘as a result of’) and framing signals (‘on the basis of’) – see this week’s text by Hyland (2018) for more types of lexical bundles. You can narrow down your search to one of the four broad disciplinary categories by selecting AH, SS, PS or LS under Subcorpus; unfortunately, narrowing it down to one specific discipline is not possible without an institutional or personal login. I have attached two screenshots, but the second one only applies if you have access via your institution. How were you planning to use the tool?
However, a way around this restriction is to use Frequency. Once you have created your n-gram list, you can go to the concordance of a specific n-gram, e.g. ‘on the other hand’, and click Frequency – Text Types to see how frequent it is in individual disciplines. This is covered in this week’s guide, so give it a go and let me know if you have any questions.
Hi both! I have just finished marking 70 writing papers by Chinese students, and they seem to really like using ‘last but not least’ as a transition signal in a topic sentence. I wanted to know why they do this, because I had not taught them to use it during the past 3 months of the course. I wanted to see if n-grams could show me the frequency of this multi-word expression, so I checked BAWE and BASE. But alas, I could not find the expression in either corpus.
I did discover the frequency for multi-word expressions starting with [last] for both written and spoken register.
Spoken:
Last week I … frequency 16
Last week we … frequency 13
Written:
Last few years … frequency 25
lasting welfare settlement … frequency 19
I was really surprised by [lasting welfare settlement] (2.28 per million tokens), as I had no idea what the context could be for such a frequent expression! So I checked, and it was only used in a History & Sociology essay question: Why did the Swedish left succeed in creating a lasting Welfare Settlement in the 1930s while the British left did not achieve the same goal until the late 1940s?
I got really concerned that a corpus can be so heavily skewed towards what is in it and not really be a true reflection of how frequently we write [last] in written form.
So I used the word list to see what it came up with and the results made more sense to me.
lemma [lastly], frequency 166 – from the concordance lines, the word or punctuation mark that most often preceded [lastly] was [and] or a full stop. This felt about right to me.
lemma [last], frequency 38 – from the concordance lines, the word that most often preceded [last] was [at] – this also surprised me!
I think the Word List is easy enough to use with students. I wonder if the students would be as surprised as I was by the findings; perhaps that surprise would lead them to recognise that what they believe is frequently used in written or spoken language is, in fact, not. They could then look for a more frequent and more appropriate word, such as, in this case, [lastly] as opposed to [last but not least] or [last].
I think this could work out quite nicely next semester with my students!!!
Hi @rmwebb thank you for this wonderful reflection. There’s definitely a lot to unpack in your comments.
From experience, I can tell that ‘Last but not least’ is one of those IELTS Writing Task 2 tricks where students introduce their last point and try to get as many words in as possible! As you discovered, ‘Last but not least’ is not very frequent; it occurs 32 times in BAWE with a relative frequency of 3.12 per million tokens (link), while ‘Finally’ occurs 754 times with a relative frequency of 90.45 per million (roughly 30 times more frequent in relative terms). What this shows is that naturally occurring language is different from what published teaching/learning materials present, and this is where corpora can come into play: students and teachers can use them to challenge syllabi and practices.
I was doing a similar activity this week on a course I am taking, where we were comparing spoken interactions in EFL coursebooks with extracts from a spoken corpus; it’s literally like you are looking at two different languages. I still get a little irked when I hear students say ‘In my opinion’ because of how rarely I use it or hear colleagues and friends use it.
Now, back to ‘Last but not least’. Looking at Frequency – Text Types, we can see that none of these 32 examples of ‘Last but not least’ was written by a student with English as their L1.
My guess would be that academic writers would not want to ‘waste’ four words on a transition when they need to prioritise saying as much as they can in terms of content and being critical.
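For anyone wondering how the per-million figures above are derived: a relative frequency is just the raw hit count scaled to a corpus of one million tokens, which makes counts comparable across corpora of different sizes. Here is a minimal Python sketch; the function names and the toy corpus size are my own illustration, not anything Sketch Engine exposes.

```python
def per_million(hits, corpus_tokens):
    """Normalise a raw hit count to occurrences per million tokens."""
    return hits * 1_000_000 / corpus_tokens


def times_more_frequent(rel_freq_a, rel_freq_b):
    """Return how many times more frequent item A is than item B."""
    return rel_freq_a / rel_freq_b


# Toy, made-up corpus: 50 hits in a corpus of 2,000,000 tokens.
print(per_million(50, 2_000_000))

# Relative frequencies quoted in this thread:
# 'Finally' (90.45 per million) vs 'Last but not least' (3.12 per million).
print(round(times_more_frequent(90.45, 3.12)))
```

Comparing the two relative frequencies, rather than the raw counts, is what gives the "roughly 30 times" figure.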
Now, regarding n-grams, I could perhaps clarify that the longer the n-gram, the more specific the phrase becomes and therefore the less frequent it is. In addition, n-grams doesn’t allow us to specify the part of speech or form, i.e. ‘last’ as an adjective or adverb; it simply pulls out any recurrent patterns. That’s why you found the adjective ‘lasting’ in ‘lasting welfare settlement’. So, it’s not a matter of skewness. I am also guessing that you typed ‘last’, not ‘Last’? See the difference it makes here – link
But to go back to what you are saying, students can examine their ‘beliefs’ and background knowledge to see how it compares to authentic student writing. That’s the fun part of discovery learning – being surprised ;)
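To make the n-grams point above concrete, here is a minimal Python sketch of what an n-gram count does: it slides a window of n tokens over the text and counts identical sequences, with no knowledge of part of speech. The toy text and the helper name ngram_counts are invented for illustration; Sketch Engine's own tokenisation and implementation will differ.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens in the list."""
    return Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


# Toy text, whitespace-tokenised (an assumption; real corpora use a tokeniser).
text = "Last week I read it . last week I wrote it . Last week we met ."
tokens = text.split()
lowered = [t.lower() for t in tokens]

# Case-sensitive bigrams: 'Last week' and 'last week' are separate items.
print(ngram_counts(tokens, 2)[("Last", "week")])
print(ngram_counts(tokens, 2)[("last", "week")])

# Case-insensitive bigrams: the two forms are merged.
print(ngram_counts(lowered, 2)[("last", "week")])

# Trigrams: the same hits are now split across more specific phrases.
print(ngram_counts(lowered, 3)[("last", "week", "i")])
print(ngram_counts(lowered, 3)[("last", "week", "we")])
```

With case-sensitive matching, ‘Last week’ and ‘last week’ are counted as different bigrams, and moving from bigrams to trigrams splits the counts further, which is why longer n-grams are rarer.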
Hiii @anastasios, I opted for N-gram this time and I agree with @judith-gorham regarding the use of this function for investigating discourse markers. That was my first thought.
I have some doubts with respect to some terms, so I would be really grateful if you could clarify these for me. First of all, what is the difference between reference and focus corpus/frequency? Secondly, can you illustrate the search for sem, textpart and lempos attributes?
I am assuming your first question is about the Keywords function. This function allows you to create a list of words (or phrases) that are typical of the corpus when compared to a different corpus and are determined using statistical measures such as the Simple Math Parameter (SMP) – see the formula here if you are interested in the maths behind it. So, focus corpus is the corpus we are examining while reference corpus is the corpus we are comparing it with. We aren’t covering it in the course because the free version of Sketch Engine doesn’t include many corpora for meaningful comparisons. However, one could compare a BAWE subcorpus (focus) with the entire BAWE (reference).
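For those interested in the maths, the sketch below shows my understanding of the ‘simple maths’ keyness score as Sketch Engine describes it: the frequency per million in the focus corpus plus a smoothing constant N, divided by the same quantity for the reference corpus. The function name and example figures are hypothetical, and the default of N = 1 is an assumption, so please check the linked formula before relying on it.

```python
def simple_maths_score(fpm_focus, fpm_ref, n=1.0):
    """Keyness score: (focus freq per million + N) / (reference freq per million + N).

    n is the smoothing constant; it also avoids division by zero when
    the item is absent from the reference corpus.
    """
    return (fpm_focus + n) / (fpm_ref + n)


# Hypothetical word: 9 per million in the focus corpus,
# 1 per million in the reference corpus.
print(simple_maths_score(9.0, 1.0))

# The same word with a much larger smoothing constant.
print(simple_maths_score(9.0, 1.0, n=100.0))
```

A score above 1 means the item is relatively more frequent in the focus corpus. Raising N shrinks the scores of low-frequency items, so the keyword list tends to favour more common words.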
Regarding your second question about Frequency, I can give you a quick overview:
lempos is a combination of lemma (lem) and part of speech (pos) e.g. take-V, good-J, system-N, etc. I rarely use them myself if I am honest; I’d rather have students figure out the part of speech from the context.
textpart allows you to specify which part of the text you want to see the word in e.g. abstract, headings, bibliography, running-text (the main text), etc. Please note that this feature depends on how a corpus was designed and created; not all corpora have this.
sem refers to semantic tags. BAWE has been semantically tagged using the UCREL Semantic Analysis System (see http://ucrel.lancs.ac.uk/usas/ for more information). You can basically search for certain topics (semantic fields) using CQL (Unit 6) if you are interested in looking at semantically related vocabulary, e.g. [sem="X2.1.*"] (thought/belief) – see concordance here. It’s a little advanced and, given the short nature of the course, we focus on lexico-grammatical structures when teaching CQL. Obviously, the limitation is that this tagging is automatic and cannot differentiate between various meanings of words; that’s why qualitative analysis of concordance lines is necessary.
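To illustrate two of the attributes above, here is a small Python sketch. It assumes that a lempos value is simply the lemma and a part-of-speech code joined by a hyphen, and that a CQL attribute value such as "X2.1.*" behaves like a regular expression that must match the whole tag. Both helper names are made up for the example, and the sample tags follow the USAS naming scheme rather than being pulled from BAWE.

```python
import re


def split_lempos(lempos):
    """Split a lempos string such as 'take-V' into (lemma, pos).

    Splitting at the LAST hyphen keeps hyphenated lemmas intact.
    """
    lemma, _, pos = lempos.rpartition("-")
    return lemma, pos


def sem_matches(pattern, tag):
    """Return True if the whole semantic tag matches the pattern."""
    return re.fullmatch(pattern, tag) is not None


print(split_lempos("take-V"))
print(split_lempos("well-known-J"))

# The pattern from the post, tested against two sample tags.
print(sem_matches("X2.1.*", "X2.1"))
print(sem_matches("X2.1.*", "X2.2"))
```

The trailing `.*` in the pattern is what lets the query also pick up any longer variants of a tag that begin with X2.1.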
I worked on using the wordlist function to explore the frequency of different words in specific discipline groups. Most of my classes are with business students, so identifying the most frequently used nouns, verbs and adjectives, and then exploring the collocations and frequently used constructions of these seems quite useful for learners who, like mine, keep a personal vocabulary log of useful language.
For example, by searching for nouns in the SS part of BAWE, we can see that the 7th most frequently used word is ‘policy’. From there, I can select Word Sketch and see that, in terms of modifiers, ‘fiscal’, ‘monetary’ and ‘economic’ are three of the words most frequently used with ‘policy’. This can lead into an investigation of the difference between these three words, which all have something to do with money, and that could form the basis of a classroom activity.
In conclusion, the wordlist feature can be used as a great jumping off point if you’re looking for inspiration.