#35603
Anastasios Asimakopoulos
Keymaster
    @anastasios

    Hi all. I thought I could use this thread to share notes from today’s drop-in session (not necessarily for Unit 3). Big thanks to those who could make it, by the way. Here are some pointers, reminders and answers to your questions:

    1. Sorting vs shuffling – what’s the difference?

    Sorting and shuffling are two ways of rearranging concordance lines. When we search for a word, the system presents the results in ‘chronological’ order, i.e. the order in which they occur in the corpus: the first instance of the word in the first text, then the second instance in the same document, and so on, before moving on to the second document, the third, etc. For example, if you search for ‘the’, the first 20 concordance lines will most likely all come from the same document, since ‘the’ is such a frequent word. To get results from a range of documents, and therefore a more representative sample, we shuffle.

    Sorting helps in the same way but, most importantly, it lets us see patterns more easily by arranging the left or right context in alphabetical order (see images below).
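    For anyone who likes to see the idea spelled out, here is a minimal Python sketch of the difference between the default order, shuffling and sorting. It is not how Sketch Engine works internally; the two mini "documents", the `concordance` function and the `width` parameter are all invented for illustration.

```python
import random
import re

# Hypothetical mini-corpus: two short "documents" made up for this example.
documents = [
    "the method was tested and the results were clear",
    "we follow the approach because the data support it",
]

def concordance(docs, node, width=2):
    """Return (left, node, right, doc_index) tuples in corpus order."""
    lines = []
    for doc_index, text in enumerate(docs):
        tokens = re.findall(r"\w+", text.lower())
        for i, token in enumerate(tokens):
            if token == node:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((left, token, right, doc_index))
    return lines

# Default ("chronological") order: all hits from document 0, then document 1.
lines = concordance(documents, "the")

# Shuffling: same lines, random order, so documents get mixed.
shuffled = lines[:]
random.Random(0).shuffle(shuffled)  # fixed seed only to make the run repeatable

# Sorting: order the lines alphabetically by their right context.
sorted_right = sorted(lines, key=lambda line: line[2])

for left, node, right, doc_index in sorted_right:
    print(f"{left:>18} [{node}] {right:<18} (doc {doc_index})")
```

    In the sorted output, lines with similar right contexts end up next to each other, which is what makes repeated patterns easy to spot.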

    2. How to search for a word at the beginning of a sentence?

    3. Narrow searches and few results

    We also talked about how searching for two or more words makes the search narrower, so you might not find what you are looking for. For example, the phrase ‘scientific methods’ occurs only 15 times in the corpus, and there wasn’t much to see in terms of verbs that collocate with it. I recommend searching for the noun ‘methods’ on its own (2,188 times); that will yield more examples and you will be able to identify verbs. Looking at the collocation scientific + method, I noticed that it comes mainly from psychology, philosophy and archaeology (see link created via Word Sketch) and some of the verbs are: consider, use, follow, shape, aid, etc. So, if you are looking for more variety, it’s better to search for ‘method’ only.
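    The same point can be sketched in a few lines of Python: a phrase can only match where every word matches in sequence, so it can never return more hits than the single word on its own. The sample text and the `count_phrase` helper below are invented for illustration and have nothing to do with the course corpus.

```python
import re

# Hypothetical sample text, made up for this example.
text = (
    "Researchers use scientific methods. Other methods were considered. "
    "These methods follow earlier work."
)
tokens = re.findall(r"\w+", text.lower())

def count_phrase(tokens, phrase):
    """Count how often the word sequence `phrase` occurs in `tokens`."""
    n = len(phrase)
    return sum(
        1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase
    )

phrase_hits = count_phrase(tokens, ["scientific", "methods"])
word_hits = count_phrase(tokens, ["methods"])

print("scientific methods:", phrase_hits)
print("methods:", word_hits)
```

    Every hit for the phrase is also a hit for the single word, which is why the broader search gives you more lines (and more verbs) to look at.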

    4. Can I create my own corpus with Sketch Engine?

    Yes, that’s possible with an institutional login, but since we are losing that (around April 2022?), we decided not to cover this on the course and to focus on free corpora, which we will always have access to. I recommend AntCorGen and AntConc by Laurence Anthony. The first allows you to download research articles (or specific sections of them) from various disciplines and on very specific topics. The second allows you to upload these articles and search your own corpus using the corpus techniques we are covering, e.g. concordancing, sorting, collocations, n-grams, etc. I think we have a guide somewhere, so I can add it to the list of useful resources we share at the end of the course.

    Hope you find these notes helpful. Again, thank you for coming today!