text-analysis

A guide to text analysis tools

Oxford Text Checker

Introduction

The Oxford Text Checker allows users to analyse texts against the Oxford 3000 (a list of the 3000 most useful words for learners of English to know), the Oxford 5000 (an expanded list for advanced speakers) and OPAL (Oxford Phrasal Academic Lexicon). You can learn more about these lists here. You can analyse texts and create wordlists and exercises based on them.

Checking text against the Oxford 3000/5000/OPAL word lists

First, paste your text into the box (1), then click “Check Text” (2). The results will look something like this (text here and in subsequent images from theguardian.com). The site defaults to the Oxford 5000 list, and the words are colour-coded according to their CEFR level. You can hover over a word to find out its part of speech and which CEFR band it belongs to (an underlined word means it appears at more than one level for different parts of speech). The dropdown menu at the top allows you to switch between the four available wordlists.

Results

Clicking the “Results” tab at the top gives you a nice breakdown of the text in pie-chart form (click “More detail” at the bottom if you’re hungry for bar charts).

Activities

If you click the “Activities” tab (1), you can click on individual words in the text to highlight them. The “Filters” button brings up a panel of checkboxes where you can include/exclude individual CEFR levels – this will automatically highlight the words of the selected levels:

Create a Word List / Exercise

At the bottom you have two choices: “Create a word list” or “Create an exercise”. “Create a word list” organises the words by CEFR level into a PDF, which you can download by pressing “Export” at the bottom. “Create an exercise” creates a gapfill of your text to download as a PDF (you can choose whether to have a box at the top showing the missing words and/or to show the first letter of missing words). Again, click “Export” at the bottom to create the exercise.
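If you want to see what grouping words by CEFR level involves, the idea can be sketched offline in a few lines of Python. This is only an illustration, not how the Oxford Text Checker works internally. The `CEFR_LEVELS` dictionary and the `words_by_level` function are hypothetical, and the level labels are invented for the example rather than taken from the Oxford lists.

```python
import re
from collections import defaultdict

# Hypothetical miniature lookup. The levels are invented for illustration
# and are NOT taken from the Oxford 3000/5000.
CEFR_LEVELS = {
    "government": "A2",
    "policy": "B1",
    "announce": "B1",
    "controversial": "B2",
    "legislation": "C1",
}

def words_by_level(text, levels=CEFR_LEVELS):
    """Group the distinct listed words found in the text under their level."""
    grouped = defaultdict(set)
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in levels:
            grouped[levels[token]].add(token)
    # Sort levels and words so the list reads like a tidy handout.
    return {level: sorted(words) for level, words in sorted(grouped.items())}

sample = "The government will announce controversial legislation on housing policy."
for level, words in words_by_level(sample).items():
    print(level, ", ".join(words))
```

Words that are not in the lookup (here "housing", for example) are simply left out, which is roughly what an "off-list" category would capture.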

The EAP foundation Academic Word List (AWL) highlighter and gapfill maker

Introduction

The EAP foundation website has a couple of useful tools that will analyse texts for AWL words. These tools differentiate between the different AWL sublists and (like the Oxford Text Checker) have the handy ability to create gapfill exercises from your texts.

The AWL Tag Cloud

This tool can help visualise the predominance of AWL words in a text and also gives neat options for creating gapfill exercises. To use it, go to this page and then: In the menu, hover over Vocabulary (1), then Academic Vocabulary (2), then Academic Word List (AWL) (3), then click AWL tag cloud & gapfill (4). Paste your text into the box (1) and click Submit (2). There is an automatic limit of 20,000 characters, but you can change this to unlimited by clicking the arrow (3) next to 20,000. You will see the text with AWL words in different colours and at different sizes. You can change the colours and sizes of the sublists using the options in (1). Or you can make all AWL words the same size by choosing from the options in (2). Below the text is a list of AWL words divided by sublists.

The AWL highlighter

This tool highlights all the AWL words in a text and also gives options for creating gapfill exercises. To use it, go to this page and then: In the menu, hover over Vocabulary (1), then Academic Vocabulary (2), then Academic Word List (AWL) (3) and click AWL highlighter & gapfill. Paste your text into the box (1) and click Submit (2). There is an automatic limit of 20,000 characters, but you can change this to unlimited by clicking the arrows next to 20,000. The words will automatically be highlighted grey for sublists 1-5 and orange for sublists 6-10. You can customise the colours using the options in (1), make them all the same colour using the options in (2), or clear all highlighting using (3). Clearing highlighting may seem an odd choice, but if you clear highlighting and then use the controls in (1) to focus only on the sublists you want to see, the resulting text can be cleaner and easier to handle. A word list divided by sublists appears beneath your text.

Making gapfills

On both the cloud and highlighter tools, you can make gapfills that are easy to copy and paste to use with students. To make gapfills, scroll down below the word list and choose from three options: simple (1), head word (2) or word family (3). A simple gapfill (1) removes all the AWL words from the text and gives students no clues as to what may be missing. A head word gapfill (2) provides the AWL head word in brackets after the gap, meaning students must decide whether to complete the gap with the head word or a different form of it. A word family gapfill (3) provides a word in brackets after the gap from the same word family as the missing AWL word, meaning students must decide how to transform it. To share the gapfills with students, all you have to do is copy and paste them into a different document and share that. Obviously, this also gives you the option of editing to make the task less difficult if necessary.
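The difference between a simple gapfill and a head word gapfill can be sketched in Python as follows. This is a rough offline illustration, not the EAP Foundation site's own code. The `HEAD_WORDS` mapping and the `make_gapfill` function are hypothetical, and the mapping covers only three words, whereas a real version would need the full AWL with its word families.

```python
import re

# Hypothetical mapping from a word form to its head word (illustration only).
HEAD_WORDS = {
    "analysed": "analyse",
    "significant": "significant",
    "data": "data",
}

def make_gapfill(text, head_words=HEAD_WORDS, mode="simple"):
    """Replace listed words with a gap; in 'head' mode add the head word in brackets."""
    gap = "_" * 10

    def replace(match):
        word = match.group(0)
        head = head_words.get(word.lower())
        if head is None:
            return word  # not a listed word, so leave it untouched
        return f"{gap} ({head})" if mode == "head" else gap

    return re.sub(r"[A-Za-z]+", replace, text)

sentence = "The researchers analysed the data and found a significant effect."
print(make_gapfill(sentence))               # simple: gaps only
print(make_gapfill(sentence, mode="head"))  # head word given after each gap
```

In "head" mode the student sees "(analyse)" after the first gap and has to decide that the past form is needed, which is the decision the head word gapfill is designed to practise.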

Webcorp Wordlist Generator

Introduction

Hosted by Birmingham City University, Webcorp enables users to create a concordance for any word by generating examples from the internet. In addition, it has a useful tool that can generate a wordlist for any text, and a concordance to those words as used in the text. This has obvious value for both teachers and students when they need to identify key lexis in a text.

Creating a wordlist

Click Wordlist tool (1). Click ‘Or specify the text to analyse…’ (2). Paste the text into the box (3). Click ‘Submit’ (4). This will generate a wordlist ordered by the frequency of each word in the text.
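For readers curious about what a frequency-ordered wordlist is under the hood, here is a minimal Python sketch. It is not Webcorp's implementation. The `frequency_wordlist` function is a hypothetical name, and the tokenisation is deliberately crude (lowercase letters and straight apostrophes only).

```python
import re
from collections import Counter

def frequency_wordlist(text):
    """Return (word, count) pairs, most frequent first."""
    # Crude tokeniser: lowercase words, allowing one internal straight apostrophe.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens).most_common()

sample = "The cat sat on the mat. The mat was red."
for word, count in frequency_wordlist(sample)[:5]:
    print(f"{count:>3}  {word}")
```

As with the online tool, function words such as "the" tend to top the list, so the key lexis usually sits a little further down.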

Creating the concordance of a word in the text

1 Click on any of the words in the list. This will generate a concordance for that word within the text.
2 To see the distribution of the word within the text, click ‘Text’. This can be particularly useful for guiding students to see how topic-related lexis may occur in the introduction, then cluster in one or two paragraphs in the body before reoccurring in the conclusion.
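The two views described above, a concordance and a distribution across the text, can be approximated offline with a short Python sketch. It is an illustration only and makes some assumptions: the function names `concordance` and `distribution` are hypothetical, context is measured in characters rather than words, and paragraphs are assumed to be separated by blank lines.

```python
import re

def concordance(text, keyword, width=20):
    """Return (left context, match, right context) for each whole-word hit."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines

def distribution(text, keyword):
    """Count hits per paragraph (paragraphs assumed to be split by blank lines)."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return [len(pattern.findall(p)) for p in paragraphs]

sample_text = (
    "Climate policy is changing.\n\n"
    "Many argue that climate targets are weak. Climate models disagree.\n\n"
    "In conclusion, climate matters."
)

for left, hit, right in concordance(sample_text, "climate"):
    # Flatten line breaks so each hit prints on one line.
    print(left.replace("\n", " ").rjust(20), f"[{hit}]", right.replace("\n", " "))

print(distribution(sample_text, "climate"))
```

The per-paragraph counts make it easy to point out to students where a topic word appears, clusters and returns.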

Using the wordlist tool to find in-text collocations

The wordlist tool can also be used to create wordlists of pairs of words or strings of up to five words. To do this, paste the text as normal, then click on the arrow next to n-gram and choose the length of word string from the drop-down arrow (1). Click ‘Submit’ (2). The list may appear frustrating at first, largely dominated by ‘the+noun’ pairs, but scrolling down can help identify other more interesting collocations that recur in the text.
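An n-gram wordlist is the same counting idea applied to runs of adjacent words. The sketch below is a minimal illustration in Python, not Webcorp's code. The `ngram_wordlist` function is a hypothetical name, and for simplicity it lets n-grams run across sentence boundaries.

```python
import re
from collections import Counter

def ngram_wordlist(text, n=2):
    """Return (n-gram, count) pairs, most frequent first."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Line up n staggered copies of the token list to form the word strings.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams).most_common()

sample = "The cat sat on the mat. The cat slept on the mat."
for gram, count in ngram_wordlist(sample, n=2)[:5]:
    print(f"{count:>3}  {gram}")
```

Even in this tiny sample the top pairs begin or end with "the", which mirrors the ‘the+noun’ pattern that dominates the online results.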

Flax

Introduction

The Flax corpus has many useful tools for the teacher and learner, including PhD abstracts that can be analysed for language and an easy-to-use collocation tool. One tip: Flax isn’t very easy to find via Google, so it’s worth bookmarking the page. The best place to start is the Learning Collocations tool – click the link shown below:

Using the Learning Collocations tool

1 Type the word you want to research into the box (1). Click the arrow and select the corpus in which you want to research the word (2). Click ‘Go’ (3).
2 The list of top collocations and word strings will appear, grouped by pattern (e.g. noun + noun) and ranked by the number of occurrences within the corpus. Related words are shown at the top of the screen (1). Clicking on any of these will show results for that word. If there is an arrow at the bottom of the screen (2), click it to reveal more patterns. Use the ‘More’ arrows at the right (3) to see further collocations.
3 Click on any collocation to see a word list for it appear in a box below. The numbers refer to how many examples there are in the corpus.
4 Click on any collocation to see example sentences in a box. The collocation appears in bold. These sentences can be copied and pasted into word processing documents and adapted for gapfills etc.
You can change the results to those for a different corpus by clicking and selecting from the dropdown menu at any time. The results will update accordingly.

Getting More out of the Flax Library

Useful Academic Language

You and your students can analyse language as used in PhD abstracts and/or the British Academic Written English (BAWE) corpus, a useful way to see how language is used in different disciplines. A simple way to do this is:
1 Click on ‘Useful Words for Academic Writing’ in the library.
2 Choose a discipline from the drop-down box (this is only available in the BAWE; useful language in the PhD abstracts library is currently limited to Arts & Humanities).
3 A list of categories will appear. Click on the ‘+’ sign next to a category to expand it.
4 Click on any blue word to see a list of the lexical patterns that word occurs in. The image shows the pattern for ‘I’ in the PhD abstract library. Clearly, this would give the student an array of phrases to choose from when writing their abstract.

Analysing PhD abstracts

1 Back on the Flax homepage, select an area from the PhD Abstracts Collections section of the library, e.g. Arts and Humanities.
2 Click on the plus [+] buttons to expand the subjects on the left, and again to expand subgroups, then click on a title from the list.
3 Click on wordlist. This will automatically highlight the words from the list of 1000 most frequent words and give a percentage.
4 Click on the drop-down menu in the top left to select a different wordlist to check it against. Results will automatically update and the highlighted words change.
5 To analyse collocations within an abstract, click on a word-class label in the top right. The image shows adjectival collocations.
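The percentage shown in step 3 is a coverage figure: the share of running words in the abstract that appear on the chosen wordlist. A rough Python sketch of that calculation is given below. It is not Flax's implementation. The `coverage` function is a hypothetical name, and `COMMON_WORDS` is a tiny invented stand-in for a real frequency list.

```python
import re

# Hypothetical stand-in for a frequency list such as the 1000 most frequent words.
COMMON_WORDS = {"the", "of", "and", "in", "a", "is", "this", "study"}

def coverage(text, wordlist=COMMON_WORDS):
    """Return (percentage of tokens on the list, the matching tokens)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0, []
    hits = [t for t in tokens if t in wordlist]
    return 100 * len(hits) / len(tokens), hits

abstract = "This study examines the role of metaphor in political discourse."
percent, hits = coverage(abstract)
print(f"{percent:.1f}% of tokens are on the list: {hits}")
```

Swapping in a different set for `wordlist` corresponds to choosing a different list from the drop-down menu in step 4.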