An Investigation of Different Language Choices through Personal Pronouns on Twitter

This study aimed to find evidence regarding the use of personal pronouns in the discourses produced by males and females. Personal pronouns were chosen as the object of analysis, as several studies have suggested them as one of the features that may distinguish the gender of authors. This study analysed a publicly available corpus, the Rovereto Twitter N-Gram Corpus (RTC), utilized by Herdagdelen (2013). It is tagged with the gender of the author, which makes gender-based analysis easier. The corpus was analysed using AntConc (Anthony, 2014). AntConc's concordance analysis showed that women used more personal pronouns, especially those that can create a closer bond. Men, on the other hand, had a greater tendency than women to distance themselves using generic pronouns. In conclusion, men and women in this study may use personal pronouns differently.


INTRODUCTION
In academic writing, different language choices are almost invisible. On the other hand, informal conversation can usually reveal the different levels of involvedness of each gender because of its 'interactional nature' (Argamon et al., 2003, p. 322). In English articles written by Persian speakers, female writers used the same patterns as their native counterparts, while male writers were affected mostly by their native language (Seyyedrezaie and Vahedi, 2017). In addition, Hosseini and Tammimy (2016) and Jasmani et al. (2011) found that verbs and pronouns may also provide distinctive gender-oriented information. For these reasons, I intend to investigate how men and women use language differently on Twitter, which is increasingly popular and provides more natural interaction than other written discourses such as essays. The strength of Twitter, therefore, is that it may provide a natural condition for people to be either 'involved' or 'informative' (Argamon et al., 2003) in their tweets without fear of being interrupted.
The study of how gender shapes language use has been in the spotlight since Lakoff (1975). Since then, many researchers have tried to find evidence that gender plays an important role in the discourse produced (Tannen, 1994; Cameron, 2003; Talbot, 2003; Cunha et al., 2014). These studies found that men tend to be more straightforward than women. Nonetheless, Cameron (2003, p. 465) noted that research on language and gender 'has often begun from folklinguistic stereotypes'. Thus, research is often set by an agenda, and the results cannot avoid recirculating these stereotypes. This article admits that the present study falls into the same paradox. Nonetheless, it is important to note that this study does not deny other possible factors that may affect the language used in the data. This paper has two key aims. The first is to find evidence of whether involved features, particularly personal pronouns, are more frequent in women-authored discourses, as found in many studies (Argamon et al., 2003; Argamon et al., 2007; Newman et al., 2008; Bamman et al., 2014). The second is to address how each gender uses language through the frequent collocates of the most common personal pronouns. One thing to note, however, is that this paper only aims to present phenomena as additional evidence of gender differences on Twitter. Thus, it does not necessarily mean that gender is the distinguishing factor in discourse.

Research Questions
There has been much literature providing evidence that men and women use different linguistic features in their discourses, especially in social media. It is also evident that personal pronouns are favoured more by females (Hosseini and Tammimy, 2016). Nevertheless, few studies have examined how these devices and the semantic preferences accompanying them differ between genders. Therefore, the following research questions will guide my study:
- Do females from Herdagdelen's (2013) Rovereto Twitter N-Gram Corpus (RTC) use personal pronouns in their tweets more than males?
- If yes, how do their language choices through the use of personal pronouns differ?

Approaches to Language and Gender
Language and gender have been studied by many researchers using various approaches. The first is the deficit approach, associated with Lakoff's (1975) Language and Woman's Place. Lakoff (1975) mainly discussed women's language in everyday conversation. This approach sees women as a weaker group and is often criticised because it indirectly suggests that women should 'speak like men if they want to be taken seriously' (Coates, 2004, p. 6). The second approach is the dominance approach, which still sees women as the oppressed group (Coates, 2004).
Unlike the previous approaches, the third approach, the difference approach, sees men and women as having different values; thus, they speak differently (Coates, 2004, p. 6). In this approach, each gender is seen as equal, and any differences that occur are due to differences in each gender's 'culture', not their social position. Although the difference approach is greatly criticised when it is applied to talk among people of different genders (Coates, 2004, p. 6), Cameron (1992, p. 61) argued that gender is itself a social construction. In other words, just as people of different cultures may speak and use a similar language differently, men's and women's different use of language should be seen as a distinguishable social phenomenon. For this reason, the dynamic approach (social constructionism) then emerged.
The fourth approach suggests that language should not be dichotomized from a masculine/feminine point of view; nevertheless, I will adopt the difference approach because I would like to see the differences in the tweets produced by each gender. However, this preference does not mean that I agree that men and women in general speak differently. It is merely due to the limitation of the data, postings on Twitter, which have only one social variable, gender.

Discourse and Corpus
Before moving further into the use of tweets as written discourse, the definitions of discourse and corpus will first be discussed. Discourse is the language produced and used in communication. Cameron and Panovic (2014, 3-4) summarised three possible definitions of discourse: (1) 'language above the sentence'; (2) 'language in use'; and (3) a social practice involving language. Discourse is either spoken or written, although recently computer-mediated discourse, produced in computer-mediated environments such as email, social media and online chatting, has become popular too.
The record of such discourse is often called a text. Partington et al. (2013, 3) summed up a text as a 'by-product', 'the record' or 'the trace of discourse action'. In other words, although discourse, especially spoken discourse, occurs in real time, it can be reproduced for analysis in the form of a text. The analysis of this text is commonly called discourse analysis. It studies how 'the language is used to influence the beliefs and behaviour of other people' (Partington et al. 2013, 3).
To sum up, discourse is the language itself. While a sentence can be produced without relation to any particular context, discourse can often be related to a particular context. Discourse can be divided into two dimensions, spoken and written, although the recent emergence of the internet and computers has also brought a new dimension, computer-mediated discourse, covering any discourse produced in a computer- or internet-related environment. Examples of computer-mediated discourse are emails, online chatting, and statuses on social media.
A corpus is a collection of texts, and in this project, the texts are tweets.
Corpus linguistics usually focuses on the analysis of authentic texts. It often analyses the texts quantitatively. For this reason, Hyland (2009, 110) described corpus linguistics and discourse analysis, which usually uses a qualitative approach, as 'perfect bedfellows'. While a corpus commonly consists of a huge number of texts, discourse analysis often focuses on a small number of cases. Applying corpus-based research to a huge number of texts, then selecting some special cases from the findings and analysing these, may be time-saving and more meaningful. In addition, Partington et al. (2013, 5-6) also argued that corpus linguistics may provide useful evidence for discourse analysis.

Twitter as a Computer-mediated Discourse Research Tool
The growth of computer-mediated discourse has kept pace with the growth of the Internet. Unlike traditional discourse, computer-mediated discourse (CMD) such as that on Twitter is written discourse that works like spoken discourse. Crystal (2006) introduces the term 'netspeak' for any computer-mediated discourse. The inevitable growth of this 'netspeak' is a promising area for linguistic research.
Moreover, unlike traditional spoken discourses, in which power has been an issue when comparing the two genders' linguistic choices, Web 2.0 as a 'user-generated platform' (O'Reilly, 2005) gives users more freedom to write what they want. This is one strength of using Twitter as the corpus. In addition, the large amount of data from social media allows researchers 'to analyse the frequency of individual words' (Bamman et al., 2014, p. 136). This makes it possible not only to find different features such as pronouns, but also to track any lexical words accompanying them. For these reasons, I would like to present the individual pronouns used by each gender to see any differences in semantic use.

Use of Pronoun as a Gender Representative
During the past 40 years, much evidence regarding gender differences in language use has emerged. In academic writing, however, the use of pronouns does not necessarily represent the gender of the author. It seems to be commonly agreed that the use of pronouns in academic writing depends greatly upon the discipline in which it is written. In many hard science articles, such as those in physics, authors tend to distance themselves from the findings in the article. Hence, first-person pronouns, i.e. I and we, are used less often, and the third-person pronoun it is more prevalent. Geertz (1983, cited in Harwood, 2005, p. 1208) used the term 'author-evacuated' for this phenomenon. These authors often choose this way of presenting findings to avoid being falsified by fellow academics of the discipline. Moreover, Hyland (2001) also argued that the eradication of self-mention in research articles aims to gain acceptance by the readers. Nevertheless, he emphasised self-mention in the form of personal pronouns. Indeed, the use of I and we in academic writing tends to be emphasised as these can serve as the author's promotional devices rather than as representations of gender (Harwood, 2005, p. 1226).
Apart from academic writing, several recent studies suggest that, although there may be some gender stereotypes, men and women in general use language differently. In spoken discourse, females are more likely to be 'affective' than males (Holmes 1993). In written discourse such as blogs, Newman et al.'s (2008) study found that men and women use function words such as pronouns, intensifiers and articles differently. In general, women have a greater tendency to use pronouns and intensifiers, while men use articles more often (Newman et al., 2008, p. 231).
Similarly, Argamon et al. (2003) introduced the term 'involvement' for the use of pronouns and intensifiers and 'informational' for function words such as determiners and articles. These terms were used to distinguish the characteristics of the two genders' writing. According to Argamon et al. (2003), pronouns, especially those which refer to the relationship between the writer and the reader, are favoured more in women's writing. These differences may not be absolute, but in comparison, women generally use them more (Baker 2008). In addition, although Bamman et al. (2014) argued that it is hard to draw a clear distinction between gender-specific discourses using the terms 'involved' and 'informational', their findings on the use of pronouns on Twitter also supported the notion of women being involved by using devices such as personal pronouns. Argamon et al. (2007) went further by conducting an automated blog analysis. In their research, Argamon et al. (2007) compiled an English corpus of approximately 140 million words from blogs. As in the previous research (Argamon et al., 2003), the algorithm could classify the gender of the author quite accurately, at around 80.5% accuracy, by tracking specific features of each gender's writing (Argamon et al., 2007, p. 6). In addition, Burger et al. (2011, cited in Bamman et al., 2014, p. 137) concluded that automated prediction is more accurate than human judgement when it comes to classifying the gender of the author. This literature has emphasised the importance of such features in gender-based discourses.
In short, it is evident that men and women generally have different characteristics in using language, in this case English, and these characteristics can even be used to trace the gender of the author. This article intends to see how these linguistic features differ through the use of personal pronouns in tweets produced by men and women.

The Data
I applied a corpus-driven analysis (Baker, 2010), analysing naturally occurring data to see how Twitter users use language differently. Using a corpus makes it easier to see and compare the usage of both genders. In addition, certain patterns can be drawn by analysing words in 'naturally occurring language' (Baker, 2008, p. 76-77). Baker also added that a corpus may give unexpected yet insightful cases of how language is used (Baker, 2008, p. 81).
The analysed data is a publicly available corpus, the Rovereto Twitter N-Gram Corpus (RTC), utilized by Herdagdelen (2013). The RTC is tagged with the gender of the author, which makes the process of distinguishing gender easier. In addition, using an existing corpus is useful to 'scholars who may have neither the resources to assemble large teams of researchers nor the time and computing know-how to develop and use new tools' (Mautner, 2007, p. 52).
Although the RTC consists of approximately 75 million Twitter posts, I randomly chose 1000 tweets for each gender from the 1-gram tweets of the RTC using Excel's RAND function. These 2000 tweets contain approximately 12781 word tokens in the males' tweets and 13592 word tokens in the females' tweets. These tweets were analysed separately, so there are two corpora: the female-produced and the male-produced tweets. For ethical purposes, any users mentioned in the corpora are replaced by username. The analysis was conducted in 2016.
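The sampling step could be reproduced outside Excel roughly as follows. This is only a minimal sketch and not the procedure actually used in the study: the function names, the fixed seed and the toy tweet lists are hypothetical, and tokens are counted by splitting on whitespace, which may not match AntConc's own token definition.

```python
import random

def sample_tweets(tweets, k, seed=0):
    """Draw k tweets without replacement (a stand-in for sorting rows by Excel's RAND)."""
    rng = random.Random(seed)  # fixed seed so that the draw is repeatable
    return rng.sample(tweets, k)

def count_tokens(tweets):
    """Count whitespace-separated tokens; AntConc may tokenise differently."""
    return sum(len(tweet.split()) for tweet in tweets)

# Toy data standing in for the gender-tagged RTC files (hypothetical).
female_tweets = [f"female tweet number {i}" for i in range(5000)]
male_tweets = [f"male tweet number {i}" for i in range(5000)]

female_sample = sample_tweets(female_tweets, 1000)
male_sample = sample_tweets(male_tweets, 1000)
print(len(female_sample), count_tokens(female_sample))
print(len(male_sample), count_tokens(male_sample))
```

Fixing the seed is a design choice for reproducibility; Excel's RAND recalculates on every sheet change, so the original draw cannot be recovered this way.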

Approach and Analysis
Mixed-method research was employed to answer the research questions. Unlike Argamon et al. (2003), this study analysed only one variable of the produced discourse, personal pronouns, and provided concordance analysis to present the phenomena. In addition, Argamon et al. (2003) explored gender differences in writing across various genres, while this article explored the differences on Twitter. For the first question, a quantitative method was used in order to compare the differences; the findings were normalised by presenting the number of occurrences per 1000 words. This quantitative result is used to answer my first research question.
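The normalisation to occurrences per 1000 words can be illustrated with a short sketch. The corpus sizes are the token counts reported for the two corpora, but the raw counts passed in are hypothetical placeholders rather than figures from the study, and the function name is my own.

```python
def per_thousand(raw_count, corpus_tokens):
    """Normalise a raw frequency to occurrences per 1000 word tokens."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return raw_count / corpus_tokens * 1000

MALE_TOKENS = 12781    # token count reported for the male corpus
FEMALE_TOKENS = 13592  # token count reported for the female corpus

# Hypothetical raw counts, used only to show the calculation.
male_rate = per_thousand(100, MALE_TOKENS)
female_rate = per_thousand(100, FEMALE_TOKENS)
print(round(male_rate, 2), round(female_rate, 2))
```

Because the female corpus is larger, the same raw count yields a lower normalised rate there, which is why raw counts alone would not be comparable across the two corpora.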
One of the most well-known tools for analysing a corpus is AntConc (Anthony, 2014). It is popular because it is free and user-friendly. I used AntConc to explore the use of personal pronouns. Firstly, the personal pronouns used in each gender's corpus were tracked and counted; non-standard variations such as 'em and u were also tracked. After that, the words that frequently collocate with the personal pronouns I, you and it, which are dominant in both corpora, will be presented.
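The counting step, including the non-standard variants, might look like the sketch below. It is an illustration and not a description of AntConc's internals: the pronoun list is taken from the forms named in this article, the variant mapping covers only 'em and u, and the regular-expression tokeniser is an assumption. Note that her is counted whether it is used as an object pronoun or as a possessive.

```python
import re
from collections import Counter

# Pronoun forms named in this article; the exact search list is an assumption.
PRONOUNS = {"i", "me", "you", "he", "him", "she", "her", "it",
            "we", "us", "they", "them"}
# Non-standard variants mapped onto their standard forms.
VARIANTS = {"u": "you", "'em": "them"}

# A token is a run of letters, optionally preceded by an apostrophe ('em, 'm).
TOKEN_RE = re.compile(r"'?[a-z]+")

def count_pronouns(tweets):
    """Count personal pronouns case-insensitively, folding variants together."""
    counts = Counter()
    for tweet in tweets:
        for token in TOKEN_RE.findall(tweet.lower()):
            token = VARIANTS.get(token, token)
            if token in PRONOUNS:
                counts[token] += 1
    return counts

example = ["I love u", "tell 'em I said it", "We miss you"]
print(count_pronouns(example))
```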
Finally, to answer the second question, I employ a qualitative method by discussing the semantic preferences accompanying the three most frequent personal pronouns (I, you and it). This discussion will be accompanied by AntConc's concordance of I love in both corpora.
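A concordance of this kind lists each occurrence of a search phrase with a few words of context on either side (key word in context). The sketch below shows the idea under simple assumptions: tokens are split on whitespace, matching is case-insensitive, and the function name, the context width and the example tweets are hypothetical and not taken from the corpora.

```python
def kwic(tweets, phrase, width=3):
    """Return (left context, node, right context) for each match of phrase."""
    target = phrase.lower().split()
    n = len(target)
    lines = []
    for tweet in tweets:
        tokens = tweet.split()
        lowered = [t.lower() for t in tokens]
        for i in range(len(tokens) - n + 1):
            if lowered[i:i + n] == target:
                left = " ".join(tokens[max(0, i - width):i])
                node = " ".join(tokens[i:i + n])
                right = " ".join(tokens[i + n:i + n + width])
                lines.append((left, node, right))
    return lines

# Invented example tweets, with username standing in for any @-mention.
for left, node, right in kwic(["username I love my dog so much", "i LOVE you"],
                              "I love", width=2):
    print(f"{left:>12} | {node} | {right}")
```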
Through the corpus-driven approach, I intend to explore how each gender uses language by looking at the personal pronouns and the verbs accompanying them. One thing to note, however, is that I do not intend to make baseless speculation regarding the findings, nor do I intend to generalize that every female uses personal pronouns more than men. Instead, I would like to present them as linguistic phenomena.

The Use of Personal Pronouns by Both Genders
The overall use of personal pronouns is displayed in table 1. In general, each gender uses almost all personal pronouns. The pronoun I was used the most in each corpus, although women used it nearly twice as often as men. In second place, you occurs almost equally often in each corpus. The non-human third-person singular pronoun, it, ranks third and occurs slightly more often in the female-authored corpus. Me ranks fourth and occurs almost equally often in both corpora.
Some other points from table 1 above merit attention. Firstly, third-person pronouns (with it being the exception) are the least frequent of all; he, him, she, her, they and them occur once per thousand words in the males' tweets and almost twice in the females' (table 1). Although the females use these devices twice as often as the males, I cannot say that this is generally true because the difference between the two corpora is small.
Secondly, first-person singular devices (I and me) are used more by females (92 occurrences per 1000 words). This phenomenon may reflect that women try to be more 'involved' than men by creating a closer relationship using these devices. Argamon et al. (2003, p. 330) argued that these pronouns are used to 'encode the relationship between the writer and the reader'.
It can also be seen that women use more personal pronouns in their tweets. The large gap in occurrences between the corpora (136 in women's tweets as opposed to 105 in men's) means that women use more personal pronouns than men; however, men used the first-person plural (we and us) more than five times as often as women. This is interesting to note because Argamon et al. (2003, p. 329) speculated that: 'The greater use of plural pronouns reflects the tendency of male authors to encode classes rather than individualized entities and may also serve as a depersonalization mechanism that reduces the specificity of reference to gender, number, and personhood'.
In other words, men might use first-person plural pronouns to distance themselves from being only an individual. This differs from female users, who used I and me more often than their plural counterparts (we and us).
Taken together, these findings suggest that women use more personal pronouns, especially those that can create a closer bond. Men, on the other hand, have a greater tendency than women to distance themselves using generic pronouns. Nonetheless, this must be interpreted with extreme caution, as such usage may be due to other social factors, such as an individual's personality (Hosseini & Tammimy, 2016).

Semantic Preferences Accompanying Personal Pronouns
Firstly, I will show the words that frequently collocate with I, you and it in both corpora. These pronouns were chosen because they have the three highest frequencies. The words shown are the most frequent ones within two words to the right (2R) of the pronouns. After that, I will discuss some selected concordances based on these findings. Note: username is the replacement for any username mentioned using the '@' function on Twitter; the italicized words are content words.
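The idea of right-hand collocates can be sketched as follows. This is a simplified illustration, not AntConc's own procedure: 2R is interpreted here as a window of up to two tokens to the right of the pronoun, the tokeniser is a basic regular expression, and the function name and example tweets are hypothetical.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z']+")

def right_collocates(tweets, node, span=2):
    """Count the words that occur within `span` tokens to the right of `node`."""
    counts = Counter()
    for tweet in tweets:
        tokens = TOKEN_RE.findall(tweet.lower())
        for i, token in enumerate(tokens):
            if token == node:
                # Slicing stops at the end of the tweet, so windows never cross tweets.
                counts.update(tokens[i + 1:i + 1 + span])
    return counts

# Invented example tweets, not taken from the corpora.
example = ["I love you", "I really love it", "you and I miss you"]
print(right_collocates(example, "i").most_common())
```

Restricting the window to a single tweet is a design choice: tweets are independent posts, so a word in the next tweet should not count as a collocate.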
In table 2 above, only one word that expresses feeling, love, collocates with I in the male-authored corpus. On the other hand, miss, love, wish and feel frequently collocate with I in the females' postings. I will show the concordance of love in both corpora later in this section. In addition, feel and hurts, other verbs that may express desire or feeling, also frequently accompany you and it in the female-authored tweets.
Interestingly, xx (sometimes also xoxo) accompanies you in the female corpus. This word is usually used to represent kisses (X) and hugs (O), which can also be regarded as expressing positive feelings. I could not find any use of xx or xoxo in the male corpus, so it is not possible to discuss its use there. Nevertheless, it is evident that women use more content words that express their feelings, such as miss and love, than men. While I and you are 'linguistic devices that solidify relationships' (Argamon et al., 2003, p. 322), the semantic preferences of miss, love, wish, feel and xx possibly indicate that women use more affective words than men when talking about relationships.
To illustrate the use of the affective words in more detail, I now examine how each gender uses I and love in table 3 and table 4 below. Overall, table 3 reveals that men use the word love towards non-human objects such as dog, game and Xbox. This means that although love conveys their feelings, this feeling might not be used to solidify a relationship. In addition, line 1 illustrates that they also use love in statements addressed to unspecified humans, i.e. people in general.
On the other hand, table 4 indicates that women tend to use love to depict affection towards human beings, as shown by the human-referring pronouns that follow. Although the numbers of concordance lines are not equal, these tables show how the semantic usage of men and women differs.
In short, it is evident that men and women might interact differently in a highly interactive linguistic context, in this case Twitter, where all netizens who have access to their tweets can see them. Nevertheless, differences in power on Twitter need to be treated cautiously, as the other background characteristics of the users who tweeted were unclear.

DISCUSSION AND CONCLUSION
This study set out to better understand the differences between men and women in using language. The data presented here are still limited in terms of the size of the corpus as well as the representativeness of each gender. The results may differ when larger corpora are used. The variable is another limitation because the linguistic features characterising gender cannot be generalised from the use of personal pronouns alone; many other features need to be taken into account (Hosseini & Tammimy, 2016; Bamman et al., 2014; Newman et al., 2008; Argamon et al., 2007; Argamon et al., 2003).
In addition, when compared individually within each gender, it is also possible that individuals use personal pronouns differently. Apart from gender, other values (ideology, culture and personality) of each individual may affect their word choices too (Hosseini & Tammimy, 2016).
This study only aimed to find supporting evidence that 'informational' and 'involvement' devices are used differently by each gender. Moreover, this study does not claim that each gender uses language differently in general, but that gender is one of the social variables that affect an individual's language use.
Returning to the questions posed at the beginning of this study, it is now possible to state that, quantitatively, females use more personal pronouns. Furthermore, under the difference approach, men and women used language differently (though not absolutely); men might use content words only to express feeling, whereas women are more likely both to express their feelings or desires and to create closer ties using specific words such as love, miss and feel. One thing to note, however, is that the findings of this study do not mean that gender plays the most important role in writing.
Table 1. The use of personal pronouns in both genders' tweets. Because the male and female corpora have different numbers of word tokens (12781 and 13592 respectively), normalized numbers are shown in table 1 as well.

Table 2. The frequent words (2R) accompanying I, you and it

Table 3. Concordance of I love in male corpus (appendix 1)

Table 4. Concordance of I love in female corpus (appendix 2)