Last Thursday, I had a really helpful meeting with my dissertation advisors. I had been floundering a bit before the meeting, trying to wrap my head around this dissertation project and what it might look like. Particularly, I wasn’t sure how I was going to build this corpus of blogs. I had worked for a while (longer than necessary, probably) writing some code that would take a txt file of blog URLs, scrape the text from the posts that appeared on the first page of each blog, and then save each blog’s text in a separate txt file. For someone like me who’s still relatively new to programming, writing this code was a feat! But I had concerns. I wasn’t super confident about my method for selecting the blogs. I had basically settled on using Blogger’s Next Blog feature, but I learned that in 2009, the feature stopped taking users to random blogs and instead used some sort of algorithm to return blogs on topics that would more likely be interesting to the user. Not what I needed for this study. I also contacted to see if there was a way that I could systematically gather a list of personal blogs, but the response made it clear that doing so was not possible the way I wanted to.
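For readers curious what a script like the one described above might look like, here is a minimal sketch in Python. It is not the code from this project, just one possible shape for it using only the standard library. The names `html_to_text`, `filename_for`, and `scrape_blogs` are hypothetical, and so is the assumption that the URL file lists one blog address per line. The sketch also keeps all visible text on each blog's first page; isolating only the post bodies would require selectors specific to each blog's template.

```python
import re
import urllib.request
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of script and style tags."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML page, one text chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def filename_for(url):
    """Build a safe txt filename from the blog's host name (hypothetical scheme)."""
    host = urlparse(url).netloc or url
    return re.sub(r"[^A-Za-z0-9.-]", "_", host) + ".txt"


def scrape_blogs(url_file, out_dir):
    """Read one URL per line (assumed format) and save each blog's text separately."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for line in Path(url_file).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if not url:
            continue
        # Fetch only the first page of the blog, i.e. the URL as given.
        with urllib.request.urlopen(url, timeout=30) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        (out / filename_for(url)).write_text(html_to_text(html), encoding="utf-8")
```

Calling `scrape_blogs("blogs.txt", "corpus")` would, under these assumptions, write one txt file per blog into a `corpus` folder. A real version would also want error handling for blogs that fail to load and a pause between requests.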

So I came to the meeting not totally sure how to go about collecting my data. After I talked it through with Jo and Bethany, we decided on an approach to corpus collection that more closely mirrors Bethany’s method: working with a smaller, more controlled corpus rather than trying to gather a sample of texts that is as large as possible. We also discussed the idea that I can develop a survey (basically a language attitudes survey) and administer it to the authors of the blogs I’ll include in my corpus. I love this idea because, assuming people actually respond to my survey, I will rely much less on assumptions about what influences these authors’ linguistic choices and more on actual data that they provide.

So the next step is conducting a situational analysis for the different types of blogs I’ll include in my study. Bethany’s book Linguistic Variation in Research Articles has been really helpful so far.

More later.