Reading Like a Computer

Final Project: Corpus Analysis and Data Storytelling (Spring 2026)

Overview

For the final project, you will conduct an in-depth analysis of a corpus using computational methods and communicate your findings through written analysis, visualizations, and a brief presentation to the class.

Project Proposal Due: Friday, April 11, 2026
Final Project Due: Friday, May 16, 2026
Presentations: Tuesday, May 12 & Thursday, May 14, 2026
Submission Format: Web-based (on your course website) + class presentation
Length: 2,500–3,500 words + visualizations + presentation slides
Value: 30% of your final grade


Learning Objectives

By completing this project, you will:

  1. Design and conduct an original computational analysis of a corpus
  2. Apply multiple methods from the course to a cohesive research project
  3. Generate insights through large-scale analysis and interpret findings in context
  4. Create compelling visualizations that tell a story about the data
  5. Communicate technical work to both specialist and non-specialist audiences
  6. Critically evaluate methodology, limitations, and ethical implications

Project Components

Component 1: Project Proposal (Due: Friday, April 11)

Submit a 1–2 page proposal that includes:

1. Research Question(s)

  • What do you want to learn from the corpus?
  • What patterns, themes, or characteristics are you investigating?
  • Why does this question matter?

2. Corpus Description

  • What is your corpus? (specific texts, size, time period)
  • Why this corpus? What makes it interesting or important?
  • Where will you source it? (public domain, open access, your own collection, etc.)

3. Proposed Methods

  • What computational methods will you use?
  • Which tools will you employ?
  • Why are these methods appropriate for your research questions?

4. Timeline

  • What will you accomplish in each remaining week of class?

5. Preliminary Resources

  • Where will you find/create your corpus?
  • Which tools will you need?
  • Any ethical considerations or permissions needed?

Approved Corpora / Suggested Corpus Options:

Option A: Science Fiction Texts

  • Corpus of science fiction novels and short stories from Project Gutenberg
  • Explore: themes, terminology evolution, genre conventions, representations of technology/future

Option B: NYUAD Arts Center Performances

  • Transcriptions and metadata from NYUAD Arts Center events
  • Explore: themes in campus cultural events, artist backgrounds, audience engagement patterns

Option C: Historical Gulf Materials

  • Historical texts, documents, and writings about the Gulf region
  • Explore: terminology and concepts over time, perspectives, archival gaps

Option D: Custom Corpus

  • Your own collection of texts (music lyrics, social media, historical documents, etc.)
  • Must be approved by instructor

Component 2: Final Project (Due: Friday, May 16)

Your final project should include the following sections:

1. Introduction and Research Context (400–500 words)

  • Background: Why is this corpus worth studying? What makes it significant?
  • Research Questions: State clearly what you want to learn
  • Corpus Description: Detailed description of your corpus
    • Source(s) of texts
    • Number and nature of documents
    • Time span covered
    • Any relevant metadata
    • Decisions you made in corpus construction (what to include/exclude and why)
  • Significance: How might this analysis matter to humanists, data scientists, or broader audiences?
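
When describing the number and nature of your documents, it can help to report a few basic counts. The following is a minimal Python sketch using only the standard library; the `corpus_summary` helper, the file names, and the two sample texts are hypothetical and stand in for your own corpus.

```python
import re

def corpus_summary(docs):
    """Summarize a corpus given as {document_name: text}."""
    # Count word tokens in each document.
    token_counts = {
        name: len(re.findall(r"\w+", text)) for name, text in docs.items()
    }
    # Collect distinct word forms, ignoring case.
    vocabulary = {
        token.lower()
        for text in docs.values()
        for token in re.findall(r"\w+", text)
    }
    total = sum(token_counts.values())
    return {
        "documents": len(docs),
        "total_tokens": total,
        "vocabulary_size": len(vocabulary),
        "mean_tokens_per_document": total / len(docs) if docs else 0.0,
    }

# Hypothetical two-document corpus, for illustration only.
docs = {
    "story_a.txt": "The rocket rose. The crew cheered.",
    "story_b.txt": "A quiet rocket drifted home.",
}
summary = corpus_summary(docs)
print(summary)
```

Counts like these depend on how you define a token, so state that definition alongside the numbers.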

2. Methodology (400–500 words)

  • Methods Overview: What computational methods did you use?
  • Tool Selection: Which specific tools and why?
  • Process: Step-by-step explanation of your analysis process
  • Parameters and Choices: What decisions did you make in analyzing the corpus?
    • Stopwords removed or included?
    • Text normalization choices?
    • Parameters for algorithms?
  • Justification: Why these choices? How might different choices yield different results?
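
To see how choices like these can change results, here is a minimal Python sketch using only the standard library. The `tokenize` and `top_words` helpers, the tiny `STOPWORDS` set, and the sample sentence are all hypothetical and for illustration only; a real project would likely use a fuller stopword list from one of the tools listed under Resources.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "a", "of", "and", "to", "in", "was"}

def tokenize(text, lowercase=True):
    """Split text into word tokens, optionally lowercasing first."""
    if lowercase:
        text = text.lower()
    return re.findall(r"[A-Za-z']+", text)

def top_words(text, n=3, lowercase=True, remove_stopwords=True):
    """Return the n most frequent tokens under the given choices."""
    tokens = tokenize(text, lowercase=lowercase)
    if remove_stopwords:
        tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    return Counter(tokens).most_common(n)

sample = "The ship left the port. The Ship was the pride of the fleet."

# Same text, three different sets of preprocessing choices.
with_stopwords = top_words(sample, remove_stopwords=False)
without_stopwords = top_words(sample, remove_stopwords=True)
case_sensitive = top_words(sample, n=10, lowercase=False)

print(with_stopwords)
print(without_stopwords)
print(case_sensitive)
```

With stopwords kept, "the" dominates the list; with them removed, "ship" rises to the top; without lowercasing, "ship" and "Ship" are counted as separate words. Documenting each such choice lets readers judge how much it shaped your findings.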

3. Analysis and Findings (800–1,000 words)

  • Primary Findings: What patterns, themes, or characteristics did you discover?
  • Visualizations: 5–8 high-quality visualizations that support your findings
    • Each should have a clear caption and be referenced in the text
    • Variety of visualization types (charts, graphs, word clouds, networks, etc.)
  • Interpretation: What do these findings mean?
    • Grounded analysis connecting computational results to humanistic knowledge
    • Historical, cultural, or social context
    • Surprising or notable patterns
    • Comparisons to other corpora or expectations
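
When comparing against another corpus, raw counts can mislead if the corpora differ in size. One common approach is to report a rate per 10,000 words instead. The sketch below assumes plain-text input and a simple word tokenizer; the `per_10k` helper and both sample corpora are hypothetical.

```python
import re

def per_10k(text, term):
    """Occurrences of `term` per 10,000 tokens (case-insensitive)."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return tokens.count(term.lower()) * 10_000 / len(tokens)

# Hypothetical mini-corpora of different sizes, for illustration only.
corpus_small = "robot " * 2 + "filler " * 98    # 100 tokens, 2 hits
corpus_large = "robot " * 5 + "filler " * 995   # 1,000 tokens, 5 hits

small_rate = per_10k(corpus_small, "robot")
large_rate = per_10k(corpus_large, "robot")
print(small_rate, large_rate)
```

Here the larger corpus has more raw occurrences of "robot" (5 versus 2), but the smaller corpus uses the word at a higher rate once size is taken into account.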

4. Critical Reflection (500–700 words)

  • Limitations: What are the limitations of your analysis?
    • Corpus limitations (size, representativeness, bias)
    • Tool limitations (accuracy, affordances)
    • Methodological constraints
    • What would you do differently?
  • Embedded Biases: What biases or assumptions are embedded in your corpus and methods?
    • Who created the corpus? Whose perspectives are represented/absent?
    • What are the tools’ limitations or biases?
    • How might these shape your findings?
  • Implications: What broader questions or implications does your analysis raise?
  • Future Directions: What would you explore next? What remains uncertain?

5. Conclusion (200–300 words)

  • Synthesis of key findings
  • Return to original research questions
  • Reflection on what this project taught you about computational thinking and digital humanities

Component 3: Presentation (5–10 minutes)

You will present your project to the class on May 12 or 14.

Presentation Should Include:

  • Title and research question(s)
  • Brief corpus overview
  • Key findings with 2–3 visualizations
  • Critical reflection on methodology and limitations
  • Implications and insights

Format Options:

  • Slides (Google Slides, PowerPoint, Keynote)
  • Website or web-based presentation
  • Other multimedia format (with approval)

Presentation Tips:

  • Tell a story with your data
  • Use visuals effectively
  • Explain technical terms for a general audience
  • Engage the class and invite questions
  • Practice timing and pacing

Format and Submission

Publication

  • Post your complete project on your course website
  • Create a clear, professional layout
  • Include all visualizations, appendices, and supporting materials
  • Make sure all links and images work correctly
  • Include project metadata (date, corpus information, tools used)

Visual Presentation

  • 5–8 visualizations integrated throughout or in a gallery
  • Professional, high-quality graphics
  • Clear captions and legends
  • Appropriate use of color and design

Structure and Readability

  • Clear section headings
  • Logical flow
  • Accessible to a general academic audience
  • Proper formatting (margins, line spacing, font)

Citation and Attribution

  • Cite all sources: readings, tools, data sources, visualizations
  • Use consistent citation style (Chicago, MLA, or APA)
  • Attribute any images or materials from others

Assessment Rubric

See the Rubrics page for detailed evaluation criteria.

Key Areas Evaluated (30 points total):

  • Research question and corpus design (5 pts)
  • Method selection and application (5 pts)
  • Analysis quality and interpretation (5 pts)
  • Critical evaluation of limitations (5 pts)
  • Visualizations and presentation (5 pts)
  • Writing quality and organization (5 pts)

Resources and Support

Research and Corpus Building

Tools and Tutorials

  • Voyant Tools: https://voyant-tools.org/
  • AntConc: https://www.laurenceanthony.net/software/antconc/
  • R for Text Analysis: https://quanteda.io/
  • Python NLTK: https://www.nltk.org/
  • Orange Data Mining: https://orange.readthedocs.io/
  • Class tutorials and recorded materials

Guidance and Feedback

  • Office hours: Discuss corpus ideas, methodology questions, troubleshooting
  • Proposal feedback: Detailed feedback on your April 11 submission
  • Class workshop sessions: Lab time and peer feedback (weeks 13–14)
  • Peer review: Feedback from classmates on your presentation

Timeline and Milestones

Date                Milestone
Friday, April 11    Project proposal due
Friday, April 18    Proposal feedback provided
April 21–May 8      Build corpus and begin analysis
May 5–7             Lab sessions and instructor consultations
May 9–16            Finalize analysis and write-up
May 12 & 14         Class presentations
Friday, May 16      Final project due

Common Questions

Q: Can I use a corpus someone else created?
A: Yes, but you must acknowledge the source and explain why you’re using this particular corpus rather than creating your own.

Q: How large does my corpus need to be?
A: Anywhere from 50 to 10,000 texts, depending on your research questions. Talk to the instructor if you are unsure what size fits your project.

Q: What if my analysis doesn’t find what I expected?
A: That’s valuable! Negative results and unexpected findings are important in research. Discuss what you learned.

Q: Can I use data I’ve collected myself?
A: Yes, with instructor approval. You may need to anonymize or de-identify personal data.
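
As a starting point for de-identification, the Python sketch below replaces email addresses and @handles with placeholder tags. The `redact` helper, the two patterns, and the sample text are hypothetical; simple patterns like these catch only obvious identifiers, so names, locations, and other personal details would still need manual review.

```python
import re

# Illustrative patterns only; they will not catch every identifier.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
HANDLE = re.compile(r"(?<!\w)@\w+")

def redact(text):
    """Replace email addresses and @handles with placeholder tags."""
    # Handle emails first so their "@" is not mistaken for a handle.
    text = EMAIL.sub("[EMAIL]", text)
    return HANDLE.sub("[HANDLE]", text)

sample = "Thanks @reader_42! Write to me at jane.doe@example.org."
redacted = redact(sample)
print(redacted)
```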

Q: Can I work with a partner?
A: No, this is an individual project. However, you can discuss ideas and give peer feedback to classmates.

Q: Can I use multiple methods or tools?
A: Absolutely! Using multiple complementary approaches often yields richer insights.


Academic Integrity Reminders

  • All work must be your own. Plagiarism, unauthorized collaboration, or other unauthorized use of outside work will be reported.
  • Cite all sources: tools, tutorials, datasets, visualizations, texts you’re analyzing.
  • If you use images, code, or visualizations from others, cite and attribute properly.
  • Disclose any limitations or errors honestly in your reflection section.

Getting Help

Need clarification?

  • Ask in class
  • Attend office hours (Tu 2:15–3:15 PM, Th 7–8 PM)
  • Email: djw12@nyu.edu
  • Office: A6 1151