Final Assignment Guidelines
Reading Like a Computer
Final Project: Corpus Analysis and Data Storytelling (Spring 2026)
Overview
For the final project, you will conduct an in-depth analysis of a corpus using computational methods and communicate your findings through written analysis, visualizations, and a brief presentation to the class.
Project Proposal Due: Friday, April 11, 2026
Final Project Due: Friday, May 16, 2026
Presentations: Tuesday, May 12 & Thursday, May 14, 2026
Submission Format: Web-based (on your course website) + class presentation
Length: 2,500–3,500 words + visualizations + presentation slides
Value: 30% of your final grade
Learning Objectives
By completing this project, you will:
- Design and conduct an original computational analysis of a corpus
- Apply multiple methods from the course to a cohesive research project
- Generate insights through large-scale analysis and interpret findings in context
- Create compelling visualizations that tell a story about the data
- Communicate technical work to both specialist and non-specialist audiences
- Critically evaluate methodology, limitations, and ethical implications
Project Components
Component 1: Project Proposal (Due: Friday, April 11)
Submit a 1–2 page proposal that includes:
1. Research Question(s)
- What do you want to learn from the corpus?
- What patterns, themes, or characteristics are you investigating?
- Why does this question matter?
2. Corpus Description
- What is your corpus? (specific texts, size, time period)
- Why this corpus? What makes it interesting or important?
- Where will you source it? (public domain, open access, your own collection, etc.)
3. Proposed Methods
- What computational methods will you use?
- Which tools will you employ?
- Why are these methods appropriate for your research questions?
4. Timeline
- What will you accomplish in each remaining week of class?
5. Preliminary Resources
- Where will you find/create your corpus?
- Which tools will you need?
- Any ethical considerations or permissions needed?
Approved Corpora / Suggested Corpus Options:
Option A: Science Fiction Texts
- Corpus of science fiction novels and short stories from Project Gutenberg
- Explore: themes, terminology evolution, genre conventions, representations of technology/future
Option B: NYUAD Arts Center Performances
- Transcriptions and metadata from NYUAD Arts Center events
- Explore: themes in campus cultural events, artist backgrounds, audience engagement patterns
Option C: Historical Gulf Materials
- Historical texts, documents, and writings about the Gulf region
- Explore: terminology and concepts over time, perspectives, archival gaps
Option D: Custom Corpus
- Your own collection of texts (music lyrics, social media, historical documents, etc.)
- Must be approved by instructor
Component 2: Final Project (Due: Friday, May 16)
Your final project should include the following sections:
1. Introduction and Research Context (400–500 words)
- Background: Why is this corpus worth studying? What makes it significant?
- Research Questions: State clearly what you want to learn
- Corpus Description: Detailed description of your corpus
- Source(s) of texts
- Number and nature of documents
- Time span covered
- Any relevant metadata
- Decisions you made in corpus construction (what to include/exclude and why)
- Significance: How might this analysis matter to humanists, data scientists, or broader audiences?
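As an illustration of the corpus description items above (number of documents, size, and so on), here is a minimal Python sketch that reports a few basic statistics. The sample texts and the `tokenize` and `describe_corpus` helpers are hypothetical stand-ins for your own corpus and choices, and the simple regex tokenizer is an assumption for illustration, not a course requirement.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def describe_corpus(docs):
    """Return basic descriptive statistics for a dict of {title: text}."""
    token_counts = {title: len(tokenize(text)) for title, text in docs.items()}
    vocabulary = Counter()
    for text in docs.values():
        vocabulary.update(tokenize(text))
    total_tokens = sum(token_counts.values())
    return {
        "documents": len(docs),
        "total_tokens": total_tokens,
        "distinct_types": len(vocabulary),
        # Type-token ratio: distinct words divided by total words.
        "type_token_ratio": round(len(vocabulary) / total_tokens, 3) if total_tokens else 0.0,
        "tokens_per_document": token_counts,
    }

# Hypothetical two-document corpus, used only to show the output shape.
sample_docs = {
    "story_a": "The ship left the station. The crew watched the stars.",
    "story_b": "A robot dreamed of the sea.",
}
stats = describe_corpus(sample_docs)
print(stats)
```

Numbers like these can go straight into your corpus description. Note that they depend on how you tokenize, so report that choice alongside them.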
2. Methodology (400–500 words)
- Methods Overview: What computational methods did you use?
- Tool Selection: Which specific tools and why?
- Process: Step-by-step explanation of your analysis process
- Parameters and Choices: What decisions did you make in analyzing the corpus?
- Stopwords removed or included?
- Text normalization choices?
- Parameters for algorithms?
- Justification: Why these choices? How might different choices yield different results?
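To make the point about parameters and choices concrete, the sketch below counts the most frequent words in one passage under two settings: no normalization at all, and lowercasing plus stopword removal. The sample passage, the tiny `STOPWORDS` set, and the `top_words` function are all hypothetical and chosen for illustration; a real project would likely rely on a fuller stopword list from whichever tool you use.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "that"}

def top_words(text, n=2, lowercase=True, remove_stopwords=True):
    """Return the n most frequent word tokens under the given choices."""
    if lowercase:
        text = text.lower()
    tokens = re.findall(r"[A-Za-z]+", text)
    if remove_stopwords:
        tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    return Counter(tokens).most_common(n)

sample = (
    "In the city, the Robot saw the robot that the city built. "
    "The Robot and the robot left the city."
)

# No normalization: "Robot" and "robot" are counted as different words.
raw = top_words(sample, lowercase=False, remove_stopwords=False)
# Lowercased, stopwords removed: the two spellings are merged.
cleaned = top_words(sample)

print(raw)      # [('the', 6), ('city', 3)]
print(cleaned)  # [('robot', 4), ('city', 3)]
```

The same passage yields a different "most frequent word" depending on two small decisions, which is why your methodology section should state and justify each one.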
3. Analysis and Findings (800–1,000 words)
- Primary Findings: What patterns, themes, or characteristics did you discover?
- Visualizations: 5–8 high-quality visualizations that support your findings
- Each should have a clear caption and be referenced in the text
- Variety of visualization types (charts, graphs, word clouds, networks, etc.)
- Interpretation: What do these findings mean?
- Grounded analysis connecting computational results to humanistic knowledge
- Historical, cultural, or social context
- Surprising or notable patterns
- Comparisons to other corpora or expectations
4. Critical Reflection (500–700 words)
- Limitations: What are the limitations of your analysis?
- Corpus limitations (size, representativeness, bias)
- Tool limitations (accuracy, affordances)
- Methodological constraints
- What would you do differently?
- Embedded Biases: What biases or assumptions are embedded in your corpus and methods?
- Who created the corpus? Whose perspectives are represented/absent?
- What are the tools’ limitations or biases?
- How might these shape your findings?
- Implications: What broader questions or implications does your analysis raise?
- Future Directions: What would you explore next? What remains uncertain?
5. Conclusion (200–300 words)
- Synthesis of key findings
- Return to original research questions
- Reflection on what this project taught you about computational thinking and digital humanities
Component 3: Presentation (5–10 minutes)
You will present your project to the class on May 12 or 14.
Presentation Should Include:
- Title and research question(s)
- Brief corpus overview
- Key findings with 2–3 visualizations
- Critical reflection on methodology and limitations
- Implications and insights
Format Options:
- Slides (Google Slides, PowerPoint, Keynote)
- Website or web-based presentation
- Other multimedia format (with approval)
Presentation Tips:
- Tell a story with your data
- Use visuals effectively
- Explain technical terms for a general audience
- Engage the class—invite questions
- Practice timing and pacing
Format and Submission
Publication
- Post your complete project on your course website
- Create a clear, professional layout
- Include all visualizations, appendices, and supporting materials
- Make sure all links and images work correctly
- Include project metadata (date, corpus information, tools used)
Visual Presentation
- 5–8 visualizations integrated throughout or in a gallery
- Professional, high-quality graphics
- Clear captions and legends
- Appropriate use of color and design
Structure and Readability
- Clear section headings
- Logical flow
- Accessible to a general academic audience
- Proper formatting (margins, line spacing, font)
Citation and Attribution
- Cite all sources: readings, tools, data sources, visualizations
- Use consistent citation style (Chicago, MLA, or APA)
- Attribute any images or materials from others
Assessment Rubric
See the Rubrics page for detailed evaluation criteria.
Key Areas Evaluated (30 points total):
- Research question and corpus design (5 pts)
- Method selection and application (5 pts)
- Analysis quality and interpretation (5 pts)
- Critical evaluation of limitations (5 pts)
- Visualizations and presentation (5 pts)
- Writing quality and organization (5 pts)
Resources and Support
Research and Corpus Building
- Project Gutenberg — 70,000+ free texts
- Google Scholar — Academic texts
- Internet Archive — Historical documents and texts
- Kaggle Datasets — Pre-compiled datasets
- UCI Machine Learning Repository — Data science datasets
- GitHub — Open-source datasets and projects
Tools and Tutorials
- Voyant Tools: https://voyant-tools.org/
- AntConc: https://www.laurenceanthony.net/software/antconc/
- R for Text Analysis: https://quanteda.io/
- Python NLTK: https://www.nltk.org/
- Orange Data Mining: https://orange.readthedocs.io/
- Class tutorials and recorded materials
Guidance and Feedback
- Office hours: Discuss corpus ideas, methodology questions, troubleshooting
- Proposal feedback: Detailed feedback on your April 11 proposal, provided by Friday, April 18
- Class workshop sessions: Lab time and peer feedback (weeks 13–14)
- Peer review: Feedback from classmates on presentation
Timeline and Milestones
| Date | Milestone |
|---|---|
| Friday, April 11 | Project proposal due |
| Friday, April 18 | Proposal feedback provided |
| April 21–May 8 | Build corpus and begin analysis |
| May 5–7 | Lab sessions and instructor consultations |
| May 9–16 | Finalize analysis and write-up |
| May 12 & 14 | Class presentations |
| Friday, May 16 | Final project due |
Common Questions
Q: Can I use a corpus someone else created?
A: Yes, but you must acknowledge the source and explain why you’re using this particular corpus rather than creating your own.
Q: How large does my corpus need to be?
A: Anywhere from 50 to 10,000 texts, depending on your research questions. If you are unsure whether your corpus is the right size, discuss it with the instructor.
Q: What if my analysis doesn’t find what I expected?
A: That’s valuable! Negative results and unexpected findings are important in research. Discuss what you learned.
Q: Can I use data I’ve collected myself?
A: Yes, with instructor approval. You may need to anonymize or de-identify personal data.
Q: Can I work with a partner?
A: No, this is an individual project. However, you can discuss ideas and give peer feedback to classmates.
Q: Can I use multiple methods or tools?
A: Absolutely! Using multiple complementary approaches often yields richer insights.
Academic Integrity Reminders
- All work must be your own. Plagiarism, unauthorized collaboration, or unauthorized use of outside work will be reported. Discussing ideas and giving peer feedback, as described above, is permitted.
- Cite all sources: tools, tutorials, datasets, visualizations, texts you’re analyzing.
- If you use images, code, or visualizations from others, cite and attribute properly.
- Disclose any limitations or errors honestly in your reflection section.
Getting Help
Need clarification?
- Ask in class
- Attend office hours (Tu 2:15–3:15 PM, Th 7–8 PM)
- Email: djw12@nyu.edu
- Office: A6 1151