Tabroom.com is the de facto platform for Speech and Debate tournament data. It’s used to record the ballots and results for tournaments in the activity.
During my high school career, I developed TabroomAPI, which scrapes data from the website. I’ll share some secrets I found out while developing the library.
The library is written in Kotlin and designed for Kotlin Multiplatform. This selector is arguably the one used the most:
internal const val ROW_SELECTOR = "tbody > tr"
At its core, the library downloads HTML pages, parses them with JSoup (Java) or KSoup (everything else), uses CSS query selectors to find the tags I need, and maps the data into classes.
I chose Kotlin for its multiplatform and asynchronous support. Using launch {} is super useful, and it’s nice that I don’t have to figure out how to port it across different platforms.
Looking through scraper.kt reveals a few CSS query rules for extracting tournament data from a tournament page.
This is one of many functions that help get tournament data:
internal fun getTournament(doc: Document): Tournament {
    // Fall back to placeholder values when an element is missing from the page.
    val name = doc.querySelector(TOURNAMENT_NAME_SELECTOR)?.textContent ?: "Unknown"
    val descHtml = doc.querySelector(TOURNAMENT_DESC_SELECTOR)?.textContent ?: ""
    // Strip anything that looks like an HTML tag to get a plain-text description.
    val desc = descHtml.replace(Regex("<[^>]*>"), "")

    // The first word of the subtitle is treated as the year, the last word as the location.
    val subtitle = doc.querySelector(TOURNAMENT_SUBTITLE_SELECTOR)?.textContent ?: ""
    val year = subtitle.substringBefore(" ").trim()
    val location = subtitle.substringAfterLast(" ").trim()

    return Tournament(name, descHtml, desc, year, location)
}
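To make the string handling in getTournament concrete, here is a small standalone sketch of the same operations. The helper names parseSubtitle and stripTags and the SubtitleParts class are hypothetical, and the sample strings are made up for illustration rather than copied from Tabroom.

```kotlin
// Hypothetical container for the two values pulled out of the subtitle.
data class SubtitleParts(val year: String, val location: String)

// Hypothetical helper mirroring the subtitle handling in getTournament.
fun parseSubtitle(subtitle: String): SubtitleParts {
    // Everything before the first space is treated as the year.
    val year = subtitle.substringBefore(" ").trim()
    // Everything after the last space is treated as the location.
    val location = subtitle.substringAfterLast(" ").trim()
    return SubtitleParts(year, location)
}

// Same regex as in getTournament: drops anything that looks like an HTML tag.
fun stripTags(html: String): String = html.replace(Regex("<[^>]*>"), "")

fun main() {
    // Made-up input; the real subtitle format on the site may differ.
    println(parseSubtitle("2024 - TX/US"))
    println(stripTags("<p>Welcome to <b>our</b> tournament</p>"))
}
```

One thing worth knowing: when the subtitle contains no space at all, both substringBefore and substringAfterLast fall back to the whole string, so year and location end up identical.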
Document is a custom class that wraps the parser library I mentioned earlier and represents a full HTML document.
Document#querySelector and Document#querySelectorAll run CSS query selectors (see below).
Element is a class that wraps the parser’s Element class, giving me access to text content and tag attributes.
Those are really all we need to scrape Tabroom.
internal const val TOURNAMENT_NAME_SELECTOR = "div.main.index > h2.centeralign.marno"
internal const val TOURNAMENT_SUBTITLE_SELECTOR = "div.main.index > .full.centeralign.marno"
internal const val TOURNAMENT_DESC_SELECTOR = ".thenines.leftalign.plain.martop.whiteback.fullscreen.padvertmore.frontpage"
There’s more for the event sidebar and entry links:
internal const val EVENT_LINKS_SELECTOR = "div.menu > div.sidenote > a.half.marvertno"
internal const val ENTRY_EVENT_SELECTOR = "div.menu > div.sidenote > a.full"
Then, I have some for each individual event (LD, PF, Policy, Congress, etc.):
internal const val EVENT_INFO_SELECTOR = "div.menu > div.sidenote > a.nowrap.half.marvertno"
internal const val EVENT_INFO_KEY_SELECTOR = "div.main > div.row > span.third.semibold"
internal const val EVENT_INFO_VALUE_SELECTOR = "div.main > div.row > span.twothirds"
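The key and value selectors each return a flat list of elements, so the two lists still have to be paired up somehow. Here is a minimal sketch of one way to do that, assuming both selectors return elements in the same order and that only their text matters. The function name pairEventInfo and the sample strings are hypothetical and not taken from the library.

```kotlin
// Hypothetical: pair the text of the key spans with the text of the value spans by position.
fun pairEventInfo(keys: List<String>, values: List<String>): Map<String, String> =
    keys.map { it.trim().removeSuffix(":") }   // tidy up labels such as " Format: "
        .zip(values.map { it.trim() })         // zip stops at the shorter of the two lists
        .toMap()                               // keeps insertion order

fun main() {
    // Made-up text, standing in for what the two selectors might return.
    val keys = listOf("Abbreviation:", " Format: ", "Entry Fee:")
    val values = listOf("LD", "Debate", "40.00")
    println(pairEventInfo(keys, values))
}
```

Pairing by position is fragile if a row is missing its value span, since every later key would then shift onto the wrong value.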
We also have to grab the judge data to figure out who we need to strike:
internal const val JUDGES_LIST_SELECTOR = "div.menu > div.sidenote > div.nospace"
internal const val JUDGES_LINK_SELECTOR = "$JUDGES_LIST_SELECTOR > span.third.nospace > a.padvertless"
Unfortunately, paradigms are authentication-based, so they aren’t a part of an API I can use.
For some reason, there’s a JSON string somewhere in a <script> tag that has all of the ballot results.
internal const val RECORD_TITLE = "div.blankfull > div > span.nospace > h3"
internal suspend fun getRecord(doc: Document, isDouble: Boolean): Map<Int, Ballot> = coroutineScope {
    // The page embeds the ballot results as a JSON object assigned to `var panels`.
    val raw = doc.html.substringAfter("var panels = ").substringBefore(";").trim()
    val data = json.decodeFromString<JsonObject>(raw)

    data.entries
        // Skip entries that are not JSON objects or that are empty.
        .filter { (_, ballotJson) -> ballotJson is JsonObject && ballotJson.isNotEmpty() }
        .map { (id, ballotJson) ->
            // Decode each ballot concurrently. Every coroutine returns its own pair,
            // so no shared mutable map is written from several coroutines at once.
            async { id.toInt() to json.decodeFromJsonElement<Ballot>(ballotJson) }
        }
        .awaitAll()
        .toMap()
}
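The slicing step in getRecord can be tried in isolation. The sketch below runs the same substringAfter / substringBefore chain over a made-up page fragment. The helper name extractPanelsJson, the HTML, and the JSON keys are all invented for illustration. It also shows a limitation of cutting at the first semicolon: a semicolon inside a JSON string value truncates the payload.

```kotlin
// Same slicing as getRecord: take what follows "var panels = " up to the first ';'.
fun extractPanelsJson(html: String): String =
    html.substringAfter("var panels = ").substringBefore(";").trim()

fun main() {
    // Made-up fragment; real pages and keys will differ.
    val page = """<script>var panels = {"101": {"side": "Aff"}, "102": {}};</script>"""
    println(extractPanelsJson(page))

    // A ';' inside a string value cuts the JSON short, leaving invalid JSON.
    val tricky = """<script>var panels = {"101": {"rfd": "Clear; concise"}};</script>"""
    println(extractPanelsJson(tricky))
}
```

If the ballot data ever contains free text, slicing up to the closing </script> tag or the last semicolon might be more robust, though I have not checked which of those the real pages would tolerate.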
This was a short introduction to the inner workings of TabroomAPI, which scrapes Speech & Debate information from the website. You can look at the repository to view more.