A few weeks ago I overheard a colleague, one of Thetus’ resident data scientists, as he wrestled with the interface for creating an account on a Chinese geocoding website.

Mandarin-language CAPTCHA dialogue

“Every stroke is perfect,” he told me as he stared at a Mandarin-language CAPTCHA dialogue. “I spent two hours getting it right.” All he wanted to do was find map coordinates for a large group of factories in mainland China, but Google didn’t recognize the addresses, and to Baidu Maps he might as well have been a poorly programmed bot. To this day we don’t know exactly where those buildings are.

But the language difficulties he faced then are just a small and naturally occurring symptom of a larger phenomenon that stands between open-source analysts and the data their duties require: a balkanization of information. By design the Internet is an open book, equally accessible to anyone with a connected device. Today, however, it is in the process of fracturing into a collection of tenuously connected shards.

The first cracks appeared in the 1990s as China’s ruling party began to institute controls on its citizens’ access to outside information. The Great Firewall of China is implemented through content filters, IP blocking, and widespread monitoring of Internet users, and its primary goal is to suppress outside data on its way to the viewer. Of greater interest to an outside analyst are its secondary effects: stricter controls on outside sites cause users to prefer China’s native alternatives for social media, searching, and commerce. The result is that outside analysts must exert greater effort across a wider variety of sites to see the same information that their subjects do. Another challenge is the chilling effect that Internet monitoring has on speech: the more users feel obliged to self-censor, the less likely it is that observers will find or exploit true and complete information.

China is not the only authority attempting to filter or isolate its national networks. World governments have only grown more interested in similar policies, especially since the 2013 disclosures by former National Security Agency contractor Edward Snowden about the extent of the NSA’s electronic spying. Iran’s government has taken concrete steps to lay the foundation of an isolated “halal Internet”; most tech-savvy North Koreans have only ever seen the inside of the DPRK’s “walled garden” national intranet. Even Brazil has expressed serious interest in taking greater ownership of its Internet infrastructure, laying plans for its own undersea cable to Europe to avoid interception by America and flirting briefly with the idea of requiring web-based companies to store data on Brazilian clients inside the country.

At the same time, the Internet as a whole is expanding rapidly: almost three hundred million new websites emerged between 2013 and 2014, though most of these are probably of no interest to observers. Even the tiny minority that could have useful information expands an analyst’s search space massively. With access to certain viewpoints simultaneously becoming more difficult to gain, we risk becoming trapped in an echo chamber repeating homogeneous opinions and facts.

The takeaway for analysts working in the open-source environment is that we have to be better than ever in order to stay competitive. We have to seek out opposing viewpoints, verify our sources, and work harder to keep our biases in check. Perhaps the most important edge we can seize is technology: tools that can sift the ocean of unsorted data at our fingertips and bring it some efficiency and structure. At Thetus, we are working to build better nets.

~ Colin McWrightman, Analyst
