In this report, we analyze the Windows, Android, and iOS versions of Tencent’s Sogou Input Method, the most popular Chinese-language input method in China. Our analysis found serious vulnerabilities in the app’s custom encryption system and in how it encrypts sensitive data. These vulnerabilities could allow a network eavesdropper to decrypt sensitive communications sent by the app, revealing all keystrokes typed by the user. Following our disclosure of these vulnerabilities, Sogou released updated versions of the app that fixed all of the issues we disclosed.
Vulnerabilities in Sogou Keyboard encryption expose keypresses to network eavesdropping.
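The report covers Sogou's specific flaws; as a generic illustration (not Sogou's actual scheme) of why custom encryption tends to fail against a passive eavesdropper, here is a minimal sketch of one classic homebrew mistake: reusing a stream-cipher keystream across messages. XORing two such ciphertexts cancels the keystream, and a guessed plaintext for one message recovers the other. All names here are made up for the example.

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter length."""
    return bytes(x ^ y for x, y in zip(a, b))

# A flawed "custom" cipher that reuses one fixed keystream for every
# message -- a classic homebrew-crypto mistake.
KEYSTREAM = os.urandom(64)

def encrypt(plaintext: bytes) -> bytes:
    return xor_bytes(plaintext, KEYSTREAM)

# Two keystroke reports encrypted under the same keystream:
c1 = encrypt(b"user typed: hello world")
c2 = encrypt(b"user typed: my password")

# A passive eavesdropper XORs the ciphertexts; the keystream cancels,
# leaving only the XOR of the two plaintexts:
p1_xor_p2 = xor_bytes(c1, c2)

# If the attacker knows or guesses one message (e.g. a predictable
# protocol prefix or a phished keystroke), the other falls out:
guess = b"user typed: hello world"
recovered = xor_bytes(p1_xor_p2, guess)
print(recovered)  # b'user typed: my password'
```

This is why the standard advice is to use an authenticated, vetted transport such as TLS rather than inventing an encryption protocol, which is the core finding of the report above.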
It’s stories like this that don’t surprise me as much as make me ask: How the fuck do you store and process this much data to get anything useful out of it.
I could be wrong, and this generalizes to pretty much any country you can name, but my impression is that data is stored on everyone so that when they someday decide to look you up, they already have it all collected. It’s not really processed until needed.
The real answer is compute power. At the moment it’s very expensive to run the computations necessary for big LLMs; I’ve heard some companies are even developing specialized chips to run them more efficiently. On the other hand, you probably don’t want your phone’s keyboard app burning out its tiny CPU and draining your battery. It’s not worth throwing anything other than a simple model at the problem.
And how can autosuggest / autocorrect be so bad with so much training data?
Did you ever see how an average person types? It’s not the amount of data that is the problem. We have too much dumb data!