Gene expression updates

The work with Shirley Pepke on using CorEx to find patterns in gene expression data is finally published in BMC Medical Genomics.

Shirley wrote a blog post about it as well. She will present this work at the Harvard Precision Medicine conference and we’ll both present at Berkeley’s Data Edge conference.

The code we used for the paper is online. I’m excited to see what people discover with these techniques, but I can also see that we have more to do. If speed is an issue (it took us two days to run on a dataset with 6000 genes, and many datasets have an order of magnitude more), please get in touch, as we have some experimental versions that are faster. We are also working on making the entire analysis pipeline more automated (i.e., connecting discovered factors with known biology and visualizing predictive factors). To that end, I want to thank the Kestons for supporting future developments under the Michael and Linda Keston Executive Directorship Endowment.

 


Source: Apparent Horizons

Millions of social bots invaded Twitter!

Our work titled Online Human-Bot Interactions: Detection, Estimation, and Characterization has been accepted for publication at the prestigious International AAAI Conference on Web and Social Media (ICWSM 2017) to be held in Montreal, Canada in May 2017!

The goal of this study was twofold. First, we aimed to understand how difficult it is to detect social bots on Twitter, both for machine learning models and for humans. Second, we wanted to perform a census of the Twitter population to estimate how many accounts are not controlled by humans, but rather by computer software (bots).

To address the first question, we developed a family of machine learning models that leverages over one thousand features characterising the online behaviour of Twitter accounts. We then trained these models with manually-annotated collections of examples of human and bot-controlled accounts across the spectrum of complexity, ranging from simple bots to very sophisticated ones fueled by advanced AI. We discovered that, while human accounts and simple bots are very easy to identify, both by other humans and by our models, there exists a family of sophisticated social AIs that systematically escape identification by our models and by human snap-judgment.

Our second finding reveals that a significant fraction of Twitter accounts, between 9% and 15%, are likely social bots. At the upper end of that range, this translates to nearly 50 million accounts, according to recent estimates that put the Twitter userbase at above 320 million. Although not all bots are dangerous, many are used for malicious purposes: in the past, for example, Twitter bots have been used to manipulate public opinion during elections, to manipulate the stock market, and by extremist groups for radical propaganda.
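The estimate above is simple arithmetic, and a quick sketch makes the range explicit. The snippet below only multiplies the figures quoted in this post (a 9% to 15% bot fraction and a userbase of roughly 320 million); the function and constant names are illustrative and do not come from the paper's code.

```python
# Back-of-the-envelope check of the bot population estimate.
# All inputs are the rounded figures quoted in the text above.
TWITTER_USERBASE = 320_000_000   # "above 320 million" accounts
BOT_FRACTION_LOW = 0.09          # lower bound of the estimated bot fraction
BOT_FRACTION_HIGH = 0.15         # upper bound of the estimated bot fraction


def estimate_bot_accounts(userbase, fraction):
    """Return the number of accounts implied by a given bot fraction."""
    return round(userbase * fraction)


low = estimate_bot_accounts(TWITTER_USERBASE, BOT_FRACTION_LOW)
high = estimate_bot_accounts(TWITTER_USERBASE, BOT_FRACTION_HIGH)
print(f"Estimated bots: {low / 1e6:.1f} to {high / 1e6:.1f} million")
```

The upper bound is where the "48 million" figure in the headlines below comes from; the lower bound is closer to 29 million.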

To learn more, read our paper: Online Human-Bot Interactions: Detection, Estimation, and Characterization.

Cite as:

Onur Varol, Emilio Ferrara, Clayton Davis, Filippo Menczer, Alessandro Flammini. Online Human-Bot Interactions: Detection, Estimation, and Characterization. ICWSM 2017

 

Press Coverage

  1. CMO Today: Marketers and Political Wonks Gather for SXSW – The Wall Street Journal
  2. Huge number of Twitter accounts are not operated by humans – ABC News
  3. Up to 48 million Twitter accounts are bots, study says – CNET
  4. R u bot or not? – VICE
  5. New Machine Learning Framework Uncovers Twitter’s Vast Bot Population – VICE/Motherboard
  6. A Whopping 48 Million Twitter Accounts Are Actually Just Bots, Study Says – Tech Times
  7. Study reveals whopping 48M Twitter accounts are actually bots – CBS News
  8. Twitter is home to nearly 48 million bots, according to report – The Daily Dot
  9. As many as 48 million Twitter accounts aren’t people, says study – CNBC
  10. New Study Says 48 Million Accounts On Twitter Are Bots – We are social media
  11. Almost 48 million Twitter accounts are bots – Axios
  12. Twitter user accounts: around 15% or 48 million are bots [study] – The Vanguard
  13. Rise of the TWITTERBOTS – Daily Mail
  14. 15 per cent of Twitter is bots, but not the Kardashian kind – The Inquirer
  15. 48 mn Twitter accounts are bots, says study – The Economic Times
  16. 9-15 per cent of Twitter accounts are bots, reveals study – Financial Express
  17. Nearly 48 million Twitter accounts are bots: study – Deccan Herald
  18. Study: Nearly 48 Million Twitter Accounts Are Fake; Many Push Political Agendas – The Libertarian Republic
  19. As many as 48 million accounts on Twitter are actually bots, study finds – Sacramento Bee
  20. Study Reveals Roughly 48M Twitter Accounts Are Actually Bots – CBS DFW
  21. Up to 48 million Twitter accounts may be Bots – Financial Buzz
  22. Up to 15% of Twitter accounts are not real people – Blasting News
  23. Tech Bytes: Twitter is Being Invaded by Bots – WDIO Eyewitness News
  24. About 9-15% of Twitter accounts are bots: Study – The Indian Express
  25. Twitter Has Nearly 48 Million Bot Accounts, So Don’t Get Hurt By All Those Online Trolls – India Times
  26. Twitter May Have 45 Million Bots on Its Hands – Investopedia
  27. Bots run amok on Twitter – My Broadband
  28. 9-15% of Twitter accounts are bots: Study – MENA FN
  29. Up To 15 Percent Of Twitter Users Are Bots, Study Says – Vocativ
  30. 48 million active Twitter accounts could be bots – Gearbrain
  31. Study: 15% of Twitter accounts could be bots – Marketing Dive
  32. 15% of Twitter users are actually bots, study claims – MemeBurn
  33. Almost 48 million Twitter accounts are bots – Click Lancashire

Press in non-English media

  1. Bad Bot oder Mensch – das ist hier die Frage – Medien Milch (in German)
  2. Studie: Bis zu 48 Millionen Twitter-Nutzer sind in Wirklichkeit Bots – T3N (in German)
  3. Der Aufstieg der Twitter-Bots: 48 Millionen Nutzer sind nicht menschlich – Studie – Sputnik News (in German)
  4. Studie: Bis zu 48 Millionen Nutzer auf Twitter sind Bots – der Standard (in German)
  5. “Blade Runner”-Test für Twitter-Accounts: Bot oder Mensch? – der Standard (in German)
  6. Bot-Paradies Twitter – Sächsische Zeitung (in German)
  7. 15 Prozent Social Bots? – DLF24 (in German)
  8. TWITTER: IST JEDER SIEBTE USER EIN BOT? – UberGizmo (in German)
  9. Twitter: Bis zu 48 Millionen Bot-Profile – Heise (in German)
  10. Studie: Bis zu 15 Prozent aller aktiven, englischsprachigen Twitter-Konten sind Bots – Netzpolitik (in German)
  11. Automatische Erregung – Wiener Zeitung (in German)
  12. 15 por ciento de las cuentas de Twitter son ‘bots’: estudio – CNET (in Spanish)
  13. 48 de los 319 millones de usuarios activos de Twitter son bots – TIC Beat (in Spanish)
  14. 15% de las cuentas de Twitter son ‘bots’ – Merca 2.0 (in Spanish)
  15. 48 de los 319 de usuarios activos en Twitter son bots – MDZ (in Spanish)
  16. Twitter, paradis des «bots»? – Slate (in French)
  17. Twitter compterait 48 millions de comptes gérés par des robots – MeltyStyle (in French)
  18. Twitter : 48 millions de comptes sont des bots – blog du moderateur (in French)
  19. ’30 tot 50 miljoen actieve Twitter-accounts zijn bots’ – NOS (in Dutch)
  20. 48 εκατομμύρια χρήστες στο Twitter δεν είναι άνθρωποι, σύμφωνα με έρευνα Πηγή – LiFo (in Greek)
  21. 48 triệu người dùng Twitter là bot và mối nguy hại – Khoa Hoc Phattrien (in Vietnamese)


Source: Emilio

Complex Systems Society 2016 Junior Scientific Award!

I was selected as the recipient of the 2016 Junior Scientific Award by the Complex Systems Society!

The award reads: “Emilio Ferrara is one of the most active and successful young researchers in the field of computational social sciences. His works include the design and application of novel network-science models, algorithms, and tools to study phenomena occurring in large, dynamical techno-social systems. They improved our understanding of the structure of large online social networks and the dynamics of information diffusion. He has explored online social phenomena (protests, rumours, etc.), with applications to model and forecast individual behaviour, and characterise information diffusion and cyber-crime.”



Source: Emilio

Twitter, Social Bots, and the US Presidential Elections!

First Monday: Social bots distort the 2016 U.S. Presidential election online discussion

Our paper titled Social bots distort the 2016 U.S. Presidential election online discussion was published in the November 2016 issue of First Monday and selected as the Editor’s featured article!

We investigated how social bots, automatic accounts that populate the Twitter-sphere, are distorting the online discussion about the 2016 U.S. Presidential elections. In a nutshell, we discovered that:

  • About one in five tweets regarding the elections was posted by a bot, totalling about 4 million tweets posted during the month prior to the elections by over 400,000 bots.
  • Regular (human) users cannot determine whether the source of some specific information is another legitimate user or a bot: therefore, bots are being retweeted at the same rate as humans.
  • Bots are biased (by construction): Trump-supporting bots, for example, systematically produce only positive content in support of their candidate, altering public perception by giving the impression that there is positive and sustained grassroots support for that candidate.
  • It remains impossible, to date, to determine who’s behind these bots (the master puppeteers): single individuals, third-party organizations, and even foreign governments may be orchestrating these operations.
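To put the first bullet in perspective, here is a rough calculation that uses only the rounded figures quoted above. The variable names are illustrative, and the derived numbers are approximations implied by those rounded figures, not results reported in the paper.

```python
# Figures quoted in the list above (rounded, as stated in the text).
BOT_TWEETS = 4_000_000     # tweets posted by bots in the month before the election
BOT_ACCOUNTS = 400_000     # "over 400,000 bots"
ONE_IN = 5                 # "about one in five" election tweets came from a bot

# Average activity per bot account over the month.
tweets_per_bot = BOT_TWEETS / BOT_ACCOUNTS

# If bots account for one in five tweets, the whole conversation
# is roughly five times the bot volume.
implied_total_tweets = BOT_TWEETS * ONE_IN

print(f"About {tweets_per_bot:.0f} tweets per bot")
print(f"About {implied_total_tweets / 1e6:.0f} million election tweets overall")
```

An average of roughly ten tweets per bot over a month suggests the effect comes from the number of bot accounts rather than from a few hyperactive ones, although an average says nothing about how activity is distributed across accounts.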

To learn more, read our paper: Social bots distort the 2016 U.S. Presidential election online discussion

Cite as:

Alessandro Bessi, Emilio Ferrara. Social bots distort the 2016 U.S. Presidential election online discussion. First Monday 21(11), 2016

Press Coverage

  1. How the Bot-y Politic Influenced This Election – MIT Technology Review
  2. Facebook, Twitter & Trump – The New York Review of Books
  3. How Twitter bots played a role in electing Donald Trump – WIRED
  4. How Twitter bots helped Donald Trump win the US presidential election – Ars Technica
  5. On Twitter, No One Knows You Are a Trump Bot – Fast Company
  6. Election 2016 Belongs to the Twitter Bots – VICE
  7. Almost a fifth of election chatter on Twitter comes from bots – Fusion
  8. Study reports that nearly 20% of election-related tweets were ‘algorithmically driven’ – Talking New Media
  9. How Twitter bots affected the US presidential campaign – The Conversation
  10. Advertising is driving social media-fuelled fake news and it is here to stay – The Conversation
  11. 20% of All Election Related Tweets Came From Non-Humans – Futurism
  12. Twitter Bots Dominate 2016 Presidential Election: New Study – Heavy
  13. Tracking The Election With Social Media In Real-Time: How Accurate Is It? – Heavy
  14. BOTS ‘SWAY’ ELECTION Fake tweets by social media robots could swing US Presidential election – The Sun (UK)
  15. A fifth of all US election tweets have come from bots – ABC News
  16. There are 400,000 Bots That Just Tweet Political Views All Day – Investopedia
  17. Real, or not? USC study finds many political tweets come from fake accounts – Science Blog
  18. Software bots distort Donald Trump support on Twitter: Study – ETCIO
  19. How hackers, social bots, data analysts shaped the U.S. election – The Nation
  20. That swarm of political tweets in your feed? Many could be from bots – The Business Journals
  21. Software ‘bots’ distort Trump support on Twitter – New Vision
  22. Bots Invade Twitter, Spreads Misinformation On US Election – EconoTimes
  23. Software ‘bots’ seen skewing support for Trump on Twitter – The Japan Times
  24. US Presidential Elections 2016: Bot-generated fake tweets influencing US election outcome, says new study – Indian Express
  25. US elections 2016: Researchers show how Twitter bots are trying to influence the poll in favour of Trump – International Business Times
  26. Hillary vs Trump: Most of the election chatter online by Twitter bots, says study – Tech 2 First Post
  27. Twitter bots distort Trump support – iAfrica
  28. Social Media ‘Bots’ Working To Influence U.S. Election – CBS San Francisco
  29. Elezioni Usa: il 19% dei tweet elettorali è prodotto da software – Repubblica.it (in Italian)
  30. Almost a fifth of election chatter on Twitter comes from bots – Full Act
  31. Software ‘bots’ distort Trump support on Twitter: study – Yahoo! News
  32. Bots Will Break 2016 US Elections Results – iTechPost
  33. Scientist Worries Robot-Generated Tweets Could Compromise The Presidential Election – Newsroom America
  34. Software ‘bots’ distort Trump support on Twitter: study – Phys.org
  35. Spotlight: Fake tweets endanger integrity of U.S. presidential election – Xinhuanet
  36. New Study: Twitter Bots Amount for One-Fifth of US Election Conversation – Dispatch Weekly
  37. Are Robot generated Tweets compromising US Polls? – TechRadar India
  38. Fake tweets endanger integrity of US presidential election – Global Times
  39. Software ‘bots’ distort Trump support on Twitter: study – The Daily Star
  40. Software ‘bots’ distort Trump support on Twitter: study – News Dog
  41. Malicious Twitter bots could have profound consequences for the election – RawStory
  42. ‘Robot-generated fake tweets influencing US election outcome’ – DNA – Daily News & Analysis
  43. Sophisticated Bot-Generated Tweets Could Influence Outcome of US Presidential Election – Telegiz
  44. UIC Journal Shows ‘Bots’ Sway Political Discourse, Could Impact Election – NewsWise
  45. Bot-generated tweets could threaten integrity of 2016 US presidential election: Study – BGR.in
  46. Robots behind the millions of tweets: “The integrity at danger” – Svenska Dagbladet (in Swedish)
  47. Bot generated tweets influence US Presidential election polls – I4U News
  48. High percentage of robot-generated fake tweets likely to influence public opinion – NewsGram
  49. ‘Robot-generated fake tweets influencing US election outcome’ – Press Trust of India
  50. Robot-generated fake tweets influencing US election outcome: Study – IndianExpress
  51. Fake Tweets, real consequences for the election – Phys.org
  52. Real, or not? USC study finds many political tweets come from fake accounts – USC News
  53. We’re in a digital world filled with lots of social bots – USC News


Source: Emilio

Cancer in the time of algorithms

Edit: Also check out the story by the Washington Post and on cancer.gov.

Shirley is a collaborator of mine who works on using gene expression data to get a better understanding of ovarian cancer. She has a remarkable personal story that is featured in a podcast about our work together. I laughed, I cried, I can’t recommend it enough. It can be found on iTunes and on SoundCloud (link below).

As a physicist, I’m drawn towards simple principles that can explain phenomena that look complex. In biology, on the other hand, explanations tend to be messy and complicated. My recent work has really revolved around trying to use information theory to cut through messy data to discover the strongest signals. My work with Shirley applies this idea to gene expression data for patients with ovarian cancer. Thanks to Shirley’s amazing work, we were able to find a ton of interesting biological signals that could potentially have a real impact on treating this deadly disease. You can see a preprint of our work here.

I want to share one quick result. People often judge clusters discovered in gene expression data based on how well they recover known biological signals. The plot below shows how well our method (CorEx) does compared to a standard method (k-means) and a very popular method in the literature (hierarchical clustering). We are doing a much better job of finding biologically meaningful clusters (at least according to gene ontology databases), and this is very useful for connecting our discovery of hidden factors that affect long-term survival to new drugs that might be useful for treating ovarian cancer.
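For readers unfamiliar with how a cluster is judged against known biology, one common approach is an enrichment test: given a set of genes annotated with some term, ask how surprising the overlap with a cluster is under random selection. The sketch below uses a hypergeometric tail probability on made-up gene names. It is a minimal illustration of the general idea, not necessarily the scoring used for the plot, and every name in it is hypothetical.

```python
from math import comb


def enrichment_p_value(cluster, annotated, n_genes):
    """Probability of an overlap at least as large as the observed one
    if the cluster were drawn at random from n_genes genes
    (hypergeometric upper tail)."""
    observed = len(cluster & annotated)
    n = len(cluster)        # cluster size
    k = len(annotated)      # genes carrying the annotation
    total = comb(n_genes, n)
    tail = sum(
        comb(k, i) * comb(n_genes - k, n - i)
        for i in range(observed, min(n, k) + 1)
    )
    return tail / total


# Toy data: 20 genes, 4 of which carry a hypothetical annotation.
N_GENES = 20
annotated = {"g0", "g1", "g2", "g3"}
cluster_a = {"g0", "g1", "g2", "g10"}    # 3 of 4 genes are annotated
cluster_b = {"g3", "g11", "g12", "g13"}  # 1 of 4 genes is annotated

p_a = enrichment_p_value(cluster_a, annotated, N_GENES)
p_b = enrichment_p_value(cluster_b, annotated, N_GENES)
print(f"cluster A: p = {p_a:.4f}")
print(f"cluster B: p = {p_b:.4f}")
```

A clustering method that yields more clusters with small p-values for known annotations is, by this kind of measure, recovering more known biology. In practice one would also correct for testing many terms and many clusters.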

TCGA clusters

 

 


Source: Apparent Horizons

Premature optimization optimization

Here’s one way to solve a problem. (1) Visualize what a good solution would look like. (2) Quantify what makes that solution “good”. (3) Search over all potential solutions for one that optimizes the goodness.

I like working on this whole pipeline, but I have come to the realization that I have been spending too much time on (3). What if there were an easy, general, powerful framework for doing (3) that would work pretty well most of the time? That’s really what TensorFlow is. In most cases, I could spend some time engineering a task-specific optimizer that will be better, but this is really premature optimization of my optimization and, as Knuth famously said: “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil”.
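To make the split between steps (2) and (3) concrete, here is a minimal plain-Python sketch. The `goodness` function is a made-up example, and the optimizer is a bare-bones gradient ascent with numerical gradients, standing in for what a framework would do with automatic differentiation. It is not TensorFlow code; the point is only that the optimizer knows nothing about the problem.

```python
def goodness(x):
    """Step (2): a hypothetical score, highest at x = 3."""
    return -(x - 3.0) ** 2


def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


def maximize(f, x0, learning_rate=0.1, steps=200):
    """Step (3): generic gradient ascent that works for any smooth f."""
    x = x0
    for _ in range(steps):
        x += learning_rate * numerical_gradient(f, x)
    return x


best = maximize(goodness, x0=0.0)
print(f"best x = {best:.3f}")
```

Swapping in a different `goodness` function requires no change to `maximize`, which is the appeal: the effort goes into steps (1) and (2), and a general-purpose optimizer handles the search.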


Source: Apparent Horizons

The Rise of Social Bots!

Emilio Ferrara discusses “The Rise of Social Bots” in the July 2016 issue of Communications of the ACM.

Our review paper on the rise of social bots has appeared on the cover of the July 2016 issue of Communications of the ACM and is the subject of my interview above!

The Rise of Social Bots

Social bots populate techno-social systems: they are often benign, or even useful, but some are created to harm, by tampering with, manipulating, and deceiving social media users. Social bots have been used to infiltrate political discourse, manipulate the stock market, steal personal information, and spread misinformation. The detection of social bots is therefore an important research endeavor. A taxonomy of the different social bot detection systems proposed in the literature accounts for network-based techniques, crowdsourcing strategies, feature-based supervised learning, and hybrid systems.

Cite as:

Emilio Ferrara, Onur Varol, Clayton Davis, Filippo Menczer, Alessandro Flammini. The Rise of Social Bots. Communications of the ACM, Vol. 59 No. 7, Pages 96-104


Source: Emilio