BlaBlaMeter detects how much bullshit is in your text (blablameter.com)
108 points by grabeh on Aug 30, 2012 | 92 comments


Derrida, Chapter 2 'Of Grammatology'

>Bullshit Index :0.26 Your text shows some indications of 'bullshit'

some indications ... I'd say this thing is broken.


I agree.

I tried some Hegel, who scored 0.18. Clearly broken.

Schopenhauer got it right, where BlaBlaMeter gets it wrong:

"If I were to say that the so-called philosophy of this fellow Hegel is a colossal piece of mystification which will yet provide posterity with an inexhaustible theme for laughter at our times, that it is a pseudo-philosophy paralyzing all mental powers, stifling all real thinking, and, by the most outrageous misuse of language, putting in its place the hollowest, most senseless, thoughtless, and, as is confirmed by its success, most stupefying verbiage, I should be quite right. Further, if I were to say that this summus philosophus ... scribbled nonsense quite unlike any mortal before him, so that whoever could read his most eulogized work, the so-called Phenomenology of the Mind, without feeling as if he were in a madhouse, would qualify as an inmate for Bedlam, I should be no less right."


That snippet scores 0.26


I copied that text into the meter, got the .26 you saw. I then listened in to a conversation at the cube next to mine and wrote that in. "Using software as a service we can replicate and evergreen in the cloud." One simple sentence jumped it to a .35.


YES. Someone needs to make a voice-to-text app that has a "bullshit meter" needle, and that beeps rapidly (a la PKE meter) when a threshold is exceeded.

http://www.gbfans.com/equipment/pke-meter/


You're just bitter that philosophers of that era generally used an artificially complicated writing style, complete with invented terminology, that borders on bullshit :)


I'm just pointing out it is broken.

Finnegans Wake, Chapter 2, Book 1.

> Bullshit Index :0.11 Your text shows only a few indications of 'bullshit'-English.

definitely broken

Or perhaps it's a feature and it can pick out philosophical / artistic 'bullshit' from PR 'bullshit'.

Quote for people not familiar with 'Finnegans Wake':

> And around the lawn the rann it rann and this is the rann that Hosty made. Spoken. Boyles and Cahills, Skerretts and Pritchards, viersefied and piersified may the treeth we tale of live in stoney. Here line the refrains of. Some vote him Vike, some mote him Mike, some dub him Llyn and Phin while others hail him Lug Bug Dan Lop, Lex, Lax, Gunne or Guinn. Some apt him Arth, some bapt him Barth, Coll, Noll, Soll, Will, Weel, Wall but I parse him Persse O'Reilly else he's called no name at all. Together. Arrah, leave it to Hosty, frosty Hosty, leave it to Hosty for he's the mann to rhyme the rann, the rann, the rann, the king of all ranns. Have you here? (Some ha) Have we where? (Some hant) Have you hered? (Others do) Have we whered (Others dont) It's cumming, it's brumming! The clip, the clop! (All cla) Glass crash. The (klikkaklakkaklaskaklopatzklatschabattacreppycrottygraddaghsemmihsammihnouithappluddyappladdypkonpkot!)


You have a weird definition of 'bullshit' if you think that counts.

Anyway, I'd assume it's looking for certain 'filler' words and phrases that get used a lot when writing BS.


"Bullshit" is a word I'd use to describe Finnegans Wake, but it certainly is valuable & extremely beautiful bullshit.


I believe there may be a line drawn between "bullshit" and "nonsense" :)


Man, that is so much better if you read it aloud.


Try the following text, which I pulled from a brand management web site:

We help businesses increase profitability by helping develop a strategic market approach and communicate the right message.

Our approach is designed for those serious and committed to looking inward so they can connect outward for greater results in an expedient manner.

We analyze your business goals, assess your marketing needs to accomplish them, create a strategic plan and manage its implementation, leaving owners and managers to tend to other needs of their business.

Excerpt is cited as fair use, namely the educational value in evaluating a tool with three sentences of buzzword-heavy English from a real site. Specific citation info is omitted only so they won't be embarrassed, but it may be found with Google.


The main reason I posted was in the hope of triggering discussion over the method used to analyse the text.

I've seen users getting a lower score simply by separating out a block of text into numbered paragraphs which would seem to point to quite a simplistic method.

http://ipdraughts.wordpress.com/2012/08/25/cutting-down-on-t...


I'd like to use it as a plugin for email, Reddit Enhancement Suite, and other such things of that nature .. ;)


I'll stand by my statement: a model based on a corpus of PR, scholarly papers, licenses and similar texts. That is, if they are into real statistical NLP.

Or just esthetic rules + word dictionary.


If I were to make the software, the corpus of PR, licenses, etc. would be the way I go. But "they did it statistically" doesn't answer the question "what is the model?" There are many different statistical models one could use. My other post has a few things we've figured out.

But I'm starting to think a rule-based lexicon isn't out of the question, given these >1 scores on some texts.



I tried the German version of it and pasted in an SAP article, which got a score of 1.34. They mostly talk bullshit, but the article wasn't hard to understand imho, so I think the algorithm needs some work.

The text was:

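"Möchten Sie SAP-Software vor Ort installieren oder über die Cloud darauf zugreifen? Wir bieten in jedem Fall umfassende Services, zugeschnitten auf Ihre individuellen Anforderungen. Wir verfügen über eines der größten Expertenteams weltweit. Unsere qualifizierten Mitarbeiter beraten Sie gern bei der Konzeptionierung, Implementierung und Optimierung Ihrer Systemlandschaft. Profitieren Sie innerhalb kürzester Zeit von Ihrer SAP-Lösung."

(In English: "Would you like to install SAP software on-premises or access it via the cloud? In either case we offer comprehensive services tailored to your individual requirements. We have one of the largest teams of experts worldwide. Our qualified staff will gladly advise you on the design, implementation and optimization of your system landscape. Benefit from your SAP solution within a very short time.")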


Heh, pasted in the benefits here: http://www.oracle.com/us/products/applications/061863.html and got 1.26.


Great idea, but show me what in my text is triggering it, and have some "example" buttons.


You might like http://nbartlomiej.github.com/lisense/ then.

It's my old pet project; like BlaBlaMeter, but for licenses.

I still hope to experiment with the idea of improving licenses in future.


A multi-colored 3d topology of bullshit would really make it hit home


Yeah, it would be nice to see the text again with a BS heatmap highlighting it.


I tried it on the front page of Time Cube.

  Bullshit Index :0.12
  Your text shows only a few indications of 'bullshit'-English.
I get the feeling this tool is only testing for a very narrow definition of bullshit.


I just pasted some text from a random TechCrunch article relating to Spotify and got this:

Your text: 847 characters, 145 words Bullshit Index :0.56 Something's fishy. Obviously you want to sell something, or you're trying to impress somebody. Are you sure that you have a real message, and if so: who would understand it?


Apparently you're not the only one to try TechCrunch first - I did exactly the same thing after clicking the link.

It appears our internal bullshit meters work just fine as well :)


After that, I pasted in an excerpt from PandoDaily and it got a marginally higher score on the BS meter.


While I read TechCrunch from time to time, I never visit PandoDaily. I stopped after I saw Sarah Lacy's charade with Zuck on YouTube.


I tried it with my own blog post. Good to know it's not full of bullshit.


Barack Obama's Inaugural: 0.18 Ryan's RNC Speech: 0.14

Lincoln (Gettysburg): 0.09 Lincoln (Second Inaugural): 0.09 MLK (I Have A Dream): 0.08 (Lowest of any Political text I tried) Someone else said Churchill got a 0.08, but I didn't actually try the unaltered text for myself.

BUT

John Donne: 0


Any chance of an overview of the algorithm you're using to filter out the text?

My thinking is you are measuring word count versus commonly used marketing or political jargon count, but that's probably too simple.


Here are a few things I've gleaned from experimenting with it:

- It uses a unigram language model. You can take the same text, randomly permute the words, and you get the same score. This means it also can't be using things like POS tagging, phrases, etc.

- It normalizes words by making all letters lowercase. The exact same text in all upper case has the same score.

- The score is eventually normalized by the length of the text. The same text copied multiple times gets the same score.

- It does not form a valid probability distribution, as someone's managed to get some 1.16's. This makes me believe it's not a Naive Bayes classifier giving you the P(Bullshit|Text). Though this is what I originally thought it would be.
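
A minimal sketch of a scorer that would pass all of these tests, assuming a made-up word-weight table (`WEIGHTS` and `score` are hypothetical names, not anything BlaBlaMeter is known to use). It lowercases the words and averages their weights, so shuffling, upper-casing or repeating the text leaves the score unchanged, and nothing keeps the result below 1:

```python
import random
import re

# Hypothetical weights for illustration only; the real table is unknown.
WEIGHTS = {"strategy": 4.0, "synergies": 3.0, "leverage": 2.0, "activity": 1.5}

def score(text: str) -> float:
    """Average per-word weight over a lowercased bag of words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(WEIGHTS.get(w, 0.0) for w in words) / len(words)

text = "We leverage synergies so that our strategy can drive every activity"
base = score(text)

shuffled = text.split()
random.shuffle(shuffled)

# Word order, letter case and repetition should not matter.
assert abs(score(" ".join(shuffled)) - base) < 1e-9
assert abs(score(text.upper()) - base) < 1e-9
assert abs(score(" ".join([text] * 3)) - base) < 1e-9

# Nothing caps the average, so scores above 1 are possible.
print(base > 0, score("strategy strategy activity") > 1)
```

This only shows that a plain weighted average is consistent with the observations; it does not show that the site works this way.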


It is that simple. Looks like they assign a BS level to words, and then take some sort of average bullshit level amongst all words. Word order doesn't matter as slashcom says.

For example, if you take the score from the Oracle Pricing blurb posted by BitMistro, and change 'strategies' to 'goals', you drop down to 0.8 or so. If you add an extra random 'strategy' somewhere, it bumps up to 1.4 or something.

I actually suspect a bug in their entry for 'strategies'... probably a decimal error when building the BS level tables.

But similar things happen with other 'bullshity' words, just to a lesser degree.


It should also be noted that on about 400 short texts (~300 words each), it did not correlate with the Flesch-Kincaid readability measure at all. So it's not measuring something like average word length or syllable counts.

But it's not QUITE a true lexicon, as it handles Out-Of-Vocabulary words quite strangely. If you use as input text:

"PR-Experts, politicians, ad writers or scientists need to be strong here! BlaBlaMeter unmasks without mercy how much bullshit hides in any text. A useful tool for everyone involved in writing! Simply copy your text into the white field and check your writing style. It works with english text up to 15.000 characters (overhead will be cut off). For a meaningful result we recommend a minimum length of 5 sentences."

Then you get 0.16. If you replace the last word 'sentences' with 'strategy' you go up to 0.44. However, if you change the last word to 'sentstrategyences' you get 0.47. Try it: you can basically insert 'strategy' inside ANY word and really raise your score. Actually, if you just insert "strateg" anywhere inside the text, it goes up massively.

So I actually think it's just doing string search counts over a lexicon.
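
That guess can be sketched roughly like this, assuming a tiny made-up lexicon of fragments (`LEXICON` and `substring_score` are hypothetical, not the site's actual data or code). It counts raw substring hits instead of whole-word matches, so a fragment buried inside another word still counts:

```python
# Hypothetical fragments and weights; the real lexicon is not public.
LEXICON = {"strateg": 3.0, "ization": 2.0, "activit": 1.5}

def substring_score(text: str) -> float:
    """Weighted substring hits, normalized by word count."""
    lowered = text.lower()
    n_words = len(lowered.split())
    if n_words == 0:
        return 0.0
    hits = sum(weight * lowered.count(frag) for frag, weight in LEXICON.items())
    return hits / n_words

plain = "we recommend a minimum length of 5 sentences"
swapped = "we recommend a minimum length of 5 strategy"
embedded = "we recommend a minimum length of 5 sentstrategyences"

# A whole-word lexicon would ignore 'sentstrategyences'; substring search does not.
print(substring_score(plain), substring_score(swapped), substring_score(embedded))
```

In this toy version the embedded word scores exactly the same as the real one, so it does not account for the small 0.44 vs 0.47 difference seen on the site.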


Ah yes, you're right. If you insert random 'izations' into your text, your BS meter goes up as well. It also has a hardon for 'activity'.


Most unusual...

"Politics are great, come buy our new, brand spanking awesome banana phone, apple, steve jobs, cripplingly epic banana phone. Just great phones, with bananas, no apples to be found here. Samsung can suck on our banana phone. Android is better than iOS."

"Your text: 251 characters, 43 words Bullshit Index :0.03 Your text shows no or marginal indications of 'bullshit'-English."


You wrote a lot of bullshit, but your text looks pretty normal from a vocabulary standpoint. Looking a bit more at the website, you'll see that what it calls 'bullshit'-English is the pattern often used in scientific articles or legal texts (and presidential speeches), where the writer seems to be saying a lot of really impressive stuff but all you're left with in the end is a big "?" because you couldn't follow half of what was said.


I'd guess so. Or maybe they compiled a corpus of bullshit text (scientific articles, PR and political texts), created a little statistical model and are using that to check the level of bullshit of your text.


Obama's AMA answers came out between 0.10 and 0.30! http://www.reddit.com/user/PresidentObama


Holy Cr*p - The President actually did an AMA. I cannot imagine our Prime Minister replying to an email.

Edit: oh, it's happening today - I thought it was old news and no one had told me. Apologies for the exclamation marks, now gone.


That's the first thing I've checked, I'm a little disappointed.


Nice, the abstract to my thesis gets a 0.5[1]

The actual content parts hover around 0.16 (although I only tried segments without any equations). It makes me wonder what sort of results you'd get if you plotted this BS metric vs. page in a full length book.

  1. "You probably want to sell something, or you're trying to 
     impress somebody.  It still may be an acceptable result
     for a scientific text."


President Obama's AMA answers score 0.21 (Your text shows some indications of 'bullshit'-English, but is still within an acceptable range.)


I tried that too, along with Paul Ryan's speech (which got 0.14 - Your text shows only a few indications of 'bullshit'). Then, I tried the copy on my landing page and got a 0.25. It's shameful to realize that I'm a bigger bullshitter than two politicians.


If you brazenly lie using strong phrases without any weasel words, it'll come across as not being 'bullshit' grammatically. I don't think the software has a fact checker.


I used the Postmodernism Generator (http://www.elsewhere.org/pomo/ )

And got scores of 0.35-0.45


It would be much better if it could highlight bullshit sections.


It would be even better if it replaced the bullshit sections with "bla bla".


Random articles:

  TechCrunch: 0.42 ("Amazon Wants Everyone To Know The Kindle Fire Is Sold Out")
  Paul Ryan's RNC speech: 0.14
  NYTimes: 0.11 ("Storm’s Winds Slow as It Exits Southern Louisiana")
Obviously it's an apples-and-oranges comparison here, but interesting (to me anyway) nonetheless.


My longest FAQ for HN

http://news.ycombinator.com/item?id=4270768

got

"Your text: 8044 characters, 1160 words Bullshit Index :0.17 Your text shows only a few indications of 'bullshit'-English."

I too would like to know what the model is for the online ratings. Has there been a validation study of the model?

AFTER EDIT: Paul Graham's essay "Why Nerds Are Unpopular"

http://www.paulgraham.com/nerds.html

(which is the first writing of his that I ever read) appears to reach the maximum length (by character count) that the program will evaluate, and comes out like this:

"Your text: 15000 characters, 2698 words Bullshit Index :0.09 Your text shows no or marginal indications of 'bullshit'-English."

Maybe Paul's procedure of having friends look over his essays and give suggestions helps cut out the bullshit.


Just for fun I ran quite a few Martin Fowler's articles through the BlaBlaMeter. On some of them, the "Bullshit Index" is unexpectedly large:

http://martinfowler.com/bliki/SnowflakeServer.html

> Bullshit Index: 0.31 – Your text shows indications of 'bullshit'-English. It's still ok for PR or advertising purposes, but more critical audiences may be skeptical.

http://martinfowler.com/bliki/PhoenixServer.html

> Bullshit Index: 0.4 – Something's getting a bit fishy. You probably want to sell something, or you're trying to impress somebody. It still may be an acceptable result for a scientific text.


I pasted in the Time Cube website and only got a 0.12.


It doesn't detect crazy; maybe it scores higher in one of the other corners of the cube.


"I am a beautiful flower. I sway in the breeze of my meadow, drinking in the sunshine. I love the little bees who come to visit me."

---

Your text: 130 characters, 28 words Bullshit Index :0.05 Your text shows no or marginal indications of 'bullshit'-English.


I was simply demonstrating, as everyone else in this thread is, that the tool doesn't match one's intuition of what is bullshit.


"I am a beautiful flower. I sway in the breeze of my meadow. I drink in the sunshine. I love the little bees who come to visit me."

---

Your text: 129 characters, 29 words Bullshit Index :0 Your text shows no or marginal indications of 'bullshit'-English.

BlaBlaMeter prefers an extra word over a comma. Makes sense given the goal here.


Maybe this plays a role:

> For a meaningful result we recommend a minimum length of 5 sentences.


It would be nice if http://www.blablameter.de/fragen_und_antworten.html were also available in English.

As it is, non-German speakers are left in the dark, wondering what methodology is behind the software.

It might even be difficult to translate the concept to English, as some terms and ideas like http://de.wikipedia.org/wiki/Nominalstil are missing in the (restricted) English language. German is so much more elaborate; it's a language of philosophers ;-)


* What's a good BlaBla Meter result?

High-quality journalistic texts are usually in the range of 0.1 and 0.3

* Are there any texts with an index of 0.0?

This is rare, but it occurs. Too low an index is rather suspicious though, and might also indicate stylistic deficits.

* What is the highest index value?

While the BlaBla Meter was designed for index values between 0 and 1, in rare cases the scale can be exceeded. The highest measured values so far have even been over 2.0.

* Does Google use a similar algorithm?

We don't know, but if we were Google, we'd probably try! Our samples for highly competitive search terms show that highly ranked sites often show good index values.

* How does BlaBla Meter work?

BlaBla Meter checks texts for various linguistic traits, e.g. it checks for excessive use of Nominalstil[1]. In addition, the text is checked for various phrases [buzzwords] with a certain weighting. We don't want to reveal the secrets though ;)

[1] Heavy use of "nominal style" is a German particularity. You can replace most verbs with nouns, adding suffixes like '-ion' or '-age', and transform the sentence into the passive. This is often found in legal texts, where the author doesn't want to appear as a subject.

* Why does my scientific text have such a high index?

Nominalstil has crept into scientific language at large. This often serves to 'beat around the bush' - in this regard, there are analogies to typical PR tongue.

* My mindless text gets a good score, why?

BlaBla Meter can't really understand the topic; it looks for linguistic traits. While the rainbow press can be accused of all kinds of substantive weaknesses, its language is usually short and concise.

* My text is wrongly devalued!

Like every computer algorithm, BlaBla Meter might be at fault; when in doubt, human beings must decide, of course. Generally the hit rate is quite high, so you should usually tweak the wording upon receiving high indices.

* But humans read texts completely differently ...

When BlaBla Meter cries havoc, one can usually assume that a human reader is alerted too. People are trained every day to tell authentic messages from artificial advertising messages - it'd be naive to believe they couldn't!

* What happens with entered texts?

Texts are a private matter of course, and are neither saved nor processed in any way, shape or form.


One of my blog posts got a .16 so you have my seal of approval ;-)


I am not involved with them by any means.

Yeah, personally I also score around .16 to .18 on my internet texts. Long papers get around 0.2, so I'm probably losing momentum after the fourth page :-/


"We use innovative cloud-based patent-pending technologies to leverage synergies between verticals. In order to facilitate the maximum return on investment, we hyperfocus on execution and delivery. We find our clients often fail to think outside the box, but with our proprietary technologies we help to identify action items that directly impact the bottom line."

Bullshit Index: 0.46 Something's getting a bit fishy. You probably want to sell something, or you're trying to impress somebody. It still may be an acceptable result for a scientific text.

Pretty good, considering it's under the 5-sentence recommendation.


It would be nice if they gave even the smallest hint of what they're measuring.


The text of the site itself scores a surprisingly low 0.14: "Your text shows only a few indications of 'bullshit'-English."

This is not a useful tool because it does not define or point to the problems in the text.


I wonder if it's possible to get it to higher than 0.5. I just tried it with some patent gibberish and it only made that score. --

"This project is a blue sky implementation of a cloud based virtual paradigm which really shifts the boundries of our connected world.

Using patent pending technology to accelerate your business, we use full-stack web-scale systems built on noSQL databases to drive clicks.

And your security is safe with us. We use military grade encryption to protect all your data at rest."


Top story 'Amit on Grids' as I write this scored a .69 on the first paragraph.


I tried past cover letters for jobs and they all scored in the 0.5-0.6 range. This makes sense since I AM trying to sell myself in an honest but tailored way. Does this reveal a flaw in my writing style or in the purpose of cover letters in the hiring process? Is it even possible for a cover letter to score low on a bullshit scale?


Just tried the same with a recent cover letter, scoring 0.44. I wouldn't worry, although you (and I) can always try to bring it to a more down-to-earth level, making the text easier and more pleasant to read. Unfortunately, a job application isn't the right domain for scientific analysis of the correlation between bullshit measurements and interview invitations, if you're really interested in the job :-)


Hilarious! I plugged in some paragraphs from press releases and it was spot on. I also appreciated the "or Scientist" part ... simply awesome!

"Bullshit Index :0.64 This reeks. I bet you're a PR-Expert, Politician, Consultant or Scientist. If there is a message, it's unlikely it will reach anyone."


Scientist?

I put in a page or so from a scientific paper I recently wrote and got 0.19.


Heh .. try putting in the first para from a scientific paper. I bet 10 karma points that this will have higher BS content than the rest. Similarly, the last para of the intro is also almost always content free.

Note: I'm a scientist too ... good to poke fun at ourselves, eh? :)


You have the causality backwards.


It would be neat if someone integrated algorithms like this into discussion forum software.


Joel on Software's "Software Inventory"[1] gets a 0.22 rating. I'm a big fan of Joel on Software and am curious as to how this model evaluates "bullshit".

1. http://www.joelonsoftware.com/items/2012/07/09.html


Tried Churchill's "we shall fight them on the beaches" speech; got 0.08 BS index; I think it works :)


I just tried some text from one of our corporate announcements. Seems accurate.

0.8 This reeks. We bet you're a PR-Expert, Politician, Consultant or Scientist. If there is a message, it's unlikely it will reach anyone. Maybe you should spend less effort on trying to impress somebody.


I copied in a Daily Mail article and it only gave me 0.12. I think your algorithm needs work...


Tried PG's Writing and speaking article. 0.1 bullshit index, which is almost zero :)


How did Reddit's AMA do? NOT BAD!

http://news.ycombinator.com/item?id=4451328 - full text

http://news.ycombinator.com/item?id=4451586 - summary text

  4. Small-Business tax breaks - [major bullshit] 0.5 [a]
  1. Space Race - [PR level bullshit] 0.33
  7. Money in Politics - [Some bullshit] 0.28
  2. Internet Freedom - [some bullshit] 0.23
  10. Work/Life Balance - [little bullshit] 0.16
  9. Jobs / recent grads - [little bullshit] 0.16
  6. Most Difficult Decision/afghanistan - [No Bullshit] 0.07

  [Un-analyzable]

  3. Favorite Basketball Player - too short
  8. White House Beer Recipe - too short
  5. First Activity on Nov 7 - too short

 [a]"Are you sure that you have a real message, and if so:
     who would understand it?"


I tried this on Paul Ryan's RNC address from last night, Barack Obama's inaugural address, and a recent IEEE TKDE journal article. It gave bullshit measures of 0.15, 0.13 and 0.25, respectively. :-(


My blog posts (http://sveder.com/blog) are between .18-.22 so I feel relieved. Great idea, and like everyone else I'd like to know where I fail too.


I tried with Paul Ryan's speech at the RNC. Here's what it said:

Your text: 15000 characters, 2645 words Bullshit Index :0.13 Your text shows only a few indications of 'bullshit'-English.

Does that mean the algorithms work?


Just thought I'd share:

    Your text: 2418 characters, 451 words
    Bullshit Index :0.05
    Your text shows no or marginal indications of
        'bullshit'-English.




Good companion to Paul Ford's Passivator: http://www.ftrain.com/ThePassivator.html


I think this works by calculating something called the Gunning fog index, i.e. the number of words in a sentence with more syllables than usual.
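
For comparison, here is a rough sketch of the Gunning fog index as it is commonly defined: 0.4 times the sum of the average sentence length and the percentage of words with three or more syllables. The syllable counter is a crude vowel-run heuristic of my own, and the usual exclusions (proper nouns, compound words, common suffixes) are skipped, so treat the numbers as approximate. Note that another comment in this thread reports no correlation with the related Flesch-Kincaid measure.

```python
import re

def count_syllables(word: str) -> int:
    """Crude heuristic: count runs of vowels (an assumption, not a dictionary lookup)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text: str) -> float:
    """0.4 * (words per sentence + percentage of words with 3+ syllables)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))

print(gunning_fog("I am a beautiful flower. I sway in the breeze of my meadow."))
```

Unlike a bag-of-words score, this value depends on where the sentence breaks fall, which would be one way to tell the two approaches apart.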


Interesting to look at Privacy Policies in here. Google's policy had a slightly more bullshit leaning side than the average document.


Whatever algorithm it uses thinks George W Bush and Sarah Palin lace their speeches with less bullshit than Winston Churchill.


"english text up to 15.000 characters (overhead will be cut off)", did I read it as 15 characters ? Bullshit


Meta index: the page itself has a 0.14 bullshit index...


I'm writing a story; it's a 0.12, acceptable.



