
kottke.org posts about artificial intelligence

An AI Bourdain Speaks From the Grave

I have been trying not to read too much about Morgan Neville’s documentary Roadrunner: A Film About Anthony Bourdain before I have had a chance to watch it, but the few things I have read about it have given me some pause. From Helen Rosner’s piece about the film drawn from an interview with Neville:

There is a moment at the end of the film’s second act when the artist David Choe, a friend of Bourdain’s, is reading aloud an e-mail Bourdain had sent him: “Dude, this is a crazy thing to ask, but I’m curious” Choe begins reading, and then the voice fades into Bourdain’s own: “…and my life is sort of shit now. You are successful, and I am successful, and I’m wondering: Are you happy?” I asked Neville how on earth he’d found an audio recording of Bourdain reading his own e-mail. Throughout the film, Neville and his team used stitched-together clips of Bourdain’s narration pulled from TV, radio, podcasts, and audiobooks. “But there were three quotes there I wanted his voice for that there were no recordings of,” Neville explained. So he got in touch with a software company, gave it about a dozen hours of recordings, and, he said, “I created an A.I. model of his voice.” In a world of computer simulations and deepfakes, a dead man’s voice speaking his own words of despair is hardly the most dystopian application of the technology. But the seamlessness of the effect is eerie. “If you watch the film, other than that line you mentioned, you probably don’t know what the other lines are that were spoken by the A.I., and you’re not going to know,” Neville said. “We can have a documentary-ethics panel about it later.”

Per this GQ story, Neville got permission from Bourdain’s estate:

We fed more than ten hours of Tony’s voice into an AI model. The bigger the quantity, the better the result. We worked with four companies before settling on the best. We also had to figure out the best tone of Tony’s voice: His speaking voice versus his “narrator” voice, which itself changed dramatically over the years. The narrator voice got very performative and sing-songy in the No Reservations years. I checked, you know, with his widow and his literary executor, just to make sure people were cool with that. And they were like, Tony would have been cool with that. I wasn’t putting words into his mouth. I was just trying to make them come alive.

As a post hoc ethics panel of one, I’m gonna say this doesn’t appeal to me, but I bet this sort of thing becomes common practice in the years to come, much like Errol Morris’s use of reenactment in The Thin Blue Line. A longer and more nuanced treatment of the issue can be found in Justin Hendrix’s interview of Sam Gregory, who is an “expert on synthetic media and ethics”.

There’s a set of norms that people are grappling with in regard to this statement from the director of the Bourdain documentary. They’re asking questions around consent, right? Who consents to someone taking your voice and using it? In this case, the voiceover of a private email. And what if that was something that, if the person was alive, they might not have wanted. You’ve seen that commentary online, and people saying, “This is the last thing Anthony Bourdain would have wanted for someone to do this with his voice.” So the consent issue is one of the things that is bubbling here. The second is a disclosure issue, which is, when do you know that something’s been manipulated? And again, here in this example, the director is saying, I didn’t tell people that I had created this voice saying the words and I perhaps would have not told people unless it had come up in the interview. So these are bubbling away here, these issues of consent and disclosure.

Update: From Anthony’s ex-wife Ottavia Bourdain about the statement that “Tony would have been cool with that”:

I certainly was NOT the one who said Tony would have been cool with that.

(via @drawnonglass)


A History of Regular Expressions and Artificial Intelligence

an example of a regular expression

I have an unusually good memory, especially for symbols, words, and text, but since I don’t use regular expressions (ahem) regularly, they’re one of those parts of computer programming and HTML/EPUB editing that I find myself relearning over and over each time I need them. How did something this arcane but powerful even get started? Naturally, its creators were trying to discover (or model) artificial intelligence.

That’s the crux of this short history of “regex” by Buzz Andersen over at “Why is this interesting?”

The term itself originated with mathematician Stephen Kleene. In 1943, neuroscientist Warren McCulloch and logician Walter Pitts had just described the first mathematical model of an artificial neuron, and Kleene, who specialized in theories of computation, wanted to investigate what networks of these artificial neurons could, well, theoretically compute.

In a 1951 paper for the RAND Corporation, Kleene reasoned about the types of patterns neural networks were able to detect by applying them to very simple toy languagesβ€”so-called “regular languages.” For example: given a language whose “grammar” allows only the letters “A” and “B”, is there a neural network that can detect whether an arbitrary string of letters is valid within the “A/B” grammar or not? Kleene developed an algebraic notation for encapsulating these “regular grammars” (for example, a*b* in the case of our “A/B” language), and the regular expression was born.
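To make Kleene’s notation concrete, here’s a minimal sketch (in Python, which of course postdates the 1951 paper by decades) that uses the a*b* expression from the quote above to check whether strings belong to the “A/B” regular language:

```python
import re

# Kleene's a*b* pattern: any number of a's followed by any number of b's.
ab_grammar = re.compile(r"a*b*")

for s in ["", "aaabb", "b", "abab"]:
    verdict = "valid" if ab_grammar.fullmatch(s) else "invalid"
    print(f"{s!r} is {verdict} in the a*b* language")
```

A modern regex engine is far more powerful than the regular languages Kleene was describing, but for a pattern this simple it behaves exactly like the finite state machines mentioned below.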

Kleene’s work was later expanded upon by such luminaries as linguist Noam Chomsky and AI researcher Marvin Minsky, who formally established the relationship between regular expressions, neural networks, and a class of theoretical computing abstraction called “finite state machines.”

This whole line of inquiry soon falls apart, for reasons both structural and interpersonal: Pitts, McCulloch, and Jerome Lettvin (another early AI researcher) have a big falling out with Norbert Wiener (of cybernetics fame), Minsky writes a book (Perceptrons) that throws cold water on the whole simple neural network as model of the human mind thing, and Pitts drinks himself to death. Minsky later gets mixed up with Jeffrey Epstein’s philanthropy/sex trafficking ring. The world of early theoretical AI is just weird.

But! Ken Thompson, one of the creators of UNIX at Bell Labs, comes along and starts using regexes for text editor searches in 1968. And renewed takes on neural networks come along in the 21st century that give some of that older research new life for machine learning and other algorithms. So, until Skynet/global warming kills us all, it all kind of works out? At least, intellectually speaking.

(Via Jim Ray)


App Helps You Build New Creations from Your Existing Lego Pile

screenshots of the Brickit app

A new iOS app called Brickit has been developed to breathe new life into your old Lego pile. Just dump your bricks out into a pile and the app will analyze what Lego bricks you have, suggest new creations you can build with them, and provide you with detailed build instructions. It can even guide you to find individual pieces in the pile. View a short demo — I’m assuming they’re using some sort of AI/machine learning to do this?

My kids have approximately a billion Legos at my house, so I downloaded Brickit to try it out. The process is a little slow and you need to do a little bit of pre-sorting (by taking out the big pieces and spreading your pile out evenly), but watching the app do its thing is kinda magical. When I have more time later, I’m definitely going to go back and try to build some of the ideas it found for me. (via @marcprecipice)


A Rembrandt Masterpiece Uncropped by AI

a full frame version of Rembrandt's The Night Watch painting

In 1715, a significant chunk of Rembrandt’s masterpiece The Night Watch, including a 2-foot-wide swath from the left side of the painting, was lopped off in order to fit the painting in a smaller space. (WTF?!) Using a contemporary copy of the full scene painted by Gerrit Lundens and an AI program for getting the colors and angles right, the Rijksmuseum has “restored” The Night Watch, augmenting the painting with digital printouts of the missing bits. The uncropped Rembrandt is shown above and here is Lundens’s version:

Gerrit Lundens' version of Rembrandt's The Night Watch painting

I’m not an expert on art, but the 1715 crop and the shift of the principal characters from right-of-center to the center appear to have radically altered the whole feel of the painting.

With the addition especially on the left and the bottom, an empty space is created in the painting where they march towards. When the painting was cut [the lieutenants] were in the centre, but Rembrandt intended them to be off-centre marching towards that empty space, and that is the genius that Rembrandt understands: you create movement, a dynamic of the troops marching towards the left of the painting.

(via @john_overholt)


Ted Chiang: Fears of Technology Are Fears of Capitalism

Writer Ted Chiang (author of the fantastic Exhalation) was recently a guest on the Ezra Klein Show. The conversation ranged widely — I enjoyed his thoughts on superheroes — but his comments on capitalism and technology seem particularly relevant right now. From the transcript:

I tend to think that most fears about A.I. are best understood as fears about capitalism. And I think that this is actually true of most fears of technology, too. Most of our fears or anxieties about technology are best understood as fears or anxiety about how capitalism will use technology against us. And technology and capitalism have been so closely intertwined that it’s hard to distinguish the two.

Let’s think about it this way. How much would we fear any technology, whether A.I. or some other technology, how much would you fear it if we lived in a world that was a lot like Denmark or if the entire world was run sort of on the principles of one of the Scandinavian countries? There’s universal health care. Everyone has child care, free college maybe. And maybe there’s some version of universal basic income there.

Now if the entire world operates according to — is run on those principles, how much do you worry about a new technology then? I think much, much less than we do now. Most of the things that we worry about under the mode of capitalism that the U.S. practices, that is going to put people out of work, that is going to make people’s lives harder, because corporations will see it as a way to increase their profits and reduce their costs. It’s not intrinsic to that technology. It’s not that technology fundamentally is about putting people out of work.

It’s capitalism that wants to reduce costs and reduce costs by laying people off. It’s not that like all technology suddenly becomes benign in this world. But it’s like, in a world where we have really strong social safety nets, then you could maybe actually evaluate sort of the pros and cons of technology as a technology, as opposed to seeing it through how capitalism is going to use it against us. How are giant corporations going to use this to increase their profits at our expense?

And so, I feel like that is kind of the unexamined assumption in a lot of discussions about the inevitability of technological change and technologically-induced unemployment. Those are fundamentally about capitalism and the fact that we are sort of unable to question capitalism. We take it as an assumption that it will always exist and that we will never escape it. And that’s sort of the background radiation that we are all having to live with. But yeah, I’d like us to be able to separate an evaluation of the merits and drawbacks of technology from the framework of capitalism.

Echoing some of his other thoughts during the podcast, Chiang also wrote a piece for the New Yorker the other day about how the singularity will probably never come.


How Do Algorithms Become Biased?

In the latest episode of the Vox series Glad You Asked, host Joss Fong looks at how racial and other kinds of bias are introduced into the massive computer systems and algorithms we use every day, particularly those that work through machine learning.

Many of us assume that tech is neutral, and we have turned to tech as a way to root out racism, sexism, or other “isms” plaguing human decision-making. But as data-driven systems become a bigger and bigger part of our lives, we also notice more and more when they fail, and, more importantly, that they don’t fail on everyone equally. Glad You Asked host Joss Fong wants to know: Why do we think tech is neutral? How do algorithms become biased? And how can we fix these algorithms before they cause harm?


Full? Self? Driving? Hmmm…

How is Tesla’s Full Self-Driving system coming along? Perhaps not so well. YouTuber AI Addict took the company’s FSD Beta 8.2 for a drive through downtown Oakland recently and encountered all sorts of difficulties. The video’s chapter names should give you some idea: Crosses Solid Lines, Acting Drunk, Right Turn In Wrong Lane, Wrong Way!!!, Near Collision (1), and Near Collision (2). They did videos of drives in SF and San Jose as well.

I realize this is a beta, but it’s a beta being tested by consumers on actual public roads. While I’m sure it works great on their immaculate test track, when, several times over the course of 30 minutes, irregularities in your beta can easily result in the death or grave injury of a pedestrian, cyclist, or other motorist, how can you consider it safe to release to the public in any way? It seems like Level 5 autonomy is going to be difficult to manage under certain road conditions. (via @TaylorOgan)


BirdCast: Real-Time Bird Migration Forecasts

Birdcast

Colorado State University and the Cornell Lab of Ornithology have developed a system called BirdCast that uses machine learning & two decades of historical bird movement data to develop daily bird migration forecasts for the United States.

Bird migration forecasts show predicted nocturnal migration 3 hours after local sunset and are updated every 6 hours. These forecasts come from models trained on the last 23 years of bird movements in the atmosphere as detected by the US NEXRAD weather surveillance radar network. In these models we use the Global Forecasting System (GFS) to predict suitable conditions for migration occurring three hours after local sunset.

The map above is the migration forecast for tonight — overall, warmer temperatures and increased bird movement are predicted for the next week or two. They also maintain up-to-the-hour records of migration activity detected by the US weather surveillance radar network; this was the activity early this morning at 3:10am ET:

Birdcast

If the current & predicted bird radar maps were a part of the weather report on the local news, I might start watching again.


GANksy - an AI Street Artist that Emulates Banksy

street art made by an AI

street art made by an AI

street art made by an AI

GANksy is an AI program trained on Banksy’s street art.

GANksy was born into the cloud in September 2020, then underwent a strenuous A.I. training regime using hundreds of street art photos for thousands of iterations to become the fully-formed artist we see today. All of GANksy’s works are original creations derived from its understanding of shape, form and texture. GANksy wants to be put into a robot body so it can spraypaint the entire planet.

The results are cool but not super coherent — these look more like abstract NIN and Radiohead album covers than the sly & whimsical works Banksy stencils up around the world. With GANksy, you get the feel of Banksy’s art and the surfaces he chooses to put it on but little of the meaning, which is about what you would expect from training a neural network on style alone.


AlphaGo - The Movie

I missed this back in March (I think there was a lot going on back then?) but the feature-length documentary AlphaGo is now available to stream for free on YouTube. The movie documents the development by DeepMind/Google of the AlphaGo computer program designed to play Go and the competition between AlphaGo and Lee Sedol, a Go master.

With more board configurations than there are atoms in the universe, the ancient Chinese game of Go has long been considered a grand challenge for artificial intelligence. On March 9, 2016, the worlds of Go and artificial intelligence collided in South Korea for an extraordinary best-of-five-game competition, coined The DeepMind Challenge Match. Hundreds of millions of people around the world watched as a legendary Go master took on an unproven AI challenger for the first time in history.

During the competition back in 2016, I wrote a post that rounded up some of the commentary about the matches.

Move after move was exchanged and it became apparent that Lee wasn’t gaining enough profit from his attack.

By move 32, it was unclear who was attacking whom, and by 48 Lee was desperately fending off White’s powerful counter-attack.

I can only speak for myself here, but as I watched the game unfold and the realization of what was happening dawned on me, I felt physically unwell.


The AI Who Mistook a Bald Head for a Soccer Ball

Second-tier Scottish football club Inverness Caledonian Thistle doesn’t have a camera operator for matches at their stadium, so the club uses an AI-controlled camera that’s programmed to follow the ball for their broadcasts. But in a recent match against Ayr United, the AI controller kept moving the camera off the ball to focus on the bald head of the linesman, making the match all but unwatchable. No fans were allowed in the stadium either, so the broadcast was the only way to watch.


“Reverse Toonification” of Pixar Characters

Using an AI-based framework called Pixel2Style2Pixel and searching for faces in a dataset harvested from Flickr, Nathan Shipley made some more photorealistic faces for Pixar characters.

reverse toonification of Pixar characters

reverse toonification of Pixar characters

reverse toonification of Pixar characters

In response to a reader suggestion, Shipley fed the generated image for Dash back into the system and this happened:

reverse toonification of Pixar characters

I cannot tell where these images should live in the uncanny valley. You can see some similar experiments from Shipley here: a more realistic version of Miles from Spider-Verse, images of Frida Kahlo and Diego Rivera “reverse engineered” from paintings, and an image generated from a Rembrandt self-portrait.


A.I. Claudius

Roman Emperors Photos

Roman Emperors Photos

Roman Emperors Photos

For his Roman Emperor Project, Daniel Voshart (whose day job includes making VR sets for Star Trek: Discovery) used a neural-net tool and images of 800 sculptures to create photorealistic portraits of every Roman emperor from 27 BCE to 285 CE. From the introduction to the project:

Artistic interpretations are, by their nature, more art than science but I’ve made an effort to cross-reference their appearance (hair, eyes, ethnicity etc.) to historical texts and coinage. I’ve striven to age them according to the year of death — their appearance prior to any major illness.

My goal was not to romanticize emperors or make them seem heroic. In choosing bust / sculptures, my approach was to favor the bust that was made when the emperor was alive. Otherwise, I favored the bust made with the greatest craftsmanship and where the emperor was stereotypically uglier — my pet theory being that artists were likely trying to flatter their subjects.

Some emperors (latter dynasties, short reigns) did not have surviving busts. For this, I researched multiple coin depictions, family tree and birthplaces. Sometimes I created my own composites.

You can buy a print featuring the likenesses of all 54 emperors on Etsy.

See also Hand-Sculpted Archaeological Reconstructions of Ancient Faces and The Myth of Whiteness in Classical Sculpture.


Audio Deepfakes Result in Some Pretty Convincing Mashup Performances

Have you ever wanted to hear Jay Z rap the “To Be, Or Not To Be” soliloquy from Hamlet? You are in luck:

What about Bob Dylan singing Britney Spears’ “…Baby One More Time”? Here you go:

Bill Clinton reciting “Baby Got Back” by Sir Mix-A-Lot? Yep:

And I know you’ve always wanted to hear six US Presidents rap NWA’s “Fuck Tha Police”. Voila:

This version with the backing track is even better. These audio deepfakes were created using AI:

The voices in this video were entirely computer-generated using a text-to-speech model trained on the speech patterns of Barack Obama, Ronald Reagan, John F. Kennedy, Franklin Roosevelt, Bill Clinton, and Donald Trump.

The program listens to a bunch of speech spoken by someone and then, in theory, you can provide any text you want and the virtual Obama or Jay Z can speak it. Some of these are more convincing than others — with a bit of manual tinkering, I bet you could clean these up enough to make them convincing.

Two of the videos featuring Jay Z’s synthesized voice were forced offline by a copyright claim from his record company but were reinstated. As Andy Baio notes, these deepfakes are legally interesting:

With these takedowns, Roc Nation is making two claims:

1. These videos are an infringing use of Jay-Z’s copyright.
2. The videos “unlawfully uses an AI to impersonate our client’s voice.”

But are either of these true? With a technology this new, we’re in untested legal waters.

The Vocal Synthesis audio clips were created by training a model with a large corpus of audio samples and text transcriptions. In this case, he fed Jay-Z songs and lyrics into Tacotron 2, a neural network architecture developed by Google.

It seems reasonable to assume that a model and audio generated from copyrighted audio recordings would be considered derivative works.

But is it copyright infringement? Like virtually everything in the world of copyright, it depends on how it was used, and for what purpose.

Celebrity impressions by people are allowed, so why not ones by machines? It’ll be interesting to see where this goes as the tech gets better.


Deepfake Video of Robert Downey Jr. and Tom Holland in Back to the Future

This deepfake video of Back to the Future that features Robert Downey Jr. & Tom Holland as Doc Brown & Marty McFly is so convincing that I almost want to see an actual remake with those actors. (Almost.)

They really should have deepfaked Zendaya into the video as Lorraine for the cherry on top. Here’s an earlier effort with Holland as Marty that’s not as good.


Billie Eilish Interviewed by AI Bot

Collaborating with the team at Conde Nast Entertainment and Vogue, my pal Nicole He trained an AI program to interview music superstar Billie Eilish. Here are a few of the questions:

Who consumed so much of your power in one go?
How much of the world is out of date?
Have you ever seen the ending?

This is a little bit brilliant. The questions are childlike in a way, like something a bright five-year-old would ask a grownup, perceptive and nonsensical (or even Dr. Seussical) at the same time. As He says:

What I really loved hearing Billie say was that human interviewers often ask the same questions over and over, and she appreciated that the AI questions don’t have an agenda in the same way, they’re not trying to get anything from her.

I wonder if there’s something that human interviewers can learn from AI-generated questions — maybe using them as a jumping off point for their own questions or asking more surprising or abstract questions or adapting the mentality of the childlike mind.

See also Watching Teen Superstar Billie Eilish Growing Up.


A Machine Dreams Up New Insect Species

Working from a book of insect illustrations from the 1890s, Bernat Cuni used a variety of machine learning tools to generate a bunch of realistic-looking beetles that don’t actually exist in nature.

Prints are available.


A Deepfake Nixon Delivers Eulogy for the Apollo 11 Astronauts

When Neil Armstrong and Buzz Aldrin landed safely on the Moon in July 1969, President Richard Nixon called them from the White House during their moonwalk to say how proud he was of what they had accomplished. But in the event that Armstrong and Aldrin did not make it safely off the Moon’s surface, Nixon was prepared to give a very different sort of speech. The remarks were written by William Safire and recorded in a memo called In Event of Moon Disaster.

Fifty years ago, not even Stanley Kubrick could have faked the Moon landing. But today, visual effects and techniques driven by machine learning are so good that it might be relatively simple, at least the television broadcast part of it.1 In a short demonstration of that technical supremacy, a group from MIT has created a deepfake version of Nixon delivering that disaster speech. Here are a couple of clips from the deepfake speech:

Fate has ordained that the men who went to the moon to explore in peace will stay on the moon to rest in peace.

The full film is being shown at IDFA DocLab in Amsterdam and will make its way online sometime next year.

The implications of being able to so convincingly fake the televised appearance of a former US President are left as an exercise to the reader. (via boing boing)

Update: The whole film is now online. (thx, andy)

  1. But technology is often a two-way street. If the resolution of the broadcast is high enough, CGI probably still has tells…and AI definitely does. And even if you got the TV broadcast correct, with the availability of all sorts of high-tech equipment, the backyard astronomer, with the collective help of their web-connected compatriots around the world, would probably be able to easily sniff out whether actual spacecraft and communication signals were in transit to and from the Moon.↩


Can You Copyright Work Made by an Artificial Intelligence?

In a recent issue of Why is this interesting?, Noah Brier collects a number of perspectives on whether (and by whom) a work created by an artificial intelligence can be copyrighted.

But as I dug in a much bigger question emerged: Can you actually copyright work produced by AI? Traditionally, the law has been that only work created by people can receive copyright. You might remember the monkey selfie copyright claim from a few years back. In that case, a photographer gave his camera to a monkey who then snapped a selfie. The photographer then tried to claim ownership and PETA sued him to try to claim it back for the monkey. In the end, the photograph was judged to be in the public domain, since copyright requires human involvement. Machines, like monkeys, can’t own work, but clearly something made with the help of a human still qualifies for copyright. The question, then, is where do we draw the line?


Astrology and Wishful Thinking

In the Guardian, former astrologer Felicity Carter writes about how fortune telling really works and why she had to quit.

I also learned that intelligence and education do not protect against superstition. Many customers were stockbrokers, advertising executives or politicians, dealing with issues whose outcomes couldn’t be controlled. It’s uncertainty that drives people into woo, not stupidity, so I’m not surprised millennials are into astrology. They grew up with Harry Potter and graduated into a precarious economy, making them the ideal customers.

What broke the spell for me was, oddly, people swearing by my gift. Some repeat customers claimed I’d made very specific predictions, of a kind I never made. It dawned on me that my readings were a co-creation — I would weave a story and, later, the customer’s memory would add new elements. I got to test this theory after a friend raved about a reading she’d had, full of astonishingly accurate predictions. She had a tape of the session, so I asked her to play it.

The clairvoyant had said none of the things my friend claimed. Not a single one. My friend’s imagination had done all the work.

The last paragraph, on VC-funded astrology apps, was particularly interesting. I’m reading Yuval Noah Harari’s 21 Lessons for the 21st Century right now and one of his main points is that AI + biotech will combine to produce an unprecedented revolution in human society.

For we are now at the confluence of two immense revolutions. Biologists are deciphering the mysteries of the human body, and in particular of the brain and human feelings. At the same time computer scientists are giving us unprecedented data-processing power. When the biotech revolution merges with the infotech revolution, it will produce Big Data algorithms that can monitor and understand my feelings much better than I can, and then authority will probably shift from humans to computers. My illusion of free will is likely to disintegrate as I daily encounter institutions, corporations, and government agencies that understand and manipulate what was until now my inaccessible inner realm.

I hadn’t thought that astrology apps could be a major pathway to AI’s control of humanity, but Carter’s assertion makes sense.


Machine Hallucination

Machine Hallucination

After seeing some videos on my pal Jenni’s Instagram of Refik Anadol’s immersive display at ARTECHOUSE in NYC, it’s now at the top of my list of things to see the next time I’m there.

Machine Hallucination, Anadol’s first large-scale installation in New York City, is a mixed reality experiment deploying machine learning algorithms on a dataset of over 300 million images — representing a wide-ranging selection of architectural styles and movements — to reveal the hidden connections between these moments in architectural history. As the machine generates a data universe of architectural hallucinations in 1025 dimensions, we can begin to intuitively understand the ways that memory can be spatially experienced and the power of machine intelligence to both simultaneously access and augment our human senses.

Here’s a video of Anadol explaining his process and a little bit about Machine Hallucination. Check out some reviews at Designboom, Gothamist, and Art in America and watch some video of the installation here.


Pixar’s AI Spiders

As I mentioned in a post about my west coast roadtrip, one of the things I heard about during my visit to Pixar was their AI spiders. For Toy Story 4, the production team wanted to add some dusty ambiance to the antique store in the form of cobwebs.

Toy Story Cobwebs

Rather than having to painstakingly create the webs by hand as they’d done in the past, technical director Hosuk Chang created a swarm of AI spiders that could weave the webs just like a real spider would.

We actually saw the AI spiders in action and it was jaw-dropping to see something so simple, yet so technically amazing, create realistic background elements like cobwebs. The spiders appeared as red dots that would weave their way between two wood elements just like a real spider would.

All the animators had to do was tell the spiders where the cobwebs needed to be.

“He guided the spiders to where he wanted them to build cobwebs, and they’d do the job for us. And when you see those cobwebs overlaid on the rest of the scene, it gives the audience the sense that this place has been here for a while.” Without that program, animators would have had to make the webs one strand at a time, which would have taken several months. “You have to tell the spider where the connection points of the cobweb should go,” Jordan says, “but then it does the rest.”

Chang and his colleague David Luoh presented a paper about the spiders (and dust) at SIGGRAPH ‘19 in late July (which is unfortunately behind a paywall).


VFX Breakdown of Ctrl Shift Face’s Ultra-Realistic Deepfakes

Ctrl Shift Face created the popular deepfake videos of Bill Hader impersonating Arnold Schwarzenegger, Hader doing Tom Cruise, and Jim Carrey in The Shining. For their latest video, they edited Freddie Mercury’s face onto Rami Malek1 acting in a scene from Mr. Robot:

And for the first time, they shared a short visual effects breakdown of how these deepfakes are made:

Mercury/Malek says in the scene: “Even I’m not crazy enough to believe that distortion of reality.” Ctrl Shift Face is making it difficult to believe these deepfakes aren’t real.

  1. I had dinner next to Malek at the bar in a restaurant in the West Village a few months ago, pre-Oscar. I didn’t notice who it was when he sat down but as soon as he opened his mouth, I knew it was him β€” that unmistakable voice. Several people came by to say hello, buy him drinks, etc. and he and his friends were super gracious to everyone, staff included. I’ve added him to my list of actors who are actually nice alongside Tom Hanks and Keanu Reeves.↩


Photo Wake-Up

Photo Wake Up

Researchers at the University of Washington and Facebook have developed an algorithm that can “wake up” people depicted in still images (photos, drawings, paintings) and create 3D characters that can “walk out” of their images. Check out some examples and their methods here (full paper):

The AR implementation of their technique is especially impressive…a figure in a Picasso painting just comes alive and starts running around the room. (thx nick, who accurately notes the Young Sherlock Holmes vibe)


Deepfakes: Imagine All the People

Here is a video of Donald Trump, Vladimir Putin, Barack Obama, Kim Jong Un, and other world leaders lip-syncing along to John Lennon’s Imagine:

Of course this isn’t real. The video was done by a company called Canny AI, which offers services like “replace the dialogue in any footage” and “lip-sync your dubbed content in any language”. That’s cool and all — picture episodes of Game of Thrones or Fleabag where the actors automagically lip-sync along to dubbed French or Chinese — but this technique can also be used to easily create what are referred to as deepfakes, videos made using AI techniques in which people convincingly say and do things they actually did not do or say. Like this video of Mark Zuckerberg finally telling the truth about Facebook. Or this seriously weird Steve Buscemi / Jennifer Lawrence mashup:

Or Bill Hader’s face morphing into Arnold Schwarzenegger’s face every time he impersonates him:

What should we do about these kinds of videos? Social media sites have been removing some videos intended to mislead or confuse people, but notably Facebook has refused to take the Zuckerberg video down (as well as a slowed-down video of Nancy Pelosi in which she appears drunk). Congress is moving ahead with a hearing on deepfakes and the introduction of a related bill:

The draft bill, a product of several months of discussion with computer scientists, disinformation experts, and human rights advocates, will include three provisions. The first would require companies and researchers who create tools that can be used to make deepfakes to automatically add watermarks to forged creations.

The second would require social-media companies to build better manipulation detection directly into their platforms. Finally, the third provision would create sanctions, like fines or even jail time, to punish offenders for creating malicious deepfakes that harm individuals or threaten national security. In particular, it would attempt to introduce a new mechanism for legal recourse if people’s reputations are damaged by synthetic media.

I’m hopeful this bill will crack down on the malicious use of deepfakes and other manipulated videos but leave ample room for delightful art and culture hacking like the Hader/Schwarzenegger thing or one of my all-time favorite videos, a slowed-down Jeff Goldblum extolling the virtues of the internet in an Apple ad:

“Internet? I’d say internet!”

Update: Here’s another Bill Hader deepfake, with his impressions of Tom Cruise and Seth Rogen augmented by his face being replaced by theirs.


Pattern Radio: Whale Songs

The National Oceanic and Atmospheric Administration (NOAA) and Google have teamed up on a project to identify the songs of humpback whales from thousands of hours of audio using AI. The AI proved to be quite good at detecting whale sounds and the team has put the files online for people to listen to at Pattern Radio: Whale Songs. Here’s a video about the project:

You can literally browse through more than a year’s worth of underwater recordings as fast as you can swipe and scroll. You can zoom all the way in to see individual sounds — not only humpback calls, but ships, fish and even unknown noises. And you can zoom all the way out to see months of sound at a time. An AI heat map guides you to where the whale calls most likely are, while highlight bars help you see repetitions and patterns of the sounds within the songs.

The audio interface is cool β€” you can zoom in and out of the audio wave patterns to see the different rhythms of communication. I’ve had the audio playing in the background for the past hour while I’ve been working…very relaxing.


Teaching a Neural Network How to Drive a Car

In this video, you can watch a simple neural network learn how to navigate a video game race track. The program doesn’t know how to turn at first, but the car that got the furthest in the first race (out of 650 competitors) is then used as the seed for the next generation. The winning cars from each generation are used to seed the next race until a few of them make it all the way around the track in just the 4th generation.
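For a feel of how this generational scheme works in code, here’s a toy sketch in Python (not the code from the video; a made-up fitness function stands in for the actual driving simulation):

```python
import random

# A toy sketch of the generational approach described above: each "car" is just
# a list of weights, fitness is how far it gets on an imaginary track, and the
# best performer seeds (with mutation) the next generation. The numbers and the
# fitness function are made up for illustration.

POP_SIZE = 650      # competitors per race, as in the video
GENERATIONS = 4     # the video's cars finish the track by the 4th generation
N_WEIGHTS = 8

def fitness(weights):
    # Stand-in for "distance driven before crashing"; a real version would run
    # the car's neural network in the game and measure progress along the track.
    return -sum((w - 0.5) ** 2 for w in weights)

def mutate(weights, rate=0.1):
    return [w + random.gauss(0, rate) for w in weights]

population = [[random.random() for _ in range(N_WEIGHTS)] for _ in range(POP_SIZE)]

for gen in range(1, GENERATIONS + 1):
    best = max(population, key=fitness)
    print(f"generation {gen}: best fitness {fitness(best):.4f}")
    # Seed the next race with mutated copies of the winner.
    population = [mutate(best) for _ in range(POP_SIZE)]
```

The design choice is the same one the video demonstrates: no gradients or labels, just keep whatever drove farthest and perturb it.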

I think one of the reasons I find neural network training so fascinating is that you can observe, in a very simple and understandable way, the basic method by which all life on Earth evolved the ability to do things like move, see, swim, digest food, echolocate, grasp objects, and use tools. (via dunstan)


This AI Converts Quick Sketches to Photorealistic Landscapes

NVIDIA has been doing lots of interesting things with deep learning algorithms lately (like AI-Generated Human Faces That Look Amazingly Real). Their most recent effort is the development and training of a program that takes rough sketches and converts them into realistic images.

A novice painter might set brush to canvas aiming to create a stunning sunset landscape — craggy, snow-covered peaks reflected in a glassy lake — only to end up with something that looks more like a multi-colored inkblot.

But a deep learning model developed by NVIDIA Research can do just the opposite: it turns rough doodles into photorealistic masterpieces with breathtaking ease. The tool leverages generative adversarial networks, or GANs, to convert segmentation maps into lifelike images.

Here’s a post I did 10 years ago that shows how far sketch-to-photo technology has come.


AI Algorithm Can Detect Alzheimer’s Earlier Than Doctors

A machine learning algorithm programmed by Dr. Jae Ho Sohn can look at PET scans of human brains and spot indicators of Alzheimer’s disease with a high level of accuracy, an average of 6 years before the patients would receive a final clinical diagnosis from a doctor.

To train the algorithm, Sohn fed it images from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), a massive public dataset of PET scans from patients who were eventually diagnosed with either Alzheimer’s disease, mild cognitive impairment or no disorder. Eventually, the algorithm began to learn on its own which features are important for predicting the diagnosis of Alzheimer’s disease and which are not.

Once the algorithm was trained on 1,921 scans, the scientists tested it on two novel datasets to evaluate its performance. The first were 188 images that came from the same ADNI database but had not been presented to the algorithm yet. The second was an entirely novel set of scans from 40 patients who had presented to the UCSF Memory and Aging Center with possible cognitive impairment.

The algorithm performed with flying colors. It correctly identified 92 percent of patients who developed Alzheimer’s disease in the first test set and 98 percent in the second test set. What’s more, it made these correct predictions on average 75.8 months — a little more than six years — before the patient received their final diagnosis.
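If you’re curious what “tested it on two novel datasets” means in practice, here’s a minimal, hypothetical sketch in Python of evaluating a classifier on held-out data; it uses scikit-learn and synthetic placeholder data, not the study’s actual model or scans:

```python
# A hedged sketch of the kind of held-out evaluation described above: train on
# one set of scans, then check how many eventual Alzheimer's patients the model
# flags in data it has never seen. The data, model, and numbers are placeholders,
# not the UCSF team's actual code or results.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Placeholder features standing in for PET-scan data and eventual diagnoses.
X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# "Correctly identified X percent of patients who developed the disease"
# corresponds to sensitivity (recall) on the held-out positive cases.
sensitivity = recall_score(y_test, model.predict(X_test))
print(f"held-out sensitivity: {sensitivity:.0%}")
```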

This is the stuff where AI is going to be totally useful…provided the programs aren’t cheating somehow.


AI-Generated Human Faces That Look Amazingly Real

The opening line of Madeline Miller’s Circe is: “When I was born, the name for what I was did not exist.” In Miller’s telling of the mythological story, Circe was the daughter of a Titan and a sea nymph (a lesser deity born of two Titans). Yes, she was an immortal deity but lacked the powers and bearing of a god or a nymph, making her seem unnervingly human. Not knowing what to make of her and for their own safety, the Titans and Olympic gods agreed to banish her forever to an island.

Here’s a photograph of a woman who could also claim “when I was born, the name for what I was did not exist”:

AI Faces

The previous line contains two lies: this is not a photograph and that’s not a real person. It’s an image generated by an AI program developed by researchers at NVIDIA capable of borrowing styles from two actual photographs of real people to produce an infinite number of fake but human-like & photograph-like images.

AI Faces

We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis.

The video offers a good look at how this works, with realistic facial features that you can change with a slider, like adjusting the volume on your stereo.

Photographs that aren’t photographs and people that aren’t people, born of a self-learning machine developed by humans. We’ll want to trust these images because they look so real, especially once they start moving and talking. I wonder…will we soon seek to banish them for our own safety as the gods banished Circe?

Update: This Person Does Not Exist is a single serving site that provides a new portrait of a non-existent person with each reload.