Monday, April 10, 2023

Meta unveils image-analyzing AI


By Saundra Latham, Editor at LinkedIn News

Meta continues to shift its focus from the metaverse to artificial intelligence. The Facebook parent on Wednesday released a new AI model called "Segment Anything" that can identify and isolate individual objects within an image, such as a fish in an aquarium. 

The tool could be useful not just in photo editing and content creation, Meta says, but in fields as diverse as augmented reality and Earth studies. Meanwhile, Meta plans to unveil AI-powered tools later this year that will help companies create ads, Chief Technology Officer Andrew Bosworth tells Nikkei Asia. 

Key executives including CEO Mark Zuckerberg are now spending most of their time working on Meta's AI efforts, Bosworth says.


Contribution: Owais Orakzai

Exciting news! Meta has recently launched Segment Anything, an advanced AI model that can instantly "cut out" any object in any image. The model is promptable, enabling it to generalize to new image distributions and tasks without additional training.

The Segment Anything Model (SAM) is an AI model designed for promptable segmentation. It has three components: an image encoder, a flexible prompt encoder, and a fast mask decoder.

- Image encoder: SAM uses a pre-trained Vision Transformer (ViT) adapted to process high-resolution images. The encoder is applied once per image, so its embedding can be computed ahead of time and reused across many prompts.

- Prompt encoder: SAM supports two types of prompts: sparse (points, boxes, text) and dense (masks). Points and boxes are represented by positional encodings combined with learned embeddings, while free-form text is encoded with an off-the-shelf text encoder. Dense prompts, such as masks, are embedded using convolutions and summed element-wise with the image embedding.

- Mask decoder: The mask decoder maps the image embedding, the prompt embeddings, and an output token to a mask. It uses a modified Transformer decoder block followed by a dynamic mask prediction head, and it predicts multiple output masks for a single prompt to handle ambiguity (e.g., a click that could mean a shirt or the whole person).
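The key efficiency idea above — encode the image once, then answer many cheap prompts against that cached embedding — can be sketched with a toy NumPy pipeline. This is not Meta's actual code or the `segment_anything` API; all shapes, weights, and layer choices below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB, GRID = 16, 8                      # embedding channels / grid size (assumed)
W_img = rng.standard_normal((1, EMB))  # stand-in for ViT weights
W_pt  = rng.standard_normal((2, EMB))  # stand-in positional encoding for a point prompt
W_out = rng.standard_normal((EMB,))    # stand-in mask-decoder head

def image_encoder(image):
    """Heavy step, run once per image: pool the image into a GRIDxGRIDxEMB embedding."""
    h, w = image.shape
    cells = image[: h - h % GRID, : w - w % GRID]
    cells = cells.reshape(GRID, h // GRID, GRID, w // GRID).mean(axis=(1, 3))
    return cells[..., None] * W_img            # shape (GRID, GRID, EMB)

def prompt_encoder(point):
    """Sparse prompt: a normalized (x, y) click becomes an EMB-dim embedding."""
    return np.asarray(point) @ W_pt            # shape (EMB,)

def mask_decoder(img_emb, prompt_emb):
    """Fuse the two embeddings and predict a per-cell mask."""
    fused = img_emb + prompt_emb               # broadcasts over the grid
    return (fused @ W_out) > 0                 # shape (GRID, GRID), boolean mask

image = rng.random((64, 64))
img_emb = image_encoder(image)                 # expensive encoding, done once
for click in [(0.2, 0.3), (0.8, 0.8)]:         # many cheap prompts reuse img_emb
    mask = mask_decoder(img_emb, prompt_encoder(click))
    print(click, "->", int(mask.sum()), "grid cells selected")
```

The real model swaps the stand-ins for a ViT, learned prompt embeddings, and a Transformer decoder, but the data flow — one image embedding feeding many interactive prompts — is the same.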

Source: LinkedIn
