The Front Line of AI-Driven Computer Automation: Anthropic Takes a One-Step Lead as Microsoft and Google Also Eye the Field
The AI Agent Frontier of "Computer Operation": Anthropic Takes a One-Step Lead
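As the use of generative AI expands, "automating computer operation" is drawing attention as a new development frontier for AI agents. The company one step ahead in this field is Anthropic, widely regarded as OpenAI's biggest rival.
On October 22, 2024, Anthropic announced an upgraded version of its AI model "Claude 3.5 Sonnet." At the same time, it released "Computer Use" (public beta), a capability that lets the model operate a computer the way a person does.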
With this capability, an AI agent can "see" and understand the screen through screenshots of the computer, then perform mouse actions and keyboard input. For example, it can open a spreadsheet, analyze the data, and create a visualization, or operate a customer relationship management (CRM) system to update records.
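The observe-then-act cycle described above can be sketched as a simple loop. This is a minimal illustration, not Anthropic's API: `FakeDesktop`, `Action`, `scripted_policy`, and `run_agent` are all hypothetical names, the "screenshot" is a plain string rather than pixels, and the policy is a hard-coded stand-in for the model's decision step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str        # "type", "click", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeDesktop:
    """Hypothetical stand-in for a real machine: one text field, one button."""

    def __init__(self) -> None:
        self.field = ""
        self.submitted = False

    def screenshot(self) -> str:
        # A real agent would receive pixels; a string keeps the sketch runnable.
        return f"field={self.field!r} submitted={self.submitted}"

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.field += action.text
        elif action.kind == "click" and (action.x, action.y) == (100, 200):
            self.submitted = True  # (100, 200) is the assumed button position


def scripted_policy(observation: str) -> Action:
    """Hypothetical stand-in for the model: picks the next action from the screen."""
    if "field=''" in observation:
        return Action("type", text="hello")
    if "submitted=False" in observation:
        return Action("click", x=100, y=200)
    return Action("done")


def run_agent(desktop: FakeDesktop,
              policy: Callable[[str], Action],
              max_steps: int = 10) -> List[Action]:
    """Repeat: look at the screen, decide on one action, carry it out."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = policy(desktop.screenshot())
        history.append(action)
        if action.kind == "done":
            break
        desktop.apply(action)
    return history


if __name__ == "__main__":
    desktop = FakeDesktop()
    for step in run_agent(desktop, scripted_policy):
        print(step)
```

The `max_steps` cap is a design choice for the sketch: because every action is chosen from a fresh observation, a bound on the loop keeps a confused policy from running indefinitely.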
Companies including GitLab, Canva, and Replit have already begun using the new capability. Replit, a coding platform, is reportedly using it to automate testing in application development. In software development, the testing process is often a bottleneck and a cause of schedule delays. If test automation works well, development costs are expected to fall substantially.

https://www.youtube.com/watch?v=ODaHJzOyVCQ
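According to Anthropic, the new capability differs from conventional automation tools in that it is not tied to a specific workflow or piece of software and is flexible enough to work across a wide range of applications. For example, when completing a form with a business partner's details, if the required information is not in the spreadsheet, the agent can move to the CRM system on its own, retrieve the data, and enter it into the form.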
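At present, however, operations that are easy for people, such as scrolling and zooming, remain a challenge for the AI. For that reason, Anthropic recommends starting with low-risk tasks. Given that the capability could become a new vector for threats such as misinformation and fraud, the company says it will proceed with development under a safety-first approach.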
Progress in this field can be tracked on OSWorld, a benchmark and leaderboard that evaluates how well AI models operate a computer. As of November 26, 2024, Anthropic's Claude 3.5 Sonnet leads at 22%, roughly 5 points ahead of the second-place model (17.04%).
Microsoft Takes a Different Approach to AI Screen Operation
Microsoft is also working on computer operation by AI agents.
In October 2024, the company released "OmniParser" as open source. It converts screenshots into a format that AI agents can understand more easily. The model quickly rose to become the most popular model on the AI development platform Hugging Face. According to Clem Delangue, co-founder and CEO of Hugging Face, this is a first for an agent-related model.
OmniParser's defining feature is an approach that combines three different AI models. The image recognition model "YOLOv8" detects interactable elements such as buttons and links and provides their coordinates. Next, the multimodal model "BLIP-2" analyzes the purpose of each detected element, judging for example whether a given icon is a "submit" button or a "navigation" link. GPT-4V then uses the data from YOLOv8 and BLIP-2 to carry out tasks such as clicking buttons and filling in forms. In addition, an OCR (optical character recognition) module extracts text around GUI elements to aid contextual understanding.
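The division of labor described above can be sketched as a small pipeline. This is only an illustration of the data flow: every function below is a hypothetical stub with hard-coded output, not OmniParser's actual code or the API of YOLOv8, BLIP-2, or GPT-4V, and the keyword match in `choose` merely stands in for the vision language model's decision.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class Element:
    box: Box
    purpose: str = ""      # filled in by the description stage
    nearby_text: str = ""  # filled in by the OCR stage


def detect_elements(screenshot: object) -> List[Element]:
    """Stub for the detection stage (YOLOv8 in the article)."""
    return [Element((10, 10, 110, 40)), Element((10, 60, 110, 90))]


def describe(screenshot: object, element: Element) -> str:
    """Stub for the description stage (BLIP-2 in the article)."""
    return "submit button" if element.box[1] < 50 else "navigation link"


def read_text(screenshot: object, element: Element) -> str:
    """Stub for the OCR module."""
    return "Send" if element.box[1] < 50 else "Home"


def parse_screen(screenshot: object) -> List[Element]:
    """Turn a screenshot into a structured list of annotated elements."""
    elements = detect_elements(screenshot)
    for element in elements:
        element.purpose = describe(screenshot, element)
        element.nearby_text = read_text(screenshot, element)
    return elements


def choose(elements: List[Element], goal: str) -> Element:
    """Stub for the decision stage (GPT-4V in the article): keyword match."""
    for element in elements:
        if goal.lower() in element.purpose.lower():
            return element
    raise LookupError(f"no element matches goal {goal!r}")


def click_point(element: Element) -> Tuple[int, int]:
    """Click the center of the element's bounding box."""
    left, top, right, bottom = element.box
    return ((left + right) // 2, (top + bottom) // 2)


if __name__ == "__main__":
    parsed = parse_screen(screenshot=None)
    target = choose(parsed, "submit")
    print(target.nearby_text, click_point(target))
```

Keeping the parsing stages separate from the decision stage is what allows the final model to be swapped for a different vision language model without touching detection or OCR.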

https://microsoft.github.io/OmniParser/
OmniParser is open source, and one of its strengths is the flexibility to work not only with GPT-4V but also with a variety of vision language models, including Microsoft's Phi-3.5-V and Meta's Llama-3.2-V.
OmniParser still has open problems, however. For example, it has difficulty distinguishing between multiple "submit" buttons on the same page, and it struggles in particular to tell apart similar-looking buttons that serve different purposes. The OCR component also reportedly has accuracy problems when text overlaps, which can make the predicted click position inaccurate.
Because the project is open source, many developers are expected to contribute by fine-tuning components and sharing insights, and the model's capabilities are expected to improve as a result.
Google Develops a Vision Language Model Specialized for UIs
Google and Apple are also pursuing research and development in this field, and their work may eventually be released as products, as Anthropic's and Microsoft's has been.
One example is "ScreenAI," which Google announced in March 2024. It is a vision language model specialized for desktop and mobile user interfaces (UIs) and infographics. It can locate UI buttons and input fields and connect that understanding to actions such as clicks.

https://research.google/blog/screenai-a-visual-language-model-for-ui-and-visually-situated-language-understanding/
According to Google, UIs and infographics play an important role in human-computer interaction, but their complexity and varied presentation formats have made them difficult to model. ScreenAI adopts Google's "PaLI" as its base architecture for image understanding. It also incorporates an image-patching method that processes images without distorting their aspect ratio, which lets it handle screens of many shapes, from tall smartphone screens to wide PC screens.
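The article does not spell out how the patching works, so the following is only a sketch of one way an aspect-ratio-preserving patch grid can be chosen: scale the image so that the grid of fixed-size patches fits within a patch budget while keeping the rows-to-columns ratio close to the image's height-to-width ratio. The function name `patch_grid` and the defaults `patch=16` and `max_patches=1024` are assumptions for illustration, not values taken from ScreenAI.

```python
import math
from typing import Tuple


def patch_grid(width: int, height: int,
               patch: int = 16, max_patches: int = 1024) -> Tuple[int, int]:
    """Return (rows, cols) of a patch grid that roughly keeps the aspect ratio.

    The image is conceptually rescaled by `scale` so that
    rows * cols stays within `max_patches`.
    """
    # Choose scale so that (scale*height/patch) * (scale*width/patch) == max_patches.
    scale = math.sqrt(max_patches * (patch / width) * (patch / height))
    cols = max(1, min(max_patches, math.floor(scale * width / patch)))
    rows = max(1, min(max_patches, math.floor(scale * height / patch)))
    return rows, cols


if __name__ == "__main__":
    # A tall phone screen gets more rows than columns; a wide PC screen the reverse.
    print("portrait :", patch_grid(1080, 1920))
    print("landscape:", patch_grid(1920, 1080))
```

By contrast, resizing every screenshot to a fixed square before patching would squash a tall phone screen and stretch a wide desktop screen, which is the distortion the article says ScreenAI avoids.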
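Although ScreenAI is a relatively small model at 5 billion parameters, it reportedly performs strongly against models of similar size on benchmarks such as Chart QA, which measures chart reading, DocVQA, which measures document understanding, and InfographicVQA, which evaluates infographic understanding. It also reportedly showed good results on UI-based tasks such as WebSRC, which measures understanding of web page structure, and MoTIF.
ScreenAI was developed in two stages: pre-training and fine-tuning. In the first stage, pre-training, "self-supervised learning," in which the AI learns on its own, was used to automatically generate training data for the image recognition model (ViT) and the language model. In the second stage, fine-tuning, data that humans had directly checked and rated was used to improve the model's accuracy.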
To build the pre-training dataset, screenshots were collected from a variety of devices, including desktops, mobile phones, and tablets. A layout annotator based on the DETR (object detection) model was used to identify and label UI elements such as images, pictograms, buttons, and text, along with their spatial relationships. An icon classifier was used to distinguish 77 icon types, and for unclassified icons, infographics, and images, a PaLI image captioning model was reportedly used to generate descriptions.
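The labeling flow described above, with a classifier label where one exists and a generated caption otherwise, can be sketched as follows. This is a toy illustration under stated assumptions: `UIElement`, `annotate`, and `describe_screen` are hypothetical names, the output line format is invented for readability, and `caption_fn` is a stand-in for a captioning model rather than a real PaLI call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class UIElement:
    kind: str                        # e.g. "TEXT", "BUTTON", "ICON", "IMAGE"
    box: Box
    icon_type: Optional[str] = None  # set only when the icon classifier recognizes it
    text: str = ""


def annotate(element: UIElement, caption_fn: Callable[[UIElement], str]) -> str:
    """Label one element, falling back to a generated caption when needed."""
    if element.kind == "ICON" and element.icon_type:
        label = f"ICON {element.icon_type}"
    elif element.kind in ("ICON", "IMAGE", "INFOGRAPHIC"):
        # Unclassified icons, images, and infographics get a caption instead.
        label = f"{element.kind} {caption_fn(element)}"
    else:
        label = f"{element.kind} {element.text}".strip()
    left, top, right, bottom = element.box
    return f"{label} [{left},{top},{right},{bottom}]"


def describe_screen(elements: List[UIElement],
                    caption_fn: Callable[[UIElement], str]) -> List[str]:
    """Emit labels in reading order (top to bottom, then left to right)."""
    ordered = sorted(elements, key=lambda e: (e.box[1], e.box[0]))
    return [annotate(e, caption_fn) for e in ordered]


if __name__ == "__main__":
    screen = [
        UIElement("BUTTON", (0, 100, 50, 120), text="OK"),
        UIElement("ICON", (0, 0, 10, 10), icon_type="search"),
        UIElement("ICON", (20, 0, 30, 10)),  # not recognized by the classifier
    ]
    for line in describe_screen(screen, caption_fn=lambda e: "a small logo"):
        print(line)
```

Keeping the bounding box in each label preserves the spatial relationships between elements that the article says the annotator records.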
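Google acknowledges, however, that ScreenAI currently lags behind large-scale models in performance, and says further research is needed to close the gap.
Automating computer operation is a new frontier that is drawing growing attention, and intensifying competition among the companies developing it appears unavoidable.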
Text: 細谷元 (Livit)