Anyone have an easy way to extract "AlternativeText" from embedded images in a Word document via Linux? Asking for a friend.

@JohnsNotHere Office documents can typically be unzipped, and then you can dig through the contents as a file tree. Not sure if what you're after will be human-readable there, but it's a start.

@qsrmvt I actually went down that road and also tried opening the file via LibreOffice, but I'm not seeing any alternate text on the two images they have in the doc. Having said that, I'm not sure if the import into Libre caused any issues.

@JohnsNotHere As @qsrmvt said, it’s a zip file. If you know there’s some alternate text “foo” in the doc, do:
unzip doc.docx
grep -ril foo *

Now you know which XML file “foo” appears in. Open that file and look at it. Depending on your purposes you either need to use an xpath search utility or just grep. I don’t know whether you’re trying to do this once, so any quick hack works, or whether you’re trying to solve this at scale so you need precision and detail.
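For the "precision and detail" route, here is a minimal Python sketch using only the standard library. It rests on assumptions not stated in this thread: that the file is a zip-based OOXML document (.docx/.docm, not a legacy binary .doc), and that the alt text sits where Word usually writes it, namely the `descr` or `title` attribute of `docPr`/`cNvPr` elements, or the `alt` attribute of VML `shape` elements. The function name `extract_alt_text` is made up for illustration. Since the input may be hostile, it is best run inside an isolated environment.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# Assumed locations of alt text: element local name -> attributes to read.
ALT_TEXT_ATTRS = {
    "docPr": ("descr", "title"),   # DrawingML drawing properties
    "cNvPr": ("descr", "title"),   # non-visual picture/shape properties
    "shape": ("alt",),             # legacy VML shapes
}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_alt_text(docx):
    """Return (part, element, attribute, value) tuples for every alt-text-like
    attribute found in the XML parts of a zip-based Office document.
    `docx` may be a path or a binary file-like object."""
    found = []
    with zipfile.ZipFile(docx) as zf:
        for part in zf.namelist():
            if not part.endswith(".xml"):
                continue
            try:
                root = ET.fromstring(zf.read(part))
            except ET.ParseError:
                continue  # skip parts that are not well-formed XML
            for elem in root.iter():
                name = local_name(elem.tag)
                for attr in ALT_TEXT_ATTRS.get(name, ()):
                    value = elem.get(attr)
                    if value:
                        found.append((part, name, attr, value))
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    for part, name, attr, value in extract_alt_text(sys.argv[1]):
        print(f"{part}: <{name} {attr}=...> {value!r}")
```

Saved under a name of your choosing, it takes the document path as its only argument. An empty result means nothing was found in the places this sketch checks, not that the document has no alt text at all.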

@paco @qsrmvt This was some malware that I was investigating. I ended up setting up a sandbox and just opening up the document without macros enabled. The macro was looking to execute code from the "AlternativeText" in one of the embedded images.

Turned out whoever set it up forgot to include the payload, so there wasn't anything there. But I was curious if I had to go that far given I used olevba to get the macro code, hence the question. 🙂
