[04:10] i may have found something interesting
[04:11] so it turns out that wal-mart has a 3rd party hosting some pdfs
[04:11] the 3rd party is the vo.msecnd.net domain
[04:26] godane: something microsoft
[04:36] anyway, there is a msrvideo.vo.msecnd.net domain and it has pdfs of their research
[04:51] SketchCow, undersco2, someone is reporting issues with warc2zip - where do I report this?
[05:14] joepie92: the pdfs are from microsoft research labs
[05:39] so the microsoft research pdfs are going to be uploaded as zip files
[05:40] this way i can do the dump slowly, 1000 ids at a time
[06:15] i'm starting to upload the microsoft research pdfs i found: https://archive.org/details/msrvideo.vo.msecnd.net-pdf-grab-103000-to-104000
[06:15] the range alone is over 1gb
[06:16] also, i tried downloading from 100000 but got nothing until 103339
[07:24] Lord_Nigh: I typically imagine warriors as a fleet of express trains that spawn out of nowhere, run over a due-to-be-shutdown service, somehow magically transport all the data into their cargo hold, and then vanish into thin air
[07:29] I imagine "THIS! IS! ARCHIVING!" *kick* followed immediately by a Battle of Thermopylae situation
[07:43] Battle of Thermopylae?
[07:45] The famous battle with King Leonidas and his 300 Spartans, joepie92
[07:45] ahhh
[07:45] heh
[09:43] huh, i can't play the ogv from https://archive.org/details/internetarchivecelebration20131024 with mplayer or vlc.
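The plan at 05:39-05:40 (dump the msrvideo PDFs slowly, 1000 ids per zip) amounts to splitting an id range into fixed-size batches. A minimal sketch, assuming each batch becomes one item named like the uploaded `msrvideo.vo.msecnd.net-pdf-grab-103000-to-104000`, and assuming half-open ranges (the log does not say whether 104000 itself belongs to the first or second batch). `batches` and `item_name` are hypothetical helpers; the per-file PDF URL scheme is not given in the log and is deliberately not modeled.

```python
# Hypothetical batching helper for an id-range grab.
# Assumptions: batches are half-open [lo, hi) ranges of 1000 ids, and item
# identifiers follow the pattern of the archive.org item linked at 06:15.

BATCH_SIZE = 1000
ITEM_PREFIX = "msrvideo.vo.msecnd.net-pdf-grab"


def batches(start: int, stop: int, size: int = BATCH_SIZE) -> list[tuple[int, int]]:
    """Return half-open (lo, hi) ranges covering [start, stop) in steps of `size`."""
    out = []
    lo = start
    while lo < stop:
        hi = min(lo + size, stop)  # the last batch may be shorter than `size`
        out.append((lo, hi))
        lo = hi
    return out


def item_name(lo: int, hi: int) -> str:
    """Build an identifier in the style of the uploaded item (hypothetical)."""
    return f"{ITEM_PREFIX}-{lo}-to-{hi}"


if __name__ == "__main__":
    # One zip/item per batch, e.g. for the ids grabbed so far.
    for lo, hi in batches(103000, 106000):
        print(item_name(lo, hi))
```

Keeping the batch boundaries on round thousands means sparse stretches (like the empty 100000-103338 ids mentioned at 06:16) simply produce small or empty batches rather than shifting every later item name.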
[09:58] question about scanning schematics: for large 11x17 schematics where the schematic itself was only ever printed as an image of about 1140x1720, i think 400dpi is sufficient
[09:58] for most hand-drawn or greyscale schematics 800dpi is needed, but in this case i think 400 is fine
[10:45] just know that the 105000 to 120000 range of microsoft research papers is going to be very small because there are very few files there
[13:03] /join #archiveteam
[19:21] SketchCow: re your scanning blog post: I personally don't mind destroying something to scan it if it's very common and easy to replace
[19:21] reminds me though, I need to upload some recent scans
[20:04] * phillipsj has pasted his root password into IRC chat.
[20:47] wget's -D matches all subdomains of the specified domains; is that new or did i never notice?
[20:47] e.g. tumblr.com will match ALL *.tumblr.com
[21:38] https://fbcdn-sphotos-f-a.akamaihd.net/hphotos-ak-prn1/1385827_651805974859860_2051484422_n.jpg
[21:41] ersi: I love it!
[21:42] lol. My comeback always was: "No, but 'u' and 'i' are in community!"
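The 09:58 argument that 400dpi is enough for the 11x17 schematics is easy to check with a little arithmetic. This sketch assumes the roughly 1140x1720 printed image spans the full 11x17 inch sheet (the log only says "about"), so the numbers are estimates; `scan_pixels`, `source_dpi` and `oversampling` are illustrative helpers, not part of any scanning tool.

```python
# Back-of-the-envelope resolution check for scanning an 11x17 inch schematic.
# Assumption: the ~1140x1720 printed image fills the whole sheet.

SHEET_IN = (11.0, 17.0)    # sheet size in inches (width, height)
PRINTED_PX = (1140, 1720)  # approximate pixel size of the printed image


def scan_pixels(dpi: int, sheet: tuple[float, float] = SHEET_IN) -> tuple[int, int]:
    """Pixel dimensions produced by scanning the whole sheet at `dpi`."""
    return (round(sheet[0] * dpi), round(sheet[1] * dpi))


def source_dpi(pixels: tuple[int, int] = PRINTED_PX,
               sheet: tuple[float, float] = SHEET_IN) -> tuple[float, float]:
    """Effective resolution of the printed image, in pixels per inch per axis."""
    return (pixels[0] / sheet[0], pixels[1] / sheet[1])


def oversampling(dpi: int) -> float:
    """Scan samples per printed pixel, on the axis with the finest source detail."""
    return dpi / max(source_dpi())


if __name__ == "__main__":
    print("source ppi:", [round(v, 1) for v in source_dpi()])
    for dpi in (400, 800):
        print(dpi, "dpi ->", scan_pixels(dpi), f"oversampling x{oversampling(dpi):.2f}")
```

Under that assumption the printed image carries only about 100 pixels per inch, so a 400dpi scan (4400x6800) already samples each printed pixel nearly four times, while 800dpi (8800x13600) would mostly add file size for this particular kind of sheet.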