[01:44] is there a project to archive the continuing goings-on in Ferguson and surrounding areas? [02:02] https://www.youtube.com/watch?v=YUU3DrBMXm4 [02:09] https://archive.org/details/TVRecSeptember71990 [05:22] ohhdemgir: the video is available in 480p but your grab on ia is only 360p [05:23] holy shit david the gnome [06:10] «I believe that the CSS code we write today will be readable by computers 500 years from now.» https://dev.opera.com/articles/css-twenty-years-hakon/ [06:16] Nemo_bis: CSS is one of the few things where I would consider that statement even remotely reasonable [06:17] (assuming humanity hasn't made itself extinct by then, that is) [06:17] mostly because it can be extended without touching the base syntax [06:17] same reason XML (as a base format) will probably be just as readable (and just as awful to work with) in 500 years [06:17] :P [06:18] they're also both well-documented, and relatively easy to implement [06:18] .. this is not -bs [06:23] ;) [06:34] #quitpic [14:12] ohhdemgir: 480p version: http://interbutt.com/temp/TV%20Recording%20-%20CBS%20+%20Fox%20+%20Nickelodeon%20-%20September%207,%201990-YUU3DrBMXm4.mp4 [17:41] Trying to empty out FOS. [17:41] Quite involved when people have all this joy pouring onto the drive. [17:53] SketchCow: Muad-Dib came up with a few Halo websites [17:53] No new information is added to those Halo websites anymore [17:53] They contain multiplayer event information and maps (I believe) [17:54] those stats websites weren't fed to archivebot [17:55] Since no new information is being added and the websites will go offline some time (can be tomorrow, can be in 100 years), we might do well to create a warrior project [17:55] I saw at least 100,000,000 different pages; their numbers are incremental [17:56] example: [17:56] http://halo.bungie.net/Online/Halo3UserContentDetails.aspx?h3fileid=123955762 [17:56] http://halo.bungie.net/Online/Halo3UserContentDetails.aspx?h3fileid=123955763 [17:56] etc. [18:01] I *did* however feed archivebot halowaypoint.com; that filled up way faster than I expected [18:02] arkiver: yeah, halo 3, odst and reach are going to be big [18:02] but halo 2 should be doable, that's just match and player statistics [18:02] maybe just split it up per game [18:03] We're not talking about halowaypoint.com, that one still gets new updates and stuff [18:03] Halo.bungie.net doesn't; it will just go away some day [18:04] Agreed [18:04] waypoint got overhauled yesterday, and it broke all the links in their forum threads :P [18:04] (Halo data should be archived) [18:04] Make it a thing [18:04] Also, maybe we can convince SketchCow to do some PR magic for us and get us a DB copy from bungie :P [18:04] Bungie doesn't have that data [18:05] SketchCow: will make sure it runs in a few days [18:05] not the halo.bungie.net aged statistics? [18:06] SketchCow: If there is any data that can only be accessed through an account (maybe like maps), can I grab stuff behind a login? [18:06] Or am I allowed to use a login for the whole download, to get everything behind accounts? [18:07] the stats on halo.bungie.net stopped updating when responsibility for the Halo franchise was shifted to 343 Industries/Microsoft [18:07] sketch around? [18:07] he was just a minute ago :P [18:08] SketchCow ^ [18:08] fanx
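A minimal sketch of the sequential grab proposed above for those incremental h3fileid pages. Everything beyond the two example URLs quoted in the channel is an assumption: the ID range, the idea that an HTTP error marks a missing item, and the use of plain fetches instead of the WARC-writing pipeline a real warrior project would use.

```python
import urllib.error
import urllib.request

# URL pattern from the example links quoted in the channel.
BASE = "http://halo.bungie.net/Online/Halo3UserContentDetails.aspx?h3fileid={}"

def grab_range(start, stop):
    """Fetch a block of sequential h3fileid pages and report which ones still resolve."""
    for file_id in range(start, stop):
        try:
            with urllib.request.urlopen(BASE.format(file_id), timeout=30) as resp:
                body = resp.read()
            # A real warrior project would hand the response to a WARC writer here.
            print(file_id, len(body), "bytes")
        except urllib.error.HTTPError as err:
            print(file_id, "error:", err.code)

if __name__ == "__main__":
    # The two example IDs from the channel; the full range is vastly larger.
    grab_range(123955762, 123955764)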
[18:09] http://rss.sfbg.com/ [18:10] www.sfbg.com is down.. some business types bought it a year or two ago, fired the editor and then just shut it down [18:10] like 20-30 years of news (SF Bay Guardian, liberal weekly in SF) [18:10] What [18:10] Yes [18:10] I'm making noise. [18:11] i know some employees wonder if they can get a usb key of the website archive [18:11] the political community in SF is reeling from the loss of progressive collective memory; all the reporting that revealed all manner of wrongdoing is gone [18:11] I think the only part I can do is welcome the data [18:12] nod [18:12] i'm hostscanning their domain [18:12] to see if there's any cms still up or whatever [18:12] rss feeds or whatever [18:12] $ telnet www.sfbg.com 80 [18:12] Trying 206.169.91.254... [18:12] ^C [18:12] is completely gone but [18:12] $ telnet rss.sfbg.com 80 [18:12] Trying 209.104.5.201... [18:12] Connected to rss.sfbg.com. [18:12] Escape character is '^]'. [18:13] is still up and running a vulnerable Apache [18:14] https://en.wikipedia.org/wiki/San_Francisco_Bay_Guardian [19:45] .. [19:45] Oh, THAT Marc [19:45] How's dad-hood [20:13] @sketch [20:13] http://issuu.com/sf.guardian [20:13] looks like all the old back issues are cross-published to some issuu.com shit [20:14] doesn't save any dead links tho [20:16] Now we just need to grab all the .pdfs that come from that. [20:17] cool, loading charles proxy to see what the flash movie is communicating with [20:18] http://image.issuu.com/141007194746-7819f25a55f483150eb9462df2f4be34/jpg/page_1.jpg [20:18] i've pulled from issuu before [20:19] there's pdfs on there for everything [20:19] some pubs don't enable it [20:19] but you can use the format from other pubs to construct new urls [20:19] http://image.issuu.com/141007194746-7819f25a55f483150eb9462df2f4be34/jpg/page_40.jpg etc [20:19] you usually get the original pdfs that the pub sent to the printer [20:20] urgh, issuu's the fucking worst [20:20] xmc: I've messed about with issuu before [20:20] after an hour I gave up [20:20] heh [20:20] It's not hard. [20:20] it's a drag but the data is there [20:20] It's all document ID. [20:20] the .JPG files will work. [20:20] I mean the original PDF download [20:21] Yes, and if we get the original PDFs, great. [20:21] which - at least back then - didn't have a filename corresponding to *anything* on the page [20:21] :( [20:21] it's all jpeg files yah [20:21] yes, the document ids are a serial number assigned by issuu [20:21] iiuc [20:21] api.issuu.com provides issue lookup: 49.02 -> document ID 141007194746-7819f25a55f483150eb9462df2f4be34 [20:22] "publicationId": "7819f25a55f483150eb9462df2f4be34", [20:22] "title": "San Francisco Bay Guardian", [20:22] oh hello [20:22] "orgDocName": "49.02.pdf", [20:22] "orgDocType": "pdf", [20:22] there's the orgDocName :) [20:23] I doubt that'll work for documents where PDF download has been disabled [20:24] there's some s3 bucket http://document.issuu.com that i'm trying to scrape for 49.02.pdf [20:24] I'm doing a jpg test just to do it. [20:24] that's the one that feeds a document.xml -> the flash movie [20:26] looks like each page is its own .swf that's loaded [20:27] the only jpegs are the index page of covers [20:27] eg [20:27] http://page.issuu.com/141007194746-7819f25a55f483150eb9462df2f4be34/swf/page_17.swf [20:27] another s3 bucket [20:27] no luck looking for a pdf in there :) [20:28] oh wait no, it wasn't issuu i played with, it was "etypeservices.com" [20:28] sorry [20:28] So, here's my take. [20:29] However we come up with it, a pile of publications can be turned into a collection on archive.org.
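A small sketch of that issue-number-to-document-ID lookup, using the api.issuu.com query URL that gets pasted in full a little later in the log. Only the four response fields quoted above are known from the channel, so this just dumps the raw JSON and then builds the page-JPEG URLs from a document ID pasted out of it; the page count is a placeholder.

```python
import json
import urllib.request  # assumption: no auth is needed for these endpoints

API = ("http://api.issuu.com/query?action=issuu.document.get_user_doc"
       "&format=json&documentUsername=sf.guardian&name={issue}")
PAGE_JPG = "http://image.issuu.com/{doc_id}/jpg/page_{page}.jpg"

def lookup(issue):
    """Dump the raw lookup JSON for one issue, e.g. '49.02'."""
    with urllib.request.urlopen(API.format(issue=issue)) as resp:
        data = json.load(resp)
    print(json.dumps(data, indent=2))
    return data

def page_urls(doc_id, pages):
    """Build the page-image URLs from the document ID returned by the lookup."""
    return [PAGE_JPG.format(doc_id=doc_id, page=p) for p in range(1, pages + 1)]

if __name__ == "__main__":
    lookup("49.02")
    # Document ID quoted in the channel for issue 49.02; 3 pages as a placeholder.
    for url in page_urls("141007194746-7819f25a55f483150eb9462df2f4be34", 3):
        print(url)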
[20:29] More resolution is better, but the JPG works now. [20:29] u mean screenshotting the swf pages of each issue? [20:29] I am not averse to there being two collections, one with the JPG grab and then a better quality one. "Publisher's edition" [20:29] No, I am able to pull the .JPGs directly. [20:29] oh nice [20:29] url [20:29] view-source:http://issuu.com/sf.guardian/docs/49.02 [20:29] oh cool [20:30] just for those covers tho right? [20:30] oh great [20:30] k [20:30] And then in increments for page_x.jpg [20:30] awesome [20:30] nice [20:30] yah, didn't look at that source, was only sniffing api calls [20:30] I just proof-of-concept pulled the pages of an issue. [20:30] nice1 [20:30] https://archive.org/details/SFBG-10-08-2014 [20:30] The archive will turn that into a readable document from the zip I uploaded. [20:31] i guess it makes sense to look at other issuu.com periodicals that make pdf download available and see if the same url scheme works [20:31] cool [20:31] def better than nothing [20:31] I agree the party shouldn't stop there. [20:32] I do think it would be good to get a list of all the issues on Issuu, get those document IDs, and have that list. Shared wiki or google doc? [20:32] sf alternative weekly archive going back to 2011 [20:32] also they have an ios app [20:32] so they definitely have pdf download available in ios [20:32] since there's no swf support [20:32] i can reverse that ios app [20:33] wiki or google doc either way [20:33] Does it just go back to 2011? [20:34] the issuu web archive does, yeah [20:34] :( [20:34] com.issuu.app is the ios app [20:35] "192 publications" it says in the issuu.com header, which gels with going back to January 2011 [20:36] cracking com.issuu.app and MITMing the web requests to see if i can get a link to pdfs [20:37] http://api.issuu.com/query?action=issuu.document.get_user_doc&format=json&documentUsername=sf.guardian&name=49.02 [20:37] that's the api.issuu.com rest endpoint for issue # -> document ID [20:37] so [20:38] issues only go back to 45.11 [20:39] 45.11->45.52, 46.01->46.52, 47.01->47.52, 48.02->48.52, 49.01->49.03 [20:39] are the 192 back issues that issuu.com has [20:45] okay, decrypted+cracked com.issuu.app, loading into IDA Pro to look for the pdf endpoint [20:45] that was fast [20:46] yup they use pdf rendering yeehaw [20:46] http://publish.issuu.com/getPdfPageSplitter [20:46] that looks interesting [20:47] ah, more important [20:47] http://publication.issuu.com/??/??/ios_1.json [20:47] maybe links to pdfs [20:47] that's an ios-specific API endpoint that probably feeds pdfs [20:48] http://image.issuu.com/%@/jpg/page_%ld%@.jpg [20:48] that's the endpoint jason discovered [20:49] oh, and they have an undocumented test environment api here: http://api-uat.issuu.com okay gunna lookit that pdf splitter thing [20:50] (%@ is objective-c printf format shite) [20:54] okay, investigating on device (ios app through burp proxy) [21:08] okay looks like [21:09] the format is http://publication.issuu.com/sf.guardian/49.02/ios_1.json [21:09] http://publication.issuu.com/sf.guardian/49.02/ios_1.json [21:09] {"pdf_pages_available": true, [21:09] http://page-pdf.issuu.com/141007194746-7819f25a55f483150eb9462df2f4be34/8.pdf [21:09] so every page is a pdf [21:10] the ios API revealed that the pdf link is http://page-pdf.issuu.com/$DOCUMENTID/1.pdf etc etc [21:10] er [21:10] 2.pdf [21:10] starting at 2 because covers aren't 1.pdf (and 404) [21:11] covers are still jpegs, and have null pdfURLs
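A rough sketch of pulling one issue through the endpoints turned up above. It assumes the page PDFs run densely from 2 upward and that the first 404 marks the end of the issue; it ignores whatever page count ios_1.json carries, since the rest of that JSON isn't shown in the log, and the output paths are placeholders.

```python
import os
import urllib.error
import urllib.request

# Endpoints quoted in the channel; page 1 (the cover) is a JPEG, pages 2+ are PDFs.
COVER = "http://image.issuu.com/{doc_id}/jpg/page_1.jpg"
PAGE_PDF = "http://page-pdf.issuu.com/{doc_id}/{page}.pdf"

def fetch(url, path):
    """Download one URL to disk; return False on a 404."""
    try:
        with urllib.request.urlopen(url) as resp, open(path, "wb") as out:
            out.write(resp.read())
        return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise

def grab_issue(doc_id, outdir):
    """Cover as JPEG, then page PDFs from 2 upward until the first 404."""
    os.makedirs(outdir, exist_ok=True)
    fetch(COVER.format(doc_id=doc_id), os.path.join(outdir, "page_1.jpg"))
    page = 2
    while fetch(PAGE_PDF.format(doc_id=doc_id, page=page),
                os.path.join(outdir, "%d.pdf" % page)):
        page += 1

if __name__ == "__main__":
    # Document ID quoted in the channel for SFBG issue 49.02.
    grab_issue("141007194746-7819f25a55f483150eb9462df2f4be34", "sfbg-49.02")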
[21:11] http://image.issuu.com/141007194746-7819f25a55f483150eb9462df2f4be34/jpg/page_1.jpg [21:12] @jason do u hav a wiki to document this stuff or were u asking me if i could make one [21:12] pdfs have full text so that's dope [21:13] marc: archiveteam.org [21:13] k [21:13] thx [21:13] not sure if registration works though :s [21:13] do i just make a new page or what [21:13] cool what yr login ;P [21:13] registration works if you use the secret code 'yahoosucks' [21:14] lol nice [21:14] thx [21:14] NUP: YAHOOSUCKS [21:14] AOL Keyword YAHOOSUCKS [21:14] just do http://archiveteam.org/index.php?title=Issuu i'd say [21:15] yah [21:15] neg issue, sfbay guardian [21:15] but it can be repurposed for issuu [21:20] http://archiveteam.org/index.php?title=SF_Bay_Guardian [21:20] okay [21:20] updated with the issuu.com pdf urls [21:21] pardon my shit formatting [21:23] nice [21:25] now go grab them all! [21:26] okay, fixed formatting [21:26] @jason am i done yet? :) [21:28] Like, can you move on? [21:28] Yes, you can move on. [21:28] 192 issues is better than nothing! [21:29] okay, done; added that document name -> document ID endpoint [21:29] thx duders!!! u da best :) [21:29] No problem. [21:29] Obviously more should be tracked down. [21:29] hope the issuu.com stuff is reusable [21:30] Issuu.com uses a Flash viewer to stream JPEGs, but the iPhone app has access to the full pdfs [21:30] I'd suggest you use your skills to go after local SF reporters and people you could get stuff for. [21:30] god bless steve :) [21:30] Archive.org will host anything, obviously. [21:30] yah, i've been mailing and calling people, thx [21:30] reporters said all the docs are owned by SF Media Co (the new owners) [21:31] they seem to be locked out of that host; I might try and tell some more interested parties about the vulnerable version of apache running on rss.sfbg.com [21:33] they might try to buy the paper but that sounds unrealistic [21:34] ayt [21:34] er [21:34] anything.sfbg.com will return the 'closed down' page [21:34] no matter what prefixes it [21:34] (unless a specific host is specified, probably) [21:34] which makes me believe the content may still be there [21:36] but just hidden by a catch-all HTTPd config block [21:40] ah thx [21:40] good call [21:40] j'agree [21:40] also what is a joe pie it sounds delicious [21:41] okay gotta bounce, tx again jason et al [21:41] if yall ever need any ios reversing help u know where to find me! har [21:47] hehe [21:48] issue pdf link: http://s3.amazonaws.com/document.issuu.com/141014184412-c62aa490968318abf1d4684f9ec4f158/original.file?AWSAccessKeyId=AKIAJY7E3JMLFKPAGP7A&Expires=1413326827&Signature=m6kKoMNHg6sFa1J4nIWZvK23RAs%3D [21:51] wooooo [21:51] hell of a link [21:52] original.file, nice [21:54] not sure how to generate that signature [21:54] e.g. to change the document url [21:54] how'd you get the link, godane?
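The issue-to-document-ID list SketchCow asked for could be built from the back-issue ranges quoted above. Note the ranges enumerate to 200 names against the 192 publications Issuu reports, so this sketch assumes the extra names simply fail to resolve and skips them; the output filename and the one-second delay are placeholders.

```python
import json
import time
import urllib.error
import urllib.request

API = ("http://api.issuu.com/query?action=issuu.document.get_user_doc"
       "&format=json&documentUsername=sf.guardian&name={issue}")

def issue_names():
    """Issue names from the ranges quoted in the channel: 45.11-45.52 ... 49.01-49.03."""
    ranges = [(45, 11, 52), (46, 1, 52), (47, 1, 52), (48, 2, 52), (49, 1, 3)]
    for volume, first, last in ranges:
        for number in range(first, last + 1):
            yield "%d.%02d" % (volume, number)

def build_list(outfile="sfbg_issuu_lookups.json"):
    """Save the raw lookup JSON per issue so the document IDs can be pulled out later."""
    results = {}
    for issue in issue_names():
        try:
            with urllib.request.urlopen(API.format(issue=issue)) as resp:
                results[issue] = json.load(resp)
        except urllib.error.HTTPError as err:
            print("skipping", issue, "->", err.code)
        time.sleep(1)  # be polite to the API
    with open(outfile, "w") as fh:
        json.dump(results, fh, indent=2)

if __name__ == "__main__":
    build_list()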
[21:56] i login using facebook [22:20] no download link on this issue: http://issuu.com/sf.guardian/docs/48.15 [22:20] godane is archiveteam's secret weapon against "where is the best quality file" [22:22] so it looks like it will be comic book files from January 2014 and earlier [22:23] ok, 48.07 has a download link [22:26] I'm still not convinced that godane isn't an archiving robot from the future sent back through time to prevent some horrible calamity caused by not having enough old data. [22:31] i'm ok with this [22:36] hahaha [22:36] as am I [22:36] btw i think like a time traveler [22:38] anyways, i think issues 48.15 to 48.10 don't have download links [22:39] 48.06 doesn't have a download link either [22:41] so 48.06 to 48.01 don't have download links [22:42] i'm going to upload these pdfs as a zip archive [22:42] only cause it will be incomplete and to save time on my end [22:48] looks like i may have got them all now
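For issues that only yield per-page files, a minimal way to bundle them into a single zip like the SFBG-10-08-2014 item uploaded earlier; the directory layout matches the per-issue download sketch above and is otherwise an assumption.

```python
import pathlib
import zipfile

def zip_issue(issue_dir, zip_path):
    """Pack every page file for one issue into a single zip, ready for an archive.org item."""
    pages = sorted(p for p in pathlib.Path(issue_dir).iterdir() if p.is_file())
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_STORED) as zf:
        for page in pages:
            # The pages are already-compressed PDFs/JPEGs, so store them uncompressed.
            zf.write(page, arcname=page.name)

if __name__ == "__main__":
    zip_issue("sfbg-49.02", "SFBG-49.02.zip")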