#archiveteam 2014-10-14,Tue

Time Nickname Message
01:44 ๐Ÿ”— dashcloud is there a project to archive the continuing going-ons in Ferguson and surrounding areas?
02:02 ๐Ÿ”— ohhdemgir https://www.youtube.com/watch?v=YUU3DrBMXm4
02:09 ๐Ÿ”— ohhdemgir https://archive.org/details/TVRecSeptember71990
05:22 ๐Ÿ”— DFJustin ohhdemgir: the video is available in 480p but your grab on ia is only 360p
05:23 ๐Ÿ”— DFJustin holy shit david the gnome
06:10 ๐Ÿ”— Nemo_bis «I believe that the CSS code we write today will be readable by computers 500 years from now.» https://dev.opera.com/articles/css-twenty-years-hakon/
06:16 ๐Ÿ”— joepie91 Nemo_bis: CSS is one of the few things where I would consider that statement even remotely reasonable
06:17 ๐Ÿ”— joepie91 (assuming humanity hasn't made itself extinct by then, that is)
06:17 ๐Ÿ”— joepie91 mostly because it can be extended without touching the base syntax
06:17 ๐Ÿ”— joepie91 same reason XML (as base format) will probably be just as readable (and just as awful to work with) in 500 years
06:17 ๐Ÿ”— joepie91 :P
06:18 ๐Ÿ”— joepie91 they're also both well-documented, and relatively easy to implement
06:18 ๐Ÿ”— joepie91 .. this is not -bs
06:23 ๐Ÿ”— Nemo_bis ;)
06:34 ๐Ÿ”— mcpaige #quitpic
14:12 ๐Ÿ”— DFJustin ohhdemgir: 480p version: http://interbutt.com/temp/TV%20Recording%20-%20CBS%20+%20Fox%20+%20Nickelodeon%20-%20September%207,%201990-YUU3DrBMXm4.mp4
17:41 ๐Ÿ”— SketchCow Trying to empty out FOS.
17:41 ๐Ÿ”— SketchCow Quite involved when people have all this joy pouring onto the drive.
17:53 ๐Ÿ”— arkiver SketchCow: Muad-Dib came up with a few halo websites
17:53 ๐Ÿ”— arkiver No new information is added anymore to those halo websites
17:53 ๐Ÿ”— arkiver They contain multiplayer event information and maps (I believe)
17:54 ๐Ÿ”— Muad-Dib those stats websites werent fed to archivebot
17:55 ๐Ÿ”— arkiver Since no new information is being added and the websites will go offline some time (can be tomorrow, can be in 100 years), we might be doing good creating a warrior project
17:55 ๐Ÿ”— arkiver I saw at least 100.000.000 different pages, their numbers are incremental
17:56 ๐Ÿ”— arkiver example:
17:56 ๐Ÿ”— arkiver http://halo.bungie.net/Online/Halo3UserContentDetails.aspx?h3fileid=123955762
17:56 ๐Ÿ”— arkiver http://halo.bungie.net/Online/Halo3UserContentDetails.aspx?h3fileid=123955763
17:56 ๐Ÿ”— arkiver etc.
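[Editor's sketch] Since the h3fileid values are described as incremental, the URL list for a warrior-style project could be generated and split into work items roughly as below. This is only an illustration: the function names and the chunk size are assumptions, and nothing here reflects how the actual project was built.

```python
from urllib.parse import urlencode

# Base URL taken from the two examples pasted in the channel.
BASE = "http://halo.bungie.net/Online/Halo3UserContentDetails.aspx"


def file_urls(start, stop):
    """Yield detail-page URLs for h3fileid values in [start, stop)."""
    for file_id in range(start, stop):
        yield f"{BASE}?{urlencode({'h3fileid': file_id})}"


def work_items(start, stop, size):
    """Split an ID range into (first, last_exclusive) chunks.

    The chunk size is a hypothetical tuning knob, not a known project setting.
    """
    for first in range(start, stop, size):
        yield first, min(first + size, stop)


if __name__ == "__main__":
    for url in file_urls(123955762, 123955764):
        print(url)
```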
18:01 ๐Ÿ”— Muad-Dib I *did* however feed archivebot halowaypoint.com, that filled up way faster than I expected
18:02 ๐Ÿ”— Muad-Dib arkiver: yeah, halo 3, odst and reach are going to be big
18:02 ๐Ÿ”— Muad-Dib but halo 2 should be doable, that's just match and player statistics
18:02 ๐Ÿ”— Muad-Dib maybe just split it up per game
18:03 ๐Ÿ”— arkiver We're not talking about halowaypoint.com , that one still gets new updates and stuff
18:03 ๐Ÿ”— arkiver Halo.bungie.net doesn't, it will just go away some day
18:04 ๐Ÿ”— SketchCow Agreed
18:04 ๐Ÿ”— Muad-Dib waypoint got overhauled yesterday, and it broke all the links in their forum threads :P
18:04 ๐Ÿ”— SketchCow (Halo data should be archived)
18:04 ๐Ÿ”— SketchCow Make it a thing
18:04 ๐Ÿ”— Muad-Dib Also, maybe we can convince SketchCow to do some PR magic for us and get us a DB copy from bungie :P
18:04 ๐Ÿ”— SketchCow Bungie doesn't have that data
18:05 ๐Ÿ”— arkiver SketchCow: will make sure it runs in a few days
18:05 ๐Ÿ”— Muad-Dib not the halo.bungie.net, aged statistics?
18:06 ๐Ÿ”— arkiver SketchCow: If there is any data that can only be accessed through an account (maybe like maps) can I grab stuff behind a login?
18:06 ๐Ÿ”— arkiver Or am I allowed to use a login for the whole download? to get everything behind accounts?
18:07 ๐Ÿ”— Muad-Dib the stats on halo.bungie.net stopped updating when the responsibility for the Halo franchise was shifted to 343 Industries/Microsoft
18:07 ๐Ÿ”— marc sketch around?
18:07 ๐Ÿ”— Muad-Dib he was just a minute ago :P
18:08 ๐Ÿ”— Muad-Dib SketchCow ^
18:08 ๐Ÿ”— marc fanx
18:09 ๐Ÿ”— marc http://rss.sfbg.com/
18:10 ๐Ÿ”— marc www.sfbg.com is down.. some business types bought it a year or two ago, fired the editor and then just shut it down
18:10 ๐Ÿ”— marc like 20-30 years of news (SF Bay Guardian, liberal weekly in SF)
18:10 ๐Ÿ”— SketchCow What
18:10 ๐Ÿ”— SketchCow Yes
18:10 ๐Ÿ”— SketchCow I'm making noise.
18:11 ๐Ÿ”— marc i know some employees wonder if they can get usb key of website archive
18:11 ๐Ÿ”— marc political community in SF is reeling due to loss of progressive collective memory, all reporting that revealed all manner of wrongdoing all gone
18:11 ๐Ÿ”— SketchCow I think the only part I can do is welcome the data
18:12 ๐Ÿ”— marc nod
18:12 ๐Ÿ”— marc i'm hostscanning their domain
18:12 ๐Ÿ”— marc to see if there's any cms still up or whatever
18:12 ๐Ÿ”— marc rss feeds or whatever
18:12 ๐Ÿ”— marc $ telnet www.sfbg.com 80
18:12 ๐Ÿ”— marc Trying 206.169.91.254...
18:12 ๐Ÿ”— marc ^C
18:12 ๐Ÿ”— marc is completely gone but
18:12 ๐Ÿ”— marc $ telnet rss.sfbg.com 80
18:12 ๐Ÿ”— marc Connected to rss.sfbg.com.
18:12 ๐Ÿ”— marc Escape character is '^]'.
18:12 ๐Ÿ”— marc Trying 209.104.5.201...
18:13 ๐Ÿ”— marc is still up and running vulnerable Apache
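[Editor's sketch] The two telnet checks above amount to asking whether a TCP connection to port 80 succeeds. A scripted equivalent might look like this; the helper name and timeout are assumptions, and the result only says whether the port accepts connections, nothing about what is served.

```python
import socket


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False


if __name__ == "__main__":
    # Hostnames taken from the log; results will differ today.
    for name in ("www.sfbg.com", "rss.sfbg.com"):
        print(name, port_open(name, 80))
```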
18:14 ๐Ÿ”— marc https://en.wikipedia.org/wiki/San_Francisco_Bay_Guardian
19:45 ๐Ÿ”— SketchCow ..
19:45 ๐Ÿ”— SketchCow Oh, THAT Marc
19:45 ๐Ÿ”— SketchCow How's dad-hood
20:13 ๐Ÿ”— marc `@sketch
20:13 ๐Ÿ”— marc http://issuu.com/sf.guardian
20:13 ๐Ÿ”— marc looks like all the old back issues are cross published to some issuu.com shit
20:14 ๐Ÿ”— marc doesnt save any dead links tho
20:16 ๐Ÿ”— SketchCow Now we just need to grab all the .pdfs that comes from.
20:17 ๐Ÿ”— marc cool loading charles proxy to see what the flash movie is communicating with
20:18 ๐Ÿ”— SketchCow http://image.issuu.com/141007194746-7819f25a55f483150eb9462df2f4be34/jpg/page_1.jpg
20:18 ๐Ÿ”— xmc i've pulled from issuu before
20:19 ๐Ÿ”— xmc there's pdfs on there for everything
20:19 ๐Ÿ”— xmc some pubs don't enable it
20:19 ๐Ÿ”— xmc but you can use the format from other pubs to construct new urls
20:19 ๐Ÿ”— SketchCow http://image.issuu.com/141007194746-7819f25a55f483150eb9462df2f4be34/jpg/page_40.jpg etc
20:19 ๐Ÿ”— xmc you get the original pdfs that the pub sent to the printer usually
20:20 ๐Ÿ”— joepie91 urgh, issuu's the fucking worst
20:20 ๐Ÿ”— joepie91 xmc: I've messed about with issuu before
20:20 ๐Ÿ”— joepie91 after an hour I gave up
20:20 ๐Ÿ”— xmc heh
20:20 ๐Ÿ”— SketchCow It's not hard.
20:20 ๐Ÿ”— xmc it's a drag but the data is there
20:20 ๐Ÿ”— SketchCow It's all document ID.
20:20 ๐Ÿ”— SketchCow the .JPG files will work.
20:20 ๐Ÿ”— joepie91 I mean the original PDF download
20:21 ๐Ÿ”— SketchCow Yes, and if we get the original PDFs great.
20:21 ๐Ÿ”— joepie91 which - at least back then - didn't have a filename corresponding to *anything* on the page
20:21 ๐Ÿ”— joepie91 :(
20:21 ๐Ÿ”— marc it's all jpeg files yah
20:21 ๐Ÿ”— xmc yes, the document ids are a serial number assigned by issuu
20:21 ๐Ÿ”— xmc iiuc
20:21 ๐Ÿ”— marc api.issuu.com provides issue lookup 49.02 -> document ID 141007194746-7819f25a55f483150eb9462df2f4be34
20:22 ๐Ÿ”— marc "publicationId": "7819f25a55f483150eb9462df2f4be34",
20:22 ๐Ÿ”— marc "title": "San Francisco Bay Guardian",
20:22 ๐Ÿ”— marc oh hello
20:22 ๐Ÿ”— marc "orgDocName": "49.02.pdf",
20:22 ๐Ÿ”— marc "orgDocType": "pdf",
20:22 ๐Ÿ”— marc there's the orgdocname :)
20:23 ๐Ÿ”— joepie91 I doubt that'll work for documents where PDF download has been disabled
20:24 ๐Ÿ”— marc there's some s3 bucket http://document.issuu.com that i'm trying to scrape for 49.02.pdf
20:24 ๐Ÿ”— SketchCow I'm doing a jpg test just to do it.
20:24 ๐Ÿ”— marc that's the one that feeds a document.xml -> the flash movie
20:26 ๐Ÿ”— marc looks like each page is its own .swf that's loaded
20:27 ๐Ÿ”— marc the only jpegs are the index page of covers
20:27 ๐Ÿ”— marc eg
20:27 ๐Ÿ”— marc http://page.issuu.com/141007194746-7819f25a55f483150eb9462df2f4be34/swf/page_17.swf
20:27 ๐Ÿ”— marc another s3 bucket
20:27 ๐Ÿ”— marc no luck looking for pdf in there :)
20:28 ๐Ÿ”— xmc oh wait no, it wasn't issuu i played with, it was "etypeservices.com"
20:28 ๐Ÿ”— xmc sorry
20:28 ๐Ÿ”— SketchCow So, here's my take.
20:29 ๐Ÿ”— SketchCow However we come up with it, a pile of publications can be turned into a collection on archive.org.
20:29 ๐Ÿ”— SketchCow More resolution is better, but the JPG works now.
20:29 ๐Ÿ”— marc u mean screenshotting the swf pages of each issue?
20:29 ๐Ÿ”— SketchCow I am not averse to there being two collections, one with the JPG grab and then a better quality one. "Publisher's edition"
20:29 ๐Ÿ”— SketchCow No, I am able to pull the .JPGs directly.
20:29 ๐Ÿ”— marc oh nice
20:29 ๐Ÿ”— marc url
20:29 ๐Ÿ”— SketchCow view-source:http://issuu.com/sf.guardian/docs/49.02
20:29 ๐Ÿ”— marc oh cool
20:29 ๐Ÿ”— SketchCow <link rel="image_src" href="http://image.issuu.com/141007194746-7819f25a55f483150eb9462df2f4be34/jpg/page_1.jpg">
20:30 ๐Ÿ”— marc just for those covers tho right?
20:30 ๐Ÿ”— marc oh great
20:30 ๐Ÿ”— marc k
20:30 ๐Ÿ”— SketchCow And then in increments for page_x.jpg
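[Editor's sketch] The approach described here (read the image_src link from the page source, take the document ID from it, then count up page_x.jpg) could be scripted along these lines. The class and function names are made up for illustration, and how many pages an issue has is left to the caller because the log does not say how that was determined.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImageSrcFinder(HTMLParser):
    """Remember the href of the first <link rel="image_src"> tag seen."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "image_src" and self.href is None:
            self.href = attrs.get("href")


def document_id(page_html):
    """Return the document ID embedded in the image_src URL, or None."""
    finder = ImageSrcFinder()
    finder.feed(page_html)
    if finder.href is None:
        return None
    # Path looks like /<document id>/jpg/page_1.jpg
    return urlparse(finder.href).path.split("/")[1]


def page_jpg_url(doc_id, page):
    """Build the JPG URL for a 1-based page number."""
    return f"http://image.issuu.com/{doc_id}/jpg/page_{page}.jpg"
```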
20:30 ๐Ÿ”— marc awesome
20:30 ๐Ÿ”— marc nice
20:30 ๐Ÿ”— marc yah didnt lookit that source was only sniffing api calls
20:30 ๐Ÿ”— SketchCow I just proof of concept pulled the pages of an issue.
20:30 ๐Ÿ”— marc nice1
20:30 ๐Ÿ”— SketchCow https://archive.org/details/SFBG-10-08-2014
20:30 ๐Ÿ”— SketchCow The archive will turn that into a readable document from the zip I uploaded.
20:31 ๐Ÿ”— marc i guess it makes sense to look at other issuu.com periodicals that make pdf download avail and see if same url scheme is avail
20:31 ๐Ÿ”— marc cool
20:31 ๐Ÿ”— marc def better than nothing
20:31 ๐Ÿ”— SketchCow I agree the party shouldn't stop there.
20:32 ๐Ÿ”— SketchCow I do think it would be good to get a list of all the issues on Issuu, get those document Ids, and have that list. Shared wiki or google doc?
20:32 ๐Ÿ”— marc sf alternative weekly archive going back to 2011
20:32 ๐Ÿ”— marc also they have an ios app
20:32 ๐Ÿ”— marc so they definitely have pdf download avail in ios
20:32 ๐Ÿ”— marc since there's no swf support
20:32 ๐Ÿ”— marc i can reverse that ios app
20:33 ๐Ÿ”— marc wiki or google doc either way
20:33 ๐Ÿ”— SketchCow Does it just go back to 2011?
20:34 ๐Ÿ”— marc the issuu web archive does yeah
20:34 ๐Ÿ”— marc :(
20:34 ๐Ÿ”— marc com.issuu.app is the ios app
20:35 ๐Ÿ”— marc "192 publications" it says in issuu.com header which gels with going back to January 2011
20:36 ๐Ÿ”— marc cracking com.issuu.app and mitming the web requests to see if i can get link to pdfs
20:37 ๐Ÿ”— marc http://api.issuu.com/query?action=issuu.document.get_user_doc&format=json&documentUsername=sf.guardian&name=49.02
20:37 ๐Ÿ”— marc that api.issuu.com rest endpoint for issue# -> Document id is
20:37 ๐Ÿ”— marc so
20:38 ๐Ÿ”— marc issues only go back to 45.11
20:39 ๐Ÿ”— marc 45.11->45.52, 46.01->46.52, 47.01->47.52, 48.02->48.52, 49.01->49.03
20:39 ๐Ÿ”— marc are the 192 back issues that issuu.com has
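[Editor's sketch] To build the shared list of issues and document IDs, the ranges above can be expanded into issue names and turned into lookup URLs for the api.issuu.com endpoint. The helper names are assumptions. Note that the ranges as written expand to 200 names, a few more than the 192 quoted, so some names presumably do not resolve and each lookup would need checking.

```python
from urllib.parse import urlencode

API = "http://api.issuu.com/query"

# (volume, first issue, last issue), copied from the ranges in the log.
RANGES = [(45, 11, 52), (46, 1, 52), (47, 1, 52), (48, 2, 52), (49, 1, 3)]


def issue_names(ranges=RANGES):
    """Expand the ranges into names such as '45.11' ... '49.03'."""
    return [
        f"{volume}.{number:02d}"
        for volume, first, last in ranges
        for number in range(first, last + 1)
    ]


def lookup_url(username, name):
    """Build the issue-name -> document-ID lookup URL shown in the log."""
    params = {
        "action": "issuu.document.get_user_doc",
        "format": "json",
        "documentUsername": username,
        "name": name,
    }
    return f"{API}?{urlencode(params)}"


if __name__ == "__main__":
    for name in issue_names()[:3]:
        print(lookup_url("sf.guardian", name))
```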
20:45 ๐Ÿ”— marc okay decrypted+cracked com.issuu.app loading into IDAPro to look for pdf endpoint
20:45 ๐Ÿ”— xmc that was fast
20:46 ๐Ÿ”— marc yup they use pdf rendering yeehaw
20:46 ๐Ÿ”— marc http://publish.issuu.com/getPdfPageSplitter
20:46 ๐Ÿ”— marc that looks interesting
20:47 ๐Ÿ”— marc ah more important
20:47 ๐Ÿ”— marc http://publication.issuu.com/??/??/ios_1.json
20:47 ๐Ÿ”— marc maybe links to pdfs
20:47 ๐Ÿ”— marc that's ios-specific API endpoint that probably feeds pdfs
20:48 ๐Ÿ”— marc http://image.issuu.com/%@/jpg/page_%ld%@.jpg
20:48 ๐Ÿ”— marc that's the endpoint jason discovered
20:49 ๐Ÿ”— marc oh and they have an undocumented test environment api here: http://api-uat.issuu.com okay gunna lookit that pdf splitter thing
20:50 ๐Ÿ”— marc (%@ is objective-c format printf format shite)
20:54 ๐Ÿ”— marc okay investigating on device (ios app using burp proxy)
21:08 ๐Ÿ”— marc okay looks like
21:09 ๐Ÿ”— marc format is http://publication.issuu.com/sf.guardian/49.02/ios_1.json
21:09 ๐Ÿ”— marc http://publication.issuu.com/sf.guardian/49.02/ios_1.json
21:09 ๐Ÿ”— marc {"pdf_pages_available": true,
21:09 ๐Ÿ”— marc http://page-pdf.issuu.com/141007194746-7819f25a55f483150eb9462df2f4be34/8.pdf
21:09 ๐Ÿ”— marc so every page is a pdf
21:10 ๐Ÿ”— marc ios API revealed that the pdf link is http://page-pdf.issuu.com/$DOCUMENTID/1.pdf etc etc
21:10 ๐Ÿ”— marc er
21:10 ๐Ÿ”— marc 2.pdf
21:10 ๐Ÿ”— marc starting at 2 because covers arent 1.pdf (and 404)
21:11 ๐Ÿ”— marc cover are still jpegs, and have null pdfURLs
21:11 ๐Ÿ”— marc http://image.issuu.com/141007194746-7819f25a55f483150eb9462df2f4be34/jpg/page_1.jpg
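[Editor's sketch] Putting marc's findings together: the cover stays a JPG, and pages from 2 onward are individual PDFs under page-pdf.issuu.com. A URL list for one issue might be assembled as below. The function names are hypothetical, and page_count is an assumed input, since the log does not show where the page total comes from.

```python
def ios_json_url(username, name):
    """URL of the iOS metadata file, in the format marc reported."""
    return f"http://publication.issuu.com/{username}/{name}/ios_1.json"


def page_assets(doc_id, page_count):
    """Return one URL per page: JPG for the cover, PDF for pages 2..page_count."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    urls = [f"http://image.issuu.com/{doc_id}/jpg/page_1.jpg"]
    urls += [
        f"http://page-pdf.issuu.com/{doc_id}/{page}.pdf"
        for page in range(2, page_count + 1)
    ]
    return urls
```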
21:12 ๐Ÿ”— marc @jason do u hav a wiki to document this stuff or were u asking me if i could make one
21:12 ๐Ÿ”— marc pdfs have full text so that's dope
21:13 ๐Ÿ”— schbirid marc: archiveteam.org
21:13 ๐Ÿ”— marc k
21:13 ๐Ÿ”— marc thx
21:13 ๐Ÿ”— schbirid not sure if registration works though :s
21:13 ๐Ÿ”— marc do i just make a new page or what
21:13 ๐Ÿ”— marc cool what yr login ;P
21:13 ๐Ÿ”— xmc registration works if you use the secret code 'yahoosucks'
21:14 ๐Ÿ”— marc lol nice
21:14 ๐Ÿ”— marc thx
21:14 ๐Ÿ”— marc NUP: YAHOOSUCKS
21:14 ๐Ÿ”— xmc AOL Keyword YAHOOSUCKS
21:14 ๐Ÿ”— schbirid just do http://archiveteam.org/index.php?title=Issuu i'd say
21:15 ๐Ÿ”— marc yah
21:15 ๐Ÿ”— marc neg issue, sfbay guardian
21:15 ๐Ÿ”— marc but it can be repurposed for issuu
21:20 ๐Ÿ”— marc http://archiveteam.org/index.php?title=SF_Bay_Guardian
21:20 ๐Ÿ”— marc okay
21:20 ๐Ÿ”— marc updated with issuu.com pdf urls
21:21 ๐Ÿ”— marc pardon my shit formatting
21:23 ๐Ÿ”— schbirid nice
21:25 ๐Ÿ”— schbirid now go grab them all!
21:26 ๐Ÿ”— marc okay fixed formatting
21:26 ๐Ÿ”— marc @jason am i done yet? :)
21:28 ๐Ÿ”— SketchCow Like, can you move on?
21:28 ๐Ÿ”— SketchCow Yes, you can move on.
21:28 ๐Ÿ”— SketchCow 192 issues is better than nothing!
21:29 ๐Ÿ”— marc okay done added that document name -> document ID endpoint
21:29 ๐Ÿ”— marc thx duders!!! u da best :)
21:29 ๐Ÿ”— SketchCow No problem.
21:29 ๐Ÿ”— SketchCow Obviously more should be tracked down.
21:29 ๐Ÿ”— marc hope the issuu.com stuff is reusable
21:30 ๐Ÿ”— marc Issuu.com uses a Flash viewer to stream JPEGS, but the iPhone App has access to the full pdfs
21:30 ๐Ÿ”— SketchCow I'd suggest you use your skills to go after local SF reporters and people you could get stuff for.
21:30 ๐Ÿ”— marc god bless steve :)
21:30 ๐Ÿ”— SketchCow Archive.org will host anything, obviously.
21:30 ๐Ÿ”— marc yah i've been mailing and calling people thx
21:30 ๐Ÿ”— marc reporters said all the docs are owned by SF MEdia Co (the new owners)
21:31 ๐Ÿ”— marc they seem to be locked out of that host, I might try and tell some more interested parties about the vulnerable version of apache running on rss.sfbg.com
21:33 ๐Ÿ”— marc they might try to buy the paper but that sounds unrealistic
21:34 ๐Ÿ”— joepie91 ayt
21:34 ๐Ÿ”— joepie91 er
21:34 ๐Ÿ”— joepie91 anything.sfbg.com will return the 'closed down' page
21:34 ๐Ÿ”— joepie91 no matter what prefixes it
21:34 ๐Ÿ”— joepie91 (unless a specific host is specified, probably)
21:34 ๐Ÿ”— joepie91 which makes me believe the content may still be there
21:36 ๐Ÿ”— joepie91 [23:34] <joepie91> anything.sfbg.com will return the 'closed down' page
21:36 ๐Ÿ”— joepie91 [23:34] <joepie91> no matter what prefixes it
21:36 ๐Ÿ”— joepie91 [23:34] <joepie91> (unless a specific host is specified, probably)
21:36 ๐Ÿ”— joepie91 [23:34] <joepie91> which makes me believe the content may still be there
21:36 ๐Ÿ”— joepie91 but just hidden by a catch-all HTTPd config block
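[Editor's sketch] joepie91's catch-all theory could be checked by sending ordinary GET requests to the same server with different Host headers and comparing the responses. The helper below is an assumption about how one might do that with the standard library, and any list of candidate hostnames to try would be guesswork.

```python
import http.client


def fetch_with_host(ip, host, path="/", port=80, timeout=10):
    """GET path from ip:port while presenting `host` as the virtual host.

    Returns (status, body). An explicit Host header replaces the one
    http.client would otherwise derive from the connection address.
    """
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        conn.request("GET", path, headers={"Host": host})
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

If a specific name returned something other than the 'closed down' page, that would support the idea that the old content is still configured on the server.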
21:40 ๐Ÿ”— marc ah thx
21:40 ๐Ÿ”— marc good call
21:40 ๐Ÿ”— marc j'agree
21:40 ๐Ÿ”— marc also what is a joe pie it sounds delicious
21:41 ๐Ÿ”— marc okay gotta bounce tx again jason et al
21:41 ๐Ÿ”— marc if yall ever need any ios reversing help u know where to find me! har
21:47 ๐Ÿ”— xmc hehe
21:48 ๐Ÿ”— godane issue pdf link: http://s3.amazonaws.com/document.issuu.com/141014184412-c62aa490968318abf1d4684f9ec4f158/original.file?AWSAccessKeyId=AKIAJY7E3JMLFKPAGP7A&Expires=1413326827&Signature=m6kKoMNHg6sFa1J4nIWZvK23RAs%3D
21:51 ๐Ÿ”— xmc wooooo
21:51 ๐Ÿ”— xmc hell of a link
21:52 ๐Ÿ”— marc original file nice
21:54 ๐Ÿ”— marc not sure how to generate that signature
21:54 ๐Ÿ”— marc eg change document url
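[Editor's sketch] On the signature question: in this style of pre-signed S3 link the Signature parameter is, as far as the editor understands it, an HMAC computed with the bucket owner's secret key over the request details and the expiry time, so it cannot be recomputed for a different document ID without that key. What can be done is to take the link apart and see when it expires. The function name below is made up for illustration.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


def describe_signed_url(url):
    """Split a pre-signed S3 URL into its path and signing parameters."""
    parts = urlparse(url)
    query = parse_qs(parts.query)  # also percent-decodes the values
    return {
        "path": parts.path,
        "access_key": query["AWSAccessKeyId"][0],
        "expires": datetime.fromtimestamp(int(query["Expires"][0]), tz=timezone.utc),
        "signature": query["Signature"][0],
    }
```

For the link godane pasted, the Expires value 1413326827 corresponds to 2014-10-14 22:47:07 UTC, so each link has to be fetched soon after the site hands it out.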
21:54 ๐Ÿ”— xmc how'd you get the link, godane?
21:56 ๐Ÿ”— godane i login using facebook
22:20 ๐Ÿ”— godane no download link on this issue: http://issuu.com/sf.guardian/docs/48.15
22:20 ๐Ÿ”— xmc godane is archiveteam's secret weapon against "where is the best quality file"
22:22 ๐Ÿ”— godane so looks like it will be comic book files from January 2014 and earlier
22:23 ๐Ÿ”— godane ok 48.07 have download link
22:26 ๐Ÿ”— aaaaaaaaa I'm still not convinced that godane isn't an archiving robot from the future sent back through time to prevent some horrible calamity caused by not having enough old data.
22:31 ๐Ÿ”— xmc i'm ok with this
22:36 ๐Ÿ”— joepie91 hahaha
22:36 ๐Ÿ”— joepie91 as am I
22:36 ๐Ÿ”— godane btw i think like a time traveler
22:38 ๐Ÿ”— godane anyways i think issue 48.15 to 48.10 don't have download links
22:39 ๐Ÿ”— godane 48.06 doesn't have a download link either
22:41 ๐Ÿ”— godane so 48.06 to 48.01. don't have download links
22:42 ๐Ÿ”— godane i'm going to upload these pdfs as a zip archive
22:42 ๐Ÿ”— godane only cause it will be incomplete and to save time on my end
22:48 ๐Ÿ”— godane looks like i may got them all now
