Time |
Nickname |
Message |
01:44
๐
|
dashcloud |
is there a project to archive the continuing going-ons in Ferguson and surrounding areas? |
02:02
๐
|
ohhdemgir |
https://www.youtube.com/watch?v=YUU3DrBMXm4 |
02:09
๐
|
ohhdemgir |
https://archive.org/details/TVRecSeptember71990 |
05:22
๐
|
DFJustin |
ohhdemgir: the video is available in 480p but your grab on ia is only 360p |
05:23
๐
|
DFJustin |
holy shit david the gnome |
06:10
๐
|
Nemo_bis |
«I believe that the CSS code we write today will be readable by computers 500 years from now.» https://dev.opera.com/articles/css-twenty-years-hakon/ |
06:16
๐
|
joepie91 |
Nemo_bis: CSS is one of the few things where I would consider that statement even remotely reasonable |
06:17
๐
|
joepie91 |
(assuming humanity hasn't made itself extinct by then, that is) |
06:17
๐
|
joepie91 |
mostly because it can be extended without touching the base syntax |
06:17
๐
|
joepie91 |
same reason XML (as base format) will probably be just as readable (and just as awful to work with) in 500 years |
06:17
๐
|
joepie91 |
:P |
06:18
๐
|
joepie91 |
they're also both well-documented, and relatively easy to implement |
06:18
๐
|
joepie91 |
.. this is not -bs |
06:23
๐
|
Nemo_bis |
;) |
06:34
๐
|
mcpaige |
#quitpic |
14:12
๐
|
DFJustin |
ohhdemgir: 480p version: http://interbutt.com/temp/TV%20Recording%20-%20CBS%20+%20Fox%20+%20Nickelodeon%20-%20September%207,%201990-YUU3DrBMXm4.mp4 |
17:41
๐
|
SketchCow |
Trying to empty out FOS. |
17:41
๐
|
SketchCow |
Quite involved when people have all this joy pouring onto the drive. |
17:53
๐
|
arkiver |
SketchCow: Muad-Dib came up with a few halo websites |
17:53
๐
|
arkiver |
No new information is added anymore to those halo websites |
17:53
๐
|
arkiver |
They contain multiplayer event information and maps (I believe) |
17:54
๐
|
Muad-Dib |
those stats websites werent fed to archivebot |
17:55
๐
|
arkiver |
Since no new information is being added and the websites will go offline some time (can be tomorrow, can be in 100 years), we might be doing good creating a warrior project |
17:55
๐
|
arkiver |
I saw at least 100.000.000 different pages, their numbers are incremental |
17:56
๐
|
arkiver |
example: |
17:56
๐
|
arkiver |
http://halo.bungie.net/Online/Halo3UserContentDetails.aspx?h3fileid=123955762 |
17:56
๐
|
arkiver |
http://halo.bungie.net/Online/Halo3UserContentDetails.aspx?h3fileid=123955763 |
17:56
๐
|
arkiver |
etc. |
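A minimal sketch of the incremental numbering arkiver describes, assuming only the `h3fileid` value changes from one page to the next. The function name and the idea of working in fixed-size ranges are illustrative and not taken from any actual warrior project code.

```python
# Base of the two example URLs quoted in the log.
BASE = "http://halo.bungie.net/Online/Halo3UserContentDetails.aspx"


def halo3_file_urls(start_id, count):
    """Yield `count` consecutive detail-page URLs starting at `start_id`."""
    for file_id in range(start_id, start_id + count):
        yield f"{BASE}?h3fileid={file_id}"


if __name__ == "__main__":
    # Reproduces the two example URLs from the log.
    for url in halo3_file_urls(123955762, 2):
        print(url)
```

Splitting the ID space into ranges like this is one plausible way to hand out work items; whether every ID resolves to a real page is not stated in the log.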
18:01
๐
|
Muad-Dib |
I *did* however feed archivebot halowaypoint.com, that filled up way faster than I expected |
18:02
๐
|
Muad-Dib |
arkiver: yeah, halo 3, odst and reach are going to be big |
18:02
๐
|
Muad-Dib |
but halo 2 should be doable, thats just match and player statistics |
18:02
๐
|
Muad-Dib |
maybe just split it up per game |
18:03
๐
|
arkiver |
We're not talking about halowaypoint.com , that one still gets new updates and stuff |
18:03
๐
|
arkiver |
Halo.bungie.net doesn't, it will just go away some day |
18:04
๐
|
SketchCow |
Agreed |
18:04
๐
|
Muad-Dib |
waypoint got overhauled yesterday, and it broke all the links in their forum threads :P |
18:04
๐
|
SketchCow |
(Halo data should be archived) |
18:04
๐
|
SketchCow |
Make it a thing |
18:04
๐
|
Muad-Dib |
Also, maybe we can convince SketchCow to do some PR magic for us and get us a DB copy from bungie :P |
18:04
๐
|
SketchCow |
Bungie doesn't have that data |
18:05
๐
|
arkiver |
SketchCow: will make sure it runs in a few days |
18:05
๐
|
Muad-Dib |
not the halo.bungie.net, aged statistics? |
18:06
๐
|
arkiver |
SketchCow: If there is any data that can only be accessed through an account (maybe like maps) can I grab stuff behind a login? |
18:06
๐
|
arkiver |
Or am I allowed to use a login for the whole download? to get everything behind accounts? |
18:07
๐
|
Muad-Dib |
the stats on halo.bungie.net stopped updating when the responsibility for the Halo franchise was shifted to 343 Industries/Microsoft |
18:07
๐
|
marc |
sketch around? |
18:07
๐
|
Muad-Dib |
he was just a minute ago :P |
18:08
๐
|
Muad-Dib |
SketchCow ^ |
18:08
๐
|
marc |
fanx |
18:09
๐
|
marc |
http://rss.sfbg.com/ |
18:10
๐
|
marc |
www.sfbg.com is down.. some business types bought it a year or two ago, fired the editor and then just shut it down |
18:10
๐
|
marc |
like 20-30 years of news (SF Bay Guardian, liberal weekly in SF) |
18:10
๐
|
SketchCow |
What |
18:10
๐
|
SketchCow |
Yes |
18:10
๐
|
SketchCow |
I'm making noise. |
18:11
๐
|
marc |
i know some employees wonder if they can get usb key of website archive |
18:11
๐
|
marc |
political community in SF is reeling due to loss of progressive collective memory, all reporting that revealed all manner of wrongdoing all gone |
18:11
๐
|
SketchCow |
I think the only part I can do is welcome the data |
18:12
๐
|
marc |
nod |
18:12
๐
|
marc |
i'm hostscanning their domain |
18:12
๐
|
marc |
to see if there's any cms still up or whatever |
18:12
๐
|
marc |
rss feeds or whatever |
18:12
๐
|
marc |
$ telnet www.sfbg.com 80 |
18:12
๐
|
marc |
Trying 206.169.91.254... |
18:12
๐
|
marc |
^C |
18:12
๐
|
marc |
is completely gone but |
18:12
๐
|
marc |
$ telnet rss.sfbg.com 80 |
18:12
๐
|
marc |
Connected to rss.sfbg.com. |
18:12
๐
|
marc |
Escape character is '^]'. |
18:12
๐
|
marc |
Trying 209.104.5.201... |
18:13
๐
|
marc |
is still up and running vulnerable Apache |
18:14
๐
|
marc |
https://en.wikipedia.org/wiki/San_Francisco_Bay_Guardian |
19:45
๐
|
SketchCow |
.. |
19:45
๐
|
SketchCow |
Oh, THAT Marc |
19:45
๐
|
SketchCow |
How's dad-hood |
20:13
๐
|
marc |
`@sketch |
20:13
๐
|
marc |
http://issuu.com/sf.guardian |
20:13
๐
|
marc |
looks like all the old back issues are cross published to some issuu.com shit |
20:14
๐
|
marc |
doesnt save any dead links tho |
20:16
๐
|
SketchCow |
Now we just need to grab all the .pdfs that comes from. |
20:17
๐
|
marc |
cool loading charles proxy to see what the flash movie is communicating with |
20:18
๐
|
SketchCow |
http://image.issuu.com/141007194746-7819f25a55f483150eb9462df2f4be34/jpg/page_1.jpg |
20:18
๐
|
xmc |
i've pulled from issuu before |
20:19
๐
|
xmc |
there's pdfs on there for everything |
20:19
๐
|
xmc |
some pubs don't enable it |
20:19
๐
|
xmc |
but you can use the format from other pubs to construct new urls |
20:19
๐
|
SketchCow |
http://image.issuu.com/141007194746-7819f25a55f483150eb9462df2f4be34/jpg/page_40.jpg etc |
20:19
๐
|
xmc |
you get the original pdfs that the pub sent to the printer usually |
20:20
๐
|
joepie91 |
urgh, issuu's the fucking worst |
20:20
๐
|
joepie91 |
xmc: I've messed about with issuu before |
20:20
๐
|
joepie91 |
after an hour I gave up |
20:20
๐
|
xmc |
heh |
20:20
๐
|
SketchCow |
It's not hard. |
20:20
๐
|
xmc |
it's a drag but the data is there |
20:20
๐
|
SketchCow |
It's all document ID. |
20:20
๐
|
SketchCow |
the .JPG files will work. |
20:20
๐
|
joepie91 |
I mean the original PDF download |
20:21
๐
|
SketchCow |
Yes, and if we get the original PDFs great. |
20:21
๐
|
joepie91 |
which - at least back then - didn't have a filename corresponding to *anything* on the page |
20:21
๐
|
joepie91 |
:( |
20:21
๐
|
marc |
it's all jpeg files yah |
20:21
๐
|
xmc |
yes, the document ids are a serial number assigned by issuu |
20:21
๐
|
xmc |
iiuc |
20:21
๐
|
marc |
api.issuu.com provides issue lookup 49.02 -> document ID 141007194746-7819f25a55f483150eb9462df2f4be34 |
20:22
๐
|
marc |
"publicationId": "7819f25a55f483150eb9462df2f4be34", |
20:22
๐
|
marc |
"title": "San Francisco Bay Guardian", |
20:22
๐
|
marc |
oh hello |
20:22
๐
|
marc |
"orgDocName": "49.02.pdf", |
20:22
๐
|
marc |
"orgDocType": "pdf", |
20:22
๐
|
marc |
there's the orgdocname :) |
20:23
๐
|
joepie91 |
I doubt that'll work for documents where PDF download has been disabled |
20:24
๐
|
marc |
there's some s3 bucket http://document.issuu.com that i'm trying to scrape for 49.02.pdf |
20:24
๐
|
SketchCow |
I'm doing a jpg test just to do it. |
20:24
๐
|
marc |
that's the one that feeds a document.xml -> the flash movie |
20:26
๐
|
marc |
looks like each page is its own .swf that's loaded |
20:27
๐
|
marc |
the only jpegs are the index page of covers |
20:27
๐
|
marc |
eg |
20:27
๐
|
marc |
http://page.issuu.com/141007194746-7819f25a55f483150eb9462df2f4be34/swf/page_17.swf |
20:27
๐
|
marc |
another s3 bucket |
20:27
๐
|
marc |
no luck looking for pdf in there :) |
20:28
๐
|
xmc |
oh wait no, it wasn't issuu i played with, it was "etypeservices.com" |
20:28
๐
|
xmc |
sorry |
20:28
๐
|
SketchCow |
So, here's my take. |
20:29
๐
|
SketchCow |
However we come up with it, a pile of publications can be turned into a collection on archive.org. |
20:29
๐
|
SketchCow |
More resolution is better, but the JPG works now. |
20:29
๐
|
marc |
u mean screenshotting the swf pages of each issue? |
20:29
๐
|
SketchCow |
I am not averse to there being two collections, one with the JPG grab and then a better quality one. "Publisher's edition" |
20:29
๐
|
SketchCow |
No, I am able to pull the .JPGs directly. |
20:29
๐
|
marc |
oh nice |
20:29
๐
|
marc |
url |
20:29
๐
|
SketchCow |
view-source:http://issuu.com/sf.guardian/docs/49.02 |
20:29
๐
|
marc |
oh cool |
20:29
๐
|
SketchCow |
<link rel="image_src" href="http://image.issuu.com/141007194746-7819f25a55f483150eb9462df2f4be34/jpg/page_1.jpg"> |
20:30
๐
|
marc |
just for those covers tho right? |
20:30
๐
|
marc |
oh great |
20:30
๐
|
marc |
k |
20:30
๐
|
SketchCow |
And then in increments for page_x.jpg |
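A rough sketch of the approach SketchCow describes: pull the document ID out of the `image_src` link in the page source, then count up the page numbers. The regular expression and the `page_count` parameter are assumptions for illustration; the log does not say how the number of pages is determined, only that `page_40.jpg` exists for this issue.

```python
import re

# The <link> tag quoted from the page source in the log.
LINK_TAG = (
    '<link rel="image_src" href="http://image.issuu.com/'
    '141007194746-7819f25a55f483150eb9462df2f4be34/jpg/page_1.jpg">'
)

# Assumed shape of a document ID: digits, a hyphen, then lowercase hex.
DOC_ID_RE = re.compile(
    r'href="http://image\.issuu\.com/([0-9a-f-]+)/jpg/page_1\.jpg"'
)


def document_id(html):
    """Return the document ID found in the image_src link, or None."""
    match = DOC_ID_RE.search(html)
    return match.group(1) if match else None


def page_jpg_urls(doc_id, page_count):
    """Build page_1.jpg .. page_N.jpg URLs for one document."""
    return [
        f"http://image.issuu.com/{doc_id}/jpg/page_{n}.jpg"
        for n in range(1, page_count + 1)
    ]


if __name__ == "__main__":
    doc_id = document_id(LINK_TAG)
    for url in page_jpg_urls(doc_id, 3):
        print(url)
```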
20:30
๐
|
marc |
awesome |
20:30
๐
|
marc |
nice |
20:30
๐
|
marc |
yah didnt lookit that source was only sniffing api calls |
20:30
๐
|
SketchCow |
I just proof of concept pulled the pages of an issue. |
20:30
๐
|
marc |
nice1 |
20:30
๐
|
SketchCow |
https://archive.org/details/SFBG-10-08-2014 |
20:30
๐
|
SketchCow |
The archive will turn that into a readable document from the zip I uploaded. |
20:31
๐
|
marc |
i guess it makes sense to look at other issuu.com periodicals that make pdf download avail and see if same url scheme is avail |
20:31
๐
|
marc |
cool |
20:31
๐
|
marc |
def better than nothing |
20:31
๐
|
SketchCow |
I agree the party shouldn't stop there. |
20:32
๐
|
SketchCow |
I do think it would be good to get a list of all the issues on Issuu, get those document Ids, and have that list. Shared wiki or google doc? |
20:32
๐
|
marc |
sf alternative weekly archive going back to 2011 |
20:32
๐
|
marc |
also they have an ios app |
20:32
๐
|
marc |
so they definitely have pdf download avail in ios |
20:32
๐
|
marc |
since there's no swf support |
20:32
๐
|
marc |
i can reverse that ios app |
20:33
๐
|
marc |
wiki or google doc either way |
20:33
๐
|
SketchCow |
Does it just go back to 2011? |
20:34
๐
|
marc |
the issuu web archive does yeah |
20:34
๐
|
marc |
:( |
20:34
๐
|
marc |
com.issuu.app is the ios app |
20:35
๐
|
marc |
"192 publications" it says in issuu.com header which gels with going back to January 2011 |
20:36
๐
|
marc |
cracking com.issuu.app and mitming the web requests to see if i can get link to pdfs |
20:37
๐
|
marc |
http://api.issuu.com/query?action=issuu.document.get_user_doc&format=json&documentUsername=sf.guardian&name=49.02 |
20:37
๐
|
marc |
that api.issuu.com rest endpoint for issue# -> Document id is |
20:37
๐
|
marc |
so |
20:38
๐
|
marc |
issues only go back to 45.11 |
20:39
๐
|
marc |
45.11->45.52, 46.01->46.52, 47.01->47.52, 48.02->48.52, 49.01->49.03 |
20:39
๐
|
marc |
are the 192 back issues that issuu.com has |
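A small sketch that expands the ranges marc lists into issue names and builds the lookup URL he quoted for each. The helper names are hypothetical. Note that the ranges as written cover 200 numbers, more than the 192 issues mentioned, so presumably a few of these names would not resolve.

```python
from urllib.parse import urlencode

API = "http://api.issuu.com/query"

# (volume, first issue, last issue), copied from the ranges in the log.
RANGES = [(45, 11, 52), (46, 1, 52), (47, 1, 52), (48, 2, 52), (49, 1, 3)]


def issue_names():
    """Expand the ranges into names such as '45.11' or '49.02'."""
    return [
        f"{volume}.{number:02d}"
        for volume, first, last in RANGES
        for number in range(first, last + 1)
    ]


def lookup_url(name, username="sf.guardian"):
    """Build the issue-name -> document-ID query URL quoted in the log."""
    params = {
        "action": "issuu.document.get_user_doc",
        "format": "json",
        "documentUsername": username,
        "name": name,
    }
    return f"{API}?{urlencode(params)}"


if __name__ == "__main__":
    names = issue_names()
    print(len(names), names[0], names[-1])
    print(lookup_url("49.02"))
```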
20:45
๐
|
marc |
okay decrypted+cracked com.issuu.app loading into IDAPro to look for pdf endpoint |
20:45
๐
|
xmc |
that was fast |
20:46
๐
|
marc |
yup they use pdf rendering yeehaw |
20:46
๐
|
marc |
http://publish.issuu.com/getPdfPageSplitter |
20:46
๐
|
marc |
that looks interesting |
20:47
๐
|
marc |
ah more important |
20:47
๐
|
marc |
http://publication.issuu.com/??/??/ios_1.json |
20:47
๐
|
marc |
maybe links to pdfs |
20:47
๐
|
marc |
that's ios-specific API endpoint that probably feeds pdfs |
20:48
๐
|
marc |
http://image.issuu.com/%@/jpg/page_%ld%@.jpg |
20:48
๐
|
marc |
that's the endpoint jason discovered |
20:49
๐
|
marc |
oh and they have an undocumented test environment api here: http://api-uat.issuu.com okay gunna lookit that pdf splitter thing |
20:50
๐
|
marc |
(%@ is objective-c format printf format shite) |
20:54
๐
|
marc |
okay investigating on device (ios app using burp proxy) |
21:08
๐
|
marc |
okay looks like |
21:09
๐
|
marc |
format is http://publication.issuu.com/sf.guardian/49.02/ios_1.json |
21:09
๐
|
marc |
http://publication.issuu.com/sf.guardian/49.02/ios_1.json |
21:09
๐
|
marc |
{"pdf_pages_available": true, |
21:09
๐
|
marc |
http://page-pdf.issuu.com/141007194746-7819f25a55f483150eb9462df2f4be34/8.pdf |
21:09
๐
|
marc |
so every page is a pdf |
21:10
๐
|
marc |
ios API revealed that the pdf link is http://page-pdf.issuu.com/$DOCUMENTID/2.pdf etc etc |
21:10
๐
|
marc |
er |
21:10
๐
|
marc |
2.pdf |
21:10
๐
|
marc |
starting at 2 because covers arent 1.pdf (and 404) |
21:11
๐
|
marc |
cover are still jpegs, and have null pdfURLs |
21:11
๐
|
marc |
http://image.issuu.com/141007194746-7819f25a55f483150eb9462df2f4be34/jpg/page_1.jpg |
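A sketch of the per-page layout marc found, assuming only page 1 is a cover served as a JPEG and every later page has a PDF. The `page_count` argument is an assumption; the log does not show where the page total comes from (possibly the `ios_1.json` response).

```python
def page_urls(doc_id, page_count):
    """Return one URL per page: the cover JPEG, then PDFs from page 2 on."""
    # The cover has no PDF (1.pdf returns 404 per the log), so use the JPEG.
    urls = [f"http://image.issuu.com/{doc_id}/jpg/page_1.jpg"]
    urls += [
        f"http://page-pdf.issuu.com/{doc_id}/{n}.pdf"
        for n in range(2, page_count + 1)
    ]
    return urls


if __name__ == "__main__":
    doc_id = "141007194746-7819f25a55f483150eb9462df2f4be34"
    for url in page_urls(doc_id, 4):
        print(url)
```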
21:12
๐
|
marc |
@jason do u hav a wiki to document this stuff or were u asking me if i could make one |
21:12
๐
|
marc |
pdfs have full text so that's dope |
21:13
๐
|
schbirid |
marc: archiveteam.org |
21:13
๐
|
marc |
k |
21:13
๐
|
marc |
thx |
21:13
๐
|
schbirid |
not sure if registration works though :s |
21:13
๐
|
marc |
do i just make a new page or what |
21:13
๐
|
marc |
cool what yr login ;P |
21:13
๐
|
xmc |
registration works if you use the secret code 'yahoosucks' |
21:14
๐
|
marc |
lol nice |
21:14
๐
|
marc |
thx |
21:14
๐
|
marc |
NUP: YAHOOSUCKS |
21:14
๐
|
xmc |
AOL Keyword YAHOOSUCKS |
21:14
๐
|
schbirid |
just do http://archiveteam.org/index.php?title=Issuu i'd say |
21:15
๐
|
marc |
yah |
21:15
๐
|
marc |
neg issue, sfbay guardian |
21:15
๐
|
marc |
but it can be repurposed for issuu |
21:20
๐
|
marc |
http://archiveteam.org/index.php?title=SF_Bay_Guardian |
21:20
๐
|
marc |
okay |
21:20
๐
|
marc |
updated with issuu.com pdf urls |
21:21
๐
|
marc |
pardon my shit formatting |
21:23
๐
|
schbirid |
nice |
21:25
๐
|
schbirid |
now go grab them all! |
21:26
๐
|
marc |
okay fixed formatting |
21:26
๐
|
marc |
@jason am i done yet? :) |
21:28
๐
|
SketchCow |
Like, can you move on? |
21:28
๐
|
SketchCow |
Yes, you can move on. |
21:28
๐
|
SketchCow |
192 issues is better than nothing! |
21:29
๐
|
marc |
okay done added that document name -> document ID endpoint |
21:29
๐
|
marc |
thx duders!!! u da best :) |
21:29
๐
|
SketchCow |
No problem. |
21:29
๐
|
SketchCow |
Obviously more should be tracked down. |
21:29
๐
|
marc |
hope the issuu.com stuff is reusable |
21:30
๐
|
marc |
Issuu.com uses a Flash viewer to stream JPEGS, but the iPhone App has access to the full pdfs |
21:30
๐
|
SketchCow |
I'd suggest you use your skills to go after local SF reporters and people you could get stuff for. |
21:30
๐
|
marc |
god bless steve :) |
21:30
๐
|
SketchCow |
Archive.org will host anything, obviously. |
21:30
๐
|
marc |
yah i've been mailing and calling people thx |
21:30
๐
|
marc |
reporters said all the docs are owned by SF Media Co (the new owners) |
21:31
๐
|
marc |
they seem to be locked out of that host, I might try and tell some more interested parties about the vulnerable version of apache running on rss.sfbg.com |
21:33
๐
|
marc |
they might try to buy the paper but that sounds unrealistic |
21:34
๐
|
joepie91 |
ayt |
21:34
๐
|
joepie91 |
er |
21:34
๐
|
joepie91 |
anything.sfbg.com will return the 'closed down' page |
21:34
๐
|
joepie91 |
no matter what prefixes it |
21:34
๐
|
joepie91 |
(unless a specific host is specified, probably) |
21:34
๐
|
joepie91 |
which makes me believe the content may still be there |
21:36
๐
|
joepie91 |
[23:34] <joepie91> anything.sfbg.com will return the 'closed down' page |
21:36
๐
|
joepie91 |
[23:34] <joepie91> no matter what prefixes it |
21:36
๐
|
joepie91 |
[23:34] <joepie91> (unless a specific host is specified, probably) |
21:36
๐
|
joepie91 |
[23:34] <joepie91> which makes me believe the content may still be there |
21:36
๐
|
joepie91 |
but just hidden by a catch-all HTTPd config block |
21:40
๐
|
marc |
ah thx |
21:40
๐
|
marc |
good call |
21:40
๐
|
marc |
j'agree |
21:40
๐
|
marc |
also what is a joe pie it sounds delicious |
21:41
๐
|
marc |
okay gotta bounce tx again jason et al |
21:41
๐
|
marc |
if yall ever need any ios reversing help u know where to find me! har |
21:47
๐
|
xmc |
hehe |
21:48
๐
|
godane |
issue pdf link: http://s3.amazonaws.com/document.issuu.com/141014184412-c62aa490968318abf1d4684f9ec4f158/original.file?AWSAccessKeyId=AKIAJY7E3JMLFKPAGP7A&Expires=1413326827&Signature=m6kKoMNHg6sFa1J4nIWZvK23RAs%3D |
21:51
๐
|
xmc |
wooooo |
21:51
๐
|
xmc |
hell of a link |
21:52
๐
|
marc |
original file nice |
21:54
๐
|
marc |
not sure how to generate that signature |
21:54
๐
|
marc |
eg change document url |
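On marc's question: in query-string-signed S3 URLs of this form, the `Signature` value is an HMAC computed with the bucket owner's secret key over the request details, including the path and the `Expires` time. Without that key the signature cannot be regenerated for a different document. The sketch below only takes the link godane posted apart to show its pieces; the function name is hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

# The signed link posted in the log.
SIGNED_URL = (
    "http://s3.amazonaws.com/document.issuu.com/"
    "141014184412-c62aa490968318abf1d4684f9ec4f158/original.file"
    "?AWSAccessKeyId=AKIAJY7E3JMLFKPAGP7A&Expires=1413326827"
    "&Signature=m6kKoMNHg6sFa1J4nIWZvK23RAs%3D"
)


def signed_url_info(url):
    """Split a signed URL into its path, expiry time and signature."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    # Expires is a Unix timestamp in seconds.
    expires = datetime.fromtimestamp(int(query["Expires"][0]), tz=timezone.utc)
    return {
        "path": parts.path,
        "expires": expires,
        "signature": query["Signature"][0],
    }


if __name__ == "__main__":
    info = signed_url_info(SIGNED_URL)
    print(info["path"])
    print(info["expires"].isoformat())
```

Because the expiry is baked into the signature, a link like this is only useful for a short window, so each PDF has to be fetched soon after its link is generated.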
21:54
๐
|
xmc |
how'd you get the link, godane? |
21:56
๐
|
godane |
i login using facebook |
22:20
๐
|
godane |
no download link on this issue: http://issuu.com/sf.guardian/docs/48.15 |
22:20
๐
|
xmc |
godane is archiveteam's secret weapon against "where is the best quality file" |
22:22
๐
|
godane |
so looks like it will be comic book files from January 2014 and earlier |
22:23
๐
|
godane |
ok 48.07 have download link |
22:26
๐
|
aaaaaaaaa |
I'm still not convinced that godane isn't an archiving robot from the future sent back through time to prevent some horrible calamity caused by not having enough old data. |
22:31
๐
|
xmc |
i'm ok with this |
22:36
๐
|
joepie91 |
hahaha |
22:36
๐
|
joepie91 |
as am I |
22:36
๐
|
godane |
btw i think like a time traveler |
22:38
๐
|
godane |
anyways i think issue 48.15 to 48.10 don't have download links |
22:39
๐
|
godane |
48.06 doesn't have a download link either |
22:41
๐
|
godane |
so 48.06 to 48.01. don't have download links |
22:42
๐
|
godane |
i'm going to upload these pdfs as a zip archive |
22:42
๐
|
godane |
only cause it will be incomplete and to save time on my end |
22:48
๐
|
godane |
looks like i may have got them all now |