[00:06] http://tracker.archiveteam.org/gamefrontforums/ is now active!
[00:43] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[00:44] *** WinterFox has joined #archiveteam
[00:46] *** sigkell has quit IRC (Ping timeout: 260 seconds)
[00:48] *** philpem has quit IRC (Ping timeout: 260 seconds)
[00:57] *** sigkell has joined #archiveteam
[01:17] *** JesseW has joined #archiveteam
[01:43] *** Honno has quit IRC (Read error: Operation timed out)
[01:53] *** xmc has quit IRC (Read error: Operation timed out)
[01:53] *** mismatch has joined #archiveteam
[01:54] *** Fletcher has quit IRC (Read error: Operation timed out)
[01:54] *** mismatch_ has quit IRC (Read error: Operation timed out)
[01:54] *** Famicoma1 has quit IRC (Read error: Operation timed out)
[01:55] *** xmc has joined #archiveteam
[01:55] *** swebb sets mode: +o xmc
[01:55] *** robink has quit IRC (Read error: Connection reset by peer)
[01:56] *** Fletcher has joined #archiveteam
[01:59] *** robink has joined #archiveteam
[02:03] arkiver: In the gamefrontforums grab did you mean to include the geo-IP block check for downloading GameFront files?
[02:03] I don't think the forums have the same block
[02:04] *** dashcloud has quit IRC (Read error: Operation timed out)
[02:08] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[02:13] *** dashcloud has joined #archiveteam
[02:48] *** Famicoma1 has joined #archiveteam
[03:55] *** JesseW has joined #archiveteam
[03:59] arkiver: since I'm not banned from gamefront, shall I stay with that one, rather than switching over to forums?
[04:28] up to 6 concurrency on gamefront
[04:47] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[04:53] *** Sk1d has joined #archiveteam
[05:24] *** metalcamp has joined #archiveteam
[05:39] *** Honno has joined #archiveteam
[05:43] *** BlueMaxim has joined #archiveteam
[06:21] *** signius has joined #archiveteam
[06:22] *** mismatch has quit IRC (Ping timeout: 633 seconds)
[06:33] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[07:24] *** Honno has quit IRC (Read error: Operation timed out)
[07:30] *** schbirid has joined #archiveteam
[07:35] *** morbus_ has joined #archiveteam
[07:41] *** Morbus has quit IRC (Read error: Operation timed out)
[07:57] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[08:01] *** redlob has quit IRC (Read error: Operation timed out)
[08:02] *** redlob has joined #archiveteam
[08:09] *** metalcamp has joined #archiveteam
[08:37] *** Wuked has joined #archiveteam
[09:14] *** Smiley has joined #archiveteam
[09:25] *** atomotic has joined #archiveteam
[09:34] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[09:35] *** bwn has quit IRC (Ping timeout: 492 seconds)
[09:45] *** metalcamp has joined #archiveteam
[09:50] *** bwn has joined #archiveteam
[10:15] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[10:25] *** Wuked has quit IRC (My Mac has gone to sleep. ZZZzzz…)
[10:35] *** metalcamp has joined #archiveteam
[10:35] *** Wuked has joined #archiveteam
[10:41] *** Wuked has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
[10:53] *** Wuked has joined #archiveteam
[11:07] *** Wuked has quit IRC (My Mac has gone to sleep. ZZZzzz…)
[11:26] *** Lord_Nigh has quit IRC (Ping timeout: 244 seconds)
[11:29] *** Crocatowa has quit IRC (Read error: Operation timed out)
[11:30] *** Crocatowa has joined #archiveteam
[11:35] *** Medowar has joined #archiveteam
[11:37] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[11:41] *** vitzli has joined #archiveteam
[11:45] *** Lord_Nigh has joined #archiveteam
[11:54] *** kris33 has joined #archiveteam
[12:02] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[12:25] *** RichardG has quit IRC (Ping timeout: 260 seconds)
[12:25] *** atomotic has joined #archiveteam
[12:32] *** RichardG has joined #archiveteam
[12:45] *** Honno has joined #archiveteam
[12:47] *** Wuked has joined #archiveteam
[12:59] *** kris33 has quit IRC (Textual IRC Client: www.textualapp.com)
[13:06] *** Honno has quit IRC (Ping timeout: 1208 seconds)
[13:23] *** suggestio has joined #archiveteam
[13:26] Image hosting site run by ThePirateBay crew has been temporarily revived after sudden shutdown in 2014. Old images now accessible again. http://bayimg.com/
[13:28] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[13:33] suggestio: Know anything about how their image URLs are generated?
[13:33] *** suggestio has quit IRC (Ping timeout: 268 seconds)
[13:53] *** VADemon has joined #archiveteam
[13:53] *** scyther has joined #archiveteam
[13:53] *** scyther has quit IRC (Connection closed)
[14:26] *** metalcamp has joined #archiveteam
[14:43] phuzion: potential vector for mapping stuff out: http://bayimg.com/album/
[14:43] it seems to try to load everything
[14:44] I can't imagine that it's more than a few TB, think it might be worth trying to archive?
[14:44] everything lives on image.bayimg.com
[14:44] seemingly using hashes of files
[14:44] http://image.bayimg.com/d3099f010b848bd079b53d0c985e409f67914928.jpg
[14:44] gross
[14:44] the 'view' pages are easier
[14:44] http://bayimg.com/PaiLPAAgH
[14:44] lol Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 32 bytes) in /var/data/bayimg.com/www/ajax_album.php on line 70
[14:44] album sample: http://bayimg.com/album/MAANGaaaC
[14:45] phuzion: yes, I think it tries to list everything
[14:45] note the ajax_ prefix
[14:45] yeah
[14:45] I think it's a sort of API as well
[14:45] *** atomotic has quit IRC (Ping timeout: 260 seconds)
[14:45] might be able to enumerate it with certain params
[14:45] ah, hold on
[14:46] http://bayimg.com/album/- also fails
[14:47] it seems to always fail..
[14:48] strange...
[14:50] Google knows a lot of them anyway
[14:51] interesting
[14:51] phuzion: the image IDs are NOT randomly generated
[14:52] phuzion: first two pages of Google: https://gist.github.com/joepie91/ac014769a3446e074d62c2792b9c05b2
[14:52] entirely too consistent
[14:52] always a lowercase or uppercase A in the 2nd 6th position
[14:53] 2nd and 6th*
[14:53] various other apparent patterns
[14:53] always a lowercase or uppercase A in the 7th posuition to, it seems
[14:53] position too*
[14:54] bing results: http://paste.nerds.io/edixocitog.txt
[14:55] yeah, very not random
[14:55] lol
[14:58] PurpleSym: phuzion: combined and sorted: http://sprunge.us/ceTE
[14:58] time to find patterns :P
[15:04] *** Wuked has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
[15:06] Ids seem to be case-insensitive.
[15:06] whoa, fucking seriously?
[15:07] wow
[15:07] that is great
[15:10] PurpleSym: updated list: http://sprunge.us/UiPM
[15:10] seems the 8th position is just a-f?
[15:12] Could be a coincidence. The sample size is quite small.
[15:12] seems unlikely
[15:12] PurpleSym: can you get a larger list?
[15:12] via Google or w/e
[15:13] I could try the common crawl index.
[15:14] please do :P
[15:15] What's the status of maxfile.ro? Has anyone more detailed information?
[15:16] *** atomotic has joined #archiveteam
[15:16] PurpleSym: I've grabbed yandex, duckduckgo and bing for maxfile.ro. Turns out 50 of your results were still unique to mine
[15:17] Better get them all, VADemon.
[15:17] joepie91: Nothing in the Common Crawl Index, as far as I see.
[15:17] *** atomotic has quit IRC (Client Quit)
[15:17] PurpleSym: try Google then?
[15:17] *** atomotic has joined #archiveteam
[15:18] I don’t have scripts for that.
[15:18] PurpleSym: if you use Chrome, "Link Grabber" is greatly useful for this
[15:18] lets you ignore internal stuff
[15:19] and only extract actual search results
[15:19] makes it a semi-automated process
[15:22] Well, I don’t.
[15:31] *** Rotab has joined #archiveteam
[15:31] PurpleSym: char frequency counts: http://storage2.static.itmages.com/i/16/0426/h_1461684754_5932817_5afad23c2c.png
[15:32] *** Wuked has joined #archiveteam
[15:34] http://bayimg.com/fajkkaadd and http://bayimg.com/fajkkaaddd are the same image
[15:34] so it ignores everything beyond 8 chars
[15:34] also doesn't seem to go beyond P anywhere
[15:34] are you murdering gamefront (filefront?) forums?
[15:35] so...
[15:35] * joepie91 makes permutation calc
[15:35] Rotab: we did start archiving them last night, so probably
[15:35] Ping arkiver ^^^^
[15:35] phuzion: PurpleSym: 6291456 permutations
[15:35] I think we can pull that off
[15:35] (for bayimg)
[15:36] 6.2m, that's not bad
[15:36] Sure, that’s not too bad.
[15:36] i cant even join in on gamefront, the ip check fails :S
[15:37] joepie91: Wrt frequency counts: Could be an increasing 32 bit counter with nibbles shuffled around.
[15:37] *shifted
[15:37] Rotab: doesn't seem to be caused by us, the file downloading from their servers still works for me
[15:38] The issue is not the file hosting, it's the forums
[15:38] I can access them but it takes about 10 seconds for each page to load
[15:38] *** luckcolor has joined #archiveteam
[15:38] Hello
[15:39] Hello
[15:39] *** WinterFox has quit IRC (Remote host closed the connection)
[15:39] yeah, it is very slow
[15:39] PurpleSym: that goes beyond my capabilities :)
[15:39] the distribution is a bit odd but I think we can just treat it as randomized
[15:39] with the given ranges
[15:39] although the forumgrab needlessly checks if you can download files
[15:39] and have it be Good Enough
[15:39] Yeah, I mentioned that last night but arkiver is AFK
[15:40] Yeah, that should work fine, joepie91.
[15:44] *** luckcolor has quit IRC (Quit: Page closed)
[15:48] *** JesseW has joined #archiveteam
[15:49] *** atomotic has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
[15:53] *** atomotic has joined #archiveteam
[15:56] *** atomotic has quit IRC (Client Quit)
[15:58] - Started grabbing maxfile.ro, with 660 links from search engines -
[15:59] joepie91: Three pictures I just uploaded in this order: http://bayimg.com/aAimFAAgH http://bayimg.com/aaiMgaagH http://bayimg.com/aAiMHaagh
[16:02] incremental? excellent :)
[16:03] huh
[16:03] that's odd
[16:03] those are 9 chars
[16:19] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[16:40] hi
[16:40] I limited filefront
[16:40] What's up with bayimg and maxfile.ro?
[16:43] its come back up for a bit
[16:44] arkiver: Started mirroring with 660 links off search engines
[16:44] Ah yeah, it was shutting down
[16:44] Great!
[16:44] *** vitzli has quit IRC (Quit: Leaving)
[16:48] arkiver: can you write something for bayimg given the above information?
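Editor's sketch, not part of the log: the two observations so far (IDs are case-insensitive, and trailing characters are ignored) suggest that view-page URLs collected from different search engines can be deduplicated by reducing each one to a canonical ID. The helper name `canonical_id` and the constant `ID_LENGTH` are hypothetical. The value 9 is an assumption based on the 9-character IDs quoted in the channel and on fajkkaadd / fajkkaaddd resolving to the same image.

```python
from urllib.parse import urlparse

# Assumption: only the first 9 characters of an ID are significant.
ID_LENGTH = 9


def canonical_id(url: str) -> str:
    """Reduce a bayimg view-page URL to a lowercase, truncated ID.

    Only meant for view pages such as http://bayimg.com/PaiLPAAgH;
    the last path segment is taken as the ID.
    """
    last_segment = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    return last_segment[:ID_LENGTH].lower()


# URLs quoted in the channel.
urls = [
    "http://bayimg.com/fajkkaadd",
    "http://bayimg.com/fajkkaaddd",
    "http://bayimg.com/aAimFAAgH",
]
unique_ids = sorted({canonical_id(u) for u in urls})
print(unique_ids)
```

Under these assumptions the first two URLs collapse into one entry, so merged search-engine lists would not count the same image twice.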
[16:48] may be a warrior project thing
[16:48] sure
[16:48] I'm not sure about the 9th char though
[16:48] what is wrong with the website
[16:48] I haven't read it all
[16:48] it ignores everything after the 8th char but we did get a 9th char
[16:48] arkiver: https://torrentfreak.com/pirate-bays-image-hosting-site-bayimg-returns-for-a-bit-160425/
[16:48] "The site will remain online for a week or so. This allows people to secure their files, if needed, but in a few days the site will close its doors again. Apparently, the TPB team prefers to focus exclusively on the torrent site."
[16:49] that sounds like an invitation ;)
[16:51] so it seems like they've simply shifted it by an A
[16:51] for the newer uploads
[16:52] maybe they ran out of keyspace?
[16:52] hm, maybe not
[16:52] wow, nevermind
[16:52] I'm blind
[16:52] it has always been 9 chars, not 8
[16:52] ignore everything I just said
[16:52] kik
[16:53] lol*
[16:57] I'll have a look at the site this evening
[17:08] *** Honno has joined #archiveteam
[17:12] *** philpem has joined #archiveteam
[17:16] arkiver: alright.
[17:16] arkiver: the essential information:
[17:16] 1) char frequency information: http://storage2.static.itmages.com/i/16/0426/h_1461684754_5932817_5afad23c2c.png
[17:16] 2) everything after 9 chars is ignored (so you can ignore `position 9` there)
[17:16] 3) image IDs are case-insensitive
[17:17] 4) album URLs are linked from the pages of the images that belong to them, so you can discover albums just by scraping the "Album" buttons (eg. http://bayimg.com/PaiLPAAgH )
[17:18] 5) nothing ever goes beyond P in the image IDs
[17:19] arkiver: For the FileFront Forums did you mean to include the GameFront geo-IP block check in the scripts? I don't think the forums have any geoblocking
[17:23] arkiver: oh and 6) it's gone in a week :)
[17:24] okay, so some quick calculation work
[17:24] 6 million permutations and change
[17:24] in a week's time
[17:25] say 1 million permutations a day
[17:25] works out to ~12 requests per second
[17:25] not sure they're going to be able to handle that
[17:25] and that's just for the images, not the discovered albums
[17:25] they're already slow, so chances are they will start complaining at us
[18:19] *** Medowar has quit IRC (Quit: Connection closed for inactivity)
[18:27] https://archive.org/details/roiocollection
[18:37] *** hictooth has joined #archiveteam
[18:42] *** hictooth has quit IRC (Quit: Bye!)
[18:44] *** Peetz0r has quit IRC (Read error: Operation timed out)
[18:48] *** hictooth has joined #archiveteam
[19:03] *** BartoCH has joined #archiveteam
[19:18] If gangsta art and music is your thing, you're in luck with https://archive.org/details/@sketch_the_cow?and[]=mediatype%3A%22audio%22&and[]=collection:audio
[19:18] https://www.flickr.com/photos/textfiles/sets/72157594265759470 is getting all the papers I'm now scanning.
[19:19] https://www.flickr.com/photos/textfiles/albums/72157663634874672 is getting all CD-ROM faces I'm scanning (ISOs will go up on hard drives mailed to IA)
[19:22] I'm also describing Negativland items, but that's once every 5-8 hours, that's hardly worth noting.
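Editor's sketch, not part of the log: the request-rate estimate given at 17:24-17:25 (about 1 million permutations a day, roughly 12 requests per second) can be reproduced with a few lines of arithmetic. The function name `requests_per_second` is hypothetical. The second figure assumes the full 6291456-permutation keyspace spread evenly over the 7 days the site was expected to stay up.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def requests_per_second(total_requests: int, days: float) -> float:
    """Average request rate needed to finish total_requests within days."""
    return total_requests / (days * SECONDS_PER_DAY)


# The rounded figure used in the channel: 1 million permutations per day.
print(f"{requests_per_second(1_000_000, 1):.2f}")  # 11.57

# The full estimated keyspace spread evenly over one week.
print(f"{requests_per_second(6_291_456, 7):.2f}")  # 10.40
```

Both figures cover image view pages only. As noted in the channel, album pages discovered along the way would add further requests.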
[19:27] *** bwn has quit IRC (Ping timeout: 246 seconds)
[19:30] *** dashcloud has quit IRC (Read error: Operation timed out)
[19:33] *** dashcloud has joined #archiveteam
[19:56] *** sivoais has quit IRC (Read error: Operation timed out)
[20:03] *** Wuked has quit IRC (Read error: Connection reset by peer)
[20:03] *** bwn has joined #archiveteam
[20:07] *** sivoais has joined #archiveteam
[20:07] joepie91, bayimg: if my bruteforce string generator is correct and we exclude strings which has "a" in positions 2,6,7 then our search space is totalling at 1,048,600 URLs
[20:08] *** Wuked has joined #archiveteam
[20:09] arkiver: basically I've generated items for warrior here: https://github.com/VADemon/bayimg-brute/blob/40f3bfd130e3e405ec83dd874ee8990b9c0bc192/bayimg-portion-list.txt portion;;;;;
[20:11] each individual string would be generated on the fly by lua and given to wget-lua, that's how I imagine this to work
[20:11] can anyone recommend a feed reader that stores warc files of each post automatically?
[20:13] *** Wuked has quit IRC (Read error: Connection reset by peer)
[20:13] VADemon: huh. hold on.
[20:14] *** Wuked has joined #archiveteam
[20:14] VADemon:
[20:14] > 16 * 1 * 16 * 16 * 16 * 1 * 1 * 6 * 16
[20:14] 6291456
[20:14] what am I missing?
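Editor's sketch, not part of the log: the two keyspace figures quoted above can be compared by multiplying per-position alphabet sizes. The alphabets below encode the observations from the channel (a-p everywhere, a fixed "a" in positions 2, 6 and 7, a-f in position 8); the names `POSITION_ALPHABETS` and `keyspace_size` are hypothetical. One possible reading of the 1,048,600 figure, which the log does not confirm, is a rounded 16^5, meaning the six-way choice in position 8 was left out.

```python
from math import prod

A_TO_P = "abcdefghijklmnop"  # "nothing ever goes beyond P"

# One alphabet per ID position (1-indexed in the comments).
POSITION_ALPHABETS = [
    A_TO_P,    # position 1
    "a",       # position 2: always a/A
    A_TO_P,    # position 3
    A_TO_P,    # position 4
    A_TO_P,    # position 5
    "a",       # position 6: always a/A
    "a",       # position 7: always a/A
    "abcdef",  # position 8: a-f in the sample seen so far
    A_TO_P,    # position 9
]


def keyspace_size(alphabets) -> int:
    """Number of distinct IDs when each position is chosen independently."""
    return prod(len(alphabet) for alphabet in alphabets)


full = keyspace_size(POSITION_ALPHABETS)
# Same calculation with position 8 dropped (index 7 in the list).
without_position_8 = keyspace_size(POSITION_ALPHABETS[:7] + POSITION_ALPHABETS[8:])

print(full)                # 6291456
print(without_position_8)  # 1048576
```

The first number matches the 16 * 1 * 16 * 16 * 16 * 1 * 1 * 6 * 16 product typed into the channel. The second differs from it by exactly the factor of 6 contributed by position 8.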
[20:18] *** Sanqui has quit IRC (Remote host closed the connection)
[20:19] *** Sanqui has joined #archiveteam
[20:31] *** Medowar has joined #archiveteam
[20:37] *** Wuked_ has joined #archiveteam
[20:37] *** Wuked has quit IRC (Read error: Connection reset by peer)
[20:38] *** Ravenloft has joined #archiveteam
[20:46] *** Wuked_ has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[20:51] *** schbirid has quit IRC (Quit: Leaving)
[20:54] *** Lord_Nigh has quit IRC (Ping timeout: 250 seconds)
[20:54] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[20:54] *** Lord_Nigh has joined #archiveteam
[21:13] *** Emcy_ has joined #archiveteam
[21:15] *** Emcy has quit IRC (Ping timeout: 246 seconds)
[21:24] *** Emcy_ has quit IRC (Ping timeout: 246 seconds)
[21:25] *** Emcy has joined #archiveteam
[21:37] *** Emcy has quit IRC (Ping timeout: 370 seconds)
[22:25] bayimg is not case sensitive. It seems to randomly use some case
[22:25] *** Honno has quit IRC (Read error: Operation timed out)
[22:29] we'll get it
[22:30] they also have albums a tags
[22:30] will have to add some discovery for that
[22:30] Who has some rsync space for the discovery part?
[22:33] arkiver: see above, albums can be derived from images
[22:35] Yeah, http://bayimg.com/cAIMfAaGH the album button
[22:52] will be 458752 items, 16 images/item
[23:20] *** JW_work has quit IRC (Read error: Operation timed out)
[23:29] *** JW_work has joined #archiveteam
[23:29] ack
[23:31] *** dashcloud has quit IRC (Read error: Operation timed out)
[23:43] *** Ymgve__ has joined #archiveteam
[23:46] *** Ymgve has quit IRC (Ping timeout: 506 seconds)
[23:47] *** dashcloud has joined #archiveteam
[23:50] *** Ravenloft has quit IRC (Ping timeout: 260 seconds)
[23:52] *** Ravenloft has joined #archiveteam
[23:52] *** Rye has quit IRC (Ping timeout: 244 seconds)
[23:58] *** Rye has joined #archiveteam
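Editor's sketch, not part of the log: the "458752 items, 16 images/item" figure from 22:52 equals 16^4 * 7. That would fit an item being an 8-character prefix with position 8 drawn from a-g (IDs such as aAimFAAgH do have a g there) and the 9th character expanded to its 16 values inside the item. This layout is an inference, not something stated in the channel. The names `iter_item_prefixes` and `expand_item` are hypothetical, and the real project generated IDs in Lua for wget-lua rather than in Python.

```python
from itertools import product

A_TO_P = "abcdefghijklmnop"

# Assumed alphabets for positions 1-8; position 8 is taken as a-g here.
PREFIX_ALPHABETS = [A_TO_P, "a", A_TO_P, A_TO_P, A_TO_P, "a", "a", "abcdefg"]


def iter_item_prefixes():
    """Yield every 8-character item prefix under the assumed alphabets."""
    for chars in product(*PREFIX_ALPHABETS):
        yield "".join(chars)


def expand_item(prefix: str) -> list[str]:
    """Return the 16 view-page URLs covered by one item prefix."""
    return [f"http://bayimg.com/{prefix}{last}" for last in A_TO_P]


item_count = sum(1 for _ in iter_item_prefixes())
print(item_count)       # 458752
print(item_count * 16)  # 7340032

first_prefix = next(iter_item_prefixes())
print(expand_item(first_prefix)[0])  # http://bayimg.com/aaaaaaaaa
```

Grouping by prefix keeps the tracker's item list short while each worker still fetches a small, fixed batch of pages per item.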