[00:05] *** primus104 has quit IRC (Leaving.)
[01:15] *** Ymgve has quit IRC (Ping timeout: 506 seconds)
[01:16] *** Ymgve has joined #archiveteam
[01:17] *** d6e has left :3
[01:30] *** Jonimus has joined #archiveteam
[01:42] *** schbirid2 has joined #archiveteam
[01:45] *** schbirid has quit IRC (Read error: Operation timed out)
[01:46] *** Emcy has quit IRC (Ping timeout: 306 seconds)
[01:54] *** aaaaaaaaa has joined #archiveteam
[02:03] *** mistym has quit IRC (Remote host closed the connection)
[02:04] *** mistym has joined #archiveteam
[02:04] *** Ymgve has quit IRC ()
[02:16] *** Coderjoe has quit IRC (Read error: Connection reset by peer)
[02:16] *** Coderjoe has joined #archiveteam
[02:24] *** aaaaaaaaa has quit IRC (Read error: Connection reset by peer)
[02:25] *** aaaaaaaaa has joined #archiveteam
[02:28] *** bzc6p__ has joined #archiveteam
[02:32] *** bzc6p_ has quit IRC (Read error: Operation timed out)
[02:35] *** mistym has quit IRC (Remote host closed the connection)
[02:45] *** aaaaaaaaa has quit IRC (Read error: Connection reset by peer)
[02:46] *** aaaaaaaaa has joined #archiveteam
[02:47] *** BlueMaxim has joined #archiveteam
[03:00] *** JesseW has joined #archiveteam
[03:20] *** mistym has joined #archiveteam
[04:03] *** aaaaaaaaa has quit IRC (Read error: Connection reset by peer)
[04:04] *** aaaaaaaaa has joined #archiveteam
[04:17] *** mr_rippit has joined #archiveteam
[04:17] *** ripvanwin has quit IRC (Read error: Connection reset by peer)
[04:22] *** aaaaaaaaa has quit IRC (Read error: Connection reset by peer)
[04:26] *** mistym has quit IRC (Remote host closed the connection)
[04:40] *** Emcy has joined #archiveteam
[04:59] *** mistym has joined #archiveteam
[05:43] *** JesseW has quit IRC (Quit: Leaving.)
[05:45] *** godane has quit IRC (Read error: Operation timed out)
[05:47] *** JesseW has joined #archiveteam
[06:08] *** godane has joined #archiveteam
[06:51] *** mistym has quit IRC (Remote host closed the connection)
[07:11] *** JesseW has quit IRC (Quit: Leaving.)
[07:20] *** bzc6p__ is now known as bzc6p
[07:29] *** primus104 has joined #archiveteam
[07:31] *** rolf has joined #archiveteam
[07:33] *** Laverne has quit IRC (Read error: Operation timed out)
[07:51] *** mistym has joined #archiveteam
[08:03] *** mistym has quit IRC (Read error: Operation timed out)
[08:53] *** mistym has joined #archiveteam
[09:01] *** mistym has quit IRC (Ping timeout: 483 seconds)
[09:25] *** rolf has quit IRC (Leaving...)
[09:28] *** rolf has joined #archiveteam
[09:32] *** primus104 has quit IRC (Leaving.)
[09:33] *** rolf has quit IRC (Leaving...)
[10:05] *** SadDM has quit IRC (Ping timeout: 370 seconds)
[10:17] *** rolf has joined #archiveteam
[10:17] *** rolf has quit IRC (Client Quit)
[10:51] *** sirdancea has quit IRC (Remote host closed the connection)
[10:55] *** Ymgve has joined #archiveteam
[11:00] *** sirdancea has joined #archiveteam
[12:18] *** primus104 has joined #archiveteam
[12:19] *** Morbus has quit IRC (Quit: http://www.disobey.com/)
[12:24] *** Morbus has joined #archiveteam
[12:24] *** p9ne has joined #archiveteam
[12:45] *** p9ne has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
[12:55] *** primus104 has quit IRC (Leaving.)
[13:00] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[13:30] *** McGEE has quit IRC (Quit: Connection closed for inactivity)
[13:31] *** sankin has joined #archiveteam
[13:37] *** p9ne has joined #archiveteam
[13:45] *** vOYtEC has quit IRC (Ping timeout: 362 seconds)
[13:57] *** primus104 has joined #archiveteam
[14:32] *** mistym has joined #archiveteam
[14:36] *** bzc6p_ has joined #archiveteam
[14:40] *** bzc6p has quit IRC (Read error: Operation timed out)
[14:40] *** mistym has quit IRC (Remote host closed the connection)
[14:58] *** JesseW has joined #archiveteam
[15:09] *** mistym has joined #archiveteam
[15:27] *** JesseW has quit IRC (Quit: Leaving.)
[15:40] *** sankin has quit IRC (Leaving.)
[15:46] *** mistym has quit IRC (Remote host closed the connection)
[16:03] *** mistym has joined #archiveteam
[16:11] *** sirdancea has quit IRC (Read error: Operation timed out)
[16:21] *** lytv has quit IRC (Read error: Connection reset by peer)
[16:30] *** nox has quit IRC ()
[16:32] *** lexicon has joined #archiveteam
[16:43] *** lytv has joined #archiveteam
[16:47] *** aaaaaaaaa has joined #archiveteam
[16:47] *** philpem has joined #archiveteam
[16:48] *** p9ne has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
[16:57] *** Mayonaise has quit IRC (Read error: Operation timed out)
[16:58] *** bzc6p_ is now known as bzc6p
[16:58] *** joepie91_ has quit IRC (Read error: Operation timed out)
[16:58] *** tephra has quit IRC (Read error: Operation timed out)
[16:58] *** closure has quit IRC (Read error: Operation timed out)
[16:59] *** joepie91 has joined #archiveteam
[17:03] *** closure has joined #archiveteam
[17:03] *** Mayonaise has joined #archiveteam
[17:06] *** nox has joined #archiveteam
[17:07] *** aaaaaaaa_ has joined #archiveteam
[17:09] *** tephra has joined #archiveteam
[17:13] *** aaaaaaaaa has quit IRC (Ping timeout: 370 seconds)
[17:14] *** aaaaaaaa_ is now known as aaaaaaaaa
[17:24] *** mistym has quit IRC (Remote host closed the connection)
[17:27] *** mistym has joined #archiveteam
[17:30] *** mistym has quit IRC (Remote host closed the connection)
[17:46] *** primus104 has quit IRC (Leaving.)
[17:46] *** McGEE has joined #archiveteam
[17:50] *** mistym has joined #archiveteam
[17:51] *** Jonimus has quit IRC (Ping timeout: 370 seconds)
[18:22] *** sirdancea has joined #archiveteam
[18:56] *** mistym has quit IRC (Remote host closed the connection)
[19:01] *** mistym has joined #archiveteam
[19:02] *** db48x has joined #archiveteam
[19:21] *** mistym has quit IRC (Remote host closed the connection)
[19:22] *** mistym has joined #archiveteam
[19:31] *** SimpBrain has joined #archiveteam
[19:37] *** neku has joined #archiveteam
[19:37] Hi
[19:38] hey
[19:38] hello
[19:39] Was told by a friend (wub) to go here, I run Pomf.se and would need to archive it.
[19:40] *** deafnet has joined #archiveteam
[19:41] neku: it's basically a file sharing service, right?
[19:41] bzc6p, correct
[19:42] At first glance, I guess it doesn't have an index of uploaded files.
[19:42] possible to get a list of all URLs/files?
[19:42] Does it?
[19:42] It does not have an index, however files are public.
[19:43] Hey. You said you run it.
[19:43] I do..
[19:44] Erm... so you know the structure best.
[19:45] As said above, it does not have an index list, however files are not private as anyone can view them once they get a hold of the link.
[19:45] judging by the fact that each file seems to keep its original extension, bruteforcing is out of the question
[19:45] I think what he was asking is whether you have or can generate a list of all the file links.
[19:45] archival will require some sort of 'inside knowledge' and a list of files
[19:46] neku: I'd be surprised if a site admin couldn't provide a list of stored files.
[19:47] Well of course I can, it's all in a database, however my question is how would I archive this in the best way.
[19:47] *** WubTheCap has joined #archiveteam
[19:47] neku: the basic way to archive a site is in a warc file. This records both the request to the website and the response.
[19:47] wubbois
[19:48] So if it hasn't become clear yet, it seems Pomf.se has to shut down soon due to lack of money and CDN issues
[19:48] It's like what happened to MediaCrush, which was archived
[19:48] http://archiveteam.org/index.php?title=MediaCrush
[19:48] neku: split up the 4tb into zips or something
[19:48] How big is this, approximately?
[19:48] 1.6 million files, 4 TB
[19:48] there are tools that can do that, like wpull and wget. The most basic method is to prepare a list of urls and feed it to a warc aware tool.
[19:48] neku: what's the total size? 4TB (assuming raid1)
[19:49] oh, there we go
[19:49] too big for archivebot, possibly a warrior project? (cc arkiver)
[19:49] I guess neku can make an index file from the database's filenames
[19:49] it took us 8 mins to get to this point
[19:50] apparently, I'm a autist neckbeard
[19:50] WubTheCap, could work I guess
[19:50] *** McGEE has quit IRC (Quit: Connection closed for inactivity)
[19:51] There's some 404s in that database though, because of removed malware files and keeping a database record to prevent uploading them again
[19:51] those can be recorded in the warc too.
[19:51] But, would it still be a better idea to contact info@archive.org for this?
[19:52] aaaaaaaaa: +1
[19:52] (autist neckbeard)
[19:52] *** mistym has quit IRC (Remote host closed the connection)
[19:53] WubTheCap: are you concerned about the space?
[19:53] aaaaaaaaa: It was SketchCow's recommendation yesterday
[19:53] Well, content should be warced because original URLs should be available through wayback machine
[19:53] Yeah that's true
[19:54] Also I don't know why but Wayback Machine doesn't like manually inputted a.pomf.se URLs
[19:54] ArchiveTeam uses 50GB warc pieces, that would result in 80 files. Neither size nor num of files is extremely large I think. However, SketchCow was the one who suggested contacting IA.
[19:54] WARCs are a different thing though
[19:54] I think the concern may be the cost. I think 4 TB is $8000 in costs for them
[19:54] WubTheCap: http://a.pomf.se/robots.txt
[19:54] that's why
[19:54] I have no idea about file consensus on Pomf.se, but I assume most of them are under 2 MB screenshots or maybe under 10 MB WebM videos
[19:55] Kazzy: ia_archiver though, also it used to allow everything
[19:55] aaaaaaaaa: This is archiveteam. I don't remember IA ever rejecting 4 TB.
[19:56] I don't even think IA counts the terabytes we upload. It's "nothing" on this scale.
[19:56] 4tb is awkwardly large to put in a single item
[19:56] MediaCrush was split into some 60 chunks I think
[19:56] an IA item, afaik, lives as one thing on a computer
[19:56] I thought it should be done like AT does: 50GB warcs per item
[19:56] then go to a collection
[19:56] so 4TB would kind of monopolize a disk
[19:56] e.g. https://archive.org/details/mediacrush_coldstorage_part_1
[19:57] so split it up for making IA's life easier
[19:57] I think we should make this a warrior project
[19:57] 4TB is fine
[19:58] If WubTheCap, neku don't have the space to do the scraping alone, it's obviously a Warrior project
[19:58] bzc6p: There's 8 TB storage on Pomf's colocated server
[19:59] 4 TB in use
[19:59] I'd rather like it to be a warrior project than a .tar packup project
[19:59] if they do, we can give clear instructions on how to do it
[19:59] ok
[19:59] 4x4 TB actually
[19:59] how?
[19:59] 2x4 TB RAID1 | 2x4 TB RAID1
[19:59] ok
[20:00] neku: are you able to provide us with a list of all files?
[20:00] we'll take care of the rest then
[20:00] Sorry for interrupting gentlemen, but I think it's time to go to a new channel
[20:00] arkiver: Uploads are still enabled though, do you want to wait for those to be disabled?
[20:00] suggestion #pomfret
[20:00] a moving target is harder to hit, WubTheCap
[20:00] Yes, we should do that
[20:01] waiting until I disable uploading would probably be good
[20:01] neku: ok, we'll do that
[20:01] can you then provide us with a list of the files?
[20:02] (after uploading is disabled)
[20:02] At that point I will provide a list of all files
[20:03] suggesting movement to #pomfret, to keep this channel clean ^^
[20:03] neku: ok, thank you!
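A minimal sketch of the plan discussed above: take the file list from the database, turn it into URLs, and group those into pieces of roughly 50 GB so each piece can be fetched into its own WARC. It is only an illustration under stated assumptions. The URL layout (files served directly under http://a.pomf.se/), the (filename, size) record shape, and the function names are hypothetical and are not confirmed anywhere in the log.

```python
from urllib.parse import quote

# Assumption: every stored file is reachable directly under this host.
BASE_URL = "http://a.pomf.se/"
# 50 GB per WARC piece, the figure mentioned in the discussion.
BATCH_BYTES = 50 * 10**9


def build_url(filename):
    """Build the public URL for one stored filename (hypothetical layout)."""
    return BASE_URL + quote(filename)


def plan_batches(records, batch_bytes=BATCH_BYTES):
    """Group (filename, size_in_bytes) records into URL batches.

    A batch is closed as soon as adding the next file would push it past
    batch_bytes. A single file larger than batch_bytes gets its own batch.
    """
    batches, current, current_size = [], [], 0
    for filename, size in records:
        if current and current_size + size > batch_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(build_url(filename))
        current_size += size
    if current:
        batches.append(current)
    return batches


def estimated_pieces(total_bytes, batch_bytes=BATCH_BYTES):
    """Ceiling division: how many pieces a given total size needs."""
    return -(-total_bytes // batch_bytes)
```

With 4 TB of data and 50 GB pieces, estimated_pieces(4 * 10**12) gives the 80 pieces quoted in the log. Each batch could then be written out as a plain text file of URLs and handed to a WARC-aware tool such as wget or wpull. Database rows for removed files can stay in the list, since the resulting 404 responses can be recorded in the WARC as well.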
[20:04] Right
[20:13] *** mistym has joined #archiveteam
[20:13] *** deafnet has left
[20:26] *** Jonimus has joined #archiveteam
[20:37] *** philpem has quit IRC (Remote host closed the connection)
[20:38] *** rolfb has joined #archiveteam
[20:48] *** mistym has quit IRC (Remote host closed the connection)
[20:48] *** rolfb has quit IRC (Leaving...)
[20:56] *** rolfb has joined #archiveteam
[20:57] *** rolfb has quit IRC (Linkinus - http://linkinus.com)
[21:06] *** mistym has joined #archiveteam
[21:25] http://archiveteam.org/index.php?title=MediaCrush could use a link to the archive
[22:07] *** sirdancea has quit IRC (Read error: Operation timed out)
[22:12] *** SimpBrain has quit IRC (Quit: Leaving)
[23:24] *** neku has quit IRC (Quit: Leaving)
[23:25] so now that apple music's been introduced, looks like beats music doesn't have much longer to live
[23:25] *** Ymgve has quit IRC ()
[23:26] http://www.beatsmusic.com/robots.txt
[23:26] heh
[23:27] most of the content appears to be on on.beatsmusic.com
[23:27] https://encrypted.google.com/search?q=site%3Aon.beatsmusic.com
[23:48] *** primus104 has joined #archiveteam