[00:04] *** VADemon_ has quit IRC (left4dead)
[00:15] *** dxrt_ is now known as dxrt
[01:24] *** antonizoo has quit IRC (Read error: Connection reset by peer)
[01:24] *** antonizoo has joined #archiveteam-bs
[01:41] *** godane1 has quit IRC (Quit: Leaving.)
[02:04] *** w0rp_ has joined #archiveteam-bs
[02:04] *** w0rp has quit IRC (Read error: Connection reset by peer)
[02:04] *** w0rp_ is now known as w0rp
[02:49] *** nightpool has joined #archiveteam-bs
[03:17] ffs https://i.imgur.com/GYpCSXa.png
[03:24] nice url
[03:26] trying to choose a fs for newsfroups. suggestions? many small files, unless i configure the newsserver to clump them
[03:26] workload: basically, a text nntp server with retention. like gmane.
[03:41] supposedly reiserfs/reiser4 is pretty good at small files
[03:41] yea but
[03:41] but those run the unsupported risk
[03:41] isn't it, like, unmaintained
[03:41] reiser4 isn't
[03:41] i remember using it and liking it
[03:41] orly
[03:42] oh, not mainline
[03:42] bleh
[03:42] right
[03:42] I guess, like, do you have prior evidence that ext4 is gonna be a chokepoint
[03:44] ext4 claims a max of 4 billion files in the fs. i have like 200k already, from brief testing
[03:44] 4 billion messages is not a theoretical production ceiling for me
[03:44] so my options right now are:
[03:45] - ext4, have newsserver do clumping
[03:45] - btrfs, file-per-message
[03:46] newsserver clumping is to be avoided because it is old code, but most of its production usage is for workloads where dropping everything on the floor is ok if you don't do it often
[03:46] ah
[03:46] btrfs is also not very productiony?
[03:46] yeah
[03:46] well, I dunno
[03:47] I've been told it is but I am loathe to give up my zfs pools
[03:47] zfs might be an option if you're open to e.g. FreeBSD or Solaris
[03:47] i'm normally a big fan of jfs but linux doesn't support using it with anything other than 4k blocks
[03:47] which is going to give me, like, 40% space utilization tops
[03:48] ugh
[03:49] i may as well just go with btrfs i guess?
[03:50] I guess, I don't know much about i
[03:50] t
[03:50] someone else here might
[03:50] i will be shipping hourly diffs off-box, so stability isn't a *huge* problem, but i'd like to not have to deal with that ever
[03:52] eh, btrfs + file-per-message is probably the current winner here
[04:28] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[04:34] *** Sk1d has joined #archiveteam-bs
[04:52] *** RichardG_ has joined #archiveteam-bs
[04:53] *** RichardG has quit IRC (Ping timeout: 255 seconds)
[05:29] *** nightpool has quit IRC (what the water wants is hurricanes)
[05:35] *** RichardG_ has quit IRC (Read error: Connection reset by peer)
[05:48] *** Aranje has quit IRC (Quit: Three sheets to the wind)
[06:54] *** GE has joined #archiveteam-bs
[07:07] *** brayden has joined #archiveteam-bs
[07:14] *** GE has quit IRC (Remote host closed the connection)
[07:18] *** PurpleSym sets mode: +o swebb
[07:18] *** swebb sets mode: +o brayden
[07:18] *** swebb sets mode: +o xmc
[08:06] *** ravetcofx has quit IRC (Read error: Operation timed out)
[09:17] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[09:24] *** BartoCH has joined #archiveteam-bs
[09:46] *** hawc145 is now known as HCross
[10:30] *** BlueMaxim has quit IRC (Quit: Leaving)
[10:37] *** godane has joined #archiveteam-bs
[10:44] *** Fletcher has quit IRC (Ping timeout: 244 seconds)
[11:15] *** Fletcher has joined #archiveteam-bs
[11:38] *** VADemon has joined #archiveteam-bs
[11:43] *** Smiley has quit IRC (Read error: Operation timed out)
[11:45] *** Smiley has joined #archiveteam-bs
[12:36] *** Kksmkrn has quit IRC (Ping timeout: 250 seconds)
[13:32] *** kristian_ has joined #archiveteam-bs
[13:55] *** nightpool has joined #archiveteam-bs
[14:12] *** kristian_ has quit IRC (Read error: Operation timed out)
[14:29] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[14:32] *** BartoCH has joined #archiveteam-bs
[14:57] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[15:11] *** Petri152 has quit IRC (Read error: Operation timed out)
[15:14] *** BartoCH has joined #archiveteam-bs
[15:29] *** VADemon has quit IRC (Quit: left4dead)
[15:31] *** dashcloud has quit IRC (Ping timeout: 244 seconds)
[16:16] *** Petri152 has joined #archiveteam-bs
[16:28] *** Simpbrain has joined #archiveteam-bs
[17:00] *** Swizzle has joined #archiveteam-bs
[17:07] *** GE has joined #archiveteam-bs
[17:11] *** dashcloud has joined #archiveteam-bs
[17:32] *** Swizzle has quit IRC (Read error: Operation timed out)
[18:23] https://github.com/blog/2267-introducing-github-community-guidelines
[18:23] "What behavior is not allowed on GitHub - the community will not tolerate threats of violence, hate speech, bullying, harassment, impersonation, invasions of privacy, sexually explicit content, or active malware."
[18:24] looks like github is going to baleet content from projects that they don't like, regardless of the maintainers' wishes
[18:52] Well, that's how rules work
[18:53] Get out of the pool, regardless of your current pissing-in-it position, etc
[18:53] *** Simpbrain has quit IRC (Read error: Connection reset by peer)
[18:56] *** Simpbrain has joined #archiveteam-bs
[18:56] all depends how they go about enforcing it
[19:07] i was just thinking
[19:07] so if WARC isn't the best for utterly massive sites
[19:07] what would be best for, say, newegg?
[19:08] WARC is fine
[19:08] would be nice to just download their images > 50K or something
[19:08] who told you that WARC can't handle that
[19:08] their hardware shots are pretty good
[19:09] i thought WARC wasn't good above 2-5m pages?
[19:09] uh no?
[19:09] ArchiveBot isn't WARC
[19:09] ah
[19:10] the limit on ArchiveBot has to do with its current pipeline model, it has nothing to do with the file format
[19:10] i don't have the balls to suggest a project as big as newegg, tho @_@
[19:10] someone chimed in in a convo that they used to work for newegg
[19:10] I don't see the point unless you think Newegg is somehow going under
[19:11] i noted how their customer experience had gone down, but their item images were the best out there (for computer hardware)
[19:11] well, they're now supposedly controlled/owned by a chinese company majority?
[19:11] maybe like 51% stock ownership
[19:12] https://www.google.com/search?q=newegg+chinese+owned
[19:12] I got news for you
[19:12] the CEO is Chinese
[19:12] has been for a while
[19:12] but ownership
[19:13] control etc
[19:13] CEO is/was chinese-american? or chinese? i thought the former
[19:14] I mean, ok, so
[19:14] what
[19:15] are they suddenly going to start selling more stuff made in China
[19:15] you know, like every bit of electronics is
[19:18] i just have no idea what the end result of owners being outside the US would be for newegg's service
[19:20] *** ravetcofx has joined #archiveteam-bs
[19:21] *** Swizzle has joined #archiveteam-bs
[19:28] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[19:38] *** kristian_ has joined #archiveteam-bs
[19:45] *** BartoCH has joined #archiveteam-bs
[20:16] *** Swizzle has quit IRC (Read error: Operation timed out)
[21:47] *** dashcloud has quit IRC (Read error: Operation timed out)
[22:02] *** Swizzle has joined #archiveteam-bs
[22:12] i'm starting to upload 2004 nasa docs
[22:34] *** nightpool has quit IRC (what the water wants is hurricanes)
[22:45] the more I work on this CD-ROM parsing stuff
[22:45] the more of a miracle it seems that any CD-ROM ever worked on any system at all, ever
[22:45] every goddamn mastering tool seems to violate spec somehow, and all of them in a subtly different way
[23:05] *** Swizzle has quit IRC (Read error: Operation timed out)
[23:08] *** GE has quit IRC (Remote host closed the connection)
[23:12] *** BlueMaxim has joined #archiveteam-bs
[23:13] *** Arkiver2 is now known as arkiver
[23:29] *** ndiddy has joined #archiveteam-bs