[00:07] looks like there were a lot of double issues among the infoworld 2003 issues
[00:08] *** sigkell has joined #archiveteam-bs
[00:08] *** sigkell has quit IRC (Connection closed)
[00:08] *** sigkell has joined #archiveteam-bs
[00:11] the 2003-03-24 issue has 2003-03-31 in it
[00:11] same thing happened with 2003-04-07 having 2003-04-14 in it
[00:22] looks like i got all of the infoworld 2003 issues
[00:22] there are some pages missing in 2003-03-03 and 2003-03-10
[00:47] *** REiN^ has quit IRC (Read error: Operation timed out)
[01:01] *** REiN^ has joined #archiveteam-bs
[01:02] *** spiko has quit IRC (Read error: Operation timed out)
[01:20] *** pizzaiolo has left
[01:37] I'm finishing up archiving fanfiction.net, ao3, and now fictionpress (cause it's way smaller than i thought it was), and i'm trying to make an sql db of the metadata for easy indexing of the millions of stories. I have these: https://uploadfiles.io/4dc111 and i'd like them munged into one giant csv like this: http://paste.ubuntu.com/24037559/ ... the stupid thing is, i can't get the script i have to scan a whole directory structure recursively. here's what i have so
[02:27] *** vitzli has joined #archiveteam-bs
[02:49] *** vitzli has quit IRC (Quit: Leaving)
[03:04] *** icedice has quit IRC (Quit: Leaving)
[03:49] *** BlueMaxim has joined #archiveteam-bs
[04:29] *** Asparag-1 has joined #archiveteam-bs
[04:31] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[04:34] *** Asparagir has quit IRC (Read error: Operation timed out)
[05:18] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[05:21] *** Asparag-1 has quit IRC (Read error: Operation timed out)
[05:25] *** Sk1d has joined #archiveteam-bs
[05:33] *** dashcloud has quit IRC (Read error: Operation timed out)
[05:37] *** dashcloud has joined #archiveteam-bs
[05:39] *** ndizzle has quit IRC (Read error: Connection reset by peer)
[05:46] *** wm_ has quit IRC (Ping timeout: 260 seconds)
[05:50] *** Aoede has quit IRC (Ping timeout: 260 seconds)
[05:51] *** Aoede has joined #archiveteam-bs
[05:52] *** wm_ has joined #archiveteam-bs
[06:12] *** vitzli has joined #archiveteam-bs
[06:23] *** vitzli has quit IRC (Quit: Leaving)
[06:31] *** BlueMaxim has joined #archiveteam-bs
[06:40] Just in case people here haven't seen: SketchCow had a heart attack, is in the hospital in Melbourne, got a stent, and appears to be doing ok. https://twitter.com/textfiles/status/833928137243176960
[06:42] O_o
[06:48] "Slight issue"
[06:50] I hope he will be ok; otherwise archiving would be seriously impaired. And we would lose a nice guy, from what I saw in his talks.
[06:59] *** Aranje has quit IRC (Quit: Three sheets to the wind)
[07:27] *** anonymoos has quit IRC ()
[07:36] oh damn
[07:38] geez
[07:45] *raises glass* To SketchCow!
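For the 01:37 request (scan a whole directory tree recursively and munge the metadata files into one giant CSV), here is a minimal Python sketch. The uploaded files and the target layout in the paste aren't visible here, so it assumes each metadata file is a CSV with a header row, found by a `*.csv` glob; `merge_csv_tree` and its parameters are hypothetical names. If the files are in another format, only the reading part would need to change.

```python
import csv
from pathlib import Path


def merge_csv_tree(root, out_path, pattern="*.csv"):
    """Recursively find files under `root` matching `pattern` and write all
    their rows into one CSV at `out_path`.

    Assumption: every input file is a CSV with a header row. The output
    columns are the union of all headers (first-seen order); values missing
    from a given file are left blank. Returns the number of data rows written.
    """
    root = Path(root)
    out_path = Path(out_path).resolve()
    # Path.rglob walks the whole directory tree, not just the top level.
    # Skip the output file in case it lives inside `root`.
    files = sorted(p for p in root.rglob(pattern)
                   if p.is_file() and p.resolve() != out_path)

    # First pass: collect the union of column names.
    fieldnames = []
    for path in files:
        with path.open(newline="", encoding="utf-8") as f:
            header = next(csv.reader(f), [])
        for name in header:
            if name not in fieldnames:
                fieldnames.append(name)

    # Second pass: stream every row into the combined file.
    rows_written = 0
    with out_path.open("w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames,
                                restval="", extrasaction="ignore")
        writer.writeheader()
        for path in files:
            with path.open(newline="", encoding="utf-8") as f:
                for row in csv.DictReader(f):
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Usage would be something like `merge_csv_tree("metadata/", "all_stories.csv")`; the two-pass design keeps memory flat even with millions of stories, since rows are streamed rather than loaded at once.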
[07:45] feel better dude
[08:05] *** odemg has quit IRC (Remote host closed the connection)
[08:33] *** zhongfu_ has joined #archiveteam-bs
[08:33] *** zhongfu has quit IRC (Ping timeout: 260 seconds)
[08:59] *** GE has joined #archiveteam-bs
[09:09] *** zhongfu_ has quit IRC (Ping timeout: 260 seconds)
[09:10] *** zhongfu has joined #archiveteam-bs
[09:31] *** tapedrive has joined #archiveteam-bs
[09:32] *** whydomain has joined #archiveteam-bs
[10:05] *** spiko has joined #archiveteam-bs
[10:19] Damn... Get well soon SketchCow!
[10:23] bsmith093: I scraped literotica.com a while back; I can send the files if you're interested
[10:27] *** GE has quit IRC (Remote host closed the connection)
[11:04] *** LastNinj_ has joined #archiveteam-bs
[11:06] *** LastNinja has quit IRC (Ping timeout: 255 seconds)
[11:25] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[11:45] I don't have wiki front page access. Someone with special powers, please move the imdb project to "recently finished".
[12:08] *** ZexaronS has quit IRC (Leaving)
[12:09] *** GE has joined #archiveteam-bs
[13:12] *** JensRex has quit IRC (Remote host closed the connection)
[13:12] *** JensRex has joined #archiveteam-bs
[13:14] *** GE has quit IRC (Remote host closed the connection)
[13:26] *** JensRex has quit IRC (Remote host closed the connection)
[13:26] *** JensRex has joined #archiveteam-bs
[13:43] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[14:56] *** GE has joined #archiveteam-bs
[15:18] JensRex: We're not quite done with IMDB yet. We're still planning to scrape all their non-discussion information
[15:40] isn't that available for download as flat files?
[15:40] Some but not all, e.g. user reviews
[15:41] Also, none of the flat files have IMDB's ID numbers, so they're useless if you're trying to make a mirror of the website
[15:41] (Which is almost certainly by design)
[15:41] might as well i guess
[15:42] might be easier now that other people aren't mirroring it
[15:56] *** BartoCH has joined #archiveteam-bs
[16:01] *** pizzaiolo has joined #archiveteam-bs
[16:02] *** bwn has quit IRC (Ping timeout: 244 seconds)
[16:12] *** bwn has joined #archiveteam-bs
[16:18] *** Nemo_bis has left
[16:31] i'm starting to upload metro korea seoul edition pdfs: https://archive.org/details/metro_korea_seoul_20140102
[16:31] this will help us get issues past 2015-05-20
[16:32] since issuu doesn't have every issue
[16:37] *** schbirid has joined #archiveteam-bs
[16:48] *** vitzli has joined #archiveteam-bs
[16:55] *** vitzli has quit IRC (Quit: Leaving)
[17:00] MrRadar: Oh right.
[17:08] *** Aranje has joined #archiveteam-bs
[17:15] *** pizzaiolo has quit IRC (Ping timeout: 246 seconds)
[17:15] Is there a way to get a list of domain names that are going to expire soon but haven't yet? Like ones that are still technically operational and can still be accessed.
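WHOIS can't enumerate soon-to-expire domains on its own, but for a candidate list you already have, it can report each domain's expiry date. The protocol (RFC 3912) is just: open TCP port 43, send the query followed by CRLF, and read until the server closes the connection. A minimal sketch, assuming the Verisign registry server `whois.verisign-grs.com` (which answers for .com/.net) and its `Registry Expiry Date:` field in ISO 8601 form; other TLDs use other servers and field names, and registries rate-limit bulk lookups. `whois_query`, `parse_expiry` and `expires_within` are hypothetical helper names.

```python
import socket
from datetime import datetime, timezone


def whois_query(domain, server="whois.verisign-grs.com", port=43, timeout=10.0):
    """Send one WHOIS query (RFC 3912) for an ASCII/punycode domain; return raw text."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_expiry(whois_text, field="Registry Expiry Date:"):
    """Return the expiry as an aware UTC datetime, or None if absent/unparseable.

    Assumes a value like 2030-08-13T04:00:00Z; adjust `field` and the format
    string for registries that answer differently.
    """
    for line in whois_text.splitlines():
        line = line.strip()
        if line.startswith(field):
            value = line[len(field):].strip()
            try:
                dt = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
            except ValueError:
                return None
            return dt.replace(tzinfo=timezone.utc)
    return None


def expires_within(whois_text, days, now=None):
    """True if the parsed expiry falls between `now` and `now + days`."""
    expiry = parse_expiry(whois_text)
    if expiry is None:
        return False
    now = now or datetime.now(timezone.utc)
    return 0 <= (expiry - now).total_seconds() <= days * 86400
```

A typical call would be `expires_within(whois_query("example.com"), 30)`, looped over a list of domains with a pause between queries.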
[17:16] hook54321: I think there's a way to scrape whois records for it
[17:17] I know that domain parkers will find domains that have been previously registered but are now unregistered to find domains to park
[17:17] (this happened to an old domain of mine I was no longer using; it wasn't publicly linked, so there would be no way to find it except whois)
[17:17] I don't know how to get access to those whois databases though
[17:22] nightpool, google found this: http://stackoverflow.com/questions/307553/possible-to-download-entire-whois-database-list-of-registered-domains
[17:30] *** pizzaiolo has joined #archiveteam-bs
[17:32] *** pizzaiolo has quit IRC (Remote host closed the connection)
[17:34] *** pizzaiolo has joined #archiveteam-bs
[18:22] *** VADemon has joined #archiveteam-bs
[18:27] *** pizzaiolo has quit IRC (Remote host closed the connection)
[18:27] I should probably have let some of my domains expire. I have at least one 16-year-old domain I've never done anything with.
[18:28] But fuckup.dk is the only fuckup.* domain that isn't a pornsite.
[18:28] Maybe I should use it for business email :D
[18:29] heh
[18:29] same here... if anyone wants ganja.is, do tell
[18:29] no fee or anything, just pay for the domain instead of me ;)
[18:30] and for the lurkers in the back, that is obviously fake whois info
[18:30] ae_g_i_s: hmmmmm
[18:31] but i'm off for the night, ping me if needed... hope jason makes a speedy and full recovery in the meantime
[18:36] SpaffGarg: There could definitely be some improvements made in how the tracker hands out jobs.
[18:37] Like don't feed hundreds or thousands of jobs to people who never return any.
[18:37] Automatically return jobs to the pool after a timeout.
[18:37] A lot of time was wasted in the imdb project duplicating work and waiting for requeued jobs.
[18:38] Maybe even an ability to force concurrency N for clients, to avoid bans.
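The tracker ideas at 18:37-18:38 (don't hand endless jobs to clients that never return any, put jobs back in the pool after a timeout, force a per-client concurrency cap), plus the 18:43 suggestion that clients check in periodically and hand back failed jobs, together describe a lease-based work queue. Below is a minimal in-memory Python sketch of that idea. It is not how the existing tracker is implemented; `LeasePool`, its methods, and the default timeout and cap are all hypothetical.

```python
import time
from collections import deque


class LeasePool:
    """Job pool where claimed jobs are leased to a client, not handed out forever.

    - A client may hold at most `max_per_client` jobs at once (forced concurrency).
    - A lease expires `lease_seconds` after the last claim/heartbeat; expired
      jobs go back into the queue, so a silent client can't sit on work.
    """

    def __init__(self, jobs, lease_seconds=600.0, max_per_client=2, clock=time.monotonic):
        self.pending = deque(jobs)
        self.leases = {}  # job -> (client, deadline)
        self.done = set()
        self.lease_seconds = lease_seconds
        self.max_per_client = max_per_client
        self.clock = clock

    def _reap(self):
        """Requeue every job whose lease deadline has passed."""
        now = self.clock()
        expired = [job for job, (_, deadline) in self.leases.items() if deadline <= now]
        for job in expired:
            del self.leases[job]
            self.pending.appendleft(job)  # retry expired work before fresh work

    def _holds(self, client, job):
        lease = self.leases.get(job)
        return lease is not None and lease[0] == client

    def claim(self, client):
        """Return a job for `client`, or None if it is at its cap or the queue is empty."""
        self._reap()
        held = sum(1 for c, _ in self.leases.values() if c == client)
        if held >= self.max_per_client or not self.pending:
            return None
        job = self.pending.popleft()
        self.leases[job] = (client, self.clock() + self.lease_seconds)
        return job

    def heartbeat(self, client, job):
        """Client checks in: extend its lease if it still holds `job`."""
        self._reap()
        if not self._holds(client, job):
            return False
        self.leases[job] = (client, self.clock() + self.lease_seconds)
        return True

    def complete(self, client, job):
        """Mark `job` done if `client` still holds a live lease on it."""
        self._reap()
        if not self._holds(client, job):
            return False
        del self.leases[job]
        self.done.add(job)
        return True

    def fail(self, client, job):
        """Client reports a failure: release the job back to the queue immediately."""
        self._reap()
        if not self._holds(client, job):
            return False
        del self.leases[job]
        self.pending.append(job)
        return True
```

Injecting `clock` keeps the timeout logic testable without sleeping; a real tracker would also need persistence and would likely track per-client success rates to stop feeding clients that never return work.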
[18:39] *** fusl has joined #archiveteam-bs
[18:40] yeah i feel forced concurrency would help a lot
[18:42] *** Jonison has joined #archiveteam-bs
[18:43] it needs to hand back jobs that have failed in the client as well; maybe make the client check in every now and then to confirm it's still doing a job
[18:45] also i have no idea how to implement this because i can't really code
[18:45] *** pizzaiolo has joined #archiveteam-bs
[18:47] My web coding experience is 15 years out of date.
[18:54] *** dvd has joined #archiveteam-bs
[18:59] *** dvd has quit IRC (Konversation terminated!)
[19:13] *** pizzaiolo has quit IRC (Ping timeout: 250 seconds)
[19:16] *** schbirid has quit IRC (Read error: Operation timed out)
[19:21] *** RichardG has quit IRC (Ping timeout: 245 seconds)
[19:21] *** RichardG has joined #archiveteam-bs
[19:24] Please kill me. I'm now engaged in a long discussion with Backblaze about base 10 vs base 2 units, after my clarifying help ticket yesterday.
[19:25] I'm not shitting on them for using base 10. They just don't mention it anywhere.
[19:40] *** odemg has joined #archiveteam-bs
[19:51] Just heard back from Scaleway on my issue with corrupted data. Unsurprisingly, it turned out that the VM host was having hardware issues, so they just had me transfer my VMs to a different host.
[19:55] *** odemg has quit IRC (Remote host closed the connection)
[19:56] *** odemg has joined #archiveteam-bs
[20:04] MrRadar: That sounds unprofessional.
[20:09] *** schbirid has joined #archiveteam-bs
[20:09] any suggestions for network saturation monitoring on linux that lets me zoom in to per-second data as well as overviews?
[20:10] i run vnstat but that is only for aggregate statistics
[20:10] need to provide isp support with details of their terrible quality...
[20:16] *** VADemon has quit IRC (Quit: left4dead)
[20:17] schbirid: Will nload do what you need? Otherwise you could probably write a script to log that data from the /sys or /proc entries for your NICs
[20:17] thanks! i use speedometer sometimes, which seems similar
[20:18] i need something that lets me investigate "later" though
[20:18] my plan is to saturate my connection 24/7 and then look at the bigger picture and be able to zoom in
[20:18] Hmm... I don't think nload itself supports logging, just displaying the current conditions (with a rolling graph)
[20:18] i could just log byte counts myself but then i would just have ugly data :)
[20:20] maybe i should just do that and throw it into dc.js
[20:20] http://square.github.io/crossfilter/ <3
[20:26] yeah, doing that
[21:04] schbirid: mtr?
[21:04] Not exactly saturation monitoring, but it's useful in debugging network problems.
[21:05] For finding out who to yell at.
[21:07] *** pizzaiolo has joined #archiveteam-bs
[21:09] https://www.neowin.net/news/verizon-to-proceed-with-yahoo-acquistion-albeit-at-a-discount-of-350-million
[21:09] This dumpster keeps on burning!
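On the 19:24 Backblaze units question: decimal (SI) units use powers of 1000 (1 TB = 10^12 bytes), while binary (IEC) units use powers of 1024 (1 TiB = 2^40 bytes), so the same byte count reads about 9% smaller in TiB than in TB. A small Python sketch of the conversion; the `convert` helper and the 8 TB figure are just for illustration.

```python
# Decimal (SI) vs binary (IEC) byte multiples.
SI = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
IEC = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def convert(amount, from_unit, to_unit):
    """Convert between SI and IEC byte units by going through an exact byte count."""
    factors = {**SI, **IEC}
    return amount * factors[from_unit] / factors[to_unit]


# 8 TB in decimal units, expressed in the binary units many tools display:
print(f"{convert(8, 'TB', 'TiB'):.2f} TiB")  # prints 7.28 TiB
```

The same arithmetic explains why 1 TB shows up as roughly 931 GiB in tools that report binary units while labelling them "GB".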
[21:26] JensRex: nah, i really need to just show the bad bandwidth
[21:26] it goes on and off
[21:34] sch brother
[21:37] *** Jonison has quit IRC (Read error: Connection reset by peer)
[21:41] *** odemg has quit IRC (Remote host closed the connection)
[21:48] *** BlueMaxim has joined #archiveteam-bs
[22:28] schbirid: I run observium and have it poll every minute; I'm sure with some tweaking you could poll every 5-10s or so
[22:32] *** odemg has joined #archiveteam-bs
[22:32] *** odemg has quit IRC (Connection closed)
[22:32] *** odemg has joined #archiveteam-bs
[22:38] *** ndiddy has joined #archiveteam-bs
[22:39] *** pikhq has quit IRC (Ping timeout: 244 seconds)
[22:46] *** pikhq has joined #archiveteam-bs
[23:00] *** GE has quit IRC (Remote host closed the connection)
[23:11] *** odemg has quit IRC (Remote host closed the connection)
[23:19] JensRex: if you can rate the scariness of the codebase of that controller, as well as its programming language, I can tell you if I could fix it like next week or so (depending on some irl scheduling that is not clear yet, nothing major).
[23:32] *** zino has quit IRC (Remote host closed the connection)
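Following the 20:17 suggestion to log counters from /sys and schbirid's plan to log byte counts and explore them later with dc.js/crossfilter, here is a minimal Python sketch. It samples the cumulative `rx_bytes`/`tx_bytes` counters under `/sys/class/net/<iface>/statistics/` once per interval and appends per-second rates to a CSV. It assumes Linux sysfs; the interface name, output path and interval are placeholders, and `read_counters`, `rates` and `log_rates` are hypothetical names.

```python
import csv
import time
from pathlib import Path


def read_counters(iface, sysfs_root="/sys/class/net"):
    """Return the cumulative (rx_bytes, tx_bytes) counters for `iface` from sysfs."""
    stats = Path(sysfs_root) / iface / "statistics"
    rx = int((stats / "rx_bytes").read_text())
    tx = int((stats / "tx_bytes").read_text())
    return rx, tx


def rates(prev, cur, elapsed):
    """Bytes/second between two (rx, tx) samples taken `elapsed` seconds apart.

    A negative delta means the counter was reset (e.g. the interface was
    re-created), so it is clamped to 0 instead of logging a bogus spike.
    """
    return tuple(max(0, c - p) / elapsed for p, c in zip(prev, cur))


def log_rates(iface, out_path, interval=1.0, samples=None, sysfs_root="/sys/class/net"):
    """Append `unix_time,rx_Bps,tx_Bps` rows to `out_path` every `interval` seconds.

    Runs forever when `samples` is None; otherwise stops after `samples` rows.
    Returns the number of rows written.
    """
    prev, prev_t = read_counters(iface, sysfs_root), time.monotonic()
    written = 0
    with open(out_path, "a", newline="") as f:
        writer = csv.writer(f)
        while samples is None or written < samples:
            time.sleep(interval)
            cur, cur_t = read_counters(iface, sysfs_root), time.monotonic()
            rx_bps, tx_bps = rates(prev, cur, cur_t - prev_t)
            writer.writerow([f"{time.time():.3f}", f"{rx_bps:.0f}", f"{tx_bps:.0f}"])
            f.flush()  # keep the log usable even if the script is killed
            prev, prev_t = cur, cur_t
            written += 1
    return written
```

Left running as `log_rates("eth0", "bandwidth.csv")`, this produces one row per second that could later be loaded (for example with `d3.csv`) into a dc.js/crossfilter view for zooming from overviews down to individual seconds.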