#archiveteam-bs 2017-02-21,Tue

Time Nickname Message
00:07 🔗 godane looks like there were a lot of double issues with infoworld 2003 issues
00:08 🔗 sigkell has joined #archiveteam-bs
00:08 🔗 sigkell has quit IRC (Connection closed)
00:08 🔗 sigkell has joined #archiveteam-bs
00:11 🔗 godane as in, 2003-03-24 has 2003-03-31 in it
00:11 🔗 godane same thing happened with 2003-04-07 having 2003-04-14 in it
00:22 🔗 godane looks like i got all of infoworld 2003 issues
00:22 🔗 godane there are some pages missing in 2003-03-03 and 2003-03-10
00:47 🔗 REiN^ has quit IRC (Read error: Operation timed out)
01:01 🔗 REiN^ has joined #archiveteam-bs
01:02 🔗 spiko has quit IRC (Read error: Operation timed out)
01:20 🔗 pizzaiolo has left
01:37 🔗 bsmith093 I'm finishing up archiving fanfiction.net ao3 and now fictionpress (cause it's way smaller than i thought it was) and i'm trying to make a sql db of the metadata for easy indexing of the millions of stories. https://uploadfiles.io/4dc111 I have these, i'd like munged into one giant csv like this http://paste.ubuntu.com/24037559/ ... the stupid thing is, i cant get the script i have to scan a whole directory structure recursively. here's what i have so
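The recursive scan bsmith093 describes could be sketched roughly as follows. This is not the script from the log (which is cut off above); it only shows one way to visit every file under a directory tree with `os.walk` and write one CSV row per file. The function names, the `.txt` suffix, and the two output columns are assumptions made for illustration; parsing the real metadata files would replace the size lookup.

```python
import csv
import os


def iter_files(root, suffix=".txt"):
    """Yield every file path under root (at any depth) that ends in suffix."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(suffix):
                yield os.path.join(dirpath, name)


def write_index(root, out_csv, suffix=".txt"):
    """Write one CSV row per matching file and return the number of rows."""
    count = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        # Hypothetical columns; a real index would hold parsed story metadata.
        writer.writerow(["relative_path", "size_bytes"])
        for path in iter_files(root, suffix):
            writer.writerow([os.path.relpath(path, root), os.path.getsize(path)])
            count += 1
    return count
```

Calling `write_index("stories", "index.csv")` would walk the (hypothetical) `stories` directory and all of its subdirectories in one pass.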
02:27 🔗 vitzli has joined #archiveteam-bs
02:49 🔗 vitzli has quit IRC (Quit: Leaving)
03:04 🔗 icedice has quit IRC (Quit: Leaving)
03:49 🔗 BlueMaxim has joined #archiveteam-bs
04:29 🔗 Asparag-1 has joined #archiveteam-bs
04:31 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
04:34 🔗 Asparagir has quit IRC (Read error: Operation timed out)
05:18 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
05:21 🔗 Asparag-1 has quit IRC (Read error: Operation timed out)
05:25 🔗 Sk1d has joined #archiveteam-bs
05:33 🔗 dashcloud has quit IRC (Read error: Operation timed out)
05:37 🔗 dashcloud has joined #archiveteam-bs
05:39 🔗 ndizzle has quit IRC (Read error: Connection reset by peer)
05:46 🔗 wm_ has quit IRC (Ping timeout: 260 seconds)
05:50 🔗 Aoede has quit IRC (Ping timeout: 260 seconds)
05:51 🔗 Aoede has joined #archiveteam-bs
05:52 🔗 wm_ has joined #archiveteam-bs
06:12 🔗 vitzli has joined #archiveteam-bs
06:23 🔗 vitzli has quit IRC (Quit: Leaving)
06:31 🔗 BlueMaxim has joined #archiveteam-bs
06:40 🔗 phuzion Just in case people here haven't seen: SketchCow had a heart attack, is in the hospital in Melbourne, got a stent, and appears to be doing ok. https://twitter.com/textfiles/status/833928137243176960
06:42 🔗 Frogging O_o
06:48 🔗 pikhq "Slight issue"
06:50 🔗 mkram I hope he will be ok. otherwise archiving would get seriously impaired. And we would lose a nice guy, from what I saw in his talks.
06:59 🔗 Aranje has quit IRC (Quit: Three sheets to the wind)
07:27 🔗 anonymoos has quit IRC ()
07:36 🔗 Selavi oh damn
07:38 🔗 dxrt geez
07:45 🔗 bsmith093 *raises glass* To SketchCow!
07:45 🔗 bsmith093 feel better dude
08:05 🔗 odemg has quit IRC (Remote host closed the connection)
08:33 🔗 zhongfu_ has joined #archiveteam-bs
08:33 🔗 zhongfu has quit IRC (Ping timeout: 260 seconds)
08:59 🔗 GE has joined #archiveteam-bs
09:09 🔗 zhongfu_ has quit IRC (Ping timeout: 260 seconds)
09:10 🔗 zhongfu has joined #archiveteam-bs
09:31 🔗 tapedrive has joined #archiveteam-bs
09:32 🔗 whydomain has joined #archiveteam-bs
10:05 🔗 spiko has joined #archiveteam-bs
10:19 🔗 Aoede Damn... Get well soon SketchCow!
10:23 🔗 Aoede bsmith093: I scraped literotica.com a while back, I can send the files if you're interested
10:27 🔗 GE has quit IRC (Remote host closed the connection)
11:04 🔗 LastNinj_ has joined #archiveteam-bs
11:06 🔗 LastNinja has quit IRC (Ping timeout: 255 seconds)
11:25 🔗 BlueMaxim has quit IRC (Read error: Connection reset by peer)
11:45 🔗 JensRex No Wiki front page access. Someone with special powers, move imdb project to "recently finished".
12:08 🔗 ZexaronS has quit IRC (Leaving)
12:09 🔗 GE has joined #archiveteam-bs
13:12 🔗 JensRex has quit IRC (Remote host closed the connection)
13:12 🔗 JensRex has joined #archiveteam-bs
13:14 🔗 GE has quit IRC (Remote host closed the connection)
13:26 🔗 JensRex has quit IRC (Remote host closed the connection)
13:26 🔗 JensRex has joined #archiveteam-bs
13:43 🔗 BartoCH has quit IRC (Ping timeout: 260 seconds)
14:56 🔗 GE has joined #archiveteam-bs
15:18 🔗 MrRadar JensRex: We're not quite done with IMDB yet. We're still planning to scrape all their non-discussion information
15:40 🔗 SpaffGarg isn't that available for download as flat files?
15:40 🔗 MrRadar Some but not all. E.g. user reviews
15:41 🔗 MrRadar Also none of the flat files have IMDB's ID numbers so they're useless if you're trying to make a mirror of the website
15:41 🔗 MrRadar (Which is almost certainly by design)
15:41 🔗 SpaffGarg might as well i guess
15:42 🔗 SpaffGarg might be easier now that other people arent mirroring it
15:56 🔗 BartoCH has joined #archiveteam-bs
16:01 🔗 pizzaiolo has joined #archiveteam-bs
16:02 🔗 bwn has quit IRC (Ping timeout: 244 seconds)
16:12 🔗 bwn has joined #archiveteam-bs
16:18 🔗 Nemo_bis has left
16:31 🔗 godane i'm starting to upload metro korea seoul edition pdfs: https://archive.org/details/metro_korea_seoul_20140102
16:31 🔗 godane this will help us get issues past 2015-05-20
16:32 🔗 godane since issuu doesn't have every issue
16:37 🔗 schbirid has joined #archiveteam-bs
16:48 🔗 vitzli has joined #archiveteam-bs
16:55 🔗 vitzli has quit IRC (Quit: Leaving)
17:00 🔗 JensRex MrRadar: Oh right.
17:08 🔗 Aranje has joined #archiveteam-bs
17:15 🔗 pizzaiolo has quit IRC (Ping timeout: 246 seconds)
17:15 🔗 hook54321 Is there a way to get a list of domain names that are going to expire soon but haven't yet? Like ones that are still technically operational and can still be accessed.
17:16 🔗 nightpool hook54321: I think there's a way to scrape whois records for it
17:17 🔗 nightpool I know that domain parkers will find domains that have been previously registered but are now unregistered to find domains to park
17:17 🔗 nightpool (this happened to an old domain of mine I was no longer using--it wasn't publicly linked, so there would be no way to find it except whois)
17:17 🔗 nightpool I don't know how to get access to those whois databases though
17:22 🔗 spiko nightpool, google found this: http://stackoverflow.com/questions/307553/possible-to-download-entire-whois-database-list-of-registered-domains
17:30 🔗 pizzaiolo has joined #archiveteam-bs
17:32 🔗 pizzaiolo has quit IRC (Remote host closed the connection)
17:34 🔗 pizzaiolo has joined #archiveteam-bs
18:22 🔗 VADemon has joined #archiveteam-bs
18:27 🔗 pizzaiolo has quit IRC (Remote host closed the connection)
18:27 🔗 JensRex I should probably have let some of my domains expire. I have at least one 16 year old domain I've never done anything with.
18:28 🔗 JensRex But fuckup.dk is the only fuckup.* domain that isn't a pornsite.
18:28 🔗 JensRex Maybe I should use it for business email :D
18:29 🔗 xmc heh
18:29 🔗 ae_g_i_s same here...if anyone wants ganja.is, do tell
18:29 🔗 ae_g_i_s no fee or anything, just pay the domain instead of me ;)
18:30 🔗 ae_g_i_s and for the lurkers in the back, that is obviously fake whois info
18:30 🔗 xmc ae_g_i_s: hmmmmm
18:31 🔗 ae_g_i_s but i'm off for the night, ping me in case...hope jason makes a speedy and full recovery in the meantime
18:36 🔗 JensRex SpaffGarg: There could definitely be some improvements made in how tracker hands out jobs.
18:37 🔗 JensRex Like don't feed hundreds or thousands of jobs to people who never return any.
18:37 🔗 JensRex Automatically return jobs to the pool after a timeout.
18:37 🔗 JensRex A lot of time was wasted in the imdb project duplicating work, and waiting for requeued jobs.
18:38 🔗 JensRex Maybe even an ability to force concurrency N for clients, to avoid bans.
18:39 🔗 fusl has joined #archiveteam-bs
18:40 🔗 SpaffGarg yeah i feel forced concurrency would help a lot
18:42 🔗 Jonison has joined #archiveteam-bs
18:43 🔗 SpaffGarg it needs to hand back jobs that have failed in the client as well, maybe make the client check in every now and then to confirm its still doing a job
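The ideas raised above (jobs that return to the pool after a timeout, clients that check in periodically, and a forced per-client concurrency cap) can be illustrated with a small in-memory sketch. This is not the Archive Team tracker's code or API; the `JobPool` class, its method names, and the default numbers are all hypothetical, and times are passed in explicitly as plain seconds.

```python
from collections import deque


class JobPool:
    """Hypothetical job pool with leases, heartbeats and a per-worker cap."""

    def __init__(self, jobs, lease_seconds=600, max_per_worker=2):
        self.pending = deque(jobs)
        self.leases = {}  # job -> (worker, deadline)
        self.lease_seconds = lease_seconds
        self.max_per_worker = max_per_worker

    def _active(self, worker):
        # Number of jobs currently leased to this worker.
        return sum(1 for w, _ in self.leases.values() if w == worker)

    def requeue_expired(self, now):
        """Return jobs whose lease ran out to the pool; list what was requeued."""
        expired = [j for j, (_, deadline) in self.leases.items() if deadline <= now]
        for job in expired:
            del self.leases[job]
            self.pending.append(job)
        return expired

    def claim(self, worker, now):
        """Hand out one job, or None if none are left or the worker is at its cap."""
        self.requeue_expired(now)
        if not self.pending or self._active(worker) >= self.max_per_worker:
            return None
        job = self.pending.popleft()
        self.leases[job] = (worker, now + self.lease_seconds)
        return job

    def heartbeat(self, worker, job, now):
        """Extend the lease while the client confirms it is still working."""
        lease = self.leases.get(job)
        if lease is None or lease[0] != worker:
            return False
        self.leases[job] = (worker, now + self.lease_seconds)
        return True

    def complete(self, worker, job):
        """Mark a job done; only the current lease holder may do so."""
        lease = self.leases.get(job)
        if lease is None or lease[0] != worker:
            return False
        del self.leases[job]
        return True
```

A client that stops sending heartbeats simply loses its lease, so its jobs go back to other workers without manual requeueing.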
18:45 🔗 SpaffGarg also i have no idea how to implement this because i cant really code
18:45 🔗 pizzaiolo has joined #archiveteam-bs
18:47 🔗 JensRex My web coding experience is 15 years out of date.
18:54 🔗 dvd has joined #archiveteam-bs
18:59 🔗 dvd has quit IRC (Konversation terminated!)
19:13 🔗 pizzaiolo has quit IRC (Ping timeout: 250 seconds)
19:16 🔗 schbirid has quit IRC (Read error: Operation timed out)
19:21 🔗 RichardG has quit IRC (Ping timeout: 245 seconds)
19:21 🔗 RichardG has joined #archiveteam-bs
19:24 🔗 JensRex Please kill me. I'm now engaged in a long discussion with Backblaze about base 10 vs base 2 units, after my clarifying help ticket yesterday.
19:25 🔗 JensRex I'm not shitting on them for using base 10. They just don't mention it anywhere.
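The gap between base 10 and base 2 units is easy to show with a few lines of arithmetic. The `format_size` helper below is a hypothetical illustration, not anything Backblaze publishes: base 1000 gives kB/MB/GB/TB, base 1024 gives KiB/MiB/GiB/TiB.

```python
def format_size(num_bytes, base):
    """Format a byte count using decimal (1000) or binary (1024) units."""
    if base == 1000:
        units = ["B", "kB", "MB", "GB", "TB", "PB"]
    elif base == 1024:
        units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    else:
        raise ValueError("base must be 1000 or 1024")
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below one step of the base.
        if value < base or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= base


print(format_size(10**12, 1000))  # 1.00 TB
print(format_size(10**12, 1024))  # 931.32 GiB
```

The same 10^12 bytes reads as 1 TB in decimal units but only about 931 GiB in binary units, which is the kind of difference that matters when a provider does not say which one it uses.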
19:40 🔗 odemg has joined #archiveteam-bs
19:51 🔗 MrRadar Just heard back from Scaleway on my issue with corrupted data. Unsurprisingly, it turned out that the VM host was having hardware issues so they just had me transfer my VMs to a different host.
19:55 🔗 odemg has quit IRC (Remote host closed the connection)
19:56 🔗 odemg has joined #archiveteam-bs
20:04 🔗 JensRex MrRadar: That sounds unprofessional.
20:09 🔗 schbirid has joined #archiveteam-bs
20:09 🔗 schbirid any suggestions for network saturation monitoring on linux that lets me zoom in to seconds data as well as overviews?
20:10 🔗 schbirid i run vnstat but that is only for aggregate statistics
20:10 🔗 schbirid need to provide isp support with details of their terrible quality...
20:16 🔗 VADemon has quit IRC (Quit: left4dead)
20:17 🔗 MrRadar schbirid: Will nload do what you need? Otherwise you could probably write a script to log that data from the /sys or /proc entries for your NICs
20:17 🔗 schbirid thanks! i use speedometer sometimes which seems similar
20:18 🔗 schbirid i need something that lets me investigate "later" though
20:18 🔗 schbirid my plan is to saturate my connection 24/7 and then look at the bigger picture and be able to zoom in
20:18 🔗 MrRadar Hmm... I don't think nload itself supports logging, just displaying the current conditions (with a rolling graph)
20:18 🔗 schbirid i could just log byte counts myself but then i would just have ugly data :)
20:20 🔗 schbirid maybe i should just do that and throw it into dc.js
20:20 🔗 schbirid http://square.github.io/crossfilter/ <3
20:26 🔗 schbirid yeah, doing that
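A minimal sketch of the "log byte counts myself" approach, assuming a Linux host where the kernel exposes per-interface counters under `/sys/class/net/<iface>/statistics/`. The function names, the CSV layout, and the one-second interval are illustrative choices, not what schbirid actually wrote.

```python
import csv
import time
from pathlib import Path


def read_counters(iface):
    """Return (rx_bytes, tx_bytes) for a network interface from sysfs (Linux)."""
    base = Path("/sys/class/net") / iface / "statistics"
    return (int((base / "rx_bytes").read_text()),
            int((base / "tx_bytes").read_text()))


def rate(prev, curr, seconds):
    """Bytes per second between two counter samples taken `seconds` apart."""
    return tuple((c - p) / seconds for p, c in zip(prev, curr))


def log_forever(iface, out_csv, interval=1.0):
    """Append one row (unix time, rx B/s, tx B/s) per interval until interrupted."""
    prev, prev_t = read_counters(iface), time.monotonic()
    with open(out_csv, "a", newline="") as fh:
        writer = csv.writer(fh)
        while True:
            time.sleep(interval)
            curr, now = read_counters(iface), time.monotonic()
            rx, tx = rate(prev, curr, now - prev_t)
            writer.writerow([int(time.time()), round(rx), round(tx)])
            fh.flush()  # keep the file usable even if the logger is killed
            prev, prev_t = curr, now
```

Running `log_forever("eth0", "bandwidth.csv")` (with `eth0` replaced by the real interface name) would produce per-second data that can later be aggregated or loaded into a charting tool for zooming.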
21:04 🔗 JensRex schbirid: mtr?
21:04 🔗 JensRex Not exactly saturation monitoring, but it's useful in debugging network problems.
21:05 🔗 JensRex For finding out who to yell at.
21:07 🔗 pizzaiolo has joined #archiveteam-bs
21:09 🔗 JensRex https://www.neowin.net/news/verizon-to-proceed-with-yahoo-acquistion-albeit-at-a-discount-of-350-million
21:09 🔗 JensRex This dumpster keeps on burning!
21:26 🔗 schbirid JensRex: nah, i really need to just show the bad bandwidth
21:26 🔗 schbirid it goes on and off
21:34 🔗 SchroSct sch brother
21:37 🔗 Jonison has quit IRC (Read error: Connection reset by peer)
21:41 🔗 odemg has quit IRC (Remote host closed the connection)
21:48 🔗 BlueMaxim has joined #archiveteam-bs
22:28 🔗 Kaz schbirid: I run observium and have it poll every minute, I'm sure with some tweaking you could poll every 5-10s or so
22:32 🔗 odemg has joined #archiveteam-bs
22:32 🔗 odemg has quit IRC (Connection closed)
22:32 🔗 odemg has joined #archiveteam-bs
22:38 🔗 ndiddy has joined #archiveteam-bs
22:39 🔗 pikhq has quit IRC (Ping timeout: 244 seconds)
22:46 🔗 pikhq has joined #archiveteam-bs
23:00 🔗 GE has quit IRC (Remote host closed the connection)
23:11 🔗 odemg has quit IRC (Remote host closed the connection)
23:19 🔗 mkram JensRex: if you can rate the scariness of the codebase of that controller, as well as programming language, I can tell you if I could fix it like next week or so (depending on some irl scheduling that is not clear yet, nothing major).
23:32 🔗 zino has quit IRC (Remote host closed the connection)
