[00:00] its a rca home theater vcr
[00:01] it powers on but the machine will not take the tape
[00:09] *** Atom has quit IRC (Read error: Connection reset by peer)
[00:10] *** icedice has quit IRC (Read error: Operation timed out)
[00:14] so i now have a tape hitting the 10000k mark
[00:14] not of your tapes didn't for some reason
[00:19] anyways i'm only putting this tape at 6000k now
[00:20] most cause i only capture tv stuff around 5000k
[00:38] btw i got the money from patreon now
[00:40] *** BlueMaxim has quit IRC (Quit: Leaving)
[00:43] *** pizzaiolo has joined #archiveteam-bs
[01:15] *** BlueMaxim has joined #archiveteam-bs
[01:41] glad to hear you got the payment situation fixed godane
[01:49] *** schbirid2 has joined #archiveteam-bs
[01:50] btw i found a youtube channel with all Sightings episodes
[01:50] i'm grabbing it for the archive and myspleen cause they been looking for all episodes of it
[01:54] *** username1 has quit IRC (Read error: Operation timed out)
[01:55] *** dashcloud has quit IRC (Remote host closed the connection)
[01:56] so got sci-fi airing of star trek
[02:05] why does sci-fi airing it matter?
[02:05] just for commercials and stuff?
[02:07] mundus: what's up with your server?
[02:11] it has intro with william shatner talking about the episode
[02:12] so its for commercials and stuff
[02:12] some times bad edits on stations
[02:13] anyways i got 14 tapes from the guy for $10.01
[02:19] *** VerifiedJ has joined #archiveteam-bs
[02:36] godane: I did some more digging and found a way to get full PDFs. Details here https://verifiedjoseph.com/f68qUv7lqs/archiveteam/pagesuite-pdfs.txt (i hope it makes sense)
[02:42] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[02:44] *** VerifiedJ has left
[02:46] *** r3c0d3x has joined #archiveteam-bs
[03:13] *** Asparagir has quit IRC (Asparagir)
[03:22] *** Stilett0 has joined #archiveteam-bs
[03:22] *** Stilett0 is now known as Stiletto
[03:24] second, can't afford it
[03:24] *** pizzaiolo has quit IRC (Quit: pizzaiolo)
[03:25] mundus: how much was it costing?
[03:25] $7/mo
[03:28] hmm
[03:28] Bandwidth cost?
[03:28] I know it's not much
[03:29] But I don't have much money
[03:29] Unlimited bw
[04:02] Does anyone know where I can find the old imdb database?
[04:04] Or a movie database dataset?
[04:07] Can someone archive this? ftp://ftp.fu-berlin.de/pub/misc/movies/database/temporaryaccess/
[04:07] https://sourceforge.net/p/imdbpy/mailman/message/35922484/
[04:07] And perhaps this ftp://ftp.funet.fi/pub/mirrors/ftp.imdb.com/pub/
[04:08] IMDB got rid of their database dumps and got a new format but it is missing a lot of data
[04:08] a lot of cast / crew meta
[04:10] second: how big are those?
[04:11] A few gigs
[04:12] Post said it was Old files: 49 files, 1.9 GB
[04:12] nods
[04:12] New files: 6 files, 361 MB on S3
[04:12] I can download the S3 stuff and pay for it just need to know where I can upload it for archiveteam to take it and how they want it
[04:13] Or I can pay someone to get it and they give me a copy ;D
[04:13] second: you can upload it to the Internet Archive.
[04:14] Just make an account (all you need is an email address, which will be permanently and (not very) publicly attached to whatever you upload).
[04:14] You can download each file and upload them before downloading the next, which should avoid you needing to hold on to larger amounts of space.
[04:15] How is the IA doing with space?
[04:15] second: they've got plenty.
[04:16] a few gigs won't even be noticed.
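The 04:14 suggestion (download one file, upload it, then fetch the next) can be sketched roughly as below. This is only an illustration, not anything posted in the channel: `relay_one_at_a_time` and the `download` / `upload` callables are hypothetical names, and the real transfer code (FTP fetch, Internet Archive upload) is assumed to be supplied by the caller.

```python
import os
import tempfile
from typing import Callable, Iterable, List


def relay_one_at_a_time(
    names: Iterable[str],
    download: Callable[[str, str], None],
    upload: Callable[[str, str], None],
) -> List[str]:
    """Download each file, upload it, delete the local copy, then move on.

    `download(name, local_path)` and `upload(name, local_path)` are
    hypothetical callables provided by the caller.
    """
    done = []
    with tempfile.TemporaryDirectory() as workdir:
        for name in names:
            local_path = os.path.join(workdir, os.path.basename(name))
            download(name, local_path)  # fetch the remote file to local disk
            upload(name, local_path)    # push it to the destination
            os.remove(local_path)       # free the space before the next file
            done.append(name)
    return done
```

With this shape, local disk use should stay at roughly the size of the largest single file rather than the whole dump.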
[04:16] I hope they have backups, also california is on fire, I hope the IA doesn't burn down too
[04:16] a few *terabytes* wouldn't be noticed
[04:16] once you get up to a petabyte, it's polite to ask first.
[04:16] (I'm somewhat exaggerating, but only somewhat)
[04:17] They have backups; they are working on a backup in Canada, although I haven't heard much about it lately.
[04:19] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[04:25] second: I'm grabbing the fu-berlin one now.
[04:26] *** Sk1d has joined #archiveteam-bs
[04:27] second: also grabbing
[04:42] Up to 2.8G so far
[04:45] second: it looks like the other address, funet.fi, is a mirror; are you sure the data is different?
[04:46] Somebody2: IMDB releases snapshots with diffs--compare something outside the diffs folder
[04:46] They will either be the same data at 2 points in time or the same point in time is what I was trying to convey
[04:47] kisspunch: not sure what you mean?
[04:47] the reported file sizes are identical
[04:47] the timestamps are a few hours different
[04:49] Somebody2: I am saying, compare a file that's not a diff if you're going to do that check. In any case, I'm just grabbing both and running a deduplicator after
[04:49] sounds good.
[04:50] Let me know if your deduplication finds that they are different, and I'll grab the second one.
[04:56] Pretty sure they're the same (actors.list.gz is the same size) but I'll double check tomorrow or so
[04:58] Expect it to be 14G
[04:58] My internet's not that fast, I just have an old dump :)
[05:11] JAA: send me an SSH public key over query or email or whatnot, I can grant you access to archivebot@archivebot-proto2 and then you can register new pipelines
[05:13] *** wp494 has quit IRC (Ping timeout: 506 seconds)
[05:19] YAYAYAY! New pipeline energy!
[05:20] https://blog.archive.org/2017/10/10/books-from-1923-to-1941-now-liberated/
[05:21] One of the points about this focuses on whether copies can be bought for a fair price.
[05:21] If there are only a few copies, can someone buy them, and announce that they are no longer for sale, and thereby trigger section 108(h)?
[05:23] How shockingly reasonable of US copyright law.
[05:25] pikhq: yeah, ain't it?
[05:44] *** wp494 has joined #archiveteam-bs
[05:48] *** BlueMaxim has quit IRC (Quit: Leaving)
[06:10] second: I've now got the fu-berlin one; it's 13G in size. I'll wait to hear from kisspunch about whether the funet.fi one is different before going after that.
[06:24] *** BlueMaxim has joined #archiveteam-bs
[06:35] *** loadup has quit IRC (Read error: Operation timed out)
[07:02] *** Honno has joined #archiveteam-bs
[08:01] *** atrocity has quit IRC ()
[08:23] *** BlueMaxim has quit IRC (Ping timeout: 255 seconds)
[08:23] *** BlueMaxim has joined #archiveteam-bs
[08:44] *** wp494 has quit IRC (Ping timeout: 492 seconds)
[08:51] *** wp494 has joined #archiveteam-bs
[09:26] *** tfgbd_znc has quit IRC (Read error: Connection reset by peer)
[09:46] *** wabu has quit IRC (Read error: Operation timed out)
[09:56] *** wabu has joined #archiveteam-bs
[09:56] *** kepler45 has joined #archiveteam-bs
[10:05] ugh, there are so many versions of fdupes
[10:15] *** Honno has quit IRC (Read error: Operation timed out)
[10:22] Somebody2: It's the same.
[10:28] *** atrocity has joined #archiveteam-bs
[10:29] *** ivan has quit IRC (Leaving)
[10:40] *** marvinw has joined #archiveteam-bs
[10:54] *** Mateon1 has quit IRC (Ping timeout: 250 seconds)
[11:02] *** midas has quit IRC (Read error: Connection reset by peer)
[11:03] *** midas has joined #archiveteam-bs
[11:04] yipdw: Excellent, will do in a bit.
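The mirror check discussed above (compare files outside the diffs folder, by content rather than by reported size) could be done along these lines. This is a sketch under assumptions, not the deduplicator actually used in the channel: the function names are made up for illustration, the folder to skip is assumed to be named `diffs`, and SHA-256 is swapped in as the comparison method.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large dumps do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_tree(root, skip_dirs=("diffs",)):
    """Map relative path -> SHA-256 for every file under root.

    Directories named in skip_dirs (assumed: "diffs") are not descended into.
    """
    index = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            index[os.path.relpath(full, root)] = sha256_of(full)
    return index


def compare_mirrors(root_a, root_b):
    """Report files missing from either side or differing in content."""
    a, b = index_tree(root_a), index_tree(root_b)
    return {
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "different": sorted(p for p in set(a) & set(b) if a[p] != b[p]),
    }
```

If all three lists come back empty, the two downloads hold the same non-diff files, which is a stronger check than matching sizes on actors.list.gz alone.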
[11:27] *** pizzaiolo has joined #archiveteam-bs
[12:00] *** qw3rty3 has joined #archiveteam-bs
[12:25] *** wabu has quit IRC (Read error: Operation timed out)
[12:28] *** Atom has joined #archiveteam-bs
[12:35] *** wabu has joined #archiveteam-bs
[12:43] *** BlueMaxim has quit IRC (Quit: Leaving)
[13:32] Thank you Somebody2
[13:33] Did anyone by chance download the aws bucket for the imdb data?
[13:39] Since you have to pay S3's exorbitant bandwidth fees (it's a Requester-Pays bucket), I kind of doubt it. I believe IMDB is still working on an HTTP interface without those fees.
[13:39] See: https://getsatisfaction.com/imdb/topics/imdb-data-now-available-in-amazon-s3
[13:40] Them not having the HTTP interface up seems to be the reason why the FTP servers are still online.
[13:46] Does anyone know where I can find a last.fm dump?
[13:47] *** Mateon1 has joined #archiveteam-bs
[14:12] *** Pixi has quit IRC (Quit: Pixi)
[14:12] *** Pixi has joined #archiveteam-bs
[14:14] *** icedice has joined #archiveteam-bs
[14:17] Is there a channel for Amazon Forum archival?
[14:37] *** sep332 has joined #archiveteam-bs
[15:06] *** Asparagir has joined #archiveteam-bs
[15:16] *** icedice has quit IRC (Quit: Leaving)
[15:21] *** Stiletto has quit IRC (Ping timeout: 260 seconds)
[15:28] *** ZexaronS- has joined #archiveteam-bs
[15:30] *** ZexaronS has quit IRC (Ping timeout: 260 seconds)
[15:34] qw3rty3: No, there isn't.
[15:58] *** Asparagir has quit IRC (Asparagir)
[16:15] re that new order forum. 125$ for a forum that would run on a 5$ host... wtf
[16:16] I don't know the story behind this case, but I've seen similar setups before, and there it was a matter of "never change a running system" mixed with "I'm too lazy to do anything about it".
[16:18] *** Stilett0 has joined #archiveteam-bs
[16:24] *** Stilett0 is now known as Stiletto
[17:22] *** Asparagir has joined #archiveteam-bs
[17:39] *** pa has joined #archiveteam-bs
[17:50] VerifiedJ: that's basically a slower version of pdfcat though, it's not pristine so to speak
[17:52] *** pa has quit IRC (Quit: pa)
[17:54] *** pa has joined #archiveteam-bs
[17:56] second: a quick google search gives me https://www.demonforums.net/Thread-Last-fm-Dump-Re-upload https://leakninja.com/39243-lastfm-1-8gb-dump-12.html
[17:56] oh hey, https://btdig.com/85f39f1d94917d61277725e7da85d8177a5c12eb/
[17:57] /last.fm/lastfm.txt.gz
[17:59] Any way to upload a torrent larger than 100gb to internetarchive?
[18:44] *** Stiletto has quit IRC ()
[19:16] *** Asparagir has quit IRC (Asparagir)
[20:07] *** schbirid2 has quit IRC (Quit: Leaving)
[20:09] *** schbirid has joined #archiveteam-bs
[20:32] *** pa has quit IRC (Quit: pa)
[20:33] *** pa has joined #archiveteam-bs
[20:34] *** pa has quit IRC (Client Quit)
[20:39] What's the best way to archive different source code repositories? I know about svnrdump for SVN repos, but what about other softwares? git, Mercurial, Bazaar, CVS, etc.
[20:41] In particular, what to do if the repository itself is not public but only accessible through a web frontend? (There's an ArchiveBot job currently grabbing a CVSweb instance; that's the immediate trigger for these questions, though I've been wondering about it for longer.)
[20:44] git clone
[20:44] etc
[20:44] github-backup for stuff on github that might have other useful things like issues, wiki pages, etc
[20:53] For git clone the harder part is keeping your mirror up to date--the initial clone yeah, git clone works fine
[21:38] *** Stilett0 has joined #archiveteam-bs
[22:33] *** Asparagir has joined #archiveteam-bs
[22:39] *** kepler45 has quit IRC (Quit: Leaving)
[22:49] that guy that I asked to PM me about the whole deal involving NCIX got back to me and he straight up refused despite having PMed someone else already
[22:49] /shrug
[23:15] *** BlueMaxim has joined #archiveteam-bs
[23:37] *** Soni has quit IRC (Ping timeout: 272 seconds)
[23:53] *** Asparagir has quit IRC (Asparagir)
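On the 20:53 point about keeping a git mirror up to date: one common approach, sketched here as an assumption rather than anything the channel settled on, is a bare `git clone --mirror` on the first run followed by `git remote update` on later runs. The helper names `mirror_command` and `sync_mirror` are hypothetical, and git is assumed to be installed and on the PATH.

```python
import os
import subprocess


def mirror_command(url, dest):
    """Return the git command that creates or refreshes a bare mirror at dest."""
    if os.path.isdir(dest):
        # Existing mirror: fetch all refs and prune ones deleted upstream.
        return ["git", "--git-dir", dest, "remote", "update", "--prune"]
    # First run: a bare clone that copies every ref, not just branches.
    return ["git", "clone", "--mirror", url, dest]


def sync_mirror(url, dest):
    """Run the appropriate command; raises CalledProcessError if git fails."""
    subprocess.run(mirror_command(url, dest), check=True)
```

Calling `sync_mirror` on a schedule (cron or similar) would keep the copy current. For archival purposes it may be preferable to drop `--prune`, so that refs deleted upstream are kept locally.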