#archiveteam 2013-10-28,Mon


Time Nickname Message
00:14 πŸ”— kindle I'm typing from my Kindle to tell you that myloc.gov is closing on November 19th. This site by the Library of Congress has been open for a while, and people's reading lists, collections, etc. will be deleted after this time.
00:17 πŸ”— BiggieJon http://myloc.gov/Pages/notice.aspx
00:43 πŸ”— dashcloud JRWR: if you work at a tech recycling place, let me pitch one of my favorite causes: donate items to iFixit (free shipping!) and repair manuals get made for them by students: http://www.ifixit.com/Info/Device_Donations
02:10 πŸ”— SketchCow For the moment, everything is stream only
02:12 πŸ”— godane ok
02:12 πŸ”— godane just thought it was odd
02:14 πŸ”— SketchCow People need to realize jsmess.textfiles.com is my big demo
02:15 πŸ”— SketchCow And the Internet Archive Historical Software collection is suit and tie
02:29 πŸ”— dashcloud the speech was really good, and the press about it is very good so far
03:14 πŸ”— JRWR dashcloud: i'll send over the 30+ versions of coby tablets we got in
03:16 πŸ”— dashcloud anyone have a more automated twitch.tv video downloader? twitch seems to split up streams into 30 minute chunks, and you need to download each chunk separately
03:17 πŸ”— JRWR sounds like HLS
03:17 πŸ”— JRWR is it HLS?
03:22 πŸ”— ryonaloli hello, i just learned about archive team recently. i realized there's no backup of the 4chandata archive, which deleted all its images and went text-only months ago. i have a full backup of all the images with original filenames, is there anything i should do with it?
03:23 πŸ”— BlueMax SketchCow, you're needed
03:26 πŸ”— ryonaloli also, semi-related, are there any plans to archive the foolz archive? for some boards like /a/, they massacred a shitload of images when they set the full-size image limit to 6 months
03:27 πŸ”— dashcloud JRWR: no clue- is there an easy way to tell?
03:31 πŸ”— dashcloud twitch.tv is the site- pick any stream there. http://www.twitchtools.com/video-download.php is the site I'm using right now to download the videos- it presents you all the segments of the video.
03:33 πŸ”— JRWR dashcloud: If i remember right, Twitch is switching over to HLS, should be easy to extract
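
For reference: an HLS stream is an .m3u8 playlist pointing at short MPEG-TS segments, so "extracting" it mostly means fetching the segments in order and concatenating them. A minimal Python sketch with a placeholder playlist URL; a real Twitch playlist may sit behind an access token and further variant playlists:

    # Minimal sketch of stitching an HLS (m3u8) stream back together.
    # The playlist URL is a placeholder; a real Twitch playlist may require
    # an access token and may itself point at further variant playlists.
    import urllib.parse
    import requests

    PLAYLIST_URL = "https://example.com/stream/index.m3u8"  # hypothetical

    playlist = requests.get(PLAYLIST_URL)
    playlist.raise_for_status()

    with open("stream.ts", "wb") as out:
        for line in playlist.text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip playlist metadata, keep only segment URIs
            segment_url = urllib.parse.urljoin(PLAYLIST_URL, line)
            segment = requests.get(segment_url)
            segment.raise_for_status()
            out.write(segment.content)  # MPEG-TS segments concatenate cleanly
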
03:33 πŸ”— fgsfds Has anyone considered using the google cached pages to grab some more of isohunt?
03:33 πŸ”— joepie92 ryonaloli: hi, still there?
03:33 πŸ”— ryonaloli i am joepie92
03:33 πŸ”— joepie92 fgsfds: yes, but anything Google is generally a pain in the ass :(
03:33 πŸ”— joepie92 ryonaloli: you could upload your backup to archive.org
03:33 πŸ”— ryonaloli how do i do that?
03:34 πŸ”— joepie92 basically, go to http://archive.org/, create an account, log in, click the upload button in the top right corner
03:34 πŸ”— joepie92 one note: the e-mail address you used to sign up will be available alongside your upload
03:34 πŸ”— joepie92 (not that anyone ever looks at that, but it's technically there)
03:34 πŸ”— ryonaloli is there any way i can do it anonymously?
03:34 πŸ”— ryonaloli or is it enough just to use 10minutemail
03:35 πŸ”— joepie92 a throw-away email would probably be as anonymous as it gets
03:36 πŸ”— ryonaloli ok, i just don't want it to be rejected if i use a temp email
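
For reference, the upload can also be scripted instead of done through the web uploader; a minimal sketch using the internetarchive Python library, where the item identifier, file name, and metadata are placeholders (credentials come from the account's S3-style keys, set up beforehand with "ia configure"):

    # Sketch of scripting an archive.org upload with the "internetarchive"
    # Python library. Identifier, file name, and metadata are placeholders.
    from internetarchive import upload

    upload(
        "4chandata-image-backup-example",      # hypothetical item identifier
        files=["4chandata-images.tar"],        # hypothetical archive file
        metadata={
            "mediatype": "data",
            "title": "4chandata image backup (example)",
            "description": "Full image backup with original filenames.",
        },
    )
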
03:36 πŸ”— joepie92 fgsfds: the issue with Google is primarily that they throttle like crazy on all their services - including the cache
03:36 πŸ”— joepie92 bing/yahoo cache would be more likely to turn up something useful than that of google
03:36 πŸ”— joepie92 (if any of them even cache isohunt in the first place)
03:36 πŸ”— joepie92 ryonaloli: I don't think it would be
03:36 πŸ”— joepie92 I'm not sure, since I upload all my own stuff under my own e-mail address... but I can't see an immediate reason why temp emails would be rejected
03:37 πŸ”— ryonaloli oh, and the other thing, i also have a copy of all of gelbooru's deleted (but not purged) images along with gzipped html from the posts, is that worth uploading? i scraped and archived it myself to capture images which me and others enjoy, but which were deleted because of ToS violations (guro, etc). is that worth uploading as well or should i just make my own torrent like i planned to previously?
03:39 πŸ”— joepie92 ryonaloli: everything is worth uploading :)
03:39 πŸ”— joepie92 can't say whether it will stay accessible (depending on content), but either way it'll be archived
03:40 πŸ”— ryonaloli content like, nsfw?
03:41 πŸ”— joepie92 ryonaloli: see, the main purpose of archive.org is to store; not to distribute (that's the secondary purpose)
03:41 πŸ”— joepie92 even if something were to be blacked out because of issues with the content, it would still be archived
03:42 πŸ”— joepie92 so things like nsfw content really shouldn't be an issue
03:42 πŸ”— joepie92 (as far as my understanding goes)
03:42 πŸ”— joepie92 better to upload it now and have it potentially blacked out later, than to not upload it now and need it later when no copy exists anymore
03:43 πŸ”— ryonaloli if a large percentage of the content is garbage (i.e. deleted because it's obvious troll images, etc), would that still be acceptable? i can't sort all the content manually, it's 91k images
03:43 πŸ”— joepie92 sure, just upload it
03:43 πŸ”— joepie92 one person's garbage may be another person's treasure :)
03:44 πŸ”— * joepie92 has "archive first, ask questions later" as personal slogan by now
03:45 πŸ”— ryonaloli ok, and last question, both of the archives are completely uncensored, and some (such as the 4chandata archive) might have occasional highly illegal content (child porn, bestiality, etc). i obviously can't search through such a large amount of images, so if a few such images are accidentally uploaded, is that going to be ok? i'll use Tor to upload anyway, i just want to be on the safe side
03:45 πŸ”— * joepie92 CCs BlueMax
03:46 πŸ”— BlueMax yo
03:46 πŸ”— ryonaloli i don't care about the morality, i believe everything period should be saved, but i care about my safety and the archive's safety
03:46 πŸ”— joepie92 BlueMax: see above
03:46 πŸ”— joepie92 thoughts?
03:46 πŸ”— BlueMax I don't know much about the archive but if this data's going up either way it should be kept dark until the images are searched
03:48 πŸ”— BlueMax SketchCow, you're needed
04:02 πŸ”— ryonaloli is there any issue with archives obtained illegally or semi-legally? like, will archive.org still accept that kind of thing?
04:04 πŸ”— ryonaloli like, from archiving sites with a no-archive in the ToS, or using compromized computers to archive sites in an emergency
04:07 πŸ”— S[h]O[r]T are you talking about things you did personally, ie comprimized computers to archive or something someone else did
04:07 πŸ”— ryonaloli hypothetically
04:08 πŸ”— S[h]O[r]T so hypotheticaly would it be you using comprimized computers or someone else
04:08 πŸ”— S[h]O[r]T also we are both spelling compromised wrong, lol
04:08 πŸ”— ryonaloli hypothetically if i had such a thing, would archive.org or archive team accept an archive created using such a tool
04:09 πŸ”— ryonaloli yeah well irssi spell checker sucks
04:10 πŸ”— S[h]O[r]T in terms of emergency archives that's not the way everyone here goes about doing that. in terms of archive.org they wouldn't know.
04:10 πŸ”— ryonaloli how many people are mobilized during emergency archiving?
04:10 πŸ”— S[h]O[r]T if you're going to do that then don't tell anyone you're doing it
04:11 πŸ”— S[h]O[r]T whoever steps up. some people here are affiliated with archive.org but many archiving projects are not run under archive.org or have any affiliation
04:11 πŸ”— S[h]O[r]T aka archiveteam
04:11 πŸ”— ryonaloli >1k?
04:11 πŸ”— S[h]O[r]T doubtful
04:12 πŸ”— ryonaloli oh
04:12 πŸ”— S[h]O[r]T bw/resources is usually not a giant problem. the problem in emergency archiving lies most of the time in finding someone to write a script to do it all right
04:13 πŸ”— ryonaloli there's no c&c for scripts?
04:13 πŸ”— S[h]O[r]T https://github.com/archiveteam
04:14 πŸ”— S[h]O[r]T http://archiveteam.org/index.php?title=Main_Page
04:14 πŸ”— ryonaloli that's a lot :/
04:14 πŸ”— S[h]O[r]T all the grab scripts are mostly based off of each other and have similar framework
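
The common core of those grab scripts is roughly: claim an item from the tracker, fetch it with wget writing a WARC file, then upload the result (usually via rsync) and report the item done. A rough sketch of the fetch step, with a made-up item and URL; real projects wrap this in a seesaw pipeline and use a patched wget with more options:

    # Rough sketch of the fetch step shared by most grab scripts: wget with
    # WARC output. Item ID and URL are made up for illustration.
    import subprocess

    item = "user12345"                          # hypothetical item from the tracker
    url = "http://www.example.com/%s" % item    # hypothetical target URL

    subprocess.check_call([
        "wget",
        "--recursive", "--level=1",             # shallow crawl, just for the sketch
        "--page-requisites",                    # also grab images/css the pages need
        "--warc-file=example-%s" % item,        # writes example-user12345.warc.gz
        "--warc-header=operator: Archive Team",
        "--quiet",
        url,
    ])
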
04:15 πŸ”— joepie92 ryonaloli: curious; how'd you find out about archiveteam?
04:15 πŸ”— ryonaloli isohunt's fall
04:15 πŸ”— joepie92 ahh :)
04:16 πŸ”— ryonaloli i never would have thought something like this existed... i always used as much means as i could to archive dying sites, had no idea there was a whole team lol
04:17 πŸ”— joepie92 ryonaloli: I've mostly been there
04:17 πŸ”— joepie92 though I wasn't as busy archiving sites
04:17 πŸ”— joepie92 mostly just saving everything I ran across
04:17 πŸ”— joepie92 "just in case"
04:18 πŸ”— joepie92 (some of those just in cases have actually occurred)
04:18 πŸ”— ryonaloli yeah, same. i only archive sites when i notice a moralfaggy ToS change
04:18 πŸ”— ryonaloli but my shitty DSL and VPN slows things down :/
04:19 πŸ”— ryonaloli so who on this irc is in charge of all this?
04:19 πŸ”— joepie92 there's technically not really one person "in charge" of things
04:19 πŸ”— joepie92 it's mostly just "go get shit done"
04:19 πŸ”— ryonaloli no hierarchy at all?
04:20 πŸ”— joepie92 though a few bits of the 'infrastructure' (think the tracker, etc.) are centralized
04:20 πŸ”— joepie92 no formal hierarchy
04:20 πŸ”— joepie92 I think
04:20 πŸ”— joepie92 although everybody listens to SketchCow
04:20 πŸ”— joepie92 but that's not really a hierarchy thing I think :P
04:20 πŸ”— ryonaloli if i set up a script on my vps or idle computers that are low maintenance, will using the tracker be all that's needed to contribute?
04:21 πŸ”— ryonaloli or do the scripts used and api change all the time or something
04:21 πŸ”— joepie92 ryonaloli: well, the current architecture is that there is A. a VM image that you can run in virtualbox etc., it will automatically pick up new projects and B. you can run scripts manually using the seesaw kit
04:22 πŸ”— joepie92 the former is pretty maintenance-free but not really suited for use on VPSes
04:22 πŸ”— joepie92 the latter requires some work
04:22 πŸ”— ryonaloli is there no low maintenance way with less overhead?
04:22 πŸ”— joepie92 not yet :)
04:22 πŸ”— S[h]O[r]T well the warrior vm is pretty low maintenance?
04:22 πŸ”— joepie92 S[h]O[r]T: yes, but not low overhead
04:23 πŸ”— joepie92 seesaw scripts are low overhead, but not low maintenance
04:23 πŸ”— joepie92 (because you have to manually clone and run every one of them)
04:23 πŸ”— S[h]O[r]T yeah, and manually update. the warrior is all automatic
04:23 πŸ”— joepie92 this was one of the reasons I suggested docker a while ago, but that seems to rely on lxc so won't work on many VPSes
04:23 πŸ”— S[h]O[r]T but there are people who have done warrior images on ec2 and whatnot
04:23 πŸ”— joepie92 should probably just have a framework
04:23 πŸ”— joepie92 yes, but ec2 is ec2
04:24 πŸ”— joepie92 that is not at all the same usecase as "I have a VPS that's not really doing much..."
04:24 πŸ”— joepie92 :P
04:24 πŸ”— S[h]O[r]T yeah
04:24 πŸ”— S[h]O[r]T the warrior is also just really a set of scripts so you could pull it apart and run it on a vps probably
04:24 πŸ”— joepie92 anyway, ryonaloli; the takeaway is that for a VPS you'd be pretty much stuck running them manually now (although it's now properly documented, since isohunt)
04:24 πŸ”— joepie92 and help is absolutely needed to automate that more
04:24 πŸ”— joepie92 a la warrior :)
04:25 πŸ”— joepie92 S[h]O[r]T: I had a look at it and you can't just copypaste it into a VPS, basically
04:25 πŸ”— S[h]O[r]T right
04:25 πŸ”— joepie92 can't recall why exactly, but it needed more work than that
04:27 πŸ”— phillipsj I may try to run warrior in a FreeBSD jail. Would need to use Debian with the BSD kernel, AFAIK.
04:27 πŸ”— ryonaloli does *BSD support openvz?
04:29 πŸ”— phillipsj I looked that up in the past few weeks... IIRC, my hardware is not new enough for that :P
04:29 πŸ”— joepie92 ryonaloli: nope, openvz is based on a custom Linux kernel
04:29 πŸ”— joepie92 though more and more is being merged into mainline
04:30 πŸ”— joepie92 so perhaps over time you may see some stuff transferring over to BSD
04:30 πŸ”— joepie92 but for now it's not possible to run ovz on BSD
04:30 πŸ”— ryonaloli you can run openvz without the vz kernel
04:30 πŸ”— ryonaloli it'll just have less features
04:30 πŸ”— Aranje we will have bhyve, we don't need no openvz :3
04:31 πŸ”— ryonaloli what's bhyve?
04:32 πŸ”— Aranje their 250k container system
04:33 πŸ”— Aranje kinda like a more-native kvm or so
04:33 πŸ”— Aranje similar in idea, but doesn't support old hardware at all
04:34 πŸ”— ryonaloli but kvm works totally differently than openvz
04:34 πŸ”— yipdw ryonaloli: the main reason why there are so many grabbers is that website structures are freeform and there has not been much effort to date put into consolidating the common patterns (because there's not much payoff)
04:34 πŸ”— yipdw ArchiveBot is one of a few efforts to do that, but to date it will not handle multi-terabyte dumps
04:34 πŸ”— yipdw (it is also not designed to do that)
04:35 πŸ”— ryonaloli are there any plans for a centralized c&c that will at least give URLs to archive and instructions to individual grabbers?
04:35 πŸ”— yipdw that already exists
04:35 πŸ”— yipdw http://tracker.archiveteam.org/
04:35 πŸ”— joepie92 ryonaloli: the tracker hands out 'tasks'
04:35 πŸ”— joepie92 but the code for actually downloading stuff is distributed separately
04:35 πŸ”— joepie92 yipdw: doesn't that go via warrior hq?
04:35 πŸ”— yipdw that *could* be generalized to "here is a list of URLs, scrape them and report back to me about what you find"
04:35 πŸ”— joepie92 projects.json and all that
04:36 πŸ”— yipdw warriorhq is a separate program
04:36 πŸ”— ryonaloli what information is contained in these "tasks"?
04:36 πŸ”— yipdw project-dependent
04:37 πŸ”— yipdw often they're a URL component, e.g. the MEMBERID in http://www.example.com/[MEMBERID]
04:37 πŸ”— yipdw in some cases they are a reference to a larger data packet
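
In other words, a claimed item is just an opaque string that the project's own pipeline knows how to turn into URLs; a hypothetical illustration:

    # Hypothetical illustration of how a fetch pipeline expands a tracker item
    # (here a member ID) into the concrete URLs it will download.
    def urls_for_item(member_id):
        base = "http://www.example.com"            # placeholder for the site being saved
        return [
            "%s/%s" % (base, member_id),           # profile page
            "%s/%s/photos" % (base, member_id),    # hypothetical sub-resources
            "%s/%s/friends" % (base, member_id),
        ]

    print(urls_for_item("MEMBERID"))
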
04:38 πŸ”— ryonaloli where can i get documentation on the tracker?
04:39 πŸ”— yipdw :P
04:39 πŸ”— yipdw https://github.com/ArchiveTeam/universal-tracker
04:39 πŸ”— ryonaloli i mean, for the format of tasks, etc
04:40 πŸ”— yipdw oh, that's not documented -- a task is just a string
04:40 πŸ”— ryonaloli example string?
04:40 πŸ”— yipdw more concretely, it's an element in a Redis set
04:41 πŸ”— yipdw ryonaloli: there is no fixed format
04:41 πŸ”— yipdw sometimes it's a username, sometimes it's some other unique identifier
04:41 πŸ”— ryonaloli is it fixed enough that a computer can understand it alone?
04:41 πŸ”— yipdw no
04:41 πŸ”— yipdw well, actually
04:41 πŸ”— yipdw yes
04:41 πŸ”— yipdw the meaning of the identifier is contained in the fetch pipeline
04:42 πŸ”— yipdw but if you're looking for a task schema, there is no such thing
04:42 πŸ”— yipdw there is in theory no reason why tasks could not be URLs, or groups of URLs
04:42 πŸ”— yipdw in practice, that isn't done
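
So under the hood an item queue is just a Redis set of strings. An illustrative example with made-up key names (not the tracker's actual schema), using the redis-py client:

    # Illustrative only: the universal-tracker keeps per-project item queues in
    # Redis sets. Key names here are made up, not the tracker's real schema.
    import redis

    r = redis.Redis(host="localhost", port=6379)

    # Seed a project's to-do queue with bare item strings (usernames, IDs, ...).
    r.sadd("todo:exampleproject", "user001", "user002", "user003")

    # Claiming an item (which workers do via the tracker's HTTP API, not Redis
    # directly) boils down to popping one element from the set.
    item = r.spop("todo:exampleproject")
    print(item)  # e.g. b'user002' -- an opaque string with no fixed schema
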
04:43 πŸ”— ryonaloli i'm just looking for a way to set up an archiver on a few idle windows computers that i won't be able to maintain all the time, but that i'd like to be able to archive for archiveteam automatically
04:43 πŸ”— yipdw the best way to do that right now is to start up the Warrior VM and set them to "ArchiveTeam's Choice"
04:44 πŸ”— yipdw for Windows machines, that will probably remain the best way for the foreseeable future
04:44 πŸ”— ryonaloli i can't run vms
04:44 πŸ”— yipdw oh uh
04:44 πŸ”— ryonaloli too much overhead
04:44 πŸ”— phillipsj cygwin?
04:44 πŸ”— yipdw so, historically, we have not had good results from Windows systems
04:44 πŸ”— ryonaloli if my computers were all linux, i'd just take whatever scripts the warrior distro is using
04:44 πŸ”— yipdw or Cygwin
04:44 πŸ”— yipdw that was one of the main reasons for the Warrior VM
04:44 πŸ”— ryonaloli can't do cygwin i don't think
04:45 πŸ”— Aranje yeah windows is kind of shit
04:45 πŸ”— ryonaloli i agree Aranje
04:45 πŸ”— Aranje it fucks up filenames
04:45 πŸ”— yipdw as a result, running the Archive Team programs on Windows machines is unsupported and discouraged
04:45 πŸ”— yipdw well, programs save the Warrior VM
04:45 πŸ”— yipdw :P
04:45 πŸ”— phillipsj As of Windows 8, you can now set the hardware clock to UTC :)
04:46 πŸ”— Aranje fancy!
04:46 πŸ”— Aranje welcome what, 1993?
04:46 πŸ”— yipdw I'm not saying it's impossible to get good results, just that nobody in AT has put in the effort to make it work *and* maintain it
04:46 πŸ”— yipdw the majority of people writing and maintaining grabber code / tracker code don't run it on Windows, so
04:46 πŸ”— yipdw yeah, the usual reasons
04:47 πŸ”— yipdw it's not just a picking-on-Microsoft thing, either
04:47 πŸ”— yipdw HFS+' case-preserving behavior has also caused problems
04:48 πŸ”— yipdw er, case-insensitive-and-preserving
05:25 πŸ”— JRWR It could be nicer, we could have a Python core that is run
05:25 πŸ”— JRWR some client that runs in your tray and set and forget
05:26 πŸ”— ryonaloli how difficult would it be to code something like that?
05:26 πŸ”— JRWR about a month's worth
05:27 πŸ”— ryonaloli for one person?
05:27 πŸ”— JRWR ya, I can see that
05:27 πŸ”— JRWR maybe more if a standard is made
05:27 πŸ”— JRWR like how a worker is meant to be made and how they should be run / throttled
05:28 πŸ”— JRWR and what libs are on a system, and a package system for the workers, but I guess that's kinda already in place
05:28 πŸ”— ryonaloli the tracker would have to begin using a standard format for that to be of much use
05:29 πŸ”— JRWR then a nice config menu, could still be web based, since python supports spinning one up very fast
05:29 πŸ”— ryonaloli if it's python it could much more easily be cross platform
05:30 πŸ”— JRWR Yep
05:30 πŸ”— ryonaloli and have a cli version for linux/windows cmd
05:30 πŸ”— JRWR and having a central tracker would be nice
05:31 πŸ”— JRWR an API with storage for the work done, and a format for describing what needs to be done
05:31 πŸ”— ryonaloli it'd make it a lot easier for volunteers to just set up a scraper client and forget about it, like BOINC
05:31 πŸ”— JRWR Yes
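
A set-and-forget client along those lines would mostly be a polling loop. A bare-bones sketch where the helper functions and sleep policy are hypothetical, not any real Archive Team API:

    # Bare-bones sketch of a BOINC-style "set and forget" worker loop.
    # claim_task(), do_task(), and report_done() stand in for whatever tracker
    # protocol a real client would speak; they are placeholders.
    import time

    def claim_task():
        """Ask the tracker for one unit of work; return None if none available."""
        raise NotImplementedError  # placeholder

    def do_task(task):
        """Download whatever the task describes and return the result payload."""
        raise NotImplementedError  # placeholder

    def report_done(task, result):
        """Tell the tracker the task is finished and hand back the result."""
        raise NotImplementedError  # placeholder

    def main():
        while True:
            task = claim_task()
            if task is None:
                time.sleep(300)        # nothing to do; back off and retry later
                continue
            result = do_task(task)
            report_done(task, result)
            time.sleep(5)              # polite throttle between items

    if __name__ == "__main__":
        main()
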
05:31 πŸ”— odie5533 What format/API does the tracker currently use?
05:31 πŸ”— JRWR unknown
05:31 πŸ”— Cameron_D it currently runs as a VM
05:31 πŸ”— odie5533 Yeah, but it communicates somehow
05:31 πŸ”— Cameron_D oh, for the tracker
05:32 πŸ”— ryonaloli what does each archiving task involve? a url, probably a deadline... does it communicate with other clients at all or post updates on its progress?
05:32 πŸ”— JRWR Well most are custom
05:32 πŸ”— ryonaloli and is there any way to detect fake results or bad clients trying to send garbage?
05:32 πŸ”— ryonaloli yeah that's the issue
05:33 πŸ”— ryonaloli there can't be a standard if it's custom and changes each time
05:33 πŸ”— Cameron_D https://github.com/ArchiveTeam/universal-tracker is the tracker
05:33 πŸ”— JRWR nope, but you could do cross checks
05:33 πŸ”— ryonaloli that'd slow down the process though
05:33 πŸ”— Cameron_D and no, items aren't verified
05:34 πŸ”— odie5533 It uses a general HTTP JSON API. That would work in any language.
05:34 πŸ”— odie5533 *and on any platform
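
The exchange with the tracker is roughly a JSON POST to request an item and another to mark it done. The paths and payload fields below are an approximation modelled on the seesaw client, not a documented API; check the universal-tracker source for the real contract:

    # Approximate shape of the tracker's HTTP JSON exchange. Project name,
    # paths, and fields are assumptions for illustration only.
    import requests

    TRACKER = "http://tracker.archiveteam.org/exampleproject"  # hypothetical project
    NICK = "examplenick"

    # Ask for one item to work on.
    resp = requests.post(TRACKER + "/request",
                         json={"downloader": NICK, "api_version": "2"})
    item = resp.json().get("item_name")

    # ... fetch the item here ...

    # Report the item finished, with some stats about what was grabbed.
    requests.post(TRACKER + "/done",
                  json={"downloader": NICK, "item": item,
                        "bytes": {"data": 123456}})
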
05:34 πŸ”— ryonaloli what information would be required to fully automate it?
05:35 πŸ”— ryonaloli maybe it could be as simple as sending new scripts each time, but then there's the security issue, and even in a MAC i wouldn't trust such code
05:35 πŸ”— Cameron_D that is why a VM works well
05:36 πŸ”— Cameron_D it's completely sandboxed so people know that what they run won't harm their computer
05:36 πŸ”— ryonaloli but it's very high overhead, requires admin privileges, etc
05:36 πŸ”— ryonaloli also, a VM isn't very good security at ALL
05:36 πŸ”— ryonaloli there's already rootkits in the wild specifically designed to break out of VMs
05:36 πŸ”— odie5533 let me stop you right there
05:36 πŸ”— yipdw it's a good thing that the Warrior VM image doesn't contain any
05:36 πŸ”— odie5533 yes, there are some vulnerabilities occasionally
05:36 πŸ”— odie5533 but VMs are extremely secure
05:37 πŸ”— ryonaloli they aren't designed for security
05:37 πŸ”— yipdw so
05:37 πŸ”— yipdw ok
05:37 πŸ”— ryonaloli and if they're receiving self-updating code...
05:37 πŸ”— yipdw yes, it is theoretically possible for a Warrior VM to get data that has been modified to contain a rootkit that will break out of a hypervisor
05:37 πŸ”— Cameron_D There is less potential for a VM to be exploited and used to access the host than code running *directly* on the host
05:38 πŸ”— JRWR we need a sandboxed system.. a set of APIs to do the archiving
05:38 πŸ”— odie5533 yipdw: Warrior VM would not be the major target for a researcher finding that kind of bug.
05:38 πŸ”— JRWR maybe some LUA?
05:38 πŸ”— yipdw in *practice*, this isn't the sort of thing that happens often enough for effort to be spent on it
05:38 πŸ”— yipdw which is why nobody has done it
05:38 πŸ”— JRWR maybe use a lua sandbox with a limited API
05:38 πŸ”— JRWR that should lock it down
05:39 πŸ”— yipdw what you will find is that there is no threat model for Archive Team software because the MO up to now has been "we save stuff"
05:39 πŸ”— yipdw and that has actually worked alright, at least measured by the metric of "how much did you save"
05:39 πŸ”— JRWR if we did use python, maybe just some basic code review would do the trick
05:39 πŸ”— odie5533 a much bigger risk to the project is someone that sends junk data.
05:39 πŸ”— yipdw security improvements are welcome
05:40 πŸ”— JRWR a lua sandbox might work for this use case, limit the API and you should be good
05:40 πŸ”— yipdw further discussion should probably be in #warrior
06:32 πŸ”— fgsfds Is there much effort being made to hold onto IRC logs?
06:34 πŸ”— ryonaloli i thought the IRC was already logged
06:44 πŸ”— Nemo_bis fgsfds: whose logs?
06:45 πŸ”— fgsfds I mean general/popular channels.
06:45 πŸ”— ryonaloli well, i can always give my logs, but i just came here today lol
06:46 πŸ”— Nemo_bis fgsfds: public logging, on most servers, is strictly prohibited unless the channel decides to allow it
06:46 πŸ”— ryonaloli i think i read in the wiki that this channel did public logging
06:46 πŸ”— Nemo_bis EFNet seems not to be too paranoid about it, but FreeNode definitely is
06:46 πŸ”— Nemo_bis sure, because we decided so :)
06:47 πŸ”— Nemo_bis IMHO it should be stated in the topic but as I said it seems EFNet doesn't care and I don't know it enough
06:48 πŸ”— Nemo_bis anyway, you should convince channels one by one or just start archiving the already-public logs (there is plenty, from FLOSS support channels for instance)
06:49 πŸ”— fgsfds Even if it's not public logging, a complete copy would probably be handy to someone in about 6-7 years if the service dies or changes.
07:01 πŸ”— SketchCow Where's my hug
07:01 πŸ”— SketchCow Again, I bought into insane travel
07:03 πŸ”— * joepie91 hugs SketchCow
07:03 πŸ”— joepie91 also, SketchCow, there's a question a bit more up that needs your attention
07:03 πŸ”— joepie91 approximately 3 hours and 15 minutes back
07:16 πŸ”— SketchCow What sort?
07:21 πŸ”— godane a guy told us he has tons of 4chan images
07:22 πŸ”— SketchCow I see we had another person go through the "oh, surely we can use windows"
07:23 πŸ”— SketchCow Ah, with a dash of "but what about cygwin"
07:25 πŸ”— SketchCow ryonaloli: How big is this collection? I can give you an FTP for it.
07:26 πŸ”— ryonaloli collection of 4chandata and gelbooru?
07:26 πŸ”— ryonaloli i think it's over 100gb, maybe more
07:26 πŸ”— SketchCow That's no problem.
07:26 πŸ”— SketchCow I have a few extra terabytes
07:27 πŸ”— ryonaloli do you use sftp?
07:27 πŸ”— SketchCow I have regular FTP. Will that work?
07:28 πŸ”— ryonaloli sure, but i'll be using Tor so the exit node will be able to view plaintext traffic (idk if that includes username and pass)
07:39 πŸ”— joepie91 SketchCow: what is the header to create a test collection on IA?
07:40 πŸ”— joepie91 er
07:40 πŸ”— joepie91 test item, sorry
07:42 πŸ”— SketchCow test can be the collection name I believe
07:43 πŸ”— joepie91 aha
07:44 πŸ”— joepie91 SketchCow: decided to try it out with the HTML5 uploader; https://ia801008.us.archive.org/30/items/TestingItemVwMin/TestingItemVwMin_meta.xml
07:44 πŸ”— joepie91 also that's a very noisy description editor
07:44 πŸ”— * joepie91 blinks
07:44 πŸ”— joepie91 oh never mind, it worked as intended
07:47 πŸ”— godane SketchCow: just want you to know i'm saving microsoft research presentations: https://archive.org/details/msrvideo.vo.msecnd.net-pdf-grab-103000-to-104000
07:47 πŸ”— godane they're from a 3rd party CDN from what i can tell
10:04 πŸ”— DDG I've been away for some days... Is there a project to do right now or just nothing?
10:04 πŸ”— DDG I don't really mind if there isn't..
10:12 πŸ”— BiggieJon url team is running again as of this morning
10:13 πŸ”— DDG ah.. really?
10:13 πŸ”— DDG i'll start my warrior then
10:14 πŸ”— DDG i'm not at home, but i can run it
10:14 πŸ”— BiggieJon tracker is back up
10:14 πŸ”— DDG good
10:14 πŸ”— DDG I'll have a look at it
10:16 πŸ”— DDG BiggieJon: does it show anywhere how much you downloaded?
10:17 πŸ”— BiggieJon http://urlteam.terrywri.st/
10:17 πŸ”— DDG yea, but it does show some nice stats
10:17 πŸ”— BiggieJon or do you mean your total bandwidth used ?
10:17 πŸ”— DDG but not the amount of GB?
10:17 πŸ”— DDG yea, that
10:17 πŸ”— BiggieJon you are running the virtualbox warrior in windows ?
10:18 πŸ”— DDG yes
10:18 πŸ”— BiggieJon do you have the web client open (http://localhost:8001/)
10:18 πŸ”— DDG yes
10:19 πŸ”— BiggieJon click on current project, at the bottom left it shows your bandwidth, current and total
10:19 πŸ”— BiggieJon totals on top line above graph
10:19 πŸ”— DDG ah
10:19 πŸ”— DDG thanks
10:27 πŸ”— godane can someone mirror this for me: http://blip.tv/projectlore
10:27 πŸ”— godane since alex on diggnation made it
10:27 πŸ”— godane and the website is gone now too
11:43 πŸ”— DDG well... my warrior seems to crash my windows every time it starts up...
11:43 πŸ”— DDG (read: BSOD)
11:47 πŸ”— DDG i only managed to upload 1 task
11:47 πŸ”— DDG before it crashed all the time
12:15 πŸ”— DDG ehm... I'm getting 500 error messages in the console on doing a task.. (urlteam)
12:15 πŸ”— DDG does anyone get this also?
12:21 πŸ”— odie5533 Are there any tasks to do right now?
12:21 πŸ”— ersi What crashes? Your windows installation? Or the warrior?
12:22 πŸ”— ersi odie5533: Yes, there are tasks available @ the urlteam project
12:23 πŸ”— balrog http://www.loopinsight.com/2013/10/28/lost-return-of-the-jedi-footage-discovered/?utm_source=loopinsight.com&utm_campaign=loopinsight.com&utm_medium=referral
12:24 πŸ”— odie5533 There was a clip on reddit recently of a wampa attacking the rebel base in V
12:24 πŸ”— ersi Maybe this is something for #archiveteam-bs?
12:36 πŸ”— DDG ersi: it completely crashes my windows
12:36 πŸ”— DDG causing it to give a BSOD
12:37 πŸ”— DDG although this is annoying... i'm also getting 500 errors on doing the urlteam task
12:38 πŸ”— DDG I can run the warrior for some minutes, but then the above happens
12:38 πŸ”— ersi What version of Windows do you have? And what version of VirtualBox do you have installed?
12:38 πŸ”— DDG ehm
12:38 πŸ”— DDG 6.1 ( windows 7)
12:39 πŸ”— DDG and virtualbox 4.3 afais
12:39 πŸ”— DDG this computer is using an AMD GPU and CPU
12:39 πŸ”— DDG I heard from someone else this might cause this kind of thing to happen?
12:42 πŸ”— ersi What? That AMD GPU and CPU's aren't supported? Sounds like either a miscommunication or a gross misconception
12:43 πŸ”— DDG well, I run the same thing at home, with Intel stuff though, and there it just runs fine without crashing my windows
12:44 πŸ”— ersi I'd give downgrading to VirtualBox 4.2.X a try on the machine that it continues to crash on.
12:47 πŸ”— DDG ah ok, it just crashed again
12:48 πŸ”— DDG I'll give it a try
14:13 πŸ”— DDG ersi: I'm getting error 500 constantly.. Do you have any idea what could be causing it?
14:16 πŸ”— DDG anyone else maybe?
14:25 πŸ”— ersi Does it only say "HTTP 500" or does it actually say something more?
14:52 πŸ”— GLaDOS DDG: I have no idea why you'd be getting 500. It's running smoothl..
14:52 πŸ”— GLaDOS smoothl?
14:52 πŸ”— GLaDOS smoothl!
14:53 πŸ”— DDG mhm ok
14:53 πŸ”— DDG it works fine now
14:54 πŸ”— DDG but no tasks coming now...
14:55 πŸ”— DDG 2013-10-28 14:55:20,440 tinyback.Tracker INFO: No tasks available <- :S
15:07 πŸ”— ersi GLaDOS: I think the problem is that he has claimed tasks. Try doing a search for his IP in the tasks DB and clear his claims
15:08 πŸ”— GLaDOS DDG: IP?
15:08 πŸ”— ersi I think they time out after a while (can't remember) - but it's probably a pretty long while
15:08 πŸ”— GLaDOS Also, time out takes 30 minutes
15:08 πŸ”— DDG what do you mean with IP?
15:08 πŸ”— DDG my adress?
15:08 πŸ”— GLaDOS Yeah
15:08 πŸ”— DDG I'll pm it, is that ok?
15:09 πŸ”— GLaDOS Yeah, thats fine
15:09 πŸ”— GLaDOS EVERYONE, HIS IP IS 127.183.59.34
15:10 πŸ”— ersi Heh
15:10 πŸ”— DDG lol
15:10 πŸ”— DDG but my warrior has done nothing
15:10 πŸ”— DDG the last 30 mins
15:10 πŸ”— DDG at least
15:14 πŸ”— GLaDOS DDG: try again
15:14 πŸ”— DDG i'll reboot the thing then
15:14 πŸ”— ersi No need to reboot it though, it'll do request ever so often
15:15 πŸ”— DDG ever so often = how long?
15:16 πŸ”— DDG just curious
15:16 πŸ”— GLaDOS I think every 300 seconds?
15:16 πŸ”— GLaDOS or 60 maybe
15:16 πŸ”— GLaDOS It says
15:17 πŸ”— DDG strange, it was inactive for me at least 30 mins....
15:18 πŸ”— ersi As in every few seconds (can't remember detail but 10-300s) it'll contact the tracker to request new work.
15:20 πŸ”— DDG but after a few requests I did it stopped for some strange reason (I didn't touch the thing at all)
15:20 πŸ”— DDG well it works again now
15:38 πŸ”— WiK so...i started cloning repos from bitbucket as well as github... does anyone have a list of bitbucket users?
15:39 πŸ”— WiK i started with a seedlist of 20 users from google using site:bitbucket.org
15:39 πŸ”— WiK and now have a list of 35k users
15:39 πŸ”— WiK just spidering their followers and who they are following
15:49 πŸ”— ersi Whoah :)
15:51 πŸ”— kyan__ SketchCow, pinged you with this a couple days ago, but don't think you saw it - writing again: Here's a list of web archives I've uploaded that aren't in the right collections yet (WikiTeam and Archive Team)... I'd appreciate it if you could move them. Thanks. :D http://hastebin.com/raw/hejuvokoru
15:51 πŸ”— kyan__ Wow, that is going to be... a lot of data XD
15:52 πŸ”— GLaDOS kyan__: try emailing him. It works better.
15:52 πŸ”— GLaDOS (jscott@archive.org)
15:52 πŸ”— kyan__ GLaDOS, ah ok. Sounds like a plan. Thanks :D
16:12 πŸ”— SketchCow kyan__: Done
16:12 πŸ”— SketchCow But just luck, it's much better to mail me.
16:12 πŸ”— kyan__ SketchCow, Ok, sounds good. :) Thanks!
17:03 πŸ”— DFJustin it would be nice if IA had more anonymity for uploaders, to match what they're doing with reader privacy
17:34 πŸ”— phillipsj wait 127.x.x.x is local loop-back (or was that the joke?)
18:36 πŸ”— DDG phillipsj: to answer your question: that was the joke ;)
20:27 πŸ”— balrog anyone here in .jp / has access to an IP there?
20:27 πŸ”— balrog apparently http://www.emulation9.com/emulators/ (and the whole domain) is blocked elsewhere
20:27 πŸ”— Stary2001 balrog, rdns?
20:27 πŸ”— Stary2001 or ip blocks
20:27 πŸ”— Stary2001 because nobody said you had to own the domain you're rdnsing to :)
20:28 πŸ”— balrog not sure...
20:32 πŸ”— ersi joepie knew some Japanese, who visited this chan
20:40 πŸ”— DFJustin another website like that is http://www.alicesoft.com/
20:48 πŸ”— balrog that one's blocked from IA
22:37 πŸ”— DDG http://urlteam.terrywri.st/ <- and it's down again..
22:38 πŸ”— DDG oh well
22:38 πŸ”— DDG I have to go for now
